comparable and orderable CalendarInterval

comparable and orderable CalendarInterval

Enrico Minack
Hi Devs,

I would like to know the current roadmap for making
CalendarInterval comparable and orderable again (SPARK-29679,
SPARK-29385, #26337).

With #27262 this was reverted, but SPARK-30551 does not mention how to
move forward on this. I have found SPARK-28494, but it seems to be
stale.

While I find it useful to compare such intervals, I cannot find a way to
work around the missing comparability. Is there a way to get, e.g., the
seconds that an interval represents, to be able to compare intervals?
org.apache.spark.sql.catalyst.util.IntervalUtils has methods like
getEpoch and getDuration, but they do not seem to be exposed to SQL or
in the org.apache.spark.sql.functions package.

Thanks for the insights,
Enrico


---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: comparable and orderable CalendarInterval

cloud0fan
What's your use case for comparing intervals? It's tricky in Spark, as there is only one interval type and you can't really compare one month with 30 days.

On Wed, Feb 12, 2020 at 12:01 AM Enrico Minack <[hidden email]> wrote:

Re: comparable and orderable CalendarInterval

Joseph Torres
In reply to this post by Enrico Minack
The problem is that an interval does not represent a consistent number of seconds: as Wenchen mentioned, a month interval is not a fixed number of days. If your use case can account for that, maybe you could add the interval to a fixed reference date and then compare the results.
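The anchoring idea can be sketched outside Spark with plain java.time (a hypothetical illustration, not the Spark API): whether "1 month" lands before or after "30 days" depends entirely on the reference date it is added to.

```scala
import java.time.{LocalDate, Period}

object AnchorSketch extends App {
  // February 2020 has 29 days, so relative to this anchor
  // "1 month" lands one day earlier than "30 days".
  val feb = LocalDate.of(2020, 2, 1)
  println(feb.plus(Period.ofMonths(1)))  // 2020-03-01
  println(feb.plusDays(30))              // 2020-03-02

  // Anchored to January (31 days), the ordering flips.
  val jan = LocalDate.of(2020, 1, 1)
  println(jan.plus(Period.ofMonths(1)))  // 2020-02-01
  println(jan.plusDays(30))              // 2020-01-31
}
```

This is exactly why a context-free ordering of such intervals is ill-defined: anchoring to a fixed date makes the comparison well-defined, but only relative to that date.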

On Tue, Feb 11, 2020 at 8:01 AM Enrico Minack <[hidden email]> wrote:

Re: comparable and orderable CalendarInterval

Enrico Minack
In reply to this post by cloud0fan
I compute the difference of two timestamps and compare it with a constant interval:

import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{CalendarIntervalType, TimestampType}
import spark.implicits._

Seq(("2019-01-02 12:00:00", "2019-01-02 13:30:00"))
  .toDF("start", "end")
  .select($"start".cast(TimestampType), $"end".cast(TimestampType))
  .select($"start", $"end", ($"end" - $"start").as("diff"))
  .where($"diff" < lit("INTERVAL 2 HOUR").cast(CalendarIntervalType))
  .show

Since the interval comes from subtracting two timestamps, it only carries a day-time (microsecond) component and no months, so comparing it with the "right kinds of intervals", i.e. ones without a month component, should always be well-defined.
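For day-time differences like this, the comparison can also be done without interval types at all, by comparing durations directly. A sketch of the idea with plain java.time (an illustration, not the Spark API):

```scala
import java.time.{Duration, LocalDateTime}
import java.time.format.DateTimeFormatter

object DiffSketch extends App {
  val fmt   = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
  val start = LocalDateTime.parse("2019-01-02 12:00:00", fmt)
  val end   = LocalDateTime.parse("2019-01-02 13:30:00", fmt)

  // Subtracting two timestamps yields a month-free duration,
  // so comparing it against "2 hours" is unambiguous.
  val diff = Duration.between(start, end)
  println(diff.toMinutes)                            // 90
  println(diff.compareTo(Duration.ofHours(2)) < 0)   // true
}
```

An analogous workaround inside Spark SQL, using only stock functions, would be to compare unix_timestamp(end) - unix_timestamp(start) against a plain number of seconds.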

Enrico


On 11.02.20 at 17:06, Wenchen Fan wrote:
What's your use case to compare intervals? It's tricky in Spark as there is only one interval type and you can't really compare one month with 30 days.
