Add Comparison Operators to Date Columns #230
Conversation
Codecov Report
@@ Coverage Diff @@
## main #230 +/- ##
==========================================
+ Coverage 96.37% 96.39% +0.02%
==========================================
Files 53 53
Lines 827 831 +4
Branches 12 10 -2
==========================================
+ Hits 797 801 +4
Misses 30 30
Great job, only a couple of things to make it spark more ✨
df.show()
test[Date]("date")
test[LocalDate]("localDate")
I'm seeing that the test is failing for Spark 2.4. Split this test in two: one for Date and Timestamp, as you already have, and the ones for LocalDate and Instant have to go in the spark_3.0_3.1_3.2 folder. We have a few examples doing it, but if you have any doubt let us know! Also, if you want to try against 2.4 on your local computer, launch sbt -DsparkVersion=2.4, or if you use IntelliJ, put the Spark version you want to test against in the sbt settings.
We will have to improve our documentation for new contributors 😅
😱 I hadn't even thought I might need to run the tests differently for the different Spark versions. Thanks for the heads up!
I believe I've done what you said; I just have one question about your preferences. I implemented the trait DateOperationsSpec, which contains the testing logic for the comparison operators. We could reimplement this in the spark_3.0_3.1_3.2 folder, but it would look identical. This has me thinking that maybe I could extract this trait out to something along the lines of DateOperationsFixture and create a new folder called fixtures. We could then use DateOperationsFixture in both the 2.4.x tests and the 3.x tests. Or of course, we could just leave the trait DateOperationsSpec and import it in DateColumns3xSpec. Do you have any thoughts or preferences on this? 😄
Maybe implement the common logic in a trait and extend it in the base scala folder (it runs for all versions) for Date and Timestamp, and for the 3.x versions extend the trait again and only call it for LocalDate and Instant (yeah, it's a mess, but it's the simplest way we found).
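The suggested layout could be sketched roughly like this. All names here are illustrative (not the project's actual code), and the shared trait only records which columns were covered instead of spinning up a DataFrame:

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical sketch of the layout described above: the shared logic lives in
// one trait, the base `scala` folder extends it for the types every Spark
// version supports, and the `spark_3.0_3.1_3.2` folder extends it again for
// the 3.x-only types.
trait DateOperationsSpec {
  // In the real suite this would build a DataFrame and exercise <, >, <=, >=;
  // here it just records which columns were covered.
  val covered: ListBuffer[String] = ListBuffer.empty[String]
  def testComparisons(colName: String): Unit = covered += colName
}

// base `scala` folder: compiled and run for every supported Spark version
class DateColumnsSpec extends DateOperationsSpec {
  testComparisons("date")      // java.sql.Date
  testComparisons("timestamp") // java.sql.Timestamp
}

// `spark_3.0_3.1_3.2` folder: only picked up when building against Spark 3.x
class DateColumns3xSpec extends DateOperationsSpec {
  testComparisons("localDate") // java.time.LocalDate
  testComparisons("instant")   // java.time.Instant
}
```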
As you can imagine, the scala folder in src executes for all versions and is reachable from the rest of the specialized folders. Then we have the specialized folders, whose names say which versions they are executed for. Sbt checks the version and picks the folders it needs for the build.
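In sbt terms, this kind of folder selection can be wired up with an extra unmanaged source directory. The fragment below is only a hedged illustration (the `sparkVersion` property comes from the comment above; the folder names and the repo's actual build definition may differ):

```scala
// build.sbt fragment (illustrative, not the project's actual build):
// add a version-specific test source folder based on -DsparkVersion.
Test / unmanagedSourceDirectories += {
  val spark  = sys.props.getOrElse("sparkVersion", "3.2")
  val folder = if (spark.startsWith("2.4")) "spark_2.4" else "spark_3.0_3.1_3.2"
  baseDirectory.value / "src" / "test" / folder
}
```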
I've hopefully addressed this in 551fdfb. Please let me know if you have any further comments or feedback! 😄
Thanks a million once again. If you are interested in any other collaboration, we are open to all help, ideas, and suggestions.
Of course! This has been a lot of fun, so I'd definitely be interested in helping/contributing in any way I can!
Description
I've recently been trying to migrate one of our Spark jobs from standard Spark to Doric. However, in doing this migration I noticed that the comparison operators on dates that exist in Spark are currently missing from Doric. This PR adds the comparison operators <, >, <=, >= to DoricDateColumn.
I have tried to keep things in the same style and as similar in implementation to the NumericOperations class as possible; however, if any parts don't meet a required format/style, please let me know and I will happily change them.
Related Issue
#229
Motivation and Context
Currently it is not possible to compare dates. For example, should you want to filter a DataFrame for all dates before some cutoff date, that cannot currently be done using the doric DateColumn.
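As an illustration of the gap being filled, here is a hedged, Spark-free sketch of the four comparison operators derived from an Ordering, the same general shape as a NumericOperations-style type class. All names here are illustrative, not doric's actual API:

```scala
// Illustrative sketch only: derives <, >, <=, >= for java.sql.Date from an
// Ordering, mirroring the shape of a NumericOperations-style type class.
trait ComparisonOps[T] {
  def ord: Ordering[T]
  implicit class Ops(x: T) {
    def <(y: T): Boolean  = ord.lt(x, y)
    def >(y: T): Boolean  = ord.gt(x, y)
    def <=(y: T): Boolean = ord.lteq(x, y)
    def >=(y: T): Boolean = ord.gteq(x, y)
  }
}

object DateComparisons extends ComparisonOps[java.sql.Date] {
  val ord: Ordering[java.sql.Date] = Ordering.by(_.getTime)
}
```

With `DateComparisons._` in scope, `dateA < dateB` resolves through the implicit class, which is roughly how column-level operators let you write a date comparison inside a filter.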
How Has This Been Tested?