Get sdk integration tests working with Spark #1190
Conversation
Just a couple things I want to check on!
```diff
@@ -27,7 +27,7 @@ func NewSparkJobManager(conf *SparkJobManagerConfig) (*SparkJobManager, error) {
 	session, err := livyClient.CreateSession(&spark.CreateSessionRequest{
 		Kind:                     "pyspark",
-		HeartbeatTimeoutInSecond: 600,
+		HeartbeatTimeoutInSecond: 10,
```
I assume this was a necessary change to get it working with the test suite? This seems like quite a large numeric change - what are the ramifications of this?
So I originally had the heartbeat timeout set to 10 min for debugging purposes. It left the Livy-created Spark session alive so I could look at the logs. I realized we can actually check the logs of completed Spark sessions/applications via the Spark UI, so there isn't a need to keep these sessions alive. This also improves resource usage, since each SparkSession is allocated a certain amount of disk space on the driver/worker nodes it uses.
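For background on what this field controls: Livy's REST API accepts a heartbeatTimeoutInSecond option at session creation and reaps a session once it stops receiving heartbeats for that long. Here is a minimal Python sketch of the equivalent raw REST call (the endpoint URL is an assumption; the actual code goes through the Go Livy client shown in the diff above):

```python
import requests

LIVY_URL = "http://localhost:8998"  # assumed Livy endpoint

# Create a PySpark session that Livy will reap 10s after heartbeats stop,
# so orphaned test sessions don't keep holding driver/worker disk space.
resp = requests.post(
    f"{LIVY_URL}/sessions",
    json={
        "kind": "pyspark",
        "heartbeatTimeoutInSecond": 10,  # was 600 while debugging
    },
)
resp.raise_for_status()
session = resp.json()
print(f"created Livy session {session['id']} (state={session['state']})")
```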
Describe your changes and why you are making these changes
Modifies tests to work with Spark. Most of the changes involve adding a case in the test functions to use PySpark DataFrame-specific code. I also add the pytest.mark.skip_for_spark_engines() marker to skip tests that don't work with Spark.
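The marker's implementation isn't visible in this view of the PR, but a marker like this can be wired up with standard pytest hooks in conftest.py. The following is a hypothetical sketch; the --engine option and the set of Spark engine names are illustrative assumptions, not the repo's actual code:

```python
# conftest.py -- hypothetical sketch, not the repo's actual implementation.
import pytest

SPARK_ENGINES = {"spark", "databricks"}  # assumed Spark-based engine names


def pytest_addoption(parser):
    # Assumed CLI option selecting which engine the tests run against.
    parser.addoption("--engine", action="store", default="postgres")


def pytest_configure(config):
    # Register the marker so pytest doesn't warn about an unknown mark.
    config.addinivalue_line(
        "markers", "skip_for_spark_engines: skip this test on Spark engines"
    )


def pytest_collection_modifyitems(config, items):
    # Only skip the marked tests when running against a Spark engine.
    if config.getoption("--engine") not in SPARK_ENGINES:
        return
    skip = pytest.mark.skip(reason="not supported on Spark engines")
    for item in items:
        if item.get_closest_marker("skip_for_spark_engines"):
            item.add_marker(skip)
```

A test would then opt out with a one-line decorator:

```python
@pytest.mark.skip_for_spark_engines()
def test_non_spark_behavior():
    ...
```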
Related issue number (if any)
Loom demo (if any)
Checklist before requesting a review
- I have run the linters (see python3 scripts/run_linters.py -h for usage).
- I have applied one of the following labels:
  - run_integration_test: Runs integration tests
  - skip_integration_test: Skips integration tests (should be used when changes are ONLY documentation/UI)