

Get sdk integration tests working with Spark #1190

Merged: 11 commits into main from eng-2614-get-sdk-integration-tests-working-with on Apr 11, 2023

Conversation

hsubbaraj-spiral (Contributor)

Describe your changes and why you are making these changes

Modifies the SDK integration tests to work with Spark. Most of the changes add a Spark case to test functions that need PySpark-DataFrame-specific code. I also add the pytest.mark.skip_for_spark_engines() marker to skip tests that don't work with Spark.
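
A minimal sketch of what the marker and the engine-specific branching could look like (the skip_for_spark_engines name comes from the description above; the --spark option, the hook bodies, and the check_row_count helper are illustrative assumptions, not this PR's actual code):

# conftest.py (sketch): register the custom marker and skip marked tests
# whenever the suite runs against a Spark-based engine.
import pandas as pd
import pytest

def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "skip_for_spark_engines: skip this test when running on Spark engines",
    )

def pytest_collection_modifyitems(config, items):
    # Hypothetical flag; the real suite presumably derives this from its
    # engine configuration rather than a command-line option.
    if not config.getoption("--spark", default=False):
        return
    skip_spark = pytest.mark.skip(reason="test does not work with Spark engines")
    for item in items:
        if "skip_for_spark_engines" in item.keywords:
            item.add_marker(skip_spark)

# Illustrative engine-specific branch inside a test helper: PySpark
# DataFrames expose .count(), while pandas DataFrames use len().
def check_row_count(df, expected):
    if isinstance(df, pd.DataFrame):
        assert len(df) == expected
    else:
        assert df.count() == expected

A test that cannot run on Spark would then simply be decorated with @pytest.mark.skip_for_spark_engines().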

Related issue number (if any)

Loom demo (if any)

Checklist before requesting a review

  • I have created a descriptive PR title. The PR title should complete the sentence "This PR...".
  • I have performed a self-review of my code.
  • I have included a small demo of the changes. For the UI, this would be a screenshot or a Loom video.
  • If this is a new feature, I have added unit tests and integration tests.
  • I have run the integration tests locally and they are passing.
  • I have run the linter script locally (See python3 scripts/run_linters.py -h for usage).
  • All features on the UI continue to work correctly.
  • Added one of the following CI labels:
    • run_integration_test: Runs integration tests
    • skip_integration_test: Skips integration tests (Should be used when changes are ONLY documentation/UI)

@hsubbaraj-spiral added the run_integration_test label (Triggers integration tests) on Apr 10, 2023.
@kenxu95 (Contributor) left a comment:

Just a couple things I want to check on!

integration_tests/sdk/aqueduct_tests/checks_test.py (review comment resolved; outdated)
integration_tests/sdk/aqueduct_tests/flow_test.py (review comment resolved; outdated)
integration_tests/sdk/conftest.py (review comment resolved)
@@ -27,7 +27,7 @@ func NewSparkJobManager(conf *SparkJobManagerConfig) (*SparkJobManager, error) {

 	session, err := livyClient.CreateSession(&spark.CreateSessionRequest{
 		Kind:                     "pyspark",
-		HeartbeatTimeoutInSecond: 600,
+		HeartbeatTimeoutInSecond: 10,
@kenxu95 (Contributor) commented on this diff:

I assume this was a necessary change to get it working with the test suite? This seems like quite a large numeric change - what are the ramifications of this?

@hsubbaraj-spiral (Contributor, Author) replied:

I originally set the heartbeat timeout to 10 minutes (600 seconds) for debugging purposes: it kept the Livy-created Spark session alive so I could look at its logs. I then realized we can check the logs of completed Spark sessions/applications via the Spark UI, so there's no need to keep these sessions alive. Lowering the timeout also frees resources, since each SparkSession is allocated a certain amount of disk space on the driver/worker nodes it uses.
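
For context, heartbeatTimeoutInSecond is a field on Livy's session-creation request: a session that goes without a client heartbeat for that many seconds is reclaimed by Livy, along with the resources it holds. A rough sketch of the equivalent REST call (host and port are illustrative, not taken from this repo):

import requests

# POST /sessions creates a Livy session. With heartbeatTimeoutInSecond: 10,
# an idle session is garbage-collected after ~10 seconds instead of lingering
# for 10 minutes, freeing driver/worker resources sooner.
resp = requests.post(
    "http://livy-server:8998/sessions",  # illustrative host/port
    json={"kind": "pyspark", "heartbeatTimeoutInSecond": 10},
)
resp.raise_for_status()
print(resp.json()["id"], resp.json()["state"])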

@hsubbaraj-spiral merged commit a3e2206 into main on Apr 11, 2023.
@vsreekanti deleted the eng-2614-get-sdk-integration-tests-working-with branch on April 18, 2023.