Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes Airflow to Aqueduct syncing bug #1347

Merged
merged 5 commits into from
May 22, 2023

Conversation

saurav-c
Copy link
Contributor

@saurav-c saurav-c commented May 22, 2023

Describe your changes and why you are making these changes

This PR fixes a bug with syncing DAG results from Airflow to Aqueduct. The bug was a race condition introduced when multiple requests were trying to fetch the latest DAG results from Airflow and check if these had been previously populated into Aqueduct. The fix is to wrap this logic inside of a DB transaction, such that conflicting writes do not occur.

It also:

  • Adds error checking to make sure that an Airflow DAG is unpaused before triggering a new run. Previously this would have failed silently. There is a follow up task already filed to make this more robust.
  • Updates the Airflow DAG template to reflect changes to the LoadOperatorSpec. There is a task filed to make these changes backwards compatible with existing Airflow clusters going forward.
  • Updates the Airflow DAG template to automatically set the DAG to be unpaused

Tests

  • Successfully passed a basic integration test using Airflow as the compute layer
======================================= PASSES ========================================
____________________ test_basic_flow[test_snowflake-test_airflow] _____________________
[gw3] darwin -- Python 3.10.10 /Library/Frameworks/Python.framework/Versions/3.10/bin/python3.10
-------------------------------- Captured stdout call ---------------------------------
The Airflow DAG file has been downloaded to: test_f8684ebc8b62482dadece3fc6c245615_airflow.py.
                    Please copy it to your Airflow server to begin execution.
Url:  http://localhost:8080/workflow/efaaa1dc-e3ac-43d7-9d8c-522e72b4566c
Workflow registration succeeded. Workflow ID efaaa1dc-e3ac-43d7-9d8c-522e72b4566c. Name: test_f8684ebc8b62482dadece3fc6c245615
Sleeping for 30s to wait for Airflow scheduler to pick up new DAG file
Workflow efaaa1dc-e3ac-43d7-9d8c-522e72b4566c was created and ran successfully at least 1 times!
============================ 1 passed in 519.96s (0:08:39) ============================

Related issue number (if any)

ENG 2599

Loom demo (if any)

Checklist before requesting a review

  • I have created a descriptive PR title. The PR title should complete the sentence "This PR...".
  • I have performed a self-review of my code.
  • I have included a small demo of the changes. For the UI, this would be a screenshot or a Loom video.
  • If this is a new feature, I have added unit tests and integration tests.
  • I have run the integration tests locally and they are passing.
  • I have run the linter script locally (See python3 scripts/run_linters.py -h for usage).
  • All features on the UI continue to work correctly.
  • Added one of the following CI labels:
    • run_integration_test: Runs integration tests
    • skip_integration_test: Skips integration tests (Should be used when changes are ONLY documentation/UI)

@saurav-c saurav-c added the run_integration_test Triggers integration tests label May 22, 2023
@saurav-c saurav-c marked this pull request as ready for review May 22, 2023 06:17
@saurav-c saurav-c requested review from kenxu95 and likawind May 22, 2023 06:17
@saurav-c saurav-c merged commit d14bbdc into main May 22, 2023
@saurav-c saurav-c deleted the eng-2599-fix-airflow-to-aqueduct-syncing branch May 22, 2023 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run_integration_test Triggers integration tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants