fix(ingest/redash): Limit size of RedashSourceReport #9873

Merged
merged 1 commit into from
Feb 21, 2024

Conversation

atjones0011
Contributor

@atjones0011 atjones0011 commented Feb 16, 2024

Addressing the problem described in #9575

In large Redash deployments, the `filtered` and `timing` fields of the report can grow very large. Using the LossyList and LossyDict data structures, as other SourceReports already do, limits the number of lines printed during an ingestion run. This makes the logs more usable and improves ingestion performance against Redash deployments where a large number of queries are filtered.
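
To illustrate the idea, here is a minimal, self-contained sketch of a size-capped list used as a report field. It is not DataHub's LossyList: `CappedList`, `ExampleReport`, `report_dropped`, and the default cap of 10 are hypothetical names and values chosen for this example, and the sketch simply keeps the first few items, which may differ from how the real class chooses what to retain.

```python
from dataclasses import dataclass, field
from typing import Generic, List, TypeVar

T = TypeVar("T")


class CappedList(Generic[T]):
    """Hypothetical stand-in for a lossy list.

    Stores at most `max_elements` items but counts every append,
    so the printed form stays short however many items arrive.
    """

    def __init__(self, max_elements: int = 10) -> None:
        self.max_elements = max_elements
        self.total = 0  # number of appends seen, including dropped ones
        self._items: List[T] = []

    def append(self, item: T) -> None:
        self.total += 1
        # Assumption for this sketch: keep the first N items, drop the rest.
        if len(self._items) < self.max_elements:
            self._items.append(item)

    def __repr__(self) -> str:
        dropped = self.total - len(self._items)
        suffix = f" ... and {dropped} more" if dropped else ""
        return f"{self._items!r}{suffix}"


@dataclass
class ExampleReport:
    """Hypothetical report with a bounded `filtered` field."""

    filtered: CappedList[str] = field(default_factory=CappedList)

    def report_dropped(self, name: str) -> None:
        self.filtered.append(name)
```

With a structure like this, a report that filters a thousand queries still prints only a handful of names plus a count of the rest. A dictionary-shaped field such as `timing` could be bounded the same way.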

Closes #9575

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata community-contribution PR or Issue raised by member(s) of DataHub Community labels Feb 16, 2024
@atjones0011
Contributor Author

The precedent for this change was set by ingestion jobs such as Looker and SQL.

@hsheth2 hsheth2 added the merge-pending-ci A PR that has passed review and should be merged once CI is green. label Feb 16, 2024
@hsheth2 hsheth2 changed the title fix(ingestion/redash): Limit size of RedashSourceReport fix(ingest/redash): Limit size of RedashSourceReport Feb 21, 2024
@hsheth2 hsheth2 merged commit 66871cb into datahub-project:master Feb 21, 2024
54 of 55 checks passed
dushayntAW pushed a commit to dushayntAW/datahub that referenced this pull request Feb 26, 2024
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata merge-pending-ci A PR that has passed review and should be merged once CI is green.
Development

Successfully merging this pull request may close these issues.

Redash ingestion logs overwhelming on large Redash deployments
2 participants