fix(dbt-sync): Migrate legacy endpoints to fix virtual dataset creation #232
Conversation
if "sql" not in kwargs:
    return self.create_resource("dataset", **kwargs)

# Check if the dataset creation supports sql directly
not_legacy = self.validate_key_in_resource_schema(
    "dataset",
    "sql",
    keys=["add_columns"],
)
not_legacy = not_legacy["add_columns"]
if not_legacy:
    return self.create_resource("dataset", **kwargs)
I thought about doing the validation first and then changing line 567 to if "sql" not in kwargs or not_legacy, so that we would only have a single return. But then I realized that the integration most commonly syncs physical datasets, and that change would always execute validate_key_in_resource_schema, which is another API request. With that in mind, I decided to have a second block for this, so that we only call this other endpoint when needed (trying not to increase the dataset sync duration).
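The trade-off described above can be sketched as follows. This is a minimal illustration of the control flow, not the actual preset-cli implementation; the class, the fake schema check, and the legacy fallback are all hypothetical stand-ins.

```python
# Illustrative sketch: physical datasets (no "sql" key) skip the schema
# check entirely, so the common sync path costs no extra API request.
class DatasetClient:
    def __init__(self, supports_sql_key: bool):
        self._supports_sql_key = supports_sql_key
        self.schema_checks = 0  # counts the extra API round-trips

    def validate_key_in_resource_schema(self, resource, key, keys):
        # Stands in for the extra API request mentioned in the comment.
        self.schema_checks += 1
        return {k: self._supports_sql_key for k in keys}

    def create_resource(self, resource, **kwargs):
        return {"resource": resource, **kwargs}

    def create_dataset(self, **kwargs):
        # First block: physical datasets return early, no schema check.
        if "sql" not in kwargs:
            return self.create_resource("dataset", **kwargs)

        # Second block: only virtual datasets pay for the extra request.
        not_legacy = self.validate_key_in_resource_schema(
            "dataset",
            "sql",
            keys=["add_columns"],
        )["add_columns"]
        if not_legacy:
            return self.create_resource("dataset", **kwargs)
        # Otherwise fall back to the legacy creation path (omitted here).
        return {"legacy": True, **kwargs}
```

With a single merged condition, every physical-dataset sync would trigger the schema check; with the two-block form it is only triggered when "sql" is present.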
@@ -84,7 +84,7 @@ def dbt_core(  # pylint: disable=too-many-arguments, too-many-locals
     profiles: Optional[str] = None,
     exposures: Optional[str] = None,
     import_db: bool = False,
-    disallow_edits: bool = True,
+    disallow_edits: bool = False,
I noticed this was set to True here even though the default state is False, so I updated it only for consistency; it doesn't really change the result. It should be totally fine to revert it to True if we want to.
if "MANAGER_URL" not in ctx.obj and disallow_edits:
    warnings.warn(
        (
            "The managed externally feature was only introduced in Superset v1.5. "
            "Make sure you are running a compatible version."
        ),
        category=UserWarning,
        stacklevel=2,
    )
I thought about only including is_managed_externally in the update payloads when the --disallow-edits flag was used, but then I realized that if a user wanted to trigger a new sync without this flag in order to set it back to False, that wouldn't work. So I only added this warning, which, in addition to the error faced on legacy instances, should be pretty straightforward (the error would actually happen even if --disallow-edits isn't included).
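The reasoning above can be shown in a tiny sketch. The function and payload key are illustrative only, not the actual preset-cli code:

```python
# Illustrative: the field is always sent, rather than only when the flag is
# passed, so that re-syncing without --disallow-edits can reset
# is_managed_externally back to False on the instance.
def build_update_payload(disallow_edits: bool) -> dict:
    return {"is_managed_externally": disallow_edits}
```

If the key were only included when the flag was set, a sync run without the flag would silently leave the previous True value in place.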
This is amazing! I left a few suggestions.
Co-authored-by: Beto Dealmeida <roberto@dealmeida.net>
This PR updates the _run_query function to use the new API endpoint by default, but maintains backwards compatibility in case the new endpoint is not available (returns 404) on older Superset versions. By default, virtual dataset creation would include running the query, but this PR also updates that default behavior, since the dataset endpoint now supports virtual dataset creation directly (without columns information). This change was also made with backwards compatibility in mind for older Superset versions.

This PR fixes #231 and #219.