feat(oauth): adding necessary changes to support bigquery oauth #30674

Open
wants to merge 2 commits into
base: master

@fisjac (Contributor) commented Oct 22, 2024

SUMMARY

Adds the changes needed to support a future implementation of OAuth2 for BigQuery.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@@ -225,6 +225,10 @@ def ping(engine: Engine) -> bool:
# bubble up the exception to return proper status code
raise
except Exception as ex:
if database.is_oauth2_enabled() and database.db_engine_spec.needs_oauth2(
@fisjac (Contributor Author) commented Oct 22, 2024:

BigQuery raises an error as soon as create_engine() runs, so another needs_oauth2(ex) check is needed here to catch the relevant OAuth2 exceptions and trigger the OAuth2 dance. Most databases allow an engine to be created without valid credentials and instead raise the exception on engine.connect().
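
For illustration, the control flow described in the comment above could look roughly like this. It is a minimal sketch and not Superset code: `FakeDatabase`, `FakeEngineSpec`, `OAuth2RedirectError`, `start_oauth2_dance` and `connect_or_start_oauth2` are hypothetical stand-ins, and only `is_oauth2_enabled()` and `needs_oauth2(ex)` are taken from the diff.

```python
class OAuth2RedirectError(Exception):
    """Hypothetical signal that the user must go through the OAuth2 flow."""


class FakeEngineSpec:
    """Hypothetical engine spec; only needs_oauth2 mirrors a name from the diff."""

    @staticmethod
    def needs_oauth2(ex: Exception) -> bool:
        # Assumption: the driver error message mentions missing credentials.
        return "credentials" in str(ex).lower()

    @staticmethod
    def start_oauth2_dance(database: "FakeDatabase") -> None:
        # Hypothetical: a real implementation would redirect the user instead.
        raise OAuth2RedirectError(f"OAuth2 required for {database.name}")


class FakeDatabase:
    """Hypothetical database model whose engine cannot be created without credentials."""

    db_engine_spec = FakeEngineSpec

    def __init__(self, name: str, oauth2_enabled: bool) -> None:
        self.name = name
        self._oauth2_enabled = oauth2_enabled

    def is_oauth2_enabled(self) -> bool:
        return self._oauth2_enabled

    def create_engine(self) -> object:
        # Fails at engine creation time, before any connect() call.
        raise RuntimeError("Could not determine credentials")


def connect_or_start_oauth2(database: FakeDatabase) -> object:
    """Create the engine; if that fails for OAuth2 reasons, start the OAuth2 flow."""
    try:
        return database.create_engine()
    except Exception as ex:
        if database.is_oauth2_enabled() and database.db_engine_spec.needs_oauth2(ex):
            database.db_engine_spec.start_oauth2_dance(database)
        # Any other failure bubbles up unchanged.
        raise
```

With OAuth2 enabled, the credential error is turned into the OAuth2 signal; with OAuth2 disabled, the original error propagates untouched.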

@@ -1691,10 +1691,13 @@ def select_star( # pylint: disable=too-many-arguments
return sql

@classmethod
def estimate_statement_cost(cls, statement: str, cursor: Any) -> dict[str, Any]:
def estimate_statement_cost(
@fisjac (Contributor Author):

The Database object is required to create the BigQuery client when using OAuth2.

@@ -351,7 +351,16 @@ def get_allow_cost_estimate(cls, extra: dict[str, Any]) -> bool:
return True

@classmethod
def estimate_statement_cost(cls, statement: str, cursor: Any) -> dict[str, Any]:
def estimate_statement_cost(
@fisjac (Contributor Author):

Aligning the other specs to accept database, and adding a missing docstring.

@@ -365,9 +365,12 @@ def get_schema_from_engine_params(
return parse.unquote(database.split("/")[1])

@classmethod
def estimate_statement_cost(cls, statement: str, cursor: Any) -> dict[str, Any]:
def estimate_statement_cost(
@fisjac (Contributor Author):

Adding the database param.

effective_username,
access_token,
)
# Checking if the function signature can accept database as a param
@fisjac (Contributor Author):

Again, BigQuery needs the Database object here to build the client. This checks whether the function signature has database as a param and, if so, passes it.
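
The signature check described above can be sketched with `inspect.signature`. This is only an illustration under stated assumptions: `LegacySpec`, `DatabaseAwareSpec` and `call_update_impersonation_config` are hypothetical, and the argument list is simplified; only the method name `update_impersonation_config` and the idea of inspecting its signature come from the diff.

```python
from inspect import signature
from typing import Any, Optional


class LegacySpec:
    """Hypothetical engine spec using the old signature (no ``database``)."""

    @classmethod
    def update_impersonation_config(
        cls, connect_args: dict, uri: str, username: str, access_token: Optional[str]
    ) -> None:
        connect_args["user"] = username


class DatabaseAwareSpec:
    """Hypothetical engine spec using the new signature (takes ``database``)."""

    @classmethod
    def update_impersonation_config(
        cls,
        database: Any,
        connect_args: dict,
        uri: str,
        username: str,
        access_token: Optional[str],
    ) -> None:
        connect_args["user"] = username
        connect_args["database_name"] = database


def call_update_impersonation_config(
    spec: Any,
    database: Any,
    connect_args: dict,
    uri: str,
    username: str,
    access_token: Optional[str],
) -> None:
    """Pass ``database`` only if the spec's method declares that parameter."""
    method = spec.update_impersonation_config
    args = [connect_args, uri, username, access_token]
    # signature() on a bound classmethod omits ``cls``, so only real params are seen.
    if "database" in signature(method).parameters:
        method(database, *args)
    else:
        method(*args)
```

Specs that have not yet adopted the new parameter keep working, which is the backwards-compatibility concern the check addresses.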

@@ -192,3 +192,4 @@ class OAuth2ClientConfigSchema(Schema):
)
authorization_request_uri = fields.String(required=True)
token_request_uri = fields.String(required=True)
project_id = fields.String(required=False)
@fisjac (Contributor Author):

BigQuery takes project_id as one of its OAuth2 parameters.

Member:

I think ideally we'd want to store the project id in the URI, to make it compatible with non-oauth2 use cases. And then later when you need you can grab it from database.sqlalchemy_uri.database.
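
As a rough illustration of reading such a value back out of the connection URI, the sketch below splits a URI with the standard library. The helper `split_bigquery_uri` and the example URI are hypothetical. Which component actually carries the project ID depends on how the BigQuery dialect lays out its URI (the comment above refers to the `database` attribute), so the sketch exposes both the host and the database part rather than choosing one.

```python
from typing import Optional
from urllib.parse import urlsplit


def split_bigquery_uri(uri: str) -> dict[str, Optional[str]]:
    """Split a connection URI into its host and database components.

    Hypothetical helper: it does not decide which component is the project ID.
    """
    parts = urlsplit(uri)
    return {
        "host": parts.hostname,
        # The path starts with "/", so strip it; an empty path means no database.
        "database": parts.path.lstrip("/") or None,
    }


# Example URI is an assumption for illustration only.
components = split_bigquery_uri("bigquery://my-project/my_dataset")
```

Keeping the value in the URI means the same lookup works whether or not OAuth2 is configured, which is the compatibility point raised in the review.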

codecov bot commented Oct 22, 2024

Codecov Report

Attention: Patch coverage is 85.18519% with 4 lines in your changes missing coverage. Please review.

Project coverage is 70.82%. Comparing base (76d897e) to head (25cdd3a).
Report is 882 commits behind head on master.

Files with missing lines                                Patch %   Lines
superset/db_engine_specs/bigquery.py                    50.00%    2 Missing ⚠️
...Modal/DatabaseConnectionForm/OAuth2ClientField.tsx   50.00%    0 Missing and 1 partial ⚠️
superset/commands/database/test_connection.py           66.66%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #30674       +/-   ##
===========================================
+ Coverage   60.48%   70.82%   +10.33%     
===========================================
  Files        1931     1987       +56     
  Lines       76236    80190     +3954     
  Branches     8568     9170      +602     
===========================================
+ Hits        46114    56792    +10678     
+ Misses      28017    21166     -6851     
- Partials     2105     2232      +127     
Flag         Coverage Δ
hive         48.91% <56.00%> (-0.25%) ⬇️
javascript   58.61% <50.00%> (+0.89%) ⬆️
mysql        76.74% <80.00%> (?)
postgres     76.87% <80.00%> (?)
presto       53.38% <56.00%> (-0.42%) ⬇️
python       83.91% <88.00%> (+20.42%) ⬆️
sqlite       76.32% <80.00%> (?)
unit         60.87% <84.00%> (+3.24%) ⬆️

@michael-s-molina added the review:draft and review:checkpoint labels Oct 22, 2024
@fisjac marked this pull request as ready for review October 22, 2024 20:09
@dosubot bot added the authentication:sso and data:connect:googlebigquery labels Oct 22, 2024
@betodealmeida (Member) left a comment:

Looks good, but I'm wondering whether we need to modify _get_client here to take a database object.

superset/models/core.py (outdated, resolved)
superset/models/core.py (outdated, resolved)
@michael-s-molina removed the review:checkpoint and review:draft labels Oct 23, 2024
@pull-request-size bot added the size/L label and removed the size/M label Oct 24, 2024
@@ -106,6 +108,15 @@ export const OAuth2ClientField = ({ changeMethods, db }: FieldPropTypes) => {
onChange={handleChange('scope')}
/>
</FormItem>
{db.engine === Engines.BigQuery && (
<FormItem label="Project ID">
Member:

Let's localize the label.

)
# PR #30674 changed the signature of the method to include database.
# This ensures that the change is backwards compatible
sig = signature(self.db_engine_spec.update_impersonation_config)
Member:

Do we have sufficient test coverage for this case?

@betodealmeida (Member) left a comment:

Looks good, but I left a few comments on the handling of project_id. Happy to hop on a meeting to discuss it in more detail.


interface OAuth2ClientInfo {
id: string;
secret: string;
authorization_request_uri: string;
token_request_uri: string;
scope: string;
project_id?: string;
Member:

Since this is BigQuery specific, it's better to not add it here, otherwise the interface loses its purpose. Ideally we want to keep only the common information that every oauth2 client has.

Can we have this in a separate attribute instead?

4 participants