feat(sdk): autogenerate urn types #9257

Merged
merged 31 commits into from
Nov 30, 2023

@hsheth2 (Collaborator) commented Nov 17, 2023

Also fixes an issue where the models documentation wasn't showing up properly with Sphinx.

Remaining items:

  • Make this work with custom packages
  • Add a note to Updating DataHub
  • Change the canonical import path to datahub.metadata.urns
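The description doesn't show what an autogenerated urn type looks like. As a rough illustration only, the sketch below renders Python source for a urn class from an entity type and an ordered list of key fields, then loads it with `exec` so the example runs in one file. The helper `render_urn_class`, the generated class shape, and the dataset key fields are hypothetical. The real work presumably happens in this PR's changes to `metadata-ingestion/scripts/avro_codegen.py`, and its output will differ. The sketch assumes the usual `urn:li:<entityType>:<id>` shape, with multi-field keys wrapped in parentheses.

```python
from typing import List


def render_urn_class(entity_type: str, key_fields: List[str]) -> str:
    """Return Python source for a hypothetical urn class, one attribute per key field."""
    class_name = entity_type[0].upper() + entity_type[1:] + "Urn"
    params = ", ".join(f"{name}: str" for name in key_fields)
    parts = ", ".join(f"self.{name}" for name in key_fields)
    lines = [
        f"class {class_name}:",
        f"    ENTITY_TYPE = {entity_type!r}",
        f"    def __init__(self, {params}) -> None:",
        *[f"        self.{name} = {name}" for name in key_fields],
        "    def urn(self) -> str:",
        f"        parts = [{parts}]",
        "        # Single-field keys are bare; multi-field keys become a tuple.",
        '        entity_id = parts[0] if len(parts) == 1 else "(" + ",".join(parts) + ")"',
        '        return "urn:li:" + self.ENTITY_TYPE + ":" + entity_id',
    ]
    return "\n".join(lines) + "\n"


# Hypothetical key fields; a real generator would derive them from the entity schemas.
source = render_urn_class("dataset", ["platform", "name", "env"])
namespace: dict = {}
exec(source, namespace)  # a real generator would write `source` to a module instead
DatasetUrn = namespace["DatasetUrn"]
print(DatasetUrn("urn:li:dataPlatform:hive", "db.table", "PROD").urn())
# urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)
```

Emitting plain source text rather than building classes at runtime means static tools (type checkers, and presumably Sphinx) can see each generated class like any hand-written one.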

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions bot added the ingestion label (PR or Issue related to the ingestion of metadata) Nov 17, 2023
Four resolved review threads on metadata-ingestion/scripts/avro_codegen.py (one marked outdated)
URN_TYPES: Dict[str, Type["_SpecificUrn"]] = {}


def _split_entity_id(entity_id: str) -> List[str]:
Collaborator

Some doctests / examples of what this is doing would be nice. I eventually got it, but it took a sec. It's also kinda odd to me that it basically ignores parentheses, besides checking that there's the right number of them in an appropriate order.

It's less efficient, but I think it'd be clearer to break this up:

PARENS = ["(", ")"]


def _split_entity_id(entity_id: str) -> List[str]:
    # A plain (non-tuple) id is a single part.
    if not (entity_id.startswith("(") and entity_id.endswith(")")):
        return [entity_id]
    # Collect the parens only to validate nesting; the split below ignores them.
    parens = [v for v in entity_id if v in PARENS]
    if not _parens_valid(parens):  # nesting-check helper, not shown
        raise InvalidUrnError(...)
    return entity_id[1:-1].split(",")

I think that's what this function is doing?

Collaborator Author

This was copied from the existing urn code, so I'm inclined to leave it as is since it seems to work.
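Since the reviewer asked for doctests, here is a doctest-style sketch under one assumption about the intent: split only on commas at the outermost nesting level, so that a nested urn (such as the flow urn inside a data job urn) stays in one piece. This is not the code from this PR; `split_entity_id` and the local `InvalidUrnError` are stand-ins for illustration.

```python
from typing import List


class InvalidUrnError(Exception):
    """Local stand-in for a urn parsing error."""


def split_entity_id(entity_id: str) -> List[str]:
    """Split a urn entity id on commas at the outermost nesting level only.

    >>> split_entity_id("jdoe")
    ['jdoe']
    >>> split_entity_id("(urn:li:dataPlatform:hive,db.table,PROD)")
    ['urn:li:dataPlatform:hive', 'db.table', 'PROD']
    >>> split_entity_id("(urn:li:dataFlow:(airflow,dag,PROD),task)")
    ['urn:li:dataFlow:(airflow,dag,PROD)', 'task']
    """
    # An id without surrounding parens is a single part.
    if not (entity_id.startswith("(") and entity_id.endswith(")")):
        return [entity_id]

    parts: List[str] = []
    depth = 0
    start = 1  # first character after the outer "("
    last = len(entity_id) - 1
    for i, c in enumerate(entity_id):
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            # The outer tuple must close exactly at the final character.
            if depth == 0 and i != last:
                raise InvalidUrnError(f"{entity_id}: mismatched parentheses")
        elif c == "," and depth == 1:
            # Commas inside nested parens belong to a nested urn; skip those.
            parts.append(entity_id[start:i])
            start = i + 1
    if depth != 0:
        raise InvalidUrnError(f"{entity_id}: mismatched parentheses")
    parts.append(entity_id[start:last])
    return parts
```

By contrast, `entity_id[1:-1].split(",")` on the last doctest input gives `['urn:li:dataFlow:(airflow', 'dag', 'PROD)', 'task']`, which is the behavior to weigh against the clearer structure of the suggested rewrite.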

Two resolved review threads on metadata-ingestion/src/datahub/utilities/urns/_urn_base.py (both marked outdated)

# TODO: Add handling for url encoded urns e.g. urn%3A ...

if not urn_str.startswith("urn:li:"):
Collaborator

Should we use the constants above here? Or is that just overkill?

Collaborator Author

I think it's fine
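To make the prefix check, the URL-encoding TODO, and the `URN_TYPES` registry concrete, here is a self-contained sketch of parsing a urn string and dispatching on its entity type. Everything in it is an assumption for illustration rather than the PR's code: the constant names `URN_PREFIX` and `URL_ENCODED_PREFIX`, the `Urn` base class, the `CorpUserUrn` example, registration via `__init_subclass__`, the `parse_urn` function, and decoding with `urllib.parse.unquote`.

```python
from typing import Dict, Type
from urllib.parse import unquote


class InvalidUrnError(Exception):
    """Local stand-in for a urn parsing error."""


URN_PREFIX = "urn:li:"  # hypothetical constant for the prefix check
URL_ENCODED_PREFIX = "urn%3A"  # hypothetical; covers the url-encoded TODO

# Hypothetical registry: entity type -> urn class.
URN_TYPES: Dict[str, Type["Urn"]] = {}


class Urn:
    ENTITY_TYPE = ""

    def __init__(self, entity_type: str, entity_id: str) -> None:
        self.entity_type = entity_type
        self.entity_id = entity_id

    def __init_subclass__(cls, **kwargs: object) -> None:
        super().__init_subclass__(**kwargs)
        # Each specific urn class registers itself under its entity type.
        URN_TYPES[cls.ENTITY_TYPE] = cls

    def __str__(self) -> str:
        return f"{URN_PREFIX}{self.entity_type}:{self.entity_id}"


class CorpUserUrn(Urn):
    ENTITY_TYPE = "corpuser"


def parse_urn(urn_str: str) -> Urn:
    # Decode url-encoded urns such as "urn%3Ali%3Acorpuser%3Ajdoe" first.
    if urn_str.startswith(URL_ENCODED_PREFIX):
        urn_str = unquote(urn_str)
    if not urn_str.startswith(URN_PREFIX):
        raise InvalidUrnError(f"{urn_str}: expected prefix {URN_PREFIX!r}")
    # "corpuser:jdoe" -> ("corpuser", "jdoe"); the id may itself contain colons.
    entity_type, _, entity_id = urn_str[len(URN_PREFIX):].partition(":")
    if not entity_type or not entity_id:
        raise InvalidUrnError(f"{urn_str}: missing entity type or id")
    # Unregistered entity types fall back to the generic Urn class.
    urn_cls = URN_TYPES.get(entity_type, Urn)
    return urn_cls(entity_type, entity_id)


print(type(parse_urn("urn:li:corpuser:jdoe")).__name__)  # CorpUserUrn
print(parse_urn("urn%3Ali%3Acorpuser%3Ajdoe"))  # urn:li:corpuser:jdoe
```

Pulling the prefix into a named constant is the variant the reviewer floated; for a single check, the inline `"urn:li:"` literal the author kept is equally correct.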

@hsheth2 added the merge-pending-ci label (A PR that has passed review and should be merged once CI is green.) Nov 30, 2023
@hsheth2 changed the title from "feat(ingest): autogenerate urn types" to "feat(sdk): autogenerate urn types" Nov 30, 2023
@hsheth2 merged commit a7dc9c9 into datahub-project:master Nov 30, 2023
54 checks passed
@hsheth2 deleted the autogen-urns branch November 30, 2023 23:11
Labels: ingestion (PR or Issue related to the ingestion of metadata), merge-pending-ci (A PR that has passed review and should be merged once CI is green.)
2 participants