Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TransformAndCatalog flow #720

Merged
merged 36 commits into from
Aug 30, 2023
Merged

Add TransformAndCatalog flow #720

merged 36 commits into from
Aug 30, 2023

Conversation

Rafalz13
Copy link
Contributor

@Rafalz13 Rafalz13 commented Jul 14, 2023

Summary

It is a need to use DBT in viadot 1.0. DBTTask and CloneRepo tasks are migrated from viadot 2.0. Also, TransformAndCatalog flow is built using viadot 1.0 structure.

Importance

To be able to use the DBT using viadot 1.0.

Checklist

This PR:

  • follows the guidelines laid out in CONTRIBUTING.md
  • links relevant issue(s)
  • adds/updates tests (if appropriate)
  • adds/updates docstrings (if appropriate)
  • adds an entry in CHANGELOG.md

Copy link
Contributor

@angelika233 angelika233 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments

requirements.txt Outdated
azure-identity==1.7.1
great-expectations==0.14.12
azure-identity>=1.10.0, <2.0.0
great-expectations>=0.15.50, <0.16.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you specifying a range of versions? wouldn't it be better to set version to specific one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed great-expectations to 0.15.50


def test_luma_ingest():
task = LumaIngest(
credentials_secret="luma-dev",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of the time we specify credentials in uppercase

Args:
dbt_project_path (str): The path to the dbt project (the directory containing
the `dbt_project.yml` file).
name (str): The name of the Flow. Defaults to "Transform and Catalog".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying flow name won't cause later issues? for example, it's easier to overwrite the flow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

RUN pip install --upgrade pip
RUN pip install -r requirements.txt
RUN pip install -r requirements-dev.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are both requirements files needed or the dev one is used only for certain cases? if so you can add if statement for certain tag ie dev

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requirements-dev line is removed from Dockerfile

viadot/flows/transform_and_catalog.py Show resolved Hide resolved
stream_output=True,
).bind(flow=self)

test_select = self.dbt_selects.get("test", run_select)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe change the name from "test" to sth different, test suggests that it's something just fore testing not production

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was migrated from 2.0. I will ask how it can be changed.

from prefect.utilities.logging import get_logger
from prefect.utilities.tasks import defaults_from_attrs

from viadot.task_utils import *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to avoid using *?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was an issue with the circular import

@trymzet
Copy link
Contributor

trymzet commented Aug 2, 2023

@Rafalz13 I think we also need the _viadot_source column to be added as part of metadata fields added to each dataset ingested by viadot in to_df() method of the base Source class (eg. like the decorator approach in viadot 2).

@trymzet trymzet changed the title Migrate DBT related code to viadot 1.0 Add TransformAndCatalog flow Aug 4, 2023
@Rafalz13
Copy link
Contributor Author

Rafalz13 commented Aug 9, 2023

@trymzet function is added to utils (#735). Users can add a decorator to source class function. For now, adding this decorator to Source class is impossible because of the different structure of the code.

@trymzet
Copy link
Contributor

trymzet commented Aug 10, 2023

@trymzet function is added to utils (#735). Users can add a decorator to source class function. For now, adding this decorator to Source class is impossible because of the different structure of the code.

Hm I didn't udnerstand that.

You can still add it to the base class; then only the custom sources will need to be modified to use the util function (or just use the to_df() api).

The way you did this, you need to modify all Source subclasses, even the standard ones] with the to_df() method.

@trymzet
Copy link
Contributor

trymzet commented Aug 11, 2023

@trymzet function is added to utils (#735). Users can add a decorator to source class function. For now, adding this decorator to Source class is impossible because of the different structure of the code.

Ok created issue to document this #737

viadot/utils.py Outdated
@@ -18,7 +20,7 @@
from .exceptions import APIError
from .signals import SKIP

logger = logging.get_logger(__name__)
logger = get_logger(__name__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be a Prefect logger? These utils are a the viadot level; Prefect utils are in a separate file.

viadot/tasks/luma.py Outdated Show resolved Hide resolved
viadot/tasks/luma.py Outdated Show resolved Hide resolved
tests/integration/tasks/test_luma.py Show resolved Hide resolved
viadot/flows/transform_and_catalog.py Outdated Show resolved Hide resolved
viadot/flows/transform_and_catalog.py Outdated Show resolved Hide resolved
viadot/flows/transform_and_catalog.py Outdated Show resolved Hide resolved
viadot/flows/transform_and_catalog.py Outdated Show resolved Hide resolved
viadot/flows/transform_and_catalog.py Outdated Show resolved Hide resolved
@Rafalz13 Rafalz13 requested a review from trymzet August 30, 2023 14:17
Copy link
Contributor

@trymzet trymzet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Rafalz13 Rafalz13 merged commit c900eb1 into dev Aug 30, 2023
3 checks passed
@trymzet trymzet deleted the dbt_to_viadot_1 branch March 19, 2024 12:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants