Update supported versions matrix #137

Merged · 13 commits · Jul 5, 2024
61 changes: 30 additions & 31 deletions .github/workflows/ci.yaml
@@ -22,32 +22,30 @@ jobs:
fail-fast: false
matrix:
python-version:
- '3.12'
- '3.11'
- '3.10'
- '3.9'
- '3.8'
airflow-version:
- '2.7.2'
- '2.6.3'
- '2.5.3'
- '2.4.3'
- '2.9.2'
- '2.8.4'
- '2.7.3'
Comment on lines 30 to +33
@tomasfarias (Owner) commented on Jun 29, 2024:

We'll probably want a policy for Airflow version support.

For Python versions it's easy: we aim to support whatever dbt supports, which currently has only a >=3.8 restriction.

For Airflow, I'd say we definitely want to test against latest official minor version (2.9.x), and ideally also latest MWAA-supported version (2.8.1). Astronomer supports new Airflow versions pretty much as soon as they come out, so we are covered there already.

Any other hosted service we should consider prioritizing? If not, I'd consider just supporting 2.9 and 2.8.1 here. There is a tradeoff between as much support as possible and CI times and dependency resolution. So, I'd try to keep the number of "tested" supported versions as small as possible, to avoid the test matrix exploding too much.
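For illustration, a trimmed matrix along these lines might look as follows (a sketch only; the version pins are assumptions, not this PR's final values):

```yaml
strategy:
  fail-fast: false
  matrix:
    python-version: ['3.12']
    # Latest official minor release plus latest MWAA-supported version (assumed pins).
    airflow-version: ['2.9.2', '2.8.1']
    # The two most recent dbt minor versions.
    dbt-version: ['1.8', '1.7']
```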

Contributor (PR author) replied:

I would like to support Airflow 2.7.x as it is still in use by Cloud Composer

@tomasfarias (Owner) replied:

Do you mind mentioning in the docs?

We mention testing against latest MWAA version in https://github.com/tomasfarias/airflow-dbt-python/blob/master/docs/development.rst?plain=1#L40, so we could extend that with a few extra words: "... latest version available in AWS MWAA and in Cloud Composer" + a link to the Cloud Composer docs.

dbt-version:
- 1.8
- 1.7
Comment on lines 34 to 36
@tomasfarias (Owner) commented:
2 latest dbt versions is a fair policy 👍

- 1.6
- 1.5
- 1.4
exclude:
# Incompatible combinations
- python-version: 3.11
airflow-version: '2.4.3'
- python-version: 3.12
airflow-version: '2.8.4'

- python-version: 3.11
airflow-version: '2.5.3'
- python-version: 3.12
airflow-version: '2.7.3'

runs-on: ubuntu-latest
steps:
- name: Harden Runner
uses: step-security/harden-runner@v2.6.1
uses: step-security/harden-runner@v2.8.1
with:
egress-policy: block
allowed-endpoints: >
@@ -73,34 +71,35 @@ jobs:
sudo apt-get update
sudo apt-get install --yes --no-install-recommends postgresql

- uses: actions/checkout@v4.1.1
- uses: actions/checkout@v4.1.7
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4.7.1
uses: actions/setup-python@v5.1.0
with:
python-version: ${{ matrix.python-version }}

- name: Install Poetry
uses: abatilo/actions-poetry@v2.3.0
with:
poetry-version: 1.7.0
poetry-version: 1.8.3

- name: Install airflow-dbt-python with Poetry
run: poetry install -E postgres --with dev

- name: Install Airflow with constraints
- name: Install Airflow & dbt
run: |
wget https://github.com/raw/apache/airflow/constraints-${{ matrix.airflow-version }}/constraints-${{ matrix.python-version }}.txt -O constraints.txt
poetry run pip install apache-airflow==${{ matrix.airflow-version }} apache-airflow-providers-amazon apache-airflow-providers-ssh -c constraints.txt
poetry run pip install "dbt-core~=${{ matrix.dbt-version }}.0" "dbt-postgres~=${{ matrix.dbt-version }}.0"
poetry run airflow db init
poetry env use ${{ matrix.python-version }}
poetry add "apache-airflow==${{ matrix.airflow-version }}" \
"dbt-core~=${{ matrix.dbt-version }}.0" \
"dbt-postgres~=${{ matrix.dbt-version }}.0" \
--python ${{ matrix.python-version }}
poetry install -E postgres --with dev
poetry run airflow db migrate
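# `airflow db init` is deprecated since Airflow 2.7; `airflow db migrate` is its replacement.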
poetry run airflow connections create-default-connections

- name: Linting with ruff
run: poetry run ruff .
run: poetry run ruff check .

- name: Static type checking with mypy
# We only run mypy on the latest supported versions of Airflow & dbt,
# as it is currently impossible to write conditions that depend on package versions.
if: matrix.airflow-version == '2.7.2' && matrix.dbt-version == '1.7'
if: matrix.python-version == '3.12' && matrix.airflow-version == '2.9.2' && matrix.dbt-version == '1.8'
run: poetry run mypy .

- name: Code formatting with black
@@ -131,7 +130,7 @@ jobs:

steps:
- name: Harden Runner
uses: step-security/harden-runner@v2.6.0
uses: step-security/harden-runner@v2.8.1
with:
egress-policy: block
allowed-endpoints: >
api.github.com:443
pypi.org:443

- uses: actions/checkout@v4.1.1
- uses: actions/setup-python@v4.7.1
- uses: actions/checkout@v4.1.7
- uses: actions/setup-python@v5.1.0
with:
python-version: '3.11'
python-version: '3.12'

- name: Install Poetry
uses: abatilo/actions-poetry@v2.3.0
with:
poetry-version: 1.7.0
poetry-version: 1.8.3

- name: Install airflow-dbt-python with Poetry
run: poetry install --with dev -E airflow-providers
@@ -179,7 +178,7 @@ jobs:
path: htmlcov

- name: "Make coverage badge"
uses: schneegans/dynamic-badges-action@v1.6.0
uses: schneegans/dynamic-badges-action@v1.7.0
if: github.event_name != 'pull_request'
with:
auth: ${{ secrets.GIST_TOKEN }}
16 changes: 11 additions & 5 deletions README.md
@@ -15,10 +15,10 @@ Read the [documentation](https://airflow-dbt-python.readthedocs.io) for examples
## Requirements

Before using *airflow-dbt-python*, ensure you meet the following requirements:
* A *dbt* project using [dbt-core](https://pypi.org/project/dbt-core/) version 1.4.0 or later.
* An Airflow environment using version 2.2 or later.
* A *dbt* project using [dbt-core](https://pypi.org/project/dbt-core/) version 1.7.5 or later.
* An Airflow environment using version 2.7 or later.

* If using any managed service, like AWS MWAA, ensure your environment is created with a supported version of Airflow.
* If using any managed service, like AWS MWAA or GCP Cloud Composer 2/3, ensure your environment is created with a supported version of Airflow.
* If self-hosting, Airflow installation instructions can be found in their [official documentation](https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html).

* Running Python 3.8 or later in your Airflow environment.
@@ -29,7 +29,7 @@ Before using *airflow-dbt-python*, ensure you meet the following requirements:

> **Note**
>
> Older versions of Airflow and *dbt* may work with *airflow-dbt-python*, although we cannot guarantee this. Our testing pipeline runs the latest *dbt-core* with the latest Airflow release, and the latest version supported by [AWS MWAA](https://aws.amazon.com/managed-workflows-for-apache-airflow/).
> Older versions of Airflow and *dbt* may work with *airflow-dbt-python*, although we cannot guarantee this. Our testing pipeline runs the latest *dbt-core* with the latest Airflow release, as well as the latest versions supported by [AWS MWAA](https://aws.amazon.com/managed-workflows-for-apache-airflow/) and [GCP Cloud Composer 2/3](https://cloud.google.com/composer).

## From PyPI

@@ -66,6 +66,12 @@ Add *airflow-dbt-python* to your `requirements.txt` file and edit your Airflow e

Read the [documentation](https://airflow-dbt-python.readthedocs.io/en/latest/getting_started.html#installing-in-mwaa) for a more detailed AWS MWAA installation breakdown.
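For example, a minimal entry might be (a sketch; the pin and the `postgres` extra are illustrative, so match them to your environment):

```text
airflow-dbt-python[postgres]==2.1.0
```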

## In GCP Cloud Composer
@tomasfarias (Owner) commented: Thanks! Great updates!


Add *airflow-dbt-python* to your PyPI packages list.

Read the [documentation](https://cloud.google.com/composer/docs/composer-2/install-python-dependencies#install-pypi) for a more detailed GCP Cloud Composer 2 installation breakdown.
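If you manage the environment with the `gcloud` CLI instead, an equivalent sketch (environment name, region, and version pin are placeholders) is:

```bash
gcloud composer environments update my-environment \
  --location us-central1 \
  --update-pypi-package "airflow-dbt-python==2.1.0"
```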

## In other managed services

*airflow-dbt-python* should be compatible with most or all Airflow managed services. Consult the documentation specific to your provider.
@@ -119,7 +125,7 @@ See an example DAG [here](examples/airflow_connection_target_dag.py).

Although [`dbt`](https://docs.getdbt.com/) is meant to be installed and used as a CLI, we may not have control of the environment where Airflow is running, which can rule out using *dbt* as a CLI.

This is exactly what happens when using [Amazon's Managed Workflows for Apache Airflow](https://aws.amazon.com/managed-workflows-for-apache-airflow/) or MWAA: although a list of Python requirements can be passed, the CLI cannot be found in the worker's PATH.
This is exactly what happens when using [Amazon's Managed Workflows for Apache Airflow](https://aws.amazon.com/managed-workflows-for-apache-airflow/) (aka MWAA): although a list of Python requirements can be passed, the CLI cannot be found in the worker's PATH.

There is a workaround which involves using Airflow's `BashOperator` and running Python from the command line:
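(The full example is elided in this diff view; a minimal sketch of such a workaround, assuming dbt-core ≥ 1.5's programmatic `dbtRunner`, with placeholder task id and project path, could be:)

```python
from airflow.operators.bash import BashOperator

# Sketch only: invoke dbt through Python, since the dbt CLI is not on the worker's PATH.
run_dbt = BashOperator(
    task_id="dbt_run",
    bash_command=(
        "python -c \"from dbt.cli.main import dbtRunner; "
        "dbtRunner().invoke(['run', '--project-dir', '/path/to/project'])\""
    ),
)
```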

1 change: 1 addition & 0 deletions airflow_dbt_python/__init__.py
@@ -1,2 +1,3 @@
"""Provides an Airflow operator and hooks to run all or most dbtcommands."""

from .__version__ import __author__, __copyright__, __title__, __version__
3 changes: 2 additions & 1 deletion airflow_dbt_python/__version__.py
@@ -1,5 +1,6 @@
"""The module's version information."""

__author__ = "Tomás Farías Santana"
__copyright__ = "Copyright 2021 Tomás Farías Santana"
__title__ = "airflow-dbt-python"
__version__ = "2.0.1"
__version__ = "2.1.0"
73 changes: 26 additions & 47 deletions airflow_dbt_python/hooks/dbt.py
@@ -1,4 +1,5 @@
"""Provides a hook to interact with a dbt project."""

from __future__ import annotations

import json
@@ -24,14 +25,13 @@
from airflow.hooks.base import BaseHook
from airflow.models.connection import Connection

from airflow_dbt_python.utils.version import DBT_INSTALLED_LESS_THAN_1_5
from airflow_dbt_python.utils.version import DBT_INSTALLED_GTE_1_8

if sys.version_info >= (3, 11):
from contextlib import chdir as chdir_ctx
else:
from contextlib_chdir import chdir as chdir_ctx


if TYPE_CHECKING:
from dbt.contracts.results import RunResult
from dbt.task.base import BaseTask
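For context, `DBT_INSTALLED_GTE_1_8` (imported above) gates behavior on the installed dbt version. A plausible way to derive such a flag (a sketch; the real `airflow_dbt_python/utils/version.py` may differ) is:

```python
from importlib.metadata import version

from packaging.version import Version

# Sketch: compare the installed dbt-core version against 1.8.0.
DBT_INSTALLED_GTE_1_8 = Version(version("dbt-core")) >= Version("1.8.0")
```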
@@ -227,18 +227,11 @@ def run_dbt_task(
A tuple containing a boolean indicating success and optionally the results
of running the dbt command.
"""
from dbt.adapters.factory import register_adapter
from dbt.adapters.factory import adapter_management
from dbt.task.base import get_nearest_project_dir
from dbt.task.clean import CleanTask
from dbt.task.deps import DepsTask

from airflow_dbt_python.utils.version import DBT_INSTALLED_LESS_THAN_1_5

if DBT_INSTALLED_LESS_THAN_1_5:
from dbt.main import adapter_management, track_run # type: ignore
else:
from dbt.adapters.factory import adapter_management
from dbt.tracking import track_run
from dbt.tracking import track_run

config = self.get_dbt_task_config(command, **kwargs)
extra_target = self.get_dbt_target_from_connection(config.target)
@@ -252,36 +245,24 @@
) as dbt_dir:
# When creating tasks via from_args, dbt switches to the project directory.
# We have to do that here as we are not using from_args.
if DBT_INSTALLED_LESS_THAN_1_5:
# For compatibility with older versions of dbt, as the signature
# of move_to_nearest_project_dir changed in dbt-core 1.5 to take
# just the project_dir.
nearest_project_dir = get_nearest_project_dir(config) # type: ignore
else:
nearest_project_dir = get_nearest_project_dir(config.project_dir)
nearest_project_dir = get_nearest_project_dir(config.project_dir)

with chdir_ctx(nearest_project_dir):
config.dbt_task.pre_init_hook(config)
self.ensure_profiles(config)

task, runtime_config = config.create_dbt_task(
extra_target, write_perf_info
)
requires_profile = isinstance(task, (CleanTask, DepsTask))

self.setup_dbt_logging(task, config.debug)
with adapter_management():
task, runtime_config = config.create_dbt_task(
extra_target, write_perf_info
)
requires_profile = isinstance(task, (CleanTask, DepsTask))

if runtime_config is not None and not requires_profile:
# The deps command installs the dependencies, which means they may
# not exist before deps runs and the following would raise a
# CompilationError.
runtime_config.load_dependencies()
self.setup_dbt_logging(task, config.debug)

results = None
with adapter_management():
if not requires_profile:
if runtime_config is not None:
register_adapter(runtime_config)
if runtime_config is not None and not requires_profile:
# The deps command installs the dependencies, which means they
# may not exist before deps runs and the following would raise a
# CompilationError.
runtime_config.load_dependencies()

with track_run(task):
results = task.run()
@@ -419,19 +400,17 @@ def setup_dbt_logging(self, task: BaseTask, debug: Optional[bool]):
default_stdout. As these are initialized by the CLI app, we need to
initialize them here.
"""
from dbt.events.functions import setup_event_logger

log_path = None
if task.config is not None:
log_path = getattr(task.config, "log_path", None)

if DBT_INSTALLED_LESS_THAN_1_5:
setup_event_logger(log_path or "logs")
if DBT_INSTALLED_GTE_1_8:
from dbt.events.logging import setup_event_logger
else:
from dbt.flags import get_flags
from dbt.events.functions import ( # type: ignore[no-redef]
setup_event_logger,
)

from dbt.flags import get_flags

flags = get_flags()
setup_event_logger(flags)
flags = get_flags()
setup_event_logger(flags)

configured_file = logging.getLogger("configured_file")
file_log = logging.getLogger("file_log")
@@ -458,7 +437,7 @@ def ensure_profiles(self, config: BaseConfig):
if not profiles_path.exists():
profiles_path.parent.mkdir(exist_ok=True)
with profiles_path.open("w", encoding="utf-8") as f:
f.write("config:\n send_anonymous_usage_stats: false\n")
f.write("flags:\n send_anonymous_usage_stats: false\n")

def get_dbt_target_from_connection(
self, target: Optional[str]
3 changes: 2 additions & 1 deletion airflow_dbt_python/hooks/git.py
@@ -1,4 +1,5 @@
"""A concrete DbtRemoteHook for git repositories with dulwich."""

import datetime as dt
from typing import Callable, Optional, Tuple, Union

@@ -90,7 +91,7 @@ def upload(

repo.stage(str(f.relative_to(source)))

ts = dt.datetime.utcnow()
ts = dt.datetime.now(dt.timezone.utc)
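# datetime.utcnow() is deprecated as of Python 3.12; now(dt.timezone.utc) returns an aware datetime.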
repo.do_commit(
self.commit_msg.format(ts=ts).encode(), self.commit_author.encode()
)
1 change: 1 addition & 0 deletions airflow_dbt_python/hooks/localfs.py
@@ -2,6 +2,7 @@

Intended to be used only when running Airflow with a LocalExecutor.
"""

from __future__ import annotations

import shutil
1 change: 1 addition & 0 deletions airflow_dbt_python/hooks/remote.py
@@ -4,6 +4,7 @@

Currently, only AWS S3 and the local filesystem are supported as remotes.
"""

from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional, Type
1 change: 1 addition & 0 deletions airflow_dbt_python/hooks/s3.py
@@ -1,4 +1,5 @@
"""An implementation for an S3 remote for dbt."""

from __future__ import annotations

from typing import Iterable, Optional