Migrate Off Circle CI / To Github Actions + dagger.io (#923)

* Add Github action for integration test
* Update tox
* Fetch spark from https link
* Use Spark version 3.1.2
* Separate running Spark session and thrift
* Use Spark 3.1.2 and Hadoop 3.2
* Reset tox.ini
* Remove base pythons in tox.ini
* Fix reference to Docker compose file
* Remove timeout
* Remove artifact steps
* Bump Spark and Hadoop versions
* Reset Spark and Hadoop version
* Update comment
* Add changie
* add databricks and PR execution protections
* use single quotes
* remove `_target` suffix
* add comment to test
* specify container user as root
* formatting
* remove python setup for pre-existing container
* download simba
* fix curl call
* fix curl call
* fix curl call
* fix curl call
* fix curl call
* fix curl call
* fix db test naming
* confirm ODBC driver installed
* add odbc driver env var
* add odbc driver env var
* specify platform
* check odbc driver integrity
* add dbt user env var
* add dbt user env var
* fix host_name env var
* try removing architecture arg
* swap back to pull_request_target
* try running on host instead of container
* Update .github/workflows/integration.yml Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
* try running odbcinst -j
* remove bash
* add sudo
* add sudo
* update odbc.ini
* install libsasl2-modules-gssapi-mit
* install libsasl2-modules-gssapi-mit
* set -e on odbc install
* set -e on odbc install
* set -e on odbc install
* sudo echo odbc.inst
* remove postgres components
* remove release related items
* remove irrelevant output
* move long bash script into its own file
* update integration.yml to align with other adapters
* revert name change
* revert name change
* combine databricks and spark tests
* combine databricks and spark tests
* Add dagger
* remove platform
* add dagger setup
* add dagger setup
* set env vars
* install requirements
* install requirements
* add DEFAULT_ENV_VARS and test_path arg
* remove circle ci
* formatting
* update changie
* Update .changes/unreleased/Under the Hood-20230929-161218.yaml Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
* formatting fixes and simplify env_var handling
* remove tox, update CONTRIBUTING.md and cleanup GHA workflows
* remove tox, update CONTRIBUTING.md and cleanup GHA workflows
* install test reqs in main.yml
* install test reqs in main.yml
* formatting
* remove tox from dev-requirements.txt and Makefile
* clarify spark crt instantiation
* add comments on python-version

---------

Co-authored-by: Cor Zuurmond <jczuurmond@protonmail.com>
Co-authored-by: Florian Eiden <florian.eiden@fleid.fr>
Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
Co-authored-by: Mike Alfare <13974384+mikealfare@users.noreply.github.com>
Co-authored-by: Mike Alfare <mike.alfare@dbtlabs.com>
dbt-labs · Jan 10, 2024 · f9f75e9
1 parent 5210d0a
commit f9f75e9
Showing 20 changed files with 408 additions and 242 deletions.
diff --git a/.changes/unreleased/Under the Hood-20230929-161218.yaml b/.changes/unreleased/Under the Hood-20230929-161218.yaml
@@ -0,0 +1,6 @@
+kind: Under the Hood
+body: Add GitHub action for integration testing and use dagger-io to run tests. Remove CircleCI workflow.
+time: 2023-09-29T16:12:18.968755+02:00
+custom:
+  Author: JCZuurmond, colin-rogers-dbt
+  Issue: "719"
diff --git a/.circleci/config.yml b/.circleci/config.yml
diff --git a/.github/scripts/update_dbt_core_branch.sh b/.github/scripts/update_dbt_core_branch.sh
@@ -0,0 +1,17 @@
+#!/bin/bash -e
+set -e
+
+git_branch=$1
+target_req_file="dev-requirements.txt"
+core_req_sed_pattern="s|dbt-core.git.*#egg=dbt-core|dbt-core.git@${git_branch}#egg=dbt-core|g"
+tests_req_sed_pattern="s|dbt-core.git.*#egg=dbt-tests|dbt-core.git@${git_branch}#egg=dbt-tests|g"
+if [[ "$OSTYPE" == darwin* ]]; then
+ # macOS ships BSD sed, whose -i flag requires an explicit backup-suffix argument (empty here)
+ sed -i "" "$core_req_sed_pattern" $target_req_file
+ sed -i "" "$tests_req_sed_pattern" $target_req_file
+else
+ sed -i "$core_req_sed_pattern" $target_req_file
+ sed -i "$tests_req_sed_pattern" $target_req_file
+fi
+core_version=$(curl "https://raw.githubusercontent.com/dbt-labs/dbt-core/${git_branch}/core/dbt/version.py" | grep "__version__ = *"|cut -d'=' -f2)
+bumpversion --allow-dirty --new-version "$core_version" major
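
Note on `update_dbt_core_branch.sh`: the two `sed` patterns rewrite the dbt-core Git requirement lines so that they point at `${git_branch}`. The Python sketch below reproduces that substitution for illustration only. The function name `pin_dbt_core_branch` and the sample requirement lines are assumptions, not part of this commit.

```python
import re


def pin_dbt_core_branch(requirements: str, git_branch: str) -> str:
    """Mirror the two sed substitutions from update_dbt_core_branch.sh."""
    for egg in ("dbt-core", "dbt-tests"):
        # Same pattern text as the sed expressions in the script
        pattern = rf"dbt-core.git.*#egg={egg}"
        replacement = f"dbt-core.git@{git_branch}#egg={egg}"
        # A callable replacement keeps the branch name literal (no escape processing)
        requirements = re.sub(pattern, lambda _m, r=replacement: r, requirements)
    return requirements


# Hypothetical dev-requirements.txt content, assumed for this example
sample = (
    "git+https://github.com/dbt-labs/dbt-core.git#egg=dbt-core&subdirectory=core\n"
    "git+https://github.com/dbt-labs/dbt-core.git#egg=dbt-tests-adapter&subdirectory=tests/adapter\n"
)
pinned = pin_dbt_core_branch(sample, "1.7.latest")
print(pinned)
```

Because the pattern spans everything between `dbt-core.git` and `#egg=`, running the substitution again with a different branch replaces the earlier pin instead of stacking a second `@branch` suffix.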
diff --git a/.github/workflows/integration.yml b/.github/workflows/integration.yml
@@ -0,0 +1,112 @@
+# **what?**
+# Runs integration tests.
+
+# **why?**
+# Ensure code runs as expected.
+
+# **when?**
+# This will run for all PRs, when code is pushed to a release
+# branch, and when manually triggered.
+
+name: Adapter Integration Tests
+
+on:
+  push:
+    branches:
+      - "main"
+      - "*.latest"
+
+  pull_request_target:
+    paths-ignore:
+      - ".changes/**"
+      - ".flake8"
+      - ".gitignore"
+      - "**.md"
+
+  workflow_dispatch:
+    inputs:
+      dbt-core-branch:
+        description: "branch of dbt-core to use in dev-requirements.txt"
+        required: false
+        type: string
+
+# explicitly restrict `GITHUB_TOKEN` to read-only permissions
+permissions: read-all
+
+# will cancel previous workflows triggered by the same event and for the same ref for PRs or same SHA otherwise
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event_name }}-${{ contains(github.event_name, 'pull_request_target') && github.event.pull_request.head.ref || github.sha }}
+  cancel-in-progress: true
+
+defaults:
+  run:
+    shell: bash
+
+jobs:
+
+  test:
+    name: ${{ matrix.test }}
+    runs-on: ubuntu-latest
+
+    strategy:
+      fail-fast: false
+      matrix:
+        test:
+          - "apache_spark"
+          - "spark_session"
+          - "databricks_sql_endpoint"
+          - "databricks_cluster"
+          - "databricks_http_cluster"
+
+    env:
+      DBT_INVOCATION_ENV: github-actions
+      DD_CIVISIBILITY_AGENTLESS_ENABLED: true
+      DD_API_KEY: ${{ secrets.DATADOG_API_KEY }}
+      DD_SITE: datadoghq.com
+      DD_ENV: ci
+      DD_SERVICE: ${{ github.event.repository.name }}
+      DBT_DATABRICKS_CLUSTER_NAME: ${{ secrets.DBT_DATABRICKS_CLUSTER_NAME }}
+      DBT_DATABRICKS_HOST_NAME: ${{ secrets.DBT_DATABRICKS_HOST_NAME }}
+      DBT_DATABRICKS_ENDPOINT: ${{ secrets.DBT_DATABRICKS_ENDPOINT }}
+      DBT_DATABRICKS_TOKEN: ${{ secrets.DBT_DATABRICKS_TOKEN }}
+      DBT_DATABRICKS_USER: ${{ secrets.DBT_DATABRICKS_USERNAME }}
+      DBT_TEST_USER_1: "buildbot+dbt_test_user_1@dbtlabs.com"
+      DBT_TEST_USER_2: "buildbot+dbt_test_user_2@dbtlabs.com"
+      DBT_TEST_USER_3: "buildbot+dbt_test_user_3@dbtlabs.com"
+
+    steps:
+      - name: Check out the repository
+        if: github.event_name != 'pull_request_target'
+        uses: actions/checkout@v3
+        with:
+          persist-credentials: false
+
+      # explicitly check out the head commit of the PR;
+      # this is necessary for the `pull_request_target` event
+      - name: Check out the repository (PR)
+        if: github.event_name == 'pull_request_target'
+        uses: actions/checkout@v3
+        with:
+          persist-credentials: false
+          ref: ${{ github.event.pull_request.head.sha }}
+
+      # the python version used here is not what is used in the tests themselves
+      - name: Set up Python for dagger
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.11"
+
+      - name: Install python dependencies
+        run: |
+          python -m pip install --user --upgrade pip
+          python -m pip --version
+          python -m pip install -r dagger/requirements.txt
+
+      - name: Update dev-requirements.txt
+        if: inputs.dbt-core-branch != ''
+        run: |
+          pip install bumpversion
+          ./.github/scripts/update_dbt_core_branch.sh ${{ inputs.dbt-core-branch }}
+
+      - name: Run tests for ${{ matrix.test }}
+        run: python dagger/run_dbt_spark_tests.py --profile ${{ matrix.test }}
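
Note on the `concurrency.group` expression in this workflow: GitHub Actions evaluates `a && b || c` like a ternary as long as `b` is non-empty, so `pull_request_target` runs are grouped by head branch and all other runs by commit SHA. A rough Python equivalent follows. The function name `concurrency_group` and the sample values are hypothetical and serve only to illustrate the grouping.

```python
def concurrency_group(workflow: str, event_name: str, head_ref: str, sha: str) -> str:
    """Approximate the workflow's concurrency group key."""
    is_pr = "pull_request_target" in event_name
    # `a && b || c` falls through to c when a is false or b is empty
    suffix = head_ref if (is_pr and head_ref) else sha
    return f"{workflow}-{event_name}-{suffix}"


# Sample values assumed for illustration
pr_key = concurrency_group(
    "Adapter Integration Tests", "pull_request_target", "my-feature", "abc123"
)
push_key = concurrency_group("Adapter Integration Tests", "push", "", "abc123")
print(pr_key)
print(push_key)
```

Two runs triggered from the same PR branch share a key, so with `cancel-in-progress: true` the newer run cancels the older one. Pushes are keyed by SHA and therefore only collide with runs for the same commit.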
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
@@ -19,7 +19,6 @@ on:
     branches:
       - "main"
       - "*.latest"
-      - "releases/*"
   pull_request:
   workflow_dispatch:
 
@@ -81,10 +80,6 @@ jobs:
       matrix:
         python-version: ["3.8", "3.9", "3.10", "3.11"]
 
-    env:
-      TOXENV: "unit"
-      PYTEST_ADDOPTS: "-v --color=yes --csv unit_results.csv"
-
     steps:
       - name: Check out the repository
         uses: actions/checkout@v3
@@ -100,10 +95,12 @@ jobs:
           sudo apt-get install libsasl2-dev
           python -m pip install --user --upgrade pip
           python -m pip --version
-          python -m pip install tox
-          tox --version
-      - name: Run tox
-        run: tox
+          python -m pip install -r requirements.txt
+          python -m pip install -r dev-requirements.txt
+          python -m pip install -e .
+
+      - name: Run unit tests
+        run: python -m pytest --color=yes --csv unit_results.csv -v tests/unit
 
       - name: Get current date
         if: always()

diff --git a/.gitignore b/.gitignore
@@ -44,3 +44,5 @@ test.env
 .hive-metastore/
 .spark-warehouse/
 dbt-integration-tests
+/.tool-versions
+/.hypothesis/*
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -65,11 +65,27 @@ $EDITOR test.env
 ### Test commands
 There are a few methods for running tests locally.
 
-#### `tox`
-`tox` takes care of managing Python virtualenvs and installing dependencies in order to run tests. You can also run tests in parallel, for example you can run unit tests for Python 3.8, Python 3.9, and `flake8` checks in parallel with `tox -p`. Also, you can run unit tests for specific python versions with `tox -e py38`. The configuration of these tests are located in `tox.ini`.
+#### dagger
+To run functional tests we rely on [dagger](https://dagger.io/), which launches one or more containers to test against.
 
-#### `pytest`
-Finally, you can also run a specific test or group of tests using `pytest` directly. With a Python virtualenv active and dev dependencies installed you can do things like:
+```sh
+pip install -r dagger/requirements.txt
+python dagger/run_dbt_spark_tests.py --profile databricks_sql_endpoint --test-path tests/functional/adapter/test_basic.py::TestSimpleMaterializationsSpark::test_base
+```
+
+`--profile`: required; the kind of Spark connection to test against
+
+_options_:
+  - "apache_spark"
+  - "spark_session"
+  - "databricks_sql_endpoint"
+  - "databricks_cluster"
+  - "databricks_http_cluster"
+
+`--test-path`: optional, this is the path to the test file you want to run. If not specified, all tests will be run.
+
+#### pytest
+Finally, you can also run a specific test or group of tests using `pytest` directly (if you have all the dependencies set up on your machine). With a Python virtualenv active and dev dependencies installed you can do things like:
 
 ```sh
 # run all functional tests

diff --git a/Makefile b/Makefile
@@ -3,7 +3,7 @@
 .PHONY: dev
 dev: ## Installs adapter in develop mode along with development dependencies
 	@\
-	pip install -e . -r requirements.txt -r dev-requirements.txt && pre-commit install
+	pip install -e . -r requirements.txt -r dev-requirements.txt -r dagger/requirements.txt && pre-commit install
 
 .PHONY: dev-uninstall
 dev-uninstall: ## Uninstalls all packages while maintaining the virtual environment
@@ -40,12 +40,13 @@ linecheck: ## Checks for all Python lines 100 characters or more
 .PHONY: unit
 unit: ## Runs unit tests with py38.
 	@\
-	tox -e py38
+	python -m pytest tests/unit
 
 .PHONY: test
 test: ## Runs unit tests with py38 and code checks against staged changes.
 	@\
-	tox -p -e py38; \
+	python -m pytest tests/unit; \
+	python dagger/run_dbt_spark_tests.py --profile spark_session; \
 	pre-commit run black-check --hook-stage manual | grep -v "INFO"; \
 	pre-commit run flake8-check --hook-stage manual | grep -v "INFO"; \
 	pre-commit run mypy-check --hook-stage manual | grep -v "INFO"

diff --git a/README.md b/README.md
@@ -5,9 +5,6 @@
   <a href="https://github.com/dbt-labs/dbt-spark/actions/workflows/main.yml">
     <img src="https://github.com/dbt-labs/dbt-spark/actions/workflows/main.yml/badge.svg?event=push" alt="Unit Tests Badge"/>
   </a>
-  <a href="https://circleci.com/gh/dbt-labs/dbt-spark/?branch=main">
-    <img src="https://circleci.com/gh/dbt-labs/dbt-spark/tree/main.svg?style=shield" alt="Integration Tests Badge"/>
-  </a>
 </p>
 
 **[dbt](https://www.getdbt.com/)** enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

diff --git a/dagger/requirements.txt b/dagger/requirements.txt
@@ -0,0 +1,2 @@
+dagger-io~=0.8.0
+python-dotenv