Add new macros for diff calculation, and unit tests (#99) (#101)

* Add new macros for diff calculation, and unit tests (#99)
* Add macro for new hash-based comparison strategy
* split out SF-focused version of macro
* Fix change to complex object
* Fix overuse of star
* switch from compare rels to compare queries
* provide wrapping parens
* switch to array of columns for PK
* split unit tests into own files, change unit tests to array pk
* tidy up get_comp_bounds
* fix arg rename
* add quick_are_queries_identical and unit tests
* Move data tests into own directory
* Add test for multiple PKs
* fix incorrect unit test configs
* make data types for id and id_2 big enough nums
* Mock event_time response
* fix hardcoded value in quick_are_qs_identical
* Add unit tests for null handling (still broken)
* Rename columns to be more unique
* Steal surrogate key macro from utils
* Use generated surrogate key across the board in place of PK
* rm my profile reference
* Update quick_are_queries_identical.sql
* Add diagram explaining comparison bounds
* Add comments explaining warehouse-specific optimisations
* cross-db support
* subq
* no postgres or redshift for a sec
* add default var values for compare wrappers
* avoid lateral alias reference for BQ
* BQ doesn't support count(arg1, arg2)
* re-enable redshift
* Alias subq for redshift
* remove extra comma
* add row status of nonunique_pk
* remove redundant test and wrapper model
* Create json-y tests for snowflake
* Add workaround for redshift to support count num rows in status
* skip incompatible tests
* Fix redshift lack of bool_or support in window funcs
* add skip exclusions for everything else
* fix incorrect skip tag application
* Move user configs to project.yml from profiles
* Temporarily disable unpassable redshift tests
* add temp skip to circle's config.yml
* forgot tag: method
* Temporarily skip reworked_compare_all_statuses_different_column_set
* Skip another test redshift
* disable unsupported tests BQ
* postgres too?
* Fixes for postgres
* namespace macros
* It's a postgres problem, not a redshift problem
* Handle postgres 63 char limit
* Add databricks
* Rename tests to data_tests
* Found a better workaround for missing count distinct window
* actually call the macro
* disable syntax-failing tests on dbx
* try to install core from main to get sorting fix
* Revert "try to install core from main to get sorting fix" This reverts commit d28f3e1.
* Audit helper code review changes
* add BQ support for quick are queries identical
* explain why using dense_rank
* remove the compile step to avoid compilation error
* Don't throw incompatible quick compare error during parse
* add where clause to check we're not assuming its absence
* enable first basic struct tests
* Skip raising exception during parsing
* json_build_object doesn't work on rs
* changed behaviour redshift
* skip complex structs on rs for now
* temp disable all complex structs
* skip some currently failing bq tests
* Properly exclude tests to skip, add comments
* dbx too
* rename reworked_compare to compare_and_classify_query_results
* Rename files
* rename macro file
* Add relation_focused macros
* Add BQ-specific generate_set_results for hashes, enable json tests
* Implement hash comparisons for BQ and DBX (#103)
* disable tests for unrelated adapters
* Avoid lateral column aliasing
* First cross-db complex struct fixture
* Add final fixtures
* Initial work on dbx compatibility
* remove lateral column alias dbx
* cast everything as string before hashing
* add comment, enable all tests again
* rename to dbt_audit_in_a instead of in_a
* Protect against missing PK columns
* gitignore package-lock.yml
* add dbx variant of simple structs
* Rename private macros to have _ prefix
* Fix get comparison bounds (#104)
* change to getting comparison bounds for queries not relations
* add test for introspective queries
* Make compare query columns multi pk (#105)
* rm packagelock.yml
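
The commit message above mentions a hash-based comparison strategy, a generated surrogate key used in place of the primary key, casting every value to a string before hashing, and a `nonunique_pk` row status. A minimal Python sketch of how those pieces could fit together follows. It is an illustration only, not the package's SQL: the function names, the `NULL_PLACEHOLDER` value, the `-` separator, and every status name other than `nonunique_pk` are assumptions.

```python
import hashlib
from collections import Counter

# Hypothetical marker so that a NULL and an empty string hash differently.
NULL_PLACEHOLDER = "_null_"


def surrogate_key(row, columns):
    """Hash the chosen columns of a row, casting each value to a string first."""
    parts = [
        NULL_PLACEHOLDER if row.get(col) is None else str(row[col])
        for col in columns
    ]
    return hashlib.md5("-".join(parts).encode("utf-8")).hexdigest()


def _index(rows, primary_key_columns, columns):
    """Return (count per PK hash, full-row hash per PK hash) for one side."""
    pk_counts = Counter()
    row_hash_by_pk = {}
    for row in rows:
        pk = surrogate_key(row, primary_key_columns)
        pk_counts[pk] += 1
        row_hash_by_pk[pk] = surrogate_key(row, columns)
    return pk_counts, row_hash_by_pk


def classify(a_rows, b_rows, primary_key_columns, columns):
    """Map each PK hash to a status by comparing side A against side B."""
    a_counts, a_hashes = _index(a_rows, primary_key_columns, columns)
    b_counts, b_hashes = _index(b_rows, primary_key_columns, columns)

    statuses = {}
    for pk in a_hashes.keys() | b_hashes.keys():
        if a_counts[pk] > 1 or b_counts[pk] > 1:
            statuses[pk] = "nonunique_pk"  # status name taken from the commit message
        elif pk not in b_hashes:
            statuses[pk] = "removed"       # assumed name: present in A only
        elif pk not in a_hashes:
            statuses[pk] = "added"         # assumed name: present in B only
        elif a_hashes[pk] == b_hashes[pk]:
            statuses[pk] = "identical"     # assumed name
        else:
            statuses[pk] = "modified"      # assumed name
    return statuses
```

Comparing one hash per row instead of every column pair is what makes the approach cheap; the trade-off is that the hash tells you a row changed but not which column changed.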
dbt-labs · Jun 13, 2024 · d10124a
1 parent 8473293
commit d10124a
Showing 64 changed files with 1,752 additions and 106 deletions.
diff --git a/.circleci/config.yml b/.circleci/config.yml
@@ -33,7 +33,7 @@ jobs:
             . dbt_venv/bin/activate
 
             python -m pip install --upgrade pip setuptools
-            python -m pip install --pre dbt-core dbt-postgres dbt-redshift dbt-snowflake dbt-bigquery
+            python -m pip install --pre dbt-core dbt-postgres dbt-redshift dbt-snowflake dbt-bigquery dbt-databricks
 
             mkdir -p ~/.dbt
             cp integration_tests/ci/sample.profiles.yml ~/.dbt/profiles.yml
@@ -51,9 +51,8 @@ jobs:
             cd integration_tests
             dbt deps --target postgres
             dbt seed --target postgres --full-refresh
-            dbt compile --target postgres
-            dbt run --target postgres
-            dbt test --target postgres
+            dbt run --target postgres --exclude tag:skip+ tag:temporary_skip+
+            dbt test --target postgres --exclude tag:skip+ tag:temporary_skip+
 
       - run:
           name: "Run Tests - Redshift"
@@ -63,9 +62,8 @@ jobs:
             cd integration_tests
             dbt deps --target redshift
             dbt seed --target redshift --full-refresh
-            dbt compile --target redshift
-            dbt run --target redshift
-            dbt test --target redshift
+            dbt run --target redshift --exclude tag:skip+ tag:temporary_skip+
+            dbt test --target redshift --exclude tag:skip+ tag:temporary_skip+
 
       - run:
           name: "Run Tests - Snowflake"
@@ -75,9 +73,8 @@ jobs:
             cd integration_tests
             dbt deps --target snowflake
             dbt seed --target snowflake --full-refresh
-            dbt compile --target snowflake
-            dbt run --target snowflake
-            dbt test --target snowflake
+            dbt run --target snowflake --exclude tag:skip+ tag:temporary_skip+
+            dbt test --target snowflake --exclude tag:skip+ tag:temporary_skip+
 
       - run:
           name: "Run Tests - BigQuery"
@@ -90,10 +87,19 @@ jobs:
             cd integration_tests
             dbt deps --target bigquery
             dbt seed --target bigquery --full-refresh
-            dbt compile --target bigquery
-            dbt run --target bigquery --full-refresh
-            dbt test --target bigquery
+            dbt run --target bigquery --full-refresh --exclude tag:skip+ tag:temporary_skip+
+            dbt test --target bigquery --exclude tag:skip+ tag:temporary_skip+
 
+      - run:
+          name: "Run Tests - Databricks"
+          command: |
+            . dbt_venv/bin/activate
+            echo `pwd`
+            cd integration_tests
+            dbt deps --target databricks
+            dbt seed --target databricks --full-refresh
+            dbt run --target databricks --exclude tag:skip+ tag:temporary_skip+
+            dbt test --target databricks --exclude tag:skip+ tag:temporary_skip+
 
       - save_cache:
           key: deps1-{{ .Branch }}
@@ -115,3 +121,4 @@ workflows:
             - profile-redshift
             - profile-snowflake
             - profile-bigquery
+            - profile-databricks
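
The CI commands in this file now pass `--exclude tag:skip+ tag:temporary_skip+`. In dbt's node selection syntax, a trailing `+` extends a selector to everything downstream of the matched nodes, and space-separated selectors are unioned. A minimal Python sketch of that behaviour follows, assuming a hypothetical five-model DAG. The model names, the `graph` and `tags` dictionaries, and both helper functions are invented for illustration and are not dbt's implementation.

```python
from collections import deque


def descendants(graph, start):
    """Return every node reachable from `start` along parent -> child edges."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, []))
    return seen


def apply_exclude(graph, tags, excluded_tags):
    """Drop every node carrying an excluded tag, plus all of its descendants."""
    all_nodes = set(graph) | {child for children in graph.values() for child in children}
    excluded = set()
    for node in all_nodes:
        if tags.get(node, set()) & excluded_tags:
            excluded.add(node)
            excluded |= descendants(graph, node)
    return all_nodes - excluded


# Hypothetical DAG: parent -> list of children.
graph = {
    "stg_orders": ["orders_compare"],
    "orders_compare": ["orders_report"],
    "stg_users": ["users_compare"],
}
# Hypothetical tag assignments.
tags = {
    "orders_compare": {"skip"},
    "users_compare": {"temporary_skip"},
}

remaining = apply_exclude(graph, tags, {"skip", "temporary_skip"})
print(sorted(remaining))  # ['stg_orders', 'stg_users']
```

Note that `orders_report` is dropped even though it carries no tag, because it sits downstream of a skipped model. This is why tagging a model also removes the tests attached to it from the run.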
diff --git a/.gitignore b/.gitignore
@@ -1,4 +1,7 @@
 target/
 dbt_packages/
 logs/
-logfile
+logfile
+.DS_Store
+package-lock.yml
+integration_tests/package-lock.yml
diff --git a/.vscode/settings.json b/.vscode/settings.json
@@ -0,0 +1,21 @@
+{    
+    "yaml.schemas": {
+        "https://github.com/raw/dbt-labs/dbt-jsonschema/main/schemas/latest/dbt_yml_files-latest.json": [
+            "/**/*.yml",
+            "!profiles.yml",
+            "!dbt_project.yml",
+            "!packages.yml",
+            "!selectors.yml",
+            "!profile_template.yml"
+        ],
+        "https://github.com/raw/dbt-labs/dbt-jsonschema/main/schemas/latest/dbt_project-latest.json": [
+            "dbt_project.yml"
+        ],
+        "https://github.com/raw/dbt-labs/dbt-jsonschema/main/schemas/latest/selectors-latest.json": [
+            "selectors.yml"
+        ],
+        "https://github.com/raw/dbt-labs/dbt-jsonschema/main/schemas/latest/packages-latest.json": [
+            "packages.yml"
+        ]
+    },
+}
diff --git a/integration_tests/ci/sample.profiles.yml b/integration_tests/ci/sample.profiles.yml
@@ -2,10 +2,6 @@
 # HEY! This file is used in the dbt-audit-helper integrations tests with CircleCI.
 # You should __NEVER__ check credentials into version control. Thanks for reading :)
 
-config:
-    send_anonymous_usage_stats: False
-    use_colors: True
-
 integration_tests:
   target: postgres
   outputs:
@@ -27,15 +23,15 @@ integration_tests:
       dbname: "{{ env_var('REDSHIFT_TEST_DBNAME') }}"
       port: "{{ env_var('REDSHIFT_TEST_PORT') | as_number }}"
       schema: audit_helper_integration_tests_redshift
-      threads: 1
+      threads: 8
 
     bigquery:
       type: bigquery
       method: service-account
       keyfile: "{{ env_var('BIGQUERY_SERVICE_KEY_PATH') }}"
       project: "{{ env_var('BIGQUERY_TEST_DATABASE') }}"
       schema: audit_helper_integration_tests_bigquery
-      threads: 1
+      threads: 8
 
     snowflake:
       type: snowflake
@@ -46,4 +42,12 @@ integration_tests:
       database: "{{ env_var('SNOWFLAKE_TEST_DATABASE') }}"
       warehouse: "{{ env_var('SNOWFLAKE_TEST_WAREHOUSE') }}"
       schema: audit_helper_integration_tests_snowflake
-      threads: 1
+      threads: 8
+
+    databricks:
+      type: databricks
+      schema: dbt_project_evaluator_integration_tests_databricks
+      host: "{{ env_var('DATABRICKS_TEST_HOST') }}"
+      http_path: "{{ env_var('DATABRICKS_TEST_HTTP_PATH') }}"
+      token: "{{ env_var('DATABRICKS_TEST_ACCESS_TOKEN') }}"
+      threads: 10
diff --git a/integration_tests/dbt_project.yml b/integration_tests/dbt_project.yml
@@ -17,3 +17,14 @@ clean-targets:         # directories to be removed by `dbt clean`
 
 seeds:
   +quote_columns: false
+
+vars:
+  compare_queries_summarize: true
+  primary_key_columns_var: ['col1']
+  columns_var: ['col1']
+  event_time_var:
+  quick_are_queries_identical_cols: ['col1']
+
+flags:
+  send_anonymous_usage_stats: False
+  use_colors: True
diff --git a/integration_tests/macros/unit_tests/struct_generation_macros.sql b/integration_tests/macros/unit_tests/struct_generation_macros.sql
@@ -0,0 +1,26 @@
+{%- macro _basic_json_function() -%}
+    {%- if target.type == 'snowflake' -%}
+        object_construct
+    {%- elif target.type == 'bigquery' -%}
+        json_object
+    {%- elif target.type == 'databricks' -%}
+        map
+    {%- elif execute -%}
+        {# Only raise exception if it's actually being called, not during parsing #}
+        {%- do exceptions.raise_compiler_error("Unknown adapter '"~ target.type ~ "'") -%}
+    {%- endif -%}
+{%- endmacro -%}
+
+{% macro _complex_json_function(json) %}
+
+    {% if target.type == 'redshift' %}
+        json_parse({{ json }})
+    {% elif target.type == 'databricks' %}
+        from_json({{ json }}, schema_of_json({{ json }}))
+    {% elif target.type in ['snowflake', 'bigquery'] %}
+        parse_json({{ json }})
+    {% elif execute %}
+        {# Only raise exception if it's actually being called, not during parsing #}
+        {%- do exceptions.raise_compiler_error("Unknown adapter '"~ target.type ~ "'") -%}    
+    {% endif %}
+{% endmacro %}
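
Both macros above pick a warehouse-specific function from `target.type` and raise only when `execute` is true, so an unsupported adapter does not fail the project at parse time. The same control flow can be sketched in Python. This is an illustrative model only: `CompilerError` and the two Python function names are stand-ins, while the adapter-to-function mapping is copied from the macros.

```python
class CompilerError(Exception):
    """Stand-in for the error raised by exceptions.raise_compiler_error."""


# Mapping copied from _basic_json_function.
BASIC_JSON_FUNCTIONS = {
    "snowflake": "object_construct",
    "bigquery": "json_object",
    "databricks": "map",
}


def basic_json_function(target_type, execute):
    """Return the function name for the adapter, or fail only at execution time."""
    name = BASIC_JSON_FUNCTIONS.get(target_type)
    if name is not None:
        return name
    if execute:
        raise CompilerError(f"Unknown adapter '{target_type}'")
    return ""  # during parsing the macro renders to nothing


def complex_json_function(json_expr, target_type, execute):
    """Wrap a JSON string expression in the adapter's parsing function."""
    if target_type == "redshift":
        return f"json_parse({json_expr})"
    if target_type == "databricks":
        return f"from_json({json_expr}, schema_of_json({json_expr}))"
    if target_type in ("snowflake", "bigquery"):
        return f"parse_json({json_expr})"
    if execute:
        raise CompilerError(f"Unknown adapter '{target_type}'")
    return ""
```

Deferring the error to execution matters because dbt renders every macro call while parsing the project, including calls inside models that will later be excluded for the current target.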
diff --git a/integration_tests/models/compare_which_columns_differ_exclude_cols.sql b/integration_tests/models/compare_which_columns_differ_exclude_cols.sql
diff --git a/...re_all_columns_concat_pk_with_summary.sql → ...re_all_columns_concat_pk_with_summary.sql b/...re_all_columns_concat_pk_with_summary.sql → ...re_all_columns_concat_pk_with_summary.sql
diff --git a/...all_columns_concat_pk_without_summary.sql → ...all_columns_concat_pk_without_summary.sql b/...all_columns_concat_pk_without_summary.sql → ...all_columns_concat_pk_without_summary.sql
diff --git a/...dels/compare_all_columns_where_clause.sql → ...ests/compare_all_columns_where_clause.sql b/...dels/compare_all_columns_where_clause.sql → ...ests/compare_all_columns_where_clause.sql
diff --git a/...dels/compare_all_columns_with_summary.sql → ...ests/compare_all_columns_with_summary.sql b/...dels/compare_all_columns_with_summary.sql → ...ests/compare_all_columns_with_summary.sql
diff --git a/..._all_columns_with_summary_and_exclude.sql → ..._all_columns_with_summary_and_exclude.sql b/..._all_columns_with_summary_and_exclude.sql → ..._all_columns_with_summary_and_exclude.sql
diff --git a/...s/compare_all_columns_without_summary.sql → ...s/compare_all_columns_without_summary.sql b/...s/compare_all_columns_without_summary.sql → ...s/compare_all_columns_without_summary.sql
diff --git a/integration_tests/models/data_tests/compare_and_classify_query_results.sql b/integration_tests/models/data_tests/compare_and_classify_query_results.sql
@@ -0,0 +1,11 @@
+-- this has no tests, it's just making sure that the introspective queries for event_time actually run
+
+{{
+    audit_helper.compare_and_classify_query_results(
+        a_query="select * from " ~ ref('unit_test_model_a') ~ " where 1=1",
+        b_query="select * from " ~ ref('unit_test_model_b') ~ " where 1=1",
+        primary_key_columns=['id'],
+        columns=['id', 'col1', 'col2'],
+        event_time='created_at'
+    )
+}}
diff --git a/integration_tests/models/compare_queries.sql → ...sts/models/data_tests/compare_queries.sql b/integration_tests/models/compare_queries.sql → ...sts/models/data_tests/compare_queries.sql
diff --git a/...are_queries_concat_pk_without_summary.sql → ...are_queries_concat_pk_without_summary.sql b/...are_queries_concat_pk_without_summary.sql → ...are_queries_concat_pk_without_summary.sql
diff --git a/...s/models/compare_queries_with_summary.sql → ...ta_tests/compare_queries_with_summary.sql b/...s/models/compare_queries_with_summary.sql → ...ta_tests/compare_queries_with_summary.sql
diff --git a/...odels/compare_queries_without_summary.sql → ...tests/compare_queries_without_summary.sql b/...odels/compare_queries_without_summary.sql → ...tests/compare_queries_without_summary.sql
diff --git a/...tests/models/compare_relation_columns.sql → ...s/data_tests/compare_relation_columns.sql b/...tests/models/compare_relation_columns.sql → ...s/data_tests/compare_relation_columns.sql
diff --git a/...e_relations_concat_pk_without_summary.sql → ...e_relations_concat_pk_without_summary.sql b/...e_relations_concat_pk_without_summary.sql → ...e_relations_concat_pk_without_summary.sql
diff --git a/...models/compare_relations_with_exclude.sql → ..._tests/compare_relations_with_exclude.sql b/...models/compare_relations_with_exclude.sql → ..._tests/compare_relations_with_exclude.sql
diff --git a/...models/compare_relations_with_summary.sql → ..._tests/compare_relations_with_summary.sql b/...models/compare_relations_with_summary.sql → ..._tests/compare_relations_with_summary.sql
diff --git a/...els/compare_relations_without_exclude.sql → ...sts/compare_relations_without_exclude.sql b/...els/compare_relations_without_exclude.sql → ...sts/compare_relations_without_exclude.sql
diff --git a/...els/compare_relations_without_summary.sql → ...sts/compare_relations_without_summary.sql b/...els/compare_relations_without_summary.sql → ...sts/compare_relations_without_summary.sql
diff --git a/...ation_tests/models/compare_row_counts.sql → .../models/data_tests/compare_row_counts.sql b/...ation_tests/models/compare_row_counts.sql → .../models/data_tests/compare_row_counts.sql
diff --git a/...s/models/compare_which_columns_differ.sql → ...ta_tests/compare_which_columns_differ.sql b/...s/models/compare_which_columns_differ.sql → ...ta_tests/compare_which_columns_differ.sql
@@ -9,9 +9,9 @@ select
     has_difference
 from (
 
-    {{ audit_helper.compare_which_columns_differ(
+    {{ audit_helper.compare_which_relation_columns_differ(
         a_relation=a_relation,
         b_relation=b_relation,
-        primary_key="id"
+        primary_key_columns=["id"]
     ) }}
 ) as macro_output
diff --git a/integration_tests/models/data_tests/compare_which_columns_differ_exclude_cols.sql b/integration_tests/models/data_tests/compare_which_columns_differ_exclude_cols.sql
@@ -0,0 +1,25 @@
+{% set a_relation=ref('data_compare_which_columns_differ_a')%}
+
+{% set b_relation=ref('data_compare_which_columns_differ_b') %}
+
+{% set pk_cols = ['id'] %}
+{% set cols = ['id','value_changes','becomes_not_null','does_not_change'] %}
+
+{% if target.type == 'snowflake' %}
+    {% set pk_cols = pk_cols | map("upper") | list %}
+    {% set cols = cols | map("upper") | list %}
+{% endif %}
+
+select 
+    lower(column_name) as column_name,
+    has_difference
+from (
+
+    {{ audit_helper.compare_which_relation_columns_differ(
+        a_relation=a_relation,
+        b_relation=b_relation,
+        primary_key_columns=pk_cols,
+        columns=cols
+    ) }}
+
+) as macro_output
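
The model above upper-cases `pk_cols` and `cols` when the target is Snowflake, because Snowflake folds unquoted identifiers to upper case. The model then lower-cases `column_name` in its output so the result matches the expected seed on every warehouse. A small Python sketch of the same normalisation follows; the function name `normalize_columns` is hypothetical.

```python
def normalize_columns(columns, target_type):
    """Mirror the Jinja `map("upper") | list` filter chain for Snowflake targets."""
    if target_type == "snowflake":
        return [col.upper() for col in columns]
    return list(columns)


pk_cols = normalize_columns(["id"], "snowflake")
cols = normalize_columns(
    ["id", "value_changes", "becomes_not_null", "does_not_change"], "snowflake"
)
print(pk_cols)  # ['ID']
```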
diff --git a/integration_tests/models/schema.yml → ...ration_tests/models/data_tests/schema.yml b/integration_tests/models/schema.yml → ...ration_tests/models/data_tests/schema.yml
@@ -2,96 +2,96 @@ version: 2
 
 models:
   - name: compare_queries
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_relations_without_exclude')
 
   - name: compare_queries_concat_pk_without_summary
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_without_summary')
 
   - name: compare_queries_with_summary
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_with_summary')
 
   - name: compare_queries_without_summary
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_without_summary')
 
   - name: compare_relations_with_summary
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_with_summary')
 
   - name: compare_relations_without_summary
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_without_summary')
 
   - name: compare_relations_with_exclude
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_relations_with_exclude')
 
   - name: compare_relations_without_exclude
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_relations_without_exclude')
 
   - name: compare_all_columns_with_summary
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_all_columns_with_summary')
 
   - name: compare_all_columns_without_summary
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_all_columns_without_summary')
 
   - name: compare_all_columns_concat_pk_with_summary
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_all_columns_concat_pk_with_summary')
 
   - name: compare_all_columns_concat_pk_without_summary
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_all_columns_concat_pk_without_summary')
 
   - name: compare_all_columns_with_summary_and_exclude
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_all_columns_with_summary_and_exclude')
 
   - name: compare_all_columns_where_clause
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_all_columns_where_clause')
 
   - name: compare_relation_columns
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_relation_columns')
 
   - name: compare_relations_concat_pk_without_summary
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_without_summary')
 
   - name: compare_which_columns_differ
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_which_columns_differ')
 
   - name: compare_which_columns_differ_exclude_cols
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_which_columns_differ_exclude_cols')
 
   - name: compare_row_counts
-    tests:
+    data_tests:
       - dbt_utils.equality:
           compare_model: ref('expected_results__compare_row_counts')
diff --git a/integration_tests/models/unit_test_placeholder_models/unit_test_model_a.sql b/integration_tests/models/unit_test_placeholder_models/unit_test_model_a.sql
@@ -0,0 +1 @@
+select 12 as id, 22 as id_2, 'xyz' as col1, 'tuv' as col2, 123 as col3, {{ dbt.current_timestamp() }} as created_at
diff --git a/integration_tests/models/unit_test_placeholder_models/unit_test_model_b.sql b/integration_tests/models/unit_test_placeholder_models/unit_test_model_b.sql
@@ -0,0 +1 @@
+select 12 as id, 22 as id_2, 'xyz' as col1, 'tuv' as col2, 123 as col3, {{ dbt.current_timestamp() }} as created_at