New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Updates for unit testing docs #6345

Draft

dbeatty10 wants to merge 5 commits into current from dbeatty/docs-issue-6344

Contributor

dbeatty10 commented Oct 23, 2024 •

edited by github-actions bot

Loading

What are you changing in this pull request and why?

Resolves #6344

Checklist

I have reviewed the Content style guide so my content adheres to these guidelines.

🚀 Deployment available! Here are the direct links to the updated files:

https://docs-getdbt-com-git-dbeatty-docs-issue-6344-dbt-labs.vercel.app/docs/build/unit-tests


          First draft of updates from Ganz

a3af151

vercel bot commented Oct 23, 2024 •

edited

Loading

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated (UTC)
docs-getdbt-com	✅ Ready (Inspect)	Visit Preview	Oct 23, 2024 3:27pm

github-actions bot added content size: small labels

vercel bot deployed to Preview

October 23, 2024 14:48

View deployment

dbeatty10 commented

View reviewed changes

website/docs/docs/build/unit-tests.md Outdated Show resolved Hide resolved

website/docs/docs/build/unit-tests.md Outdated Show resolved Hide resolved

website/docs/docs/build/unit-tests.md Outdated Show resolved Hide resolved

website/docs/docs/build/unit-tests.md Outdated Show resolved Hide resolved

website/docs/docs/build/unit-tests.md Outdated Show resolved Hide resolved

website/docs/docs/build/unit-tests.md Outdated Show resolved Hide resolved

website/docs/docs/build/unit-tests.md Outdated Show resolved Hide resolved


          Restore original formatting in several places

19ff5b2

vercel bot deployed to Preview

October 23, 2024 14:57

View deployment

dbeatty10 added 2 commits

October 23, 2024 08:59


          Restore original

69715ac


          Suggested updates from Ganz

8488b99

dbeatty10 mentioned this pull request

ref and source functions both return a Relation object #6331

Open

1 task

vercel bot deployed to Preview

October 23, 2024 15:11

View deployment

dbeatty10 commented

View reviewed changes

Contributor Author

dbeatty10 left a comment

Left some comments on updates to make in what we say and where we say it.

website/docs/docs/build/unit-tests.md

               - Unit tests must be defined in a YML file in your `models/` directory.
               - Table names must be [aliased](/docs/build/custom-aliases) in order to unit test `join` logic.
               - Redshift customers need to be aware of a [limitation when building unit tests](/reference/resource-configs/redshift-configs#unit-test-limitations) that requires a workaround.
+              - All references (`ref()`) used in your model must be included in the unit test configuration as input fixtures, even if they do not directly affect the logic being tested. If these references are missing, you may encounter "node not found" errors during compilation.

Contributor Author

dbeatty10 Oct 23, 2024

Should be ref or source.

See #6331 for a related docs issue asking for a unified super concept that describes both ref and source calls.

website/docs/docs/build/unit-tests.md

		@@ -56,6 +57,8 @@ Use the [resource type](/reference/global-configs/resource-type) flag `--exclude

		## Unit testing a model

		When defining mock data for a unit test, it's crucial to include all necessary input values that satisfy the entire model logic. This means including values that fulfill any `WHERE` clauses, `JOIN` conditions, or other constraints present in the model, even if they do not seem directly related to the specific logic being tested. Failing to do so may lead to errors or unexpected null values in the unit test results.

Contributor Author

dbeatty10 Oct 23, 2024

We probably don't want to lead with this paragraph. Instead, we'd probably want to move it somewhere else. Possibly under a new "pitfalls" section?

website/docs/docs/build/unit-tests.md

               Learn about [exit codes](/reference/exit-codes) for more information.
+              ### Common Pitfalls
+              > - **Missing Fixtures for Referenced Models**: When creating a unit test, all referenced models must be declared as mock inputs. Missing any referenced model, even if it isn't directly involved in the specific logic being tested, will lead to compilation errors such as "node not found."
+              > - **Not Satisfying `WHERE` or `JOIN` Logic**: Ensure that the mock data meets all conditions in the model, such as `WHERE` clauses or `JOIN` requirements. If these conditions are not met, the unit test will either return null rows or fail to execute properly. This often involves adding rows for auxiliary data tables, like locations or transactions, to satisfy joins and filters.

Contributor Author

dbeatty10 Oct 23, 2024 •

edited

Loading

We'd probably want to give a more detailed example here. Otherwise, it's hard to determine what exactly we mean here. Maybe it would be best to create a post in Discourse with all the details and then link to it?

website/docs/docs/build/unit-tests.md

Comment on lines +329 to +354

+              ### How Unit Tests Compile
+              > During a unit test, dbt creates Common Table Expressions (CTEs) for all dependencies of the model using the mock input data you provide. These CTEs replace the actual references (`ref()`) in the model and allow dbt to run your SQL logic against the mock data.
+              >
+              > For example, when you provide a reference such as `ref('stg_transactions')`, dbt creates a CTE named `__dbt__cte__stg_transactions` that contains the mocked data. The entire compiled SQL might look something like this:
+              > ```sql
+              > with
+              >  __dbt__cte__stg_transactions as (
+              >   -- fixture for stg_transactions
+              >   -- contains unions to create "test inputs" corresponding to all rows
+              > ),
+              >  __dbt__cte__stg_locations as (
+              >   -- fixture for stg_locations
+              >   -- contains select statement that "mocks" stg_locations
+              > ),
+              > applied_donations as (
+              >  select
+              >    transaction_id,
+              >    sum(cash_value) as donated_cash
+              >  from __dbt__cte__stg_donations
+              >  group by transaction_id
+              > )
+              > select * from __dbt__cte__stg_transactions;
+              > ```
+              > Understanding this process will help ensure you configure your unit tests correctly and avoid common issues.

Contributor Author

dbeatty10 Oct 23, 2024

This is interesting implementation detail that helps in troubleshooting when there is some kind of error. In general, this kind of detail doesn't feel like it should go in product docs though because the maintainers are free to change the details at will. Maybe we move it to Markdown docs within the dbt-core or dbt-adapters repo instead?


          Merge branch 'current' into dbeatty/docs-issue-6344

4c8513a

vercel bot deployed to Preview

October 23, 2024 15:27

View deployment

jairus-m commented Oct 23, 2024 •

edited

Loading

Hey @dbeatty10!

I noticed this draft and wanted to point you to #6330 .

I hit super similar updates and give an example! hope this may be helpful whether it's approved or not

Here is that proposed section:

Caveats to unit test configuration

As you start to unit test logic in models that contain complex joins or conditions, there are a few things to consider:

All references in your tested model must be declared in the configuration regardless if they're directly relevant to the core unit test logic or not.

In addition, when mocking data for your unit test, ensure it satisfies all conditions in your model's SQL, including those that may not be directly related to the specific logic that you're testing.

This includes:

WHERE clause conditions
JOIN conditions
Any other logic that affects row selection or transformations

Continuing the example above, we can create an updated version of the customers table called dim_customers_with_orders. Here, we join an additional orders table and add a condition in the WHERE clause:

with customers as (

    select * from {{ ref('stg_customers') }}

),

-- Addition of a ref() that is not used in core unit test
orders as (

  select
      customer_id,
      count(distinct order_id) as num_orders,
      sum(order_total) as total_order_value
  from {{ ref('stg_orders') }}
  group by customer_id

),

accepted_email_domains as (

    select * from {{ ref('top_level_email_domains') }}

)

select
    customers.customer_id,
    customers.first_name,
    customers.last_name,
    customers.email,
    coalesce(regexp_like(
		customers.email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
		)
	= true
	and accepted_email_domains.tld is not null,
	false) as is_valid_email_address
    orders.num_orders,
    orders.total_order_value
from customers
left join orders
    on customers.customer_id = orders.customer_id
left join accepted_email_domains
    on customers.email_top_level_domain = lower(accepted_email_domains.tld)
-- Addition of WHERE condition
where customers.age is >= 21

With the added CTE (orders) and the additional statement in the WHERE clause, we now need to add mock data that satifies where age >= 21 as well as add an empty input reference to stg_orders. The resulting unit test configuration will now be:

unit_tests:
  - name: test_is_valid_email_address
    description: "Check my is_valid_email_address logic captures all known edge cases - emails without ., emails without @, and emails from invalid domains."
    model: dim_customers
    given:
      - input: ref('stg_customers')
        rows:
          - {email: cool@example.com,    email_top_level_domain: example.com,  age: 21}
          - {email: cool@unknown.com,    email_top_level_domain: unknown.com,  age: 21}
          - {email: badgmail.com,        email_top_level_domain: gmail.com,    age: 21}
          - {email: missingdot@gmailcom, email_top_level_domain: gmail.com,    age: 21}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('stg_orders')
        rows:
          - {}
    expect:
      rows:
        - {email: cool@example.com,    is_valid_email_address: true}
        - {email: cool@unknown.com,    is_valid_email_address: false}
        - {email: badgmail.com,        is_valid_email_address: false}
        - {email: missingdot@gmailcom, is_valid_email_address: false}

While the unit test in this example was for a specific column, it’s important to remember that mock inputs are going to be passed through the entire model as a whole and therefore, must be declared as such and configured properly to meet all the model’s expectations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

content size: small