Adding cost optimization to billing page (#4101)
## What are you changing in this pull request and why?

Adding a few sections around optimizing costs within the new billing
structure

## Checklist

- [x] Review the [Content style
guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
and [About
versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version)
so my content adheres to these guidelines.
- [x] Add a checklist item for anything that needs to happen before this
PR is merged, such as "needs technical review" or "change base branch."
nghi-ly authored Sep 22, 2023
2 parents 4413397 + 554007a commit d8a207a
Showing 1 changed file with 84 additions and 0 deletions.
website/docs/docs/cloud/billing.md
@@ -94,6 +94,90 @@ There are 2 options to disable models from being built and charged:
2. Alternatively, you can delete some or all of your dbt Cloud jobs. This will ensure that no runs are kicked off, but you will permanently lose your job(s).


## Optimize costs in dbt Cloud

dbt Cloud offers ways to optimize both your successful models built usage and your warehouse costs.

### Best practices for optimizing successful models built

When looking for ways to optimize your costs from successful models built, you can reduce those costs while still adhering to best practices. To make sure you're still utilizing tests and rebuilding views when logic changes, it's recommended to implement a combination of the best practices that fit your needs. More specifically, if you decide to exclude views from your regularly scheduled dbt Cloud job runs, it's imperative that you set up a [merge job](#build-only-changed-views) to deploy updated view logic when changes are detected.

#### Exclude views in a dbt Cloud job

Many dbt Cloud users utilize views, which don’t always need to be rebuilt every time a job runs. For any jobs containing views that _do not_ include macros that dynamically generate code (for example, case statements) based on upstream tables and that _do not_ have tests, you can implement these steps:

1. Go to your current production deployment job in dbt Cloud.
2. Modify your command to include `--exclude config.materialized:view` (see the example command after these steps).
3. Save your job changes.
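
For example, a production job command with this exclusion might look like the following sketch (the use of `dbt build` is illustrative; apply the flag to whichever command your job already runs):

```bash
# Build everything in the project except models materialized as views.
dbt build --exclude config.materialized:view
```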

If you have views that contain macros with case statements based on upstream tables, these will need to be run each time to account for new values. If you still need to test your views with each run, follow the [Exclude views while still running tests](#exclude-views-while-running-tests) best practice to create a custom selector.

#### Exclude views while running tests

Running tests for views in every job run can help keep data quality intact and save you from the need to rerun failed jobs. To exclude views from your job run while running tests, you can follow these steps to create a custom [selector](https://docs.getdbt.com/reference/node-selection/yaml-selectors) for your job command.

1. Open your dbt project in the dbt Cloud IDE.
2. Add a file called `selectors.yml` in your top-level project folder.
3. In the file, add the following code:

```yaml
selectors:
  - name: skip_views_but_test_views
    description: >
      A default selector that will exclude materializing views
      without skipping tests on views.
    default: true
    definition:
      union:
        - union:
          - method: path
            value: "*"
          - exclude:
            - method: config.materialized
              value: view
        - method: resource_type
          value: test
```

4. Save the file and commit it to your project.
5. Modify your dbt Cloud jobs to include `--selector skip_views_but_test_views`.
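
For example, assuming the job currently runs `dbt build`, the updated command might look like this sketch:

```bash
# Skip building views but still run all tests, including tests on views,
# using the custom selector defined in selectors.yml.
dbt build --selector skip_views_but_test_views
```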

#### Build only changed views

If you want to ensure that you're building views whenever the logic is changed, create a merge job that gets triggered when code is merged into main:

1. Ensure you have a [CI job setup](/docs/deploy/ci-jobs) in your environment.
2. Create a new [deploy job](/docs/deploy/deploy-jobs#create-and-schedule-jobs) and call it “Merge Job”.
3. Set the **Environment** to your CI environment. Refer to [Types of environments](/docs/deploy/deploy-environments#types-of-environments) for more details.
4. Set **Commands** to: `dbt run -s state:modified+`.
Executing `dbt build` in this context is unnecessary because the CI job was used to both run and test the code that just got merged into main.
5. Under the **Execution Settings**, select the default production job to compare changes against:
- **Defer to a previous run state** — Select the “Merge Job” you created so the job compares and identifies what has changed since the last merge.
6. In your dbt project, follow the steps in [Run a dbt Cloud job on merge](/guides/orchestration/custom-cicd-pipelines/3-dbt-cloud-job-on-merge) to create a script that triggers the dbt Cloud API to run your job after a merge happens in your git repository, or watch this [video](https://www.loom.com/share/e7035c61dbed47d2b9b36b5effd5ee78?sid=bcf4dd2e-b249-4e5d-b173-8ca204d9becb).
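
As a rough sketch of what such a script ultimately calls, the dbt Cloud v2 API exposes a trigger endpoint for job runs. The account ID, job ID, and token below are placeholders, not values from this guide:

```bash
# Placeholder values -- substitute your dbt Cloud account ID, the ID of the
# "Merge Job" created above, and a service token allowed to trigger job runs.
ACCOUNT_ID=12345
JOB_ID=67890
DBT_CLOUD_TOKEN="<your-service-token>"

# Trigger the merge job through the dbt Cloud Administrative API (v2).
curl --request POST \
  --url "https://cloud.getdbt.com/api/v2/accounts/${ACCOUNT_ID}/jobs/${JOB_ID}/run/" \
  --header "Authorization: Token ${DBT_CLOUD_TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{"cause": "Triggered by merge to main"}'
```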

The purpose of the merge job is to:

- Immediately deploy any changes from PRs to production.
- Keep your production views up to date with how they’re defined in your codebase while keeping production job runs cost-efficient.

The merge action will optimize your cloud data platform spend and shorten job times, but you’ll need to decide if making the change is right for your dbt project.

### Rework inefficient models

#### Job Insights tab

To reduce your warehouse spend, you can identify which models, on average, take the longest to build on the **Job** page under the **Insights** tab. This chart shows the average run time for each model based on its last 20 runs. Any models taking longer than anticipated to build might be prime candidates for optimization, which will ultimately reduce your cloud warehouse spending.

#### Model Timing tab

To better understand how long each model takes to run within the context of a specific run, look at the **Model Timing** tab. To find it, select the run of interest on the **Run History** page, then click **Model Timing** on that **Run** page.

Once you've identified which models could be optimized, check out these other resources that walk through how to optimize your work:
* [Build scalable and trustworthy data pipelines with dbt and BigQuery](https://services.google.com/fh/files/misc/dbt_bigquery_whitepaper.pdf)
* [Best Practices for Optimizing Your dbt and Snowflake Deployment](https://www.snowflake.com/wp-content/uploads/2021/10/Best-Practices-for-Optimizing-Your-dbt-and-Snowflake-Deployment.pdf)
* [How to optimize and troubleshoot dbt models on Databricks](/guides/dbt-ecosystem/databricks-guides/how_to_optimize_dbt_models_on_databricks)

## FAQs

* What happens if I need more than 8 seats on the Team plan?
