Merge pull request #272 from dbt-labs/repo-sync
REPO SYNC - Public to Private
john-rock authored Oct 28, 2023
2 parents d182dd3 + 35a5949 commit c8262e6
Showing 18 changed files with 33 additions and 1,131 deletions.
8 changes: 4 additions & 4 deletions website/blog/2021-11-29-dbt-airflow-spiritual-alignment.md
@@ -144,22 +144,22 @@ An analyst will be in the dark when attempting to debug this, and will need to r
This can be perfectly okay if your data team is structured for data engineers to exclusively own dbt modeling duties, but that’s quite an uncommon org structure pattern from what I’ve seen. And if you have easy solutions for this analyst-blindness problem, I’d love to hear them.

Once the data has been ingested, dbt Core can be used to model it for consumption. Most of the time, users choose to either:
Use the dbt CLI + [BashOperator](https://registry.astronomer.io/providers/apache-airflow/modules/bashoperator) with Airflow (if you take this route, you can use an external secrets manager to handle credentials), or
Use the dbt Core CLI + [BashOperator](https://registry.astronomer.io/providers/apache-airflow/modules/bashoperator) with Airflow (if you take this route, you can use an external secrets manager to handle credentials), or
Use the [KubernetesPodOperator](https://registry.astronomer.io/providers/kubernetes/modules/kubernetespodoperator) for each dbt job, as data teams have at places like [Gitlab](https://gitlab.com/gitlab-data/analytics/-/blob/master/dags/transformation/dbt_trusted_data.py#L72) and [Snowflake](https://www.snowflake.com/blog/migrating-airflow-from-amazon-ec2-to-kubernetes/).

Both approaches are equally valid; the right one will depend on the team and use case at hand.

| | Dependency management | Overhead | Flexibility | Infrastructure Overhead |
|---|---|---|---|---|
| dbt CLI + BashOperator | Medium | Low | Medium | Low |
| dbt Core CLI + BashOperator | Medium | Low | Medium | Low |
| Kubernetes Pod Operator | Very Easy | Medium | High | Medium |
| | | | | |

If you have DevOps resources available to you, and your team is comfortable with concepts like Kubernetes pods and containers, you can use the KubernetesPodOperator to run each job in a Docker image so that you never have to think about Python dependencies. Furthermore, you’ll create a library of images containing your dbt models that can be run on any containerized environment. However, setting up development environments, CI/CD, and managing the arrays of containers can mean a lot of overhead for some teams. Tools like the [astro-cli](https://github.com/astronomer/astro-cli) can make this easier, but at the end of the day, there’s no getting around the need for Kubernetes resources for the Gitlab approach.
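
For orientation, here is a minimal sketch of what the KubernetesPodOperator approach could look like. It is an assumption-laden illustration, not code from the original post: the image name, namespace, schedule, and dbt flags are placeholders, and the exact import path depends on your `apache-airflow-providers-cncf-kubernetes` version.

```python
# Minimal sketch (not from the original post): one dbt job per pod.
# Assumes an image that contains dbt Core and your dbt project; the image name,
# namespace, schedule, and flags below are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

with DAG(
    dag_id="dbt_kubernetes_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = KubernetesPodOperator(
        task_id="dbt_run",
        name="dbt-run",
        namespace="data",
        image="my-registry/my-dbt-project:latest",  # placeholder image
        cmds=["dbt"],
        arguments=["run", "--profiles-dir", "/usr/app/dbt"],
        get_logs=True,
    )
```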

If you’re just looking to get started, or don’t want to deal with containers, using the BashOperator to call the dbt CLI can be a great way to begin scheduling your dbt workloads with Airflow.
If you’re just looking to get started, or don’t want to deal with containers, using the BashOperator to call the dbt Core CLI can be a great way to begin scheduling your dbt workloads with Airflow.
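
As a rough sketch of that starting point, the DAG below shells out to dbt Core with the BashOperator. It is illustrative only: the project and profiles paths, schedule, and task names are assumptions, and it presumes dbt Core is installed in the Airflow worker environment.

```python
# Minimal sketch (not from the original post): scheduling dbt Core with the BashOperator.
# Assumes dbt Core is installed where the Airflow worker runs and that the
# project/profiles paths below exist; adjust them for your setup.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

DBT_DIR = "/opt/airflow/dbt/my_project"  # placeholder project path

with DAG(
    dag_id="dbt_bash_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=f"dbt run --project-dir {DBT_DIR} --profiles-dir {DBT_DIR}",
    )

    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command=f"dbt test --project-dir {DBT_DIR} --profiles-dir {DBT_DIR}",
    )

    dbt_run >> dbt_test
```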

It’s important to note that whichever approach you choose, this is just a first step; your actual production needs may have more requirements. If you need granularity and dependencies between your dbt models, like the team at [Updater does, you may need to deconstruct the entire dbt DAG in Airflow.](https://www.astronomer.io/guides/airflow-dbt#use-case-2-dbt-airflow-at-the-model-level) If you’re okay managing some extra dependencies, but want to maximize control over what abstractions you expose to your end users, you may want to use the [GoCardlessProvider](https://github.com/gocardless/airflow-dbt), which wraps the BashOperator and dbt CLI.
It’s important to note that whichever approach you choose, this is just a first step; your actual production needs may have more requirements. If you need granularity and dependencies between your dbt models, like the team at [Updater does, you may need to deconstruct the entire dbt DAG in Airflow.](https://www.astronomer.io/guides/airflow-dbt#use-case-2-dbt-airflow-at-the-model-level) If you’re okay managing some extra dependencies, but want to maximize control over what abstractions you expose to your end users, you may want to use the [GoCardlessProvider](https://github.com/gocardless/airflow-dbt), which wraps the BashOperator and dbt Core CLI.

#### Rerunning jobs from failure

2 changes: 1 addition & 1 deletion website/blog/2022-02-23-founding-an-AE-team-smartsheet.md
@@ -114,7 +114,7 @@ In the interest of getting a proof of concept out the door (I highly favor focus

- Our own Dev, Prod & Publish databases
- Our own code repository which we managed independently
- dbt CLI
- dbt Core CLI
- Virtual Machine running dbt on a schedule

None of us had used dbt before, but we’d heard amazing things about it. We hotly debated the choice between dbt and building our own lightweight stack, and looking back now, I couldn’t be happier with choosing dbt. While there was a learning curve that slowed us down initially, we’re now seeing the benefit of that decision. Onboarding new analysts is a breeze and much of the functionality we need is pre-built. The more we use the tool, the faster we are at using it and the more value we’re gaining from the product.
6 changes: 3 additions & 3 deletions website/blog/2022-07-26-pre-commit-dbt.md
@@ -112,7 +112,7 @@ The last step of our flow is to make those pre-commit checks part of the day-to-

Adding periodic pre-commit checks can be done in two different ways: through CI (Continuous Integration) actions, or as git hooks when running dbt locally.

#### a) Adding pre-commit-dbt to the CI flow (works for dbt Cloud and dbt CLI users)
#### a) Adding pre-commit-dbt to the CI flow (works for dbt Cloud and dbt Core users)

The example below assumes GitHub Actions as the CI engine, but similar behavior can be achieved in any other CI tool.

@@ -237,9 +237,9 @@ With that information, I could now go back to dbt, document my model customers a

We could set up rules that prevent any change from being merged if the GitHub action fails. Alternatively, this action step can be defined as merely informational.

#### b) Installing the pre-commit git hooks (for dbt CLI users)
#### b) Installing the pre-commit git hooks (for dbt Core users)

If we develop locally with the dbt CLI, we could also execute `pre-commit install` to install the git hooks. This means that every time we want to commit code in git, the pre-commit hooks will run and will prevent us from committing if any step fails.
If we develop locally with the dbt Core CLI, we could also execute `pre-commit install` to install the git hooks. This means that every time we want to commit code in git, the pre-commit hooks will run and will prevent us from committing if any step fails.

If we want to commit code without performing all the steps of the pre-commit hook, we could use the environment variable SKIP or the git flag `--no-verify`, as described [in the documentation](https://pre-commit.com/#temporarily-disabling-hooks). (For example, we might want to skip the automatic `dbt docs generate` locally to prevent it from running at every commit, and rely on running it manually from time to time.)

2 changes: 1 addition & 1 deletion website/docs/dbt-cli/cli-overview.md
@@ -3,7 +3,7 @@ title: "CLI overview"
description: "Run your dbt project from the command line."
---

dbt Core ships with a command-line interface (CLI) for running your dbt project. The dbt CLI is free to use and available as an [open source project](https://github.com/dbt-labs/dbt-core).
dbt Core ships with a command-line interface (CLI) for running your dbt project. dbt Core and its CLI are free to use and available as an [open source project](https://github.com/dbt-labs/dbt-core).

When using the command line, you can run commands and do other work from the current or _working directory_ on your computer. Before running the dbt project from the command line, make sure the working directory is your dbt project directory. For more details, see "[Creating a dbt project](/docs/build/projects)."

2 changes: 1 addition & 1 deletion website/docs/docs/build/jinja-macros.md
@@ -76,7 +76,7 @@ You can recognize Jinja based on the delimiters the language uses, which we refe

When used in a dbt model, your Jinja needs to compile to a valid query. To check what SQL your Jinja compiles to:
* **Using dbt Cloud:** Click the compile button to see the compiled SQL in the Compiled SQL pane
* **Using the dbt CLI:** Run `dbt compile` from the command line. Then open the compiled SQL file in the `target/compiled/{project name}/` directory. Use a split screen in your code editor to keep both files open at once.
* **Using dbt Core:** Run `dbt compile` from the command line. Then open the compiled SQL file in the `target/compiled/{project name}/` directory. Use a split screen in your code editor to keep both files open at once.

### Macros
[Macros](/docs/build/jinja-macros) in Jinja are pieces of code that can be reused multiple times – they are analogous to "functions" in other programming languages, and are extremely useful if you find yourself repeating code across multiple models. Macros are defined in `.sql` files, typically in your `macros` directory ([docs](/reference/project-configs/macro-paths)).
2 changes: 1 addition & 1 deletion website/docs/docs/build/tests.md
@@ -163,7 +163,7 @@ Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
```
3. Check out the SQL dbt is running by either:
* **dbt Cloud:** checking the Details tab.
* **dbt CLI:** checking the `target/compiled` directory
* **dbt Core:** checking the `target/compiled` directory


**Unique test**
4 changes: 2 additions & 2 deletions website/docs/docs/deploy/deployment-tools.md
@@ -108,11 +108,11 @@ If your organization is using [Prefect](https://www.prefect.io/), the way you wi

## Dagster

If your organization is using [Dagster](https://dagster.io/), you can use the [dagster_dbt](https://docs.dagster.io/_apidocs/libraries/dagster-dbt) library to integrate dbt commands into your pipelines. This library supports the execution of dbt through dbt Cloud, dbt CLI and the dbt RPC server. Running dbt from Dagster automatically aggregates metadata about your dbt runs. Refer to the [example pipeline](https://dagster.io/blog/dagster-dbt) for details.
If your organization is using [Dagster](https://dagster.io/), you can use the [dagster_dbt](https://docs.dagster.io/_apidocs/libraries/dagster-dbt) library to integrate dbt commands into your pipelines. This library supports the execution of dbt through dbt Cloud, dbt Core, and the dbt RPC server. Running dbt from Dagster automatically aggregates metadata about your dbt runs. Refer to the [example pipeline](https://dagster.io/blog/dagster-dbt) for details.
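
For orientation, a minimal sketch using a recent `dagster-dbt` release (the `@dbt_assets` and `DbtCliResource` API; older releases expose a different, resource-based API) might look like the following. The manifest and project paths are placeholders and assume the dbt project has already been compiled.

```python
# Minimal sketch (not from the original docs): running dbt Core from Dagster.
# Assumes a recent dagster-dbt release and an already-compiled dbt project so
# that target/manifest.json exists; all paths are placeholders.
from dagster import Definitions
from dagster_dbt import DbtCliResource, dbt_assets


@dbt_assets(manifest="target/manifest.json")
def my_dbt_assets(context, dbt: DbtCliResource):
    # Each dbt model is exposed as a Dagster asset; `dbt build` runs them
    # and streams structured events back to Dagster.
    yield from dbt.cli(["build"], context=context).stream()


defs = Definitions(
    assets=[my_dbt_assets],
    resources={"dbt": DbtCliResource(project_dir=".")},
)
```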

## Kestra

If your organization uses [Kestra](http://kestra.io/), you can leverage the [dbt plugin](https://kestra.io/plugins/plugin-dbt) to orchestrate dbt Cloud and dbt Core jobs. Kestra's user interface (UI) has built-in [Blueprints](https://kestra.io/docs/user-interface-guide/blueprints), providing ready-to-use workflows. Navigate to the Blueprints page in the left navigation menu and [select the dbt tag](https://demo.kestra.io/ui/blueprints/community?selectedTag=36) to find several examples of scheduling dbt CLI commands and dbt Cloud jobs as part of your data pipelines. After each scheduled or ad-hoc workflow execution, the Outputs tab in the Kestra UI allows you to download and preview all dbt build artifacts. The Gantt and Topology views additionally render the metadata to visualize dependencies and runtimes of your dbt models and tests. The dbt Cloud task provides convenient links to easily navigate between Kestra and the dbt Cloud UI.
If your organization uses [Kestra](http://kestra.io/), you can leverage the [dbt plugin](https://kestra.io/plugins/plugin-dbt) to orchestrate dbt Cloud and dbt Core jobs. Kestra's user interface (UI) has built-in [Blueprints](https://kestra.io/docs/user-interface-guide/blueprints), providing ready-to-use workflows. Navigate to the Blueprints page in the left navigation menu and [select the dbt tag](https://demo.kestra.io/ui/blueprints/community?selectedTag=36) to find several examples of scheduling dbt Core commands and dbt Cloud jobs as part of your data pipelines. After each scheduled or ad-hoc workflow execution, the Outputs tab in the Kestra UI allows you to download and preview all dbt build artifacts. The Gantt and Topology views additionally render the metadata to visualize dependencies and runtimes of your dbt models and tests. The dbt Cloud task provides convenient links to easily navigate between Kestra and the dbt Cloud UI.

## Automation servers

2 changes: 1 addition & 1 deletion website/docs/faqs/Project/which-schema.md
@@ -7,7 +7,7 @@ id: which-schema
---
By default, dbt builds models in your target schema. To change your target schema:
* If you're developing in **dbt Cloud**, these are set for each user when you first use a development environment.
* If you're developing with the **dbt CLI**, this is the `schema:` parameter in your `profiles.yml` file.
* If you're developing with **dbt Core**, this is the `schema:` parameter in your `profiles.yml` file.

If you wish to split your models across multiple schemas, check out the docs on [using custom schemas](/docs/build/custom-schemas).

2 changes: 1 addition & 1 deletion website/docs/faqs/Runs/checking-logs.md
@@ -10,7 +10,7 @@ To check out the SQL that dbt is running, you can look in:

* dbt Cloud:
* Within the run output, click on a model name, and then select "Details"
* dbt CLI:
* dbt Core:
* The `target/compiled/` directory for compiled `select` statements
* The `target/run/` directory for compiled `create` statements
* The `logs/dbt.log` file for verbose logging.
2 changes: 1 addition & 1 deletion website/docs/faqs/Runs/failed-tests.md
@@ -10,7 +10,7 @@ To debug a failing test, find the SQL that dbt ran by:
* dbt Cloud:
* Within the test output, click on the failed test, and then select "Details"

* dbt CLI:
* dbt Core:
* Open the file path returned as part of the error message.
* Navigate to the `target/compiled/schema_tests` directory for all compiled test queries

2 changes: 1 addition & 1 deletion website/docs/guides/advanced/using-jinja.md
@@ -9,7 +9,7 @@ If you'd like to work through this query, add [this CSV](https://github.com/dbt-

While working through the steps of this model, we recommend that you have your compiled SQL open as well, to check what your Jinja compiles to. To do this:
* **Using dbt Cloud:** Click the compile button to see the compiled SQL in the right hand pane
* **Using the dbt CLI:** Run `dbt compile` from the command line. Then open the compiled SQL file in the `target/compiled/{project name}/` directory. Use a split screen in your code editor to keep both files open at once.
* **Using dbt Core:** Run `dbt compile` from the command line. Then open the compiled SQL file in the `target/compiled/{project name}/` directory. Use a split screen in your code editor to keep both files open at once.

## Write the SQL without Jinja
Consider a data model in which an `order` can have many `payments`. Each `payment` may have a `payment_method` of `bank_transfer`, `credit_card` or `gift_card`, and therefore each `order` can have multiple `payment_methods`.
16 changes: 8 additions & 8 deletions website/docs/guides/best-practices/debugging-errors.md
@@ -17,7 +17,7 @@ Learning how to debug is a skill, and one that will make you great at your role!
- The `target/run` directory contains the SQL dbt executes to build your models.
- The `logs/dbt.log` file contains all the queries that dbt runs, and additional logging. Recent errors will be at the bottom of the file.
- **dbt Cloud users**: Use the above, or the `Details` tab in the command output.
- **dbt CLI users**: Note that your code editor _may_ be hiding these files from the tree <Term id="view" /> ([VSCode help](https://stackoverflow.com/questions/42891463/how-can-i-show-ignored-files-in-visual-studio-code)).
- **dbt Core users**: Note that your code editor _may_ be hiding these files from the tree <Term id="view" /> ([VSCode help](https://stackoverflow.com/questions/42891463/how-can-i-show-ignored-files-in-visual-studio-code)).
5. If you are really stuck, try [asking for help](/community/resources/getting-help). Before doing so, take the time to write your question well so that others can diagnose the problem quickly.


@@ -184,7 +184,7 @@ hello: world # this is not allowed

## Compilation Errors

_Note: if you're using the dbt Cloud IDE, this error often shows as a red bar in your command prompt as you work on your dbt project. For dbt CLI users, these won't get picked up until you run `dbt run` or `dbt compile`._
_Note: if you're using the dbt Cloud IDE, this error often shows as a red bar in your command prompt as you work on your dbt project. For dbt Core users, these won't get picked up until you run `dbt run` or `dbt compile`._


### Invalid `ref` function
@@ -228,7 +228,7 @@ To fix this:
- Use the error message to find your mistake

To prevent this:
- _(dbt CLI users only)_ Use snippets to auto-complete pieces of Jinja ([atom-dbt package](https://github.com/dbt-labs/atom-dbt), [vscode-dbt extension](https://marketplace.visualstudio.com/items?itemName=bastienboutonnet.vscode-dbt))
- _(dbt Core users only)_ Use snippets to auto-complete pieces of Jinja ([atom-dbt package](https://github.com/dbt-labs/atom-dbt), [vscode-dbt extension](https://marketplace.visualstudio.com/items?itemName=bastienboutonnet.vscode-dbt))

</details>

@@ -280,7 +280,7 @@ To fix this:
- Find the mistake and fix it

To prevent this:
- (dbt CLI users) Turn on indentation guides in your code editor to help you inspect your files
- (dbt Core users) Turn on indentation guides in your code editor to help you inspect your files
- Use a YAML validator ([example](http://www.yamllint.com/)) to debug any issues

</details>
@@ -341,10 +341,10 @@ Database Error in model customers (models/customers.sql)
90% of the time, there's a mistake in the SQL of your model. To fix this:
1. Open the offending file:
- **dbt Cloud:** Open the model (in this case `models/customers.sql` as per the error message)
- **dbt CLI:** Open the model as above. Also open the compiled SQL (in this case `target/run/jaffle_shop/models/customers.sql` as per the error message) — it can be useful to show these side-by-side in your code editor.
- **dbt Core:** Open the model as above. Also open the compiled SQL (in this case `target/run/jaffle_shop/models/customers.sql` as per the error message) — it can be useful to show these side-by-side in your code editor.
2. Try to re-execute the SQL to isolate the error:
- **dbt Cloud:** Use the `Preview` button from the model file
- **dbt CLI:** Copy and paste the compiled query into a query runner (e.g. the Snowflake UI, or a desktop app like DataGrip / TablePlus) and execute it
- **dbt Core:** Copy and paste the compiled query into a query runner (e.g. the Snowflake UI, or a desktop app like DataGrip / TablePlus) and execute it
3. Fix the mistake.
4. Rerun the failed model.
@@ -356,7 +356,7 @@ In some cases, these errors might occur as a result of queries that dbt runs "be
In these cases, you should check out the logs — these contain _all_ the queries dbt has run.
- **dbt Cloud**: Use the `Details` in the command output to see logs, or check the `logs/dbt.log` file
- **dbt CLI**: Open the `logs/dbt.log` file.
- **dbt Core**: Open the `logs/dbt.log` file.
:::tip Isolating errors in the logs
If you're hitting a strange `Database Error`, it can be a good idea to clean out your logs by opening the file and deleting the contents. Then, re-execute `dbt run` for _just_ the problematic model. The logs will have _just_ the output you're looking for.
@@ -379,6 +379,6 @@ Using the `Preview` button is useful when developing models and you want to visu
We’ve all been there. dbt uses the last-saved version of a file when you execute a command. In most code editors, and in the dbt Cloud IDE, a dot next to a filename indicates that a file has unsaved changes. Make sure you hit `cmd + s` (or equivalent) before running any dbt commands — over time it becomes muscle memory.
### Editing compiled files
_(More likely for dbt CLI users)_
_(More likely for dbt Core users)_
If you just opened a SQL file in the `target/` directory to help debug an issue, it's not uncommon to accidentally edit that file! To avoid this, try changing your code editor settings to grey out any files in the `target/` directory — the visual cue will help avoid the issue.