Skip to content

Latest commit

 

History

History
79 lines (56 loc) · 3.07 KB

README.md

File metadata and controls

79 lines (56 loc) · 3.07 KB

bq2dbt

This script is used to generate the explicit SQL file (explicit = don't use select star) from an existing table in BigQuery, as well as the dbt model YAML definition including:

  • contract enforcement
    • with column types
    • with description import
    • with not_null contraints
    • with primary key if any
    • with handling of the ARRAY/STRUCT types

Output is generated in the ./target/bq2dbt/{project}/{dataset} folder from where you run the script.

You can change it with CLI argument --output.

Disclaimer

I know this looks like something the codegen dbt package could do, but it was faster for me to develop this rather than contribute both to the BigQuery adapter in dbt and the codegen package (this does not mean that I won't).

We used this script to convert an existing project with missing contracts and with SELECT * statements to something for robust and implicit, and with existing tables in BigQuery. This is the main purpose of this.

Installation

From source

pip install .

From Github

# Latest version
pip install git+https://github.com/H-Max/bq2dbt.git

# Pinned tag/version
pip install git+https://github.com/H-Max/bq2dbt.git@v0.2.5

How to run it ?

Just run the script with complete path of your BigQuery table or dataset.

When using a dataset only, it will generate SQL and YAML files for all found tables in the dataset.

Examples:

# Single table
bq2dbt myproject.mydataset.mytable
 
# Complete dataset
bq2dbt myproject.mydataset

CLI arguments

Option Description
-l, --lower Output type names as lowercase in YAML file
--snake Convert field names to snake_case
--prefix Prefix to add to columns names (default: None)
--suffix Suffix to add to column names (default: None)
--output Destination folder for scripts. (default: target/bq2dbt)
--empty_description Add empty description property to YAML file if field description is empty (placeholder)
--tabs Use tabs instead of 4 spaces in SQL file indentation
--leading_comma Put comma at the start of line in SQL file column list instead of end of line

TODO

  • Error handling
  • Unit testing
  • Merging with existing yaml definition files
  • Generate the files for a complete dataset rather than a single table
  • Support for clustering
  • Support for time partitioning
  • Support for range paritioning
  • Option to output to stdout
    • With the option to select SQL or YAML file only