Add duckdb build/concepts and use SQLGlot to convert BigQuery SQL into other dialects #1689

Merged 47 commits on Feb 20, 2024

@alistairewj (Member) commented Jan 6, 2024

This PR does a few things, mainly to get the concepts working with duckdb. It builds on #1529 by @SphtKr. However, #1529 contains the conversion for MIMIC-III, whereas here I've converted MIMIC-IV. It should be straightforward to adapt for MIMIC-III, but I can't explain the parsing errors I get when running it over that folder, so I've left that for a future PR.

  • Added build scripts for mimic-iii in duckdb
  • Added build scripts for mimic-iv in duckdb
  • Added a mimic_utils package that converts files/folders between SQL dialects.
  • Used this package to re-convert the postgresql scripts, and removed the old bash script system for MIMIC-IV
  • Moved the mapping folder from mimic-iv/concepts/mapping to mimic-iv/mapping

I also overhauled the READMEs a bit for clarity.
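
For readers unfamiliar with SQLGlot, the folder conversion described above can be sketched roughly as follows. This is a minimal illustration, not the actual `mimic_utils` code: the function names `sqlglot_converter` and `convert_folder` are hypothetical, and the sketch omits the custom patches for unsupported BigQuery functions. The only SQLGlot call used is `sqlglot.transpile`, which returns one output string per input statement.

```python
from pathlib import Path
from typing import Callable, List


def sqlglot_converter(source_dialect: str = "bigquery",
                      target_dialect: str = "duckdb") -> Callable[[str], str]:
    """Build a str -> str converter backed by SQLGlot (hypothetical helper)."""
    # Imported lazily so the folder walker below works without the dependency.
    import sqlglot  # third-party: pip install sqlglot

    def convert(sql: str) -> str:
        # transpile() parses every statement in the script and re-renders it
        # in the target dialect, returning one string per statement.
        statements = sqlglot.transpile(
            sql, read=source_dialect, write=target_dialect, pretty=True
        )
        return ";\n".join(statements) + ";\n"

    return convert


def convert_folder(source_dir, target_dir,
                   convert: Callable[[str], str]) -> List[Path]:
    """Convert every .sql file under source_dir, mirroring the folder layout.

    `convert` is any function mapping SQL text to SQL text, which keeps the
    file handling independent of the transpiler being used.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    written = []
    for src in sorted(source_dir.rglob("*.sql")):
        dst = target_dir / src.relative_to(source_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_text(convert(src.read_text()))
        written.append(dst)
    return written
```

Assuming SQLGlot is installed, a call such as `convert_folder("mimic-iv/concepts", "mimic-iv/concepts_duckdb", sqlglot_converter())` would regenerate the duckdb scripts from the BigQuery sources; the paths here are illustrative. Passing the converter in as a function also makes it easy to swap in a patched transpiler later.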

SphtKr and others added 30 commits December 17, 2023 19:46
…es without a large amount of RAM, and this way it can be skipped with `--skip-indexes`
@alistairewj changed the title from "Use SQLGlot to convert BigQuery SQL into other dialects" to "Add duckdb build/concepts and use SQLGlot to convert BigQuery SQL into other dialects" on Jan 6, 2024

@tompollard (Member) left a comment

Thanks, looks good to me, with the caveat that I haven't run the code! (and that breaking this into smaller PRs would be nicer). I added a couple of notes about typos but all very minor, so feel free to go ahead and merge. The sqlglot package looks very cool.

How much manual tweaking of the outputs is needed?

Ideally I think we would maintain a single source format (psql/bigquery) and then create other dialects in a GitHub workflow. My preference would be to have dialects on separate branches, rather than trying to maintain them all on main.

Resolved review threads:

  • mimic-iii/buildmimic/duckdb/README.md (3 threads, outdated)
  • mimic-iii/buildmimic/duckdb/duckdb_add_tables.sql (outdated)
  • mimic-iii/buildmimic/duckdb/duckdb_checks.sql
  • mimic-iii/buildmimic/duckdb/duckdb_add_indexes.sql (outdated)
  • mimic-iv/buildmimic/duckdb/README.md (outdated)
  • mimic-iv/concepts_duckdb/README.md (outdated)

@alistairewj (Member, Author) commented

Thanks for the review! Fixed the typos; regarding some unaddressed points:

  • Consistent formatting - sqlfluff does this now via the action, but it only lints newly changed files.
  • Renaming SQL to remove db name - for mimic-iv I did this, for mimic-iii I left all the scripts with their dialect prefixes (postgres_ ..., duckdb_ ...).
  • We don't need to manually tweak any outputs for MIMIC-IV (though this is because there are Python patches written to parse some unsupported BigQuery functions). For MIMIC-III, I didn't put in the effort to make it all work, but it mostly works I think.

@alistairewj alistairewj merged commit b9ed7a3 into main Feb 20, 2024
1 check passed
@tompollard tompollard deleted the duckdb_concepts branch February 20, 2024 17:31