research Dolt and DoltHub #39

aspiers · 2023-03-26T20:03:49Z

I have discovered Dolt (https://github.com/dolthub/dolt) and DoltHub, which look like very promising technologies. Perhaps they could be used as an alternative way of storing the data contained here. A full-blown SQL database would certainly bring a ton of power and flexibility to this project, and combined with a GitHub-like collaboration model with pull requests, it could be perfect.

aspiers · 2023-03-26T20:05:16Z

See also my "git sourcing" idea: githubocto/flat#64

strk · 2024-07-14T05:58:30Z

I'm not convinced a new format is needed. The biggest value of a project like this is the stability of the format specification, since that is what allows multiple projects to depend on it for both reading and writing. That is, unless I have misunderstood what Dolt is about (it isn't clear to me).

Note that PostgreSQL supports querying CSV files as if they were tables in the database, via the file_fdw Foreign Data Wrapper:
https://www.postgresql.org/docs/current/file-fdw.html
See for example https://gist.github.com/NikolayS/a819f139c37e0d54ad4a4ca70764f225
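
For illustration only, here is a rough sketch of what a file_fdw setup might look like for a CSV file. It is a small Python helper that reads the CSV header and prints the PostgreSQL statements. The column names (`term`, `page`), the table name, the server name `csv_files` and the file path are all made up for the example, every column is declared as `text` for simplicity, and the file would need to be readable by the PostgreSQL server process.

```python
import csv
import io


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def file_fdw_ddl(csv_text: str, table: str, server_path: str) -> str:
    """Build file_fdw DDL exposing a CSV file as a read-only foreign table.

    csv_text is only used to read the header row; server_path is where the
    PostgreSQL server would find the file (hypothetical in this sketch).
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = ",\n".join(f"    {quote_ident(col)} text" for col in header)
    return (
        "CREATE EXTENSION IF NOT EXISTS file_fdw;\n"
        "CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;\n"
        f"CREATE FOREIGN TABLE {quote_ident(table)} (\n{columns}\n)\n"
        "SERVER csv_files\n"
        f"OPTIONS (filename {quote_literal(server_path)}, "
        "format 'csv', header 'true');\n"
    )


# Hypothetical two-column index file.
sample = "term,page\nalpha,12\n"
print(file_fdw_ddl(sample, "book_index", "/srv/data/index.csv"))
```

After running the generated statements, something like `SELECT * FROM book_index WHERE term = 'alpha'` should work against the unchanged CSV file.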

aspiers · 2024-07-15T17:28:15Z

The stability of the schema is largely orthogonal to whether we use CSV or an RDBMS like Dolt. E.g. we could use CSV but create undesirable instability by regularly reordering/renaming columns etc. Or we could use an RDBMS and keep it very stable by never changing the schema.

The main attractions of Dolt come from an RDBMS being much more flexible than CSV, e.g.:

- stricter data types and hence validation (e.g. forcing page number fields to be numbers)
- enforcement of mandatory vs. optional fields
- multiple tables with foreign keys defining relationships between them (which helps with things like "Embed book identification information in index header" #46)
- makes it easier to address "extend schema" #12
- sophisticated query engine for free
- easy export to CSV would allow backwards compatibility with this repo
- bidirectional syncing is feasible too
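
To make a few of these points concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in, not Dolt itself (Dolt is MySQL-compatible, so the DDL would differ in detail), and the `book` / `index_entry` tables and their columns are purely hypothetical, not this repo's actual schema. It shows a typed page column, mandatory fields, a foreign key between tables, and a CSV export.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE book (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL                      -- mandatory field
);
CREATE TABLE index_entry (
    id      INTEGER PRIMARY KEY,
    book_id INTEGER NOT NULL REFERENCES book(id),
    term    TEXT NOT NULL,
    -- page numbers must really be positive integers
    page    INTEGER NOT NULL CHECK (typeof(page) = 'integer' AND page > 0)
);
""")


def try_insert(sql, params):
    """Run one INSERT; report whether the database accepted or rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


ENTRY = "INSERT INTO index_entry (book_id, term, page) VALUES (?, ?, ?)"
results = [
    try_insert("INSERT INTO book (id, title) VALUES (?, ?)", (1, "Example Book")),
    try_insert(ENTRY, (1, "alpha", 12)),        # valid row
    try_insert(ENTRY, (1, "beta", "twelve")),   # page is not a number
    try_insert(ENTRY, (1, None, 3)),            # mandatory field missing
    try_insert(ENTRY, (99, "gamma", 4)),        # no such book
]


def export_csv():
    """Export the joined data back to CSV for backwards compatibility."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["title", "term", "page"])
    writer.writerows(conn.execute(
        "SELECT b.title, e.term, e.page FROM index_entry e "
        "JOIN book b ON b.id = e.book_id ORDER BY e.term"
    ))
    return buf.getvalue()


for outcome in results:
    print(outcome)
print(export_csv())
```

Only the first two inserts should be accepted; the other three are refused by the database itself rather than by any extra tooling.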

Also, DoltHub gives us a very nice frontend for free, one which is specifically designed for decentralized collaboration on data sets, which GitHub is not.

But I admit it would also increase complexity. Another option is to introduce CI that validates the existing data.
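
As a rough idea of what such a CI check could look like, here is a small sketch of a CSV validator. The column names (`term`, `page`) and the rules are assumptions made up for illustration, not the actual format specification of this repo.

```python
import csv
import io

# Hypothetical rules; the real column names would come from the format spec.
REQUIRED_COLUMNS = ("term", "page")   # must be present and non-empty
NUMERIC_COLUMNS = ("page",)           # must contain only ASCII digits


def validate_csv(text):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"header: missing column(s) {', '.join(missing)}"]

    errors = []
    for row in reader:
        line_no = reader.line_num  # physical line where this row ended
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                errors.append(f"line {line_no}: '{col}' is empty")
        for col in NUMERIC_COLUMNS:
            value = (row[col] or "").strip()
            if value and not (value.isascii() and value.isdigit()):
                errors.append(
                    f"line {line_no}: '{col}' is not a number: {value!r}"
                )
    return errors


sample = "term,page\nalpha,12\nbeta,twelve\n,3\n"
for problem in validate_csv(sample):
    print(problem)
```

A CI job could run this over every CSV file in the repo and fail the build whenever the returned list is non-empty.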

aspiers mentioned this issue on Mar 26, 2023: extend schema #12 (open, 6 tasks)

aspiers mentioned this issue on Jul 13, 2024: Embed book identification information in index header #46 (open)