Use sqlx's AnyConnection to allow users at runtime to select which db they want to use #239

Closed
vgarleanu opened this issue Oct 20, 2021 · 12 comments
Assignees: vgarleanu
Labels:
db-postgres: Anything to do with postgres
db-sqlite: Anything to do with sqlite
out-of-scope: Feature is out of scope for now.
P-medium: Medium priority item.
T-improvements: Nothing new, just refactoring and touching some things up

Comments

@vgarleanu
Member

By using AnyConnection we would allow users to pick which database backend they want to use at runtime. The user would simply need to edit the database URL in the config file. For this to work we'll have to alter the connection-creation logic a bit and remove the hardcoded paths to the database.
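A minimal sketch of what that could look like, assuming a hypothetical Config struct carrying the URL from the config file (the exact setup depends on the sqlx version; newer releases also require installing the Any drivers first):

use sqlx::any::AnyPoolOptions;
use sqlx::AnyPool;

// Hypothetical config type; the real struct in Dim differs.
struct Config {
    // e.g. "sqlite://./config/dim.db" or "postgres://user@host/dim"
    database_url: String,
}

// AnyPool picks the concrete driver from the URL scheme at runtime,
// so no backend is hardcoded at compile time.
async fn connect(config: &Config) -> Result<AnyPool, sqlx::Error> {
    AnyPoolOptions::new()
        .max_connections(5)
        .connect(&config.database_url)
        .await
}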

@vgarleanu added the db-postgres, db-sqlite, T-improvements and P-medium labels on Oct 20, 2021
@vgarleanu self-assigned this on Oct 20, 2021
@rich-murphey
Contributor

I tried to test using AnyConnection, and it looks like there is an issue with the query!() macros.

Given an .env file:

DATABASE_URL=sqlite:///tmp/any.db

compiling this code:

use sqlx::AnyPool;

// query!() infers the database from DATABASE_URL (sqlite here), so the
// generated code expects a Sqlite executor rather than an Any one.
pub async fn get_username(pool: &AnyPool) -> Result<String, sqlx::Error> {
    let row = sqlx::query!("SELECT username FROM users")
        .fetch_one(pool)
        .await?;
    Ok(row.username)
}
produces the error:

38 |             .fetch_one(pool)
   |              ^^^^^^^^^ expected struct `sqlx::Any`, found struct `sqlx::Sqlite`

sqlx has an open issue that discusses this:
launchbadge/sqlx#964

Sounds like they have a plan, but it hasn't been implemented as far as I can tell.

@rich-murphey
Contributor

It looks like sqlx macros won't support AnyConnection in the next release.

It would be possible to choose the database at compile time instead. I'd be glad to help with the compile-time option if there's interest.

@vgarleanu
Member Author

vgarleanu commented Jan 10, 2022

We should find a way to support multiple databases at runtime. Supporting it only at compile time is too inflexible for our users. I mainly chose sqlx for the async database interface, not really for the macros. In fact, the macro is broken in certain cases, where we are already forced to avoid it altogether.

While not ideal, I wouldn't mind completely omitting the use of the query macros in favour of the query functions. I'm pretty sure that should work, at the cost of losing compile-time type-checking and validation of the SQL queries.
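For illustration, here is roughly what the earlier get_username example could look like with the unchecked query() function, which does work against AnyPool because rows are typed at runtime:

use sqlx::{AnyPool, Row};

// Unchecked variant: the query string is not verified at compile time,
// and column access happens at runtime via try_get().
pub async fn get_username(pool: &AnyPool) -> Result<String, sqlx::Error> {
    let row = sqlx::query("SELECT username FROM users")
        .fetch_one(pool)
        .await?;
    row.try_get("username")
}

The trade-off is exactly the one mentioned above: a typo in the SQL or a wrong column type only surfaces when the query runs.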

@rich-murphey
Contributor

I could help with the non-macro version of the queries as well if needed. One option I've ended up using to test non-macro queries is regression tests on HTTP API requests: diff the output of the HTTP API endpoints to verify the query results match previous results.

@vgarleanu
Member Author

That would be great. Here's how I think we should approach this.

In the config file we default the database to sqlite://./config/dim.db, and at runtime, when we initialise the database crate's statics, we parse the URI scheme and flag accordingly which queries we will use. For good housekeeping we will probably want a backends module with sqlite, postgres and eventually mysql submodules containing our backend-specific queries, which we call from our top-level modules like Media, Library, etc. This approach is very similar to how libstd has interfaces for different operating systems; the only difference is that we compile in all the submodules and decide at runtime which one to use against our DB.
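A rough sketch of that layout (module, type and query names here are illustrative, not actual Dim code):

mod backends {
    pub mod sqlite {
        // sqlite uses ?N placeholders.
        pub const LATEST_MEDIA: &str =
            "SELECT id, name FROM media ORDER BY added DESC LIMIT ?1";
    }
    pub mod postgres {
        // postgres uses $N placeholders, and is free to use
        // postgres-only features where it helps.
        pub const LATEST_MEDIA: &str =
            "SELECT id, name FROM media ORDER BY added DESC LIMIT $1";
    }
}

#[derive(Clone, Copy)]
pub enum Backend {
    Sqlite,
    Postgres,
}

impl Backend {
    // Decided once at startup from the URI scheme in the config file.
    pub fn from_url(url: &str) -> Option<Self> {
        match url.split("://").next()? {
            "sqlite" => Some(Self::Sqlite),
            "postgres" | "postgresql" => Some(Self::Postgres),
            _ => None,
        }
    }

    // Top-level modules (Media, Library, ...) ask the backend for the
    // right query instead of hardcoding one dialect.
    pub fn latest_media(self) -> &'static str {
        match self {
            Self::Sqlite => backends::sqlite::LATEST_MEDIA,
            Self::Postgres => backends::postgres::LATEST_MEDIA,
        }
    }
}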

The hardest part here will be testing this, specifically creating a robust test suite and environment. I think warp comes with some testing utilities which will help us here, so all that's left is generating various testing scenarios with useful initial data, and ensuring tests don't collide or race.

The most time-consuming tasks will definitely be writing the queries (each DBMS has a slightly different dialect) and writing the tests well enough that they can confidently confirm that all the queries exhibit the same behaviour.

@rich-murphey
Contributor

If I'm understanding correctly, the backends module would depend on sqlx and use types specific to each of sqlite, postgres and mysql. The current database module would depend on sqlx using only the Any* types.

For testing, I wonder what would be the most maintainable way to load initial data: using SQL to load it, or writing application code to load it. SQL for migrations is needed regardless, and writing more SQL to load data looks attractive in terms of writing less application code. It would duplicate the data, once for each database; however, testing could ensure equivalence of the data, given test coverage of the queries. Testing the equivalence of the queries sounds like something to focus on initially.

Is there any initial data available? A postgres dump of some limited data set? Even a subset covering just a few queries would be enough for the moment.

@vgarleanu
Member Author

vgarleanu commented Jan 12, 2022

If I'm understanding correctly, the backends module would depend on sqlx and use types specific to each of sqlite, postgres and mysql. The current database module would depend on sqlx using only the Any* types.

We would still depend on sqlx, but in some cases we'll essentially have multiple queries that fetch the same data but are DBMS-specific. Some queries are simple, like plain SELECTs; those won't need to be touched at all, as they're most likely already compatible with postgres. But other queries, like the search one, could be optimized with postgres-specific extensions. As such we'd have two different versions of the search query, one for sqlite and one for postgres.
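For instance (illustrative SQL against a made-up schema, not Dim's actual queries), the sqlite version might fall back to a plain LIKE match while the postgres version uses built-in full-text search:

// sqlite: plain substring match.
const SEARCH_SQLITE: &str =
    "SELECT id, name FROM media WHERE name LIKE '%' || ?1 || '%'";

// postgres: full-text search, which sqlite lacks out of the box.
const SEARCH_POSTGRES: &str =
    "SELECT id, name FROM media \
     WHERE to_tsvector('english', name) @@ plainto_tsquery('english', $1)";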

For testing, I wonder what would be the most maintainable way to load initial data: using SQL to load it, or writing application code to load it. SQL for migrations is needed regardless, and writing more SQL to load data looks attractive in terms of writing less application code. It would duplicate the data, once for each database; however, testing could ensure equivalence of the data, given test coverage of the queries. Testing the equivalence of the queries sounds like something to focus on initially.

I think writing application code is the option least likely to break silently. It's a lot easier to write and debug Rust than SQL, in my opinion. We'd basically have to come up with testing scenarios containing test data, insert that data into a temporary database that lives as long as the test environment does, and then run our queries or APIs against it.
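A minimal sketch of that shape, assuming tokio as the test runtime and an in-memory sqlite database as the throwaway backend (the table and names are made up for the example):

#[cfg(test)]
mod tests {
    use sqlx::any::AnyPoolOptions;
    use sqlx::Row;

    #[tokio::test]
    async fn username_query_sees_seeded_data() -> Result<(), sqlx::Error> {
        // One connection only: each in-memory sqlite connection is its
        // own database, so the pool must not hand out a second one.
        let pool = AnyPoolOptions::new()
            .max_connections(1)
            .connect("sqlite::memory:")
            .await?;

        // Seed the scenario from application code rather than SQL dumps.
        sqlx::query("CREATE TABLE users (username TEXT NOT NULL)")
            .execute(&pool)
            .await?;
        sqlx::query("INSERT INTO users (username) VALUES ('admin')")
            .execute(&pool)
            .await?;

        let row = sqlx::query("SELECT username FROM users")
            .fetch_one(&pool)
            .await?;
        assert_eq!(row.try_get::<String, _>("username")?, "admin");
        Ok(())
    }
}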

Another way we could do it is by writing a full-blown integration test with dummy files. I've done this before when benchmarking the scanner. I could supply a dump of a filesystem containing media files from various sources, and we rebuild that FS dump by linking each media file to a dummy mkv or mp4. In this particular case we don't really care about video metadata, so using the same file should be fine; the only thing that matters is the filenames. We run the scanner and end up with a database that is about as close as it gets to what you'd see in production.

This way we don't have to fiddle around adding data manually, or hardcode lots of data entries in our code that we then insert into our db.

Is there any initial data available? A postgres dump of some limited data set? Even just a subset to get started working on just a few queries would be enough for the moment.

No postgres dump; honestly I'm not even sure if our migrations for postgres still work. I'll take a look at them over the coming weekend, hopefully. That said, I would be happy to provide a dump of my entire media collection, which should contain more than enough info for testing.

@rich-murphey
Contributor

Thanks for describing the mock media files, that sounds far better, especially in terms of tests covering more code. I'm a fan of integration tests, as well as regression tests that diff output data against previous known-good output. Thanks very much for the pointers, this helps tremendously. I need to read the code for a bit before I'm productive, so it may be several days before I get to that.

@cobyge
Contributor

cobyge commented May 15, 2022

I'm thinking of looking into this. Any idea if @rich-murphey is still planning on working on this?

@vgarleanu
Member Author

Doesn't look like it.

That said, I'm thinking of closing this issue, or at least marking it as out-of-scope for now. The main reason is that maintaining multiple queries for different database backends is tedious, and imo it is not worth it for now. I think sqlite is more than good enough for 99% of use-cases. There is still a lot of potential to squeeze out of sqlite3, which would improve the overall performance of the webui as well as the scanners.

If you need a non-sqlite backend for HA (high availability), you can use something like Litestream.

@vgarleanu added the out-of-scope label on May 15, 2022
@rich-murphey
Contributor

Hey guys, sorry I have not been active on this issue. I agree that sqlite is a very good fit and significantly simplifies development on-boarding. And as @vgarleanu says, pursuing sqlite-specific optimizations instead would have performance benefits. Please feel free to proceed without me, and I'm glad to help out at some point in the future. Best regards.

@vgarleanu
Member Author

Closing this issue until we figure out exactly what our bottlenecks with sqlite are and whether we'd see any UX gains from switching to another DB. At the moment I believe there would be no gains from using a different DB backend.
