Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLx can't find applied migration due to collision between schema and username in search_path #3523

Open
GunnarMorrigan opened this issue Sep 25, 2024 · 1 comment
Labels
bug db:postgres Related to PostgreSQL migrations Proposals or bugs involving migrations

Comments

@GunnarMorrigan
Copy link

GunnarMorrigan commented Sep 25, 2024

Bug Description

When a user lets call them roster applies a migration that creates a schema equal to the username, roster, only the first migration will be applied correctly.

The second migration attempt SQLx will not be able to find the original migration table public._sqlx_migrations and attempts to apply the migration again. A new table is created under the schema and username roster but applying the migration actually fails as it has already been applied.

After some troubleshooting with @abonander it turns out that postgreSQL's search_path is "$user", public.
Due to the collision between the username and schema, the schema roster will be chosen to search for the migration table. As the schema did not exist before the first migration, the first migration ran successful.
https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PATH

Minimal Reproduction

A small code snippet or a link to a Github repo or Gist, with instructions on reproducing the bug.

  • Create a new DB with an owner called roster (can be anything but be consistent).
  • Create a migration that creates a schema with the same name as the user
  • Run the migration (success)
  • Run the migration (fail)

Info

  • SQLx version:
sqlx --version
sqlx-cli 0.8.2
  • SQLx features enabled:
sqlx = { version = "0.8", features = [
    "runtime-tokio",
    "tls-rustls",
    "chrono",
    "postgres",
    "derive",
    "macros",
    "migrate",
] }
  • Database server and version: Postgres
  • Operating system: windows and linux
  • rustc --version: rustc 1.80.1 (3f5fd8dd4 2024-08-06)
@abonander
Copy link
Collaborator

abonander commented Sep 25, 2024

only the first migration will be applied correctly.

More specifically, all migrations up through the one that creates the schema will succeed, but at that point the migration machinery will begin to behave incorrectly.

This is because the Postgres driver references the _sqlx_migrations table completely unqualified.

Prior to the creation of the new schema, any unqualified object reference like _sqlx_migrations will point to the public schema (because non-existent schemas in search_path are ignored). However, after the schema is created, it will point to (in this case) the roster schema because it comes before public in search_path.

For some reason, this change doesn't take effect immediately, so the first run of migrations will work correctly, but subsequent ones will fail.

I'm not actually sure why this works at all the first time around; I think it may be because the insert to _sqlx_migrations is executed as a cached prepared statement, which means the reference is resolved at parse time and doesn't change until the statement is re-prepared (which would only happen after the cache is explicitly cleared, or on a new connection). I think I would expect this change to invalidate the prepared statement or cause it to be transparently re-prepared, but for whatever reason it is not.

It's also possible that the change doesn't take effect until after the transaction is committed, which we wouldn't see when running just one migration.

Regardless, the next time migrations are run with a new connection, the change in resolution applies.

The migration machinery checks if _sqlx_migrations exists, and creates it if not, so we then end up with two migrations tables with different contents: public._sqlx_migrations, containing the knowledge of the run migrations, and roster._sqlx_migrations, which is empty.

Since the migrations machinery sees an empty table (roster._sqlx_migrations), it assumes no migrations have been run, and attempts to run them all over again.

Since the DDL in OP's case uses CREATE ... without IF NOT EXISTS, it fails immediately.

Failing fast is probably the best case scenario here, as otherwise the migration machinery could end up making duplicate, potentially destructive changes.


Potential Solutions

I'm wary of changing the default behavior, because I don't want to screw over anyone who might have run into this and just worked around it.

If we just changed the queries to reference public._sqlx_migrations, but then someone had intentionally changed their default schema, we'd end up with the exact same problem, but reversed: the _sqlx_migrations in their custom schema contains the correct information, but we'd see public._sqlx_migrations is missing or empty and try to re-run migrations.

Being able to explicitly rename or relocate _sqlx_migrations is a goal of #3383. That would certainly be a solution for this.

However, I think we can, and should, detect when this might be happening and bail out before doing the wrong thing.

Amongst a bunch of other very interesting functions (including some that might help improve the null inference in the macros, like pg_get_expr), Postgres has one that's potentially useful to us here: https://www.postgresql.org/docs/current/functions-info.html

current_schemas ( include_implicit boolean ) → name[]

Returns an array of the names of all schemas presently in the effective search path, in their priority order. (Items in the current search_path setting that do not correspond to existing, searchable schemas are omitted.) If the Boolean argument is true, then implicitly-searched system schemas such as pg_catalog are included in the result.

This is essentially the same as SHOW search_path, but it filters out invalid schemas just like object resolution does.

We can query this in Migrate::ensure_migrations_table():

fn ensure_migrations_table(&mut self) -> BoxFuture<'_, Result<(), MigrateError>> {

We'd then check if _sqlx_migrations exists in each schema returned:

  • If it exists in the first schema returned, (public or otherwise), then do nothing. Resolution of the name should work fine as-is.
  • If it exists in none of the schemas returned, create the table with the unqualified name. Resolution of the name should work fine as-is, and if it changes later, we can catch it.
  • If it does not exist in the first schema, but does in one of the other ones, bail with an error:
Cannot migrate: `_sqlx_migrations` does not exist in the default schema `<schema name>`, 
but does exist at `<path 1>` [, `<path 2>`, ...].

This suggests that the `search_path` for the current user or database has changed since migrations were created.
This may happen explicitly, or implicitly if a schema is created with the same name as the user (<https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PATH>).

Since SQLx cannot know if any of the existing `_sqlx_migrations` tables is correct for the current context, 
migration cannot continue.

To resolve this ambiguity, either change the default `search_path` for this database (using `ALTER DATABASE`), 
or create `<schema name>._sqlx_migrations` with the correct set of migrations.

See issue <https://github.com/launchbadge/sqlx/issues/3523> for details.

Bikeshedding: should we bail if a table named _sqlx_migrations exists in any schema we have access to, regardless of whether it exists in the search path? This would catch if a schema is removed from search_path, but is perhaps a bit of an overreach.

We could also be smarter in generating the error if we actually check if the contents of any of the _sqlx_migrations tables matches the resolved migrations, but I think it's unlikely that there'd be more than one candidate unless someone was going really crazy with schemas and multiple apps.

After #3383, if migrate.table-name is explicitly specified then this check doesn't need to happen, as the user has taken responsibility here.

@abonander abonander added db:postgres Related to PostgreSQL migrations Proposals or bugs involving migrations labels Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug db:postgres Related to PostgreSQL migrations Proposals or bugs involving migrations
Projects
None yet
Development

No branches or pull requests

2 participants