Add support for asynchronous creation/removal of indexes

fatkodima · Jan 18, 2024 · ac05d84
1 parent 006ca82

Showing 40 changed files with 1,457 additions and 131 deletions.
diff --git a/.rubocop.yml b/.rubocop.yml
@@ -118,6 +118,7 @@ Naming/FileName:
   Exclude:
     - lib/online_migrations/version.rb
     - test/support/schema.rb
+    - test/support/models.rb
     - test/support/db/**
     - test/test_helper.rb
     - gemfiles/**.gemfile

diff --git a/.yardopts b/.yardopts
@@ -2,4 +2,4 @@
 --fail-on-warning
 --markup markdown
 lib/**/**.rb
-- README.md docs/background_migrations.md docs/configuring.md
+- README.md docs/*
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,9 @@
 ## master (unreleased)
 
+- Add support for asynchronous creation/removal of indexes
+
+    See `docs/background_schema_migrations.md` for the feature description.
+
 - Remove potentially heavy queries used to get the ranges of a background migration
 
 ## 0.12.0 (2024-01-18)

diff --git a/README.md b/README.md
@@ -40,11 +40,11 @@ $ bin/rails generate online_migrations:install
 $ bin/rails db:migrate
 ```
 
-**Note**: If you do not have plans on using [background migrations](docs/background_migrations.md) feature, then you can delete the generated migration and regenerate it later, if needed.
+**Note**: If you do not have plans on using [background data migrations](docs/background_data_migrations.md) or [background schema migrations](docs/background_schema_migrations.md) features, then you can delete the generated migration and regenerate it later, if needed.
 
 ### Upgrading
 
-If you're already using [background migrations](docs/background_migrations.md), your background migrations tables may require additional columns. After every upgrade run:
+If you're already using [background data migrations](docs/background_data_migrations.md) or [background schema migrations](docs/background_schema_migrations.md), your background migrations tables may require additional columns. After every upgrade run:
 
 ```sh
 $ bin/rails generate online_migrations:upgrade
@@ -285,7 +285,7 @@ end
 ```
 
 **Note**: If you forget `disable_ddl_transaction!`, the migration will fail.
-**Note**: You may consider [background migrations](#background-migrations) to run data changes on large tables.
+**Note**: You may consider [background data migrations](#background-data-migrations) or [background schema migrations](#background-schema-migrations) to run data changes on large tables.
 
 ### Changing the type of a column
 
@@ -1228,9 +1228,13 @@ Certain methods like `execute` and `change_table` cannot be inspected and are pr
 
 Read [configuring.md](docs/configuring.md).
 
-## Background Migrations
+## Background Data Migrations
 
-Read [background_migrations.md](docs/background_migrations.md) on how to perform data migrations on large tables.
+Read [background_data_migrations.md](docs/background_data_migrations.md) on how to perform data migrations on large tables.
+
+## Background Schema Migrations
+
+Read [background_schema_migrations.md](docs/background_schema_migrations.md) on how to perform background schema migrations on large tables.
 
 ## Credits
 
@@ -1295,7 +1299,7 @@ The main differences are:
      * adding different types of constraints
      * and others
 
-2. This gem has a [powerful internal framework](https://github.com/fatkodima/online_migrations/blob/master/docs/background_migrations.md) for running data migrations on very large tables using background migrations.
+2. This gem has a powerful internal framework for running [data migrations](docs/background_data_migrations.md) and [schema migrations](docs/background_schema_migrations.md) on very large tables in the background.
 
    For example, you can use background migrations to migrate data that’s stored in a single JSON column to a separate table instead; backfill values from one column to another (as one of the steps when changing column type); or backfill some column’s value from an API.
 

diff --git a/docs/background_migrations.md b/docs/background_data_migrations.md
@@ -1,4 +1,4 @@
-# Background Migrations
+# Background Data Migrations
 
 When a project grows, your database starts to be heavy and changing the data through the deployment process can be very painful.
 
@@ -246,20 +246,6 @@ migration.update!(
 )
 ```
 
-### Throttling
-
-Background Migrations often modify a lot of data and can be taxing on your database. There is a throttling mechanism that can be used to throttle a background migration when a given condition is met. If a migration is throttled, it will be interrupted and retried on the next Scheduler cycle run.
-
-Specify the throttle condition as a block:
-
-```ruby
-# config/initializers/online_migrations.rb
-
-OnlineMigrations.config.background_migrations.throttler = -> { DatabaseStatus.unhealthy? }
-```
-
-Note that it's up to you to define a throttling condition that makes sense for your app. For example, you can check various PostgreSQL metrics such as replication lag, DB threads, whether DB writes are available, etc.
-
 ### Customizing the error handler
 
 Exceptions raised while a Background Migration is performing are rescued and information about the error is persisted in the database.
@@ -294,21 +280,6 @@ OnlineMigrations.config.background_migrations.migrations_module = "BackgroundMig
 
 If no value is specified, it will default to `OnlineMigrations::BackgroundMigrations`.
 
-### Customizing the backtrace cleaner
-
-`OnlineMigrations.config.background_migrations.backtrace_cleaner` can be configured to specify a backtrace cleaner to use when a Background Migration errors and the backtrace is cleaned and persisted. An `ActiveSupport::BacktraceCleaner` should be used.
-
-```ruby
-# config/initializers/online_migrations.rb
-
-cleaner = ActiveSupport::BacktraceCleaner.new
-cleaner.add_silencer { |line| line =~ /ignore_this_dir/ }
-
-OnlineMigrations.config.background_migrations.backtrace_cleaner = cleaner
-```
-
-If none is specified, the default `Rails.backtrace_cleaner` will be used to clean backtraces.
-
 ### Multiple databases and sharding
 
 If you have multiple databases or sharding, you may need to configure where background migrations related tables live
@@ -318,10 +289,10 @@ by configuring the parent model:
 # config/initializers/online_migrations.rb
 
 # Referring to one of the databases
-OnlineMigrations::BackgroundMigrations::ApplicationRecord.connects_to database: { writing: :animals }
+OnlineMigrations::ApplicationRecord.connects_to database: { writing: :animals }
 
 # Referring to one of the shards (via `:database` option)
-OnlineMigrations::BackgroundMigrations::ApplicationRecord.connects_to database: { writing: :shard_one }
+OnlineMigrations::ApplicationRecord.connects_to database: { writing: :shard_one }
 ```
 
 By default, ActiveRecord uses the database config named `:primary` (if exists) under the environment section from the `database.yml`.

diff --git a/docs/background_schema_migrations.md b/docs/background_schema_migrations.md
@@ -0,0 +1,164 @@
+# Background Schema Migrations
+
+When a project grows, your database starts to be heavy and performing schema changes through the deployment process can be very painful.
+
+E.g., for very large tables, index creation can be a challenge to manage. While adding indexes `CONCURRENTLY` creates indexes in a way that does not block ordinary traffic, it can still be problematic when index creation runs for many hours. Necessary database operations like autovacuum cannot run, and the deployment process is usually blocked waiting for index creation to finish.
+
+**Note**: You probably don't need to use this feature for smaller projects, since performing schema changes directly on smaller databases will be perfectly fine and will not block the deployment too much.
+
+## Installation
+
+Make sure you have the migration files that were generated when you installed this gem:
+
+```sh
+$ bin/rails generate online_migrations:install
+```
+
+Start a background migrations scheduler. For example, to run it on cron using the [whenever gem](https://github.com/javan/whenever), add the following lines to its `schedule.rb` file:
+
+```ruby
+every 1.minute do
+  runner "OnlineMigrations.run_background_schema_migrations"
+end
+```
+
+or run it manually when the deployment is finished, from the rails console:
+
+```rb
+[production] (main)> OnlineMigrations.run_background_schema_migrations
+```
+
+**Note**: The scheduler performs only one migration at a time, to avoid overloading the database. If you enqueued multiple migrations or a migration for multiple shards, you need to call this method several times.
+
+**Note**: Make sure that the process that runs the scheduler does not die until the migration is finished.
+
+## Enqueueing a Background Schema Migration
+
+Currently, only helpers for adding/removing indexes are provided.
+
+Background schema migrations should be performed in 2 steps:
+
+1. Create a PR that schedules the index to be created/removed.
+2. Verify that the PR was deployed and that the index was actually created/removed on production.
+   Create a follow-up PR with a regular migration that creates/removes the index synchronously (it will be a no-op when run on production) and commit the schema changes for `schema.rb`/`structure.sql`.
+
+To schedule an index creation:
+
+```ruby
+# db/migrate/xxxxxxxxxxxxxx_add_index_to_users_email_in_background.rb
+def up
+  add_index_in_background(:users, :email, unique: true)
+end
+```
+
+To schedule an index removal:
+
+```ruby
+# db/migrate/xxxxxxxxxxxxxx_remove_index_from_users_email_in_background.rb
+def up
+  remove_index_in_background(:users, name: "index_users_on_email")
+end
+```
+
+`add_index_in_background`/`remove_index_in_background` accept additional configuration options that control how the background schema migration is run. Check the [source code](https://github.com/fatkodima/online_migrations/blob/master/lib/online_migrations/background_schema_migrations/migration_helpers.rb) for the list of all available configuration options.
+
+## Instrumentation
+
+Background schema migrations use the [ActiveSupport::Notifications](http://api.rubyonrails.org/classes/ActiveSupport/Notifications.html) API.
+
+You can subscribe to `background_schema_migrations` events and log them, graph them, etc.
+
+To get notified about a specific type of event, subscribe to the event name followed by the `background_schema_migrations` namespace. E.g. for retries use:
+
+```ruby
+# config/initializers/online_migrations.rb
+ActiveSupport::Notifications.subscribe("retried.background_schema_migrations") do |name, start, finish, id, payload|
+  # background schema migration object is available in payload[:background_schema_migration]
+
+  # Your code here
+end
+```
+
+If you want to subscribe to every `background_schema_migrations` event, use:
+
+```ruby
+# config/initializers/online_migrations.rb
+ActiveSupport::Notifications.subscribe(/background_schema_migrations/) do |name, start, finish, id, payload|
+  # background schema migration object is available in payload[:background_schema_migration]
+
+  # Your code here
+end
+```
+
+Available events:
+
+* `started.background_schema_migrations`
+* `run.background_schema_migrations`
+* `completed.background_schema_migrations`
+* `retried.background_schema_migrations`
+* `throttled.background_schema_migrations`
+
+## Monitoring Background Schema Migrations
+
+A Background Schema Migration can be in one of the following states during its execution:
+
+* **enqueued**: A migration has been enqueued by the user.
+* **running**: A migration is being performed by a migration executor.
+* **failing**: A migration raised an exception during the last run (or last retry) and will be retried.
+* **failed**: A migration raised an exception when running and won't be retried anymore.
+* **succeeded**: A migration finished without error.
+
+## Configuring
+
+There are a few configurable options for the Background Schema Migrations. Custom configurations should be placed in an `online_migrations.rb` initializer.
+
+Check the [source code](https://github.com/fatkodima/online_migrations/blob/master/lib/online_migrations/background_schema_migrations/config.rb) for the list of all available configuration options.
+
+**Note**: You can dynamically change certain migration parameters while the migration is running.
+For example:
+```ruby
+migration = OnlineMigrations::BackgroundSchemaMigrations::Migration.find(id)
+migration.update!(
+  statement_timeout: 2.hours,  # The statement timeout value used when running the migration
+  max_attempts: 10             # The number of times the failing migration will be retried
+)
+```
+
+### Customizing the error handler
+
+Exceptions raised while a Background Schema Migration is running are rescued and information about the error is persisted in the database.
+
+If you want to integrate with an exception monitoring service (e.g. Bugsnag), you can define an error handler:
+
+```ruby
+# config/initializers/online_migrations.rb
+
+OnlineMigrations.config.background_schema_migrations.error_handler = ->(error, errored_migration) do
+  Bugsnag.notify(error) do |notification|
+    notification.add_metadata(:background_schema_migration, { name: errored_migration.name })
+  end
+end
+```
+
+The error handler should be a lambda that accepts 2 arguments:
+
+* `error`: The exception that was raised.
+* `errored_migration`: An `OnlineMigrations::BackgroundSchemaMigrations::Migration` object that represents a failed migration.
+
+### Multiple databases and sharding
+
+If you have multiple databases or sharding, you may need to configure where background migrations related tables live
+by configuring the parent model:
+
+```ruby
+# config/initializers/online_migrations.rb
+
+# Referring to one of the databases
+OnlineMigrations::ApplicationRecord.connects_to database: { writing: :animals }
+
+# Referring to one of the shards (via `:database` option)
+OnlineMigrations::ApplicationRecord.connects_to database: { writing: :shard_one }
+```
+
+By default, ActiveRecord uses the database config named `:primary` (if it exists) under the environment section from the `database.yml`.
+Otherwise, the first config under the environment section is used.
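
Review note (not part of the commit): the statuses listed under "Monitoring Background Schema Migrations" in the hunk above can be read as a small state machine. The sketch below is a plain-Ruby illustration of that reading only. `SketchMigration` is a hypothetical class, not the gem's `OnlineMigrations::BackgroundSchemaMigrations::Migration`, and it assumes that retries stop once `attempts` reaches `max_attempts`, which the docs do not spell out.

```ruby
# Hypothetical model of the documented statuses; NOT the gem's implementation.
class SketchMigration
  attr_reader :status, :attempts

  def initialize(max_attempts:)
    @max_attempts = max_attempts
    @attempts = 0
    @status = :enqueued
  end

  # Runs the given block once and records the outcome as a status.
  def run
    # Terminal states are never run again.
    return @status if [:succeeded, :failed].include?(@status)

    @status = :running
    @attempts += 1
    yield
    @status = :succeeded
  rescue StandardError
    # Assumption: the migration is retried until attempts reach max_attempts.
    @status = @attempts >= @max_attempts ? :failed : :failing
  end
end

migration = SketchMigration.new(max_attempts: 2)
migration.run { raise "index creation failed" } # first failure, will be retried
migration.run { raise "index creation failed" } # second failure, gives up
puts migration.status
```

Under these assumptions, a migration that keeps raising moves from `enqueued` through `running` and `failing` to `failed`, while a single clean run ends in `succeeded`.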
diff --git a/docs/configuring.md b/docs/configuring.md
@@ -214,6 +214,35 @@ Add to an initializer file:
 config.auto_analyze = true
 ```
 
+## Throttling
+
+Background data and schema migrations can be taxing on your database. There is a throttling mechanism that can be used to throttle a background migration when a given condition is met. If a migration is throttled, it will be interrupted and retried on the next Scheduler cycle run.
+
+Specify the throttle condition as a callable (for example, a lambda):
+
+```ruby
+# config/initializers/online_migrations.rb
+
+OnlineMigrations.config.throttler = -> { DatabaseStatus.unhealthy? }
+```
+
+**Note**: It's up to you to define a throttling condition that makes sense for your app. For example, you can check various PostgreSQL metrics such as replication lag, DB threads, whether DB writes are available, etc.
+
+## Customizing the backtrace cleaner
+
+`OnlineMigrations.config.backtrace_cleaner` specifies the backtrace cleaner used to clean the backtrace before it is persisted when a background data or schema migration raises an error. An `ActiveSupport::BacktraceCleaner` should be used.
+
+```ruby
+# config/initializers/online_migrations.rb
+
+cleaner = ActiveSupport::BacktraceCleaner.new
+cleaner.add_silencer { |line| line =~ /ignore_this_dir/ }
+
+OnlineMigrations.config.backtrace_cleaner = cleaner
+```
+
+If none is specified, the default `Rails.backtrace_cleaner` will be used to clean backtraces.
+
 ## Schema Sanity
 
 Columns can flip order in `db/schema.rb` when you have multiple developers. One way to prevent this is to [alphabetize them](https://www.pgrs.net/2008/03/12/alphabetize-schema-rb-columns/).
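
Review note (not part of the commit): the "Throttling" section added to `docs/configuring.md` above leaves `DatabaseStatus` undefined, since the condition is app-specific. A minimal plain-Ruby sketch of one possible shape follows. Everything in it is hypothetical: the class, the 10-second threshold, and the injected `lag_reader` callable, which stands in for whatever query an app would use to measure replication lag. Unlike the class-level `DatabaseStatus.unhealthy?` call in the docs, this version uses an instance so the lag source can be swapped out.

```ruby
# Hypothetical health check; none of these names come from the gem.
class DatabaseStatus
  # Assumed threshold, chosen only for illustration.
  MAX_REPLICATION_LAG_SECONDS = 10

  # lag_reader: any callable returning the current replication lag in seconds.
  def initialize(lag_reader)
    @lag_reader = lag_reader
  end

  # True when the measured lag exceeds the threshold.
  def unhealthy?
    @lag_reader.call > MAX_REPLICATION_LAG_SECONDS
  end
end

# A stubbed lag source; a real app would query its database here.
status = DatabaseStatus.new(-> { 3 })

# The throttler is a zero-argument callable, as in the documented example.
throttler = -> { status.unhealthy? }
puts throttler.call
```

In an app, the resulting lambda would be assigned to `OnlineMigrations.config.throttler`, as shown in the documented initializer.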

diff --git a/lib/generators/online_migrations/templates/create_background_schema_migrations.rb.tt b/lib/generators/online_migrations/templates/create_background_schema_migrations.rb.tt
@@ -0,0 +1,29 @@
+class CreateBackgroundSchemaMigrations < <%= migration_parent %>
+  def change
+    # You can remove this migration for now and regenerate it later if you do not have plans
+    # to use background schema migrations, like adding indexes in the background.
+    create_table :background_schema_migrations do |t|
+      t.bigint :parent_id
+      t.string :migration_name, null: false
+      t.string :table_name, null: false
+      t.string :definition, null: false
+      t.string :status, default: "enqueued", null: false
+      t.string :shard
+      t.boolean :composite, default: false, null: false
+      t.integer :statement_timeout
+      t.datetime :started_at
+      t.datetime :finished_at
+      t.integer :max_attempts, null: false
+      t.integer :attempts, default: 0, null: false
+      t.string :error_class
+      t.string :error_message
+      t.string :backtrace, array: true
+      t.string :connection_class_name
+      t.timestamps
+
+      t.foreign_key :background_schema_migrations, column: :parent_id, on_delete: :cascade
+
+      t.index [:migration_name, :shard], unique: true, name: :index_background_schema_migrations_on_unique_configuration
+    end
+  end
+end