Add support for asynchronous creation/removal of indexes
fatkodima committed Jan 18, 2024
1 parent 006ca82 commit ac05d84
Showing 40 changed files with 1,457 additions and 131 deletions.
1 change: 1 addition & 0 deletions .rubocop.yml
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,7 @@ Naming/FileName:
Exclude:
- lib/online_migrations/version.rb
- test/support/schema.rb
- test/support/models.rb
- test/support/db/**
- test/test_helper.rb
- gemfiles/**.gemfile
2 changes: 1 addition & 1 deletion .yardopts
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,4 @@
--fail-on-warning
--markup markdown
lib/**/**.rb
- README.md docs/background_migrations.md docs/configuring.md
- README.md docs/*
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,9 @@
## master (unreleased)

- Add support for asynchronous creation/removal of indexes

See `docs/background_schema_migrations.md` for the feature description.

- Remove potentially heavy queries used to get the ranges of a background migration

## 0.12.0 (2024-01-18)
16 changes: 10 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -40,11 +40,11 @@ $ bin/rails generate online_migrations:install
$ bin/rails db:migrate
```

**Note**: If you do not have plans on using [background migrations](docs/background_migrations.md) feature, then you can delete the generated migration and regenerate it later, if needed.
**Note**: If you do not plan to use the [background data migrations](docs/background_data_migrations.md) or [background schema migrations](docs/background_schema_migrations.md) features, you can delete the generated migration and regenerate it later, if needed.

### Upgrading

If you're already using [background migrations](docs/background_migrations.md), your background migrations tables may require additional columns. After every upgrade run:
If you're already using [background data migrations](docs/background_data_migrations.md) or [background schema migrations](docs/background_schema_migrations.md), your background migrations tables may require additional columns. After every upgrade run:

```sh
$ bin/rails generate online_migrations:upgrade
Expand Down Expand Up @@ -285,7 +285,7 @@ end
```

**Note**: If you forget `disable_ddl_transaction!`, the migration will fail.
**Note**: You may consider [background migrations](#background-migrations) to run data changes on large tables.
**Note**: You may consider [background data migrations](#background-data-migrations) or [background schema migrations](#background-schema-migrations) to run data changes on large tables.

### Changing the type of a column

Expand Down Expand Up @@ -1228,9 +1228,13 @@ Certain methods like `execute` and `change_table` cannot be inspected and are pr
Read [configuring.md](docs/configuring.md).
## Background Migrations
## Background Data Migrations
Read [background_migrations.md](docs/background_migrations.md) on how to perform data migrations on large tables.
Read [background_data_migrations.md](docs/background_data_migrations.md) on how to perform data migrations on large tables.
## Background Schema Migrations
Read [background_schema_migrations.md](docs/background_schema_migrations.md) on how to perform background schema migrations on large tables.
## Credits
Expand Down Expand Up @@ -1295,7 +1299,7 @@ The main differences are:
* adding different types of constraints
* and others

2. This gem has a [powerful internal framework](https://github.com/fatkodima/online_migrations/blob/master/docs/background_migrations.md) for running data migrations on very large tables using background migrations.
2. This gem has a powerful internal framework for running [data migrations](docs/background_data_migrations.md) and [schema migrations](docs/background_schema_migrations.md) on very large tables in the background.

For example, you can use background migrations to migrate data that’s stored in a single JSON column to a separate table instead; backfill values from one column to another (as one of the steps when changing column type); or backfill some column’s value from an API.

Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Background Migrations
# Background Data Migrations

When a project grows, your database starts to be heavy and changing the data through the deployment process can be very painful.

Expand Down Expand Up @@ -246,20 +246,6 @@ migration.update!(
)
```

### Throttling

Background Migrations often modify a lot of data and can be taxing on your database. There is a throttling mechanism that can be used to throttle a background migration when a given condition is met. If a migration is throttled, it will be interrupted and retried on the next Scheduler cycle run.

Specify the throttle condition as a block:

```ruby
# config/initializers/online_migrations.rb
OnlineMigrations.config.background_migrations.throttler = -> { DatabaseStatus.unhealthy? }
```

Note that it's up to you to define a throttling condition that makes sense for your app. For example, you can check various PostgreSQL metrics such as replication lag, DB threads, whether DB writes are available, etc.
### Customizing the error handler

Exceptions raised while a Background Migration is performing are rescued and information about the error is persisted in the database.
Expand Down Expand Up @@ -294,21 +280,6 @@ OnlineMigrations.config.background_migrations.migrations_module = "BackgroundMig

If no value is specified, it will default to `OnlineMigrations::BackgroundMigrations`.

### Customizing the backtrace cleaner
`OnlineMigrations.config.background_migrations.backtrace_cleaner` can be configured to specify a backtrace cleaner to use when a Background Migration errors and the backtrace is cleaned and persisted. An `ActiveSupport::BacktraceCleaner` should be used.
```ruby
# config/initializers/online_migrations.rb
cleaner = ActiveSupport::BacktraceCleaner.new
cleaner.add_silencer { |line| line =~ /ignore_this_dir/ }
OnlineMigrations.config.background_migrations.backtrace_cleaner = cleaner
```
If none is specified, the default `Rails.backtrace_cleaner` will be used to clean backtraces.
### Multiple databases and sharding

If you have multiple databases or sharding, you may need to configure where background migrations related tables live
Expand All @@ -318,10 +289,10 @@ by configuring the parent model:
# config/initializers/online_migrations.rb
# Referring to one of the databases
OnlineMigrations::BackgroundMigrations::ApplicationRecord.connects_to database: { writing: :animals }
OnlineMigrations::ApplicationRecord.connects_to database: { writing: :animals }
# Referring to one of the shards (via `:database` option)
OnlineMigrations::BackgroundMigrations::ApplicationRecord.connects_to database: { writing: :shard_one }
OnlineMigrations::ApplicationRecord.connects_to database: { writing: :shard_one }
```

By default, ActiveRecord uses the database config named `:primary` (if exists) under the environment section from the `database.yml`.
164 changes: 164 additions & 0 deletions docs/background_schema_migrations.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,164 @@
# Background Schema Migrations

As a project grows, its database grows with it, and performing schema changes as part of the deployment process can become very painful.

E.g., for very large tables, index creation can be a challenge to manage. While adding indexes `CONCURRENTLY` creates indexes in a way that does not block ordinary traffic, it can still be problematic when index creation runs for many hours. Necessary database operations like autovacuum cannot run, and the deployment process is usually blocked waiting for index creation to finish.

**Note**: You probably don't need to use this feature for smaller projects, since performing schema changes directly on smaller databases will be perfectly fine and will not block the deployment too much.

## Installation

Make sure you generated the migration files when you installed this gem:

```sh
$ bin/rails generate online_migrations:install
```

Start a background migrations scheduler. For example, to run it on cron using the [whenever](https://github.com/javan/whenever) gem, add the following lines to its `schedule.rb` file:

```ruby
every 1.minute do
  runner "OnlineMigrations.run_background_schema_migrations"
end
```

or run it manually when the deployment is finished, from the rails console:

```rb
[production] (main)> OnlineMigrations.run_background_schema_migrations
```

**Note**: The scheduler performs only one migration at a time, to avoid overloading the database. If you enqueued multiple migrations, or a migration for multiple shards, you need to call this method several times.

**Note**: Make sure that the process that runs the scheduler does not die until the migration is finished.
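
Because each scheduler call performs a single migration, a manual run after a deployment may need to repeat the call. The following is a minimal sketch of such a loop, not part of the gem: `drain_schema_migrations` is a hypothetical helper, and the commented usage assumes that the migration model can be queried by its `status` column.

```ruby
# Hypothetical helper (not provided by the gem): keeps invoking the scheduler
# while work is pending, with a safety cap on the number of runs.
def drain_schema_migrations(run_once:, pending:, max_runs: 100, pause: 0)
  runs = 0
  while runs < max_runs && pending.call
    run_once.call
    runs += 1
    sleep(pause) if pause.positive?
  end
  runs
end

# Possible usage from a Rails console (assumption: querying by `status` works
# as with any ActiveRecord column):
#
#   drain_schema_migrations(
#     run_once: -> { OnlineMigrations.run_background_schema_migrations },
#     pending: -> {
#       OnlineMigrations::BackgroundSchemaMigrations::Migration
#         .where(status: %w[enqueued failing]).exists?
#     },
#     pause: 5
#   )
```

The `max_runs` cap is there so that a migration that keeps failing cannot turn the loop into an endless one.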

## Enqueueing a Background Schema Migration

Currently, only helpers for adding/removing indexes are provided.

Background schema migrations should be performed in 2 steps:

1. Create a PR that schedules the index to be created/removed.
2. Verify that the PR was deployed and that the index was actually created/removed on production.
   Then create a follow-up PR with a regular migration that creates/removes the index synchronously (it will be a no-op when run on production) and commit the schema changes to `schema.rb`/`structure.sql`.

To schedule an index creation:

```ruby
# db/migrate/xxxxxxxxxxxxxx_add_index_to_users_email_in_background.rb
def up
  add_index_in_background(:users, :email, unique: true)
end
```

To schedule an index removal:

```ruby
# db/migrate/xxxxxxxxxxxxxx_remove_index_from_users_email_in_background.rb
def up
  remove_index_in_background(:users, name: "index_users_on_email")
end
```

`add_index_in_background`/`remove_index_in_background` accept additional configuration options that control how the background schema migration is run. Check the [source code](https://github.com/fatkodima/online_migrations/blob/master/lib/online_migrations/background_schema_migrations/migration_helpers.rb) for the list of all available configuration options.

## Instrumentation

Background schema migrations use the [ActiveSupport::Notifications](http://api.rubyonrails.org/classes/ActiveSupport/Notifications.html) API.

You can subscribe to `background_schema_migrations` events and log them, graph them, etc.

To get notified about a specific type of event, subscribe to the event name followed by the `background_schema_migrations` namespace. E.g., for retries use:

```ruby
# config/initializers/online_migrations.rb
ActiveSupport::Notifications.subscribe("retried.background_schema_migrations") do |name, start, finish, id, payload|
  # background schema migration object is available in payload[:background_schema_migration]

  # Your code here
end
```

If you want to subscribe to every `background_schema_migrations` event, use:

```ruby
# config/initializers/online_migrations.rb
ActiveSupport::Notifications.subscribe(/background_schema_migrations/) do |name, start, finish, id, payload|
  # background schema migration object is available in payload[:background_schema_migration]

  # Your code here
end
```

Available events:

* `started.background_schema_migrations`
* `run.background_schema_migrations`
* `completed.background_schema_migrations`
* `retried.background_schema_migrations`
* `throttled.background_schema_migrations`
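
As an illustration of what a subscriber body might do with these events, here is a minimal sketch of a handler that logs the event name, the migration name, and the elapsed time. `SchemaMigrationEventLogger` is a hypothetical class, not part of the gem; it assumes the payload key shown above and that the migration object responds to `name`.

```ruby
require "logger"

# Hypothetical handler (not part of the gem) that turns a notification
# into a single log line.
class SchemaMigrationEventLogger
  def initialize(logger)
    @logger = logger
  end

  # Takes the same five arguments that the subscriber blocks above receive.
  def call(name, start, finish, _id, payload)
    migration = payload[:background_schema_migration]
    # Assumption: the migration object responds to `name`.
    migration_name = migration.respond_to?(:name) ? migration.name : "unknown"
    duration_ms = ((finish - start) * 1000).round(1)
    @logger.info("#{name} migration=#{migration_name} duration_ms=#{duration_ms}")
  end
end

# Possible wiring in config/initializers/online_migrations.rb:
#
#   handler = SchemaMigrationEventLogger.new(Rails.logger)
#   ActiveSupport::Notifications.subscribe(/background_schema_migrations/) do |name, start, finish, id, payload|
#     handler.call(name, start, finish, id, payload)
#   end
```

Keeping the handler in its own object makes it easy to exercise without Rails, since it only needs a logger and plain arguments.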

## Monitoring Background Schema Migrations

A Background Schema Migration can be in one of the following states during its execution:

* **enqueued**: A migration has been enqueued by the user.
* **running**: A migration is being performed by a migration executor.
* **failing**: A migration raised an exception during the last run (or last retry) and will be retried.
* **failed**: A migration raised an exception when running and won't be retried anymore.
* **succeeded**: A migration finished without error.
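
The states above lend themselves to a simple health summary. The sketch below is illustrative only: both helpers are hypothetical, and they work on a plain hash of status counts, which in a Rails app could plausibly come from grouping the migration model by its `status` column.

```ruby
# The states listed above, in lifecycle order.
SCHEMA_MIGRATION_STATUSES = %w[enqueued running failing failed succeeded].freeze

# Hypothetical helper: formats a hash of status counts as a one-line report.
def schema_migration_status_report(counts)
  SCHEMA_MIGRATION_STATUSES
    .map { |status| "#{status}=#{counts.fetch(status, 0)}" }
    .join(" ")
end

# Hypothetical helper: "failed" migrations are no longer retried,
# so they are the ones that need a human to look at them.
def schema_migrations_need_attention?(counts)
  counts.fetch("failed", 0).positive?
end

# Possible source of `counts` in a Rails app (assumption: grouping by the
# `status` column yields string keys):
#
#   counts = OnlineMigrations::BackgroundSchemaMigrations::Migration.group(:status).count
```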

## Configuring

There are a few configurable options for Background Schema Migrations. Custom configurations should be placed in an `online_migrations.rb` initializer.

Check the [source code](https://github.com/fatkodima/online_migrations/blob/master/lib/online_migrations/background_schema_migrations/config.rb) for the list of all available configuration options.

**Note**: You can dynamically change certain migration parameters while the migration is running.
For example:
```ruby
migration = OnlineMigrations::BackgroundSchemaMigrations::Migration.find(id)
migration.update!(
  statement_timeout: 2.hours, # The statement timeout value used when running the migration
  max_attempts: 10 # The maximum number of attempts made for a failing migration
)
```

### Customizing the error handler

Exceptions raised while a Background Schema Migration is running are rescued, and information about the error is persisted in the database.

If you want to integrate with an exception monitoring service (e.g. Bugsnag), you can define an error handler:

```ruby
# config/initializers/online_migrations.rb

OnlineMigrations.config.background_schema_migrations.error_handler = ->(error, errored_migration) do
  Bugsnag.notify(error) do |notification|
    notification.add_metadata(:background_schema_migration, { name: errored_migration.name })
  end
end
```

The error handler should be a lambda that accepts 2 arguments:

* `error`: The exception that was raised.
* `errored_migration`: An `OnlineMigrations::BackgroundSchemaMigrations::Migration` object that represents a failed migration.
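
If no exception monitoring service is available, the same two-argument contract can be satisfied by a handler that only writes to a log. This is a minimal sketch under that assumption; `build_logging_error_handler` is a hypothetical helper, not part of the gem.

```ruby
require "logger"

# Hypothetical factory (not part of the gem): returns a lambda with the
# (error, errored_migration) signature described above.
def build_logging_error_handler(logger)
  ->(error, errored_migration) do
    logger.error(
      "Background schema migration #{errored_migration.name} failed: " \
      "#{error.class}: #{error.message}"
    )
  end
end

# Possible wiring in config/initializers/online_migrations.rb:
#
#   OnlineMigrations.config.background_schema_migrations.error_handler =
#     build_logging_error_handler(Rails.logger)
```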

### Multiple databases and sharding

If you have multiple databases or sharding, you may need to configure where the tables related to background migrations live
by configuring the parent model:

```ruby
# config/initializers/online_migrations.rb

# Referring to one of the databases
OnlineMigrations::ApplicationRecord.connects_to database: { writing: :animals }

# Referring to one of the shards (via `:database` option)
OnlineMigrations::ApplicationRecord.connects_to database: { writing: :shard_one }
```

By default, ActiveRecord uses the database config named `:primary` (if it exists) under the environment section from the `database.yml`.
Otherwise, the first config under the environment section is used.
29 changes: 29 additions & 0 deletions docs/configuring.md
Original file line number Diff line number Diff line change
Expand Up @@ -214,6 +214,35 @@ Add to an initializer file:
config.auto_analyze = true
```

## Throttling

Background data and schema migrations can be taxing on your database. A throttling mechanism lets you pause a background migration when a given condition is met. If a migration is throttled, it is interrupted and retried on the next scheduler run.

Specify the throttle condition as a callable, such as a lambda:

```ruby
# config/initializers/online_migrations.rb

OnlineMigrations.config.throttler = -> { DatabaseStatus.unhealthy? }
```

**Note**: It's up to you to define a throttling condition that makes sense for your app. For example, you can check various PostgreSQL metrics such as replication lag, DB threads, whether DB writes are available, etc.
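
As one possible throttling condition, the sketch below builds a throttler from a replication lag reading. It is an illustration under stated assumptions: `build_lag_throttler` and `lag_reader` are hypothetical names, and how the lag is actually measured is left to the application.

```ruby
# Hypothetical factory (not part of the gem): returns a lambda that is truthy
# when the measured replication lag exceeds the allowed maximum.
def build_lag_throttler(max_lag_seconds:, lag_reader:)
  -> do
    # `to_f` treats a missing reading (nil) as zero lag.
    lag_reader.call.to_f > max_lag_seconds
  end
end

# Possible wiring in config/initializers/online_migrations.rb
# (assumption: ReplicaStatus.lag_in_seconds is an app-specific method that
# returns the current lag, for example by querying a replica):
#
#   OnlineMigrations.config.throttler = build_lag_throttler(
#     max_lag_seconds: 10,
#     lag_reader: -> { ReplicaStatus.lag_in_seconds }
#   )
```

Injecting the lag reader keeps the threshold logic separate from the database query, so the condition can be checked without a live database.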

## Customizing the backtrace cleaner

`OnlineMigrations.config.backtrace_cleaner` specifies the backtrace cleaner used when a background data or schema migration raises an error; the backtrace is cleaned with it before being persisted. An `ActiveSupport::BacktraceCleaner` should be used.

```ruby
# config/initializers/online_migrations.rb

cleaner = ActiveSupport::BacktraceCleaner.new
cleaner.add_silencer { |line| line =~ /ignore_this_dir/ }

OnlineMigrations.config.backtrace_cleaner = cleaner
```

If none is specified, the default `Rails.backtrace_cleaner` will be used to clean backtraces.

## Schema Sanity

Columns can flip order in `db/schema.rb` when you have multiple developers. One way to prevent this is to [alphabetize them](https://www.pgrs.net/2008/03/12/alphabetize-schema-rb-columns/).
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
class CreateBackgroundSchemaMigrations < <%= migration_parent %>
  def change
    # You can remove this migration for now and regenerate it later if you do not have plans
    # to use background schema migrations, like adding indexes in the background.
    create_table :background_schema_migrations do |t|
      t.bigint :parent_id
      t.string :migration_name, null: false
      t.string :table_name, null: false
      t.string :definition, null: false
      t.string :status, default: "enqueued", null: false
      t.string :shard
      t.boolean :composite, default: false, null: false
      t.integer :statement_timeout
      t.datetime :started_at
      t.datetime :finished_at
      t.integer :max_attempts, null: false
      t.integer :attempts, default: 0, null: false
      t.string :error_class
      t.string :error_message
      t.string :backtrace, array: true
      t.string :connection_class_name
      t.timestamps

      t.foreign_key :background_schema_migrations, column: :parent_id, on_delete: :cascade

      t.index [:migration_name, :shard], unique: true, name: :index_background_schema_migrations_on_unique_configuration
    end
  end
end