Elasticsearch v2 Upgrade #315

Merged
merged 2 commits into from
Dec 15, 2016

@GUI GUI commented Dec 15, 2016

This upgrades the bundled version of Elasticsearch to v2.4 (up from v1.7). Since Elasticsearch v1.7 will be end-of-life'd next month, we need to get on a newer version. Version 2 should also bring various speed and stability benefits.

We're also upgrading to v2 rather than the latest v5, since v2 has an easier upgrade path. Upgrading from v1 to v2 should just require a full cluster restart, whereas jumping to v5 will require reindexing data. At some point we'll probably need to tackle the v5 upgrade, but that at least becomes easier once everything's on v2 (so any old v1 data can be reindexed in place).

It's also worth noting that, based on earlier discussions (eg, #248 (comment)), upgrading older databases from v1 to v2 looked like it was going to be problematic. Our old data (before API Umbrella v0.12) may have contained field names with dots, and Elasticsearch v2 had disabled support for dotted field names (if you tried to upgrade an existing database with dotted field names, Elasticsearch v2 would refuse to start). Luckily, Elasticsearch 2.4.0 rolls back some of these restrictions, so older API Umbrella installations (pre-v0.12) where dotted field names may be present can now be upgraded directly without reindexing all the data.

One final note: while we're upgrading our default bundled version of Elasticsearch to v2, API Umbrella is still compatible with v1 if you happen to be running an external Elasticsearch cluster (rather than relying on the version bundled with API Umbrella). If you need to continue connecting to an Elasticsearch v1 cluster, then you'll need to update your /etc/api-umbrella/api-umbrella.yml config to include:

elasticsearch:
 api_version: 1

However, since Elasticsearch v1 is end-of-life'd next month, I'm thinking we probably won't support this beyond the next release, unless there's a need.

- Switch the default API version to v2. This updates the couple of places
  where our queries need to be changed for v2 compatibility. v1 mode can
  still be enabled if you're connecting to an external Elasticsearch v1
  instance.
- Create the Elasticsearch scripts directory, which v2 seems to require
  being present or it will fail to start up.
- Start up v2 with "mapper.allow_dots_in_name=true". This allows for
  easier upgrades from v1 installations where the analytics data in
  older API Umbrella installations (before v0.12.0) may have contained
  field names with dots. With this flag enabled we can upgrade this
  older data directly without having to reindex everything.
- Tweak how the test suite cleans Elasticsearch data between tests.
  Elasticsearch v2 imposes a lower per-query limit, so our queries with
  size 100,000 were breaking. Instead, we'll switch this to a
  scan-and-scroll query, which should be better anyway, since now we
  don't have to worry about any specific size.
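
The scan-and-scroll cleanup from the last item could look roughly like the sketch below. This is not the test suite's actual code: the function names, the page size, and the injected `start_search`/`next_page` callables are hypothetical, and the request bodies assume the Elasticsearch 2.x scroll API (sorting by `_doc` and sending `scroll_id` in a JSON body).

```python
from typing import Callable, Dict, Iterator


def initial_scroll_body(page_size: int = 500) -> Dict:
    """Body for the first request, e.g. POST /<index>/_search?scroll=1m."""
    return {
        "size": page_size,  # hits per page, not a cap on the total
        "sort": ["_doc"],   # index order, the cheapest way to walk every doc
        "query": {"match_all": {}},
    }


def next_scroll_body(scroll_id: str, keep_alive: str = "1m") -> Dict:
    """Body for each follow-up request, e.g. POST /_search/scroll."""
    return {"scroll": keep_alive, "scroll_id": scroll_id}


def iter_hits(
    start_search: Callable[[Dict], Dict],
    next_page: Callable[[Dict], Dict],
) -> Iterator[Dict]:
    """Yield every hit page by page until an empty page comes back.

    start_search and next_page are assumed to send the given body to
    Elasticsearch and return the parsed JSON response.
    """
    page = start_search(initial_scroll_body())
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit
        page = next_page(next_scroll_body(page["_scroll_id"]))
```

Because the loop stops on the first empty page, the caller never has to guess an upper bound on how many documents exist; each yielded hit can then be deleted however the test helper sees fit.
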
After upgrading to Elasticsearch v2, some of our logging tests were
consistently failing if they happened to follow a number of other tests.
The issue was that if a previous test had made a bunch of requests, then
Elasticsearch wasn't indexing the data quickly enough, causing the
rsyslog queue to get backed up. This resulted in the requests from the
logging tests not getting indexed into Elasticsearch within the expected
time.

There were 2 main performance issues with Elasticsearch v2 that led to
this:

1. Elasticsearch syncs the data to disk on every request in version 2,
   which hurts indexing performance. We've switched things back to v1's
   mode of performing asynchronous syncs periodically (so we're trading
   some safety for speed, but that seems okay for this kind of log
   data).
2. Updating index mappings in Elasticsearch version 2 is more costly.
   Our mapping is mostly static unless you enable logging of the
   "request_query" field, which stores all the request query params as
   a nested object. We have some tests that generate a bunch of unique,
   random query parameters (mainly for cache busting), but this leads to
   a deluge of mapping updates since each new query param seen means
   the mapping needs to be updated.

   We recently disabled gathering this "request_query" field by default,
   but it was still enabled in our test suite by default, since we had
   some existing tests that relied on this functionality. So to solve
   the performance issues, we've shifted our test suite to disable
   gathering "request_query" too. This eliminates all the mapping
   updates during tests.

   The ability to enable "request_query" collection still exists, and
   the existing tests for this functionality have been retained (just in
   a more isolated fashion that won't impact other tests). However,
   given the potential performance issues of enabling this, it might be
   a good reason to get rid of this functionality altogether.
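
To make the second point concrete, here is a deliberately simplified model of why random cache-busting parameters are so expensive. It is not how Elasticsearch tracks mappings internally; `count_mapping_updates` is a hypothetical helper that just counts documents introducing a field name that has not been seen before, on the assumption that each one would force a mapping update.

```python
from typing import Dict, Iterable, Set


def count_mapping_updates(documents: Iterable[Dict[str, str]]) -> int:
    """Count documents that introduce at least one previously unseen field.

    Simplified assumption: with dynamic mapping, each such document
    triggers one mapping update.
    """
    known_fields: Set[str] = set()
    updates = 0
    for doc in documents:
        new_fields = set(doc) - known_fields
        if new_fields:
            updates += 1
            known_fields |= new_fields
    return updates


# A stable parameter name only needs the mapping updated once.
stable = [{"page": "1"}, {"page": "2"}, {"page": "3"}]

# Unique, random parameter names need an update for every request.
cache_busting = [{"cb_8f3a": "1"}, {"cb_c21d": "1"}, {"cb_77e0": "1"}]
```

Under this model the stable requests cost a single update in total, while the cache-busting ones cost one update per request, which matches the deluge described above.
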

While debugging the performance issues, we've also made a couple tweaks
and improvements to our rsyslog setup:

- The queue.size wasn't configured, so the memory portion of the queue
  was capped at 1,000 by default. This meant the configured highwater
  and lowwater sizes weren't actually being used. Set a higher
  queue.size to resolve this.
- Enable the impstats plugin to output rsyslog queue stats every minute.
  This seems generally helpful to have in place to be able to see what
  rsyslog's up to and whether anything is becoming congested with
  logging.
- More comments to explain some of the more cryptic rsyslog
  configuration settings.
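
The queue.size issue from the first item boils down to a simple ordering check, sketched below. The helper name and the numbers in the usage note are hypothetical (they are not the values from our config); the only assumption is that the watermarks need to sit below queue.size to ever take effect.

```python
from typing import List


def check_queue_settings(
    queue_size: int, high_watermark: int, low_watermark: int
) -> List[str]:
    """Return a list of problems with a set of rsyslog queue settings.

    Assumes the watermarks are only meaningful when
    low_watermark < high_watermark < queue_size.
    """
    problems = []
    if high_watermark >= queue_size:
        problems.append("high watermark is not below queue.size")
    if low_watermark >= high_watermark:
        problems.append("low watermark is not below the high watermark")
    return problems
```

For example, `check_queue_settings(1000, 80000, 20000)` reports the high watermark problem (the situation we were in with the default queue.size), while `check_queue_settings(100000, 80000, 20000)` returns an empty list.
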

GUI commented Dec 15, 2016

Oh, and one more note about Elasticsearch v2 indexing performance: In the initial implementation, we were hitting bottlenecks in our test suite with requests not being indexed into Elasticsearch in a timely manner. However, with some tweaking, indexing performance is back to where it should be. But this might be worth highlighting in case anyone's using an external Elasticsearch instance. The short version:

  • Elasticsearch's index.translog.durability setting should probably be set to async.
  • We also had to disable API Umbrella's collection of the request_query object inside Elasticsearch (which we've never used, but had been collecting). We had actually already disabled the collection of this recently for other reasons (see a432b22), but with Elasticsearch v2's performance differences, it's probably more important that this remain off and that we entirely remove the ability to turn it back on.
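
For anyone on an external cluster, the first bullet could be applied with an index settings update along the lines of this sketch. The base URL and index name are placeholders rather than values from this PR, and it assumes index.translog.durability can be changed on a live index through the `_settings` endpoint, so check that against the docs for your Elasticsearch version. The function only builds the request; nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request


def translog_async_request(base_url: str, index: str) -> urllib.request.Request:
    """Build a PUT /<index>/_settings request that sets async translog durability."""
    body = json.dumps({"index": {"translog": {"durability": "async"}}})
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder host and index name, for illustration only.
request = translog_async_request("http://localhost:9200", "example-logs")
```

New indices would still pick up the default unless the same setting is also put in an index template or the node config, so this covers existing indices only.
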

The longer explanation is in the commit message here: c3afad9

@GUI GUI merged commit baa4ab6 into master Dec 15, 2016
@GUI GUI deleted the elasticsearch-v2 branch December 15, 2016 04:29
@GUI GUI added this to the v0.14.0 milestone Feb 6, 2017