Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure external refreshes will also refresh internal searcher to minimize segment creation #27253

Merged
merged 8 commits into from
Nov 9, 2017

Conversation

s1monw
Copy link
Contributor

@s1monw s1monw commented Nov 3, 2017

We cut over to internal and external IndexReader/IndexSeacher in #26972 which uses
two independent searcher managers. This has the downside that refreshes of the external
reader will never clear the internal version map which in-turn will trigger additional
and potentially unnecessary segment flushes since memory must be freed. Under heavy
indexing load with low refresh intervals this can cause excessive segment creation which
causes high GC activity and significantly increases the required segment merges.

This change adds a dedicated external reference manager that delegates refreshes to the
internal reference manager that then steals the refreshed reader from the internal
reference manager for external usage. This ensures that external and internal readers
are consistent on an external refresh. As a sideeffect this also releases old segments
referenced by the internal reference manager which can potentially hold on to already merged
away segments until it is refreshed due to a flush or indexing activity.

…ize segment creation

We cut over to internal and external IndexReader/IndexSeacher in elastic#26972 which uses
two independent searcher managers. This has the downside that refreshes of the external
reader will never clear the internal version map which in-turn will trigger additional
and potentially unnecessary segment flushes since memory must be freed. Under heavy
indexing load with low refresh intervals this can cause excessive segment creation which
causes high GC activity and significantly increases the required segment merges.

This change adds a dedicated external reference manager that delegates refreshes to the
internal reference manager that then `steals` the refreshed reader from the internal
reference manager for external usage. This ensures that external and internal readers
are consistent on an external refresh. As a sideeffect this also releases old segments
referenced by the internal reference manager which can potentially hold on to already merged
away segments until it is refreshed due to a flush or indexing activity.
Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left some nits that would have helped me while reading this PR. feel free to do with them as you please.

@@ -224,8 +224,9 @@ public InternalEngine(EngineConfig engineConfig) {
throw e;
}
}
internalSearcherManager = createSearcherManager(new SearcherFactory(), false);
externalSearcherManager = createSearcherManager(new SearchFactory(logger, isClosed, engineConfig), true);
externalSearcherManager = createSearcherManager(new SearchFactory(logger, isClosed,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this fits on one line. yay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, good that this boolean is goon.

private final SearcherFactory searcherFactory;
private final SearcherManager internalSearcherManager;

ExternalSearcherManager(SearcherManager manager, SearcherFactory searcherFactory) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: manger -> internalManager

@@ -456,18 +526,17 @@ private String loadOrGenerateHistoryUUID(final IndexWriter writer, boolean force
return uuid;
}

private SearcherManager createSearcherManager(SearcherFactory searcherFactory, boolean readSegmentsInfo) throws EngineException {
private ExternalSearcherManager createSearcherManager(SearchFactory factory) throws EngineException {
boolean success = false;
SearcherManager searcherManager = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: searcherManager -> internalSearcherManager

@@ -456,18 +526,17 @@ private String loadOrGenerateHistoryUUID(final IndexWriter writer, boolean force
return uuid;
}

private SearcherManager createSearcherManager(SearcherFactory searcherFactory, boolean readSegmentsInfo) throws EngineException {
private ExternalSearcherManager createSearcherManager(SearchFactory factory) throws EngineException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: factory -> externalSearcherFactory

@@ -4226,7 +4226,7 @@ public void assertSameReader(Searcher left, Searcher right) {
List<LeafReaderContext> rightLeaves = ElasticsearchDirectoryReader.unwrap(right.getDirectoryReader()).leaves();
assertEquals(rightLeaves.size(), leftLeaves.size());
for (int i = 0; i < leftLeaves.size(); i++) {
assertSame(leftLeaves.get(i).reader(), rightLeaves.get(0).reader());
assertSame(leftLeaves.get(i).reader(), rightLeaves.get(i).reader());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sneaky :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah suddenly we have a test that triggered it 💃

@s1monw
Copy link
Contributor Author

s1monw commented Nov 8, 2017

@elasticmachine test this please

@s1monw s1monw merged commit a34c2f0 into elastic:master Nov 9, 2017
jasontedor added a commit that referenced this pull request Nov 9, 2017
* master: (22 commits)
  Update Tika version to 1.15
  Aggregations: bucket_sort pipeline aggregation (#27152)
  Introduce templating support to timezone/locale in DateProcessor (#27089)
  Increase logging on qa:mixed-cluster tests
  Update to AWS SDK 1.11.223 (#27278)
  Improve error message for parse failures of completion fields (#27297)
  Ensure external refreshes will also refresh internal searcher to minimize segment creation (#27253)
  Remove optimisations to reuse objects when applying a new `ClusterState` (#27317)
  Decouple `ChannelFactory` from Tcp classes (#27286)
  Fix find remote when building BWC
  Remove colons from task and configuration names
  Add unreleased 5.6.5 version number
  testCreateSplitIndexToN: do not set `routing_partition_size` to >= `number_of_routing_shards`
  Snapshot/Restore: better handle incorrect chunk_size settings in FS repo (#26844)
  Add limits for ngram and shingle settings (#27211) (#27318)
  Correct comment in index shard test
  Roll translog generation on primary promotion
  ObjectParser: Replace IllegalStateException with ParsingException (#27302)
  scripted_metric _agg parameter disappears if params are provided (#27159)
  Update discovery-ec2.asciidoc
  ...
s1monw added a commit that referenced this pull request Nov 11, 2017
…mize segment creation (#27253)

We cut over to internal and external IndexReader/IndexSearcher in #26972 which uses
two independent searcher managers. This has the downside that refreshes of the external
reader will never clear the internal version map which in-turn will trigger additional
and potentially unnecessary segment flushes since memory must be freed. Under heavy
indexing load with low refresh intervals this can cause excessive segment creation which
causes high GC activity and significantly increases the required segment merges.

This change adds a dedicated external reference manager that delegates refreshes to the
internal reference manager that then `steals` the refreshed reader from the internal
reference manager for external usage. This ensures that external and internal readers
are consistent on an external refresh. As a sideeffect this also releases old segments
referenced by the internal reference manager which can potentially hold on to already merged
away segments until it is refreshed due to a flush or indexing activity.
@jasontedor
Copy link
Member

@s1monw It appears the "backport pending" label can be removed now?

martijnvg added a commit that referenced this pull request Nov 14, 2017
* 6.x: (27 commits)
  Reduce synchronization on field data cache
  [Test] #27342 Fix SearchRequests#testValidate
  Fail queries with scroll that explicitely set request_cache (#27342)
  add json-processor support for non-map json types (#27335)
  Upgrade AWS SDK Jackson Databind to 2.6.7.1
  Properly format IndexGraveyard deletion date as date (#27362)
  Stop responding to ping requests before master abdication (#27329)
  [Test] Fix POI version in packaging tests
  Added release notes for 6.0.0
  Update 6.0.0-beta1.asciidoc
  Allow affix settings to specify dependencies (#27161)
  Tests: Improve size regex in documentation test (#26879)
  reword comment
  Ensure external refreshes will also refresh internal searcher to minimize segment creation (#27253)
  Remove unnecessary logger creation for doc values field data
  [DOCS] Fixed link to docker content
  Plugins: Add versionless alias to all security policy codebase properties (#26756)
  [DOCS] Move X-Pack-specific Docker content (#27333)
  [Test] Fix S3BlobStoreContainerTests.testNumberOfMultiparts()
  Set minimum_master_nodes to all nodes for REST tests (#27344)
  ...
@bleskes
Copy link
Contributor

bleskes commented Nov 23, 2017

@jasontedor agreed. I removed the backport pending label.

@clintongormley clintongormley added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Engine :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.1.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants