Use separate searchers for "search visibility" vs "move indexing buffer to disk #26972

s1monw · 2017-10-11T14:22:05Z

Today, when ES detects it's using too much heap vs the configured indexing
buffer (default 10% of JVM heap) it opens a new searcher to force Lucene to move
the bytes to disk, clear version map, etc.

But this has the unexpected side effect of making newly indexed/deleted
documents visible to future searches, which is not nice for users who are trying
to prevent that, e.g. #3593.

This is also an indirect spinoff from #26802 where we potentially pay a big
price on rebuilding caches etc. when updates / realtime-get is used. We are
refreshing the internal reader for realtime gets, which causes, for instance,
global ords to be rebuilt. I think we can gain quite a bit if we use a reader
that is only used for GETs and not for searches etc. That way we can also solve
problems of searchers being refreshed unexpectedly aside from replica recovery /
relocation.

Closes #15768
Closes #26912
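
To make the intent of the description concrete, here is a minimal sketch of the idea in plain Java. It is a toy model, not Elasticsearch or Lucene code: the class `TwoViewEngineSketch`, its fields, and the method `visible` are hypothetical names chosen for illustration, and lists of strings stand in for segments and readers. It only shows that writing the indexing buffer can reopen an engine-private view without changing what searches see.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of one writer with two point-in-time views; not Elasticsearch code. */
final class TwoViewEngineSketch {
    enum Scope { INTERNAL, EXTERNAL }

    private final List<String> indexingBuffer = new ArrayList<>(); // in heap, not yet readable
    private final List<String> segments = new ArrayList<>();       // stand-in for on-disk segments
    private List<String> internalView = Collections.emptyList();
    private List<String> externalView = Collections.emptyList();

    void index(String doc) {
        indexingBuffer.add(doc);
    }

    /** Frees heap by moving the buffer to "disk"; only the internal view is reopened. */
    void writeIndexingBuffer() {
        segments.addAll(indexingBuffer);
        indexingBuffer.clear();
        internalView = Collections.unmodifiableList(new ArrayList<>(segments));
    }

    /** Explicit refresh: reopens both views, so searches see the new documents. */
    void refresh() {
        writeIndexingBuffer();
        externalView = internalView;
    }

    List<String> visible(Scope scope) {
        return scope == Scope.INTERNAL ? internalView : externalView;
    }

    public static void main(String[] args) {
        TwoViewEngineSketch engine = new TwoViewEngineSketch();
        engine.index("doc-1");
        engine.writeIndexingBuffer();
        System.out.println("internal: " + engine.visible(Scope.INTERNAL));
        System.out.println("external: " + engine.visible(Scope.EXTERNAL));
        engine.refresh();
        System.out.println("external after refresh: " + engine.visible(Scope.EXTERNAL));
    }
}
```

In this model the external view stays empty until `refresh()` is called, which is the behaviour users asking for #3593 would expect.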

s1monw · 2017-10-11T14:23:28Z

/cc @uschindler

uschindler · 2017-10-11T14:37:20Z

Having two IndexReaders (in fact DirectoryReaders) open at the same time will not have any impact on memory usage, as both share the same LeafReaders. The memory of stale segments is only freed once the reader holding the oldest, no-longer-up-to-date segments is released.

@s1monw was this your question so you pinged me?
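
The sharing described in this comment can be illustrated with a small reference-counting sketch. This is a toy model in plain Java, not Lucene's actual implementation: `SharedSegmentsSketch`, `Reader`, `open` and `liveSegments` are hypothetical names, and segment names such as `_0` are just labels. It assumes each reader holds one reference per segment and that a segment is released when its last reference goes away.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of readers sharing per-segment state via reference counts. */
final class SharedSegmentsSketch {
    private final Map<String, Integer> refCounts = new HashMap<>();

    /** A point-in-time reader over a fixed set of segments. */
    final class Reader implements AutoCloseable {
        private final Set<String> segments;
        private boolean closed;

        private Reader(Set<String> segments) {
            this.segments = segments;
            for (String segment : segments) {
                refCounts.merge(segment, 1, Integer::sum);
            }
        }

        @Override
        public void close() {
            if (closed) {
                return;
            }
            closed = true;
            for (String segment : segments) {
                // Returning null removes the entry: the last reference is gone.
                refCounts.computeIfPresent(segment, (name, count) -> count == 1 ? null : count - 1);
            }
        }
    }

    Reader open(String... segments) {
        return new Reader(new TreeSet<>(Arrays.asList(segments)));
    }

    /** Segments that are still referenced by at least one open reader. */
    Set<String> liveSegments() {
        return new TreeSet<>(refCounts.keySet());
    }

    public static void main(String[] args) {
        SharedSegmentsSketch pool = new SharedSegmentsSketch();
        Reader stale = pool.open("_0", "_1");
        Reader fresh = pool.open("_1", "_2");
        System.out.println("both open: " + pool.liveSegments());
        stale.close();
        System.out.println("stale closed: " + pool.liveSegments());
        fresh.close();
    }
}
```

The shared segment `_1` is counted once per reader but held only once, and `_0` goes away only when the stale reader is closed.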

s1monw · 2017-10-11T14:46:55Z

@uschindler thanks, I was aware of this, that's why I added it. I pinged you since you were involved in the original issue here: #15768

bleskes

Thanks for picking this up so quickly. I did a pass and left mostly nits around naming (sorry :)). I did leave some comments about what seems like excessive refreshes which is why I marked this as request changes.

Also - with the double searcher managers and their relationship to the version map I got a bit worried about the IndexingMemoryController calling writeIndexingBuffer. I couldn't find any test for that. Do we have one? I think we should have a concurrent read/write by ID engine unit test which validates it's safe to call refresh/flush/writeIndexingBuffer without us messing up. WDYT?

bleskes · 2017-10-11T14:51:53Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

-            manager = createSearcherManager();
+            manager = createSearcherManager(searcherFactory);
+            internalSearcherManager = createSearcherManager(new SearcherFactory());
+            this.internalSearcherManager = internalSearcherManager;


shall we name these by scope? i.e., getScopeSearcherManager?

bleskes · 2017-10-11T14:54:29Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

@@ -215,9 +217,11 @@ public InternalEngine(EngineConfig engineConfig) throws EngineException {
                    throw e;
                }
            }
-            manager = createSearcherManager();
+            manager = createSearcherManager(searcherFactory);


Nit - now that we call createSearcherManager twice, shall we move the following line out of that method and into this constructor? It feels unrelated.

lastCommittedSegmentInfos = readLastCommittedSegmentInfos(searcherManager, store);

Another nit - searcherFactory -> searcherFactoryForSearch?

bleskes · 2017-10-11T15:01:51Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+        refresh(source, true);
+    }
+
+    final void refresh(String source, boolean refreshExternal) throws EngineException {


shall we make the refreshExternal an EnumSet of SearcherScope?
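
One way this suggestion might look is sketched below. It is an illustration only and not what the PR ended up with: `ScopedRefreshSketch` and `refreshCount` are hypothetical names, the scope names follow the INTERNAL/EXTERNAL naming used later in this thread, and a counter stands in for the real work of refreshing a searcher manager.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

/** Toy model of a refresh method that takes the scopes to refresh as an EnumSet. */
final class ScopedRefreshSketch {
    enum SearcherScope { INTERNAL, EXTERNAL }

    private final Map<SearcherScope, Integer> refreshCounts = new EnumMap<>(SearcherScope.class);

    /** Public entry point: refreshes every scope, like an explicit user refresh. */
    void refresh(String source) {
        refresh(source, EnumSet.allOf(SearcherScope.class));
    }

    /** Refreshes only the requested scopes; the source string is kept for error messages. */
    void refresh(String source, EnumSet<SearcherScope> scopes) {
        for (SearcherScope scope : scopes) {
            switch (scope) {
                case INTERNAL:
                case EXTERNAL:
                    // Stand-in for refreshing the searcher manager of this scope.
                    refreshCounts.merge(scope, 1, Integer::sum);
                    break;
                default:
                    throw new IllegalStateException("unknown scope: " + scope + " source[" + source + "]");
            }
        }
    }

    int refreshCount(SearcherScope scope) {
        return refreshCounts.getOrDefault(scope, 0);
    }

    public static void main(String[] args) {
        ScopedRefreshSketch engine = new ScopedRefreshSketch();
        engine.refresh("write indexing buffer", EnumSet.of(SearcherScope.INTERNAL));
        engine.refresh("api");
        System.out.println("internal refreshes: " + engine.refreshCount(SearcherScope.INTERNAL));
        System.out.println("external refreshes: " + engine.refreshCount(SearcherScope.EXTERNAL));
    }
}
```

Compared with a boolean flag, the call site names the scopes it touches, which reads more clearly than `refresh(source, false)`.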

bleskes · 2017-10-11T15:04:19Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

        } catch (AlreadyClosedException e) {
            failOnTragicEvent(e);
            throw e;
        } catch (Exception e) {
            try {
-                failEngine("refresh failed", e);
+                failEngine("refresh failed source[" + source + "]", e);


bleskes · 2017-10-11T15:21:45Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

@@ -1231,7 +1240,7 @@ public void writeIndexingBuffer() throws EngineException {
                // The version map is using > 25% of the indexing buffer, so we do a refresh so the version map also clears
                logger.debug("use refresh to write indexing buffer (heap size=[{}]), to also clear version map (heap size=[{}])",
                        new ByteSizeValue(indexingBufferBytes), new ByteSizeValue(versionMapBytes));
-                refresh("write indexing buffer");
+                refresh("write indexing buffer", false);


I wonder if we can now always use this "refresh" path here rather than the IndexWriter.flush - refresh is now much lighter? (not sure what other side effects are there).

+1 I think that's good
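
For readers following along, the decision being discussed can be sketched as a small pure function. The 25% threshold comes from the code comment in the hunk above; everything else is an assumption for illustration: `WriteIndexingBufferSketch`, `Action` and `choose` are hypothetical names, and the real method may take other conditions into account.

```java
/** Toy model of the choice made when the indexing buffer has to be written. */
final class WriteIndexingBufferSketch {
    enum Action { REFRESH_INTERNAL, FLUSH_WRITER }

    /**
     * Picks a refresh of the internal searcher when the version map uses more than
     * a quarter of the indexing buffer, so that the version map is cleared as well;
     * otherwise a plain writer flush is enough to move the bytes to disk.
     */
    static Action choose(long indexingBufferBytes, long versionMapBytes) {
        boolean versionMapIsLarge = versionMapBytes > indexingBufferBytes / 4;
        return versionMapIsLarge ? Action.REFRESH_INTERNAL : Action.FLUSH_WRITER;
    }

    public static void main(String[] args) {
        System.out.println(choose(100L, 30L));
        System.out.println(choose(100L, 10L));
    }
}
```

The suggestion in this comment amounts to always returning the refresh action, now that an internal refresh no longer changes search visibility.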

bleskes · 2017-10-11T15:23:04Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

@@ -1347,7 +1355,7 @@ public CommitId flush(boolean force, boolean waitIfOngoing) throws EngineExcepti
                        commitIndexWriter(indexWriter, translog, null);
                        logger.trace("finished commit for flush");
                        // we need to refresh in order to clear older version values
-                        refresh("version_table_flush");
+                        refresh("version_table_flush"); // TODO technically we could also only refresh the internal searcher


Doesn't the comment about releasing segments in tryRenewSyncCommit hold here too?

I think I made a copy-paste mess here. I will fix it.

bleskes · 2017-10-11T15:24:58Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

@@ -1500,6 +1508,8 @@ public void forceMerge(final boolean flush, int maxNumSegments, boolean onlyExpu
                if (flush) {
                    if (tryRenewSyncCommit() == false) {
                        flush(false, true);
+                    } else {
+                        refresh("renew sync commit"); // we have to refresh both searchers here to ensure we release unreferenced segments.


Isn't tryRenewSyncCommit already doing it in this case?

see my comment above sorry for the noise

bleskes · 2017-10-11T15:26:16Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

@@ -1867,6 +1884,10 @@ protected void doRun() throws Exception {
                        // free up transient disk usage of the (presumably biggish) segments that were just merged
                        if (tryRenewSyncCommit() == false) {
                            flush();
+                        } else {
+                            // we only refresh the rather cheap internal searcher manager in order to not trigger new datastructures


same comment - if tryRenewSyncCommit returned true, it already refreshed, no?

bleskes · 2017-10-11T15:28:22Z

core/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java

+        Searcher getSearcher = engine.acquireSearcher("test", Engine.SearcherScope.GET);
+        Searcher searchSearcher = engine.acquireSearcher("test", Engine.SearcherScope.SEARCH);
+        assertSameReader(getSearcher, searchSearcher);
+        IOUtils.close(getSearcher, searchSearcher);


nit - should we use try-with-resources to avoid leaking on failure?
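
A minimal sketch of what this nit is asking for, assuming the searcher type is closeable. `FakeSearcher` and `SearcherLeakSketch` are hypothetical stand-ins, not the real `Engine.Searcher`, and an open-handle counter replaces the real resource. The point is that both handles are released even when the assertion in the body throws.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model showing that try-with-resources releases searchers when a check fails. */
final class SearcherLeakSketch {
    /** Stand-in for a searcher handle; tracks how many handles are open. */
    static final class FakeSearcher implements AutoCloseable {
        private final AtomicInteger openCount;

        FakeSearcher(AtomicInteger openCount) {
            this.openCount = openCount;
            openCount.incrementAndGet();
        }

        @Override
        public void close() {
            openCount.decrementAndGet();
        }
    }

    /** Stand-in for a test assertion that fails. */
    static void failingCheck(FakeSearcher first, FakeSearcher second) {
        throw new AssertionError("simulated assertion failure");
    }

    /** Returns the number of handles still open after the failing check. */
    static int openAfterFailingCheck() {
        AtomicInteger open = new AtomicInteger();
        try (FakeSearcher getSearcher = new FakeSearcher(open);
             FakeSearcher searchSearcher = new FakeSearcher(open)) {
            failingCheck(getSearcher, searchSearcher);
        } catch (AssertionError expected) {
            // Both searchers were closed before this handler ran.
        }
        return open.get();
    }

    public static void main(String[] args) {
        System.out.println("open handles after failure: " + openAfterFailingCheck());
    }
}
```

With an explicit close call placed after the assertion, as in the hunk above, a failing assertion would skip the close and leave both handles open.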

s1monw · 2017-10-11T20:07:09Z

@bleskes I am not sure if you love or hate this 🤷‍♂️ I took it one step further and solely use that internal searcher everywhere. The external searcher is used for non-RT get and search, but internally it's all done via the private reader. This also means we don't refresh anymore during force_merge and flush, but to keep it BWC I moved these calls into IndexShard. Conceptually I think we should not refresh in these situations, and we might change that in 7.0. Another option is to make refresh optional on these APIs and change the default in 7 so folks can choose.

I personally think this is much cleaner now.

mikemccand · 2017-10-11T21:19:06Z

Yay!!

bleskes · 2017-10-11T22:01:57Z

@mikemccand you're alive!

uschindler · 2017-10-11T18:41:39Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+            case SEARCH:
+                return searcherManager;
+            default:
+                throw new IllegalStateException("unknonw scope: " + scope);


typo: unknown

bleskes

This is great ❤️ . I left suggestions but there is no need for another review.

bleskes · 2017-10-12T09:14:18Z

core/src/main/java/org/elasticsearch/action/termvectors/TermVectorsResponse.java

@@ -305,9 +305,9 @@ private void buildFieldStatistics(XContentBuilder builder, Terms curTerms) throw
        long sumDocFreq = curTerms.getSumDocFreq();
        int docCount = curTerms.getDocCount();
        long sumTotalTermFrequencies = curTerms.getSumTotalTermFreq();
-        if (docCount > 0) {
-            assert ((sumDocFreq > 0)) : "docCount >= 0 but sumDocFreq ain't!";


:) it's a good sign you bumped into this. Isn't this changing semantics? Before, when there were no docs, we wouldn't output the Field Stats sub-object.

expand the diff below and you will see all the void 💃

bleskes · 2017-10-12T09:21:31Z

core/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+        return acquireSearcher(source, Engine.SearcherScope.EXTERNAL);
+    }
+
+    private Engine.Searcher acquireSearcher(String source, Engine.SearcherScope scope) {


With the move to internal/external (rather than get/search), I think this method can go, and that also means that Engine.acquireSearcher(source, scope) can be made protected.

Never mind. I see it now :(

bleskes · 2017-10-12T09:26:05Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+                scope = SearcherScope.INTERNAL;
+            } else {
+                // we expose what has been externally expose in a point in time snapshot via an explicit refresh
+                scope = SearcherScope.EXTERNAL;


++ sneaky and nice. I think we can extend InternalEngineTests.testSimpleOperations with:

         // but, we can still get it (in realtime)
         getResult = engine.get(newGet(true, doc), searcherFactory);
         assertThat(getResult.exists(), equalTo(true));
         assertThat(getResult.docIdAndVersion(), notNullValue());
         getResult.release();
+        // but not real time is not yet visible
+        getResult = engine.get(newGet(false, doc), searcherFactory);
+        assertThat(getResult.exists(), equalTo(false));
+        getResult.release();

bleskes · 2017-10-12T09:33:10Z

core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+                    // even though we maintain 2 managers we really do the heavy-lifting only once.
+                    // the second refresh will only do the extra work we have to do for warming caches etc.
+                    externalSearcherManager.maybeRefreshBlocking();
+                    // fall-through is intentional


I don't see the fall through?

bleskes · 2017-10-12T09:37:49Z

core/src/main/java/org/elasticsearch/index/engine/LiveVersionMap.java

@@ -98,21 +96,6 @@
    /** Tracks bytes used by tombstones (deletes) */
    final AtomicLong ramBytesUsedTombstones = new AtomicLong();

-    /** Sync'd because we replace old mgr. */
-    synchronized void setManager(ReferenceManager<?> newMgr) {


bleskes · 2017-10-12T09:39:47Z

core/src/main/java/org/elasticsearch/index/shard/IndexShard.java

@@ -1005,6 +1005,7 @@ public CompletionStats completionStats(String... fields) {
        }
        final long time = System.nanoTime();
        final Engine.CommitId commitId = engine.flush(force, waitIfOngoing);
+        engine.refresh("flush"); // TODO this is technically wrong we should remove this in 7.0


OK. Now I see why the engine changes can be done in 6.x. I would opt for keeping it as it was (i.e. full refresh on flush) and doing a small follow-up in 7.0 to change the refresh command in the engine + mark it as breaking etc. Just a suggestion.

s1monw · 2017-10-12T12:09:09Z

@uschindler @bleskes thanks for the reviews I will wait for CI and merge it

uschindler · 2017-10-12T15:25:51Z

Thanks @s1monw !

The changes introduced in #26972 missed two places where an internal searcher should be used.

Today all these API calls have the side effect of making documents visible to search requests. While this is sometimes desired, it's an unnecessary side effect, and now that we have an internal (engine-private) index reader (#26972) we artificially add a refresh call for bwc. This change removes this side effect in 7.0.

We do some accounting in IndexShard that is not necessarily correct since we maintain two different index readers. This change moves the accounting under the engine which knows what reader we are refreshing. Relates to #26972

…mize segment creation (#27253) We cut over to internal and external IndexReader/IndexSearcher in #26972, which uses two independent searcher managers. This has the downside that refreshes of the external reader will never clear the internal version map, which in turn will trigger additional and potentially unnecessary segment flushes since memory must be freed. Under heavy indexing load with low refresh intervals this can cause excessive segment creation, which causes high GC activity and significantly increases the required segment merges. This change adds a dedicated external reference manager that delegates refreshes to the internal reference manager and then `steals` the refreshed reader from the internal reference manager for external usage. This ensures that external and internal readers are consistent on an external refresh. As a side effect this also releases old segments referenced by the internal reference manager, which can potentially hold on to already merged away segments until it is refreshed due to a flush or indexing activity.
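
The delegation described in this follow-up can be sketched roughly as follows. This is a toy model under stated assumptions, not the code from #27253: `StealingRefreshSketch` and all of its members are hypothetical names, an integer generation stands in for a reader, and an entry counter stands in for the version map.

```java
/** Toy model of an external refresh that delegates to the internal one. */
final class StealingRefreshSketch {
    private int writerGeneration;   // bumped on every indexed document
    private int internalGeneration; // generation the internal reader reflects
    private int externalGeneration; // generation the external reader reflects
    private int versionMapEntries;  // entries waiting for an internal refresh

    void index() {
        writerGeneration++;
        versionMapEntries++;
    }

    /** Reopens the internal reader; this is what lets the version map be cleared. */
    void refreshInternal() {
        internalGeneration = writerGeneration;
        versionMapEntries = 0;
    }

    /** Refreshes the internal reader first, then reuses ("steals") its result. */
    void refreshExternal() {
        refreshInternal();
        externalGeneration = internalGeneration;
    }

    int internalGeneration() {
        return internalGeneration;
    }

    int externalGeneration() {
        return externalGeneration;
    }

    int versionMapEntries() {
        return versionMapEntries;
    }

    public static void main(String[] args) {
        StealingRefreshSketch engine = new StealingRefreshSketch();
        engine.index();
        engine.index();
        engine.refreshExternal();
        System.out.println("internal=" + engine.internalGeneration()
                + " external=" + engine.externalGeneration()
                + " versionMapEntries=" + engine.versionMapEntries());
    }
}
```

With two fully independent managers, `refreshExternal()` would leave `versionMapEntries` untouched, which is the extra-flush problem this change addresses.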

s1monw added :Core/Infra/Core Core issues without another label >enhancement v6.1.0 v7.0.0 labels Oct 11, 2017

s1monw requested review from jpountz and bleskes October 11, 2017 14:22

remove unused import

3cc471a

fix test

7076c9b

bleskes suggested changes Oct 11, 2017

s1monw added 2 commits October 11, 2017 21:39

take it one step further and make only Engine#refresh(String) do an e…

d308248

…xternal refresh

add a unittest that simulates what an upsert does

56349a9

fix GC simulation

6cc0a8b

align asserts with its comments

5535320

uschindler approved these changes Oct 12, 2017

bleskes approved these changes Oct 12, 2017

add additional test and clarify comment

dfa6122

Merge branch 'master' into use_interal_reader_manager

17d1cc8

s1monw merged commit 21eb9bd into elastic:master Oct 12, 2017

s1monw deleted the use_interal_reader_manager branch October 12, 2017 15:19

s1monw added the backport pending label Oct 12, 2017

s1monw mentioned this pull request Oct 13, 2017

Don't refresh on _flush _force_merge and _upgrade #27000

Merged

s1monw removed the backport pending label Oct 13, 2017

s1monw added a commit that referenced this pull request Oct 13, 2017

Use internal searcher for all indexing related operations in the engine

a517758

The changes introduced in #26972 missed two places where an internal searcher should be used.

s1monw added a commit that referenced this pull request Oct 13, 2017

Use internal searcher for all indexing related operations in the engine

5fb70b2

The changes introduced in #26972 missed two places where an internal searcher should be used.

s1monw mentioned this pull request Nov 1, 2017

Move IndexShard#getWritingBytes() under InternalEngine #27209

Merged

s1monw mentioned this pull request Nov 3, 2017

Ensure external refreshes will also refresh internal searcher to minimize segment creation #27253

Merged

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
