
Use _refresh instead of reading from Translog in the RT GET case #20102

Merged
merged 10 commits on Aug 24, 2016

Conversation

@s1monw
Contributor

s1monw commented Aug 22, 2016

Today we do a lot of accounting inside the engine to maintain locations
of documents inside the transaction log. This is only needed to ensure
we can return a document's source from the engine if it hasn't been refreshed yet.
Aside from the added complexity of reading from the currently writing translog
and of maintaining pointers into the translog, this also caused inconsistencies, like different values
of the `_ttl` field depending on whether it was read from the translog or not. TermVectors are totally different if
the document is fetched from the translog, since copy fields are ignored, etc.

This change simply calls `refresh` if the document's latest version is not in the index. This
streamlines the semantics of the `_get` API and allows for more optimizations inside the engine
and on the transaction log. Note: `_refresh` is only called iff the requested document has not been refreshed yet but has recently been updated or added.

NOTE: there are more APIs that are test-only now and can potentially be removed... I didn't run the REST tests yet etc., so it's really WIP
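
For illustration, a minimal sketch of the described get path (names such as versionMap, refresh and getFromSearcher are modeled loosely on the engine and are not the actual patch):

// Simplified sketch of refresh-on-realtime-get; illustrative names only.
public GetResult get(Get get) {
    if (get.realtime()) {
        // The version map only tracks documents indexed since the last
        // refresh, so a hit means the latest version is not yet visible.
        VersionValue versionValue = versionMap.get(get.uid());
        if (versionValue != null) {
            if (versionValue.isDelete()) {
                return GetResult.NOT_EXISTS; // deleted but not yet refreshed
            }
            // Make the document visible instead of reading its source from
            // the translog, then fall through to a normal index read.
            refresh("realtime_get");
        }
    }
    // Always served from the index; no translog locations needed.
    return getFromSearcher(get);
}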

@mikemccand
Contributor

LGTM, so simple.

This goes a long way towards addressing #19787, where the version map consumes 2/3 of the indexing buffer. The version map is still alive with this change, but not storing the Translog.Location will save a lot of heap.
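
For a rough sense of the per-entry saving (the field layout assumed here, a long generation, a long position and an int size, matches Translog.Location's public fields; exact header sizes vary by JVM):

// Back-of-the-envelope arithmetic only; numbers are typical for a 64-bit JVM.
public class VersionMapSavings {
    public static void main(String[] args) {
        long header = 16;                    // object header of the Location
        long location = header + 8 + 8 + 4;  // generation + position + size
        long reference = 8;                  // reference held by the map entry
        System.out.println("~" + (location + reference) + " bytes saved per entry");
    }
}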

@jpountz
Contributor

jpountz commented Aug 22, 2016

LGTM2

@nik9000
Member

nik9000 commented Aug 22, 2016

In a followup we should switch the default for GETs to non-realtime. It's nice that we can still do a realtime GET, but the performance is very different. I figure we can do the docs in that PR? I'm happy to make the PR if no one else wants it.

@jpountz
Contributor

jpountz commented Aug 22, 2016

@nik9000 What is your motivation to make GET not realtime by default? Are you worried about the worst-case of someone doing a PUT then GET on the same id all the time?

@nik9000
Member

nik9000 commented Aug 22, 2016

Are you worried about the worst-case of someone doing a PUT then GET on the same id all the time?

Basically, yes. HTTP GETs are usually fast and don't make changes to the server. People are used to that because it is true for almost all GETs. This way of doing realtime GETs is still fairly fast in most cases, but it'll make lots of little segments. Lots of little segments are bad for performance for lots of reasons, but in this case I worry because they don't actually slow down the GET all that much, just the searches done later.

I think it is the right thing to do if we're trying to push folks towards fast things by default.

@s1monw
Contributor Author

s1monw commented Aug 22, 2016

Lots of little segments are bad for performance for lots of reasons but in this case I worry because they don't actually slow down the GET all that much, just the searches done later.

I don't think this is true. We only refresh iff the document we are fetching has changed since we last refreshed. So I guess commonly we won't refresh at all.

@s1monw
Contributor Author

s1monw commented Aug 23, 2016

I removed the discuss and WIP labels and updated the docs, I think it's ready.

has been updated but is not yet refreshed, the get API will issue a refresh
call in-place to make the document visible. This will also make other documents
changed since the last refresh visible. In order to disable realtime GET,
one can set `realtime` parameter to `false`.

set THE
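
For example, disabling it with the Java client of that era might look like this (the TransportClient builder API is shown as an assumption for illustration; `client` is an existing client instance):

// Opt out of realtime GET: the call will not trigger an internal refresh and
// only sees documents already made visible by the last refresh.
GetResponse response = client.prepareGet("my-index", "my-type", "1")
        .setRealtime(false)
        .get();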

@clintongormley

minor docs changes, otherwise LGTM


@Override
public long ramBytesUsed() {
-    return RamUsageEstimator.NUM_BYTES_OBJECT_HEADER + Long.BYTES + RamUsageEstimator.NUM_BYTES_OBJECT_REF +
-        (translogLocation != null ? translogLocation.size : 0);
+    return RamUsageEstimator.NUM_BYTES_OBJECT_HEADER + Long.BYTES + RamUsageEstimator.NUM_BYTES_OBJECT_REF;

so Translog.Location does not need to implement Accountable anymore?

@tylerfontaine

tylerfontaine commented Aug 24, 2016

In the event of an explosion of segments because of these refreshes, performance concerns aside (as I don't know enough to comment on them), it /could/ chew through file descriptors, which would be more problematic. Would it make sense to include a caveat to that effect in the event someone abuses this?

Or are we assuming that

  1. it shouldn't happen most of the time
  2. if it did, the segments are small enough that default merging rules would keep file descriptor use in check?

@s1monw
Contributor Author

s1monw commented Aug 25, 2016

@tylerfontaine very good question! The assumption here is that our mechanisms of throttling indexing and only flushing compound-file segments will keep the number of file descriptors at bay by triggering merging. If you have thousands of shards on a node that might become a problem, but that's orthogonal to this change. Makes sense?

@ismael-hasan
Contributor

ismael-hasan commented Aug 25, 2016

As of now, one recommendation to improve indexing speed for bulk operations is to set the refresh interval to a higher value (e.g. 30 seconds) or to disable refresh entirely while the bulk operations are running.
Consider the following use case:

  • Periodic bulk operations
  • The quality of the data is not good; the same document can be updated several times within one bulk operation

Could it happen that we slow down indexing because of too many refresh operations happening constantly?
Does this make sense? If so, would it be worthwhile (and possible, and advisable!) for updates done through the bulk API not to trigger the refresh, but to use the translog instead?

@s1monw
Contributor Author

s1monw commented Aug 25, 2016

@ismael-hasan your analysis of the problem is correct. If you are updating the same doc over and over again you will hit refresh way more often. If this is the rule rather than the exception in your application, the recommendation would be to combine those updates on your application layer. I am convinced that this is a rare corner case, would you agree given the overall user base? We are trying to reduce the footprint in terms of memory and APIs that exist solely because of this corner case, so reading from the translog in only this situation would basically revert this PR. We don't want to sacrifice the overall experience for corner cases like this, since that would make it hard to maintain performance and quality over time.

Yet, I am happy to add this limitation and your use case to the docs to give people a heads-up. Can you elaborate on your use case a bit more so I can phrase it?

@ismael-hasan
Contributor

ismael-hasan commented Aug 25, 2016

@s1monw It was an example use case, not an actual one. I also think it is an unusual scenario and that it should be addressed from the data point of view. I saw it just a couple of times in previous jobs, for instance with partial updates: a document is inserted in a production database, which triggers storing it somewhere else to be indexed later; then a new field is populated and that change is stored as well, and a second/third time; finally, when all of this goes to ingestion, we have one record for the original document and three records that add new fields to the object. I just wanted to raise awareness of it and to understand better how we plan to handle it; if allowing the translog approach for bulk was already considered and discarded, I am happy with it!

I strongly +1 adding a note to the documentation; it will save some headaches in the future with "Ingestion is going slow". It is easier to plan your bulk ingestion with this in mind than to change the approach later when you realize you are refreshing too often during bulk. I would add something like: "When doing bulk indexing, sometimes the data will not be curated and may include several operations on the same document in the same bulk request. This can cause slow indexing, since it triggers unnecessary refresh operations. In this case it is recommended to pre-process the data so that each bulk request contains only one addition/update per document, including all of the modifications at once."
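
A minimal sketch of that pre-processing step (the names DocUpdate, id() and fields() are hypothetical; the "later updates win per field" merge semantics depend on your data):

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Collapse several partial updates to the same document id into a single
// operation before building the bulk request, so each id causes at most one
// write that a subsequent realtime GET would have to refresh for.
static Map<String, Map<String, Object>> collapse(List<DocUpdate> pendingUpdates) {
    Map<String, Map<String, Object>> collapsed = new LinkedHashMap<>();
    for (DocUpdate update : pendingUpdates) {            // DocUpdate is hypothetical
        collapsed.computeIfAbsent(update.id(), id -> new LinkedHashMap<>())
                 .putAll(update.fields());               // later updates win per field
    }
    return collapsed;  // one bulk item per document instead of one per update
}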

@s1monw
Contributor Author

s1monw commented Aug 25, 2016

@ismael-hasan I thought about your use case again and I think there is a misunderstanding. This really only applies to indexing if you use Elasticsearch as a primary store, not when you sync a database. For instance, if you have a trigger in the DB, you generate a new version of doc X and either index it or put it aside (as you said) to bulk it later. Now another field is populated; you create version N+1 of the doc and either index it or put it aside in the same bulk, and so on. None of these indexing operations will use the GET API. If you use the upsert feature for something like a counter (note: Elasticsearch is not good for counters!), where you read the document from ES via GET, increment the counter and index it again, that is when you will hit this. I hope this clarifies the situation?
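
For contrast, the read-modify-write pattern that does hit the new refresh path would look roughly like this (TransportClient-style calls, shown as an assumption; `client` is an existing client instance):

// Each iteration GETs a document that was just re-indexed and not yet
// refreshed, so every realtime GET forces an internal refresh.
GetResponse doc = client.prepareGet("idx", "type", "counter-1").get();
long count = ((Number) doc.getSource().get("count")).longValue();
client.prepareIndex("idx", "type", "counter-1")
      .setSource("count", count + 1)   // field/value pair form of setSource
      .get();                          // the next GET for this id refreshes again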

@ismael-hasan
Contributor

Note this is informative; by no means am I suggesting to revert this change.
@s1monw There is another scenario that came to my mind; I am sharing it in case more people were doing this. I had a couple of cases in which the users had log entries related by "event_id", so, to be able to build an entity-centric approach, we used Logstash outputs to do upserts: as the log entries related to an event came into the system, the entity would be updated. This came with an acknowledged and accepted penalty in indexing performance.
It seems the performance in 5 will be much worse with this way of centralizing entities from logs, so it is worth starting to consider different methods to deal with this scenario.

jpountz added a commit to jpountz/elasticsearch that referenced this pull request Aug 26, 2016
This makes GET operations more consistent with `_search` operations, which expect
`(stored_)fields` to work on stored fields and source filtering to work on the
`_source` field. This is now possible thanks to the fact that GET operations
do not read from the translog anymore (elastic#20102), and it also allows getting rid of
`FieldMapper#isGenerated`.

The `_termvectors` API (and thus more_like_this too) was relying on the fact
that GET operations would extract fields from either stored fields or the source,
so the logic that used to do this in `ShardGetService` has been moved
to `TermVectorsService`. It would be nice if term vectors did not rely on this,
but it does not seem to be a low-hanging fruit.
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Jun 30, 2017
tlrx added three commits that referenced this pull request Jul 4, 2017
@clintongormley added the :Distributed/Distributed (a catch-all label for anything in the Distributed area) and :Distributed/Engine (anything around managing Lucene and the translog in an open shard) labels, and removed the :Engine and :Distributed/Distributed labels, on Feb 13, 2018
ywelsch added a commit that referenced this pull request Nov 9, 2019
The realtime GET API currently has erratic performance in cases where a document
that has just been indexed but not refreshed yet is accessed, as the implementation
will force an internal refresh in that case. Refreshing can be an expensive operation,
and it also blocks the thread that executes the GET operation, preventing other GETs
from being processed. In case of frequent access to recently indexed documents, this
can lead to a refresh storm and terrible GET performance.

While older versions of Elasticsearch (2.x and older) did not trigger refreshes and instead opted
to read from the translog in the case of the realtime GET API or update API, this was removed in 5.0
(#20102) to avoid inconsistencies between values that were returned from the translog and
those returned by the index. This was partially reverted in 6.3 (#29264) to allow _update and
upsert to read from the translog again, as it was easier to guarantee consistency for these, and
this also brought back more predictable performance characteristics for that API. Calls to the realtime
GET API, however, would still always do a refresh if necessary to return consistent results. This
means that users who were calling realtime GET APIs to coordinate updates on the client side
(realtime GET + CAS for conditional index of the updated doc) would still see very erratic
performance.

This PR (together with #48707) resolves the inconsistencies between reading from the translog and
the index. In particular, it fixes the inconsistencies that happen when requesting stored fields, which
were not available when reading from the translog. In cases where stored fields are requested, this
PR will reparse the _source from the translog and derive the stored fields to be returned. With
this, it changes the realtime GET API to allow reading from the translog again, avoiding refresh
storms and blocking of the GET threadpool, and providing overall much better and more predictable
performance for this API.
ywelsch added a commit that referenced this pull request Nov 9, 2019
Labels
das awesome, :Distributed/Engine (anything around managing Lucene and the Translog in an open shard), >enhancement, v5.0.0-beta1