Add differential Elasticsearch sync #247

Jotschi · 2018-01-05T16:05:11Z

We currently drop the whole index and reindex everything if we detect an unclean shutdown. We should add a way to scan for differences between ES and Mesh and only sync the data that actually changed. This mechanism is especially important when dealing with an external Elasticsearch cluster.

Questions

Can we encode the version of the document within the documentId? This would allow us to quickly decide whether a found document needs to be removed (it may be outdated).
Node containers have versions. We can use that version for containers, but which versions do we use for Users, Roles, etc.?
Should we encode the element version in the document id?
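To make the question concrete, here is a minimal sketch of what a versioned document id could look like. The `uuid-version` format, the function names and the `current_versions` lookup are assumptions for illustration only, not the format Mesh uses.

```python
from typing import Dict, Tuple


def compose_document_id(uuid: str, version: str) -> str:
    """Join element uuid and version into one document id (hypothetical format)."""
    return f"{uuid}-{version}"


def parse_document_id(document_id: str) -> Tuple[str, str]:
    """Split a document id back into (uuid, version).

    Splits on the last separator so that uuids containing hyphens still parse.
    Assumes the version itself contains no hyphen.
    """
    uuid, version = document_id.rsplit("-", 1)
    return uuid, version


def is_outdated(document_id: str, current_versions: Dict[str, str]) -> bool:
    """Return True if the indexed document no longer matches the graph.

    current_versions maps uuid -> current version in the graph (assumed lookup).
    A uuid that is missing from the graph also counts as outdated.
    """
    uuid, version = parse_document_id(document_id)
    return current_versions.get(uuid) != version
```

With such an id, the decision whether a found document is stale needs only the id itself and the current version from the graph; the document body does not have to be loaded.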

Tasks

Add differential index sync
Add a version number to each element that can be indexed.
Failure handling for ES must be added.
We could also include an ES-Mesh structure revision hash in the index name to avoid using incompatible indices after a Mesh update.
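A rough sketch of the revision hash idea from the last task: derive a short hash from the index structure and append it to the index name. The function name, the use of SHA-256, the 8-character suffix and the example mappings are all assumptions for illustration.

```python
import hashlib
import json


def index_name(base: str, mapping: dict) -> str:
    """Build an index name that changes whenever the mapping changes.

    The mapping is serialized with sorted keys so that key order does not
    affect the hash. The hash algorithm and suffix length are arbitrary choices.
    """
    canonical = json.dumps(mapping, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
    return f"{base}-{digest}"


# Hypothetical mappings: same structure, different key order.
old = {"properties": {"uuid": {"type": "keyword"}, "name": {"type": "text"}}}
same = {"properties": {"name": {"type": "text"}, "uuid": {"type": "keyword"}}}
new = {"properties": {"uuid": {"type": "text"}}}

print(index_name("node", old) == index_name("node", same))  # same structure, same name
print(index_name("node", old) == index_name("node", new))   # changed structure, new name
```

After an update that changes the structure, the new name simply does not match any existing index, so an incompatible index is never reused by accident.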

Sync with lookup

Graph→ES

Iterate over elements in the graph
Read uuid and version of 1000 elements
Check whether the elements exist in the index
Store missing elements in the index

Rinse and repeat until all elements have been processed.
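The Graph→ES loop above could be sketched roughly like this. The index is modelled as a plain dict instead of a real Elasticsearch client, and the tuple layout and function names are assumptions. Treating a version mismatch the same as a missing element is also an assumption that goes slightly beyond the steps listed above.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items until the iterable is exhausted."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def sync_graph_to_index(graph_elements, index, batch_size=1000):
    """Store elements that are missing or outdated in the index.

    graph_elements: iterable of (uuid, version, document) tuples (assumed shape).
    index: dict mapping uuid -> (version, document), standing in for ES.
    Returns the number of documents written.
    """
    stored = 0
    for chunk in batches(graph_elements, batch_size):
        for uuid, version, document in chunk:
            indexed = index.get(uuid)
            # Missing entirely, or indexed with a different version.
            if indexed is None or indexed[0] != version:
                index[uuid] = (version, document)
                stored += 1
    return stored
```

Against a real cluster, the per-batch existence check and the writes would each be a single bulk request rather than per-element lookups.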

ES ←Graph

Set up a scroll query over all elements in the index
Check whether the encountered element exists in the graph
Remove elements from the index which have no graph representation
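The reverse direction could be sketched as follows. The scroll query is replaced by iterating over a dict, and the set of graph uuids is assumed to be available as an in-memory lookup; both are simplifications for illustration.

```python
def remove_orphans(index, graph_uuids):
    """Delete index entries that have no graph representation.

    index: dict mapping uuid -> document, standing in for the ES index.
    graph_uuids: set of uuids that exist in the graph (assumed lookup).
    Returns the list of removed uuids.
    """
    # Collect first, then delete, so the dict is not modified while iterating.
    orphaned = [uuid for uuid in index if uuid not in graph_uuids]
    for uuid in orphaned:
        del index[uuid]
    return orphaned
```

Running this pass after the Graph→ES pass leaves the index with exactly the elements that exist in the graph.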

Pro:

Ensures easy consistency

Con:

Can potentially take longer to sync
Requires additional tracking of document versions in the search index

Snapshot Handling / Journal Handling

Another possible solution:
We could create snapshots of the Elasticsearch indices periodically (e.g. once per day):
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html#_snapshot

If Mesh was shut down uncleanly, the snapshot is restored on startup and only the changes made since the last snapshot need to be applied to the restored indices. For this to work, the graph needs to store the changes (deletions) made since the last snapshot.
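A minimal sketch of the restore-and-replay idea, with the snapshot and the restored index modelled as dicts. The journal entry layout `(sequence, operation, uuid, document)`, the sequence numbers and the operation names are assumptions; recording upserts in the journal in addition to deletions is also an assumption.

```python
def restore_and_replay(snapshot, journal, snapshot_seq):
    """Rebuild the index from a snapshot plus the journal entries after it.

    snapshot: dict uuid -> document, the state at snapshot time.
    journal: iterable of (seq, op, uuid, document) in ascending seq order,
             where op is "upsert" or "delete" (hypothetical format).
    snapshot_seq: sequence number of the last change contained in the snapshot.
    """
    index = dict(snapshot)  # "restore" without touching the snapshot itself
    for seq, op, uuid, document in journal:
        if seq <= snapshot_seq:
            continue  # already part of the snapshot
        if op == "delete":
            index.pop(uuid, None)
        else:
            index[uuid] = document
    return index
```

The cost of the sync is then proportional to the number of journal entries since the snapshot rather than to the total number of elements.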

Pro:

Results in a quicker sync in most cases

Con:

Requires more effort to track changes after snapshots have been taken

Jotschi · 2018-04-09T11:56:21Z

Released with 0.18.0

Jotschi added f/elasticsearch enhancement labels Jan 5, 2018

Jotschi mentioned this issue Jan 23, 2018

Update Elasticsearch from 2.4 to 6.1 #169

Closed

16 tasks

Jotschi changed the title ~~Add differential ElasticSearch sync~~ Add differential Elasticsearch sync Feb 14, 2018

Jotschi added this to the 1.0.0 milestone Mar 2, 2018

Jotschi added the feature label Mar 3, 2018

gentics deleted a comment from elbird Mar 16, 2018

Jotschi closed this as completed Apr 9, 2018

Jotschi removed this from the 1.0.0 milestone Apr 9, 2018

Jotschi self-assigned this Apr 9, 2018
