Add differential Elasticsearch sync #247

Closed
2 of 4 tasks
Jotschi opened this issue Jan 5, 2018 · 1 comment

Comments

@Jotschi
Contributor

Jotschi commented Jan 5, 2018

We currently drop the whole index and reindex everything if we detect an unclean shutdown. We should add a way to scan for differences between ES and Mesh and only sync the needed data. This mechanism is especially important when dealing with an external Elasticsearch cluster.

Questions

  • Can we encode the version of the document within the documentId? This would allow us to quickly decide whether a found document needs to be removed (it may be outdated).
  • Node containers have versions. We can use that version for containers but what versions are we using for Users, Roles etc.?
  • Should we encode the element version in the document id?
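To make the document id question concrete, here is a minimal sketch of one possible encoding. It is not what Mesh does: the `uuid-version` format, the function names, and the assumption that the version string never contains a `-` are all hypothetical.

```python
from typing import Dict, Tuple


def encode_document_id(uuid: str, version: str) -> str:
    """Combine element uuid and version into one document id (assumed format)."""
    return f"{uuid}-{version}"


def decode_document_id(document_id: str) -> Tuple[str, str]:
    """Split at the last '-'; assumes the version itself contains no '-'."""
    uuid, _, version = document_id.rpartition("-")
    return uuid, version


def is_outdated(document_id: str, current_versions: Dict[str, str]) -> bool:
    """A document is outdated if the graph holds a different version or no element at all."""
    uuid, version = decode_document_id(document_id)
    return current_versions.get(uuid) != version
```

With such an id, the decision whether a found document must be removed needs only the id and the current version from the graph, not the document source.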

Tasks

  • Add differential index sync
  • Add version number to each element which can be indexed.
  • Failure handling for ES must be added
  • We could also include an ES-Mesh structure revision hash in the index name to avoid using incompatible indices after a Mesh update.
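The revision hash idea from the last task could look roughly like this. It is only a sketch under assumptions: the mapping is taken to be a JSON-serialisable dict, and `index_name`, the SHA-256 choice, and the 8-character suffix are illustrative, not part of Mesh.

```python
import hashlib
import json


def index_name(base: str, mapping: dict) -> str:
    """Append a short hash of the index structure to the base index name."""
    # Serialise with sorted keys so that key order does not change the hash.
    canonical = json.dumps(mapping, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
    return f"{base}-{digest}"
```

A changed mapping then yields a different index name, so an index created with an older structure is simply not picked up after an update.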

Sync with lookup

Graph→ES

  1. Iterate over elements in the graph, reading the uuid and version of 1000 elements at a time
  2. Check whether the elements exist in the index
  3. Store missing elements in the index

Rinse and repeat until all elements have been processed.
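The Graph→ES pass could be sketched as follows. This is an illustration, not Mesh's implementation: the index is modelled as a plain dict mapping uuid to version, `batches` and `sync_graph_to_index` are hypothetical names, and treating a version mismatch the same as a missing document is an assumption.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items until the iterable is exhausted."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def sync_graph_to_index(graph_elements, index, batch_size=1000):
    """Store every (uuid, version) pair that the index lacks or holds in another version.

    `index` is an in-memory stand-in for the search index (dict: uuid -> version).
    Returns the number of documents that were stored.
    """
    stored = 0
    for batch in batches(graph_elements, batch_size):
        for uuid, version in batch:
            if index.get(uuid) != version:
                index[uuid] = version
                stored += 1
    return stored
```

Against a real cluster the per-batch existence check and the store step would be bulk requests; the dict only shows the control flow.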

ES ←Graph

  1. Set up a scroll query over all elements in the index
  2. Check whether the encountered element exists in the graph
  3. Remove elements from the index which have no graph representation
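The cleanup pass in the other direction can be sketched in the same style. Again an illustration under assumptions: the index is a dict (uuid to version), iterating over its keys stands in for the scroll query, and `remove_orphans` is a hypothetical name.

```python
def remove_orphans(index, graph_uuids):
    """Delete index entries whose uuid has no element in the graph.

    `index` is an in-memory stand-in for the search index; `graph_uuids`
    is the set of uuids currently present in the graph.
    Returns the list of removed uuids.
    """
    # Collect first, then delete, so the dict is not modified while iterating.
    orphaned = [uuid for uuid in index if uuid not in graph_uuids]
    for uuid in orphaned:
        del index[uuid]
    return orphaned
```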

Pro:

  • Ensures easy consistency

Con:

  • Can potentially take longer to sync
  • Requires additional tracking of document versions in the search index

Snapshot Handling / Journal Handling

Another possible solution:
We could create snapshots of the Elasticsearch indices periodically (e.g. once per day):
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html#_snapshot

If Mesh was shut down uncleanly, the snapshot is restored on startup and only the changes since the last snapshot need to be applied to the restored indices. In the graph we need to store changes (deletions) since the last snapshot.
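A minimal sketch of the restore-and-replay step, assuming the graph keeps a journal of `(operation, uuid, version)` entries written since the last snapshot. The journal format, the operation names `store` and `delete`, and `restore_and_replay` are hypothetical; the snapshot is modelled as a dict rather than a real Elasticsearch snapshot.

```python
def restore_and_replay(snapshot, journal):
    """Start from the snapshot state and apply journal entries in order.

    `snapshot` is a dict (uuid -> version) standing in for the restored index;
    `journal` is an iterable of (operation, uuid, version) tuples.
    """
    index = dict(snapshot)  # copy so the snapshot itself stays untouched
    for operation, uuid, version in journal:
        if operation == "delete":
            index.pop(uuid, None)
        elif operation == "store":
            index[uuid] = version
        else:
            raise ValueError(f"unknown journal operation: {operation}")
    return index
```

The sync cost then depends on the journal length rather than on the total number of elements, which is where the quicker sync time would come from.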

Pro:

  • Results in a quicker sync in most cases

Con:

  • Requires more effort to track changes after snapshots have been taken
@Jotschi Jotschi changed the title Add differential ElasticSearch sync Add differential Elasticsearch sync Feb 14, 2018
@Jotschi Jotschi added this to the 1.0.0 milestone Mar 2, 2018
@Jotschi Jotschi added the feature label Mar 3, 2018
@gentics gentics deleted a comment from elbird Mar 16, 2018
@Jotschi
Contributor Author

Jotschi commented Apr 9, 2018

Released with 0.18.0

@Jotschi Jotschi closed this as completed Apr 9, 2018
@Jotschi Jotschi removed this from the 1.0.0 milestone Apr 9, 2018
@Jotschi Jotschi self-assigned this Apr 9, 2018