A unified event store for Aleph and ingest-file #2128

Open
sunu opened this issue Feb 11, 2022 · 0 comments

sunu commented Feb 11, 2022

Problems that an event store can solve

With an event store we can tackle a few problems:

  1. It will help us maintain ingest logs for collections and entities, which will help us keep track of errors and show users the history of ingest events for a particular collection or entity. (See Improve the UX for bulk uploading and processing of large number of files #2124 (comment))
  2. It will help us store notifications for users and collections. Currently, we use Elasticsearch to store notifications.
  3. We can use it as an audit log to audit
    • the actions of a user
    • the actions in a collection
    • the actions on an entity
  4. ftm-store is already a kind of event store: we store entity fragments and combine them to generate full entities.
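To make the problems above concrete, here is a minimal sketch of what a single event record could look like. All field names are assumptions for illustration, not an agreed schema: one record type covers ingest logs (scoped to a collection or entity), notifications, and audit entries (scoped to an actor).

```python
# Hypothetical event record; field names are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Event:
    event_type: str                      # e.g. "ingest.error", "entity.updated"
    collection_id: Optional[str] = None  # scope: the collection the event belongs to
    entity_id: Optional[str] = None      # scope: the entity, if applicable
    actor: Optional[str] = None          # user or service that triggered the event
    payload: dict = field(default_factory=dict)  # free-form event details
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

# An ingest error for a collection, attributable to a user for auditing:
event = Event("ingest.error", collection_id="c1", actor="alice",
              payload={"file": "broken.pdf", "error": "unsupported format"})
```

The same shape would serve all four use cases: filter by `collection_id` or `entity_id` for ingest history, by `actor` for audit, and by `event_type` for notifications.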

Features we need

  1. The ability to store events permanently
    • We need these events to persist until they are explicitly flushed
  2. The ability to query and sort these events.
    • ftm-store needs to be able to sort fragments by entity ID, origin, etc.
    • for audit, we may need to query for events from a particular user within a certain time frame
  3. The ability to run this locally without a complex setup.
    • This is helpful if we are generating FtM locally outside Aleph
    • It will also be helpful if we implement a way to ingest files offline to generate ftm-bundles (See Run ingestors on the CLI to generate "ftm-bundles" ingest-file#223). We'll need a way to store ingest logs locally so that we can aggregate them and print out a report at the end of the process.
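For the offline/CLI case, the end-of-run report could be as simple as aggregating locally stored log events once ingest finishes. The event dicts and their keys below are assumptions for illustration:

```python
# Hypothetical end-of-run report over locally stored ingest-log events.
from collections import Counter

events = [
    {"type": "ingest.success", "file": "a.pdf"},
    {"type": "ingest.success", "file": "b.docx"},
    {"type": "ingest.error", "file": "c.xls", "error": "unsupported format"},
]

# Count outcomes, then list the failures individually.
counts = Counter(e["type"] for e in events)
print(f"Processed {len(events)} files: "
      f"{counts['ingest.success']} succeeded, {counts['ingest.error']} failed")
for e in events:
    if e["type"] == "ingest.error":
        print(f"  {e['file']}: {e['error']}")
```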

Possible implementation methods

  • The industry seems to like Kafka as an event store.
    • Kafka can store events permanently.
    • But I couldn't find an easy way to query or sort the stored data without replaying it from the beginning. One possible way to do it is by running something like KSQL on top of Kafka.
    • Running Kafka for small local jobs seems like overkill.
  • The other option is to use SQL.
    • We already use PostgreSQL/SQLite for ftm-store. It can of course store the data permanently.
    • It's easy to query and sort the stored data
    • For local jobs, SQLite is a great option.
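The SQL option covers all three requirements with the standard library alone. A rough sketch using Python's built-in `sqlite3` (table and column names are illustrative, not an actual Aleph schema):

```python
# Sketch of a SQLite-backed event store; schema is a hypothetical example.
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path gives permanent storage
conn.execute("""
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        event_type TEXT NOT NULL,
        actor TEXT,
        collection_id TEXT,
        created_at TEXT NOT NULL
    )
""")

# Requirement 1: events persist until explicitly flushed (DELETE).
conn.executemany(
    "INSERT INTO events (event_type, actor, collection_id, created_at) "
    "VALUES (?, ?, ?, ?)",
    [
        ("ingest.start", "alice", "c1", "2022-02-01T10:00:00"),
        ("ingest.error", "alice", "c1", "2022-02-01T10:05:00"),
        ("entity.update", "bob", "c2", "2022-02-02T09:00:00"),
    ],
)

# Requirement 2: query and sort, e.g. audit one user's actions in a time frame.
rows = conn.execute(
    "SELECT event_type, created_at FROM events "
    "WHERE actor = ? AND created_at BETWEEN ? AND ? "
    "ORDER BY created_at",
    ("alice", "2022-02-01T00:00:00", "2022-02-01T23:59:59"),
).fetchall()
# rows now holds alice's two Feb 1 events, oldest first
```

Requirement 3 comes for free: `sqlite3` ships with Python, so the same code runs locally with no server, and swapping the connection for PostgreSQL covers the hosted Aleph case.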