Tracking: deprecate safe epoch and generalize time travel query #18214

Open
4 tasks
wenym1 opened this issue Aug 23, 2024 · 2 comments

Comments

@wenym1
Contributor

wenym1 commented Aug 23, 2024

Proposal

Generalize time travel query to all batch queries, so that every batch query is handled as a time travel query.

In a single HummockVersion, we only provide a single view at the committed epoch, rather than views at every epoch between safe_epoch and committed_epoch. As a result, safe_epoch can be deprecated.

Moreover, we need to deprecate support for consistent barrier reads on uncommitted epochs.

Motivation

Currently, safe_epoch in HummockVersion specifies that, in this HummockVersion, it is safe to query any epoch above safe_epoch. In other words, a single HummockVersion supports querying multiple versions of the data at different epochs. This feature exists because each CN holds only a single latest HummockVersion (ignoring the versions pinned by already-created iterators), while in the frontend each session pins an epoch (PinnedSnapshot), and we want to serve queries at different pinned epochs from that single latest HummockVersion.
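To make the difference concrete, here is a minimal sketch contrasting the current epoch check with the proposed one. `SafeEpochVersion` and both method names are hypothetical stand-ins rather than RisingWave's actual types, and treating the lower bound as inclusive is an assumption.

```rust
/// Hypothetical, simplified stand-in for a HummockVersion.
/// Only the two epochs relevant to this discussion are modeled.
#[derive(Debug, Clone, Copy)]
pub struct SafeEpochVersion {
    pub safe_epoch: u64,
    pub committed_epoch: u64,
}

impl SafeEpochVersion {
    /// Current model: one version serves every epoch from safe_epoch
    /// up to committed_epoch (inclusive lower bound is an assumption).
    pub fn can_serve_with_safe_epoch(&self, query_epoch: u64) -> bool {
        self.safe_epoch <= query_epoch && query_epoch <= self.committed_epoch
    }

    /// Proposed model: one version serves exactly one view,
    /// the one at its committed epoch.
    pub fn can_serve_committed_only(&self, query_epoch: u64) -> bool {
        query_epoch == self.committed_epoch
    }
}
```

With `safe_epoch = 10` and `committed_epoch = 20`, a query at epoch 15 passes the first check but not the second, so under the proposal it would need a version of its own.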

This design makes the communication between frontend and CN elegant, but it comes at a price:

  • In Hummock, we may store multiple versions of a key's value at different epochs, and these values are physically stored next to each other in the SST. Most streaming internal states only read the latest value of a key, so storing multiple versions incurs unnecessary cost.
  • We need to maintain and even persist the pinned snapshot, so that the frontend is not affected when the meta node crashes and restarts.
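The first cost can be illustrated with a small in-memory sketch. `SstSketch`, `get_at`, and `keep_latest_only` are hypothetical names, a `BTreeMap` stands in for the sorted SST layout, and deletes (tombstones) are ignored. This is an illustration under those assumptions, not Hummock's actual storage or compaction code.

```rust
use std::cmp::Reverse;
use std::collections::BTreeMap;

/// Hypothetical stand-in for an SST: entries are sorted by key and then
/// by epoch in descending order, so versions of one key sit side by side.
pub type SstSketch = BTreeMap<(String, Reverse<u64>), String>;

/// Return the newest value of `key` whose epoch is <= `read_epoch`.
pub fn get_at<'a>(sst: &'a SstSketch, key: &str, read_epoch: u64) -> Option<&'a String> {
    let start = (key.to_string(), Reverse(read_epoch));
    let end = (key.to_string(), Reverse(0));
    // Within one key, iteration order is newest epoch first.
    sst.range(start..=end).next().map(|(_, value)| value)
}

/// Keep only the newest version of each key, which is all a reader of
/// the latest state needs.
pub fn keep_latest_only(sst: &SstSketch) -> SstSketch {
    let mut out = SstSketch::new();
    let mut last_key: Option<&String> = None;
    for ((key, epoch), value) in sst {
        // The first entry seen for a key is its newest version.
        if last_key != Some(key) {
            out.insert((key.clone(), *epoch), value.clone());
            last_key = Some(key);
        }
    }
    out
}
```

After `keep_latest_only`, a read at an older epoch finds nothing for that key. This is why serving older epochs from the same version forces the extra versions to be retained.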

Once time travel is supported in batch queries, serving queries at different epochs no longer has to rely on a single hummock version; instead, we can rebuild a hummock version for a specific epoch. Therefore, we can generalize time travel query to all batch queries: for every batch query, we first figure out a hummock version for the provided epoch, either by using the latest version or by rebuilding a new one, and then read data from that version. Each hummock version then no longer needs to store multiple versions of a key, and safe_epoch can be deprecated.
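The version lookup described above might be sketched as follows. `EpochVersion`, `ResolvedVersion`, `resolve_version`, and the `rebuild` callback are all hypothetical; in particular, `rebuild` only stands in for whatever the time travel machinery does to reconstruct a version, and its failure case (for example, an epoch that is no longer retained) is an assumption.

```rust
/// Hypothetical stand-in for a hummock version exposing a single view.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EpochVersion {
    pub committed_epoch: u64,
}

/// Where the version used by a batch query came from.
#[derive(Debug, PartialEq, Eq)]
pub enum ResolvedVersion {
    /// The query epoch matches the latest version, so reuse it.
    Latest(EpochVersion),
    /// The query epoch is older, so a version was rebuilt for it.
    Rebuilt(EpochVersion),
}

/// Figure out a version for `query_epoch`, either from the latest
/// version or by rebuilding one. Returns None when no version can be
/// provided (an assumed failure mode, e.g. epoch not yet committed or
/// no longer rebuildable).
pub fn resolve_version(
    latest: &EpochVersion,
    query_epoch: u64,
    rebuild: impl Fn(u64) -> Option<EpochVersion>,
) -> Option<ResolvedVersion> {
    if query_epoch == latest.committed_epoch {
        Some(ResolvedVersion::Latest(latest.clone()))
    } else if query_epoch < latest.committed_epoch {
        rebuild(query_epoch).map(ResolvedVersion::Rebuilt)
    } else {
        // An epoch above the committed epoch has no committed view yet.
        None
    }
}
```

Because every resolved version exposes exactly one view, none of them has to keep older values of a key around for other readers.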

Besides, we need to deprecate support for consistent barrier reads on uncommitted epochs. Currently, for an uncommitted barrier read, we pin the current uncommitted non-checkpoint epoch and use it in the batch query. However, because the pinned epoch is a non-checkpoint epoch, once the following checkpoint epoch gets committed, the pinned epoch falls below the committed epoch. To support consistent queries on it, the committed version would still have to keep the multiple versions of values between the committed epoch and the previous checkpoint epoch. To make things easier, we can still support barrier read, but the batch query of barrier read won't carry any epoch information anymore. The barrier read batch query always reads the latest uncommitted data of each table, and the consistency is ignored.
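One way to picture the resulting two read paths is an enum in which the barrier read variant carries no epoch at all. `BatchReadMode`, `TableSketch`, and `read` are hypothetical names, and each table is reduced to a single value for brevity; this is a sketch of the idea, not the actual query interface.

```rust
/// Hypothetical read modes for a batch query under the proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BatchReadMode {
    /// Consistent read at a committed epoch, handled as time travel.
    Committed { epoch: u64 },
    /// Barrier read: deliberately carries no epoch information.
    LatestUncommitted,
}

/// Hypothetical table reduced to one committed value and an optional
/// newer value that has not been committed yet.
pub struct TableSketch {
    pub committed_value: i64,
    pub uncommitted_value: Option<i64>,
}

pub fn read(table: &TableSketch, mode: BatchReadMode) -> i64 {
    match mode {
        // Assumes the version for `epoch` was already resolved, so the
        // table exposes exactly the committed view for it.
        BatchReadMode::Committed { .. } => table.committed_value,
        // Each table returns whatever is newest locally; different
        // tables may reflect different epochs, so there is no
        // cross-table consistency.
        BatchReadMode::LatestUncommitted => {
            table.uncommitted_value.unwrap_or(table.committed_value)
        }
    }
}
```

Since `LatestUncommitted` has no epoch field, nothing can be pinned for it, and the committed version never has to retain intermediate values on its behalf.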

Tracking

@github-actions github-actions bot added this to the release-2.1 milestone Aug 23, 2024
@wenym1
Contributor Author

wenym1 commented Aug 23, 2024

cc @hzxa21 @zwang28

@hzxa21
Collaborator

hzxa21 commented Aug 26, 2024

LGTM for the proposal in general!

To make things easier, we can still support barrier read, but the batch query of barrier read won't carry any epoch information anymore. The barrier read batch query always reads the latest uncommitted data of each table, and the consistency is ignored.

+1. Sacrificing consistency for simplicity in the context of read-uncommitted queries sounds reasonable to me. cc @fuyufjh
