Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Protocol Change Request] Delta Coordinated Commits #2598

Open
1 of 3 tasks
prakharjain09 opened this issue Feb 2, 2024 · 9 comments
Open
1 of 3 tasks

[Protocol Change Request] Delta Coordinated Commits #2598

prakharjain09 opened this issue Feb 2, 2024 · 9 comments
Labels
enhancement New feature or request
Milestone

Comments

@prakharjain09
Copy link
Collaborator

prakharjain09 commented Feb 2, 2024

Feature request

Which Delta project/connector is this regarding?

This will be a PROTOCOL change.

Overview

Today’s Delta commit protocol relies on the filesystem to provide commit atomicity. This writer-only table feature request is to allow an external commit-coordinator to decide whether a commit succeeded instead of relying on filesystem atomicity, similar to Iceberg's metastore tables. The filesystem remains the source of truth about the actual content of commits, and (unlike Iceberg metastore tables) filesystem-based readers can still access the table directly. This allows us to deal with various limitations of delta mentioned below in Motivation section.

Motivation

Today’s Delta commit protocol relies on the filesystem to provide commit atomicity, both for durability and mutual exclusion purposes. This is problematic for several reasons:

  1. No reliable way to involve a catalog with the commit
    1.1. Impossible to keep the catalog and the table state in sync, and the catalog cannot reject commit attempts it wouldn’t like, because it doesn’t even know about them until after they become durable (and visible to readers).
    1.2. No clear path to transactions that could span multiple tables and/or involve catalog updates, because filesystem commits cannot be made conditionally or atomically.
  2. No way to tie commit ownership to a table
    2.1. In general, Delta tables have no way to advertise that they are managed by catalog or LogStore X (at endpoint Y)
    2.2. Today a table could be configured to use different log stores in different clusters. Each of the LogStore implementations tries to implement put-if-absent on their own. Since these implementations are not aware of each other, this leads to lost commits and table corruption.
    2.3. The logStore setting today is cluster-wide, so you can't safely mix tables with different logStores.
  3. No way to do commits over multiple tables atomically
    3.1 We need a way where we can commit to multiple tables at once, allowing us to support multi-table transactions.

Further details

The detail proposal and the required protocol changes are sketched out in this doc.

At high level: We propose a new Delta writer table feature, coordinatedCommits. The table feature includes the following capabilities:

  1. Conforming Delta clients will refuse to attempt filesystem-based commits against a table that enables coordinateCommits.
  2. A table that uses coordinated commits can include metadata (dictated by the table’s commit coordinator) that would-be writers can use to correctly perform a commit.
  3. Delta client passes the actions that need to be committed to the commit-store.
    a. The commit-store writes the actions in a unique commit file.
    b. The commit-store then does a commit as per its spec.
  4. If a commit fails due to physical conflict (e.g. two racing blind appends), the client can verify no logical conflicts exist, rebase their actions and then reattempt the commit with commit-store with an updated version, after.
  5. Once commit V has been ratified, it should be copied to V.json (= “backfilled”) in order to bound the amount of history the commit coordinator is required to keep for each table.
  6. The commit-store maintains information about all the un-backfilled commits which will be used by other other delta clients to access the most recent snapshot of the table.
  7. At some point after backfill completes, the commit coordinator deletes the internal mapping for that commit.

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.
@prakharjain09 prakharjain09 added the enhancement New feature or request label Feb 2, 2024
@prakharjain09 prakharjain09 changed the title [Protocol Change Request][Feature Request] Delta Managed Commits [Protocol Change Request] Delta Managed Commits Feb 2, 2024
@tdas
Copy link
Contributor

tdas commented Feb 7, 2024

Can we remove the checkbox for which project/connector does this apply to? This is a protocol change, it will eventually apply to every one!

@felipepessoto
Copy link
Contributor

felipepessoto commented Feb 23, 2024

@prakharjain09, at "Converting a managed-commit table to filesystem table" it says:

  1. Write the commit which removes the ownership.
    • Either the commit-owner writes the backfilled commit file directly.
    • Or it writes an unbackfilled commit and ensures that it is backfilled reliably. Until the backfill is done, table will be in unusable state:
      • the filesystem based delta clients won't be able to write to such table as they still believe that table has managed-commit enabled.
      • the managed-commit aware delta clients won't be able to write to such table as the commit-owner won't accept any new
        commits. In such a scenario, they could backfill required commit themselves (preferably using PUT-if-absent) to unblock themselves.

After the un-backfilled commit, a managed commits aware delta reader could contact the commit-owner to get recent commits. In that case, the commit-owner would return the commit which removes the ownership?
If so, the Delta client would need to be smart to know the ownership removal information is still un-backfilled and don't try to commit to filesystem.

tdas pushed a commit that referenced this issue Mar 20, 2024
…e.getCommits API (#2712)

## Protocol Change Request

### Description of the protocol change
This change puts added responsibility on CommitStore to return the
latestTableVersion in the CommitStore.getCommits API along with the list
of Commits.

Protocol RFC issue: #2598

### Willingness to contribute

The Delta Lake Community encourages protocol innovations. Would you or
another member of your organization be willing to contribute this
feature to the Delta Lake code base?

- [x] Yes. I can contribute.
- [ ] Yes. I would be willing to contribute with guidance from the Delta
Lake community.
- [ ] No. I cannot contribute at this time.
@tdas tdas added this to the 4.0 milestone Mar 25, 2024
@felipepessoto
Copy link
Contributor

@prakharjain09, @sumeet-db, could you help validate if this question is relevant?

@prakharjain09, at "Converting a managed-commit table to filesystem table" it says:

  1. Write the commit which removes the ownership.

    • Either the commit-owner writes the backfilled commit file directly.

    • Or it writes an unbackfilled commit and ensures that it is backfilled reliably. Until the backfill is done, table will be in unusable state:

      • the filesystem based delta clients won't be able to write to such table as they still believe that table has managed-commit enabled.
      • the managed-commit aware delta clients won't be able to write to such table as the commit-owner won't accept any new
        commits. In such a scenario, they could backfill required commit themselves (preferably using PUT-if-absent) to unblock themselves.

After the un-backfilled commit, a managed commits aware delta reader could contact the commit-owner to get recent commits. In that case, the commit-owner would return the commit which removes the ownership? If so, the Delta client would need to be smart to know the ownership removal information is still un-backfilled and don't try to commit to filesystem.

@prakharjain09
Copy link
Collaborator Author

After the un-backfilled commit, a managed commits aware delta reader could contact the commit-owner to get recent commits. In that case, the commit-owner would return the commit which removes the ownership?
If so, the Delta client would need to be smart to know the ownership removal information is still un-backfilled and don't try to commit to filesystem.

@felipepessoto Thats correct. The Delta client would need to be smart enough to know:

  • The table was recently converted back to filesystem table
  • The table still has commits which are not backfilled
  • The delta client should not write next commit Y until everything before Y is backfilled.
  • The delta client should infact figure out this missing backfilled commit from LogSegment and do the backfilling itself if possible.

This above requirements could be baked in the managed-commit spec.
Essentially if the table has managed-commit enabled, then the writer has to do following extra steps if it is not "active" (i.e. table doesn't have an owner):

  • Identify unbackfilled commits and backfill them if needed before writing filesystem based commit Y.

"managed-commit enabled" => PROTOCOL has managed-commit enabled
"managed-commit active" => PROTOCOL has managed-commit enabled and Metadata contains an owner

So when a user wants to downgrade and remove managed-commits completely from the table (for compat reasons etc), it will be a 2 step process:

  • Commit X: Just removed the table owner - managed-commit is still enabled, just not active
  • Commit X+1: Remove the table owner from PROTOCOL also - managed-commit is disabled

@scovich
Copy link
Collaborator

scovich commented Mar 29, 2024

This feature request is to allow delta tables whose source of truth for the actual commit is an external commit-owner and not the filesystem (s3, abfs etc)

This is confusing to readers. We should clarify to match what the design actually calls for:

This writer-only table feature request is to allow an external commit-owner to decide whether a commit succeeded instead of relying on filesystem atomicity, similar to Iceberg's metastore tables. The filesystem remains the source of truth about the actual content of commits, and (unlike Iceberg metastore tables) filesystem-based readers can still access the table directly.

@adriangb
Copy link

I came here from delta-io/delta-rs#2455 (comment). After giving the proposal a quick read (apologies if I’m missing something) I fear that it may not address or improve my use case in delta-io/delta-rs#2455 (comment). In particular, this proposal does not do much for concurrent writers. In fact, it goes a bit further towards prohibiting the approach I’m exploring than I’ve found in any other spec: “ Multiple clients might try to contact the commit-store for attempting the same commit version. In such a case, the commit-store chooses which request to accept (if any). The commit-store rejects all other requests for that commit version and responds with error - this tells the client that some other writer has succeeded and it should rebase itself.”

Could this proposal allow for commit-manager to hand out non-consecutive commit ids? For the specific case of appending new data to a table (not modifying it, not depending on existing data) this works quite well. I feel like it’s a reasonable use case. Maybe commit-manager can take a flag that says “this commit doesn’t care about existing data and can happen at any time” to adopt this behavior?

@prakharjain09
Copy link
Collaborator Author

Yes managed commits does not solve usecase mentioned in delta-io/delta-rs#2455 (comment) . Rebasing after a physical conflict is needed because things like rowids / inCommitTimstamp and schema evolution do in fact affect even supposedly blind appends.

That said, supporting blind appends better is an interesting use case to think about (in order to avoid quadratic cost in commit retries) -- irrespective of this proposal.

@adriangb
Copy link

adriangb commented Jun 5, 2024

That said, supporting blind appends better is an interesting use case to think about (in order to avoid quadratic cost in commit retries) -- irrespective of this proposal.

Agreed!

@prakharjain09 prakharjain09 changed the title [Protocol Change Request] Delta Managed Commits [Protocol Change Request] Delta Coordinated Commits Jun 7, 2024
@prakharjain09
Copy link
Collaborator Author

Renaming "managed commit" to "coordinated commit" to avoid ambiguity with "managed tables". This feature enables coordination with any kind of coordinator - catalogs, databases, key value stores, etc. so is completely independent of catalog-defined table types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Development

No branches or pull requests

5 participants