
Transactional/ACID semantics #150

Open · y4n9squared opened this issue Apr 4, 2024 · 1 comment

y4n9squared commented Apr 4, 2024

I have a general question regarding this sentence in the blog:

Safety of parallel operations when many machines are accessing the same dataset is achieved through the use of optimistic concurrency, which maintains compatibility with diverse underlying storage layers (including Cloud storage platforms, such as GCS, as well as local filesystems) without significantly impacting performance. TensorStore also provides strong ACID guarantees for all individual operations executing within a single runtime.
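To make sure I understand the model: I'd expect the optimistic concurrency described here to look roughly like this compare-and-swap loop (a generic sketch with hypothetical read_with_generation / write_if_generation helpers, not TensorStore's actual API):

def update_chunk(store, key, modify):
    # Hypothetical helpers stand in for a storage layer that supports
    # conditional writes (e.g. GCS object generations); this is a sketch
    # of the concept only, not TensorStore's actual API.
    while True:
        value, generation = store.read_with_generation(key)
        new_value = modify(value)
        # Succeeds only if the object is still at `generation`; otherwise
        # another writer committed first and we retry against the new value.
        if store.write_if_generation(key, new_value, generation):
            return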

I created a dummy dataset with the zarr + S3 drivers (a sketch of the setup follows the listing below):

2024-04-04 15:33:22        230 ts/yang-test-dataset/.zarray
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.0
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.1
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.2
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.3
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.4
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.5
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.6
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.7
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.8
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.9
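For context, roughly how such a dataset can be created; the bucket name, shape, chunks, and dtype below are illustrative guesses consistent with the listing, not the exact spec used:

import tensorstore as ts

# Assumed spec: the bucket, shape, chunks, and dtype are illustrative,
# chosen to be consistent with the ten 0.0.* chunk keys in the listing.
spec = {
    'driver': 'zarr',
    'kvstore': {
        'driver': 's3',
        'bucket': 'my-bucket',        # hypothetical bucket
        'path': 'ts/yang-test-dataset/',
    },
    'metadata': {
        'shape': [100, 110, 10],
        'chunks': [100, 110, 1],      # one chunk per index along dim 2
        'dtype': '<f8',
    },
}
ds = ts.open(spec, create=True).result()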

and then created a situation where the next write to chunk 0.0.3 would fail. Running under a transaction

with ts.Transaction() as txn:
    ds.with_transaction(txn)[80:82, 99:102, :] = [[[1], [2], [3]], [[4], [5], [6]]]

would throw

Traceback (most recent call last):
  File "/home/yang.yang/workspaces/tensorstore/.yang/foo.py", line 33, in <module>
    with ts.Transaction() as txn:
ValueError: PERMISSION_DENIED: Error writing "ts/yang-test-dataset/0.0.3": HTTP response code: 403 with body: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>SK6BWG5ESTC2NVJ6</RequestId><HostId>5z3/QZmVne5TyFJUH0A0swSAtyyhsl47I/z7AjULiGmsj1QAtf3JEA6d/TAuWH/ts1xCHJmVucM=</HostId></Error> [source locations='tensorstore/kvstore/s3/s3_key_value_store.cc:777\ntensorstore/kvstore/kvstore.cc:373']

but the S3 bucket after this operation looks like this:

2024-04-04 15:33:22        230 ts/yang-test-dataset/.zarray
2024-04-04 17:14:57      48573 ts/yang-test-dataset/0.0.0
2024-04-04 17:14:58      48573 ts/yang-test-dataset/0.0.1
2024-04-04 17:14:58      48573 ts/yang-test-dataset/0.0.2
2024-04-04 16:43:54      48573 ts/yang-test-dataset/0.0.3  <--- not updated
2024-04-04 17:14:57      48573 ts/yang-test-dataset/0.0.4
2024-04-04 17:14:58      48573 ts/yang-test-dataset/0.0.5
2024-04-04 17:14:57      48573 ts/yang-test-dataset/0.0.6
2024-04-04 17:14:58      48573 ts/yang-test-dataset/0.0.7
2024-04-04 17:14:57      48573 ts/yang-test-dataset/0.0.8
2024-04-04 17:14:57      48573 ts/yang-test-dataset/0.0.9

From the perspective of an observer (who may eventually want to load this dataset again), the operation does not appear to be transactional. So when the blog says transactional within a single runtime, do you mean that the process's view of ds when the context manager exits is transactional, but that otherwise no guarantees are made about the state of the underlying storage?

If one sets

with ts.Transaction(atomic=True) as txn:
    ...

then if a write would span multiple chunks, I see an error

ValueError: Cannot read/write "ts/yang-test-dataset/.zarray" and read/write "ts/yang-test-dataset/0.0.0" as single atomic transaction [source locations='tensorstore/internal/cache/kvs_backed_cache.h:221\ntensorstore/internal/cache/async_cache.cc:660\ntensorstore/internal/cache/async_cache.h:383\ntensorstore/internal/cache/chunk_cache.cc:438\ntensorstore/internal/grid_partition.cc:246\ntensorstore/internal/grid_partition.cc:246\ntensorstore/internal/grid_partition.cc:246']

I'm guessing this is expected, since there is no way to perform a transactional write across multiple S3 objects?
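For what it's worth, I'd expect a write confined to a single chunk to be able to commit atomically if the .zarray read is avoided. An untested sketch, assuming assume_metadata with the full metadata spec from the creation sketch above:

import tensorstore as ts

# Untested sketch: with assume_metadata=True the .zarray object is neither
# read nor written (the metadata in the spec is trusted), so a write that
# touches only one chunk should involve a single S3 key in the transaction.
ds2 = ts.open(spec, assume_metadata=True).result()  # `spec` as sketched above
with ts.Transaction(atomic=True) as txn:
    ds2.with_transaction(txn)[0:2, 0:3, 0] = 0.0    # touches only chunk 0.0.0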

Lastly, on the topic of "optimistic concurrency and compatibility with GCS/other storage layers", since AFAIK S3 does not support conditional PUTs the way that GCS does, is there a possibility of data loss when using S3?

Thanks in advance!

jbms (Collaborator) commented Apr 7, 2024

The S3 support was added recently, but we indeed need to clarify its limitations in the documentation.

S3 lacks conditional write support, so with multiple concurrent writes to the same object it is indeed possible that some writes will be lost.
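Concretely, the lost-update hazard with unconditional PUTs looks like this (pseudocode with hypothetical s3_get / s3_put / merge helpers, not TensorStore internals):

chunk = s3_get(key)               # writers A and B both read version v0
chunk_a = merge(chunk, update_a)  # A's read-modify-write
chunk_b = merge(chunk, update_b)  # B's read-modify-write
s3_put(key, chunk_a)              # A commits
s3_put(key, chunk_b)              # B commits, silently discarding A's update
# With a conditional PUT (as on GCS, keyed on the object generation), B's
# PUT would fail its precondition and B would retry from A's value.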

There is a strategy for implementing atomic writes on S3 under certain assumptions about timestamps, but it would require a list operation in order to read, which may be costly. When using this strategy with OCDBT, only a single list operation would be needed for the manifest, and subsequent reads (using the cached manifest) would be normal read operations. Multi-key atomic transactions could also be supported; currently, a small amount of work remains to actually support both S3 and multi-key atomic operations with OCDBT.
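For reference, OCDBT slots in as a kvstore adapter under the zarr driver; something like the following sketch (the bucket name is illustrative, and per the above, S3 + multi-key atomicity is not fully wired up yet):

import tensorstore as ts

# Sketch: zarr on top of the OCDBT kvstore adapter over S3. Chunk data
# lives inside an OCDBT database rooted at `base`, which is what makes
# multi-key atomic commits possible in principle.
ds = ts.open({
    'driver': 'zarr',
    'kvstore': {
        'driver': 'ocdbt',
        'base': {
            'driver': 's3',
            'bucket': 'my-bucket',    # hypothetical bucket
            'path': 'ts/yang-test-dataset.ocdbt/',
        },
    },
}).result()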
