-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[c++/python] Enable optional global-order writes for sparse arrays #2564
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for the non-Python side of things
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2564 +/- ##
==========================================
+ Coverage 90.21% 90.24% +0.02%
==========================================
Files 37 37
Lines 3975 4038 +63
==========================================
+ Hits 3586 3644 +58
- Misses 389 394 +5
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nguyenv this is a great step forward! Global-order writes are awesome.
But the spelling is a bit off. I went to give this two test drives: one with writing to local disk; the other with writing to a cloud URI. I didn't do the second yet but realized I have some feedback for you after doing the first.
Namely, I checked out this branch & put some prints in to make sure I was getting sort_coords
when I asked for them.
Using a performance script we've been looking at this week
https://gist.github.com/johnkerl/d5002b994caccdfb059efba081f28bb5
I put in a "sort_coords": False
line into the platform_config
.
Then I noticed that sort_coords
was not being used in the code, even though I'd included it in my platform_config
in the above script.
Then I realized that I needed to now pass platform_config
to write
. So far so good.
However, our current nesting is like
platform_config = {
"tiledb": {
"create": {
"key": "value",
...
},
}
}
and then we use TileDBCreateOptions.from_platform_config(platform_config)
to extract out the create_options = platform_config["tiledb"]["create"]
subdict and turn it into more of a struct-accessible form so devs can do create_options.key
: for example:
What confused me -- and is confusing -- is that I had to (a) put sort_coords
into my create
subkey, and (b) pass that to write
.
What we should do is:
- Keep
TileDBCreateOptions
for pulling create-time parameters out ofplatform_config["tiledb"]["create"]
- Make a new
TileDBWriteOptions
for pulling write-time parameters out ofplatform_config["tiledb"]["write"]
- Users need to be instructed how to set their
platform_config["tiledb"]["write"]["sort_coords"] = False
Other that that, I like this! It's a step forward in that it's still 'off by default' and we can subsequently look into connecting this automatically for the times it's correct to automatically do so (CSR and CSC inputs) and not when it's not (pa.Table
input to SparseNDArray
's write
).
Thanks for the feedback @johnkerl. I'll definitely make changes so that this goes into |
Also: I think we should consider the
|
Tested writing to cloud, and it is working. (I had to change the
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks!!!!
…2564) * Add `TileDBWriteOptions`; Set defaults depending on input * Add assertion checks as reminder to remove warnings in 1.13 * Use `submit_and_finalize` for gow to support on cloud
Issue and/or context: #2054
The C++-ification of the SOMA API now allows us to utilize global writes when writing to sparse arrays, thus saving time by skipping the coordinate sorting step when the coordinates are already known to be ordered.
Changes:
sort_coords
setting in thePlatformConfig
, where by default, it is set toFalse
and thus still usesTILEDB_UNORDERED
sort_coords
parameter toSOMAArray::write
andManagedQuery::submit_write