-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade to s3 client v2 #7936
Upgrade to s3 client v2 #7936
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this awesome pr and adding tests @fm3 🎉
Codewise: I only found that you are "explicitly using" net.liftweb.common.
instead importing the package. Could you please tell me why that is?
Testing worked on the first check:
But it seems that the dataset under s3://janelia-cosem-datasets/jrc_mus-nacc-4/jrc_mus-nacc-4.zarr/
is not displayed correctly in webknossos. I also checked this on the master and its the same case there. Maybe the dataset was somehow changed? 🤔 Do you maybe know more about this?
Here is the link to the dataset on this branch's dev instance: https://newsclient.webknossos.xyz/datasets/sample_organization/test-two#2541,2561,381,0,1.3
And here on the master dev instance: https://master.webknossos.xyz/datasets/sample_organization/Test-Import-Int16-S3#2541,2561,381,0,1.3
But as this pr uses a different dataset in the tests, I'd say it is good to go?! (But please clarify why you used net.liftweb.common.
within the code :)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala
Show resolved
Hide resolved
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala
Outdated
Show resolved
Hide resolved
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala
Outdated
Show resolved
Hide resolved
Yep, that’s unrelated. The data did not change afaik, but we never supported reading zstd-compressed zarr2 data. I wrote a separate issue: #7960
That’s right. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your latest commit. IMO this is a great improvement 🎉
Let's mergy merge
@fm3 @MichaelBuessemeyer @normanrz thanks for these updates -- we were just testing with a similar update for our fork (lincbrain@24d2a09) and were about to push upstream 😂 -- quite the coincidence! One follow-up question here -- but since there is now async behavior for the |
@aaronkanzer Oh, I hope you didn’t spend too much effort on something that we now also did? If you are missing features / upgrades like that, feel free to open issues in this repository, we will then discuss if they also make sense for us. There is always a chance that we want to go in similar directions. You are correct, the optimal threading configuration may be affected by this change, especially if you know that you have mostly s3-hosted datasets. Then this change means fewer blocked threads, meaning a lower total number of threads may be more efficient. The values in the default configuration are very rough guesses, and the optmial values will depend on your workload. Did you already make some measurements on this and did you find something new? We’d be interested :) Since in our case we usually have a very diverse set of access paths, with gcs and local file being at least as common as s3. These are not yet properly async, so I do not think we will change the default values at this time. Hope this helps! |
@fm3 Thanks for the response --this helps quite a bit -- ironically, we found the async bottleneck almost at the same time this PR was merged, so all good -- we will definitely open issues in the future nevertheless when applicable. Right now, we are experimenting with AWS instance types vs. Java heap size to see what is the threshold in which rendering and annotating is performant. With a We then bumped the After that experiment, we bumped up to an I think we will still tinker with the thread configuration as we want to maximize performance (although we are coupled to AWS right now) -- will keep you all posted on this thread -- overall; however, we are able to use the open-source WebKNOSSOS framework for our use cases now :) Cc @kabilar |
The new s3 java client lib offers an async api 🎉
URL of deployed dev instance (used for testing):
Steps to test:
TODOs:
Issues: