Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to s3 client v2 #7936

Merged
merged 18 commits into from
Aug 1, 2024
Merged

Upgrade to s3 client v2 #7936

merged 18 commits into from
Aug 1, 2024

Conversation

fm3
Copy link
Member

@fm3 fm3 commented Jul 22, 2024

The new s3 java client lib offers an async api 🎉

URL of deployed dev instance (used for testing):

Steps to test:

TODOs:

  • test range request
  • test directory listing
  • test error handling for directory listing (should be Box.Empty)
  • Region
  • test with custom endpoint
  • Encoding

Issues:


  • Updated changelog
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment

@fm3 fm3 self-assigned this Jul 22, 2024
@fm3 fm3 changed the title WIP: Use java s3 client v2 Upgrade to s3 client v2 Jul 25, 2024
@fm3 fm3 marked this pull request as ready for review July 25, 2024 11:16
Copy link
Contributor

@MichaelBuessemeyer MichaelBuessemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this awesome pr and adding tests @fm3 🎉

Codewise: I only found that you are "explicitly using" net.liftweb.common. instead importing the package. Could you please tell me why that is?

Testing worked on the first check:
But it seems that the dataset under s3://janelia-cosem-datasets/jrc_mus-nacc-4/jrc_mus-nacc-4.zarr/ is not displayed correctly in webknossos. I also checked this on the master and its the same case there. Maybe the dataset was somehow changed? 🤔 Do you maybe know more about this?

Here is the link to the dataset on this branch's dev instance: https://newsclient.webknossos.xyz/datasets/sample_organization/test-two#2541,2561,381,0,1.3
And here on the master dev instance: https://master.webknossos.xyz/datasets/sample_organization/Test-Import-Int16-S3#2541,2561,381,0,1.3

But as this pr uses a different dataset in the tests, I'd say it is good to go?! (But please clarify why you used net.liftweb.common. within the code :)

@fm3
Copy link
Member Author

fm3 commented Jul 31, 2024

But it seems that the dataset under s3://janelia-cosem-datasets/jrc_mus-nacc-4/jrc_mus-nacc-4.zarr/ is not displayed correctly in webknossos.

Yep, that’s unrelated. The data did not change afaik, but we never supported reading zstd-compressed zarr2 data. I wrote a separate issue: #7960

Is there a naming collision that I did not discover?

That’s right. The transformWith method to access exceptions within futures works with scala.util.Try, one flavor of which is scala.util.Failure, creating a naming collision with net.liftweb.common.failure. To make this more explicit, I now opted for renaming imports rather than using the full qualifiers in code. Do you agree that this is more discoverable?

Copy link
Contributor

@MichaelBuessemeyer MichaelBuessemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your latest commit. IMO this is a great improvement 🎉

Let's mergy merge

@fm3 fm3 enabled auto-merge (squash) August 1, 2024 11:12
@fm3 fm3 merged commit ed54ad2 into master Aug 1, 2024
2 checks passed
@fm3 fm3 deleted the new-s3-client branch August 1, 2024 11:31
@aaronkanzer
Copy link

aaronkanzer commented Aug 1, 2024

@fm3 @MichaelBuessemeyer @normanrz thanks for these updates -- we were just testing with a similar update for our fork (lincbrain@24d2a09) and were about to push upstream 😂 -- quite the coincidence!

One follow-up question here -- but since there is now async behavior for the S3DataVault does this mean that threading configuration can change slightly? -- as defined here -- we are continuing to tinker with these values to see if it makes the requests even faster -- thanks in advance for the consideration.

Cc @kabilar @ayendiki

@fm3
Copy link
Member Author

fm3 commented Aug 5, 2024

@aaronkanzer Oh, I hope you didn’t spend too much effort on something that we now also did? If you are missing features / upgrades like that, feel free to open issues in this repository, we will then discuss if they also make sense for us. There is always a chance that we want to go in similar directions.

You are correct, the optimal threading configuration may be affected by this change, especially if you know that you have mostly s3-hosted datasets. Then this change means fewer blocked threads, meaning a lower total number of threads may be more efficient. The values in the default configuration are very rough guesses, and the optmial values will depend on your workload. Did you already make some measurements on this and did you find something new? We’d be interested :)

Since in our case we usually have a very diverse set of access paths, with gcs and local file being at least as common as s3. These are not yet properly async, so I do not think we will change the default values at this time.

Hope this helps!

@aaronkanzer
Copy link

@fm3 Thanks for the response --this helps quite a bit -- ironically, we found the async bottleneck almost at the same time this PR was merged, so all good -- we will definitely open issues in the future nevertheless when applicable.

Right now, we are experimenting with AWS instance types vs. Java heap size to see what is the threshold in which rendering and annotating is performant. With a m5.2xlarge without JAVA_OPTS defined/overwritten, we quickly observed java heap out-of-memory issues for the API (this was with our highest resolution zarr asset that pulls from S3/zooms down to the ~0.43 micron level)

We then bumped the JAVA_OPTS on the m5.2xlarge to JAVA_OPTS=-Xmx24g -Xms12g and saw that with two concurrent users, via observations from jstat, that the usage of the heap size entered in the high 90% percentiles with degraded performance -- this performance included slow round-trip response times from the server.

After that experiment, we bumped up to an r5.2xlarge with JAVA_OPTS=-Xmx48g -Xms24g and saw much, much better performance with the same asset with concurrent users.

I think we will still tinker with the thread configuration as we want to maximize performance (although we are coupled to AWS right now) -- will keep you all posted on this thread -- overall; however, we are able to use the open-source WebKNOSSOS framework for our use cases now :)

Cc @kabilar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade to AWS Client lib v2, use new async API?
3 participants