
sqlite-specific performance tuning #139

Merged: 1 commit merged into zhaofengli:main on Jul 9, 2024

Conversation

@NLincoln (Contributor) commented on Jun 9, 2024

I use attic on a "homelab" (really an old tower in my office), and I noticed that pushing to attic-server was almost unbearably slow for large files.

I noticed some strange behavior: the upload would race to 100%, then hang there for a while, reducing throughput.

To investigate, I added some OpenTelemetry instrumentation and sent the data to a free Honeycomb account. Disclosure: Honeycomb is my employer. That code is not in this PR, but I can PR it upon request; it's just standard tracing-opentelemetry code and isn't vendor-specific :)

Anyways, the initial investigation proved fruitful:

[screenshot: trace of a single upload_chunk call, before this change]

In the above, I'm tracing the `upload_chunk` function. Notably, the `upload_file` span is the bit that actually writes the file to disk; all the other spans are doing SQLite work.
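
(The instrumentation itself isn't in this PR; it's just `tracing`'s `#[instrument]` attribute on the relevant functions plus a `tracing-opentelemetry` layer on the subscriber to export the spans. A minimal sketch, with hypothetical signatures rather than the real attic-server ones:)

```rust
use tracing::instrument;

// Hypothetical signatures, just to show the shape of the instrumentation;
// the real upload_chunk/upload_file live in attic-server.
// #[instrument] opens a span per call, so the SQLite work and the actual
// file write show up as separate spans in the trace.
#[instrument(skip(data))]
async fn upload_chunk(hash: String, data: Vec<u8>) {
    // ...SQLite bookkeeping (the other spans in the trace) happens here...
    upload_file(&hash, &data).await;
}

#[instrument(skip(data))]
async fn upload_file(hash: &str, data: &[u8]) {
    // ...write the chunk to the storage backend (the `upload_file` span)...
    let _ = (hash, data);
}
```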

In aggregate, uploading chunks looked like this:
[screenshot: aggregate upload_chunk latency, before this change]

450ms, on average, per chunk! Even with the default parallelism of 10, that's still a ton of time for a meaningful number of chunks.
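
For scale: at ~450 ms per chunk with 10 chunks in flight, the metadata store alone caps uploads at roughly 10 / 0.45 s ≈ 22 chunks per second, no matter how fast the disk is.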

Here is the above graph, after this change:

[screenshot: aggregate upload_chunk latency, after this change]

And here is a trace:

[screenshot: trace of a single upload_chunk call, after this change]

And here's an aggregate view of those subspans:

[screenshot: aggregate view of the upload_chunk subspans, after this change]

As you can see, the work of actually writing the file takes up the majority of the trace time, which makes sense.
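
The tuning is along these lines (a sketch using sqlx's `SqliteConnectOptions`; `synchronous=NORMAL` is the key part, WAL journaling is assumed, and attic-server's actual connection setup may not look exactly like this):

```rust
use std::str::FromStr;
use sqlx::sqlite::{SqliteConnectOptions, SqliteJournalMode, SqliteSynchronous};

// Sketch only: the pragmas are standard SQLite, but attic-server may apply
// them through a different code path.
fn tuned_options(url: &str) -> Result<SqliteConnectOptions, sqlx::Error> {
    Ok(SqliteConnectOptions::from_str(url)?
        // WAL lets readers proceed while a writer appends to the log.
        .journal_mode(SqliteJournalMode::Wal)
        // In WAL mode, NORMAL defers fsyncs to checkpoints instead of
        // syncing on every commit.
        .synchronous(SqliteSynchronous::Normal))
}
```

The SQLite default is `synchronous=FULL`, which fsyncs on every commit; for per-chunk metadata transactions that cost adds up quickly, which is consistent with the spans above.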

@NLincoln (Contributor, Author) commented on Jun 9, 2024

Running it this morning, the throughput still isn't as high as I'd like, and the overhead seems to get really bad when uploading multiple large files in parallel. It's better, but in an ideal world we'd be fully constrained by the throughput of the disk rather than the throughput of the metadata store.

@zhaofengli (Owner) commented

Thanks for the investigation! The new settings seem reasonable, so let's merge this for now. `synchronous=normal` should also help with #113, even though I haven't done any testing.
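
(For context: in WAL mode, `synchronous=normal` only risks losing the most recently committed transactions after a power loss; it cannot corrupt the database, so it's generally considered a safe trade-off.)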

@zhaofengli merged commit ee8f374 into zhaofengli:main on Jul 9, 2024
11 checks passed