sqlite-specific performance tuning #139
Merged
I use attic on a "homelab" (really an old tower in my office), and I noticed that pushing to attic-server was almost unbearably slow for large files.
I noticed some strange behavior: the upload would race to 100%, but then hang there for a while, which dragged down overall throughput.
To investigate, I added some OpenTelemetry instrumentation and sent the traces to a free Honeycomb account. Disclosure: Honeycomb is my employer. That code is not in this PR, but I can open a separate PR for it upon request - it's just standard tracing-opentelemetry code, and isn't vendor-specific :)
Anyways, the initial investigation proved fruitful:
In the above, I'm tracing the upload_chunk function. Notably, the "upload_file" span is the only part that actually writes the file to disk: all the other spans are doing sqlite work.
In aggregate, uploading chunks looked like this:
450ms, on average, per chunk! Even with the default parallelism of 10, that's still a ton of time for a meaningful number of chunks.
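To put that number in perspective, here is a back-of-the-envelope sketch of the arithmetic. It assumes chunks are processed in "waves" of `parallelism` at a time and that every wave takes roughly the average per-chunk time; that is a simplification, not a description of how attic actually schedules uploads, and the function name `estimated_upload_time` is hypothetical.

```rust
use std::time::Duration;

/// Rough estimate of total upload time (hypothetical helper, not attic code).
/// Assumes chunks are uploaded in waves of `parallelism` chunks, and that
/// each wave takes about `per_chunk` to complete.
fn estimated_upload_time(chunks: u32, parallelism: u32, per_chunk: Duration) -> Duration {
    assert!(parallelism > 0, "parallelism must be at least 1");
    // Ceiling division: a partially filled final wave still costs a full wave.
    let waves = (chunks + parallelism - 1) / parallelism;
    per_chunk * waves
}
```

For example, with a made-up count of 1000 chunks, `estimated_upload_time(1000, 10, Duration::from_millis(450))` comes out to 45 seconds, nearly all of it spent on sqlite work rather than on writing the file.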
Here is the above graph, after this change:
And here is a trace:
And here's an aggregate view of those subspans:
As you can see, the work of actually writing the file takes up the majority of the trace time, which makes sense.