share a pool of (un)marshallers #38
Conversation
Also, use a cloner instead of unmarshalling when wrapping an object.
Increasing `numWorkers` doesn't seem to help at all (not surprising). Really, we could probably drop to NumCPUs (instead of 2x that) and we'd still be fine.
More than that doesn't help. +1 helps for the case where a goroutine holding a worker gets unscheduled.
encoding/cloner.go
// PooledCloner is a thread-safe pooled object cloner.
type PooledCloner struct {
	Count   int
	cloners chan refmt.Cloner
is this better than a sync.Pool?
I was under the impression that the elements in a pool get GCed every GC cycle. However, it does appear faster in the benchmarks and we really care more about throughput than latency so I've switched to pools.
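For reference, the two pooling strategies being weighed here can be sketched roughly like this (`marshaller` is a placeholder type, not the real refmt one): a fixed channel pool holds items forever but blocks under contention, while `sync.Pool` never blocks but may drop idle items at a GC cycle — which is the throughput-vs-latency trade-off described above.

```go
package main

import (
	"fmt"
	"sync"
)

// marshaller is a stand-in for a refmt marshaller; the real type differs.
type marshaller struct{ buf []byte }

// chanPool is a fixed-size channel pool: items are never reclaimed by the
// GC, but get blocks when all items are checked out (backpressure).
type chanPool struct{ ch chan *marshaller }

func newChanPool(n int) *chanPool {
	p := &chanPool{ch: make(chan *marshaller, n)}
	for i := 0; i < n; i++ {
		p.ch <- &marshaller{}
	}
	return p
}

func (p *chanPool) get() *marshaller  { return <-p.ch }
func (p *chanPool) put(m *marshaller) { p.ch <- m }

// syncPool never blocks and scales per-P, but idle items may be dropped
// at a GC cycle, so it favors throughput over guaranteed reuse.
var syncPool = sync.Pool{New: func() interface{} { return &marshaller{} }}

func main() {
	cp := newChanPool(2)
	m := cp.get()
	cp.put(m)

	s := syncPool.Get().(*marshaller)
	syncPool.Put(s)
	fmt.Println("ok")
}
```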
encoding/cloner.go
}

// SetAtlas sets the pool's atlas. It is *not* safe to call this
// concurrently.
If it's not safe to call concurrently, make it only possible to do at construction time?
Unfortunately, we need to update the atlas each time the user registers a new type (which must happen at init). We could lock everything in place with a sync.Once on first use but that'd add a bit of overhead and I'm not sure if it's worth it.
So make the caller register all types before starting to use it, and if they change their mind, they have to make a new instance of the whole pool.
I struggle with how else I'd use this in a semantically race-free way short of either A) doing the all-setup-before-use flow anyway, or B) mutexing the whole pool.
So make the caller register all types before starting to use it, and if they change their mind, they have to make a new instance of the whole pool.
That's what we do.
Although, yeah, we should just replace the entire pool instead of having this method (it exists because it used to do more).
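The "replace the entire pool instead of having this method" idea can be sketched as an atomic swap — build a fresh atlas and pool on each registration, so readers never see a half-mutated atlas and no per-call lock is needed. All names here (`atlas`, `pool`, `register`) are illustrative stand-ins, not refmt's real API:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// atlas is a stand-in for a refmt atlas; the real type differs.
type atlas struct{ types []string }

// pool binds one atlas to a sync.Pool of marshallers built for that atlas.
type pool struct {
	atlas *atlas
	p     sync.Pool
}

type marshaller struct{ atlas *atlas }

func newPool(a *atlas) *pool {
	pl := &pool{atlas: a}
	pl.p.New = func() interface{} { return &marshaller{atlas: a} }
	return pl
}

// current holds the active pool. Registering a type copies the old atlas,
// extends it, and swaps in a brand-new pool, rather than mutating in place.
var current atomic.Value

func register(typeName string) {
	old := current.Load().(*pool)
	types := append(append([]string{}, old.atlas.types...), typeName)
	current.Store(newPool(&atlas{types: types}))
}

func main() {
	current.Store(newPool(&atlas{}))
	register("Foo")
	fmt.Println(len(current.Load().(*pool).atlas.types))
}
```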
IIRC, pools are cleaned every GC cycle. However, my benchmarks aren't actually *faster* with this change so I figured we might as well go for it. Our issue here is really throughput, not latency, so even *if* these pools are cleaned on GC, it's still useful.
So I had a long deep teatime and think about this, and a couple convolutions came to mind that existed before this branch, and aren't getting better. (That said, I'm also about to lament a bigger design issue with more inertia than what's changing here, so this whole comment shouldn't be construed as blocking this changeset if you want to merge it.)
Doing that change might make it harder to pool the {un,}marshallers and cloners again -- I'm not sure refmt exposes the ability to provide an atlas argument while also reusing the rest of the memory in a marshaller. But if that's a problem, that's a bug in refmt, and should be fixed rather than worked around. (But again, this is commentary about bigger scope than this changeset. Not something that has to be hashed out in this branch.)
We don't need to mutate, just replace.
Is there any way to merge/extend an atlas? We always need the CID transformation one. IMO, we actually don't want the current system as it allows users to break, e.g., Resolve by registering invasive transformations. However, this starts getting into an interface overhaul. Ideally, we'd do very little when deserializing a block, lazily pathing and/or deserializing into some user-provided struct as needed.
Okay, I had a chance to review this again, and I think basically we should go ahead and merge this: it's strictly better.
I made a few comments on benchmarks that need quick fixin'. 👼
The laments about some of the interface stand, but I think we're agreed on that. And after looking at it quite a while, yeah, it's gonna be a multi-step roadmap to getting everything I wish for, and this should definitely merge rather than wait :)
wrap_test.go
foo string
bar []byte
baz []int
}
Slightly consequential typo... These should all be exported...!
Right now, the len(node.RawData) on this is one, because it's serializing to an empty map.
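The failure mode being described — unexported fields silently serializing to an empty map (a single byte, `0xa0`, in CBOR) — is common to all reflection-based Go encoders. A minimal illustration using `encoding/json` as a stand-in for refmt:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hidden has only unexported fields, which reflection-based encoders
// cannot see: it serializes to an empty object.
type hidden struct {
	foo string
	bar []byte
}

// visible exports its fields, so they actually round-trip.
type visible struct {
	Foo string
	Bar []byte
}

func encodeHidden() string  { b, _ := json.Marshal(hidden{foo: "x"}); return string(b) }
func encodeVisible() string { b, _ := json.Marshal(visible{Foo: "x"}); return string(b) }

func main() {
	fmt.Println(encodeHidden())  // {}
	fmt.Println(encodeVisible()) // {"Foo":"x","Bar":null}
}
```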
}()
}
wg.Wait()
}
I'd like to propose one more benchmark here, for going the other direction, obj->cbor:
func BenchmarkDumpObject(b *testing.B) {
obj := testStruct()
for i := 0; i < b.N; i++ {
bytes, err := DumpObject(obj)
if err != nil {
b.Fatal(err, bytes)
}
}
}
So (once the fixture is fixed to be exported) here are some raw numbers "from my machine": master:
feat/refmt-pool branch:
I'm both convinced this can be further improved, and simultaneously, it looks like it's already beating the non-refmt code on master in time, allocation size, and allocation count. More corpus would make for more confidence, but as it stands... That's cool.
Make sure we *actually* serialize/deserialize. Addresses @warpfork's CR.
Yeah, you're right. They're beating the current CBOR by a large margin. My benchmarks were completely broken.
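A classic way Go benchmarks end up "completely broken" is the compiler eliding the work being measured. One common guard (not from this PR — `dump` here is a hypothetical stand-in for `DumpObject`, using JSON) is to store the result in a package-level sink:

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// dump is a stand-in for DumpObject; here it just JSON-encodes.
func dump(v interface{}) ([]byte, error) {
	return json.Marshal(v)
}

// sink defeats dead-code elimination: with the result kept alive in a
// package-level variable, the compiler can't prove the work is dead.
var sink []byte

func main() {
	r := testing.Benchmark(func(b *testing.B) {
		obj := []int{1, 2, 3}
		for i := 0; i < b.N; i++ {
			out, err := dump(obj)
			if err != nil {
				b.Fatal(err)
			}
			sink = out
		}
	})
	fmt.Println(r.N > 0)
}
```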
Benchmarking is a brutal thing. 🤕 It always seems to be roughly... 10: write benchmark. Fwiw, I did find a terribly interesting result from the old one-byte ones. All the results seemed to be equal-or-better, except for DecodeBlockParallel which was mysteriously worse on time, despite having less mem and the same alloc count. Something which disappeared in runs with the garbage collector disabled. So that's "fascinating"... apparently it's evidence that some kinds of gc load can be heavier than others. Isn't that neat? This is of course hypersuper irrelevant to our actual work, just thought I'd mention for giggles.
Also, use a cloner instead of unmarshalling when wrapping an object.
This gives about a 2.5x speedup in my simple benchmark and probably saves the GC a lot of trouble.
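The source of that speedup can be illustrated in miniature: a serialize-then-parse round-trip allocates an intermediate buffer and re-walks the bytes, while a direct field-by-field clone does neither. This sketch uses `gob` and a made-up `thing` type, not the real refmt cloner:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type thing struct {
	Name string
	Data []byte
}

// cloneViaCodec is the slow path being replaced: serialize, then parse.
func cloneViaCodec(in *thing) (*thing, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return nil, err
	}
	out := new(thing)
	if err := gob.NewDecoder(&buf).Decode(out); err != nil {
		return nil, err
	}
	return out, nil
}

// cloneDirect copies values field by field, roughly what a reflection
// cloner does: no intermediate byte buffer, far less garbage for the GC.
func cloneDirect(in *thing) *thing {
	return &thing{Name: in.Name, Data: append([]byte(nil), in.Data...)}
}

func main() {
	a := &thing{Name: "x", Data: []byte{1, 2, 3}}
	b, err := cloneViaCodec(a)
	if err != nil {
		panic(err)
	}
	c := cloneDirect(a)
	fmt.Println(b.Name == c.Name && bytes.Equal(b.Data, c.Data))
}
```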