-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sort map entries marshalling dag-json #203
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,6 +3,7 @@ package dagjson | |
import ( | ||
"encoding/base64" | ||
"fmt" | ||
"sort" | ||
|
||
"github.com/polydawn/refmt/shared" | ||
"github.com/polydawn/refmt/tok" | ||
|
@@ -31,21 +32,32 @@ func Marshal(n ipld.Node, sink shared.TokenSink, allowLinks bool) error { | |
if _, err := sink.Step(&tk); err != nil { | ||
return err | ||
} | ||
// Emit map contents (and recurse). | ||
// Collect map entries, then sort by key | ||
type entry struct { | ||
key string | ||
value ipld.Node | ||
} | ||
entries := []entry{} | ||
for itr := n.MapIterator(); !itr.Done(); { | ||
k, v, err := itr.Next() | ||
if err != nil { | ||
return err | ||
} | ||
tk.Type = tok.TString | ||
tk.Str, err = k.AsString() | ||
keyStr, err := k.AsString() | ||
if err != nil { | ||
return err | ||
} | ||
entries = append(entries, entry{keyStr, v}) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What are the implications of holding on to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The interface doesn't say whether it's safe to use the value of a Next call after the Next call has happened. Many APIs like badgerdb or Go's own I think it should be Eric's call to say what the interface semantics should be here. Allowing implementations to reuse memory would certainly unlock more performance for the cases where one doesn't need to hold onto memory, and those will likely be more common. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (Okay. Going to think-out-loud for a first pass here.) Hrm. I don't think that's actually come up as a trade before, because I think we've always been able to answer with pointers to existing and already escaped-to-heap memory. The immutability guarantee held by
So this is certainly true, we just happen to not have hit a case where the distinction has been possible yet. I can imagine in the future we might have ADL implementations which can optimize things by reusing internal buffers. That would hit this. I'm not immediately certain that's going to be a predominating concern in that situation, though. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think assuming it's safe to hold onto the values is the safer approach, anyway, as going the opposite route might well break some users. In the future, we could always add opt-in iterator interfaces that take more aggressive shortcuts, like "may share memory", or "allows for more performant codec sorting", etc. But the basic iteration interface is... simple :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep. Agree. Let's go that route. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also worth noting here that we probably shouldn't optimise too heavily for the theoretical ADL case here because this is a block codec and if you're trying to serialize a massive HAMT into DAG-{JSON,CBOR} then you're in for a bad time regardless of what optimisations might apply here. I think it's safe-ish to assume we're dealing with maps of a maximum of ~1M, maybe a little more, in the 99.9% case. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For sorting in codecs -- yeah. It's not particularly important to optimize for someone feeding an ADL into a marshaller. (Even if that's technically allowed, it's... not a usual thing to want to actually do.) The reason we ended up considering it is because it would have effect on the iterator design. But all the same, the result here is: yeah, this is fine as written. |
||
} | ||
sort.Slice(entries, func(i, j int) bool { return entries[i].key < entries[j].key }) | ||
// Emit map contents (and recurse). | ||
for _, e := range entries { | ||
tk.Type = tok.TString | ||
tk.Str = e.key | ||
if _, err := sink.Step(&tk); err != nil { | ||
return err | ||
} | ||
if err := Marshal(v, sink, allowLinks); err != nil { | ||
if err := Marshal(e.value, sink, allowLinks); err != nil { | ||
return err | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is declaring a struct in the middle of this naughty, or OK? can I get away without a struct somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's naughty :) certainly better than declaring it globally, unless we need methods on it in the future.