Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort map entries marshalling dag-json #203

Merged
merged 3 commits into from
Jul 21, 2021
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 16 additions & 4 deletions codec/dagjson/marshal.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ package dagjson
import (
"encoding/base64"
"fmt"
"sort"

"github.com/polydawn/refmt/shared"
"github.com/polydawn/refmt/tok"
Expand Down Expand Up @@ -31,21 +32,32 @@ func Marshal(n ipld.Node, sink shared.TokenSink, allowLinks bool) error {
if _, err := sink.Step(&tk); err != nil {
return err
}
// Emit map contents (and recurse).
// Collect map entries, then sort by key
type entry struct {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is declaring a struct in the middle of this naughty, or OK? can I get away without a struct somehow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's naughty :) certainly better than declaring it globally, unless we need methods on it in the future.

key string
value ipld.Node
}
entries := []entry{}
for itr := n.MapIterator(); !itr.Done(); {
k, v, err := itr.Next()
if err != nil {
return err
}
tk.Type = tok.TString
tk.Str, err = k.AsString()
keyStr, err := k.AsString()
if err != nil {
return err
}
entries = append(entries, entry{keyStr, v})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the implications of holding on to v while we collect and sort? Yeah, a perf hit, but what about memory and safety?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface doesn't say whether it's safe to use the value of a Next call after the Next call has happened. Many APIs like badgerdb or Go's own range will reuse the same memory for the following iterations, meaning that it's on you to make a copy if you want to keep it around for longer.

I think it should be Eric's call to say what the interface semantics should be here. Allowing implementations to reuse memory would certainly unlock more performance for the cases where one doesn't need to hold onto memory, and those will likely be more common.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Okay. Going to think-out-loud for a first pass here.)

Hrm. I don't think that's actually come up as a trade before, because I think we've always been able to answer with pointers to existing and already escaped-to-heap memory.

The immutability guarantee held by all most of the node implementations so far also made it easy to say it's no problem to hold onto things.

Allowing implementations to reuse memory would certainly unlock more performance for the cases where one doesn't need to hold onto memory, and those will likely be more common.

So this is certainly true, we just happen to not have hit a case where the distinction has been possible yet.

I can imagine in the future we might have ADL implementations which can optimize things by reusing internal buffers. That would hit this. I'm not immediately certain that's going to be a predominating concern in that situation, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think assuming it's safe to hold onto the values is the safer approach, anyway, as going the opposite route might well break some users.

In the future, we could always add opt-in iterator interfaces that take more aggressive shortcuts, like "may share memory", or "allows for more performant codec sorting", etc. But the basic iteration interface is... simple :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Agree. Let's go that route.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also worth noting here that we probably shouldn't optimise too heavily for the theoretical ADL case here because this is a block codec and if you're trying to serialize a massive HAMT into DAG-{JSON,CBOR} then you're in for a bad time regardless of what optimisations might apply here. I think it's safe-ish to assume we're dealing with maps of a maximum of ~1M, maybe a little more, in the 99.9% case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sorting in codecs -- yeah. It's not particularly important to optimize for someone feeding an ADL into a marshaller. (Even if that's technically allowed, it's... not a usual thing to want to actually do.)

The reason we ended up considering it is because it would have effect on the iterator design.

But all the same, the result here is: yeah, this is fine as written.

}
sort.Slice(entries, func(i, j int) bool { return entries[i].key < entries[j].key })
// Emit map contents (and recurse).
for _, e := range entries {
tk.Type = tok.TString
tk.Str = e.key
if _, err := sink.Step(&tk); err != nil {
return err
}
if err := Marshal(v, sink, allowLinks); err != nil {
if err := Marshal(e.value, sink, allowLinks); err != nil {
return err
}
}
Expand Down
8 changes: 6 additions & 2 deletions codec/dagjson/roundtripBytes_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,11 @@ var byteNode = fluent.MustBuildMap(basicnode.Prototype__Map{}, 4, func(na fluent
na.AssembleEntry("plain").AssignString("olde string")
na.AssembleEntry("bytes").AssignBytes([]byte("deadbeef"))
})
var byteSerial = `{"plain":"olde string","bytes":{"/":{"bytes":"ZGVhZGJlZWY="}}}`
var byteNodeSorted = fluent.MustBuildMap(basicnode.Prototype__Map{}, 4, func(na fluent.MapAssembler) {
na.AssembleEntry("bytes").AssignBytes([]byte("deadbeef"))
na.AssembleEntry("plain").AssignString("olde string")
})
var byteSerial = `{"bytes":{"/":{"bytes":"ZGVhZGJlZWY="}},"plain":"olde string"}`

func TestRoundtripBytes(t *testing.T) {
t.Run("encoding", func(t *testing.T) {
Expand All @@ -29,7 +33,7 @@ func TestRoundtripBytes(t *testing.T) {
nb := basicnode.Prototype__Map{}.NewBuilder()
err := Decode(nb, buf)
Require(t, err, ShouldEqual, nil)
Wish(t, nb.Build(), ShouldEqual, byteNode)
Wish(t, nb.Build(), ShouldEqual, byteNodeSorted)
})
}

Expand Down
2 changes: 1 addition & 1 deletion codec/dagjson/roundtripCidlink_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ func TestRoundtripCidlink(t *testing.T) {

n2, err := lsys.Load(ipld.LinkContext{}, lnk, basicnode.Prototype.Any)
Require(t, err, ShouldEqual, nil)
Wish(t, n2, ShouldEqual, n)
Wish(t, n2, ShouldEqual, nSorted)
}

// Make sure that a map that *almost* looks like a link is handled safely.
Expand Down
20 changes: 18 additions & 2 deletions codec/dagjson/roundtrip_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,23 @@ var n = fluent.MustBuildMap(basicnode.Prototype__Map{}, 4, func(na fluent.MapAss
})
})
})
var serial = `{"plain":"olde string","map":{"one":1,"two":2},"list":["three","four"],"nested":{"deeper":["things"]}}`
var nSorted = fluent.MustBuildMap(basicnode.Prototype__Map{}, 4, func(na fluent.MapAssembler) {
na.AssembleEntry("list").CreateList(2, func(na fluent.ListAssembler) {
na.AssembleValue().AssignString("three")
na.AssembleValue().AssignString("four")
})
na.AssembleEntry("map").CreateMap(2, func(na fluent.MapAssembler) {
na.AssembleEntry("one").AssignInt(1)
na.AssembleEntry("two").AssignInt(2)
})
na.AssembleEntry("nested").CreateMap(1, func(na fluent.MapAssembler) {
na.AssembleEntry("deeper").CreateList(1, func(na fluent.ListAssembler) {
na.AssembleValue().AssignString("things")
})
})
na.AssembleEntry("plain").AssignString("olde string")
})
var serial = `{"list":["three","four"],"map":{"one":1,"two":2},"nested":{"deeper":["things"]},"plain":"olde string"}`

func TestRoundtrip(t *testing.T) {
t.Run("encoding", func(t *testing.T) {
Expand All @@ -41,7 +57,7 @@ func TestRoundtrip(t *testing.T) {
nb := basicnode.Prototype__Map{}.NewBuilder()
err := Decode(nb, buf)
Require(t, err, ShouldEqual, nil)
Wish(t, nb.Build(), ShouldEqual, n)
Wish(t, nb.Build(), ShouldEqual, nSorted)
})
}

Expand Down
2 changes: 1 addition & 1 deletion fluent/qp/example_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -30,5 +30,5 @@ func Example() {
dagjson.Encode(n, os.Stdout)

// Output:
// {"some key":"some value","another key":"another value","nested map":{"deeper entries":"deeper values","more deeper entries":"more deeper values"},"nested list":[1,2]}
// {"another key":"another value","nested list":[1,2],"nested map":{"deeper entries":"deeper values","more deeper entries":"more deeper values"},"some key":"some value"}
}
4 changes: 2 additions & 2 deletions node/bindnode/example_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ func ExampleWrap_withSchema() {
dagjson.Encode(nodeRepr, os.Stdout)

// Output:
// {"Name":"Michael","Friends":["Sarah","Alex"]}
// {"Friends":["Sarah","Alex"],"Name":"Michael"}
}

func ExamplePrototype_onlySchema() {
Expand Down Expand Up @@ -78,5 +78,5 @@ func ExamplePrototype_onlySchema() {
dagjson.Encode(nodeRepr, os.Stdout)

// Output:
// {"Name":"Michael","Friends":["Sarah","Alex"]}
// {"Friends":["Sarah","Alex"],"Name":"Michael"}
}
6 changes: 3 additions & 3 deletions node/tests/schemaStructsContainingMaybe.go
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ func SchemaTestStructsContainingMaybe(t *testing.T, engine Engine) {
{
name: "vvvvv-AllFieldsSet",
typeJson: `{"f1":"a","f2":"b","f3":"c","f4":"d","f5":"e"}`,
reprJson: `{"r1":"a","r2":"b","r3":"c","r4":"d","f5":"e"}`,
reprJson: `{"f5":"e","r1":"a","r2":"b","r3":"c","r4":"d"}`,
typePoints: []testcasePoint{
{"", ipld.Kind_Map},
{"f1", "a"},
Expand All @@ -68,7 +68,7 @@ func SchemaTestStructsContainingMaybe(t *testing.T, engine Engine) {
{
name: "vvnnv-Nulls",
typeJson: `{"f1":"a","f2":"b","f3":null,"f4":null,"f5":"e"}`,
reprJson: `{"r1":"a","r2":"b","r3":null,"r4":null,"f5":"e"}`,
reprJson: `{"f5":"e","r1":"a","r2":"b","r3":null,"r4":null}`,
typePoints: []testcasePoint{
{"", ipld.Kind_Map},
{"f1", "a"},
Expand All @@ -89,7 +89,7 @@ func SchemaTestStructsContainingMaybe(t *testing.T, engine Engine) {
{
name: "vzvzv-AbsentOptionals",
typeJson: `{"f1":"a","f3":"c","f5":"e"}`,
reprJson: `{"r1":"a","r3":"c","f5":"e"}`,
reprJson: `{"f5":"e","r1":"a","r3":"c"}`,
typePoints: []testcasePoint{
{"", ipld.Kind_Map},
{"f1", "a"},
Expand Down
2 changes: 1 addition & 1 deletion traversal/focus_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -318,7 +318,7 @@ func TestFocusedTransformWithLinks(t *testing.T) {
Wish(t, progress.Path.String(), ShouldEqual, "linkedMap/nested/nonlink")
Wish(t, must.String(prev), ShouldEqual, "zoo")
Wish(t, progress.LastBlock.Path.String(), ShouldEqual, "linkedMap")
Wish(t, progress.LastBlock.Link.String(), ShouldEqual, "baguqeeye2opztzy")
Wish(t, progress.LastBlock.Link.String(), ShouldEqual, "baguqeeyezhlahvq")
nb := prev.Prototype().NewBuilder()
nb.AssignString("new string!")
return nb.Build(), nil
Expand Down