Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use string instead of struct backing #47

Closed
wants to merge 4 commits into from
Closed

Conversation

dignifiedquire
Copy link
Member

@dignifiedquire dignifiedquire commented May 17, 2018

cc @Stebalien

ref #38

@Stebalien
Copy link
Member

I like this. However, it's a massively breaking change so we'll have to be careful (I've received a lot of push-back from @whyrusleeping when trying to do this kind of thing in the past). Also, this makes extracting a multihash from a CID allocate (could be fixed by using a string internally) and means that constructing a CID involves copying bytes. We can fix the former by storing multihashes as strings (multiformats/go-multihash#29) and the latter by adding a function to multihash that writes to a writer (e.g., a string builder) which we'll probably want anyways.

A less breaking-change version would be to use a pointer to a string (or a pointer to, e.g., struct Cid{ string }). However, that will require an additional allocation (although the KeyString method becomes trivial which is my personal goal).


Performance wise, implementing the varint skipping code manually may be faster. That is, to skip two varints, count two bytes with the high-order bit unset.


Regardless, I think we should wait for the libp2p refactor + dht fixes + log fixes + everything else that is currently blocked.

Personally, I'd like to just break things and fix this all at once but this really is a massively breaking change. All cid == nil checks now no longer work.

@whyrusleeping
Copy link
Member

@dignifiedquire please familiarize yourself with the other attempts to do similar things, primarily: multiformats/go-multiaddr#62

@dignifiedquire
Copy link
Member Author

@whyrusleeping I did read through that discussion, but didn't see that there was a final decision made on the direction to go, so this is based on a conversation I had with @Stebalien afterwards.

I am happy to adjust if you and @Stebalien get to an agreement on which version exactly you want to go for in these, just let me know.

// - hash mh.Multihash
type Cid string

var EmptyCid = Cid(string([]byte{}))
Copy link
Contributor

@kevina kevina Jun 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why string([]byte{}) and not just ""?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also make this a constant so we have:

const EmptyCid = Cid("")

Copy link
Contributor

@kevina kevina Aug 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also. I find EmptyCid a bit tedious of a name. How out just Nil"? When used outside this package it would be cid.Nil.

Copy link
Contributor

@kevina kevina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of my comments are just suggestions but I am blocking for two reasons (1) The KeyString() method is wrong and needs to be fixed. (2) I opened a second p.r. (#71) where I implemented my suggestions.

func (c *Cid) KeyString() string {
return string(c.Bytes())
func (c Cid) KeyString() string {
return string(c)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong. If the version is 0 only the Multihash should be returned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should actually be right. If the version is 0, this will only be the multihash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Nvm, missed #47 (comment)


var EmptyCid = Cid(string([]byte{}))

func (c Cid) version() uint64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should export this method, it can be useful.

hash: mhash,
}
func NewCidV0(mhash mh.Multihash) Cid {
return newCid(0, DagProtobuf, mhash)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this representation of CidV0 odd. Why not just use the normal binary representation, i.e, just the multihash?

// - hash mh.Multihash
type Cid string

var EmptyCid = Cid(string([]byte{}))
Copy link
Contributor

@kevina kevina Aug 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also. I find EmptyCid a bit tedious of a name. How out just Nil"? When used outside this package it would be cid.Nil.

c.version == o.version &&
bytes.Equal(c.hash, o.hash)
func (c Cid) Equals(o Cid) bool {
// TODO: can we use regular string equality?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use regular string equality?

Yes.

// - version uvarint
// - codec uvarint
// - hash mh.Multihash
type Cid string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should make this type Cid struct{ string } to ensure the Cid type is always valid.

}
}

// making sure we don't allocate when returning bytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we do now and can't avoid it. I take it BenchmarkBytesV1 is leftover from the first attempt with used []byte as the representation.

@Stebalien
Copy link
Member

Stebalien commented Aug 25, 2018

@dignifiedquire for context, this is moving forward due to:

#70, the fact that by-value CIDs will work better with refmt, and the fact that we're reworking a ton of CID stuff anyways for the base32 endeavor.

@dignifiedquire
Copy link
Member Author

@Stebalien thanks for the heads up, I was following the other thread. Let me know if you need anything from me, otherwise feel free to take the code and use what you can, and throw away what you don’t like :)

@kevina
Copy link
Contributor

kevina commented Aug 31, 2018

Closing in favor of #71

@kevina kevina closed this Aug 31, 2018
@ghost ghost removed the status/in-progress In progress label Aug 31, 2018
@kevina kevina deleted the feat/bytes branch August 31, 2018 05:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants