Path clarifications #47

warpfork · 2020-02-23T11:04:38Z

Several significant clarifications to Path docs.

Primarily, being committal and direct about how special values, erm... aren't -- namely, "/" is a valid segment, and so is empty string -- and confessing how much tricky work that makes.

Several methods are now more upfront about how much they do not do good things on such values. I'd like to fix these methods, but to do that, we should formally specify an escaping system and canonical string encoding for paths. That work should start in the IPLD Specs repo, and involve fixtures. Once that is tackled, I will (very!) gladly start fixing these methods to operate accordingly, and get rid of all the sharp-edges warnings.

Several methods also now include comments about whether they will enforce string character encoding or normalization on their function boundaries. They won't. (This is certainly the norm in Go -- the stance of "string is really just bytes" is ubiquitous in the gopher community -- so it may seem not worthy of wasting docs on to other gopher readers. However, since IPLD is all about cross-language consistency, and we may have people from other programming language ecosystems reading our docs, it seems good to be as clear and explicit as possible.) I suspect this will remain the case (it's turned out tricky and work-makey to try to specify anything else; subtracting the countably-infinite domain of non-normalized strings from the countably-infinite domain of all possible byte sequences makes for some very nasty and pernicious expressibility and round-tripping issues, long story short).

Thanks to @ribasushi for the kick in the shins that these docs needed work and clarifications. The previous docs were written a fair while ago when we knew less things less confidently about how the system and specs would evolve, so these clarifications were badly needed.

@ribasushi

Primarily, being commital and direct about how special values, erm... *aren't* -- namely, "/" is a valid segment, and so is empty string -- and confessing how much tricky work that makes. Several methods are now more upfront about how much they *do not do good things* on such values. Added some new constructors for Path which *do* work for all possible values. (You could've handled such tricky values with a series of Join calls, previously, but that would be ergonomically grueling.) These facts are all defacto 'true' to the best of my knowledge now. However, the IPLD specs repo is a bit on the quiet side about them at present (precisely because it's such an irritating little nest of edge cases and fun stuff)... and so the comments are also littered with "this may change" warnings. Still, it's better to be accurate about what the code does and does not do in its current state. I'd *like* to formally specify an escaping system and canonical string encoding for paths. That should start in the IPLD Specs repo, though, and involve fixtures, so I won't start it here now. Thanks to @ribasushi for the kick in the shins that these docs needed work and clarifications.

It's not very often we care about these things in Go, and when we do, this stance of "string is really just bytes; guard it yourself" is the community norm... so, these docs should *almost* go without saying. However, since IPLD is all about cross-language consistency, and we may have people from other programming language ecosystems reading our docs, it seems good to be as clear and explicit as possible.

I... honestly hadn't even realized this before; I just noticed it when making a refactor that would produce this end state, and thought "oh, that's silly". Evidently... it's fine, since this has already been the case, and not been problematic! Alright then. Apparently it's not too silly after all.

Yep, it makes a lot of creation by struct uglier, as you can see in the diff to the test file. Fortunately? Nobody outside the package is allowed to do that anyway, so, we can pretty much ignore the ergonomic. (It'll still look uglier if someone's looking at stuff with spew or go-cmp or some other library like that which peeks naughtily into unexported fields, but that's hard to do much about.) We could also address this by adding a discriminant field to the PathSegment struct. (If we wanted to use uint internally, we'd probably have to do that, in fact. Either that or start using some other alarmingly magical number like UINT_MAX. Ygh.) I didn't. I don't think we handle lists larger than 2 billion elements particularly well anyway... so, adding more words of memory to PathSegment to support that case seems like an entirely losing trade.

codecov · 2020-02-24T19:05:02Z

Codecov Report

Merging #47 into master will increase coverage by 0.41%.
The diff coverage is 50%.

@@            Coverage Diff             @@
##           master      #47      +/-   ##
==========================================
+ Coverage   72.21%   72.61%   +0.41%     
==========================================
  Files          50       50              
  Lines        3162     3209      +47     
==========================================
+ Hits         2283     2330      +47     
+ Misses        789      788       -1     
- Partials       90       91       +1

Impacted Files	Coverage Δ
path.go	`12% <0%> (+12%)`	⬆️
pathSegment.go	`46.16% <66.67%> (+38.47%)`	⬆️
traversal/common.go	`80.77% <0%> (-6.73%)`	⬇️
schema/gen/go/gen.go	`100% <0%> (ø)`	⬆️
schema/gen/go/genKindLinkNode.go	`100% <0%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 907fb14...8093be7. Read the comment docs.

path.go

ribasushi

LGTM!

There's already explicit sections for empty string and slash cases, on the basis of how likely those are to provoke questions (even though the content is essentially to highlight how *not* exceptional they are); it makes just as much sense to do the same call out for null bytes. Original text started in https://github.com/ipld/go-ipld-prime/pull/47/files#r384127483 . Co-Authored-By: Peter Rabbitson <ribasushi@leporine.io>

Previous commit with suggestions from reviewer made me look at this line again and realize it was *too* specific.

warpfork added 4 commits February 23, 2020 11:06

Spelling fix.

0f7de4a

warpfork requested a review from ribasushi February 23, 2020 11:04

warpfork force-pushed the path-clarifications branch from c7b11de to a48c9db Compare February 24, 2020 19:02

warpfork added 2 commits February 24, 2020 20:02

Addntl comments on PathSegment equality.

a48c9db

ribasushi reviewed Feb 25, 2020

View reviewed changes

path.go Show resolved Hide resolved

ribasushi approved these changes Feb 25, 2020

View reviewed changes

warpfork and others added 2 commits February 27, 2020 12:43

Expand the remark on slashes in paths.

8093be7

Previous commit with suggestions from reviewer made me look at this line again and realize it was *too* specific.

warpfork force-pushed the path-clarifications branch from ae08d2d to 8093be7 Compare February 27, 2020 12:00

warpfork merged commit 4612357 into master Feb 27, 2020

warpfork deleted the path-clarifications branch February 27, 2020 12:02

aschmahmann mentioned this pull request Sep 22, 2020

Release 0.7.0 ipfs/kubo#7560

Closed

72 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Path clarifications #47

Path clarifications #47

warpfork commented Feb 23, 2020

codecov bot commented Feb 24, 2020 •

edited

Loading

ribasushi left a comment

Path clarifications #47

Path clarifications #47

Conversation

warpfork commented Feb 23, 2020

codecov bot commented Feb 24, 2020 • edited Loading

Codecov Report

ribasushi left a comment

Choose a reason for hiding this comment

codecov bot commented Feb 24, 2020 •

edited

Loading