This repository has been archived by the owner on Dec 6, 2022. It is now read-only.

Spec Proposal #2

Closed
wants to merge 3 commits into from
101 changes: 101 additions & 0 deletions draft.md
@@ -0,0 +1,101 @@
# Draft IPLD Unixfs Spec
Member

Let's stop calling this IPLD Unixfs, the current Unixfs is already IPLD. This proposal is to:

  • Move away from dag-pb to dag-cbor
  • Leverage the flexibility of dag-cbor to add more Metadata
  • Improve Unixfs and remove some of the limitations we found while using unixfs with dag-pb.

One of the design goals of the new Unixfs is that it should be 100% interoperable with the old (a directory of Unixfs2 should be able to have a file of Unixfs1).

Contributor Author

The name "IPLD Unixfs" was @whyrusleeping's idea and I just went along with it. We could call it "Unixfs V2", although I am not sure how much we want to stick with the unix filesystem structure as a model (I personally think we should move away from it and focus on the components that are important to a generic archive structure).


I'm fine not calling it ipld unixfs. @diasdavid is right

Member

@achingbrain, Jun 28, 2018

> a directory of Unixfs2 should be able to have a file of Unixfs1

It's probably worth mentioning this in the spec.

Also, what about Unixfs1 directories?


## Basic Structure

A Unixfs is either a file or a directory.
The top level IPLD object is a CBOR map with at least two fields: `type` and `data`,
and possibly a few others such as a version string or a set of flags.
The `type` field is either `file` or `dir`.
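
As a minimal sketch of the structure described above, the top-level object can be modelled with a plain Python dict standing in for the CBOR map. The helper name `check_top_level` is hypothetical, and since the draft does not yet fix the optional fields, only `type` and `data` are checked here.

```python
# Plain dicts stand in for CBOR maps; no CBOR library is assumed.
ALLOWED_TYPES = {"file", "dir"}


def check_top_level(node):
    """Return True if node carries the two fields the draft requires."""
    if not isinstance(node, dict):
        return False
    if "type" not in node or "data" not in node:
        return False
    return node["type"] in ALLOWED_TYPES


file_node = {"type": "file", "data": []}  # `data` is an array for files
dir_node = {"type": "dir", "data": {}}    # `data` is a map for directories
```

Extra keys such as a version string are deliberately tolerated, since the draft leaves them open.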
@ehmry, Dec 17, 2017

I say better to define a CBOR tag for files and a tag for directories, and define a file as a tagged array and a dir as a tagged map. That makes it clear from the first atomic in the CBOR that you are parsing UnixFS.

Contributor Author

@ehmry we can do that, but do we then need to register the tags?


Not really, I was thinking of just picking two random uint64 tags.
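
To make the tagging idea in this thread concrete, here is a rough sketch that hand-encodes the CBOR head bytes for a tagged array and a tagged map. The two tag numbers are made up for this illustration (nothing in the thread picks actual values), and `cbor_head` is a hypothetical helper, not part of any library.

```python
import struct


def cbor_head(major, value):
    """Encode a CBOR initial byte plus its argument (RFC 7049 major types)."""
    mt = major << 5
    if value < 24:
        return bytes([mt | value])
    if value < 2**8:
        return bytes([mt | 24, value])
    if value < 2**16:
        return bytes([mt | 25]) + struct.pack(">H", value)
    if value < 2**32:
        return bytes([mt | 26]) + struct.pack(">I", value)
    return bytes([mt | 27]) + struct.pack(">Q", value)


ARRAY, MAP, TAG = 4, 5, 6  # CBOR major types

# Hypothetical uint64 tag numbers, chosen only for this example.
FILE_TAG = 0x756E697866733266
DIR_TAG = 0x756E697866733264


def tagged_empty_file():
    """A file as a tagged (here empty) array."""
    return cbor_head(TAG, FILE_TAG) + cbor_head(ARRAY, 0)


def tagged_empty_dir():
    """A dir as a tagged (here empty) map."""
    return cbor_head(TAG, DIR_TAG) + cbor_head(MAP, 0)
```

With a 64-bit tag the first byte is always `0xdb`, followed by the eight tag bytes, so a parser can decide after nine bytes whether it is looking at a file or a directory.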


## IPLD `file`

If an IPLD file is a leaf its CID type is `raw` (0x55) and has no structure.
Otherwise its CID type is `dag-cbor` (0x71).
The `type` field is set to `file` and the `data` field is an CBOR array.
Contributor

With the current dag-cbor and bitswap implementation it's quite limiting to require that this be an array of links. There's a block size limit of 2megs across the board. An array of links when serialized to CBOR can't be more than ~2,500 links before the node itself is larger than 2megs.

Unless we bake in a way to shard this we'll be limited to files that are smaller than ~5GB.
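
The arithmetic behind that estimate, as a small sketch. Both constants are the figures quoted in this comment, and "2megs" is read here as 2 MB in decimal units, which is an assumption.

```python
MAX_LINKS_PER_NODE = 2_500    # figure quoted in the comment above
MAX_BLOCK_BYTES = 2 * 10**6   # "2megs", taken here as 2 MB (decimal)


def max_flat_file_bytes(links=MAX_LINKS_PER_NODE, block=MAX_BLOCK_BYTES):
    """Largest file addressable by one node holding a flat array of links."""
    return links * block


print(max_flat_file_bytes() / 10**9, "GB")  # prints "5.0 GB"
```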

Contributor

Also, I'm a lot less concerned about files under 5GB than I am concerned with not being able to develop a smart chunker for fear that if I don't use the max block size the node will be too large.

I'm starting to think about developing a chunker for javascript bundles that uses the sourcemap from the bundler to chunk it into blocks built from each file. This should greatly reduce the new blocks that need to be pushed when new bundles are created, but the number of chunks will easily be greater than 2,500.


The plan is to use a CHAMP instead of an array of links, like we do today for sharded directories. I have an implementation of a CHAMP (HAMT) in ipld here: https://github.com/ipfs/go-hamt-ipld

Member

The JS implementation for unixfs sharding can be found at https://github.com/ipfs/js-ipfs-unixfs-engine/tree/master/src/hamt. It was built by @pgte a while ago.

Contributor

Maybe I'm missing something, but I've only seen hash map trees used for named key values, not for an ordered array. If we're using this as a replacement for a CBOR Array, are there key semantics that need to be defined here in the spec for that? I'm only seeing the current JS hamt implementation used for sharded directories, not sharded file parts.

Contributor

@mikeal, Jun 28, 2018

> the short answer is "don't include that many file parts in a single node"

This isn't sufficient for at least 3 use cases I can think of.

  • Bundles: Tarballs, webpack and browserify output, etc. The ideal way to chunk these files is to chunk around the boundaries of the originating files so that changes to the bundle translate into a relatively small number of part changes. Tarballs that pack many small files, and pretty much any modern front-end bundle, are created from more than 2,500 original files, so this puts them easily above the limit.
  • Compressed Files: Gzipped content, but especially streaming media files: ogg, mpeg, etc. You want the chunker to create blocks that respect the compression windows. This results in faster performance throughout the stack but is especially important when seeking within that content, as the codec will always ask for the whole window that the seek point is in, and if it's in the middle of the block this translates into a delayed seek while the content buffers. These compression windows are configurable and sometimes the windows are very small, so a relatively small video file could be more than 2,500K parts.
  • Files Larger than 5GB: As discussed here there's currently a 2MB bitswap block limit. There's talk of supporting larger blocks but the larger the block the less efficient the transport will be.

It's fine if we just want to say that these use cases are out of scope for unixfsv2. Pushing them out of scope just means that we have some time to see how people solve these issues outside of unixfsv2 and it may help us get things shipped faster. But we're probably signing up for a unixfsv3 at some point in the future if we don't have any other way to work around this limitation.


@mikeal look at how the importers work and structure the graph of file parts. The answer is still "don't include that many file parts in a single node". Ipfs chunks and structures things into a recursive tree, not just a single level with a flat array of links.
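
A rough sketch of the recursive layout described here, assuming a simple balanced grouping with a fixed maximum number of links per node. The function names and the grouping strategy are illustrative only and are not taken from the actual importers.

```python
def levels_needed(num_chunks, fanout):
    """Number of interior levels needed to reference num_chunks leaves."""
    if num_chunks < 1 or fanout < 2:
        raise ValueError("need at least one chunk and a fanout of two or more")
    levels = 1
    capacity = fanout
    while capacity < num_chunks:
        capacity *= fanout
        levels += 1
    return levels


def build_tree(chunks, fanout):
    """Group chunks into nested lists with at most `fanout` children each."""
    level = list(chunks)
    while len(level) > fanout:
        level = [level[i:i + fanout] for i in range(0, len(level), fanout)]
    return level
```

With the 2,500-link figure from earlier in the thread, `levels_needed(6_250_000, 2_500)` is 2, so two levels of interior nodes already cover 2,500 times as many chunks as a single flat array.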

Contributor

Someone in Berlin mentioned that it was effectively an "array of arrays."

One thing to consider with this design: range requests don't work without loading every part of the file from the beginning to the start of the range. There's no information about the size of the individual parts so the only way to know how to seek is to load them all in serial. Really not ideal, especially for media use cases, because it makes seeking quite slow.


The current design has range information; seeking is efficient and only has to load the nodes required for that graph traversal. I assume we would do exactly the same in V2.
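
To illustrate why per-link sizes make seeking cheap, here is a minimal sketch that descends a small in-memory tree. It assumes each link carries the payload size of its subtree (the role `fsize` plays in the draft); the tuple layout and the name `find_leaf` are hypothetical.

```python
def find_leaf(node, offset, loaded=None):
    """Descend to the leaf containing byte `offset`.

    Interior nodes are lists of (fsize, child) pairs; leaves are bytes.
    Returns (leaf, offset_within_leaf, nodes_loaded).
    """
    loaded = 1 if loaded is None else loaded + 1
    if isinstance(node, bytes):
        if offset >= len(node):
            raise IndexError("offset past end of file")
        return node, offset, loaded
    for fsize, child in node:
        if offset < fsize:
            return find_leaf(child, offset, loaded)
        offset -= fsize  # skip this subtree without loading it
    raise IndexError("offset past end of file")


leaves = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
tree = [
    (8, [(4, leaves[0]), (4, leaves[1])]),
    (8, [(4, leaves[2]), (4, leaves[3])]),
]
```

Seeking to byte 9 with `find_leaf(tree, 9)` visits the root, one interior node and one leaf, rather than every part that precedes the offset.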

Contributor

I'd like to see our fix to this involve moving from `data: [chunks]` to `data: {parts: [chunks], partLengths: []}`, because if we make the data attribute an object we can add other relevant information, like the type of chunking algorithm used, which would allow us to implement more efficient syncing between clients.
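
A small sketch of how a reader might use the `parts` / `partLengths` shape suggested here to serve a byte range. The CIDs are placeholder strings and `parts_for_range` is a hypothetical helper; only the two field names come from the comment.

```python
from bisect import bisect_right
from itertools import accumulate


def parts_for_range(data, start, end):
    """Return (part_index, slice_start, slice_end) for bytes [start, end)."""
    ends = list(accumulate(data["partLengths"]))  # end offset of each part
    if not 0 <= start < end <= (ends[-1] if ends else 0):
        raise ValueError("range outside file")
    result = []
    i = bisect_right(ends, start)  # first part containing `start`
    while i < len(ends) and start < end:
        part_start = ends[i] - data["partLengths"][i]
        stop = min(end, ends[i])
        result.append((i, start - part_start, stop - part_start))
        start = stop
        i += 1
    return result


data = {"parts": ["cid-a", "cid-b", "cid-c"], "partLengths": [4, 6, 5]}
```

Only the parts named in the result need to be fetched, which is the property the range-request concern above is about.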

Member

Small typo, 'an CBOR array'

Each element of the array is a CBOR map with the following fields:

- `data`: link
Member

Any reason why this field isn't called link given that it's always a link?

- `size`: cumulative size of `data`
- `fsize`: (file size) cumulative size of the payload of `data`


@kevina this seems backwards. The size of the protocol-wrapped objects is currently optional for all intents and purposes. On the other hand the actual fsize is mandatory if what you are expressing is a "top node of a file DAG".

Contributor Author

To be clear, fsize is the logical bytes, while size is the size that includes the overhead of interior nodes. fsize <= size


@kevina precisely. The logical bytes ( fsize ) are interesting because they are needed to calculate buffers / HTTP header / etc

The node overhead is only useful for estimating how much local storage would be necessary to grab the entire DAG locally, but given the overhead is never more than ~10 bytes per node, it becomes ( from my PoV at least ) needless cruft.

I've done a lot of tests over the last year with DAGs specifically excluding mention of what you refer to as size. Everything works fine. I strongly believe the overhead-including-size should not be a mandatory part of any future spec ( and thus should be the last item in the CBOR array, not an intermediate one )

Contributor Author

> I strongly believe the overhead-including-size should not be a mandatory part of any future spec

Note: This is not a CBOR array but a map; the structure is `{type: "file", data: [{/*link*/}, {}, {} ...]}`.

I don't have a strong opinion on which sizes to include. @whyrusleeping thoughts?

Member

I think size and fsize are both useful - download progress would depend on size, for example, as you still need to fetch the wrapper bytes.


The `fsize` field is omitted if the link is `raw` as it is the same value as size.
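
A minimal sketch of how the `size` and `fsize` fields of a file link could be derived, assuming the reading given in the review thread above: `size` includes the encoded bytes of interior nodes, `fsize` counts only payload. CIDs are placeholder strings and both helper names are hypothetical.

```python
def raw_link(cid, payload_len):
    """Link to a `raw` leaf: `fsize` is omitted because it equals `size`."""
    return {"data": cid, "size": payload_len}


def node_link(cid, node_bytes_len, children):
    """Link to an interior `dag-cbor` node.

    node_bytes_len is the length of the node's own encoding; children are
    the link maps stored inside that node.
    """
    size = node_bytes_len + sum(c["size"] for c in children)
    fsize = sum(c.get("fsize", c["size"]) for c in children)
    return {"data": cid, "size": size, "fsize": fsize}


a = raw_link("cid-a", 100)
b = raw_link("cid-b", 50)
n = node_link("cid-n", 30, [a, b])  # size 180, fsize 150
```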

## IPLD `dir`

An IPLD `dir` represents a directory.
Its CID type is `dag-cbor` (0x71).
The `type` field is set to `dir` and the data field is an CBOR map.
Member

Small typo, 'an CBOR map'

The key of the map is a filename and is a CBOR text string encoded in UTF-8.
The value of the map is another CBOR map with the following standard fields:

- `type`
- `exe`: CBOR boolean: executable bit


I think instead of just having executable here, we should do a full rwxrwxrwx unix permissions set (a uint32)

Contributor Author

@kevina, Nov 3, 2017

The problem is the full unix set of permissions does not have a lot of meaning on other operating systems. Even within unix systems it has limited meaning when stored in an archive.

Others may have stronger opinions on this than me. In particular see #1 (comment) by @ehmry.


> I think instead of just having executable here, we should do a full rwxrwxrwx unix permissions set (a uint32)

@whyrusleeping the full st_mode ( entry type + permissions ) is in fact only 16 bits ( within a typically-32-bit-aligned struct member ). If we reuse it as-is we gain some extra bit of interop with everything that understands st_mode ( git took this path: https://stackoverflow.com/questions/737673/how-to-read-the-mode-field-of-git-ls-trees-output/8347325#8347325 ).

Of course this makes direct-query of type a bit harder, but then again every libc provides S_IF... macros
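
For reference, a short sketch of what reusing `st_mode` as-is would look like from Python, whose standard `stat` module exposes the same `S_IF...` helpers. The function `describe_mode` and its output keys are hypothetical and chosen only for illustration.

```python
import stat


def describe_mode(st_mode):
    """Split an st_mode into entry type, permission bits and an exe flag."""
    return {
        "type": stat.S_IFMT(st_mode),         # file-type bits
        "perm": stat.S_IMODE(st_mode),        # permission bits
        "exe": bool(st_mode & stat.S_IXUSR),  # user-executable bit only
    }


regular_exe = 0o100755  # regular file, rwxr-xr-x (git prints this as 100755)
symlink = 0o120000      # symbolic link
```

`stat.filemode(regular_exe)` renders the same value as `-rwxr-xr-x`, and the single `exe` flag proposed in the draft is just one bit of it.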


I think there is no harm in having this additional data stored. Systems for which those bits have no meaning will skip them; systems for which they do will use them and allow for preservation.

It is very similar with uid and gid. They have no meaning on some systems, and they may have no meaning on a different machine with the same system (different uid/gid mappings), but they are crucial if I wanted to, for example in the future, use IPFS for /home storage in a managed multi-user multi-workstation environment.


I think that we should support the full range, but maybe not change the default behaviour?
or maybe we only record user executable by default.

Thinking about it a bit more, the 'readable' flag really doesn't make a lot of sense in this context. I can read anything that's in ipfs...

Contributor Author

Storing the full st_mode by default just feels wrong to me, and I will go as far as to say it could create additional complications down the road (what those complications are I am not sure).

One possible complication is how to handle the writable bit in st_mode: should it always be set or not by default? If it is not set, then should the st_mode be honored when extracting files? If it is, then that could be annoying as all files will end up readonly. Or maybe only the executable bit should be honored by default; in that case it just seems better to store that single bit.

For this version of the standard I feel rather strongly we should stick to just the executable bit as it was stated in the requirements (#1), or nothing at all (as @lgierth suggested we don't add additional metadata). The full st_mode can be included in a later version of the standard.

- `data`: normally a CBOR link, but can be other types depending on the value of the `type` field
- `size`: cumulative size of `data`
- `fsize`: (file size) cumulative size of the payload of `data`
- `fname`: CBOR byte string: original filename if it differs from the key


This seems unnecessary (though I admit I've missed a lot of the conversation from over in the other issue). Why would this differ from the map key?

Contributor Author

@kevina, Nov 3, 2017

Yes the discussion on #3 is rather long. It may differ because unix filenames are not required to be UTF-8.


@whyrusleeping two things are at play / in conflict:

  • Desire for full POSIX compatibility mandates names to allow any sequence of bytes excluding 0x00 and 0x2f
  • The current proto-IPLD spec mandates keys ( i.e. names ) to be unicode: the mandate flows from the spec declaring a strict superset of RFC 7049


Hrm, I don't have a lot of strong opinions here. I will defer to @lgierth @diasdavid and @Stebalien

Member

Is this field going to include which character set should be used to interpret it somehow?

Contributor Author

No. Just the raw byte string if it differs from the key (which is the filename) that is in utf-8. For display the key should be used.
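
As a sketch of how the key and `fname` could relate, assuming the importer receives names as raw bytes. Replacing undecodable bytes with U+FFFD is only one possible transformation and is an assumption made here; the draft does not prescribe one, and `make_entry_name` is a hypothetical helper.

```python
def make_entry_name(raw_name):
    """Return (key, fname) for a raw filesystem name given as bytes.

    `key` is always valid UTF-8 text; `fname` is the original bytes, or
    None when the key already encodes back to exactly the same bytes.
    """
    key = raw_name.decode("utf-8", errors="replace")  # U+FFFD for bad bytes
    fname = None if key.encode("utf-8") == raw_name else raw_name
    return key, fname
```

For example, `make_entry_name(b"caf\xe9.txt")` yields the key `"caf\ufffd.txt"` and keeps the Latin-1 bytes in `fname`. Two different raw names can map to the same key under this scheme, so a real importer would also need a collision rule.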


And at least the following optional fields:

- `ro`: CBOR boolean: read only
- `mtime`: Modification time
- `attr`: CBOR Map: Extended attributes

Additional fields may be defined. All implementation specific or user
defined fields should be stored under the `attr` field.

### Directory Types

The type field is limited to a set of well defined values:

* _omitted_: regular file
* `dir`: directory entry
* `special`: special file type (fifo, device, etc).
The `data` field is a CBOR Map with at least one field to describe the type.


I don't think we should overload 'data', especially avoid making it have different types based on the value of a key in the parent level. That sort of parsing is hard to do efficiently

Contributor Author

@kevina, Nov 4, 2017

@whyrusleeping

I like the simplicity of having the contents of the file entry always be in the same field: for most types it is an IPLD link, for symbolic links it is the target, and for special file types it is a CBOR map with the details of the special file. My thinking was the type would just be an interface and then cast to the correct type once it is known.

I can instead have the following fields:

  • link: CBOR link when applicable
  • target: symbolic link target
  • data: a CBOR map that contains additional data for the directory entry that is not a link or target

I would rather not provide special fields to describe the content of all the different types of special files.

Thoughts?


We're going to have to inspect fields to determine what to do with things anyway. Overloading things doesn't really save us much in my opinion.

Contributor Author

@whyrusleeping I am having a hard time interpreting that comment: are you okay with my proposal (link, target, data fields), or are you saying we should create special fields for each and every special file type?

Contributor Author

More to the point, I don't want to enumerate the required fields in the first version of the spec. I like the data field abstraction because we can defer that part for later; it also allows us to define a standard set of tags so that an implementation can error out if it finds something it doesn't understand.

* `symlink`: symbolic link. The `data` field is the contents of the link.
* `other`: link to other IPLD object, links followed for GC and related operations
* `unknown`: link to unknown objects, links not followed
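
A small sketch of a directory node using the entry types listed above, with dicts standing in for CBOR maps and strings standing in for links. Treating regular files and `dir` entries as followed is an assumption; the draft only states the rule explicitly for `other` and `unknown`. The name `links_to_follow` is hypothetical.

```python
# None stands for an omitted `type`, i.e. a regular file.
FOLLOWED = {None, "dir", "other"}


def links_to_follow(dir_node):
    """Yield (name, link) for entries whose `data` link a GC should follow."""
    for name, entry in dir_node["data"].items():
        if entry.get("type") in FOLLOWED:
            yield name, entry["data"]


example = {
    "type": "dir",
    "data": {
        "readme.txt": {"data": "cid-readme", "size": 120},
        "src": {"type": "dir", "data": "cid-src"},
        "latest": {"type": "symlink", "data": "src"},
        "blob": {"type": "other", "data": "cid-blob"},
        "mystery": {"type": "unknown", "data": "cid-mystery"},
    },
}
```

Note that the `symlink` entry reuses `data` for the link target, which is exactly the overloading debated in the review comments above.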

### Extended Attributes

The extended attributes set is not well defined and can be used for vendor extensions and POSIX attributes that don't make sense on non-unix systems.
Stripping this field MUST NOT change the meaning of the directory entry.
These attributes SHOULD be passed along but do not have to be understood.

Possible entries:

Would "Extended Attributes" be a good place to optionally store explicit media type for problematic data types, as noted in #11 ?


* `user`: unix user name
* `uid`: unix numeric uid
* `group`: unix group name
* `gid`: unix numeric gid
* `perm`: full unix permissions
* extended posix attributes
* windows specific attributes
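
To illustrate the rule that `attr` can be stripped without changing the entry's meaning, a minimal sketch follows. The entry contents are invented for the example and `strip_attr` is a hypothetical helper.

```python
def strip_attr(entry):
    """Return a copy of a directory entry without its `attr` field."""
    return {k: v for k, v in entry.items() if k != "attr"}


entry = {
    "data": "cid-script",  # placeholder for a CBOR link
    "exe": True,
    "attr": {"user": "alice", "uid": 1000, "perm": 0o755},
}
```

After `strip_attr(entry)` the standard fields (`data`, `exe`) are untouched, so an implementation that ignores `attr` still extracts the same executable file.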

### Notes

* Not all standard fields need to be defined for all file types.

* The `type` field is omitted for regular files.
* The `exe` field is only present when true and only makes sense for regular files
* The `size` and `fsize` are only required when the type is a regular file and possibly a `dir`.
For other types they may be defined if they have a meaningful value.
* The `fsize` field is omitted for files that are leaves (i.e. `raw`) as it is the same value as `size`.

* IPLD filenames must be valid UTF-8 strings with the following additional constraints:
(1) cannot contain the null (0x00) or `/` characters
(2) cannot be the strings: `.` or `..`
Other restrictions may be put in place.
If the original filename does not meet these requirements then an implementation MAY transform the filename
so that it is a valid IPLD filename, and store the original filename in the `fname` field.
When extracting a directory to the filesystem an implementation
MAY make use of `fname` to restore the original name.
Implementations SHOULD reject files with invalid names by default
and only translate filenames when a special flag is given.
When extracting, implementations SHOULD use the IPLD name and not `fname` unless a special flag is given.

* To save space fields of a directory may be assigned integer values.
Integers have the added benefit of conveying additional meaning based on there values;
Member

Small typo: 'there' vs 'their'.

for example, to distinguish between standard and optional fields.

* The `type` field may also be assigned integer values.
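
The filename constraints listed in the notes above can be sketched as a small validator. It checks only the two numbered constraints plus UTF-8 encodability; the draft leaves room for further restrictions, and how empty names should be treated is not specified, so they are accepted here. `is_valid_name` is a hypothetical helper.

```python
def is_valid_name(name):
    """Check the two constraints the draft lists for IPLD filenames."""
    if not isinstance(name, str):
        return False
    try:
        name.encode("utf-8")  # rejects lone surrogates
    except UnicodeEncodeError:
        return False
    if "\x00" in name or "/" in name:
        return False
    return name not in (".", "..")
```

An importer following the SHOULD in the notes would reject any name for which this returns False unless the user passes the translation flag.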