Spec Proposal #2
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
# Draft IPLD UnixFS Spec | ||
|
||
## Basic Structure | ||
|
||
- Some sort of header that indicates that this is a directory and includes a version number. The header could also have fields to give additional information on the meaning of the extended attributes. | ||
|
||
- CBOR Map | ||
- Key: CBOR Byte or Text String: File Name | ||
- Value: CBOR Array of: | ||
- Type: CBOR Unsigned Int | ||
- Link or Data: CBOR Type varies | ||
- Optional file size: CBOR Unsigned Int | ||
- Optional Standard Attributes: CBOR Map | ||
- Optional Extended Attributes: CBOR Map | ||
|
||
The file size is only defined for regular files and is the size of the file contents. | ||
|
||
All maps should be ordered by the binary values of their keys;
duplicate keys are not allowed. | ||
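The ordering rule above can be sketched as follows. This is a minimal illustration, not part of the proposal: it assumes that "binary values" means a plain bytewise comparison of the name bytes, with text keys compared by their UTF-8 encoding. The helper names `name_bytes` and `ordered_entries` are hypothetical.

```python
def name_bytes(name):
    """Return the bytes used for ordering (assumption: UTF-8 for text keys)."""
    return name.encode("utf-8") if isinstance(name, str) else bytes(name)


def ordered_entries(entries):
    """Sort (name, value) pairs bytewise by name and reject duplicate names."""
    ordered = sorted(entries, key=lambda item: name_bytes(item[0]))
    # After sorting, duplicates can only appear next to each other.
    for (prev, _), (cur, _) in zip(ordered, ordered[1:]):
        if name_bytes(prev) == name_bytes(cur):
            raise ValueError(f"duplicate file name: {cur!r}")
    return ordered


listing = ordered_entries([("b.txt", 1), (b"\xff\xfe", 2), ("a.txt", 3)])
print([name for name, _ in listing])
```

If the spec instead adopts the length-first ordering of canonical CBOR, only the sort key would need to change.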
|
||
### Notes | ||
|
||
* An array makes sense here as it is more compact and the meaning of
each field is unambiguous. It also allows for a separation of
standard and extended attributes. | ||
|
||
* The key type can be either a byte string or a text string, as POSIX does not
require file names to be UTF-8 and it is important that any
file name can be faithfully represented. If the name is valid UTF-8
then the type will be Text. | ||
This makes an implicit decision regarding the question I posed at the end of ipfs/kubo#4292 (comment): we shift the onus of "check that the name is safe to use/display" to the consumers. Are we ready to do that?

I think it is important to represent any valid file in a POSIX system. I am against restricting the range in the spec. Yes, I think it is the consumer's job to make sure that the filename is safe to display. |
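A minimal sketch of the key-type rule from the note above: a raw POSIX file name becomes a text key when it is valid UTF-8 and stays a byte string otherwise, so every name round-trips exactly. The function names `name_to_key` and `key_to_name` are hypothetical, and Python's `str`/`bytes` stand in for CBOR Text and Byte String.

```python
def name_to_key(raw: bytes):
    """Return str (CBOR Text) if raw is valid UTF-8, else bytes (CBOR Byte String)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw


def key_to_name(key) -> bytes:
    """Recover the original file name bytes from either key type."""
    return key.encode("utf-8") if isinstance(key, str) else key


samples = (b"report.txt", "r\u00e9sum\u00e9.pdf".encode("utf-8"), b"caf\xe9.txt")
for raw in samples:
    key = name_to_key(raw)
    assert key_to_name(key) == raw  # the name is represented faithfully
    print(type(key).__name__, repr(key))
```

As the review discussion points out, this leaves it to consumers to decide whether a given name is safe to display.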
||
|
||
## Types | ||
|
||
The type field should be limited to a set of well defined values so it | ||
makes sense that this is an integer rather than a text string. The | ||
value is the ascii value of a letter. When converting to JSON the | ||
integer can be represented as a single character string. | ||
|
||
Possible values are as follows: | ||
|
||
* 0, '', `file`: regular file | ||
* `e`, `exe`: executable file | ||
* `d`, `dir`: directory entry | ||
* `s`, `special`: special file type (fifo, device, etc). The second field is a CBOR Map with at least one field to describe the type. | ||
* `l`, `symlink`: symbolic link. The second field is the contents of the link | ||
* `o`, `other`: link to other ipld object, links followed for GC and related operations | ||
* `u`, `unknown`: link to unknown objects, links not followed | ||
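The type table above might be modelled like this. It is a sketch under one reading of the list: the integer is the ASCII value of the letter, a regular file is the integer 0, and in JSON a regular file becomes the empty string. `TYPE_NAMES`, `type_to_json` and `type_from_json` are hypothetical names.

```python
# Integer type codes keyed to the long names used in the list above.
TYPE_NAMES = {
    0: "file",
    ord("e"): "exe",
    ord("d"): "dir",
    ord("s"): "special",
    ord("l"): "symlink",
    ord("o"): "other",
    ord("u"): "unknown",
}


def type_to_json(code: int) -> str:
    """Render a type code as a single-character string ('' for a regular file)."""
    if code not in TYPE_NAMES:
        raise ValueError(f"unknown type code: {code}")
    return "" if code == 0 else chr(code)


def type_from_json(text: str) -> int:
    """Parse the JSON form back into the integer type code."""
    if len(text) > 1:
        raise ValueError(f"type must be at most one character: {text!r}")
    code = 0 if text == "" else ord(text)
    if code not in TYPE_NAMES:
        raise ValueError(f"unknown type: {text!r}")
    return code


for code, name in TYPE_NAMES.items():
    assert type_from_json(type_to_json(code)) == code
    print(code, repr(type_to_json(code)), name)
```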
|
||
If an integer enumeration was used rather than ascii characters, the canonical CBOR representation would be packed to one byte rather than two. Given that the CID will be in raw representation, I don't think clarity would suffer by an enumeration.

I am not against this. |
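The size difference mentioned in the comment above follows from how CBOR encodes unsigned integers: values 0 to 23 fit in the initial byte, while larger values need one or more following bytes. The sketch below hand-rolls that encoding for illustration only; `encode_uint` is a hypothetical helper, not a library API.

```python
import struct


def encode_uint(value: int) -> bytes:
    """Encode an unsigned integer as a CBOR major type 0 item, shortest form."""
    if value < 0:
        raise ValueError("only unsigned integers are supported")
    if value < 24:
        return bytes([value])                      # value fits in the initial byte
    if value < 1 << 8:
        return b"\x18" + struct.pack(">B", value)  # 1 following byte
    if value < 1 << 16:
        return b"\x19" + struct.pack(">H", value)  # 2 following bytes
    if value < 1 << 32:
        return b"\x1a" + struct.pack(">I", value)  # 4 following bytes
    if value < 1 << 64:
        return b"\x1b" + struct.pack(">Q", value)  # 8 following bytes
    raise ValueError("value too large for a CBOR unsigned integer")


print(encode_uint(ord("d")).hex())  # ASCII 'd' (100) takes two bytes: 1864
print(encode_uint(2).hex())         # a small enumerated value takes one byte: 02
```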
||
### Notes | ||
|
||
* Rather than have a special attribute for an executable bit it is more compact if we just make this a different type | ||
If file types are enumerated then the high bit in a one-byte packed CBOR integer (0b10000) could be an informative bit that would make regular files (type 0) into executable files (type 16).

That could work. |
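The flag-bit idea from the comment above can be sketched as follows. Only the values stated in the discussion are used: regular file is type 0 and the flag is `0b10000`, giving 16 for an executable file. The names `EXEC_FLAG`, `TYPE_FILE` and the helpers are hypothetical. Because a one-byte CBOR integer tops out at 23, only base types 0 to 7 can carry the flag and still fit in a single byte, which the sketch enforces.

```python
EXEC_FLAG = 0b10000   # 16, the high bit of a one-byte packed CBOR integer
TYPE_FILE = 0         # regular file, as in the discussion above
ONE_BYTE_MAX = 23     # largest unsigned integer CBOR packs into one byte


def set_executable(type_code: int) -> int:
    """Return the type code with the executable bit set."""
    flagged = type_code | EXEC_FLAG
    if flagged > ONE_BYTE_MAX:
        raise ValueError("flagged type no longer fits in a single CBOR byte")
    return flagged


def is_executable(type_code: int) -> bool:
    """Report whether the executable bit is set."""
    return bool(type_code & EXEC_FLAG)


def base_type(type_code: int) -> int:
    """Return the type code with the executable bit cleared."""
    return type_code & ~EXEC_FLAG


exe = set_executable(TYPE_FILE)
print(exe, is_executable(exe), base_type(exe))
```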
||
* It is very useful to be able to determine whether a link is a directory or an ordinary file, so I made it a separate type. Also, there can be multiple ways to define a file size for a directory, so it is best to leave it out as it is of limited usefulness. | ||
Actually there are only 2 ways - either you only count the logical bytes ( what the Windows Given that in the context of IPFS the DAG is completely decoupled from the storage ( it may be files, it may be badger, etc ), the only sensible way to define a file size for a directory is to count the logical bytes, which I've done in my prototypes. I would be sad if I can't express these cumulative values as part of every link within an FS tree.

Unix has its own (basically useless) way of defining the size of a directory. I am not totally against including the cumulative size of a directory if we can agree on how to define it. |
||
|
||
## Standard Attributes: | ||
|
||
The standard set of attributes should be limited to a small set of meaningful values. | ||
Stripping this field SHOULD NOT change the meaning of the directory entry. | ||
Clients SHOULD be able to understand these attributes when reading a directory entry. | ||
|
||
Possible entries: | ||
|
||
* `mtime` | ||
* `ro`: Boolean, set if the file or directory should be readonly when copied to the filesystem | ||
|
||
## Extended Attributes | ||
|
||
The extended attributes set is not well defined and can be used for vendor extensions and posix attributes that don't make sense on non-unix systems. | ||
Stripping this field MUST NOT change the meaning of the directory entry. | ||
These attributes SHOULD be passed along but do not have to be understood. | ||
The directory header MAY include information on the meaning of the attributes; | ||
for example it could indicate that this is a copy of a unix filesystem and to expect a standard set of corresponding attributes. | ||
|
||
Possible entries: | ||
Would "Extended Attributes" be a good place to optionally store an explicit media type for problematic data types, as noted in #11? |
||
|
||
* `user`: unix user name | ||
* `uid`: unix numeric uid | ||
* `group`: unix group name | ||
* `gid`: unix numeric gid | ||
* `perm`: full unix permissions | ||
* extended posix attributes | ||
* windows specific attributes |
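To illustrate the stripping rules for the two attribute maps, here is a small sketch. It models a directory entry as a Python dataclass rather than the proposed CBOR array, since the draft does not yet say how omitted optional fields are laid out. The `Entry` class, the placeholder link bytes, and the attribute value formats (integer `mtime`, octal `perm`) are all assumptions made for the example.

```python
from dataclasses import dataclass, field, replace
from typing import Optional, Union


@dataclass(frozen=True)
class Entry:
    type: int                                     # type code, 0 = regular file
    target: Union[bytes, str, dict]               # link or data
    size: Optional[int] = None                    # only for regular files
    standard: dict = field(default_factory=dict)  # e.g. mtime, ro
    extended: dict = field(default_factory=dict)  # vendor / POSIX extras


def strip_extended(entry: Entry) -> Entry:
    """Drop the extended attributes; type, link and size are untouched."""
    return replace(entry, extended={})


def strip_standard(entry: Entry) -> Entry:
    """Drop the standard attributes; type, link and size are untouched."""
    return replace(entry, standard={})


entry = Entry(
    type=0,
    target=b"<placeholder link bytes>",           # hypothetical, not a real CID
    size=1024,
    standard={"mtime": 1500000000, "ro": True},   # value formats are assumed
    extended={"user": "alice", "uid": 1000, "perm": 0o644},
)
stripped = strip_extended(entry)
print(stripped.extended, stripped.standard)
```

A client that does not understand the extended attributes can pass `entry.extended` along unchanged, or call `strip_extended` without affecting how the entry resolves.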
I am really concerned about attributes being represented as an array.
Why not just a map? With all keys being pre-agreed upon multicodec-style, i.e. a key must exist in one of the centralized spec tables in order to be recognized by anyone.
We can still declare some of the keys as mandatory, and it is at the discretion of gateways/nodes/etc to decide what to do with "obviously malformed" blocks. We already have this with protobuf/unixfs: if one uploads a link-block with only "type 2" fields, and no "data" - everything rejects it.

CBOR defines sorting logic for canonicalizing maps, but not for arrays, and a canonical representation for a unixfs directory should be a must.

I covered the reasons below in the notes section.
@ehmry I don't see how your comment applies.

I am not tied to this idea. The amount of space saving is something that can be calculated once we determine what the keys will be if we use a map.

@kevina I assumed you meant an array of [key, value] tuples. CBOR is supposed to be schema-less and ordered values seem like a schema.

Even if we go with a map for directory entries, certain keys will be required in order for the directory entry to be well defined, so that is also a schema in a way.

Well, assigning integer keys to the spec attributes would order them in the same way and just be one byte of overhead for each attribute in a map.