WIP: Filesystem Interoperability Notes #227

Closed
wants to merge 4 commits into from

ahankinson
Contributor

A first pass at filesystem interoperability considerations.

Open for comment and review, not for merging (yet).

@ahankinson
Contributor Author

Any thoughts?

integrating files from filesystems that use other encodings.
</li>
<li>
Some filesystems are not <strong>case sensitive</strong>, meaning two file names that differ only
Member


Can we add a normative MUST, SHOULD, or MAY in this last item?

Contributor


This seems like a significant question. We either allow a degree of incompatibility or we are rather restrictive. Neither is very appealing. BagIt takes the first approach but gives a description of the issue (https://tools.ietf.org/html/draft-kunze-bagit-17#section-6.1.1.1), so I would tend toward that approach. A link to BagIt might be nice here, but essentially this is the tack @ahankinson has taken in the PR.
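To make the case-sensitivity concern concrete, here is a minimal sketch (not part of the PR) of how a tool might flag logical paths that would collide on a case-insensitive filesystem. The helper name `case_insensitive_collisions` is hypothetical, and `str.casefold()` is only an approximation: real filesystems such as NTFS or APFS apply their own case-folding tables.

```python
from collections import defaultdict


def case_insensitive_collisions(paths):
    """Group paths that differ only by case (hypothetical helper).

    str.casefold() approximates filesystem case folding; an actual
    filesystem may fold some characters differently.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    # Only groups with more than one member would collide on disk.
    return [sorted(group) for group in groups.values() if len(group) > 1]


paths = ["content/README.md", "content/readme.md", "content/data.csv"]
print(case_insensitive_collisions(paths))
# [['content/README.md', 'content/readme.md']]
```

A check like this only reports the problem; whether such paths are forbidden (MUST), discouraged (SHOULD), or merely documented is exactly the normative question raised above.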

Access Control Lists or Hidden files.
</li>
<li>
The <strong>character encoding</strong> of the filesystem and Inventory SHOULD be
Contributor


I don't know what it means to talk about character encodings of a filesystem; I think our expectation here is bytestream fidelity, and without that all is lost! I would thus rephrase this paragraph to talk only about inventory files.

Contributor Author


Use 'Unicode-compatible' as the language.

Contributor Author


If you have filenames that are encoded in a non-Unicode-compatible way, some transformations will be needed.

</li>
<li>
The <strong>character encoding</strong> of the filesystem and Inventory SHOULD be
Unicode-compatible, either UTF-8, UTF-16, or UCS-2. Implementers may experience problems
Contributor

@zimeon zimeon Oct 13, 2018


JSON itself is defined over UTF-8, UTF-16, or UTF-32, with either byte order for UTF-16 or UTF-32 (see https://tools.ietf.org/html/rfc4627#section-3). I do not know whether all JSON parsers have good support for all of these encodings. There is no mention of UCS-2, so I'm not sure how we imagine that would be handled. I feel a little out of my comfort zone with this question, but I think we need to be more explicit and tie the text to the JSON spec. To my mind, the current text raises more questions than it answers.
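For reference, RFC 4627 section 3 describes how a reader can tell these encodings apart: the first two characters of a JSON text are ASCII, so the pattern of zero octets in the first four bytes identifies UTF-8, UTF-16 (BE/LE), or UTF-32 (BE/LE). The sketch below implements that heuristic; the helper names `detect_json_encoding` and `load_inventory` are hypothetical, and a leading byte order mark is deliberately not handled.

```python
import json


def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text (hypothetical helper).

    Uses the null-octet pattern of the first four bytes, per RFC 4627,
    section 3. Assumes no byte order mark is present.
    """
    if len(data) < 4:
        return "utf-8"
    pattern = tuple(octet == 0 for octet in data[:4])
    if pattern == (True, True, True, False):    # 00 00 00 xx
        return "utf-32-be"
    if pattern == (True, False, True, False):   # 00 xx 00 xx
        return "utf-16-be"
    if pattern == (False, True, True, True):    # xx 00 00 00
        return "utf-32-le"
    if pattern == (False, True, False, True):   # xx 00 xx 00
        return "utf-16-le"
    return "utf-8"                              # xx xx xx xx


def load_inventory(data: bytes) -> dict:
    """Decode inventory bytes and parse them as JSON (hypothetical helper)."""
    return json.loads(data.decode(detect_json_encoding(data)))


raw = '{"id": "example-object"}'.encode("utf-16-le")
print(detect_json_encoding(raw))  # utf-16-le
print(load_inventory(raw))        # {'id': 'example-object'}
```

Note that UCS-2 has no entry here because JSON does not define it. Python's own `json.loads` also accepts `bytes` in UTF-8, UTF-16, or UTF-32 (since Python 3.6), but other parsers may not, which supports the suggestion to pin the inventory to a single encoding.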

Contributor


Hi, I'm new here, but I wanted to add that it may be helpful to be more specific about Unicode normalization, particularly given the difficulties encountered by folks working on BagIt (see here).

Contributor


The example @srerickson raises seems to be one in which the user was running BagIt on a local system (or perhaps the now-defunct Apple Server), which generated the issue. Since OCFL, at least in my mind, would be used by systems rather than by people packaging things up, I can't imagine a situation where this would happen. Are there other cases or systems where this might happen? It's been a while since someone let me on a server, so I admit to being ignorant...

Contributor


@rosy1280 I think the edge case to consider is when a repository is rebuilt on a file system that handles filenames differently than the filesystem the OCFL Object was created on. In that situation (as in the example with BagIt) it might be possible for the filenames in the inventory to differ from the actual filenames (even though they both look the same, visually).
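The edge case above can be illustrated with a small check: compare inventory paths against the names a filesystem actually returns, after normalizing both to NFC, and report pairs that look identical but differ in code points. This is a minimal sketch; the helper name `normalization_mismatches` and the choice of NFC as the comparison form are assumptions, not anything the PR specifies.

```python
import unicodedata


def normalization_mismatches(inventory_paths, disk_paths):
    """Pair inventory paths with on-disk names that differ only in Unicode
    normalization form (hypothetical helper).

    Both sides are compared after NFC normalization; a pair is reported
    when the normalized forms match but the raw code points do not.
    If several disk names normalize to the same form, the last one wins.
    """
    disk_by_nfc = {unicodedata.normalize("NFC", p): p for p in disk_paths}
    mismatches = []
    for inv_path in inventory_paths:
        disk_path = disk_by_nfc.get(unicodedata.normalize("NFC", inv_path))
        if disk_path is not None and disk_path != inv_path:
            mismatches.append((inv_path, disk_path))
    return mismatches


# The inventory records a precomposed "é" (NFC); the rebuilt filesystem
# returns "e" followed by a combining acute accent (NFD).
inventory = ["content/caf\u00e9.txt"]
on_disk = ["content/cafe\u0301.txt"]
print(len(normalization_mismatches(inventory, on_disk)))  # 1
```

A byte-for-byte lookup of the inventory path would fail in this case even though both names render as `café.txt`, which is why some explicit statement about normalization may be needed.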

or "colon" (':') as a path delimiter.
</li>
<li>
<strong>File permissions</strong> MAY be applied to files in an OCFL Object; however, implementers
Contributor


File permissions are unavoidably applied to every file in all current filesystems I know about, so the MAY here seems odd. I also don't think we should introduce fuzzy terms like ACLs and hidden files. I think we should make a simpler statement along the lines of:

File permissions are not portable across filesystems and are not expected to be preserved by OCFL clients.

Contributor


I believe that @neilsjefferies has language similar to this, and I remember a discussion along these lines at the September F2F meeting. So 👍 to @zimeon's suggestion.

@zimeon
Contributor

zimeon commented Oct 13, 2018

I wonder whether some of these questions should be elevated to issues for discussion?

@neilsjefferies
Member

Windows is a real pain here. Under the hood, NTFS allows / and \ to be used interchangeably as directory separators, and it is also case sensitive and supports Unicode. However, many of its user-space tools behave differently, since they also support FAT variants, which vary in how they handle these aspects.
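Related to the separator point (the PR text also mentions colon as a path delimiter), a tool could warn when a segment of a logical path contains a character that some platforms would read as a separator. This sketch assumes logical paths use `/` as their only separator; the helper name `non_portable_segments` and the character list are illustrative assumptions, not an exhaustive portability rule.

```python
# Characters that some filesystems or OS layers treat as path separators
# (backslash on Windows, colon on classic Mac OS). Assumed, not exhaustive.
ALTERNATE_SEPARATORS = ("\\", ":")


def non_portable_segments(logical_path: str) -> list:
    """Return segments of a '/'-separated path that contain characters some
    platforms would interpret as separators (hypothetical helper)."""
    return [
        segment
        for segment in logical_path.split("/")
        if any(sep in segment for sep in ALTERNATE_SEPARATORS)
    ]


print(non_portable_segments("content/report\\2018/notes:v1.txt"))
# ['report\\2018', 'notes:v1.txt']
print(non_portable_segments("content/data/file.txt"))  # []
```

Whether such names should be rejected or only flagged is left open here, in keeping with the BagIt-style "describe the issue" approach discussed above.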

@neilsjefferies
Member

Actually, in light of this (per-folder case sensitivity): https://www.windowscentral.com/how-enable-ntfs-treat-folders-case-sensitive-windows-10

...Can we just tell NTFS to go forth and multiply....?

@rosy1280
Contributor

I agree with @zimeon. If we didn't talk about this in the last Editors' Meeting, we should talk about it in the next one.

@ahankinson
Contributor Author

Editors' call, 05/12/2018: after discussion, we'll close this and break specific discussions out into other tickets/PRs.

@ahankinson ahankinson closed this Dec 5, 2018
@awoods awoods deleted the fixed-212-filesystem branch December 7, 2018 18:59

6 participants