
Cross Platform File Handling

Daniel Molkentin edited this page Apr 3, 2014 · 9 revisions

Introduction

ownCloud faces problems when syncing files between different operating system platforms, because each system has its own requirements regarding:

  1. Case sensitivity versus case preservation versus case insensitivity. Case sensitivity means that a file system is able to distinguish between a file named TODO.txt and one named Todo.txt, as most file systems on Linux do. On a case-preserving but case-insensitive file system, either file can be created on its own, but if one exists, the other cannot be added because the system cannot distinguish between the two names. This is the default on Mac OS X's HFS+ and Windows' NTFS/VFAT.
  2. The length of an absolute file name: even though the hard limit is 32k chars, some APIs on Windows only support far shorter paths (MAX_PATH, roughly 260 chars).
  3. The set of characters allowed within a file or directory name: think umlauts, symbols and the like, but also common chars that are forbidden in file names, such as the colon.
  4. Other very platform-specific problems, like the encoding of drive letters on Windows or file names with a trailing space.

Solutions:

The ownCloud ecosystem can be seen as an ownCloud core with various data storage backends and different clients, such as desktop clients, mobile app clients or third-party WebDAV clients.

Full File name

In ownCloud core and "on the wire" between core and backends we always work with so-called Full Names. A Full Name is defined as a file name of up to 32767 Unicode chars, with case-sensitive file and path names. The name is UTF-8 encoded, in Unicode Normalization Form C (NFC), as is native on Linux. The path delimiter is /, and drive letters as used on Win32 are encoded like //d/tmp/foobar [FIXME: drive letters correct?]
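The definition above can be sketched in a few lines of Python. This is only an illustration, not ownCloud code: the function names to_full_name and full_name_bytes are made up for this page, and the sketch covers only the NFC, UTF-8 and length rules (it does not touch path delimiters or drive letters).

```python
import unicodedata

# Limit quoted in the definition of a Full Name.
MAX_FULL_NAME_CHARS = 32767


def to_full_name(raw: str) -> str:
    """Return `raw` normalized to NFC, rejecting names that are too long.

    Hypothetical helper: names read from a file system may arrive in a
    decomposed form, so they are normalized before going on the wire.
    """
    name = unicodedata.normalize("NFC", raw)
    if len(name) > MAX_FULL_NAME_CHARS:
        raise ValueError("Full Name exceeds 32767 characters")
    return name


def full_name_bytes(raw: str) -> bytes:
    """Return the UTF-8 encoded Full Name as it would be transmitted."""
    return to_full_name(raw).encode("utf-8")
```

For example, a name spelled with "u" followed by a combining diaeresis and the same name spelled with the precomposed "ü" both end up as the same Full Name, so the two spellings cannot turn into two different files on the server.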

Platform Name

On the clients or on the storage backends the Platform Names have to be used to store and work with the file data. The level of crippling depends on the ability of the target system. For example, if the target system is not able to maintain case sensitivity, the incoming interface has to convert the Full Name to the Platform Name accordingly.

After a Platform Name has been computed, the interface software has to check whether that Platform Name is already taken by another file on the target platform. If so, a new name has to be computed. Maintaining the mapping between Platform Name and Full Name is the responsibility of the interface software.
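A minimal sketch of such a mapping, assuming the interface software keeps it in memory: the class NameMap and its methods are hypothetical, and how candidate names are generated (including the " (2)" style used in the test data) is an assumption that this page does not prescribe.

```python
from typing import Dict, Iterable


class NameMap:
    """Two-way mapping between Full Names and Platform Names (hypothetical)."""

    def __init__(self) -> None:
        self._to_platform: Dict[str, str] = {}
        self._to_full: Dict[str, str] = {}

    def assign(self, full_name: str, candidates: Iterable[str]) -> str:
        """Record and return the first candidate that is not taken yet."""
        if full_name in self._to_platform:
            # Already mapped earlier: keep the mapping stable.
            return self._to_platform[full_name]
        for candidate in candidates:
            if candidate not in self._to_full:
                self._to_platform[full_name] = candidate
                self._to_full[candidate] = full_name
                return candidate
        raise ValueError("no free Platform Name among the candidates")

    def full_name(self, platform_name: str) -> str:
        """Look up the Full Name behind a Platform Name."""
        return self._to_full[platform_name]
```

The caller supplies the candidate Platform Names in order of preference. A real client would also have to persist the mapping and compare names the way the target file system does; neither is shown here.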

Solutions for the specific problems

Case insensitivity:

If the underlying system does not support case-sensitive file or folder names, the Full Name is kept as reported by either the server or the system functions. If the file name collides with an already existing file, however, the Platform Name is built from the original name with the term " (case conflict)" appended, followed by the original extension.

Example: On a Mac OS X system, there is a file TODO.txt. Now a file with the Full Name Todo.txt appears on the server. The Platform Name for Todo.txt on Mac OS X ends up as Todo (case conflict).txt
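The rule and the example can be written down roughly as follows. This is a sketch under assumptions: the function name platform_name is made up, and str.casefold() only approximates the case folding that HFS+ or NTFS actually apply.

```python
import os
from typing import Iterable

CONFLICT_SUFFIX = " (case conflict)"


def platform_name(full_name: str, existing: Iterable[str]) -> str:
    """Pick a Platform Name for one file name on a case-insensitive system.

    `existing` lists the names already present in the target directory.
    """
    existing = set(existing)
    if full_name in existing:
        # Same spelling: this is the same file, not a conflict.
        return full_name
    taken = {name.casefold() for name in existing}
    if full_name.casefold() not in taken:
        return full_name
    # Collision that differs only in case: insert the marker before the
    # original extension. A marker name that is itself taken is not handled.
    stem, ext = os.path.splitext(full_name)
    return f"{stem}{CONFLICT_SUFFIX}{ext}"
```

Called as platform_name("Todo.txt", ["TODO.txt"]), the sketch returns "Todo (case conflict).txt", matching the example.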

Path Length

ownCloud Core supports path lengths up to 32k. If a system does not fully support that, like Windows with some APIs, the affected client has to use only APIs that support long paths. That way the problem is isolated to that client.
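On Windows, one known way to reach the long-path behaviour is the extended-length prefix \\?\ (or \\?\UNC\ for network shares) in front of an absolute path passed to the Unicode variants of the Win32 file functions. This page does not say which mechanism the client uses, so the following is only a sketch; the function name to_extended_length is hypothetical and the input is assumed to be an absolute path.

```python
def to_extended_length(path: str) -> str:
    """Return an absolute Windows path with the extended-length prefix.

    Extended-length paths are not normalized by Windows, so forward
    slashes are turned into backslashes first.
    """
    path = path.replace("/", "\\")
    if path.startswith("\\\\?\\"):
        # Already prefixed.
        return path
    if path.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path
```

The function only manipulates strings, so it can be tried on any platform; whether a given API accepts the result still has to be checked per API.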

Not Supported Characters

As the server core only deals with the Full Name of a resource, it is again up to the interface software to replace the characters that are not allowed. For some chars, like the colon ':', the replacement can simply be another char like '-'. There will be a translation table to convert the chars.

For more complex problems, the iconv translit feature will be used.
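A table-driven replacement could look like the sketch below. Only the ':' to '-' pair comes from this page; the other entries are characters that Windows forbids in file names, mapped to '_' purely as a placeholder. The actual table is still to be defined, and the iconv translit step is not shown.

```python
# Replacement table for a single file name (not a whole path).
REPLACEMENTS = str.maketrans({
    ":": "-",   # pair given in the text
    "<": "_",   # the remaining entries are placeholders, not a decided mapping
    ">": "_",
    '"': "_",
    "|": "_",
    "?": "_",
    "*": "_",
})


def replace_forbidden(component: str) -> str:
    """Replace characters that the target platform does not allow."""
    return component.translate(REPLACEMENTS)
```

Because several characters map to the same replacement, two different Full Names can produce the same Platform Name, which is why the result still has to go through the "already taken" check described under Platform Name.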

Other, system specific problems

How the Full Name is mapped to the Platform Name depends very much on the system or platform for which the file name is computed. All specific features of a single system can and must therefore be considered there.

Things still to sort out:

  • Should we limit the UTF-8 namespace in Full Names?
  • Document or correctly deal with fringe cases resulting from file systems that behave differently from the rest of the platform, like HFS+ file systems with case sensitivity enabled or JFS without case sensitivity

Interesting Links: