Cross Platform File Handling

Thomas Müller edited this page Apr 2, 2014 · 9 revisions

Introduction

ownCloud faces problems when syncing files between operating system platforms because each system has its own requirements regarding:

  1. Case sensitivity versus case preservation versus case insensitivity.
  2. The maximum length of an absolute file name.
  3. The characters that are allowed within a file or directory name.
  4. Other platform-specific problems, such as the encoding of drive letters on Windows or file names with a trailing space.

Solutions:

The ownCloud ecosystem can be seen as an ownCloud core with various data storage backends and different clients, such as desktop clients, mobile app clients or third-party WebDAV clients.

Maxname

In ownCloud core and "on the wire" between core and backends we always work with so-called Maxnames. A Maxname is defined as a file name of up to 32767 Unicode characters, with case-sensitive file and path names. The name is UTF-8 encoded, in Unicode Normalization Form C (NFC), as is native on Linux. Path delimiters are / and drive letters as used on Win32 are encoded like //d/tmp/foobar [FIXME: drive letters correct?]
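The definition above can be sketched as a small normalization step. This is only an illustration, not ownCloud code: the function names are hypothetical, and it assumes that "Unicode chars" means code points, which is what Python's len() counts.

```python
import unicodedata

# Limit taken from the Maxname definition above.
MAX_MAXNAME_CHARS = 32767


def to_maxname(name: str) -> str:
    """Return `name` in NFC form, or raise ValueError if it is too long.

    Hypothetical helper. Assumes the length limit is counted in
    Unicode code points.
    """
    normalized = unicodedata.normalize("NFC", name)
    if len(normalized) > MAX_MAXNAME_CHARS:
        raise ValueError("Maxname exceeds 32767 characters")
    return normalized


def maxname_to_wire(name: str) -> bytes:
    """Encode a Maxname as UTF-8 for use "on the wire"."""
    return to_maxname(name).encode("utf-8")
```

A decomposed name such as "e" followed by a combining acute accent is folded into the single precomposed character, so both spellings map to the same Maxname.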

Crippledname

On the clients or on the storage backends, so-called Cripplednames have to be used to store and work with the file data. The level of crippling depends on the capabilities of the target system. For example, if the target system is not able to maintain case sensitivity, the incoming interface has to convert the Maxname to the Crippledname accordingly.

After a Crippledname has been computed, the crippling software has to check whether that name is already taken by another file on the target platform. If so, a new name has to be computed.
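The compute-then-check cycle described above might look like the following sketch. All names are hypothetical: `cripple` stands for whatever platform-specific function derives a candidate name, and `is_taken` for a lookup on the target platform. The attempt limit is an assumption added to keep the loop bounded.

```python
from typing import Callable


def find_free_crippledname(
    maxname: str,
    cripple: Callable[[str, int], str],
    is_taken: Callable[[str], bool],
    max_attempts: int = 100,
) -> str:
    """Return the first candidate Crippledname that is not already taken.

    `cripple(maxname, attempt)` must return a different candidate for
    each attempt number; `is_taken(candidate)` reports whether another
    file on the target platform already uses that name.
    """
    for attempt in range(max_attempts):
        candidate = cripple(maxname, attempt)
        if not is_taken(candidate):
            return candidate
    raise RuntimeError("no free Crippledname found")
```

Passing the attempt number to `cripple` leaves the naming scheme for retries entirely to the platform-specific code.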

Solutions for the specific problems

Case insensitivity:

If the underlying system does not support case-sensitive file or folder names, the Maxname is kept as reported by either the server or the system functions. If the file name collides with an already existing file, however, the Crippledname is built from the original name by inserting the term " (case conflict)" before the original extension.

Example: On a Mac OS X system, there is a file TODO.txt. Now a file called Todo.txt appears on the server. The Crippledname for Todo.txt ends up as Todo (case conflict).txt
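A minimal sketch of this naming rule follows. The function name is hypothetical, and str.casefold() is used as an approximation of how a case-insensitive file system compares names; real file systems may fold case differently.

```python
import os.path
from typing import Iterable


def case_conflict_name(name: str, existing: Iterable[str]) -> str:
    """Return `name`, or its " (case conflict)" variant on a collision.

    A collision means that `existing` holds a name that differs from
    `name` only by case. An exact match is treated as the same file.
    """
    existing = set(existing)
    folded = {entry.casefold() for entry in existing}
    if name in existing or name.casefold() not in folded:
        return name
    stem, ext = os.path.splitext(name)
    return f"{stem} (case conflict){ext}"
```

With TODO.txt already present, case_conflict_name("Todo.txt", ["TODO.txt"]) yields "Todo (case conflict).txt", matching the example above. The result still has to go through the "already taken" check, since a file with that name could exist as well.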

Path Length

We support path lengths up to 32k. If a system does not fully support that, like Windows with some APIs, the affected client has to use only APIs that support long paths. That way the problem is isolated to that client.
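On Windows, one way to reach the long-path variants of the Unicode file APIs is the extended-length prefix \\?\ on an absolute path. The sketch below only builds such a path string; the function name is hypothetical, and whether a given client should use this mechanism is an assumption, not something this page prescribes.

```python
import ntpath


def to_extended_length_path(path: str) -> str:
    """Return an absolute Windows path with the extended-length prefix.

    Hypothetical helper: it only rewrites the string and never
    touches the file system.
    """
    prefix = "\\\\?\\"
    if path.startswith(prefix):
        return path  # already in extended-length form
    # Extended-length paths are not normalized by Windows, so convert
    # '/' to '\' and collapse '.' and '..' components here.
    path = ntpath.normpath(path)
    if not ntpath.isabs(path):
        raise ValueError("extended-length paths must be absolute")
    if path.startswith("\\\\"):
        # UNC form: \\server\share\... becomes \\?\UNC\server\share\...
        return prefix + "UNC\\" + path[2:]
    return prefix + path
```

Because ntpath is available on every platform, the conversion can be exercised on a non-Windows machine as well.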

Unsupported Characters

As the server core only deals with the Maxname of a resource, it is again up to the crippler interface to replace the characters that are not allowed. For some characters, like the colon ':', the replacement can simply be another character like '-'. A translation table can be used to convert the characters.
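Such a table could be sketched as follows. Only the ':' to '-' mapping comes from the text above; the other entries and all names are illustrative assumptions for one hypothetical target platform.

```python
# Replacement table for one hypothetical target platform.
# ':' -> '-' is the example from the text; '?' and '*' are
# illustrative additions, not an agreed mapping.
REPLACEMENTS = str.maketrans({
    ":": "-",
    "?": "_",
    "*": "_",
})


def replace_unsupported(name: str) -> str:
    """Replace every character listed in REPLACEMENTS."""
    return name.translate(REPLACEMENTS)
```

Different Maxnames can map to the same result (for example "a?" and "a*"), so the "already taken" check still applies after the replacement.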

For more complex problems, the iconv transliteration feature (//TRANSLIT) could be used.

Other system-specific problems

The process of crippling a name depends heavily on the system or platform for which the file name is crippled. All specifics of a given system can and must be handled there.

Things still to consider:

  • Should we limit the UTF-8 namespace in Maxnames?

[1] http://msdn.microsoft.com/en-us/library/aa365247.aspx

[2] http://support.grouplogic.com/?p=1607

[3] https://dl.getdropbox.com/u/228121/checkfile.php

[4] http://tokyoimage.wordpress.com/2011/06/10/enable-a-windows-fileserver-to-support-linux-filenames-with-invalid-characters/