Cross Platform File Handling

Thomas Müller edited this page Apr 2, 2014 · 9 revisions

Filename encoding

ownCloud faces problems when syncing files between operating systems because each platform has its own requirements regarding:

  1. Case sensitivity versus case preservation versus case insensitivity.

  2. The maximum length of an absolute file name.

  3. The characters that are allowed in a file or directory name.

  4. Other platform-specific problems, such as the encoding of drive letters on Windows or file names with a trailing space.

Solutions:

The ownCloud ecosystem can be seen as an ownCloud core with various data storage backends and different clients, such as desktop clients, mobile app clients, or third-party WebDAV clients.

Maxname

In ownCloud core and "on the wire" between core and backends we always
work with so-called Maxnames. A Maxname is defined as a file name of up
to 32767 Unicode characters, with case-sensitive file and path names.
The name is UTF-8 encoded, in Unicode Normalization Form C (NFC), as is
typical on Linux. Path delimiters are / and drive letters as used on
Win32 are encoded like //d/tmp/foobar [FIXME: drive letters correct?]
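
The Maxname rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the 32767 limit is counted in Unicode code points, which the text does not specify. Drive letter encoding is left out because it is still marked FIXME.

```python
import unicodedata

# Limit quoted in the text; counting code points is an assumption.
MAX_MAXNAME_LENGTH = 32767


def to_maxname(name: str) -> str:
    """Return `name` in NFC form, or raise ValueError if it is too long."""
    normalized = unicodedata.normalize("NFC", name)
    if len(normalized) > MAX_MAXNAME_LENGTH:
        raise ValueError("name exceeds the Maxname length limit")
    return normalized


def maxname_bytes(name: str) -> bytes:
    """Return the UTF-8 encoded wire form of the Maxname."""
    return to_maxname(name).encode("utf-8")
```

A decomposed name such as "e" followed by a combining acute accent (the form OS X typically reports) and its precomposed equivalent map to the same Maxname, so both sides of a sync compare equal.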

Crippledname

On the clients or on the storage backends, so-called Cripplednames have to be used to store and work with the file data. The level of crippling depends on the capabilities of the target system. For example, if the target system cannot maintain case sensitivity, the incoming interface has to convert the Maxname to a Crippledname accordingly.

After a crippled name has been computed, the crippling software has to check whether that name is already taken by another file on the target platform. If so, a new name has to be computed.
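
The check-and-recompute step might look like the sketch below. The text does not say how the new name is computed, so the numbered suffix used here is purely hypothetical, as are the function name and the `is_taken` callback that stands in for a lookup on the target platform.

```python
import itertools
import os
from typing import Callable


def unique_crippled_name(crippled: str, is_taken: Callable[[str], bool]) -> str:
    """Return `crippled`, or a variant of it that is still free on the target."""
    if not is_taken(crippled):
        return crippled
    stem, ext = os.path.splitext(crippled)
    # Hypothetical scheme: try "name (2).ext", "name (3).ext", ... until one is free.
    for n in itertools.count(2):
        candidate = f"{stem} ({n}){ext}"
        if not is_taken(candidate):
            return candidate
```

Passing the lookup as a callback keeps the loop independent of whether the target is a local file system or a storage backend.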

Solutions for the specific problems:

  1. Case insensitivity
If the underlying system does not support case-sensitive file or folder
names, the Maxname is kept as reported by either the server or the system
functions. If, however, the file name collides with an already existing
file, the Crippledname is created from the original name by inserting the
term " (case conflict)" before the original extension.

Example:
On a Mac OS X system, there is a file TODO.txt. Now a file called Todo.txt
appears on the server. The crippled name for Todo.txt then becomes
Todo (case conflict).txt
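
A minimal sketch of this rule, assuming that Python's `str.casefold()` is a good enough stand-in for the target system's case-insensitive comparison (real file systems use their own folding tables). The function name and the `existing` parameter are hypothetical.

```python
import os
from typing import Iterable


def case_conflict_name(maxname: str, existing: Iterable[str]) -> str:
    """Return the name to use for `maxname` on a case-insensitive target."""
    names = set(existing)
    folded = {n.casefold() for n in names}
    # No conflict: the exact same name, or no name differing only in case.
    if maxname in names or maxname.casefold() not in folded:
        return maxname
    stem, ext = os.path.splitext(maxname)
    return f"{stem} (case conflict){ext}"
```

With the example above, `case_conflict_name("Todo.txt", ["TODO.txt"])` yields `Todo (case conflict).txt`. The result still has to go through the "already taken" check, since a second conflict could produce the same name.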

  2. Path length
We support path lengths up to 32k. If a system does not fully support that,
like Windows with some APIs, the affected client has to use only APIs that
support long paths. That way the problem stays isolated to that client.
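
On Windows, one way to reach the long-path APIs is the extended-length `\\?\` prefix, which the Unicode variants of the Win32 file functions accept. The sketch below only shows the string conversion and is an assumption about how a client could do it, not a description of the ownCloud client; the function name is hypothetical. It uses `ntpath`, so it runs on any platform.

```python
import ntpath

EXTENDED_PREFIX = "\\\\?\\"          # the four characters \\?\
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"


def to_extended_length(path: str) -> str:
    """Convert an absolute Windows path to its extended-length form."""
    if path.startswith(EXTENDED_PREFIX):
        return path
    if not ntpath.isabs(path):
        raise ValueError("extended-length paths must be absolute")
    # The prefix turns off path normalization in Windows, so resolve
    # forward slashes and ".." segments before adding it.
    normalized = ntpath.normpath(path)
    if normalized.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return EXTENDED_UNC_PREFIX + normalized[2:]
    return EXTENDED_PREFIX + normalized
```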

  3. Unsupported characters

As the server core only deals with the Maxname of a resource, it is again up to the crippler interface to replace the characters that are not allowed. For some characters, like the colon ':', the replacement can be just another character such as '-'. A translation table can be used to convert the characters.

For more complex problems, the iconv translit feature could be used.
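
A translation table for a Windows target could be sketched like this. Only the ':' to '-' mapping comes from the text; every other replacement is a hypothetical choice, and the function is meant for a single path component, not a full path.

```python
# Characters Windows does not allow in a file name component.
# Only ":" -> "-" is taken from the text; the rest are illustrative.
WINDOWS_REPLACEMENTS = str.maketrans({
    ":": "-",
    "<": "_",
    ">": "_",
    '"': "'",
    "|": "_",
    "?": "_",
    "*": "_",
    "\\": "_",
})


def cripple_component(name: str) -> str:
    """Replace characters that are not allowed on the target system."""
    return name.translate(WINDOWS_REPLACEMENTS)
```

Because the mapping is lossy, two different Maxnames can end up with the same result, which is why the crippled name still has to be checked against the names already present on the target.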

  4. Other, system-specific problems
The process of crippling a name depends very much on the system or platform
for which the file name is crippled. All specifics of a single system can
and must therefore be handled there.

Things still to consider:
- Should we limit the UTF-8 namespace in Maxnames?



