Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow illegal characters / ? < > \ : * | " and so on in file and folder names #27891

Closed
RuudschMaHinda opened this issue Jul 9, 2021 · 22 comments
Labels
0. Needs triage Pending check for reproducibility or if it fits our roadmap client: 💻 desktop enhancement needs info stale Ticket or PR with no recent activity

Comments

@RuudschMaHinda
Copy link
Member

Nextcloud allows characters in file names and folder names, which are illegal in some operating systems, or all operating systems.

For some reason users in one of my installs use Asterisks, some use colons, I even found pipes and question marks, let alone quotation marks, and ampersands.

Nextcloud itself should not allow naming files or folders with those characters in them. Windows clients basically refuse to work if a folder name has a colon in them. Renaming said folder stops synching alltogether because windows can't handle it.

I suggest to not allow those special characters when creating files and folders within the webinterface. Files and folders with those characters in the sync folder should be renamed and said character removed before it gets distributed to other clients on different OSs.

To be honest I don't even know how some users manage to get some of these characters in there. It's not been the first time, and I just wrote a third email about this issue to this team. They are very learning resistant.

@RuudschMaHinda RuudschMaHinda added enhancement 0. Needs triage Pending check for reproducibility or if it fits our roadmap labels Jul 9, 2021
@alx-tuilmenau
Copy link
Contributor

I find this difficult because there is a problem with forbidding characters when non-Windows clients synchronize and these characters are used in the file names. In this case, the sync client on Linux/other (where filenames can contain any character) must handle this, maybe running in conflicts, if 2 names are only differ in a special character. Wouldn't it be better if the Windows client does this, because Windows is the only system which has problems with this? When downloading / saving a file, the browser in Windows simply throws away all special characters from the file name in the content disposition header that are not allowed.

@RuudschMaHinda
Copy link
Member Author

RuudschMaHinda commented Jul 9, 2021

I don't know where one would start to address this problem. Fact is, in a heterogenous OS environment all the Windows users that are locked in to Windows because of certain vendors, get a messed up nextcloud-folder on their drive.

Try it out: Create a Folder name with a colon in the middle of it. "Project name : YYYY" put the year in for example. Then use the windows client. Then rename it from the web-interface, or even the original OS you created that on. The folder on the windows drive will be messed up, and the client throws errors.

This is making it unusable for the computer illiterate who apparently never visited Basics in Computer Use 101. We have to account for them somehow. No matter how many emails I write, they mess it up every few months and then start complaining that their nextcloud isn't working.

Basically because of this, I can't with all confidence say that the nextcloud client is safe to use on Windows. Maybe it is actually a "bug"? Because it makes the windows client bug out. Might open a bug report on the client team, if this is not a feature the server crew can address. But since those characters can't be used in Windows, I doubt they can do anything about it at all. Which is why I put it as a feature request.

@alx-tuilmenau
Copy link
Contributor

I tried to reproduce with Windows Client 3.2.3: It's ignoring files and folders with a ':' or '*' in its name, it does not show or sync them. There is only a message, "files from ignore list and symblic links are not syncronized" (my ignore list is emtpy, maybe this is set internal somewhere). Renaming a folder in the WebUI from a "normal" name to a name with a colon removes the folder during the next sync on windows completely.

Getting a message on renaming in WebUI (like the message if the new name contains "/") , which can be deactived in the (personal) settings, may be a good solution. Renaming in the WebUI is not the problem that I wanted to address with my first comment, but that there are problems with clients on other platforms if these characters in filenames are no longer supported in the backend.

Now, the backend supports almost all characters in filenames. Of course, Windows can only use a part of it, but the Windows client will also never create names with forbidden characters. Other clients may use their own character set. If you limit the backend, every linux/android/macos/other client has to map its local names to limited character set, if some of the windows special chars in it. Of course, if you want to sync between different OS, you should only use safe characters.

My last note (maybe offtopic), there are a lot of other windows "features", you may also want to block on windows, like device names (nul, con, ...) or special folders with GUID names ( CP.{ED7BA470-8E54-465E-825C-99712043E01C} ). Or file names which differ only in uppercase / lowercase.

@alexanderdd
Copy link
Contributor

@RuudschMaHinda
Copy link
Member Author

RuudschMaHinda commented Jul 13, 2021

A new discovery:
Folders with a Space as the last character do not show up in the Win-Sync-Client - Same customer.

@solracsf solracsf changed the title Disallow illegal characters / ? < > \ : * | " and so on in file and folder names Disallow illegal characters / ? < > \ : * | " and so on in file and folder names Jul 16, 2021
@solracsf
Copy link
Member

@mgallien @FlexW maybe you have some input here?

@szaimen
Copy link
Contributor

szaimen commented Aug 8, 2021

I am not sure if this feature request is feasible.

cc @nextcloud/server-triage for more opinion on this.

@RuudschMaHinda
Copy link
Member Author

concerning feasability:
Have a pop-up in the web interface that lets the user know "You can not use / ? < > \ : * | " or have a leading or trailing space in folder and file names." and just not allow finishing the naming of the file or folder until the input criteria have been met.

@skjnldsv
Copy link
Member

skjnldsv commented Aug 9, 2021

For the front yes, but coming from the dav backend, it's different

@RuudschMaHinda
Copy link
Member Author

RuudschMaHinda commented Aug 9, 2021

I still maintain that this is something that must be addressed. Until then, I created a workaround for myself:

Using Flow External Scripts I created the following rule:

When file is created, renamed, copied
and User is not member of Temp-Ban-Group
Run script rename 's/[\?\<\>\:\@\*\|\$]/_/g' /path/to/datafolder/%n && php /path/to/nextcloud/occ files:scan %o

This replaces any of the offending characters with an underscore, and then rescans the userfolder. This can take a while depending on the amount of files and folders. It also depends on the hardware. In my case I had to install rename with apt install rename.

This also seems to be working with group folders.

Edit: This does NOT work with folder names - only files! Flow currently can't be triggered by folders being created/uploaded/renamed.

@ghost
Copy link

ghost commented Sep 8, 2021

This issue has been automatically marked as stale because it has not had recent activity and seems to be missing some essential information. It will be closed if no further activity occurs. Thank you for your contributions.

@ghost ghost added the stale Ticket or PR with no recent activity label Sep 8, 2021
@allexzander
Copy link

It would be also nice to introduce a trimming mechanism such that all new uploads would get sanitized of trailing spaces. Currently, it is bringing problems to Windows users when syncing down files/folders created with trailing spaces via the Linux terminal and some of the command-line tools. @AndreasErgenzinger Could we maybe add this to the board, or, at least add one card for the trailing space trimming feature? @tobiasKaminsky @mgallien @FlexW

@ghost ghost removed the stale Ticket or PR with no recent activity label Sep 9, 2021
@mgallien
Copy link
Contributor

mgallien commented Sep 9, 2021

@mgallien @FlexW maybe you have some input here?

we are going to have to do the file name sanitation on the desktop client side to ensure proper integration with End to End Encryption
support from the server would be nice to have (like proper propagation of the same setting to all Files clients)

@ghost
Copy link

ghost commented Oct 9, 2021

This issue has been automatically marked as stale because it has not had recent activity and seems to be missing some essential information. It will be closed if no further activity occurs. Thank you for your contributions.

@ghost ghost added the stale Ticket or PR with no recent activity label Oct 9, 2021
@ghost ghost closed this as completed Oct 23, 2021
@RuudschMaHinda
Copy link
Member Author

RuudschMaHinda commented Oct 23, 2021

Welp, the windows client now at least gives error messages about illegal characters. I also gathered some commands and installed "rename". Will write a script with these commands that triggers every night.

The commands I have gathered (also includes some legal characters):

find /path/to/data/ -name '*\?*' | rename -v -n 's/[\?\!\<\>\:\@\*\|]/_/' #replaces ? with _
find /path/to/data/ -name '*\:*' | rename -v -n 's/[\?\!\<\>\:\@\*\|]/_/' #replaces : with _
find /path/to/data/ -name '*\\*' | rename -v -n 's/[\?\!\<\>\:\@\*\|]/_/' #replaces \ with _ -- might be useless?
find /path/to/data/ -name '*\**' | rename -v -n 's/[\?\!\<\>\:\@\*\|]/_/' #replaces * with _
find /path/to/data/ -name '*\>*' | rename -v -n 's/[\?\!\<\>\:\@\*\|]/_/' #replaces > with _
find /path/to/data/ -name '*\<*' | rename -v -n 's/[\?\!\<\>\:\@\*\|]/_/' #replaces < with _
find /path/to/data/ -name '*\"*' | rename -v -n 's/[\?\!\<\>\:\@\*\|]/_/' #replaces " with _
find /path/to/data/ -name '*\|*' | rename -v -n 's/[\?\!\<\>\:\@\*\|]/_/' #replaces | with _
find /path/to/data/ -name '*\!*' | rename -v -n 's/[\?\!\<\>\:\@\*\|]/_/' #replaces ! with _
find /path/to/data/ -name '* ' | rename -v -n 's/ *$//' #deletes trailing spaces
find /path/to/data/ -name ' *' | rename -v -d -n 's/ *//' #deletes leading spaces
find /path/to/data/ -name '*.' -type f,d | rename -v -n 's/.$//' #deletes trailing periods
sudo -u www-data php7.4 /var/www/nextcloud/occ files:scan --all &&  sudo -u www-data php7.4 /var/www/nextcloud/occ files:scan-app-data

Observe the -n after the rename command .. this makes this a test run! Once you're happy with the shown result, remove the -n.
Also the rename commands need to be cleaned up still for just the characters searched for?
Some commands must be run several times until all - for example - trailing periods have been deleted. i.e. "foldername with open sentence..." needs the trailing periods command to be run three times. Haven't worked out the script yet.

@labor4
Copy link
Contributor

labor4 commented Nov 24, 2021

I wanna add another perspective (Macos Sync Client Side)

  • syncing Macos->NC->externalSAMBA is here, and interesting
  • dropping most of weblinks (website titles containing an "|") into a synced folder creates this case (on Macos)
  • Further Apple uses special "¨A" double character combos to build Umlauts. Not sure if this is the case here.
  • maybe unrelated: beware of the confusing case of path delimiters (":" and "/") on Macos, as they are exchanged vice-versa in "Finder" vs underlying FS on a Macos itself. touch file:name.txt in Macos terminal will show file/name.txt in Macos Finder... (Current Version 3.3.6 macOS client doesn't know it yet)

my opinion as a sysadmin/senior macos end user supporter (no DEV)
the topic scares me. if there was no facility to offload this whole old story, I would rather enforce legalization client side, warn the user, and try to respect only characters that are derived from language/culture. my customers would accept the lecturing as long as their files are safely handled down the pipeline. Macos user are nowadays often forced to work with real samba (non-macos servers), so it is known.

I am sceptic against a full-auto-rename method however, because sometimes folders of files are compendiums of a project that are then unlinked without knowing. Edge case, but they multiply strongly.

One maybe not so bad solution could be, to:

  • run a mandatory GUI-based legalizer in a before/after Left/Right fashion for the user to agree or stop
  • empower the sync client to interrupt the user by alert, when the file is being created (learning effect)

That way a user could see whats going to happen, and sign off the action.

@alexanderdd
Copy link
Contributor

@acsfer @skjnldsv @szaimen
can you please reopen? Was closed by stale bot.

@alexanderdd
Copy link
Contributor

Can someone please triage? Or does someone know who to ping so it can be triaged? What info is still needed (label needs info)?

This was closed by stale bot, imo it is still relevant and not solved.

@hardviper
Copy link

@acsfer @skjnldsv @szaimen
Please reopen, issue is still relevant.

@szaimen
Copy link
Contributor

szaimen commented Jun 2, 2023

Please open a new issue for this with up-to-date information. Thanks!

@khoschi
Copy link

khoschi commented Aug 18, 2023

Well, this is even worse, as you can trap linux shell escaping on the server side.

  1. webinterface foldername -> filesystem foldername
  2. default escape is nothing -> bla
  3. illegal chars result still in ticks -> 'bla*bla'
  4. use ticks in filename results in doublequotes -> "bla'bla"
  5. use ticks AND doublequotes to add backslash -> 'bla" ''' "bla' (this folder in webinterface is bla doubletick tick doubletick space bla)
  6. So now go for full monty and use tick, doublequotes, space and backslash to have escaping fail -> error message.

Edit: github webinterface will break as well when trying to quote the resulting string, as # will be replaced. Try yourself.

So you have code to fix typography in place, but I suggest to block all special chars from all operating systems by default.

In scripting languages like bash or php every not-so-perfect input escaping while reading folders from nextcloud will end in desaster. And of course the clients will stop working for versions 2-4. And you break every other account by sharing these folders into them.

@hardviper
Copy link

@khoschi This issue is closed. I propose to go to a similar one created later, which is open.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0. Needs triage Pending check for reproducibility or if it fits our roadmap client: 💻 desktop enhancement needs info stale Ticket or PR with no recent activity
Projects
None yet
Development

No branches or pull requests