Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forbidden overwrite in public links with drop zone #2510

Closed
wants to merge 5 commits into from

Conversation

gmgigi96
Copy link
Member

@gmgigi96 gmgigi96 commented Feb 7, 2022

Closes #2480

@update-docs
Copy link

update-docs bot commented Feb 7, 2022

Thanks for opening this pull request! The maintainers of this repository would appreciate it if you would create a changelog item based on your changes.

@gmgigi96 gmgigi96 requested a review from a team as a code owner February 7, 2022 10:21
@gmgigi96 gmgigi96 changed the title Append a timestamp to filename when a collision is detected during upload Append a timestamp to filename when a collision is detected during an upload Feb 7, 2022
@labkode
Copy link
Member

labkode commented Feb 7, 2022

@gmgigi96 this only need to trigger inside a public link with "Uploader" role only, I think the current implementation applies to all writes. @ishank011 please confirm as well my assumption.

@phil-davis
Copy link
Contributor

Note: https://github.com/owncloud/core/blob/master/tests/acceptance/features/apiSharePublicLink2/uploadToPublicLinkShare.feature#L24

  Scenario: Uploading same file to a public upload-only share multiple times via new API
    # The new API does the autorename automatically in upload-only folders
    Given user "Alice" has created a public link share with settings
      | path        | FOLDER |
      | permissions | create |
    When the public uploads file "test.txt" with content "test" using the new public WebDAV API
    And the public uploads file "test.txt" with content "test2" using the new public WebDAV API
    Then the HTTP status code should be "201"
    And the following headers should match these regular expressions
      | ETag | /^"[a-f0-9:\.]{1,32}"$/ |
    And the content of file "/FOLDER/test.txt" for user "Alice" should be "test"
    And the content of file "/FOLDER/test (2).txt" for user "Alice" should be "test2"

And https://github.com/owncloud/core/blob/master/tests/acceptance/features/apiSharePublicLink2/uploadToPublicLinkShare.feature#L268 has a similar scenario.

Those scenarios define and explicit file-naming system that is used on oC10 - test (2).txt

The requirement here in reva and oCIS is different. I will adjust the core API test suite so that it covers the different requirement/expectation for reva-oCIS. The file name format looks like it will be:

test (2022-01-29 15:04:05).txt

with the date in the ISO format YYYY-MM-DD.

So the test can check that the file name matches that pattern.

Will there be any date-time localization possible? US-style MM-DD-YYYY appearing? Or are we making the format fixed as above?
("asking for a friend" to double-check that the requirement is "well defined")

Copy link
Member

@glpatcern glpatcern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of changes, one being more an observation/request for discussion ;-)

pkg/utils/utils.go Outdated Show resolved Hide resolved
pkg/eosclient/eosbinary/eosbinary.go Outdated Show resolved Hide resolved
@phil-davis
Copy link
Contributor

@gmgigi96 this only need to trigger inside a public link with "Uploader" role only, I think the current implementation applies to all writes. @ishank011 please confirm as well my assumption.

There are "regular" upload test scenarios, where a regular user is overwriting their own file, or a sharee is overwriting a file that has been shared with them with write access. Those test scenarios expect the file name to remain the same and the content to be overwritten. If the code here does not preserve that, then I expect some unexpected test fails.

@gmgigi96
Copy link
Member Author

gmgigi96 commented Feb 7, 2022

@labkode @phil-davis indeed here I'm changing the file name for all the uploads, I will fix it considering the "Uploader" role only.

@phil-davis for the format I think we could agree on a fixed one, to keep things as simple as possible, and maybe the one proposed by @glpatcern is a good format, so file.2006-01-02T15:04:05.txt. What do you think?

@phil-davis
Copy link
Contributor

@phil-davis for the format I think we could agree on a fixed one, to keep things as simple as possible, and maybe the one proposed by @glpatcern is a good format, so file.2006-01-02T15:04:05.txt. What do you think?

I am not so sure about ending up with 2 "." dots in the filename plus extension. That often confuses "ordinary users" like my mum. But the space character is also annoying to me (for nerds using the command line, or scripts it means that the file name has to be quoted or escaped or something special done with it).

@micbar @butonic @dragotin @pmaier1or anyone else. Comments on the "timestamp" format please.

@butonic
Copy link
Contributor

butonic commented Feb 7, 2022

um ... : is a reserved character on windows and cannot be used in directory or file names ... maybe stick to unix time? ... yes @phil-davis mum will be confused ... but 🤷

windows explorer appends (n+1) when a file already exists / a conflict occurs

@phil-davis
Copy link
Contributor

windows explorer appends (n+1) when a file already exists / a conflict occurs

yes, mum already gets confused. So I'm not sure that we will be able to "unconfuse" her when I give her an oCIS-reva-based cloud storage anyway ;)

@gmgigi96
Copy link
Member Author

gmgigi96 commented Feb 7, 2022

@butonic maybe the UNIX time is confusing not only for mum, what about 2018_02_03_07_30_00?

@gmgigi96 gmgigi96 marked this pull request as draft February 7, 2022 11:19
@butonic
Copy link
Contributor

butonic commented Feb 7, 2022

@butonic maybe the UNIX time is confusing not only for mum, what about 2018_02_03_07_30_00?

I don't think having the date in the filename adds any value:

  • the mtime of the file already contains the time when the conflict occured
  • the longer the appended string the earlier will we run into max filename length issues

I recommend using (n+1) ...

@phil-davis
Copy link
Contributor

I recommend using (n+1)

Note: see also the discussion in and around issue comment #2480 (comment)

"somebody"/"somebodies" will need to come up with an agreed requirement.

@pmaier1
Copy link

pmaier1 commented Feb 7, 2022

I'm fine with both, (n+1) and using the timestamp. I think timestamp has slightly better usability. See Dropbox example screenshot.
Screenshot from 2022-02-07 12-52-48

They do <user name> - <date> <time> - <name>. We usually don't know the user name in the current concept as file drop works based on anonymous access at this time.

@butonic
Copy link
Contributor

butonic commented Feb 7, 2022

lol ... so they use . to separate the time components

@ishank011
Copy link
Contributor

Move the checks to the ocdav service, for the PUT and POST handlers

Copy link
Contributor

@ishank011 ishank011 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just check if stat returned ok here https://github.com/cs3org/reva/blob/master/internal/http/services/owncloud/ocdav/put.go#L147. If it did, keep updating the appended timestamp to the path you pass to InitiateFileUpload

@gmgigi96 gmgigi96 changed the title Append a timestamp to filename when a collision is detected during an upload Forbidden overwrite in public links with drop zone Feb 8, 2022
@gmgigi96 gmgigi96 marked this pull request as ready for review February 8, 2022 11:53
@butonic
Copy link
Contributor

butonic commented Feb 9, 2022

This actually violates the PUT semantics, because we are not creating the resource requested in PUT. It cannot be retrieved with a subsequent GET request. IMO this should be fixed on the client side.

http://www.webdav.org/specs/rfc4918.html#put-resources explicitly states:

A PUT performed on an existing resource replaces the GET response entity of the resource. Properties defined on the resource may be recomputed during PUT processing but are not otherwise affected. For example, if a server recognizes the content type of the request body, it may be able to automatically extract information that could be profitably exposed as properties.

What happens when a file with the appended timestamp already exists, because two uploaders upload in the same second?

It seems like a POST request makes more sense here. File drop seems to be the exact usecase for RFC5995 - Using POST to Add Members to Web Distributed Authoring and Versioning (WebDAV) Collections

gmgigi96 and others added 2 commits February 11, 2022 09:34
Co-authored-by: Jörn Friedrich Dreyer <jfd@butonic.de>
Co-authored-by: Jörn Friedrich Dreyer <jfd@butonic.de>
@gmgigi96
Copy link
Member Author

This actually violates the PUT semantics, because we are not creating the resource requested in PUT. It cannot be retrieved with a subsequent GET request. IMO this should be fixed on the client side.

This is true, but here we are only modifying the behavior in a drop zone (where an user has only uploader pemission), so it makes no sense do a GET request as it will fail anyway as the user does not have the read permission.

What happens when a file with the appended timestamp already exists, because two uploaders upload in the same second?

Indeed that's a problem, one of the twos files will be overridden.

@phil-davis
Copy link
Contributor

Indeed that's a problem, one of the twos files will be overridden.

And we have this kind of problem in oC10 trashbin and versions. Those use 1-second timestamps and there is "trouble" if more than one action happens per second.

And anyway, someone might have uploaded an actual files called file.txt and file - 2022-02-14 16:30:40.txt a week ago, just to be an annoying person. And at exactly 2022-02-14 16:30:40 someone uploads file.txt again. To be sure, there needs to be a code path that, if the target filename with date-time-stamp already exists, then it has a system to generate another unique name - add (2) or (3)... to the end after the timestamp?

@tbsbdr
Copy link

tbsbdr commented Feb 21, 2022

I also favour the file (1).txt approach, mainly because of 3 reasons:

  1. Robust against same-seconds-problem
  2. Known: My mom has seen the parenthesis-schema already and has a clue why this happened
  3. Simple: Cognitive costs for reading the filename are comparatively low, because its short (compared to our other approaches)

Personally I don't like parenthesis and spaces in names, but I can handle it - no dealbreaker.

Any objections against the (n+1) approach?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Overwrite must be forbidden on public links with drop zone
8 participants