Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix same file upload #4746

Merged
merged 2 commits into from
Jul 12, 2024
Merged

Fix same file upload #4746

merged 2 commits into from
Jul 12, 2024

Conversation

kobergj
Copy link
Contributor

@kobergj kobergj commented Jul 1, 2024

Fixes a bug where uploading the same file multiple times leads to orphaned blobs in the blobstore.

Reva part of owncloud/ocis#9498

@kobergj kobergj requested review from a team, labkode and glpatcern as code owners July 1, 2024 13:45
@kobergj kobergj force-pushed the FixSameFileUpload branch 2 times, most recently from 740820c to 67b3f12 Compare July 1, 2024 15:10
}

// a revision with this mtime does already exist.
// If the blobs are the same we can just delete the old one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: I am not very familiar with the upload code, so I might be missing something obvious.

Why can we be sure here that this is really the same blob, when we only look at the mtimes and blobsizes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I asked myself the same. But what else CAN we check? The etag is just a hash over id and mtime, which are both the same here.

At the moment we are just silently discarding the old blob without any further checks. So this is at least a improvement. However if you have any good idea how we can savely say that this is the same blob I am very happy to check it here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already generate checksums for the content of the upload at

if session.info.MetaData["checksum"] != "" {
var err error
parts := strings.SplitN(session.info.MetaData["checksum"], " ", 2)
if len(parts) != 2 {
return errtypes.BadRequest("invalid checksum format. must be '[algorithm] [checksum]'")
}
switch parts[0] {
case "sha1":
err = checkHash(parts[1], sha1h)
case "md5":
err = checkHash(parts[1], md5h)
case "adler32":
err = checkHash(parts[1], adler32h)
default:
err = errtypes.BadRequest("unsupported checksum algorithm: " + parts[0])
}
if err != nil {
session.store.Cleanup(ctx, session, true, false, false)
return err
}
}
// update checksums
attrs := node.Attributes{
prefixes.ChecksumPrefix + "sha1": sha1h.Sum(nil),
prefixes.ChecksumPrefix + "md5": md5h.Sum(nil),
prefixes.ChecksumPrefix + "adler32": adler32h.Sum(nil),
}
. Could we just compare the new checksums with the previous ones?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use the checksums? For the upload I think the caller of updateExistingNode (CreateNodeForUpdate) already has the checksums. For the old I guess we can get them as well somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Good idea 👍 I managed to get the checksums from the xattrs. Also got rid of the blobsize check as it isn't needed any more.

BUT: At the moment I am only checking md5 sum. Should I check another one? Should I check all three?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you only want to check one I think I'd choose sha1. 🤷‍♂️

I guess at some point we need to switch to some less out-dated hashing algorithm, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can check two or all three if that makes sense. Just not sure if it makes sense 😃

I'll switch to sha1 meanwhile

Signed-off-by: jkoberg <jkoberg@owncloud.com>
Signed-off-by: jkoberg <jkoberg@owncloud.com>
Copy link
Contributor

@aduffeck aduffeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@kobergj kobergj merged commit ab146ae into cs3org:edge Jul 12, 2024
9 of 10 checks passed
@kobergj kobergj deleted the FixSameFileUpload branch July 12, 2024 07:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants