Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typed blobs #729

Merged
merged 2 commits into from
Oct 22, 2019
Merged

Typed blobs #729

merged 2 commits into from
Oct 22, 2019

Conversation

flub
Copy link
Member

@flub flub commented Oct 19, 2019

I never knew what kind of stuff was in one of those strings referring to blobs, so I thought fixing it with types would be a good thing. Turns out that was more involved than hoped for.

Anyway, this tries to be strict about blobs by creating a BlobObject. This object should also make it easier to in the future not store all blobs into one directory or even into an sqlite store. Furthermore since blobs get passed through the params system a lot, the ParamsFile was added, basically any filename read from params should be parsable into that enum, allowing you to know if you're dealing with a pathname or already with a blob.

This doesn't always make things less verbose on call sites, I apologise for that. In the future some of this could improve by making the storage more explicit, e.g. Params could have FromParams/ToParams traits. OTOH maybe Params should be moved to columns in the db directly, anyway I'm not making any calls on those things yet. The upshot of this is that you need to manually remember to parse a filename retrieved from params with ParamsFile for now.

Some places still exists where filenames are retrieved from Params without the ParamsFile bit. They don't have access to a context object and after checking them they already are safe and I felt like adding the complexities of pushing context into those APIs was not a good idea. Again I believe later improvements might end up bypassing some of this, or maybe they wont.

Lastly I must admit I experimented with errors here. I wanted a bespoke error object and wanted to have the cause carried along so you can see where errors come from as well as have backtraces. Again, currently nothing uses the causes and kinds because everything gets dumped in the global Error type. But I think in the future more things could have custom errors and causes and backtraces, I expect this will become especially nice once the core has it's own log it can write.

Finally, (phew, you're still reading), I realise this is bending the "freeze for a release" a bit. But than it's not the only improvement which hasn't been directly related to a bug before. And it's been semi-frozen for ages...

Thanks for reviewing!
The diff is a lot larger than I'd like so here some review pointers: BlobObject in context.rs is important. If you don't care about error handling much skip all the BlobError/BlobErrorKind/BlobErrorData stuff, you can see how it's used which is what matters. Secondly ParamsFile in params.rs is important.
After that the rest is mostly applying this stuff to current call sites, sometimes care must be taken not to change the logic.

@hpk42
Copy link
Contributor

hpk42 commented Oct 20, 2019

Hum, not sure i understand some of the changes on first reading -- will see to look again more thoughtfully. One thing i am concerned about is that if we raise the level of Rust-understanding required to read and make changes that we reduce the circle of people actually doing it. The other thing is that i prefer to touch/rustify areas where we have troubles/issues or plans. Blobfiles have not been reported or discussed as problematic since the last refactoring. And at my first (admittedly very quick) glance it didn't feel like this PR simplifies code.

src/chat.rs Outdated Show resolved Hide resolved
src/chat.rs Outdated Show resolved Hide resolved
src/chat.rs Outdated Show resolved Hide resolved
src/chat.rs Outdated Show resolved Hide resolved
src/context.rs Outdated Show resolved Hide resolved
src/context.rs Outdated Show resolved Hide resolved
src/context.rs Outdated Show resolved Hide resolved
src/context.rs Outdated Show resolved Hide resolved
src/context.rs Outdated
self.inner.kind
}

fn new_create_failure(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_failure would be a better name

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for the others down below

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced here, these functions create instances with specific error kinds set, in this case BlobErrorKind::CreateFailure. Using from_failure() suggests that the struct is made based on data from another struct, which it is not, everything is passed in as args, i don't think the cause counts truly as something that warrants a from_*()

Copy link
Member Author

@flub flub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @dignifiedquire ! I addressed the comments, not rebased onto master yet because master seems currently broken for me. but once that works rebasing shouldn't be a problem.

src/chat.rs Outdated Show resolved Hide resolved
src/chat.rs Outdated Show resolved Hide resolved
src/chat.rs Outdated Show resolved Hide resolved
src/chat.rs Outdated Show resolved Hide resolved
src/context.rs Outdated Show resolved Hide resolved
src/context.rs Outdated Show resolved Hide resolved
src/context.rs Outdated Show resolved Hide resolved
src/context.rs Outdated Show resolved Hide resolved
src/context.rs Outdated
self.inner.kind
}

fn new_create_failure(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced here, these functions create instances with specific error kinds set, in this case BlobErrorKind::CreateFailure. Using from_failure() suggests that the struct is made based on data from another struct, which it is not, everything is passed in as args, i don't think the cause counts truly as something that warrants a from_*()

@flub
Copy link
Member Author

flub commented Oct 20, 2019

@hpk42 I don't think this uses any new rust construct that aren't used already in the codebase. More generally I'm also not convinced it's worth to avoid idiomatic code to make it simpler, but than what is idiomatic is probably a tenuous debate anyway :)

wrt to whether this should be refactored and whether it should be done now... who knows. I did this because I never knew how to handle blobs and never knew what expectations I could have from them. It doesn't (yet) make the code shorter, but it does make it stricter. And all the tests pass ;)

src/chat.rs Outdated Show resolved Hide resolved
src/chat.rs Show resolved Hide resolved
src/dc_receive_imf.rs Outdated Show resolved Hide resolved
@hpk42
Copy link
Contributor

hpk42 commented Oct 21, 2019

FWIW i like this now better :)

General sidenote: why do we even store $BLOBDIR at all into the DB? All file paths are relative to BLOBDIR so it's totally redundant and just complicating code.

Floris Bruynooghe added 2 commits October 21, 2019 22:06
This creates a specific type for blobs, with well defined conversions
at the borders.  It also introduces a strong type for the Param::File
value since that param is often used used by the public API to set
filenames using absolute paths, but then core changes the param to a
blob before it gets to the database.

This eliminates a few more functions with very mallable C-like
arguments behaviour which combine a number of operations in one.
Because blob filenames are stored so often in arbitrary strings this
does add more code when receiving those, until the storage is fixed.

File name sanitisation is now deletated to the sanitize-filename crate
which should do a slightly better job at this.
Turns out that anyone that uses these either justs wants a file or
wants a blob.  Consolidate those patterns into one place and simplify
all the callers.
@flub
Copy link
Member Author

flub commented Oct 21, 2019

Agree with that storing of the $BLOBDIR in the database. Am not convinced it's worh the churn to change it though, code would have to accept both, and/or we'd have to migrate the database. But if we did want to do that we now have a single clearly defined interface to do so ;)

I think this can be considered for another review round/approval/merging (if test pass - the ones I ran locally did)

@r10s
Copy link
Member

r10s commented Oct 21, 2019

General sidenote: why do we even store $BLOBDIR at all into the DB? All file paths are relative to BLOBDIR so it's totally redundant and just complicating code.

for historical reasons :)

before 2018-08, external paths were stored in the database, normally absolute but theoretically also relative. the idea was to avoid duplicated files, eg. if a larger video is sent, this generally makes some sense. however, at some point we gave this up as it was too confusing when a message is sent out and the sent file is later deleted - or even modified.

on 2018-08 and database migration v41, the external paths were copied and paths were converted to $BLOBDIR, we use the prefix to be sure, the switch is done and not to copy a file twice accidentally (aborted conversion or so) (i've just learnt the word for that: Idempotence iirc ;)

however, today, yip, might be that $BLOBDIR is superfluous. not totally sure as it is used in several places, eg. also in the config-section for the self-avatar or for group-images.

@hpk42 hpk42 merged commit 5d79690 into master Oct 22, 2019
@flub flub deleted the flub-typed-blobs branch November 3, 2019 12:13
@hpk42 hpk42 restored the flub-typed-blobs branch January 29, 2020 15:26
@flub flub deleted the flub-typed-blobs branch February 6, 2020 21:09
@hpk42 hpk42 restored the flub-typed-blobs branch February 11, 2020 17:45
@hpk42 hpk42 deleted the flub-typed-blobs branch February 21, 2020 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants