From ad9d20fe6ac6892094917b97c5cb416323665a73 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=B6rn=20Friedrich=20Dreyer?=
Date: Mon, 9 Mar 2020 10:55:32 +0100
Subject: [PATCH] document current storage drivers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Jörn Friedrich Dreyer
---
 docs/content/en/docs/Concepts/storages.md | 69 +++++++++++++++++++++++
 1 file changed, 69 insertions(+)
 create mode 100644 docs/content/en/docs/Concepts/storages.md

diff --git a/docs/content/en/docs/Concepts/storages.md b/docs/content/en/docs/Concepts/storages.md
new file mode 100644
index 00000000000..93bfd4f8c39
--- /dev/null
+++ b/docs/content/en/docs/Concepts/storages.md
@@ -0,0 +1,69 @@
+---
+title: "Storage Drivers"
+linkTitle: "Storage Drivers"
+weight: 10
+description: >
+  Storage drivers and their capabilities
+---
+
+{{< toc >}}
+
+## Aspects of storage drivers
+A lot of different storage technologies exist, ranging from general-purpose filesystems to software-defined storage. Choosing any of them means making a tradeoff. Conversely, if a storage technology is already in place, it predetermines the capabilities that reva can make available via the CS3 API. *Not all storage systems are created equal.*
+
+The CS3 APIs connect Storage and Application Providers, allowing them to exchange information about various aspects of storage.
+
+### Metadata storage
+Using the CS3 API you can manage well-known metadata like names, modification times, file size, ETags, and owner, as well as arbitrary metadata.
+
+Depending on the underlying storage technology, some operations may be slow, to the point where it makes more sense to disable them entirely. One example is a folder rename: on S3 a *simple* folder rename translates to a copy and delete operation for every child of the renamed folder. There is an exception though: this restriction only applies if the S3 storage is treated like a filesystem, where the keys are the paths and the values are the file contents. There are smarter ways to implement filesystems on top of S3, but again: there is always a tradeoff.
+
+### ETag propagation
+An important aspect when considering the CS3 API for synchronization is that there is no delta API yet. A client can, however, discover changes by recursively descending the tree and comparing the ETag of every node. If the storage technology supports propagating ETag changes up the tree, only the root node of a tree needs to be checked to determine whether a discovery needs to be started and which nodes need to be traversed. This allows using the storage technology itself to persist all metadata that is necessary for sync, without additional services or caches.
+
+### Trash
+With the CS3 API, files can be restored from a trash if the underlying storage technology supports it or if a special file layout is used to implement it. In the latter case, all delete operations must move files to the trash location if they should be visible using the CS3 API. If you bypass the CS3 API and delete a file without moving it to the trash location (for example, by logging in to the storage via ssh and running `rm` on the file), the data is gone.
+
+### Versions
+When the underlying storage technology supports it, the CS3 API also allows listing and restoring file versions. Capturing file versions is harder than implementing a trash, because every file change must be recorded. Similar to the trash, this can be done by a storage driver in reva, but when the driver is bypassed, versions will not be recorded unless the storage technology itself has versioning support.
+
+### Activity History
+Building an activity history requires tracking the different actions that have been performed, at least those made using the CS3 API, but preferably those made on the storage itself. This includes not only file changes but also metadata changes like renames and permission changes, and possibly even public link access.
+
+Since the majority of these actions are already persisted by the version history, it makes more sense to keep track of these events in an external append-only data structure that can efficiently add and query events, which is why an activity history is not part of the CS3 API.
+
+### Data storage
+While file upload and download are not part of the CS3 API, they can be initiated with it. Initiation responses contain the target URL and allow clients to switch to a more suitable protocol. For download, a normal GET request might be sufficient. For upload, a resumable protocol like [tus.io](https://tus.io/) might make more sense.
+
+## Storage driver implementations in reva
+
+Reva comes with a few storage driver implementations. The following sections list the known tradeoffs.
+
+### Local Storage Driver
+- naive implementation for a local POSIX filesystem
+- no ETag propagation
+  - could be done with an external inotify-like mechanism
+- no trash
+- no versions
+
+### EOS Storage Driver
+- requires EOS as the storage
+- relies on the EOS native ETag propagation
+- supports EOS native trash and versions
+- bypassing the driver is possible because all operations are implemented in EOS natively
+
+### OwnCloud Storage Driver
+- uses the ownCloud 10 data directory layout
+- implements ETag propagation
+- supports trash and versions when using the driver
+  - limitations for trash file length
+  - limitations for versions file length
+- requires Redis for efficient file ID to path lookup
+
+### S3 Storage Driver
+- this implementation assumes keys reflect the path
+- inefficient move operation, because every file has to be copied and deleted
+- no ETag propagation
+  - if the storage technology supports notifications they could be used to update parent ETags
+- trash not implemented, yet
+- versions not implemented, yet
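
The cost of the move operation listed for the S3 driver can be illustrated with a small sketch. It is not reva code: the dictionary `store` is a hypothetical in-memory stand-in for a bucket whose keys reflect the path, and `rename_folder` is an illustrative helper name. It only shows why a folder rename touches every child when keys are paths.

```python
# Hypothetical in-memory stand-in for a flat key/value object store such as S3.
# Keys reflect the full path; there is no directory object that could be renamed.

def rename_folder(store, old_prefix, new_prefix):
    """Rename a 'folder' by copying and deleting every key below it.

    Returns the number of objects that had to be copied and deleted.
    """
    old_prefix = old_prefix.rstrip("/") + "/"
    new_prefix = new_prefix.rstrip("/") + "/"
    # Snapshot the matching keys first so the dict is not mutated while iterating.
    children = [key for key in store if key.startswith(old_prefix)]
    for key in children:
        new_key = new_prefix + key[len(old_prefix):]
        store[new_key] = store[key]  # copy the object to its new key
        del store[key]               # delete the object under the old key
    return len(children)


store = {
    "photos/2020/a.jpg": b"a",
    "photos/2020/b.jpg": b"b",
    "photos/readme.txt": b"r",
    "docs/notes.md": b"n",
}
moved = rename_folder(store, "photos", "pictures")
```

In this sketch the work grows with the number of objects below the renamed prefix, whereas a rename on a POSIX filesystem is usually a single metadata operation.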