Skip to content

Commit

Permalink
Datatx tutorial (#3869)
Browse files Browse the repository at this point in the history
* Datatx tutorial

* Add #PR

---------

Co-authored-by: Antoon P <antoon@redblom.com>
  • Loading branch information
redblom and antoonp authored May 9, 2023
1 parent 1028891 commit 90ede7f
Show file tree
Hide file tree
Showing 2 changed files with 170 additions and 0 deletions.
4 changes: 4 additions & 0 deletions changelog/unreleased/datatx-tutorial.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
Enhancement: Datatx tutorial

https://github.com/cs3org/reva/pull/3869
https://github.com/cs3org/reva/issues/3864
166 changes: 166 additions & 0 deletions docs/content/en/docs/tutorials/data-transfer-tutorial.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,166 @@
---
title: "Data transfer functionality in Reva"
linkTitle: "Data transfer functionality"
weight: 5
description: >
Data transfer functionality in Reva.
---

This is a guide on how to try the data transfer functionality in Reva in your local environment using rclone as the data transfer driver.

### Recap
A data transfer is initiated through an OCM share by setting the `protocol` to type `datatx`.

## Prerequisites
* Have an rclone instance running (see [Rclone setup](#1-rclone-setup) below).
* A mesh setup equal to the OCM share tutorial (see [Reva daemons setup](#2-reva-daemons-setup)).

## 1. Rclone setup
Use rclone version v1.61 or higher. Available at [https://rclone.org/](https://rclone.org/).
<br>The rclone server should be run with the `server-side-across-configs` flag set to `true` which will make HTTP Third Party Copy (TPC) transfers possible:
```
rclone -vv rcd --server-side-across-configs=true --rc-user=rclone --rc-pass=rclonesecret --rc-addr=0.0.0.0:5572
```
TPC allows for direct (ie. efficient) Reva to Reva transfers as opposed to streaming the data through rclone

## 2. Reva daemons setup
Follow the setup ([prerequisites](https://reva.link/docs/tutorials/share-tutorial/#prerequisites), [building](https://reva.link/docs/tutorials/share-tutorial/#2-build-reva), [running](https://reva.link/docs/tutorials/share-tutorial/#3-run-reva)) of the OCM share [tutorial](https://reva.link/docs/tutorials/share-tutorial/).

Use the [data transfer example config](https://github.com/cs3org/reva/blob/master/examples/datatx/datatx.toml) for the relevant settings to enable rclone driven data transfer.

At this point you should have a two Reva daemon setup between which we will establish a data transfer driven by rclone.

## 3. Create a datatx protocol type OCM share
(assume we are logged in as einstein on the first Reva instance and we have uploaded some data to the `/home/my-data` folder)
<br>The tutorial explains transfer between user einstein at cern and user marie at cesnet.

Creating a transfer is similar to creating a regular OCM share through the `ocm-share-create` command with the addition of the `-datatx` flag. The `-datatx` flag signifies that this is a data transfer.
<br>The `ocm-share-create` command makes (see example below), via an OCM share, the contents of folder `/home/my-data` available for transferring to the grantee.
<br>*Note that only a folder can be transferred!
```
>> ocm-share-create -grantee f7fbf8c8-139b-4376-b307-cf0a8c2d0d9c -idp cesnet.cz -transfer /home/my-data
+--------------------------------------+-----------------+--------------------------------------+--------------------------------------------------------------------------------------------+-------------------+-------------+--------------------------------------+--------------------------------+--------------------------------+
| # | OWNER.IDP | OWNER.OPAQUEID | RESOURCEID | TYPE | GRANTEE.IDP | GRANTEE.OPAQUEID | CREATED | UPDATED |
+--------------------------------------+-----------------+--------------------------------------+--------------------------------------------------------------------------------------------+-------------------+-------------+--------------------------------------+--------------------------------+--------------------------------+
| edc8f1c3-5f12-4430-8680-95b9034d6592 | cernbox.cern.ch | 4c510ada-c86b-4815-8820-42cdf82c3d51 | storage_id:"123e4567-e89b-12d3-a456-426655440000" opaque_id:"fileid-einstein%2Fmy-data" | GRANTEE_TYPE_USER | cesnet.cz | f7fbf8c8-139b-4376-b307-cf0a8c2d0d9c | 2023-04-11 11:52:08 +0200 CEST | 2023-04-11 11:52:08 +0200 CEST |
+--------------------------------------+-----------------+--------------------------------------+--------------------------------------------------------------------------------------------+-------------------+-------------+--------------------------------------+--------------------------------+--------------------------------+
```

## 4. Discovering the transfer
(assume we are logged in on the receiving Reva instance as marie)
<br>
<br>The grantee (ie. the receiver of the transfer) can now discover the transfer share and its details in the same way as with regular shares using the `ocm-share-list-received` command to obtain the share id, and subsequent `ocm-share-get-received` command using that share id:

```
>> ocm-share-list-received
+--------------------------------------+-----------------+--------------------------------------+-------------------------------------------------------------------------------+-------------------+-------------+--------------------------------------+--------------------------------+--------------------------------+---------------------+-----------------+
| # | OWNER.IDP | OWNER.OPAQUEID | RESOURCEID | TYPE | GRANTEE.IDP | GRANTEE.OPAQUEID | CREATED | UPDATED | STATE | SHARETYPE |
+--------------------------------------+-----------------+--------------------------------------+-------------------------------------------------------------------------------+-------------------+-------------+--------------------------------------+--------------------------------+--------------------------------+---------------------+-----------------+
| 79a2bf32-4bba-437a-ad8f-ec93211375b5 | cernbox.cern.ch | 4c510ada-c86b-4815-8820-42cdf82c3d51 | opaque_id:"123e4567-e89b-12d3-a456-426655440000:fileid-einstein%2Fmy-data" | GRANTEE_TYPE_USER | cesnet.cz | f7fbf8c8-139b-4376-b307-cf0a8c2d0d9c | 2023-04-11 11:52:08 +0200 CEST | 2023-04-11 11:52:08 +0200 CEST | SHARE_STATE_PENDING | SHARE_TYPE_USER |
+--------------------------------------+-----------------+--------------------------------------+-------------------------------------------------------------------------------+-------------------+-------------+--------------------------------------+--------------------------------+--------------------------------+---------------------+-----------------+
>> ocm-share-get-received 79a2bf32-4bba-437a-ad8f-ec93211375b5
{"id":{"opaqueId":"79a2bf32-4bba-437a-ad8f-ec93211375b5"}, "name":"my-data", "resourceId":{"opaqueId":"123e4567-e89b-12d3-a456-426655440000:fileid-einstein%2Fmy-data"}, "grantee":{"type":"GRANTEE_TYPE_USER", "userId":{"idp":"cesnet.cz", "opaqueId":"f7fbf8c8-139b-4376-b307-cf0a8c2d0d9c"}}, "owner":{"idp":"cernbox.cern.ch", "opaqueId":"4c510ada-c86b-4815-8820-42cdf82c3d51", "type":"USER_TYPE_FEDERATED"}, "creator":{"idp":"cernbox.cern.ch", "opaqueId":"4c510ada-c86b-4815-8820-42cdf82c3d51", "type":"USER_TYPE_FEDERATED"}, "ctime":{"seconds":"1683549473", "nanos":722800878}, "mtime":{"seconds":"1683549473", "nanos":722800878}, "shareType":"SHARE_TYPE_USER", "protocols":[{"transferOptions":{"sourceUri":"https://cernbox.cern.ch/remote.php/dav/ocm/IFs4ZVKVjp7OQsArvCSvXkf8A7emEQ71"}}], "state":"SHARE_STATE_PENDING", "resourceType":"RESOURCE_TYPE_CONTAINER"}
```
To start the transfer it must be accepted by the grantee.

## 4. Accepting the transfer by the grantee
The grantee (ie. the receiver of the transfer) must now accept the transfer by updating the `state` of the transfer to `accepted`. That will start the transfer. Optionally the grantee can also specify a path to which the data must be transferred:

```
>> ocm-share-update-received -state accepted -path /home/transfers 79a2bf32-4bba-437a-ad8f-ec93211375b5
OK
```
At this point the transfer should have started automatically. In the command example above the data will be transferred into the `/home/transfers` folder of the grantee. In this case the final resulting path will read `/home/transfer/my-data/`

If a path is not provided with the command the transfers will be written into the folder as set by the configuration property `data_transfers_folder` of the gateway as follows:
```
[grpc.services.gateway]
data_transfers_folder = "/home/MyTransfers"
```
Note that at least one of each must be provided but that the `path` command flag overrides the configuration setting (ie. per transfer).

## 4.1 Do over a transfer
In case the transfer has failed and it is not a driver (rclone) issue, or maybe you want to transfer to another folder, use these 2 steps:
<br>First update the share to `pending`:
```
ocm-share-get-received -state pending 79a2bf32-4bba-437a-ad8f-ec93211375b5
OK
```
Next accept the transfer, optionally with a different path:
```
>> ocm-share-update-received -state accepted -path /home/transfers-sec 79a2bf32-4bba-437a-ad8f-ec93211375b5
OK
```
Now the data will be transferred to the `/home/transfers-sec/my-data/` folder.

Whenever transfer shares are accepted corresponding transfer jobs will be created for them. These can be [managed](#5-managing-transfer-jobs).

## 5. Managing transfer jobs
The transfer driver creates a transfer job for each transfer. These jobs can be managed (request status, retried, cancelled). For this one must first discover the transfer id from the transfers list.

## 5.1 List transfers
List the transfers using the `transfer-list` command to discover their corresponding transfer id:

```
>> transfer-list
+--------------------------------------+--------------------------------------+
| SHAREID.OPAQUEID | ID.OPAQUEID |
+--------------------------------------+--------------------------------------+
| 2c55dc61-4a06-4f44-9478-78eb1243971b | 0f901f2c-a004-4126-b810-29bf51909035 |
| 1f5de8f0-5565-4694-8eca-f66e578783c8 | f0b3b410-0e39-4591-92f7-8e229650b3c7 |
| 79a2bf32-4bba-437a-ad8f-ec93211375b5 | fe671ae3-0fbf-4b06-b7df-32418c2ebfcb |
+--------------------------------------+--------------------------------------+
```
## 5.2 Show status transfer
Show the current status of a transfer using the `transfer-status` command. Possible transfer states are:
```
cancelled
cancel failed
complete
expired
failed
in progress
new
invalid
```
```
transfer-get-status -txId fe671ae3-0fbf-4b06-b7df-32418c2ebfcb
+--------------------------------------+--------------------------------------+--------------------------+-----------------------------------+
| SHAREID.OPAQUEID | ID.OPAQUEID | STATUS | CTIME |
+--------------------------------------+--------------------------------------+--------------------------+-----------------------------------+
| 79a2bf32-4bba-437a-ad8f-ec93211375b5 | fe671ae3-0fbf-4b06-b7df-32418c2ebfcb | STATUS_TRANSFER_COMPLETE | Mon May 8 12:38:08 +0000 UTC 2023 |
+--------------------------------------+--------------------------------------+--------------------------+-----------------------------------+
```


## 5.5 Retry transfer
Retry a transfer using the `transfer-retry` command with the transfer id specified. This should restart the transfer job and return the new status of the transfer:

```
transfer-retry -txId fe671ae3-0fbf-4b06-b7df-32418c2ebfcb
+--------------------------------------+--------------------------------------+---------------------+-------------------------------+
| SHAREID.OPAQUEID | ID.OPAQUEID | STATUS | CTIME |
+--------------------------------------+--------------------------------------+---------------------+-------------------------------+
| 79a2bf32-4bba-437a-ad8f-ec93211375b5 | fe671ae3-0fbf-4b06-b7df-32418c2ebfcb | STATUS_TRANSFER_NEW | 2023-05-08 12:41:07 +0000 UTC |
+--------------------------------------+--------------------------------------+---------------------+-------------------------------+
```
## 5.4 Cancel transfer
A running transfer (transfer state `in progress`) can be cancelled using the `transfer-cancel` command as follows:
```
transfer-retry -txId fe671ae3-0fbf-4b06-b7df-32418c2ebfcb
+--------------------------------------+--------------------------------------+---------------------+-------------------------------+
| SHAREID.OPAQUEID | ID.OPAQUEID | STATUS | CTIME |
+--------------------------------------+--------------------------------------+---------------------+-------------------------------+
| 79a2bf32-4bba-437a-ad8f-ec93211375b5 | fe671ae3-0fbf-4b06-b7df-32418c2ebfcb | STATUS_TRANSFER_CANCELLED | 2023-05-08 13:50:12 +0000 UTC |
+--------------------------------------+--------------------------------------+---------------------+-------------------------------+
```

## 6 Cleanup transfers
Transfers will be removed from the db using the `transfer-cancel` command when the configuration property `remove_on_cancel` of the datatx service has been set to `true` as follows:
```
[grpc.services.datatx]
remove_on_cancel = true
```
Currently this setting is recommended.

0 comments on commit 90ede7f

Please sign in to comment.