gfal-copy transfers fail from srm+gsiftp to srm+https #15

Open
andrearendina opened this issue Jul 12, 2023 · 10 comments

Comments

@andrearendina

Dear all,

at INFN-T1 we are experiencing the following behaviour with gfal 2.21.4.
When we issue a third-party gfal-copy from srm+gsiftp to srm+https, the transfer fails with a strange error, for example:

gfal-copy -vvv srm://storm-test.cr.cnaf.infn.it:8444/folder/test-andre srm://storm-test.cr.cnaf.infn.it:8444/disk/test-andre
[...]
WARNING  Transfer failed with: [srm_do_transfer][gfalt_copy_file][perform_copy][perform_local_copy][streamed_copy][gfal_plugin_writeG][davix2gliberr] Impossible to write, no buffer. (file was opened only for reading?)

After a deep investigation, we have figured out that the problem is probably related to the creation of an https TURL triggered by the srm request. Apparently, with an https TURL like this one, gfal tries to perform the transfer issuing only HEAD and POST commands.

131.154.161.121 8443 "DC=org,DC=terena,DC=tcs,C=IT,O=Istituto Nazionale di Fisica Nucleare,CN=Andrea Rendina arendina@infn.it" 2023-07-12T12:44:35.915Z "d914e441-b1eb-4675-a196-996a8f9243b6" "POST /disk/test-andre HTTP/1.1" 200 959 29
131.154.161.121 8443 "DC=org,DC=terena,DC=tcs,C=IT,O=Istituto Nazionale di Fisica Nucleare,CN=Andrea Rendina arendina@infn.it" 2023-07-12T12:44:36.142Z "996d9883-0fb6-4c26-8036-8aa81ed89592" "HEAD /disk/test-andre HTTP/1.1" 200 0 133

Indeed, when we try to perform the same transfer using the davs/https protocol as destination

gfal-copy -vvv srm://storm-test.cr.cnaf.infn.it:8444/folder/test-andre davs://transfer-test.cr.cnaf.infn.it:8443/disk/test-andre

gfal also issues a PUT command and the transfer succeeds.

We have noticed the same thing directly using the protocols gsiftp -> davs/https without involving srm:

gfal-copy -v gsiftp://transfer-test.cr.cnaf.infn.it:2811//storage/gemss_test1/dteam/folder/test-andre davs://transfer-test.cr.cnaf.infn.it:8443/disk/test-andre

Why is the PUT command not performed by gfal in an srm+gsiftp -> srm+https transfer?
Is this expected?

For reference, we are investigating this issue also with the StoRM developers:
https://issues.infn.it/jira/browse/STOR-1569

Please, correct me if I am wrong or I made any mistake.
Thank you very much for your help!

Andrea

@mpatrascoiu
Contributor

Hello Andrea,

I see you are using the -vvv option. Can you write the debug output to a file and send it to us? (Please e-mail it to dmc-devel@cern.ch rather than uploading it here, as it may contain secrets.)

$ gfal-copy -vvv --log-file=gfal2.log <src> <dst>

@mpatrascoiu
Contributor

Hello,

I've managed to reproduce this problem and here are my findings:

After the SRM PUT, the HTTPS TURL that Gfal2 receives shows that the destination file already exists (with size 0) on the HTTP storage. This causes problems for Gfal2 (and underlying Davix library), which wants the file to not exist in order to perform an upload. This is also behind the cryptic error message:

Transfer failed with: Impossible to write, no buffer. (file was opened only for reading?)

Example:

  1. Checking manually that the file doesn't exist
$ davix-http -X HEAD --trace header --cert /tmp/x509up_u0 --insecure https://transfer-test.cr.cnaf.infn.it:8443/disk/test-mipatras
> HEAD /disk/test-mipatras HTTP/1.1

< HTTP/1.1 404 Not Found
  2. The same stat request, issued after the SRM PUT
$ gfal-copy -vv --force gsiftp://transfer-test.cr.cnaf.infn.it:2811//storage/gemss_test1/dteam/folder/test-andre srm://storm-test.cr.cnaf.infn.it:8444/disk/test-mipatras
...
INFO     Davix: > HEAD /disk/test-mipatras HTTP/1.1

INFO     Davix: < HTTP/1.1 200 OK
...

It looks to me that StoRM creates a 0-size file when the SRM PUT is done.
Can you confirm?

Cheers,
Mihai

@andrearendina
Author

Hello Mihai,

I confirm that StoRM creates a 0-size file before performing the real transfer.
Actually, I guess a transfer from srm+https to srm+https succeeds in overwriting the 0-size file because a PUT command is issued by gfal. Is this correct?

Cheers,
Andrea

@enricovianello

Hi all,

I understood that:

  • with a http|https TURL (returned after a srmPtP with http|https specified as transfer protocol) gfal (using davix library below) cannot assume it's a WebDAV endpoint and tries a HEAD (that finds a file) + POST (which fails because POST is not admitted by WebDAV)
    • using --force lets gfal do a PUT instead of a POST (?)
  • with a dav|davs TURL (not returned by our StoRM srmPtP) gfal issues a HEAD (that finds a file) + PUT instead of a POST because it "recognises" that it is a WebDAV TURL (no need to --force the request on the gfal side)

is this correct?

If yes, I think that on our side (the StoRM side) we could work on returning a dav TURL if it is explicitly requested as the transfer protocol. I wrote only "dav" because I cannot find "davs" among the official IANA-registered schemes https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml.
But this could be enough I think.

Cheers,
Enrico

@andrearendina
Author

Hi,

unfortunately, even with the --force option the transfer fails:

gfal-copy -f srm://storm-test.cr.cnaf.infn.it:8444/folder/test-andre srm://storm-test.cr.cnaf.infn.it:8444/disk
Copying srm://storm-test.cr.cnaf.infn.it:8444/folder/test-andre   [FAILED]  after 6s
gfal-copy error: 5 (Input/output error) - Impossible to write, no buffer. (file was opened only for reading?)

Cheers,
Andrea

@mpatrascoiu
Contributor

Hello both,

@enricovianello I would view the "dav://" and "davs://" protocol schemes used in the Grid world more as "HTTP Grid Storage" than as actual "WebDAV" (as in the IANA document).

For the problem at hand, I have to explain a bit on how Gfal2 works with SRM:
SRM is not a data/transfer protocol. It is used for metadata operations, but when it comes to reading/writing data, one has to ask the SRM server for a TransferURL (TURL) or Transfer URL 3rd Party (TURL_3RD_PARTY).

When you instruct Gfal2 to copy involving SRM, Gfal2 will ask the SRM server for the TURL_3RD_PARTY, then perform the copy again with the resolved URL replaced. Example:

gfal2.copy(<SRM_src>, <SRM_dst>)
   -- gfal2.resolve_SRM_TURL(<SRM_src>) --> GridFTP_src
   -- gfal2.resolve_SRM_TURL(<SRM_dst>) --> HTTPS_dst
gfal2.copy(<GridFTP_src>, <HTTPS_dst>)
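
The two-stage flow above can be mimicked with a small, self-contained Python sketch. It does not call the real Gfal2 bindings: `resolve_turl`, `plan_copy`, `TURL_TABLE` and the `example.org` URLs are hypothetical stand-ins for the TURL negotiation with the SRM server.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Hypothetical lookup standing in for the SRM server's TURL negotiation.
TURL_TABLE = {
    ("srm://se.example.org:8444/folder/file", "source"):
        "gsiftp://se.example.org:2811/folder/file",
    ("srm://se.example.org:8444/disk/file", "destination"):
        "https://se.example.org:8443/disk/file",
}


def resolve_turl(url: str, role: str) -> str:
    """Return the transfer URL for an SRM URL; pass other URLs through."""
    if urlsplit(url).scheme != "srm":
        return url
    return TURL_TABLE[(url, role)]


def plan_copy(src: str, dst: str) -> Tuple[str, str]:
    """Replace SRM endpoints by their TURLs before the actual copy."""
    return resolve_turl(src, "source"), resolve_turl(dst, "destination")


print(plan_copy("srm://se.example.org:8444/folder/file",
                "srm://se.example.org:8444/disk/file"))
```

The point of the sketch is only that the second copy never sees an `srm://` URL: whatever the plugin for the resolved scheme does (here GridFTP and HTTPS) decides how the data is moved.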

So far, I see 3 scenarios here:

  1. gfal2.copy(<SRM_src>, <SRM_dst>)
  2. gfal2.copy(<GridFTP_src>, <SRM_dst>)
  3. gfal2.copy(<HTTPS_src>, <SRM_dst>)

Also important to note:

  • <SRM_src> will resolve to <GridFTP_src> (so scenarios 1. and 2. can be treated the same)
  • <SRM_dst> will resolve to an HTTPS destination URL

In scenarios 1. / 2., we are dealing with what we call a protocol translation transfer: GridFTP --> HTTPS.
In scenario 3., we handle same-protocol transfer: HTTPS --> HTTPS.

Gfal2 offers much better support for same protocol transfers. It allows us to do TPC (ThirdPartyCopy) and even if we have to stream the data, there are optimizations that can be done. For the protocol translation case, we are forced to read segments from source (via one protocol) and write them to the destination (via the other protocol).
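
As a rough illustration of the distinction between same-protocol and protocol translation transfers, the sketch below classifies a pair of resolved URLs by scheme. The `FAMILY` grouping (treating http, https, dav and davs as one family) and the function name `transfer_kind` are assumptions made for this example, not Gfal2 internals.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: schemes grouped into one "family" each.
FAMILY = {
    "gsiftp": "gridftp",
    "http": "http",
    "https": "http",
    "dav": "http",
    "davs": "http",
}


def transfer_kind(src_turl: str, dst_turl: str) -> str:
    """Label a transfer by comparing the protocol families of both ends."""
    src = FAMILY[urlsplit(src_turl).scheme]
    dst = FAMILY[urlsplit(dst_turl).scheme]
    return "same-protocol" if src == dst else "protocol-translation"


# Scenarios 1. / 2.: GridFTP --> HTTPS
print(transfer_kind("gsiftp://se.example.org:2811/folder/file",
                    "https://se.example.org:8443/disk/file"))
# Scenario 3.: HTTPS --> HTTPS
print(transfer_kind("https://se.example.org:8443/folder/file",
                    "https://se.example.org:8443/disk/file"))
```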

Scenario 3. works, as it boils down to an HTTPS --> HTTPS transfer. I've tested both TPC and streaming and they both work.
Scenarios 1. / 2. don't work as Gfal2 ends up doing a GridFTP --> HTTPS copy. Here, Gfal2 has to do protocol translation and it will do the following steps:

  1. Check that the GridFTP source exists
  2. Check that the HTTPS destination doesn't exist
    a. Gfal2 expects the destination file to not exist so it can start the upload
    b. Gfal2 encounters an existing file at the destination (0-size file)
    c. Gfal2 is forced to open the destination file only with Read permissions
  3. When attempting to write, we hit the following error: Impossible to write, no buffer. (file was opened only for reading?)
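
The steps above can be sketched as a toy model. This is not Gfal2 or Davix code: the `Destination` class and `streamed_copy` function are hypothetical, and the only behaviour modelled is the one described in this comment (an existing destination ends up opened read-only, so the first write fails).

```python
class Destination:
    """Toy stand-in for the HTTPS destination as seen by the copy."""

    def __init__(self, exists: bool):
        self.exists = exists
        self.mode = None
        self.data = b""

    def open_for_upload(self) -> None:
        # Step 2: a write buffer is only set up when the file is absent;
        # an existing (even 0-size) file is opened read-only in this model.
        self.mode = "r" if self.exists else "w"

    def write(self, chunk: bytes) -> int:
        # Step 3: writing through a read-only handle fails.
        if self.mode != "w":
            raise OSError("Impossible to write, no buffer. "
                          "(file was opened only for reading?)")
        self.data += chunk
        return len(chunk)


def streamed_copy(source_chunks, dst: Destination) -> int:
    """Read segments from the source and write them to the destination."""
    dst.open_for_upload()
    return sum(dst.write(chunk) for chunk in source_chunks)


print(streamed_copy([b"ab", b"c"], Destination(exists=False)))
try:
    streamed_copy([b"ab", b"c"], Destination(exists=True))
except OSError as err:
    print("copy failed:", err)
```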

What's the reason for SRM creating the 0-size file on the SRM PrepareToPut call?
I've tested this behavior with dCache SRM as well, where the 0-size file does not get created.

Cheers,
Mihai

@mpatrascoiu
Contributor

Ah, something else to add. The HTTP POST calls are to request macaroon tokens:

INFO     Davix: > POST /disk/test-mipatras HTTP/1.1
> Content-Type: application/macaroon-request

They don't interfere with why the copy operation fails.

If you want, you can disable them via the RETRIEVE_BEARER_TOKEN=false configuration, either via the Gfal2 config file (/etc/gfal2.d/http_plugin.conf) or the command line:

$ gfal-copy -D"HTTP PLUGIN:RETRIEVE_BEARER_TOKEN=false" <src> <dst>
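
For the config-file route, the small Python helper below turns the `-D` definition into the stanza one would expect to put in the file. It assumes the Gfal2 config files use an INI-style layout where the part before the colon is the group header; `definition_to_ini` is a hypothetical helper, not part of Gfal2.

```python
import configparser
import io


def definition_to_ini(definition: str) -> str:
    """Turn 'GROUP:KEY=value' into an INI stanza (assumed file layout)."""
    group, _, assignment = definition.partition(":")
    key, _, value = assignment.partition("=")
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the key in upper case
    parser[group] = {key: value}
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue().strip()


print(definition_to_ini("HTTP PLUGIN:RETRIEVE_BEARER_TOKEN=false"))
```

Under that assumption, the printed stanza is what would go into /etc/gfal2.d/http_plugin.conf.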

@andrearendina
Author

Hi Mihai,

thank you very much for the clear explanation.

StoRM creates a 0-size file with the proper ACL in order to let GridFTP overwrite it with the right permissions.
However, we noticed that in third-party copies from srm+https to srm+https, or obviously from srm+gsiftp to srm+gsiftp, StoRM/Gfal2 check the (non-)existence of the destination file by issuing an srmLs command, without involving WebDAV.

On the other hand, if we try to perform a TPC from srm+gsiftp to srm+https, we see both the srmLs check in the StoRM backend log and the HEAD request in the StoRM WebDAV log:

11:23:15.170 - INFO [xmlrpc-27] - srmLs: user </DC=org/DC=terena/DC=tcs/C=IT/O=Istituto Nazionale di Fisica Nucleare/CN=Andrea Rendina arendina@infn.it> Request for [SURL: [srm://storm-test.cr.cnaf.infn.it/disk/test-andre-dst]] failed with: [status: SRM_FAILURE: All requests failed]

131.154.161.121 8443 "DC=org,DC=terena,DC=tcs,C=IT,O=Istituto Nazionale di Fisica Nucleare,CN=Andrea Rendina arendina@infn.it" 2023-07-18T09:23:19.733Z "2befed3e-2456-4720-999c-3bb734806e8a" "HEAD /disk/test-andre-dst HTTP/1.1" 200 0 69

So it seems that in a mixed TPC like this one Gfal2 checks twice that the HTTPS destination doesn't exist.
Is this expected?

Cheers,
Andrea

@mpatrascoiu
Contributor

mpatrascoiu commented Jul 19, 2023

Hello Andrea,

Yes, it checks twice, initially at the SRM level. At this point, the destination file does not exist. Then Gfal2 resolves the SRM TURL into an https URL. Gfal2 will initiate the copy involving the https destination URL. In the copy, the destination file is checked again (using the https URL) and the 0-size file is found, which is why the operation stops.

  1. Is there a reason why the 0-size file is created?
  2. If you move directly to srm+https --> srm+https, you won't have this problem. Only protocol translation transfers are affected (e.g. srm+gsiftp --> srm+https). Any chance of moving directly to srm+https --> srm+https?
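
The double check can be replayed with a toy storage model. `ToyStorage` and `destination_checks` are hypothetical names; the `placeholder_on_put` flag models the difference reported in this thread between an SRM that creates a 0-size file on PrepareToPut and one that does not.

```python
class ToyStorage:
    """Minimal model of a storage element with SRM and HTTP front ends."""

    def __init__(self, placeholder_on_put: bool):
        self.placeholder_on_put = placeholder_on_put
        self.files = {}  # path -> size in bytes

    def srm_ls(self, path: str) -> bool:
        """SRM-level existence check."""
        return path in self.files

    def srm_prepare_to_put(self, path: str) -> str:
        """Hand out an https TURL, optionally creating a 0-size file."""
        if self.placeholder_on_put:
            self.files.setdefault(path, 0)
        return "https://se.example.org:8443" + path

    def http_head(self, path: str) -> int:
        """HTTP-level existence check, returning a status code."""
        return 200 if path in self.files else 404


def destination_checks(storage: ToyStorage, path: str):
    """Return (exists at SRM level, HEAD status after PrepareToPut)."""
    first = storage.srm_ls(path)
    storage.srm_prepare_to_put(path)
    second = storage.http_head(path)
    return first, second


print(destination_checks(ToyStorage(placeholder_on_put=True), "/disk/file"))
print(destination_checks(ToyStorage(placeholder_on_put=False), "/disk/file"))
```

With the placeholder enabled, the first check reports a missing file and the second one finds it, which is the sequence visible in the logs quoted earlier in the thread.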

@andrearendina
Author

Hello Mihai,

sorry for the late reply. As I explained in a previous comment, StoRM creates a 0-size file with the proper ACL in order to let GridFTP overwrite it with the right permissions. In fact, a user with their own DN is mapped to an account which must be able to write.

Unfortunately, we cannot move directly from srm+gsiftp --> srm+https to srm+https --> srm+https (other sites could still use srm+gsiftp).
So, if the gfal double check (srmLs+HEAD) cannot be disabled, a possible solution is for StoRM to skip the 0-size file creation via srm only when GridFTP is not enabled on the destination, because otherwise the srm+gsiftp --> srm+gsiftp TPCs would fail.

Cheers,
Andrea
