batch uploads error using walk #99

Open

CEBerndsen opened this issue Mar 4, 2019 · 8 comments
@CEBerndsen

When using the osf_upload function in combination with purrr::walk, I ran into an error that appears inconsistently. I could upload ~50 files with no problems.

Later, when trying the same basic code on a much larger directory (~1000 files), only 700 files uploaded before I received this message:

Error:
Encountered an unexpected error with the OSF API
Please report this at https://github.com/aaronwolen/osfr/issues

Code that failed with the error above:

  osf_retrieve_node("3r7nw") %>%
  osf_mkdir(., path = "2019-3-4 tetramer with amylose in KCl") %>%
  walk(files, osf_upload, x = .)
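
(For context, files here is a character vector of the local file paths; it was built roughly like the line below, where the directory path is only an example.)

  # files is a character vector of local paths; the directory shown is illustrative
  files <- list.files("~/data/2019-3-4 tetramer with amylose in KCl", full.names = TRUE)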

I adjusted the code, figuring it was a timeout issue, and tried to complete the upload with:

 osf_retrieve_node("3r7nw") %>%
 osf_ls_files() %>%
 filter(name == "2019-3-4 tetramer with amylose in KCl") %>%
 walk(files, osf_upload, x = ., overwrite = TRUE)

and got this error:

Error: Internal Server Error
HTTP status code 500.

Note: overwrite = FALSE did not work, which is why overwrite is set to TRUE here.

As I stated originally, the same basic code worked for 50 files, but larger uploads failed to fully complete.

I have enjoyed using the package, and this won't stop me from using it. It's just that ~1000 files is a standard project size for me, so being able to batch upload without the web interface is really useful.

@aaronwolen
Member

aaronwolen commented Mar 5, 2019

Thanks for reporting.

The latest version on GitHub (v0.2.3) will re-attempt the request a few times if the API throws a 500 error. Can you try updating and see if the problem persists?
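
For reference, you can grab the development version with something like this (assuming you use the remotes package):

  # install the current development version of osfr from GitHub
  remotes::install_github("aaronwolen/osfr")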

I haven't tried uploading that many files before so I'll do a little testing on my end as well.

@CEBerndsen
Author

Updated to v0.2.3 and tried uploading again in two ways. In the first attempt, I simply reran the batch update code and got this error:

osf_retrieve_node("3r7nw") %>%
 osf_ls_files() %>%
 filter(name == "2019-3-4 tetramer with amylose in KCl") %>%
 walk(files, osf_upload, x = ., overwrite = TRUE)

Error in data.matrix(data) :
(list) object cannot be coerced to type 'double'
In addition: Warning messages:
1: In data.matrix(data) : NAs introduced by coercion
2: In data.matrix(data) : NAs introduced by coercion

So I then deleted the folder via the web interface and tried the first code from above, which makes the directory and then uploads files to it. I got the 500 error again and only 400 files uploaded.

osf_retrieve_node("3r7nw") %>%
  osf_mkdir(., path = "2019-3-4 tetramer with amylose in KCl") %>%
  walk(files, osf_upload, x = .)

Error: Internal Server Error
HTTP status code 500.

Let me know if I can try other approaches and help. Thanks!

@aaronwolen
Member

Thanks. I ran a couple of tests that attempted to upload 1500 files and was able to reproduce the same error. Unfortunately, sometimes it worked and sometimes it failed. I'm going to leave this open for now. HTTP codes in the 500 range correspond to "unexpected errors" on the server, so we may need to loop in one of the OSF devs to ultimately solve it. In the meantime, this highlighted some inefficiencies in osf_upload(); addressing them should reduce the number of API calls made, which may partially mitigate the issue.

@brianjgeiger

Are many of these files relatively small? Like in the "seconds or less to upload" size?

@CEBerndsen
Author

CEBerndsen commented Jun 5, 2019 via email

@aaronwolen
Member

Hi @brianjgeiger, thanks for checking into this. I used hundreds of small text files in my testing. Are you thinking it's a rate-limiting issue?

@brianjgeiger

Hi, @aaronwolen, no, I think it's because we have an inefficiency or two in capturing provenance data for file uploads, and it's causing the thread to eventually time out. It should be fixed in an upcoming version, but I don't have a date on that yet. But slowing down the requests will definitely keep you from seeing the error.

@aaronwolen
Member

Thanks for the info.

It should be fixed in an upcoming version

Is there a relevant PR or Issue I can monitor to determine when it's fixed?

In the meantime, do you have recommendations for parameters I should use to moderate requests (e.g., delay x seconds for every n files)?
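
For now, something along these lines is what I had in mind on the osfr side (just a sketch that mirrors the code above; the 2-second pause is a guess, not a recommended value):

  library(osfr)
  library(purrr)
  library(dplyr)   # for the %>% pipe

  # create (or fetch) the destination folder, as in the snippets above
  dest <- osf_retrieve_node("3r7nw") %>%
    osf_mkdir(path = "2019-3-4 tetramer with amylose in KCl")

  # upload one file at a time, pausing briefly between requests
  walk(files, function(f) {
    osf_upload(x = dest, path = f)
    Sys.sleep(2)   # crude throttle to slow down the request rate
  })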
