"datastore.upload_files" is deprecated after version 1.0.69 #1672

Open
urasandesu opened this issue Jan 23, 2022 · 6 comments

Comments

@urasandesu

In azureml-core 1.37.0.post1, we get the following warning:

"datastore.upload_files" is deprecated after version 1.0.69. Please use "FileDatasetFactory.upload_directory" instead. See Dataset API change notice at https://aka.ms/dataset-deprecation.

The URL seems to point to this page, but there is no migration information there.

With datastore.upload_files, we could pass an explicit list of files, so only those files were uploaded even if the directory also contained files we do not want to upload, such as credentials or files containing personal data that has not been processed yet.

How can FileDatasetFactory.upload_directory be used in the same way as datastore.upload_files?

@urasandesu
Author

I tried some code, and my understanding is as follows.

If the code uses datastore.upload_files like this:

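from datetime import datetime
from pytz import timezone  # assumption: timezone('UTC') is pytz's timezone()

file = './train.csv'
now = datetime.now(timezone('UTC'))
target_path = 'UI/' + now.strftime('%m-%d-%Y_%H%M%S_UTC')

# default_datastore is the workspace's default datastore, obtained elsewhere
default_datastore.upload_files([file], target_path=target_path, overwrite=True)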

then use FileDatasetFactory.upload_directory instead, like this:

file = './train.csv'
now = datetime.now(timezone('UTC'))
target_path = 'UI/' + now.strftime('%m-%d-%Y_%H%M%S_UTC')

Dataset.File.upload_directory('./', (default_datastore, target_path), pattern=file, overwrite=True)
# NOTE: the value passed to `pattern` apparently has to include the current-directory prefix ('./').

Is this correct?
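As a side note on the snippets above: `timezone('UTC')` looks like the pytz helper rather than anything from the standard library. If pulling in pytz just for the timestamp is not desired, the target path could probably be built with the stdlib alone. The helper name `timestamped_target_path` below is hypothetical and only illustrates the idea; it does not touch any Azure ML API.

```python
from datetime import datetime, timezone


def timestamped_target_path(prefix="UI", now=None):
    """Return '<prefix>/<MM-DD-YYYY_HHMMSS>_UTC' for the given (or current) time.

    `now` is expected to be a timezone-aware datetime; it defaults to the
    current time in UTC.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert to UTC so the '_UTC' suffix always matches the timestamp.
    now = now.astimezone(timezone.utc)
    return f"{prefix}/" + now.strftime("%m-%d-%Y_%H%M%S_UTC")


if __name__ == "__main__":
    print(timestamped_target_path())
```

The returned string can be used as `target_path` in either of the two calls above.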

@thomassantosh

+1, would be curious how this is implemented.

@chengyu-liu-cs

+1, having the same warning. Looking forward to seeing a replacement solution.

@maciejskorski

maciejskorski commented Mar 14, 2022

Use Dataset.File.upload_directory, documented here. Here is a full example:

import tempfile

from azureml.core import Dataset, Workspace

# configure Azure storage
ws = Workspace.from_config()
dstore = ws.datastores.get('your datastore')
dstore_path = 'relative datastore path'
target = (dstore, dstore_path)

# write to Azure storage (df is an existing pandas DataFrame)
with tempfile.TemporaryDirectory() as tmpdir:
    df.to_parquet(f'{tmpdir}/df.parquet')
    ds = Dataset.File.upload_directory(tmpdir, target, overwrite=True)
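The temporary-directory approach above might also cover the original concern about uploading only an explicit list of files: copy just the wanted files into a staging directory and upload that directory, so credentials or unprocessed personal data sitting next to them are never included. The following is a minimal sketch under that assumption. `stage_files` and `upload_selected` are hypothetical helper names, not part of azureml-core; the only SDK call used is `Dataset.File.upload_directory`, with the same arguments as in the example above.

```python
import shutil
import tempfile
from pathlib import Path


def stage_files(files, staging_dir):
    """Copy only the listed files into staging_dir and return the new paths."""
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    staged = []
    for f in files:
        src = Path(f)
        if not src.is_file():
            raise FileNotFoundError(f"not a file: {src}")
        dst = staging / src.name
        if dst.exists():
            # Files are flattened by name, so duplicate base names would clash.
            raise FileExistsError(f"duplicate file name: {src.name}")
        shutil.copy2(src, dst)
        staged.append(dst)
    return staged


def upload_selected(files, datastore, target_path):
    """Hypothetical wrapper: upload only `files` to (datastore, target_path)."""
    # Imported lazily so stage_files stays usable without azureml-core.
    from azureml.core import Dataset

    with tempfile.TemporaryDirectory() as tmpdir:
        stage_files(files, tmpdir)
        return Dataset.File.upload_directory(
            tmpdir, (datastore, target_path), overwrite=True
        )
```

With this, something like `upload_selected(['./train.csv'], default_datastore, target_path)` would play the role of the old `upload_files([file], ...)` call. Note that the sketch flattens files by base name and copies them first, which may be slow for very large files.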

@RWilsker

Super helpful example code. Just what I needed for how to specify a target folder in the datastore. Thank you.

@mhaythornthwaite

mhaythornthwaite commented Jun 4, 2024

+1 to this question.

I am still having issues with many of the solutions posted above. I currently get the following error with azureml-dataprep == 5.1.6 installed. I tried going back to azureml-dataprep == 5.1.0 but still hit the same error. If I try to roll the package back any further, I run into compatibility issues with my installations of azureml-fsspec == 1.3.1 and mltable == 1.6.1.

NotImplementedError: _path_to_get_files_block is no longer supported. 
Deprecated, downgrade to a previous version of azureml-dataprep.
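Since the error depends on which package versions are installed together, it may help to include the exact versions when reporting it. Below is a small sketch using only the standard library's `importlib.metadata`. The package list is taken from this thread, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Packages mentioned in this thread; adjust as needed.
PACKAGES = ["azureml-core", "azureml-dataprep", "azureml-fsspec", "mltable"]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Pasting this output alongside the traceback makes it easier for others to reproduce the environment.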
