
[Cheatsheet] Example: Add path to data directory #35

Open
aminsaied opened this issue Feb 16, 2021 · 3 comments

Comments

@aminsaied (Collaborator)

Question

I am using ScriptRunConfig with the command argument. This expects that all paths passed are relative to the datastore mount. Below is the code, which correctly mounts training_dataset and test_dataset.

How can we specify paths to folders? Dataset always expects files, and DataPath doesn't work. I couldn't find an example online.

from azureml.core import Dataset, Datastore, ScriptRunConfig
from azureml.data.datapath import DataPath

# Register the file share as a datastore, then fetch it back by name
ds = Datastore.register_azure_file_share(workspace=ws,
                                         datastore_name='NAME',
                                         file_share_name='NAME',
                                         account_name='NAME',
                                         account_key='key',
                                         create_if_not_exists=True)

ds = Datastore.get(ws, 'NAME')
print(f'found the datastore {ds}')

# File datasets for the train/test data; DataPath objects for the output and model locations
training_dataset = Dataset.File.from_files(path=(ds, 'train_path_tsv'))
test_dataset = Dataset.File.from_files(path=(ds, 'test_path_tsv'))
output_dir = DataPath(datastore=ds, path_on_datastore='output/')
model_dir = DataPath(datastore=ds, path_on_datastore='model_path')

config = ScriptRunConfig(source_directory='.',
                         command=['python',
                                  'script.py',
                                  '--config',
                                  'config.yaml',
                                  '--output',
                                  output_dir,
                                  '--overwrite',
                                  '',
                                  '--data.train_set.CSV.data_files',
                                  training_dataset.as_mount(),
                                  '--data.eval_set.CSV.data_files',
                                  test_dataset.as_mount(),
                                  '--model.model_name_or_path',
                                  model_dir],
                         compute_target=compute_target,
                         environment=env)

Potential answer

from azureml.pipeline.core import PipelineData

output_dir = PipelineData(
    name="output_dir",
    datastore=pipeline_datastore,
    pipeline_output_name="output_dir",
    is_directory=True,
)
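
For context, a PipelineData output like the one above is consumed by a pipeline step (e.g. PythonScriptStep) rather than by a bare ScriptRunConfig. A minimal sketch, assuming an existing compute_target and a hypothetical train.py that accepts an --output argument:

from azureml.core import Workspace
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
pipeline_datastore = ws.get_default_datastore()

output_dir = PipelineData(name="output_dir",
                          datastore=pipeline_datastore,
                          is_directory=True)

# The output directory is mounted on the compute target; its resolved path is
# interpolated into the command line when the step runs.
step = PythonScriptStep(name="train",
                        script_name="train.py",      # hypothetical script
                        source_directory=".",
                        arguments=["--output", output_dir],
                        outputs=[output_dir],
                        compute_target=compute_target)

pipeline = Pipeline(workspace=ws, steps=[step])
run = pipeline.submit(experiment_name="cheatsheet-example")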
aminsaied changed the title from [Cheatsheet] to [Cheatsheet] Example: Add path to data directory on Feb 16, 2021
@aminsaied (Collaborator, Author)

Two things:

  • We should have an example of using multiple files in a dataset
  • and an OutputDatasetConsumptionConfig example for writing outputs (not using DataPath); a rough sketch of both follows below.
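
Something along these lines could cover both points. It is only a rough sketch: it uses OutputFileDatasetConfig from azureml.data for the output side (which may not be the exact class named above), and the datastore paths, compute_target and env are placeholders carried over from the snippet in the issue description:

from azureml.core import Dataset, Datastore, ScriptRunConfig, Workspace
from azureml.data import OutputFileDatasetConfig

ws = Workspace.from_config()
ds = Datastore.get(ws, 'NAME')

# A single FileDataset built from multiple files (glob patterns are allowed)
train_dataset = Dataset.File.from_files(path=[(ds, 'train/part-*.tsv'),
                                              (ds, 'train/extra.tsv')])

# Declare where outputs should land on the datastore (no DataPath involved)
output_dir = OutputFileDatasetConfig(name='output_dir',
                                     destination=(ds, 'output/'))

config = ScriptRunConfig(source_directory='.',
                         command=['python', 'script.py',
                                  '--data', train_dataset.as_mount(),
                                  '--output', output_dir],
                         compute_target=compute_target,
                         environment=env)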

@parimalak

@aminsaied did you find a solution to your problem? I am having an issue with multiple inputs using DataPath.

@aminsaied (Collaborator, Author)

It's recommended to use datasets to provide inputs. It's possible to provide a path to a directory (or indeed to multiple files) in that way.
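
For example (a minimal sketch, reusing the datastore registered above; the folder name train_data is a placeholder):

from azureml.core import Dataset, Datastore, Workspace

ws = Workspace.from_config()
ds = Datastore.get(ws, 'NAME')

# Point the FileDataset at a folder rather than a single file;
# all files under that folder become part of the dataset.
folder_dataset = Dataset.File.from_files(path=(ds, 'train_data'))

# At run time the folder is mounted and its local path is passed to the script,
# e.g. command=[..., '--data_dir', folder_dataset.as_mount(), ...]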
