
[Cheatsheet] Example: Add path to data directory #35

Open
aminsaied opened this issue Feb 16, 2021 · 3 comments

Comments

@aminsaied (Collaborator)

Question

I am using ScriptRunConfig with the command argument. This expects that all paths passed are relative to the datastore mount. Below is the code, which correctly mounts training_dataset and test_dataset.

How can we specify paths to folders? Dataset always expects files, and DataPath doesn't work. I couldn't find an example online.

from azureml.core import Dataset, Datastore, ScriptRunConfig
from azureml.data.datapath import DataPath

# Register the file share as a datastore, then fetch it back by name
ds = Datastore.register_azure_file_share(workspace=ws,
                                         datastore_name='NAME',
                                         file_share_name='NAME',
                                         account_name='NAME',
                                         account_key='key',
                                         create_if_not_exists=True)

ds = Datastore.get(ws, 'NAME')
print(f'found the datastore {ds}')

# File datasets for the train/test data; DataPath objects for the output and model locations
training_dataset = Dataset.File.from_files(path=(ds, 'train_path_tsv'))
test_dataset = Dataset.File.from_files(path=(ds, 'test_path_tsv'))
output_dir = DataPath(datastore=ds, path_on_datastore='output/')
model_dir = DataPath(datastore=ds, path_on_datastore='model_path')

config = ScriptRunConfig(source_directory='.',
                         command=['python',
                                  'script.py',
                                  '--config',
                                  'config.yaml',
                                  '--output',
                                  output_dir,
                                  '--overwrite',
                                  '',
                                  '--data.train_set.CSV.data_files',
                                  training_dataset.as_mount(),
                                  '--data.eval_set.CSV.data_files',
                                  test_dataset.as_mount(),
                                  '--model.model_name_or_path',
                                  model_dir],
                         compute_target=compute_target,
                         environment=env)

Potential answer

from azureml.pipeline.core import PipelineData

output_dir = PipelineData(
    name="output_dir",
    datastore=pipeline_datastore,
    pipeline_output_name="output_dir",
    is_directory=True,
)
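
For context, a PipelineData output like the one above is consumed by a pipeline step (e.g. PythonScriptStep) rather than by a bare ScriptRunConfig. A minimal sketch, assuming an existing compute_target and a hypothetical train.py that accepts an --output argument:

from azureml.core import Workspace
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
pipeline_datastore = ws.get_default_datastore()

output_dir = PipelineData(name="output_dir",
                          datastore=pipeline_datastore,
                          is_directory=True)

# The output directory is mounted on the compute target; its resolved path is
# interpolated into the command line when the step runs.
step = PythonScriptStep(name="train",
                        script_name="train.py",      # hypothetical script
                        source_directory=".",
                        arguments=["--output", output_dir],
                        outputs=[output_dir],
                        compute_target=compute_target)

pipeline = Pipeline(workspace=ws, steps=[step])
run = pipeline.submit(experiment_name="cheatsheet-example")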
aminsaied changed the title from [Cheatsheet] to [Cheatsheet] Example: Add path to data directory on Feb 16, 2021
@aminsaied (Collaborator, Author)

Two things:

  • We should have an example of using multiple files in a dataset
  • and an OutputDatasetConsumptionConfig example for writing outputs (not using DataPath); a rough sketch of both follows below.
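
Something along these lines could cover both points. It is only a rough sketch: it uses OutputFileDatasetConfig from azureml.data for the output side (which may not be the exact class named above), and the datastore paths, compute_target and env are placeholders carried over from the snippet in the issue description:

from azureml.core import Dataset, Datastore, ScriptRunConfig, Workspace
from azureml.data import OutputFileDatasetConfig

ws = Workspace.from_config()
ds = Datastore.get(ws, 'NAME')

# A single FileDataset built from multiple files (glob patterns are allowed)
train_dataset = Dataset.File.from_files(path=[(ds, 'train/part-*.tsv'),
                                              (ds, 'train/extra.tsv')])

# Declare where outputs should land on the datastore (no DataPath involved)
output_dir = OutputFileDatasetConfig(name='output_dir',
                                     destination=(ds, 'output/'))

config = ScriptRunConfig(source_directory='.',
                         command=['python', 'script.py',
                                  '--data', train_dataset.as_mount(),
                                  '--output', output_dir],
                         compute_target=compute_target,
                         environment=env)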

@parimalak

@aminsaied did you find a solution to your problem? I am having an issue with multiple inputs using DataPath.

@aminsaied (Collaborator, Author)

It's recommended to use datasets to provide inputs. It's possible to provide a path to a directory (or indeed to multiple files) in that way.
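
For example (a minimal sketch, reusing the datastore registered above; the folder name train_data is a placeholder):

from azureml.core import Dataset, Datastore, Workspace

ws = Workspace.from_config()
ds = Datastore.get(ws, 'NAME')

# Point the FileDataset at a folder rather than a single file;
# all files under that folder become part of the dataset.
folder_dataset = Dataset.File.from_files(path=(ds, 'train_data'))

# At run time the folder is mounted and its local path is passed to the script,
# e.g. command=[..., '--data_dir', folder_dataset.as_mount(), ...]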
