This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Support aml #2615

Merged
SparkSnail merged 109 commits into microsoft:master on Jul 1, 2020

SparkSnail
Contributor

@SparkSnail SparkSnail commented Jun 30, 2020

No description provided.

SparkSnail and others added 30 commits May 29, 2020 17:02
1. rename storage file name
2. add more log on status changes
3. change isEnd to isAlive for better naming
add internal prefix for internal storage methods for clear usage.
fix pylint errors
minor fixes
rename methods of storageService
move trial to a separate file
fix some bugs.
fix openPAI breaking changes
fix minor bugs
 to router training service for better understanding.
trialService is used to support different submission types like AML.
TrialDispatcher makes its purpose easier to understand.
}

export class AMLEnvironmentInformation extends EnvironmentInformation {
public amlClient?: AMLClient;
Contributor

bad indentation

Contributor Author

fixed.

trial:
  command: python3 mnist.py
  codeDir: .
  computeTarget: ussc40rscl
Contributor

Replace the cluster name with a placeholder.

Contributor Author

fixed.

}
const amlEnvironment: AMLEnvironmentInformation = environment as AMLEnvironmentInformation;
const environmentLocalTempFolder = path.join(this.experimentRootDir, this.experimentId, "environment-temp");
environment.command = `import os\nos.system('${amlEnvironment.command}')`;
Contributor

Special characters such as ' need to be escaped here.

Contributor Author

No need to process this command; it is the environment command here, not the trial's command.
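
For reference, the escaping concern raised in this thread could be addressed along the following lines. This is only a sketch: `buildPythonLauncher` is a hypothetical helper that does not appear in the PR, and it assumes the command is ordinary text, for which the double-quoted literal produced by `JSON.stringify` is also a valid Python string literal.

```typescript
// Hypothetical helper (not part of the PR): wraps a shell command in a
// small Python launcher without breaking on quotes or backslashes.
function buildPythonLauncher(command: string): string {
    // JSON.stringify returns a double-quoted literal and escapes
    // double quotes, backslashes, and control characters. Single quotes
    // pass through unchanged, which is safe inside double quotes.
    const literal: string = JSON.stringify(command);
    return `import os\nos.system(${literal})`;
}

// Example command containing single quotes (illustrative only).
const launcher: string = buildPythonLauncher("echo 'hello' && python3 mnist.py");
console.log(launcher);
```

With the original template, the same command would end the Python string at the first `'`. Whether that matters depends on who controls the command, which is the point the author makes in the reply above.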

command: python3 mnist.py
codeDir: .
computeTarget: ussc40rscl
nodeCount: 1
Contributor

Will each trial use one whole node, i.e., all 8 GPUs?

Contributor Author

removed.

Compared with [LocalMode](LocalMode.md), the trial configuration in AML mode has these additional keys:
* computeTarget
  * required key. The name of the compute cluster you want to use in your AML workspace.
* nodeCount
Contributor

I think nodeCount can default to 1 because multi-machine runs are seldom used.

Contributor Author

Yes, hiding this variable is probably better; it has been removed.
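
The PR resolved this by removing the key. Had `nodeCount` been kept, the suggested default could look roughly like the sketch below. The `AmlTrialConfig` type and `normalizeTrialConfig` function are hypothetical names used for illustration, and `my-cluster` is a placeholder; only the key names come from the YAML under review.

```typescript
// Hypothetical shape of the trial section discussed in this thread.
// Field names follow the YAML keys; the type itself is an assumption.
interface AmlTrialConfig {
    command: string;
    codeDir: string;
    computeTarget: string; // required key in AML mode
    nodeCount?: number;    // optional, since multi-machine runs are rare
}

// Rejects an empty computeTarget and fills in nodeCount = 1 when omitted.
function normalizeTrialConfig(config: AmlTrialConfig): Required<AmlTrialConfig> {
    if (config.computeTarget.trim() === "") {
        throw new Error("computeTarget is required in AML mode");
    }
    return { ...config, nodeCount: config.nodeCount ?? 1 };
}

// "my-cluster" is a placeholder, not a real compute target.
const normalized = normalizeTrialConfig({
    command: "python3 mnist.py",
    codeDir: ".",
    computeTarget: "my-cluster",
});
console.log(normalized.nodeCount);
```

Keeping the key optional would let single-node users ignore it while leaving room for multi-node runs later.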

command: python3 mnist.py
codeDir: .
computeTarget: ussc40rscl
nodeCount: 1
Contributor

Where is the Docker image?

Contributor Author

Fixed; this variable was missing from the doc.

tools/nni_trial_tool/trial_runner.py (outdated, resolved)
@@ -58,6 +59,8 @@ class TrialDispatcher implements TrainingService {
this.environments = new Map<string, EnvironmentInformation>();
this.metricsEmitter = new EventEmitter();
this.experimentId = getExperimentId();
this.experimentRootDir = getExperimentRootDir();
Member

experimentRootDir can be changed to a local variable, as it's used only once, in run.

- computeTarget: ussc40rscl
- nodeCount: 1
+ computeTarget: ${replace_to_your_computeTarget}
+ image: msranni/nni
Member

Is AML installed in this image?

@SparkSnail SparkSnail merged commit 93f96d4 into microsoft:master Jul 1, 2020
@liuzhe-lz liuzhe-lz mentioned this pull request Jul 13, 2020
4 participants