
Question about the distributed training launch command #33

Open
1835969208 opened this issue Aug 4, 2024 · 1 comment

Comments

@1835969208

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 \
    --nnodes=1 --master_port=10001 --master_addr=[server ip] main_pretrain.py \
    --backbone 'resnet50' --decoder 'upernet' \
    --datasets 'sota' 'sior' 'fast' \
    --batch_size 12 --batch_size_val 12 --workers 8 \
    --save_path '[SEP model save path]' \
    --distributed 'True' --end_iter 80000 \
    --image_size 224 --init_backbone 'imp'
Dear author, is this distributed pre-training launch script meant for a single machine with multiple GPUs, or for multiple machines with multiple GPUs? If I want to run it on just one GPU, is that possible? Would I need to modify main_pretrain.py?

@DotWang
Collaborator

DotWang commented Aug 4, 2024

@1835969208 The example you gave is single-machine multi-GPU. And yes, running on a single GPU is of course possible.
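For reference, a minimal single-GPU sketch of the same command is below. It keeps torch.distributed.launch with one process so that main_pretrain.py itself should not need modification; that this repository's --distributed code path works with a one-process group is an assumption, not something confirmed in this thread. --master_addr is dropped because torch.distributed.launch defaults it to 127.0.0.1 on a single node.

# Single-GPU sketch (assumption: the script's distributed path tolerates a one-process group)
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 \
    --nnodes=1 --master_port=10001 main_pretrain.py \
    --backbone 'resnet50' --decoder 'upernet' \
    --datasets 'sota' 'sior' 'fast' \
    --batch_size 12 --batch_size_val 12 --workers 8 \
    --save_path '[SEP model save path]' \
    --distributed 'True' --end_iter 80000 \
    --image_size 224 --init_backbone 'imp'

With only one GPU, the per-process batch size may also need to be reduced to fit in memory.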
