
Single-GPU training question #99

Open
hongtaofly opened this issue Dec 4, 2021 · 1 comment

Comments

@hongtaofly

First of all, it may be a bit presumptuous to ask a SiamBAN training question under the Ocean repo, but I genuinely lack experience, so please forgive me!
Our whole lab shares two GPUs, and my labmates' jobs default to GPU 0, so GPU 0's memory fills up while GPU 1 still has plenty free. When the algorithm unfreezes the backbone at epoch 11, I run out of GPU memory. So could you tell me what the training command for SiamBAN single-GPU training looks like?

Original multi-GPU training command:

CUDA_VISIBLE_DEVICES=0,1 \
python -m torch.distributed.launch \
    --nproc_per_node=2 \
    --master_port=2333 \
    ../../tools/train.py --cfg config.yaml

@JudasDie
Contributor


Sorry, I haven't used SiamBAN's code.
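
For what it's worth, a plausible single-GPU variant of the command above would drop to one launched process and expose only the free GPU. This is a sketch based on standard torch.distributed.launch semantics, not confirmed against the SiamBAN repo; GPU index 1 is chosen here only because GPU 0 is occupied:

CUDA_VISIBLE_DEVICES=1 \
python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --master_port=2333 \
    ../../tools/train.py --cfg config.yaml

With CUDA_VISIBLE_DEVICES=1 the process sees only the second physical GPU (exposed to PyTorch as cuda:0), so the nearly full GPU 0 is never touched, and --nproc_per_node=1 starts a single training worker instead of two.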
