
Single-GPU training question #99

Open
hongtaofly opened this issue Dec 4, 2021 · 1 comment

Comments

@hongtaofly

First of all, it may be a bit presumptuous to ask a SiamBAN training question under the Ocean repo, but I genuinely lack experience, so please forgive me!
Our whole lab shares two GPUs, and my labmates' jobs default to GPU 0, so GPU 0's memory fills up while GPU 1 still has plenty free. When the algorithm unfreezes the backbone at epoch 11, I run out of GPU memory. So could you tell me what the training command for SiamBAN single-GPU training looks like?

Original multi-GPU training command:

CUDA_VISIBLE_DEVICES=0,1 \
python -m torch.distributed.launch \
    --nproc_per_node=2 \
    --master_port=2333 \
    ../../tools/train.py --cfg config.yaml

@JudasDie
Contributor


Sorry, I haven't used SiamBAN's code.
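
For what it's worth, a plausible single-GPU variant of the command above would drop to one launched process and expose only the free GPU. This is a sketch based on standard torch.distributed.launch semantics, not confirmed against the SiamBAN repo; GPU index 1 is chosen here only because GPU 0 is occupied:

CUDA_VISIBLE_DEVICES=1 \
python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --master_port=2333 \
    ../../tools/train.py --cfg config.yaml

With CUDA_VISIBLE_DEVICES=1 the process sees only the second physical GPU (exposed to PyTorch as cuda:0), so the nearly full GPU 0 is never touched, and --nproc_per_node=1 starts a single training worker instead of two.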
