🚀 Feature

In train.py, add:
import torch
import torch.distributed as dist

# `device` and `model` come from the surrounding train.py context
if device.type != 'cpu' and torch.cuda.device_count() > 1 and torch.distributed.is_available():
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)  # proposed addition: replace BatchNorm layers with SyncBatchNorm
    dist.init_process_group(backend='nccl',                       # distributed backend
                            init_method='tcp://127.0.0.1:9999',   # init method
                            world_size=1,                         # number of nodes
                            rank=0)                               # node rank
    model = torch.nn.parallel.DistributedDataParallel(model, find_unused_parameters=True)
Motivation
I found this while using an EfficientDet model; it has this feature.
Pitch
It seems better to use this, but I haven't run an ablation test.
Alternatives
Additional context
@AlexWang1900 thanks for the suggestion. A recent PR, #401, introduced much more multi-GPU functionality, including SyncBatchNorm via python train.py --sync.
The dev work there is still ongoing, though, and may change significantly again soon to introduce an mp.spawn-based approach.
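For context, an mp.spawn-based approach means launching one training process per GPU rather than driving all devices from a single process, which is the setup SyncBatchNorm is designed for. A minimal sketch of that pattern follows; this is illustrative only, not the actual train.py implementation, and the toy model, port, and training loop are assumptions:

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def main_worker(rank, world_size):
    # One process per GPU; each process joins the group under its own rank.
    dist.init_process_group(backend='nccl',
                            init_method='tcp://127.0.0.1:9999',
                            world_size=world_size,
                            rank=rank)
    torch.cuda.set_device(rank)
    model = torch.nn.Sequential(          # toy model for illustration
        torch.nn.Conv2d(3, 16, 3),
        torch.nn.BatchNorm2d(16),
    ).cuda(rank)
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)  # sync BN stats across processes
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank])
    # ... training loop ...
    dist.destroy_process_group()

if __name__ == '__main__':
    world_size = torch.cuda.device_count()
    mp.spawn(main_worker, args=(world_size,), nprocs=world_size)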