300_act3_mobilenetv3_small.log
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
| distributed init (rank 7): env://, gpu 7
| distributed init (rank 0): env://, gpu 0
| distributed init (rank 5): env://, gpu 5
| distributed init (rank 1): env://, gpu 1
| distributed init (rank 3): env://, gpu 3
| distributed init (rank 4): env://, gpu 4
| distributed init (rank 2): env://, gpu 2
| distributed init (rank 6): env://, gpu 6
Namespace(aa='rand-m9-mstd0.5-inc1', auto_resume=True, batch_size=256, clip_grad=None, color_jitter=0.4, crop_pct=None, cutmix=0.0, cutmix_minmax=None, data_path='/data/benchmarks/ILSVRC2012_LMDB', data_set='IMNET_LMDB', device='cuda', disable_eval=False, dist_backend='nccl', dist_eval=True, dist_on_itp=False, dist_url='env://', distributed=True, drop_path=0.2, enable_wandb=False, epochs=300, eval=False, eval_data_path=None, finetune='', gpu=0, head_init_scale=1.0, imagenet_default_mean_and_std=True, input_size=224, layer_decay=1.0, layer_scale_init_value=1e-06, local_rank=-1, log_dir=None, lr=0.004, min_lr=1e-06, mixup=0.0, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convnext_tiny', model_ema=False, model_ema_decay=0.9999, model_ema_eval=False, model_ema_force_cpu=False, model_key='model|module', model_prefix='', momentum=0.9, nb_classes=1000, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='./checkpoint', pin_mem=True, project='convnext', rank=0, recount=1, remode='pixel', reprob=0.25, resplit=False, resume='', save_ckpt=True, save_ckpt_freq=1, save_ckpt_num=3, seed=0, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', update_freq=2, use_amp=True, wandb_ckpt=False, warmup_epochs=20, warmup_steps=-1, weight_decay=0.05, weight_decay_end=None, world_size=8)
Transform =
RandomResizedCropAndInterpolation(size=(224, 224), scale=(0.08, 1.0), ratio=(0.75, 1.3333), interpolation=bicubic)
RandomHorizontalFlip(p=0.5)
RandAugment(n=2, ops=
AugmentOp(name=AutoContrast, p=0.5, m=9, mstd=0.5)
AugmentOp(name=Equalize, p=0.5, m=9, mstd=0.5)
AugmentOp(name=Invert, p=0.5, m=9, mstd=0.5)
AugmentOp(name=Rotate, p=0.5, m=9, mstd=0.5)
AugmentOp(name=PosterizeIncreasing, p=0.5, m=9, mstd=0.5)
AugmentOp(name=SolarizeIncreasing, p=0.5, m=9, mstd=0.5)
AugmentOp(name=SolarizeAdd, p=0.5, m=9, mstd=0.5)
AugmentOp(name=ColorIncreasing, p=0.5, m=9, mstd=0.5)
AugmentOp(name=ContrastIncreasing, p=0.5, m=9, mstd=0.5)
AugmentOp(name=BrightnessIncreasing, p=0.5, m=9, mstd=0.5)
AugmentOp(name=SharpnessIncreasing, p=0.5, m=9, mstd=0.5)
AugmentOp(name=ShearX, p=0.5, m=9, mstd=0.5)
AugmentOp(name=ShearY, p=0.5, m=9, mstd=0.5)
AugmentOp(name=TranslateXRel, p=0.5, m=9, mstd=0.5)
AugmentOp(name=TranslateYRel, p=0.5, m=9, mstd=0.5))
ToTensor()
Normalize(mean=tensor([0.4850, 0.4560, 0.4060]), std=tensor([0.2290, 0.2240, 0.2250]))
RandomErasing(p=0.25, mode=pixel, count=(1, 1))
---------------------------
reading from datapath /data/benchmarks/ILSVRC2012_LMDB
Number of the class = 1000
Transform =
Resize(size=256, interpolation=bicubic, max_size=None, antialias=None)
CenterCrop(size=(224, 224))
ToTensor()
Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
---------------------------
reading from datapath /data/benchmarks/ILSVRC2012_LMDB
Number of the class = 1000
Sampler_train = <torch.utils.data.distributed.DistributedSampler object at 0x7f6084969c40>
Model = MobileNetV3_Small(
(conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(hs1): Hardswish()
(bneck): Sequential(
(0): Block(
(conv1): Conv2d(16, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): ReLU(inplace=True)
(conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=16, bias=False)
(bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): ReLU(inplace=True)
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(16, 8, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(8, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(16, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): ReLU(inplace=True)
(skip): Sequential(
(0): Conv2d(16, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=16, bias=False)
(1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Block(
(conv1): Conv2d(16, 72, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(72, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): ReLU(inplace=True)
(conv2): Conv2d(72, 72, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=72, bias=False)
(bn2): BatchNorm2d(72, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): ReLU(inplace=True)
(se): Identity()
(conv3): Conv2d(72, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): ReLU(inplace=True)
(skip): Sequential(
(0): Conv2d(16, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=16, bias=False)
(1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): Conv2d(16, 24, kernel_size=(1, 1), stride=(1, 1))
(3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(2): Block(
(conv1): Conv2d(24, 88, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(88, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): ReLU(inplace=True)
(conv2): Conv2d(88, 88, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=88, bias=False)
(bn2): BatchNorm2d(88, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): ReLU(inplace=True)
(se): Identity()
(conv3): Conv2d(88, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): ReLU(inplace=True)
)
(3): Block(
(conv1): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(96, 96, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=96, bias=False)
(bn2): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(96, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
(skip): Sequential(
(0): Conv2d(24, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=24, bias=False)
(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): Conv2d(24, 40, kernel_size=(1, 1), stride=(1, 1))
(3): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(4): Block(
(conv1): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(240, 240, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=240, bias=False)
(bn2): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(240, 60, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(60, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(240, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
)
(5): Block(
(conv1): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(240, 240, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=240, bias=False)
(bn2): BatchNorm2d(240, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(240, 60, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(60, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(60, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(240, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
)
(6): Block(
(conv1): Conv2d(40, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(120, 120, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=120, bias=False)
(bn2): BatchNorm2d(120, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(120, 30, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(30, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(30, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(120, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
(skip): Sequential(
(0): Conv2d(40, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(7): Block(
(conv1): Conv2d(48, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(144, 144, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=144, bias=False)
(bn2): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(144, 36, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(36, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(36, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(144, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
)
(8): Block(
(conv1): Conv2d(48, 288, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(288, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(288, 288, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=288, bias=False)
(bn2): BatchNorm2d(288, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(288, 72, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(72, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(72, 288, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(288, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
(skip): Sequential(
(0): Conv2d(48, 48, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=48, bias=False)
(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): Conv2d(48, 96, kernel_size=(1, 1), stride=(1, 1))
(3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(9): Block(
(conv1): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(576, 576, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=576, bias=False)
(bn2): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(576, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(144, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
)
(10): Block(
(conv1): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Hardswish()
(conv2): Conv2d(576, 576, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=576, bias=False)
(bn2): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): Hardswish()
(se): SeModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=1)
(1): Conv2d(576, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace=True)
(4): Conv2d(144, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
(5): Hardsigmoid()
)
)
(conv3): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): Hardswish()
)
)
(conv2): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn2): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(hs2): Hardswish()
(gap): AdaptiveAvgPool2d(output_size=1)
(linear3): Linear(in_features=576, out_features=1280, bias=False)
(bn3): BatchNorm1d(1280, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(hs3): Hardswish()
(drop): Dropout(p=0.2, inplace=False)
(linear4): Linear(in_features=1280, out_features=1000, bias=True)
)
number of params: 2950524
LR = 0.00400000
Batch size = 4096
Update frequent = 2
Number of training examples = 1281167
Number of training training per epoch = 312
Param groups = {
"decay": {
"weight_decay": 0.05,
"params": [
"conv1.weight",
"bneck.0.conv1.weight",
"bneck.0.conv2.weight",
"bneck.0.se.se.1.weight",
"bneck.0.se.se.4.weight",
"bneck.0.conv3.weight",
"bneck.0.skip.0.weight",
"bneck.1.conv1.weight",
"bneck.1.conv2.weight",
"bneck.1.conv3.weight",
"bneck.1.skip.0.weight",
"bneck.1.skip.2.weight",
"bneck.2.conv1.weight",
"bneck.2.conv2.weight",
"bneck.2.conv3.weight",
"bneck.3.conv1.weight",
"bneck.3.conv2.weight",
"bneck.3.se.se.1.weight",
"bneck.3.se.se.4.weight",
"bneck.3.conv3.weight",
"bneck.3.skip.0.weight",
"bneck.3.skip.2.weight",
"bneck.4.conv1.weight",
"bneck.4.conv2.weight",
"bneck.4.se.se.1.weight",
"bneck.4.se.se.4.weight",
"bneck.4.conv3.weight",
"bneck.5.conv1.weight",
"bneck.5.conv2.weight",
"bneck.5.se.se.1.weight",
"bneck.5.se.se.4.weight",
"bneck.5.conv3.weight",
"bneck.6.conv1.weight",
"bneck.6.conv2.weight",
"bneck.6.se.se.1.weight",
"bneck.6.se.se.4.weight",
"bneck.6.conv3.weight",
"bneck.6.skip.0.weight",
"bneck.7.conv1.weight",
"bneck.7.conv2.weight",
"bneck.7.se.se.1.weight",
"bneck.7.se.se.4.weight",
"bneck.7.conv3.weight",
"bneck.8.conv1.weight",
"bneck.8.conv2.weight",
"bneck.8.se.se.1.weight",
"bneck.8.se.se.4.weight",
"bneck.8.conv3.weight",
"bneck.8.skip.0.weight",
"bneck.8.skip.2.weight",
"bneck.9.conv1.weight",
"bneck.9.conv2.weight",
"bneck.9.se.se.1.weight",
"bneck.9.se.se.4.weight",
"bneck.9.conv3.weight",
"bneck.10.conv1.weight",
"bneck.10.conv2.weight",
"bneck.10.se.se.1.weight",
"bneck.10.se.se.4.weight",
"bneck.10.conv3.weight",
"conv2.weight",
"linear3.weight",
"linear4.weight"
],
"lr_scale": 1.0
},
"no_decay": {
"weight_decay": 0.0,
"params": [
"bn1.weight",
"bn1.bias",
"bneck.0.bn1.weight",
"bneck.0.bn1.bias",
"bneck.0.bn2.weight",
"bneck.0.bn2.bias",
"bneck.0.se.se.2.weight",
"bneck.0.se.se.2.bias",
"bneck.0.bn3.weight",
"bneck.0.bn3.bias",
"bneck.0.skip.1.weight",
"bneck.0.skip.1.bias",
"bneck.1.bn1.weight",
"bneck.1.bn1.bias",
"bneck.1.bn2.weight",
"bneck.1.bn2.bias",
"bneck.1.bn3.weight",
"bneck.1.bn3.bias",
"bneck.1.skip.1.weight",
"bneck.1.skip.1.bias",
"bneck.1.skip.2.bias",
"bneck.1.skip.3.weight",
"bneck.1.skip.3.bias",
"bneck.2.bn1.weight",
"bneck.2.bn1.bias",
"bneck.2.bn2.weight",
"bneck.2.bn2.bias",
"bneck.2.bn3.weight",
"bneck.2.bn3.bias",
"bneck.3.bn1.weight",
"bneck.3.bn1.bias",
"bneck.3.bn2.weight",
"bneck.3.bn2.bias",
"bneck.3.se.se.2.weight",
"bneck.3.se.se.2.bias",
"bneck.3.bn3.weight",
"bneck.3.bn3.bias",
"bneck.3.skip.1.weight",
"bneck.3.skip.1.bias",
"bneck.3.skip.2.bias",
"bneck.3.skip.3.weight",
"bneck.3.skip.3.bias",
"bneck.4.bn1.weight",
"bneck.4.bn1.bias",
"bneck.4.bn2.weight",
"bneck.4.bn2.bias",
"bneck.4.se.se.2.weight",
"bneck.4.se.se.2.bias",
"bneck.4.bn3.weight",
"bneck.4.bn3.bias",
"bneck.5.bn1.weight",
"bneck.5.bn1.bias",
"bneck.5.bn2.weight",
"bneck.5.bn2.bias",
"bneck.5.se.se.2.weight",
"bneck.5.se.se.2.bias",
"bneck.5.bn3.weight",
"bneck.5.bn3.bias",
"bneck.6.bn1.weight",
"bneck.6.bn1.bias",
"bneck.6.bn2.weight",
"bneck.6.bn2.bias",
"bneck.6.se.se.2.weight",
"bneck.6.se.se.2.bias",
"bneck.6.bn3.weight",
"bneck.6.bn3.bias",
"bneck.6.skip.1.weight",
"bneck.6.skip.1.bias",
"bneck.7.bn1.weight",
"bneck.7.bn1.bias",
"bneck.7.bn2.weight",
"bneck.7.bn2.bias",
"bneck.7.se.se.2.weight",
"bneck.7.se.se.2.bias",
"bneck.7.bn3.weight",
"bneck.7.bn3.bias",
"bneck.8.bn1.weight",
"bneck.8.bn1.bias",
"bneck.8.bn2.weight",
"bneck.8.bn2.bias",
"bneck.8.se.se.2.weight",
"bneck.8.se.se.2.bias",
"bneck.8.bn3.weight",
"bneck.8.bn3.bias",
"bneck.8.skip.1.weight",
"bneck.8.skip.1.bias",
"bneck.8.skip.2.bias",
"bneck.8.skip.3.weight",
"bneck.8.skip.3.bias",
"bneck.9.bn1.weight",
"bneck.9.bn1.bias",
"bneck.9.bn2.weight",
"bneck.9.bn2.bias",
"bneck.9.se.se.2.weight",
"bneck.9.se.se.2.bias",
"bneck.9.bn3.weight",
"bneck.9.bn3.bias",
"bneck.10.bn1.weight",
"bneck.10.bn1.bias",
"bneck.10.bn2.weight",
"bneck.10.bn2.bias",
"bneck.10.se.se.2.weight",
"bneck.10.se.se.2.bias",
"bneck.10.bn3.weight",
"bneck.10.bn3.bias",
"bn2.weight",
"bn2.bias",
"bn3.weight",
"bn3.bias",
"linear4.bias"
],
"lr_scale": 1.0
}
}
Use Cosine LR scheduler
Set warmup steps = 6240
Set warmup steps = 0
Max WD = 0.0500000, Min WD = 0.0500000
criterion = LabelSmoothingCrossEntropy()
Auto resume checkpoint:
Start training for 300 epochs
Epoch: [0] [ 0/625] eta: 5:22:53 lr: 0.000000 min_lr: 0.000000 loss: 6.9073 (6.9073) class_acc: 0.0000 (0.0000) weight_decay: 0.0500 (0.0500) time: 30.9969 data: 24.4808 max mem: 2905
Epoch: [0] [200/625] eta: 0:15:28 lr: 0.000064 min_lr: 0.000064 loss: 6.8783 (6.8985) class_acc: 0.0000 (0.0013) weight_decay: 0.0500 (0.0500) grad_norm: 0.4079 (0.4513) time: 1.9811 data: 0.3600 max mem: 2905
Epoch: [0] [400/625] eta: 0:08:01 lr: 0.000128 min_lr: 0.000128 loss: 6.7819 (6.8701) class_acc: 0.0039 (0.0018) weight_decay: 0.0500 (0.0500) grad_norm: 0.5901 (0.4735) time: 2.2324 data: 0.0009 max mem: 2905
Epoch: [0] [600/625] eta: 0:00:53 lr: 0.000192 min_lr: 0.000192 loss: 6.5924 (6.8041) class_acc: 0.0039 (0.0032) weight_decay: 0.0500 (0.0500) grad_norm: 0.9315 (0.5781) time: 2.0935 data: 0.0452 max mem: 2905
Epoch: [0] [624/625] eta: 0:00:02 lr: 0.000199 min_lr: 0.000199 loss: 6.5503 (6.7951) class_acc: 0.0078 (0.0034) weight_decay: 0.0500 (0.0500) grad_norm: 0.9210 (0.5922) time: 0.6318 data: 0.0064 max mem: 2905
Epoch: [0] Total time: 0:21:52 (2.1002 s / it)
Averaged stats: lr: 0.000199 min_lr: 0.000199 loss: 6.5503 (6.7943) class_acc: 0.0078 (0.0036) weight_decay: 0.0500 (0.0500) grad_norm: 0.9210 (0.5922)
Test: [ 0/50] eta: 0:11:29 loss: 6.2122 (6.2122) acc1: 0.0000 (0.0000) acc5: 2.4000 (2.4000) time: 13.7935 data: 13.0781 max mem: 2905
Test: [10/50] eta: 0:01:21 loss: 6.2798 (6.2349) acc1: 0.8000 (2.0364) acc5: 5.6000 (6.4727) time: 2.0441 data: 1.9611 max mem: 2905
Test: [20/50] eta: 0:00:46 loss: 6.2798 (6.2465) acc1: 1.6000 (1.9048) acc5: 5.6000 (6.4381) time: 0.9553 data: 0.9362 max mem: 2905
Test: [30/50] eta: 0:00:27 loss: 6.2106 (6.2254) acc1: 1.6000 (2.1161) acc5: 6.4000 (6.7097) time: 1.0293 data: 1.0112 max mem: 2905
Test: [40/50] eta: 0:00:12 loss: 6.2730 (6.2426) acc1: 1.6000 (1.9707) acc5: 5.6000 (6.1463) time: 0.9834 data: 0.9653 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 6.2752 (6.2454) acc1: 0.8000 (1.8560) acc5: 4.8000 (6.0320) time: 0.9657 data: 0.9474 max mem: 2905
Test: Total time: 0:00:59 (1.1806 s / it)
* Acc@1 1.620 Acc@5 5.984 loss 6.224
Accuracy of the model on the 50000 test images: 1.6%
Max accuracy: 1.62%
Epoch: [1] [ 0/625] eta: 3:30:46 lr: 0.000200 min_lr: 0.000200 loss: 6.5212 (6.5212) class_acc: 0.0078 (0.0078) weight_decay: 0.0500 (0.0500) time: 20.2340 data: 18.8082 max mem: 2905
Epoch: [1] [200/625] eta: 0:14:09 lr: 0.000264 min_lr: 0.000264 loss: 6.3831 (6.4747) class_acc: 0.0156 (0.0131) weight_decay: 0.0500 (0.0500) grad_norm: 1.1870 (1.0887) time: 1.9414 data: 0.0293 max mem: 2905
Epoch: [1] [400/625] eta: 0:07:25 lr: 0.000328 min_lr: 0.000328 loss: 6.3248 (6.4075) class_acc: 0.0234 (0.0163) weight_decay: 0.0500 (0.0500) grad_norm: 1.2754 (1.1543) time: 1.9028 data: 0.0008 max mem: 2905
Epoch: [1] [600/625] eta: 0:00:49 lr: 0.000392 min_lr: 0.000392 loss: 6.1402 (6.3414) class_acc: 0.0273 (0.0195) weight_decay: 0.0500 (0.0500) grad_norm: 1.3534 (1.2217) time: 2.0527 data: 0.0007 max mem: 2905
Epoch: [1] [624/625] eta: 0:00:01 lr: 0.000399 min_lr: 0.000399 loss: 6.1052 (6.3331) class_acc: 0.0312 (0.0201) weight_decay: 0.0500 (0.0500) grad_norm: 1.4380 (1.2335) time: 0.8558 data: 0.0018 max mem: 2905
Epoch: [1] Total time: 0:20:10 (1.9367 s / it)
Averaged stats: lr: 0.000399 min_lr: 0.000399 loss: 6.1052 (6.3370) class_acc: 0.0312 (0.0199) weight_decay: 0.0500 (0.0500) grad_norm: 1.4380 (1.2335)
Test: [ 0/50] eta: 0:10:19 loss: 5.3883 (5.3883) acc1: 5.6000 (5.6000) acc5: 20.8000 (20.8000) time: 12.3925 data: 12.3576 max mem: 2905
Test: [10/50] eta: 0:01:24 loss: 5.4225 (5.4007) acc1: 6.4000 (7.0545) acc5: 20.0000 (18.8364) time: 2.1014 data: 2.0793 max mem: 2905
Test: [20/50] eta: 0:00:51 loss: 5.4212 (5.4345) acc1: 5.6000 (5.9048) acc5: 15.2000 (17.8286) time: 1.1690 data: 1.1492 max mem: 2905
Test: [30/50] eta: 0:00:30 loss: 5.3942 (5.4341) acc1: 4.8000 (5.8839) acc5: 15.2000 (17.3161) time: 1.2309 data: 1.2122 max mem: 2905
Test: [40/50] eta: 0:00:13 loss: 5.4350 (5.4472) acc1: 4.8000 (5.6976) acc5: 14.4000 (16.6049) time: 0.8849 data: 0.8655 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 5.5120 (5.4625) acc1: 4.8000 (5.6800) acc5: 14.4000 (16.3680) time: 0.8166 data: 0.7956 max mem: 2905
Test: Total time: 0:00:54 (1.0842 s / it)
* Acc@1 6.006 Acc@5 17.070 loss 5.431
Accuracy of the model on the 50000 test images: 6.0%
Max accuracy: 6.01%
Epoch: [2] [ 0/625] eta: 3:47:03 lr: 0.000400 min_lr: 0.000400 loss: 6.1433 (6.1433) class_acc: 0.0234 (0.0234) weight_decay: 0.0500 (0.0500) time: 21.7982 data: 18.5213 max mem: 2905
Epoch: [2] [200/625] eta: 0:13:53 lr: 0.000464 min_lr: 0.000464 loss: 6.0624 (6.0685) class_acc: 0.0391 (0.0372) weight_decay: 0.0500 (0.0500) grad_norm: 1.6316 (1.5256) time: 1.8131 data: 0.0244 max mem: 2905
Epoch: [2] [400/625] eta: 0:07:09 lr: 0.000528 min_lr: 0.000528 loss: 5.8892 (6.0085) class_acc: 0.0508 (0.0422) weight_decay: 0.0500 (0.0500) grad_norm: 1.5450 (1.5541) time: 1.8028 data: 0.0042 max mem: 2905
Epoch: [2] [600/625] eta: 0:00:47 lr: 0.000592 min_lr: 0.000592 loss: 5.7767 (5.9473) class_acc: 0.0586 (0.0475) weight_decay: 0.0500 (0.0500) grad_norm: 1.7937 (1.6152) time: 1.9841 data: 0.0006 max mem: 2905
Epoch: [2] [624/625] eta: 0:00:01 lr: 0.000599 min_lr: 0.000599 loss: 5.7209 (5.9391) class_acc: 0.0625 (0.0482) weight_decay: 0.0500 (0.0500) grad_norm: 1.7196 (1.6232) time: 0.8288 data: 0.0015 max mem: 2905
Epoch: [2] Total time: 0:19:34 (1.8798 s / it)
Averaged stats: lr: 0.000599 min_lr: 0.000599 loss: 5.7209 (5.9395) class_acc: 0.0625 (0.0480) weight_decay: 0.0500 (0.0500) grad_norm: 1.7196 (1.6232)
Test: [ 0/50] eta: 0:09:46 loss: 4.8598 (4.8598) acc1: 11.2000 (11.2000) acc5: 27.2000 (27.2000) time: 11.7263 data: 11.6884 max mem: 2905
Test: [10/50] eta: 0:01:17 loss: 4.8561 (4.7968) acc1: 12.0000 (12.5818) acc5: 28.8000 (29.4545) time: 1.9294 data: 1.9089 max mem: 2905
Test: [20/50] eta: 0:00:44 loss: 4.8561 (4.8427) acc1: 11.2000 (11.2762) acc5: 28.8000 (28.1524) time: 0.9804 data: 0.9605 max mem: 2905
Test: [30/50] eta: 0:00:25 loss: 4.8037 (4.8281) acc1: 10.4000 (11.1742) acc5: 28.0000 (28.5419) time: 0.9209 data: 0.9003 max mem: 2905
Test: [40/50] eta: 0:00:10 loss: 4.8505 (4.8551) acc1: 8.8000 (10.9659) acc5: 27.2000 (28.1756) time: 0.6264 data: 0.6058 max mem: 2905
Test: [49/50] eta: 0:00:00 loss: 4.9349 (4.8786) acc1: 9.6000 (10.8160) acc5: 24.8000 (27.5040) time: 0.6236 data: 0.6034 max mem: 2905
Test: Total time: 0:00:46 (0.9271 s / it)
* Acc@1 11.498 Acc@5 28.242 loss 4.837
Accuracy of the model on the 50000 test images: 11.5%
Max accuracy: 11.50%
Epoch: [3] [ 0/625] eta: 3:34:40 lr: 0.000600 min_lr: 0.000600 loss: 5.7872 (5.7872) class_acc: 0.0508 (0.0508) weight_decay: 0.0500 (0.0500) time: 20.6085 data: 20.4796 max mem: 2905
Epoch: [3] [200/625] eta: 0:13:29 lr: 0.000664 min_lr: 0.000664 loss: 5.6046 (5.6750) class_acc: 0.0742 (0.0743) weight_decay: 0.0500 (0.0500) grad_norm: 1.9255 (1.8777) time: 1.8261 data: 0.2726 max mem: 2905
Epoch: [3] [400/625] eta: 0:07:08 lr: 0.000728 min_lr: 0.000728 loss: 5.4507 (5.6158) class_acc: 0.1055 (0.0807) weight_decay: 0.0500 (0.0500) grad_norm: 2.1282 (1.9889) time: 1.8797 data: 0.0230 max mem: 2905
Epoch: [3] [600/625] eta: 0:00:48 lr: 0.000792 min_lr: 0.000792 loss: 5.4681 (5.5621) class_acc: 0.1016 (0.0856) weight_decay: 0.0500 (0.0500) grad_norm: 1.9530 (2.0324) time: 1.9466 data: 0.0006 max mem: 2905
Epoch: [3] [624/625] eta: 0:00:01 lr: 0.000799 min_lr: 0.000799 loss: 5.3850 (5.5557) class_acc: 0.1055 (0.0865) weight_decay: 0.0500 (0.0500) grad_norm: 2.1287 (2.0300) time: 0.7554 data: 0.0016 max mem: 2905
Epoch: [3] Total time: 0:19:31 (1.8738 s / it)
Averaged stats: lr: 0.000799 min_lr: 0.000799 loss: 5.3850 (5.5575) class_acc: 0.1055 (0.0864) weight_decay: 0.0500 (0.0500) grad_norm: 2.1287 (2.0300)
Test: [ 0/50] eta: 0:10:33 loss: 4.4218 (4.4218) acc1: 12.8000 (12.8000) acc5: 36.0000 (36.0000) time: 12.6686 data: 12.6427 max mem: 2905
Test: [10/50] eta: 0:01:15 loss: 4.3667 (4.2635) acc1: 19.2000 (18.6182) acc5: 40.0000 (40.8727) time: 1.8861 data: 1.8650 max mem: 2905
Test: [20/50] eta: 0:00:40 loss: 4.3667 (4.3161) acc1: 16.0000 (16.3810) acc5: 39.2000 (38.1333) time: 0.7907 data: 0.7708 max mem: 2905
Test: [30/50] eta: 0:00:25 loss: 4.3468 (4.2962) acc1: 15.2000 (16.4903) acc5: 36.8000 (38.0903) time: 0.9523 data: 0.9335 max mem: 2905
Test: [40/50] eta: 0:00:12 loss: 4.2393 (4.3090) acc1: 16.8000 (16.9366) acc5: 36.8000 (37.7951) time: 1.0633 data: 1.0448 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 4.4039 (4.3490) acc1: 16.0000 (16.4960) acc5: 34.4000 (37.3280) time: 0.7165 data: 0.6985 max mem: 2905
Test: Total time: 0:00:54 (1.0855 s / it)
* Acc@1 17.338 Acc@5 37.778 loss 4.303
Accuracy of the model on the 50000 test images: 17.3%
Max accuracy: 17.34%
Epoch: [4] [ 0/625] eta: 3:15:07 lr: 0.000800 min_lr: 0.000800 loss: 5.2683 (5.2683) class_acc: 0.1016 (0.1016) weight_decay: 0.0500 (0.0500) time: 18.7315 data: 18.1657 max mem: 2905
Epoch: [4] [200/625] eta: 0:13:45 lr: 0.000864 min_lr: 0.000864 loss: 5.3035 (5.3295) class_acc: 0.1133 (0.1126) weight_decay: 0.0500 (0.0500) grad_norm: 2.0196 (2.1965) time: 1.8294 data: 0.0499 max mem: 2905
Epoch: [4] [400/625] eta: 0:07:13 lr: 0.000928 min_lr: 0.000928 loss: 5.2115 (5.2869) class_acc: 0.1328 (0.1181) weight_decay: 0.0500 (0.0500) grad_norm: 2.1358 (inf) time: 1.9779 data: 0.0099 max mem: 2905
Epoch: [4] [600/625] eta: 0:00:48 lr: 0.000992 min_lr: 0.000992 loss: 5.1308 (5.2411) class_acc: 0.1406 (0.1237) weight_decay: 0.0500 (0.0500) grad_norm: 2.1777 (inf) time: 1.9683 data: 0.0520 max mem: 2905
Epoch: [4] [624/625] eta: 0:00:01 lr: 0.001000 min_lr: 0.001000 loss: 5.1033 (5.2360) class_acc: 0.1484 (0.1244) weight_decay: 0.0500 (0.0500) grad_norm: 2.3770 (inf) time: 0.8879 data: 0.0299 max mem: 2905
Epoch: [4] Total time: 0:19:35 (1.8809 s / it)
Averaged stats: lr: 0.001000 min_lr: 0.001000 loss: 5.1033 (5.2305) class_acc: 0.1484 (0.1258) weight_decay: 0.0500 (0.0500) grad_norm: 2.3770 (inf)
Test: [ 0/50] eta: 0:10:10 loss: 3.8296 (3.8296) acc1: 24.8000 (24.8000) acc5: 44.0000 (44.0000) time: 12.2025 data: 12.1732 max mem: 2905
Test: [10/50] eta: 0:01:29 loss: 3.8296 (3.8120) acc1: 24.8000 (25.7455) acc5: 48.0000 (47.7091) time: 2.2404 data: 2.2214 max mem: 2905
Test: [20/50] eta: 0:00:52 loss: 3.9097 (3.9069) acc1: 21.6000 (22.8571) acc5: 46.4000 (45.1810) time: 1.2378 data: 1.2191 max mem: 2905
Test: [30/50] eta: 0:00:29 loss: 3.9239 (3.8921) acc1: 20.8000 (23.0968) acc5: 41.6000 (45.0839) time: 1.0447 data: 1.0252 max mem: 2905
Test: [40/50] eta: 0:00:12 loss: 3.9548 (3.9210) acc1: 22.4000 (22.8683) acc5: 41.6000 (44.7610) time: 0.6728 data: 0.6523 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 4.0196 (3.9464) acc1: 20.8000 (22.3360) acc5: 42.4000 (44.3680) time: 0.6484 data: 0.6262 max mem: 2905
Test: Total time: 0:00:52 (1.0593 s / it)
* Acc@1 22.582 Acc@5 44.934 loss 3.915
Accuracy of the model on the 50000 test images: 22.6%
Max accuracy: 22.58%
Epoch: [5] [ 0/625] eta: 3:37:49 lr: 0.001000 min_lr: 0.001000 loss: 5.1298 (5.1298) class_acc: 0.1367 (0.1367) weight_decay: 0.0500 (0.0500) time: 20.9108 data: 18.1586 max mem: 2905
Epoch: [5] [200/625] eta: 0:13:51 lr: 0.001064 min_lr: 0.001064 loss: 5.0483 (5.0432) class_acc: 0.1602 (0.1535) weight_decay: 0.0500 (0.0500) grad_norm: 2.8175 (2.6947) time: 1.9335 data: 0.0055 max mem: 2905
Epoch: [5] [400/625] eta: 0:07:11 lr: 0.001128 min_lr: 0.001128 loss: 4.9352 (5.0098) class_acc: 0.1641 (0.1558) weight_decay: 0.0500 (0.0500) grad_norm: 2.1319 (2.5978) time: 2.0246 data: 0.0043 max mem: 2905
Epoch: [5] [600/625] eta: 0:00:48 lr: 0.001192 min_lr: 0.001192 loss: 4.8698 (4.9720) class_acc: 0.1641 (0.1608) weight_decay: 0.0500 (0.0500) grad_norm: 2.8755 (2.6583) time: 1.9903 data: 0.0335 max mem: 2905
Epoch: [5] [624/625] eta: 0:00:01 lr: 0.001200 min_lr: 0.001200 loss: 4.8149 (4.9663) class_acc: 0.1719 (0.1614) weight_decay: 0.0500 (0.0500) grad_norm: 2.6388 (2.6477) time: 0.7885 data: 0.0014 max mem: 2905
Epoch: [5] Total time: 0:19:32 (1.8765 s / it)
Averaged stats: lr: 0.001200 min_lr: 0.001200 loss: 4.8149 (4.9633) class_acc: 0.1719 (0.1627) weight_decay: 0.0500 (0.0500) grad_norm: 2.6388 (2.6477)
Test: [ 0/50] eta: 0:10:16 loss: 3.5700 (3.5700) acc1: 28.8000 (28.8000) acc5: 50.4000 (50.4000) time: 12.3370 data: 12.3068 max mem: 2905
Test: [10/50] eta: 0:01:23 loss: 3.5700 (3.5258) acc1: 28.8000 (27.9273) acc5: 52.0000 (52.3636) time: 2.0841 data: 2.0649 max mem: 2905
Test: [20/50] eta: 0:00:48 loss: 3.6373 (3.6148) acc1: 25.6000 (25.3714) acc5: 50.4000 (50.6667) time: 1.0670 data: 1.0480 max mem: 2905
Test: [30/50] eta: 0:00:27 loss: 3.6760 (3.5991) acc1: 24.8000 (26.0387) acc5: 49.6000 (50.6839) time: 0.9670 data: 0.9467 max mem: 2905
Test: [40/50] eta: 0:00:11 loss: 3.6760 (3.6394) acc1: 25.6000 (26.1659) acc5: 49.6000 (49.9122) time: 0.6360 data: 0.6151 max mem: 2905
Test: [49/50] eta: 0:00:00 loss: 3.6795 (3.6541) acc1: 25.6000 (25.9040) acc5: 50.4000 (50.0000) time: 0.5791 data: 0.5590 max mem: 2905
Test: Total time: 0:00:48 (0.9651 s / it)
* Acc@1 26.876 Acc@5 50.772 loss 3.622
Accuracy of the model on the 50000 test images: 26.9%
Max accuracy: 26.88%
Epoch: [6] [ 0/625] eta: 3:29:02 lr: 0.001200 min_lr: 0.001200 loss: 4.9009 (4.9009) class_acc: 0.1836 (0.1836) weight_decay: 0.0500 (0.0500) time: 20.0682 data: 19.9396 max mem: 2905
Epoch: [6] [200/625] eta: 0:13:53 lr: 0.001264 min_lr: 0.001264 loss: 4.7983 (4.7987) class_acc: 0.1953 (0.1882) weight_decay: 0.0500 (0.0500) grad_norm: 2.6991 (2.5532) time: 1.9000 data: 1.6467 max mem: 2905
Epoch: [6] [400/625] eta: 0:07:08 lr: 0.001328 min_lr: 0.001328 loss: 4.7366 (4.7691) class_acc: 0.1992 (0.1925) weight_decay: 0.0500 (0.0500) grad_norm: 2.5836 (2.5758) time: 1.6145 data: 1.4355 max mem: 2905
Epoch: [6] [600/625] eta: 0:00:47 lr: 0.001393 min_lr: 0.001393 loss: 4.6280 (4.7387) class_acc: 0.2070 (0.1974) weight_decay: 0.0500 (0.0500) grad_norm: 2.2042 (2.5552) time: 2.0588 data: 1.8792 max mem: 2905
Epoch: [6] [624/625] eta: 0:00:01 lr: 0.001400 min_lr: 0.001400 loss: 4.6438 (4.7372) class_acc: 0.2070 (0.1974) weight_decay: 0.0500 (0.0500) grad_norm: 2.6381 (2.5659) time: 0.6559 data: 0.4986 max mem: 2905
Epoch: [6] Total time: 0:19:26 (1.8664 s / it)
Averaged stats: lr: 0.001400 min_lr: 0.001400 loss: 4.6438 (4.7416) class_acc: 0.2070 (0.1962) weight_decay: 0.0500 (0.0500) grad_norm: 2.6381 (2.5659)
Test: [ 0/50] eta: 0:09:43 loss: 3.2068 (3.2068) acc1: 33.6000 (33.6000) acc5: 59.2000 (59.2000) time: 11.6622 data: 11.6334 max mem: 2905
Test: [10/50] eta: 0:01:24 loss: 3.2586 (3.2402) acc1: 34.4000 (33.4545) acc5: 60.0000 (59.3455) time: 2.1058 data: 2.0845 max mem: 2905
Test: [20/50] eta: 0:00:49 loss: 3.3495 (3.3518) acc1: 30.4000 (30.8571) acc5: 57.6000 (56.6476) time: 1.1618 data: 1.1412 max mem: 2905
Test: [30/50] eta: 0:00:26 loss: 3.3963 (3.3467) acc1: 29.6000 (30.6065) acc5: 54.4000 (56.4903) time: 0.9146 data: 0.8941 max mem: 2905
Test: [40/50] eta: 0:00:10 loss: 3.3060 (3.3673) acc1: 29.6000 (30.2829) acc5: 55.2000 (55.8829) time: 0.4684 data: 0.4486 max mem: 2905
Test: [49/50] eta: 0:00:00 loss: 3.4843 (3.3869) acc1: 28.8000 (29.8400) acc5: 53.6000 (55.4240) time: 0.4594 data: 0.4407 max mem: 2905
Test: Total time: 0:00:45 (0.9178 s / it)
* Acc@1 30.902 Acc@5 55.788 loss 3.356
Accuracy of the model on the 50000 test images: 30.9%
Max accuracy: 30.90%
Epoch: [7] [ 0/625] eta: 4:03:25 lr: 0.001400 min_lr: 0.001400 loss: 4.6585 (4.6585) class_acc: 0.2344 (0.2344) weight_decay: 0.0500 (0.0500) time: 23.3680 data: 16.2731 max mem: 2905
Epoch: [7] [200/625] eta: 0:13:40 lr: 0.001464 min_lr: 0.001464 loss: 4.5794 (4.6002) class_acc: 0.2070 (0.2162) weight_decay: 0.0500 (0.0500) grad_norm: 2.3339 (2.6517) time: 2.0130 data: 0.0005 max mem: 2905
Epoch: [7] [400/625] eta: 0:07:09 lr: 0.001528 min_lr: 0.001528 loss: 4.5334 (4.5822) class_acc: 0.2305 (0.2191) weight_decay: 0.0500 (0.0500) grad_norm: 2.3935 (2.6114) time: 1.9204 data: 0.0005 max mem: 2905
Epoch: [7] [600/625] eta: 0:00:47 lr: 0.001593 min_lr: 0.001593 loss: 4.4742 (4.5610) class_acc: 0.2422 (0.2231) weight_decay: 0.0500 (0.0500) grad_norm: 2.2934 (2.5822) time: 2.0638 data: 0.0005 max mem: 2905
Epoch: [7] [624/625] eta: 0:00:01 lr: 0.001600 min_lr: 0.001600 loss: 4.4444 (4.5575) class_acc: 0.2383 (0.2238) weight_decay: 0.0500 (0.0500) grad_norm: 2.4272 (2.5888) time: 0.7204 data: 0.0013 max mem: 2905
Epoch: [7] Total time: 0:19:30 (1.8731 s / it)
Averaged stats: lr: 0.001600 min_lr: 0.001600 loss: 4.4444 (4.5618) class_acc: 0.2383 (0.2250) weight_decay: 0.0500 (0.0500) grad_norm: 2.4272 (2.5888)
Test: [ 0/50] eta: 0:10:50 loss: 2.9093 (2.9093) acc1: 36.8000 (36.8000) acc5: 63.2000 (63.2000) time: 13.0051 data: 12.9644 max mem: 2905
Test: [10/50] eta: 0:01:27 loss: 2.9527 (3.0195) acc1: 37.6000 (37.1636) acc5: 62.4000 (62.3273) time: 2.1962 data: 2.1751 max mem: 2905
Test: [20/50] eta: 0:00:51 loss: 3.2389 (3.1744) acc1: 35.2000 (34.0952) acc5: 58.4000 (59.2762) time: 1.1524 data: 1.1297 max mem: 2905
Test: [30/50] eta: 0:00:29 loss: 3.3120 (3.1727) acc1: 31.2000 (34.0129) acc5: 56.8000 (59.1226) time: 1.0619 data: 1.0387 max mem: 2905
Test: [40/50] eta: 0:00:11 loss: 3.2125 (3.1923) acc1: 32.0000 (34.0098) acc5: 57.6000 (58.9463) time: 0.6523 data: 0.6316 max mem: 2905
Test: [49/50] eta: 0:00:00 loss: 3.3117 (3.2221) acc1: 32.0000 (33.6320) acc5: 57.6000 (58.6240) time: 0.6373 data: 0.6169 max mem: 2905
Test: Total time: 0:00:49 (0.9976 s / it)
* Acc@1 34.238 Acc@5 59.060 loss 3.189
Accuracy of the model on the 50000 test images: 34.2%
Max accuracy: 34.24%
Epoch: [8] [ 0/625] eta: 3:37:37 lr: 0.001600 min_lr: 0.001600 loss: 4.4698 (4.4698) class_acc: 0.2773 (0.2773) weight_decay: 0.0500 (0.0500) time: 20.8919 data: 17.4404 max mem: 2905
Epoch: [8] [200/625] eta: 0:13:38 lr: 0.001664 min_lr: 0.001664 loss: 4.4204 (4.4612) class_acc: 0.2422 (0.2399) weight_decay: 0.0500 (0.0500) grad_norm: 2.5189 (2.6102) time: 1.8606 data: 0.0007 max mem: 2905
Epoch: [8] [400/625] eta: 0:07:10 lr: 0.001728 min_lr: 0.001728 loss: 4.4202 (4.4448) class_acc: 0.2461 (0.2436) weight_decay: 0.0500 (0.0500) grad_norm: 2.3623 (2.5957) time: 1.9448 data: 0.0006 max mem: 2905
Epoch: [8] [600/625] eta: 0:00:48 lr: 0.001793 min_lr: 0.001793 loss: 4.3859 (4.4276) class_acc: 0.2617 (0.2474) weight_decay: 0.0500 (0.0500) grad_norm: 2.4784 (2.5829) time: 2.0012 data: 0.0006 max mem: 2905
Epoch: [8] [624/625] eta: 0:00:01 lr: 0.001800 min_lr: 0.001800 loss: 4.3258 (4.4244) class_acc: 0.2656 (0.2481) weight_decay: 0.0500 (0.0500) grad_norm: 2.2915 (2.5702) time: 0.7980 data: 0.0013 max mem: 2905
Epoch: [8] Total time: 0:19:42 (1.8919 s / it)
Averaged stats: lr: 0.001800 min_lr: 0.001800 loss: 4.3258 (4.4180) class_acc: 0.2656 (0.2493) weight_decay: 0.0500 (0.0500) grad_norm: 2.2915 (2.5702)
Test: [ 0/50] eta: 0:09:58 loss: 2.8935 (2.8935) acc1: 39.2000 (39.2000) acc5: 61.6000 (61.6000) time: 11.9733 data: 11.9122 max mem: 2905
Test: [10/50] eta: 0:01:29 loss: 2.8935 (2.9067) acc1: 40.0000 (40.0727) acc5: 67.2000 (64.8727) time: 2.2268 data: 2.2035 max mem: 2905
Test: [20/50] eta: 0:00:54 loss: 3.0836 (3.0588) acc1: 36.0000 (36.6095) acc5: 61.6000 (61.8667) time: 1.3003 data: 1.2802 max mem: 2905
Test: [30/50] eta: 0:00:31 loss: 3.0992 (3.0312) acc1: 34.4000 (36.9032) acc5: 60.8000 (62.0387) time: 1.1798 data: 1.1599 max mem: 2905
Test: [40/50] eta: 0:00:12 loss: 3.0874 (3.0463) acc1: 38.4000 (36.7415) acc5: 60.8000 (61.8341) time: 0.7516 data: 0.7325 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 3.0975 (3.0647) acc1: 35.2000 (36.2720) acc5: 60.8000 (61.3920) time: 0.6948 data: 0.6749 max mem: 2905
Test: Total time: 0:00:54 (1.0808 s / it)
* Acc@1 36.742 Acc@5 61.776 loss 3.036
Accuracy of the model on the 50000 test images: 36.7%
Max accuracy: 36.74%
Epoch: [9] [ 0/625] eta: 3:36:02 lr: 0.001800 min_lr: 0.001800 loss: 4.3272 (4.3272) class_acc: 0.2500 (0.2500) weight_decay: 0.0500 (0.0500) time: 20.7408 data: 16.6444 max mem: 2905
Epoch: [9] [200/625] eta: 0:14:23 lr: 0.001864 min_lr: 0.001864 loss: 4.2635 (4.3240) class_acc: 0.2734 (0.2654) weight_decay: 0.0500 (0.0500) grad_norm: 2.4228 (2.4175) time: 1.9545 data: 0.0091 max mem: 2905
Epoch: [9] [400/625] eta: 0:07:29 lr: 0.001929 min_lr: 0.001929 loss: 4.2678 (4.3155) class_acc: 0.2578 (0.2662) weight_decay: 0.0500 (0.0500) grad_norm: 2.1543 (2.5136) time: 1.9232 data: 0.0007 max mem: 2905
Epoch: [9] [600/625] eta: 0:00:49 lr: 0.001993 min_lr: 0.001993 loss: 4.2901 (4.3029) class_acc: 0.2617 (0.2689) weight_decay: 0.0500 (0.0500) grad_norm: 2.5049 (2.5141) time: 1.9850 data: 0.0731 max mem: 2905
Epoch: [9] [624/625] eta: 0:00:01 lr: 0.002000 min_lr: 0.002000 loss: 4.2756 (4.3026) class_acc: 0.2656 (0.2688) weight_decay: 0.0500 (0.0500) grad_norm: 2.2845 (2.5103) time: 0.7254 data: 0.0210 max mem: 2905
Epoch: [9] Total time: 0:20:10 (1.9361 s / it)
Averaged stats: lr: 0.002000 min_lr: 0.002000 loss: 4.2756 (4.2989) class_acc: 0.2656 (0.2701) weight_decay: 0.0500 (0.0500) grad_norm: 2.2845 (2.5103)
Test: [ 0/50] eta: 0:08:55 loss: 2.7651 (2.7651) acc1: 43.2000 (43.2000) acc5: 67.2000 (67.2000) time: 10.7198 data: 10.6874 max mem: 2905
Test: [10/50] eta: 0:01:19 loss: 2.7537 (2.7308) acc1: 40.8000 (41.3818) acc5: 67.2000 (66.6909) time: 1.9963 data: 1.9748 max mem: 2905
Test: [20/50] eta: 0:00:48 loss: 2.9111 (2.8878) acc1: 37.6000 (38.1333) acc5: 63.2000 (64.1905) time: 1.1607 data: 1.1401 max mem: 2905
Test: [30/50] eta: 0:00:28 loss: 2.9691 (2.8784) acc1: 36.0000 (38.1935) acc5: 61.6000 (64.1548) time: 1.1207 data: 1.1007 max mem: 2905
Test: [40/50] eta: 0:00:11 loss: 3.0237 (2.9040) acc1: 37.6000 (38.1073) acc5: 61.6000 (63.2781) time: 0.6884 data: 0.6686 max mem: 2905
Test: [49/50] eta: 0:00:00 loss: 3.0316 (2.9240) acc1: 36.8000 (37.6480) acc5: 60.8000 (62.9120) time: 0.5287 data: 0.5094 max mem: 2905
Test: Total time: 0:00:48 (0.9770 s / it)
* Acc@1 38.540 Acc@5 63.834 loss 2.886
Accuracy of the model on the 50000 test images: 38.5%
Max accuracy: 38.54%
Epoch: [10] [ 0/625] eta: 3:53:22 lr: 0.002000 min_lr: 0.002000 loss: 4.2604 (4.2604) class_acc: 0.2812 (0.2812) weight_decay: 0.0500 (0.0500) time: 22.4036 data: 16.9625 max mem: 2905
Epoch: [10] [200/625] eta: 0:14:06 lr: 0.002064 min_lr: 0.002064 loss: 4.2511 (4.2439) class_acc: 0.2852 (0.2789) weight_decay: 0.0500 (0.0500) grad_norm: 2.1119 (2.3423) time: 1.9231 data: 0.0280 max mem: 2905
Epoch: [10] [400/625] eta: 0:07:13 lr: 0.002129 min_lr: 0.002129 loss: 4.1512 (4.2185) class_acc: 0.2969 (0.2835) weight_decay: 0.0500 (0.0500) grad_norm: 2.5338 (2.3170) time: 1.8732 data: 0.0006 max mem: 2905
Epoch: [10] [600/625] eta: 0:00:48 lr: 0.002193 min_lr: 0.002193 loss: 4.1442 (4.2089) class_acc: 0.3008 (0.2857) weight_decay: 0.0500 (0.0500) grad_norm: 1.9480 (inf) time: 2.0251 data: 0.0221 max mem: 2905
Epoch: [10] [624/625] eta: 0:00:01 lr: 0.002200 min_lr: 0.002200 loss: 4.1306 (4.2070) class_acc: 0.2812 (0.2856) weight_decay: 0.0500 (0.0500) grad_norm: 1.9993 (inf) time: 0.7978 data: 0.0014 max mem: 2905
Epoch: [10] Total time: 0:19:35 (1.8811 s / it)
Averaged stats: lr: 0.002200 min_lr: 0.002200 loss: 4.1306 (4.2042) class_acc: 0.2812 (0.2866) weight_decay: 0.0500 (0.0500) grad_norm: 1.9993 (inf)
Test: [ 0/50] eta: 0:10:14 loss: 2.6962 (2.6962) acc1: 43.2000 (43.2000) acc5: 64.8000 (64.8000) time: 12.2856 data: 12.2574 max mem: 2905
Test: [10/50] eta: 0:01:22 loss: 2.6962 (2.6542) acc1: 46.4000 (44.2909) acc5: 68.0000 (68.0000) time: 2.0653 data: 2.0455 max mem: 2905
Test: [20/50] eta: 0:00:48 loss: 2.8521 (2.8162) acc1: 39.2000 (40.6476) acc5: 65.6000 (66.1333) time: 1.0829 data: 1.0643 max mem: 2905
Test: [30/50] eta: 0:00:28 loss: 2.8902 (2.8010) acc1: 37.6000 (40.8000) acc5: 65.6000 (66.2710) time: 1.0456 data: 1.0258 max mem: 2905
Test: [40/50] eta: 0:00:11 loss: 2.8382 (2.8281) acc1: 38.4000 (40.6829) acc5: 65.6000 (65.5024) time: 0.7540 data: 0.7338 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 2.9067 (2.8491) acc1: 37.6000 (40.1280) acc5: 65.6000 (65.2000) time: 0.7291 data: 0.7101 max mem: 2905
Test: Total time: 0:00:50 (1.0079 s / it)
* Acc@1 40.592 Acc@5 65.836 loss 2.813
Accuracy of the model on the 50000 test images: 40.6%
Max accuracy: 40.59%
Epoch: [11] [ 0/625] eta: 3:43:47 lr: 0.002200 min_lr: 0.002200 loss: 3.9268 (3.9268) class_acc: 0.3516 (0.3516) weight_decay: 0.0500 (0.0500) time: 21.4832 data: 17.0127 max mem: 2905
Epoch: [11] [200/625] eta: 0:14:26 lr: 0.002264 min_lr: 0.002264 loss: 4.1573 (4.1289) class_acc: 0.3008 (0.3006) weight_decay: 0.0500 (0.0500) grad_norm: 2.0794 (2.3988) time: 1.9358 data: 0.0012 max mem: 2905
Epoch: [11] [400/625] eta: 0:07:24 lr: 0.002329 min_lr: 0.002329 loss: 4.1037 (4.1273) class_acc: 0.2930 (0.2998) weight_decay: 0.0500 (0.0500) grad_norm: 2.4230 (2.4047) time: 1.8472 data: 0.0245 max mem: 2905
Epoch: [11] [600/625] eta: 0:00:49 lr: 0.002393 min_lr: 0.002393 loss: 4.1005 (4.1241) class_acc: 0.2891 (0.3001) weight_decay: 0.0500 (0.0500) grad_norm: 1.6508 (2.3153) time: 2.1331 data: 0.0204 max mem: 2905
Epoch: [11] [624/625] eta: 0:00:01 lr: 0.002400 min_lr: 0.002400 loss: 4.0749 (4.1226) class_acc: 0.3047 (0.3005) weight_decay: 0.0500 (0.0500) grad_norm: 2.1353 (2.3125) time: 0.7291 data: 0.0015 max mem: 2905
Epoch: [11] Total time: 0:20:11 (1.9381 s / it)
Averaged stats: lr: 0.002400 min_lr: 0.002400 loss: 4.0749 (4.1200) class_acc: 0.3047 (0.3013) weight_decay: 0.0500 (0.0500) grad_norm: 2.1353 (2.3125)
Test: [ 0/50] eta: 0:10:36 loss: 2.4763 (2.4763) acc1: 46.4000 (46.4000) acc5: 71.2000 (71.2000) time: 12.7394 data: 12.7065 max mem: 2905
Test: [10/50] eta: 0:01:27 loss: 2.5527 (2.5122) acc1: 46.4000 (46.8364) acc5: 71.2000 (70.9091) time: 2.1897 data: 2.1654 max mem: 2905
Test: [20/50] eta: 0:00:53 loss: 2.6607 (2.6901) acc1: 44.8000 (43.1238) acc5: 67.2000 (68.1524) time: 1.2351 data: 1.2141 max mem: 2905
Test: [30/50] eta: 0:00:32 loss: 2.7662 (2.6840) acc1: 40.8000 (42.6581) acc5: 65.6000 (67.9484) time: 1.3126 data: 1.2931 max mem: 2905
Test: [40/50] eta: 0:00:14 loss: 2.7282 (2.7012) acc1: 41.6000 (42.7707) acc5: 68.0000 (67.7659) time: 1.0129 data: 0.9931 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 2.7282 (2.7157) acc1: 40.8000 (42.4960) acc5: 68.0000 (67.6000) time: 0.8635 data: 0.8440 max mem: 2905
Test: Total time: 0:00:58 (1.1762 s / it)
* Acc@1 42.952 Acc@5 67.950 loss 2.687
Accuracy of the model on the 50000 test images: 43.0%
Max accuracy: 42.95%
Epoch: [12] [ 0/625] eta: 3:48:46 lr: 0.002400 min_lr: 0.002400 loss: 4.2393 (4.2393) class_acc: 0.2773 (0.2773) weight_decay: 0.0500 (0.0500) time: 21.9627 data: 18.4014 max mem: 2905
Epoch: [12] [200/625] eta: 0:14:12 lr: 0.002464 min_lr: 0.002464 loss: 4.0651 (4.0561) class_acc: 0.3047 (0.3143) weight_decay: 0.0500 (0.0500) grad_norm: 2.0893 (2.3006) time: 1.7633 data: 0.3592 max mem: 2905
Epoch: [12] [400/625] eta: 0:07:15 lr: 0.002529 min_lr: 0.002529 loss: 4.0072 (4.0528) class_acc: 0.3164 (0.3130) weight_decay: 0.0500 (0.0500) grad_norm: 2.0570 (2.3277) time: 1.8035 data: 0.0569 max mem: 2905
Epoch: [12] [600/625] eta: 0:00:48 lr: 0.002593 min_lr: 0.002593 loss: 4.0482 (4.0528) class_acc: 0.3203 (0.3136) weight_decay: 0.0500 (0.0500) grad_norm: 1.9792 (2.3293) time: 2.1068 data: 0.0625 max mem: 2905
Epoch: [12] [624/625] eta: 0:00:01 lr: 0.002600 min_lr: 0.002600 loss: 4.0376 (4.0532) class_acc: 0.3125 (0.3135) weight_decay: 0.0500 (0.0500) grad_norm: 2.0785 (2.3239) time: 0.8143 data: 0.0186 max mem: 2905
Epoch: [12] Total time: 0:20:07 (1.9318 s / it)
Averaged stats: lr: 0.002600 min_lr: 0.002600 loss: 4.0376 (4.0536) class_acc: 0.3125 (0.3139) weight_decay: 0.0500 (0.0500) grad_norm: 2.0785 (2.3239)
Test: [ 0/50] eta: 0:11:07 loss: 2.4376 (2.4376) acc1: 46.4000 (46.4000) acc5: 71.2000 (71.2000) time: 13.3431 data: 13.3089 max mem: 2905
Test: [10/50] eta: 0:01:31 loss: 2.4719 (2.5471) acc1: 47.2000 (46.0364) acc5: 70.4000 (69.7455) time: 2.2773 data: 2.2560 max mem: 2905
Test: [20/50] eta: 0:00:55 loss: 2.7218 (2.7209) acc1: 41.6000 (41.6762) acc5: 67.2000 (67.6952) time: 1.2851 data: 1.2659 max mem: 2905
Test: [30/50] eta: 0:00:31 loss: 2.8736 (2.7055) acc1: 39.2000 (41.8581) acc5: 66.4000 (67.5097) time: 1.1669 data: 1.1483 max mem: 2905
Test: [40/50] eta: 0:00:12 loss: 2.7616 (2.7235) acc1: 40.8000 (41.5024) acc5: 67.2000 (67.0829) time: 0.6777 data: 0.6549 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 2.8082 (2.7487) acc1: 40.0000 (41.5040) acc5: 65.6000 (66.6880) time: 0.6488 data: 0.6265 max mem: 2905
Test: Total time: 0:00:53 (1.0601 s / it)
* Acc@1 42.014 Acc@5 67.190 loss 2.726
Accuracy of the model on the 50000 test images: 42.0%
Max accuracy: 42.95%
Epoch: [13] [ 0/625] eta: 4:13:58 lr: 0.002600 min_lr: 0.002600 loss: 3.8410 (3.8410) class_acc: 0.3633 (0.3633) weight_decay: 0.0500 (0.0500) time: 24.3811 data: 23.5717 max mem: 2905
Epoch: [13] [200/625] eta: 0:15:25 lr: 0.002665 min_lr: 0.002665 loss: 3.9449 (3.9901) class_acc: 0.3320 (0.3244) weight_decay: 0.0500 (0.0500) grad_norm: 2.1871 (2.3120) time: 1.8859 data: 0.0125 max mem: 2905
Epoch: [13] [400/625] eta: 0:07:40 lr: 0.002729 min_lr: 0.002729 loss: 4.0058 (3.9963) class_acc: 0.3125 (0.3242) weight_decay: 0.0500 (0.0500) grad_norm: 2.2123 (2.2978) time: 1.8758 data: 0.0232 max mem: 2905
Epoch: [13] [600/625] eta: 0:00:52 lr: 0.002793 min_lr: 0.002793 loss: 3.9816 (3.9945) class_acc: 0.3125 (0.3250) weight_decay: 0.0500 (0.0500) grad_norm: 2.0018 (2.2742) time: 2.2631 data: 0.0006 max mem: 2905
Epoch: [13] [624/625] eta: 0:00:02 lr: 0.002800 min_lr: 0.002800 loss: 3.9411 (3.9933) class_acc: 0.3320 (0.3251) weight_decay: 0.0500 (0.0500) grad_norm: 1.9408 (2.2684) time: 1.0124 data: 0.0016 max mem: 2905
Epoch: [13] Total time: 0:21:28 (2.0615 s / it)
Averaged stats: lr: 0.002800 min_lr: 0.002800 loss: 3.9411 (3.9959) class_acc: 0.3320 (0.3249) weight_decay: 0.0500 (0.0500) grad_norm: 1.9408 (2.2684)
Test: [ 0/50] eta: 0:11:37 loss: 2.5089 (2.5089) acc1: 45.6000 (45.6000) acc5: 76.8000 (76.8000) time: 13.9592 data: 13.9329 max mem: 2905
Test: [10/50] eta: 0:01:35 loss: 2.5266 (2.5857) acc1: 47.2000 (47.0545) acc5: 70.4000 (70.5455) time: 2.3968 data: 2.3777 max mem: 2905
Test: [20/50] eta: 0:00:54 loss: 2.6529 (2.6943) acc1: 44.0000 (43.2381) acc5: 68.8000 (69.2571) time: 1.2106 data: 1.1920 max mem: 2905
Test: [30/50] eta: 0:00:31 loss: 2.6981 (2.6633) acc1: 40.0000 (43.2774) acc5: 68.8000 (69.6516) time: 1.1508 data: 1.1319 max mem: 2905
Test: [40/50] eta: 0:00:13 loss: 2.7048 (2.6942) acc1: 40.8000 (42.6732) acc5: 68.0000 (68.8195) time: 0.9040 data: 0.8853 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 2.7685 (2.7166) acc1: 40.8000 (42.3680) acc5: 66.4000 (68.2560) time: 0.8507 data: 0.8325 max mem: 2905
Test: Total time: 0:01:00 (1.2075 s / it)
* Acc@1 42.820 Acc@5 68.040 loss 2.698
Accuracy of the model on the 50000 test images: 42.8%
Max accuracy: 42.95%
Epoch: [14] [ 0/625] eta: 4:22:44 lr: 0.002800 min_lr: 0.002800 loss: 3.6967 (3.6967) class_acc: 0.3594 (0.3594) weight_decay: 0.0500 (0.0500) time: 25.2232 data: 25.0372 max mem: 2905
Epoch: [14] [200/625] eta: 0:15:03 lr: 0.002865 min_lr: 0.002865 loss: 3.8945 (3.9471) class_acc: 0.3359 (0.3323) weight_decay: 0.0500 (0.0500) grad_norm: 1.8518 (2.3200) time: 1.9372 data: 0.0008 max mem: 2905
Epoch: [14] [400/625] eta: 0:07:35 lr: 0.002929 min_lr: 0.002929 loss: 3.9635 (3.9485) class_acc: 0.3164 (0.3321) weight_decay: 0.0500 (0.0500) grad_norm: 2.0855 (2.2405) time: 2.0097 data: 0.0007 max mem: 2905
Epoch: [14] [600/625] eta: 0:00:50 lr: 0.002993 min_lr: 0.002993 loss: 3.9552 (3.9478) class_acc: 0.3242 (0.3329) weight_decay: 0.0500 (0.0500) grad_norm: 1.8658 (2.2335) time: 2.0167 data: 0.0008 max mem: 2905
Epoch: [14] [624/625] eta: 0:00:01 lr: 0.003000 min_lr: 0.003000 loss: 3.8815 (3.9471) class_acc: 0.3359 (0.3331) weight_decay: 0.0500 (0.0500) grad_norm: 2.0965 (2.2444) time: 0.6867 data: 0.0016 max mem: 2905
Epoch: [14] Total time: 0:20:22 (1.9556 s / it)
Averaged stats: lr: 0.003000 min_lr: 0.003000 loss: 3.8815 (3.9474) class_acc: 0.3359 (0.3333) weight_decay: 0.0500 (0.0500) grad_norm: 2.0965 (2.2444)
Test: [ 0/50] eta: 0:09:58 loss: 2.2925 (2.2925) acc1: 44.8000 (44.8000) acc5: 78.4000 (78.4000) time: 11.9628 data: 11.9327 max mem: 2905
Test: [10/50] eta: 0:01:20 loss: 2.3968 (2.4221) acc1: 48.0000 (48.2182) acc5: 72.8000 (72.1455) time: 2.0113 data: 1.9916 max mem: 2905
Test: [20/50] eta: 0:00:45 loss: 2.5434 (2.5883) acc1: 42.4000 (43.9619) acc5: 67.2000 (69.7524) time: 0.9887 data: 0.9698 max mem: 2905
Test: [30/50] eta: 0:00:26 loss: 2.6395 (2.5606) acc1: 40.8000 (43.7677) acc5: 68.8000 (70.0645) time: 0.9109 data: 0.8925 max mem: 2905
Test: [40/50] eta: 0:00:11 loss: 2.6302 (2.5910) acc1: 41.6000 (43.3951) acc5: 69.6000 (69.3268) time: 0.8205 data: 0.8026 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 2.6377 (2.6034) acc1: 41.6000 (43.1360) acc5: 67.2000 (69.1040) time: 0.6856 data: 0.6677 max mem: 2905
Test: Total time: 0:00:52 (1.0425 s / it)
* Acc@1 44.156 Acc@5 69.362 loss 2.579
Accuracy of the model on the 50000 test images: 44.2%
Max accuracy: 44.16%
Epoch: [15] [ 0/625] eta: 3:41:04 lr: 0.003000 min_lr: 0.003000 loss: 3.8014 (3.8014) class_acc: 0.3711 (0.3711) weight_decay: 0.0500 (0.0500) time: 21.2231 data: 20.8451 max mem: 2905
Epoch: [15] [200/625] eta: 0:14:34 lr: 0.003065 min_lr: 0.003065 loss: 3.9123 (3.8989) class_acc: 0.3398 (0.3444) weight_decay: 0.0500 (0.0500) grad_norm: 1.8742 (2.2813) time: 1.9342 data: 0.0007 max mem: 2905
Epoch: [15] [400/625] eta: 0:07:34 lr: 0.003129 min_lr: 0.003129 loss: 4.0027 (3.9085) class_acc: 0.3203 (0.3407) weight_decay: 0.0500 (0.0500) grad_norm: 2.6344 (2.2771) time: 2.1218 data: 0.0007 max mem: 2905
Epoch: [15] [600/625] eta: 0:00:50 lr: 0.003193 min_lr: 0.003193 loss: 3.8798 (3.9093) class_acc: 0.3477 (0.3401) weight_decay: 0.0500 (0.0500) grad_norm: 2.2548 (2.2944) time: 2.0437 data: 0.0156 max mem: 2905
Epoch: [15] [624/625] eta: 0:00:01 lr: 0.003200 min_lr: 0.003200 loss: 3.9158 (3.9099) class_acc: 0.3359 (0.3400) weight_decay: 0.0500 (0.0500) grad_norm: 1.7865 (2.2740) time: 0.7800 data: 0.0014 max mem: 2905
Epoch: [15] Total time: 0:20:22 (1.9558 s / it)
Averaged stats: lr: 0.003200 min_lr: 0.003200 loss: 3.9158 (3.9047) class_acc: 0.3359 (0.3413) weight_decay: 0.0500 (0.0500) grad_norm: 1.7865 (2.2740)
Test: [ 0/50] eta: 0:10:48 loss: 2.4163 (2.4163) acc1: 52.8000 (52.8000) acc5: 72.8000 (72.8000) time: 12.9715 data: 12.9431 max mem: 2905
Test: [10/50] eta: 0:01:28 loss: 2.4163 (2.3852) acc1: 49.6000 (49.6727) acc5: 73.6000 (72.9455) time: 2.2191 data: 2.1984 max mem: 2905
Test: [20/50] eta: 0:00:52 loss: 2.6175 (2.5718) acc1: 44.0000 (44.9143) acc5: 69.6000 (70.2857) time: 1.1909 data: 1.1703 max mem: 2905
Test: [30/50] eta: 0:00:30 loss: 2.6738 (2.5493) acc1: 42.4000 (44.8516) acc5: 67.2000 (70.0645) time: 1.1545 data: 1.1335 max mem: 2905
Test: [40/50] eta: 0:00:12 loss: 2.5792 (2.5740) acc1: 42.4000 (44.2732) acc5: 68.8000 (69.7756) time: 0.7458 data: 0.7242 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 2.5827 (2.5912) acc1: 42.4000 (44.1600) acc5: 68.8000 (69.5680) time: 0.6467 data: 0.6245 max mem: 2905
Test: Total time: 0:00:52 (1.0484 s / it)
* Acc@1 44.670 Acc@5 69.832 loss 2.567
Accuracy of the model on the 50000 test images: 44.7%
Max accuracy: 44.67%
Epoch: [16] [ 0/625] eta: 4:06:53 lr: 0.003201 min_lr: 0.003201 loss: 3.8835 (3.8835) class_acc: 0.3281 (0.3281) weight_decay: 0.0500 (0.0500) time: 23.7022 data: 23.5759 max mem: 2905
Epoch: [16] [200/625] eta: 0:14:28 lr: 0.003265 min_lr: 0.003265 loss: 3.8391 (3.8736) class_acc: 0.3516 (0.3472) weight_decay: 0.0500 (0.0500) grad_norm: 2.2771 (2.2100) time: 1.9181 data: 0.0462 max mem: 2905
Epoch: [16] [400/625] eta: 0:07:21 lr: 0.003329 min_lr: 0.003329 loss: 3.8863 (3.8710) class_acc: 0.3477 (0.3481) weight_decay: 0.0500 (0.0500) grad_norm: 2.0969 (2.1265) time: 1.9864 data: 0.1882 max mem: 2905
Epoch: [16] [600/625] eta: 0:00:48 lr: 0.003393 min_lr: 0.003393 loss: 3.8983 (3.8678) class_acc: 0.3477 (0.3492) weight_decay: 0.0500 (0.0500) grad_norm: 2.1502 (2.1549) time: 2.0331 data: 0.0263 max mem: 2905
Epoch: [16] [624/625] eta: 0:00:01 lr: 0.003400 min_lr: 0.003400 loss: 3.8396 (3.8680) class_acc: 0.3516 (0.3491) weight_decay: 0.0500 (0.0500) grad_norm: 1.6281 (2.1470) time: 0.7000 data: 0.0013 max mem: 2905
Epoch: [16] Total time: 0:19:51 (1.9058 s / it)
Averaged stats: lr: 0.003400 min_lr: 0.003400 loss: 3.8396 (3.8673) class_acc: 0.3516 (0.3485) weight_decay: 0.0500 (0.0500) grad_norm: 1.6281 (2.1470)
Test: [ 0/50] eta: 0:10:17 loss: 2.2174 (2.2174) acc1: 55.2000 (55.2000) acc5: 76.0000 (76.0000) time: 12.3581 data: 12.3300 max mem: 2905
Test: [10/50] eta: 0:01:19 loss: 2.3906 (2.3496) acc1: 50.4000 (50.4000) acc5: 75.2000 (73.8182) time: 1.9922 data: 1.9721 max mem: 2905
Test: [20/50] eta: 0:00:45 loss: 2.4548 (2.5181) acc1: 44.8000 (45.7524) acc5: 71.2000 (71.4286) time: 0.9715 data: 0.9526 max mem: 2905
Test: [30/50] eta: 0:00:26 loss: 2.6267 (2.5054) acc1: 40.8000 (45.4968) acc5: 69.6000 (71.3806) time: 0.9471 data: 0.9276 max mem: 2905
Test: [40/50] eta: 0:00:11 loss: 2.5787 (2.5398) acc1: 43.2000 (45.0146) acc5: 70.4000 (70.8488) time: 0.7296 data: 0.7099 max mem: 2905
Test: [49/50] eta: 0:00:00 loss: 2.5787 (2.5577) acc1: 43.2000 (44.6720) acc5: 69.6000 (70.3680) time: 0.6492 data: 0.6307 max mem: 2905
Test: Total time: 0:00:49 (0.9956 s / it)
* Acc@1 46.040 Acc@5 71.156 loss 2.523
Accuracy of the model on the 50000 test images: 46.0%
Max accuracy: 46.04%
Epoch: [17] [ 0/625] eta: 3:13:39 lr: 0.003401 min_lr: 0.003401 loss: 3.8910 (3.8910) class_acc: 0.3594 (0.3594) weight_decay: 0.0500 (0.0500) time: 18.5909 data: 18.4678 max mem: 2905
Epoch: [17] [200/625] eta: 0:14:01 lr: 0.003465 min_lr: 0.003465 loss: 3.8857 (3.8389) class_acc: 0.3477 (0.3538) weight_decay: 0.0500 (0.0500) grad_norm: 1.8196 (2.2447) time: 1.9227 data: 1.6070 max mem: 2905
Epoch: [17] [400/625] eta: 0:07:11 lr: 0.003529 min_lr: 0.003529 loss: 3.7862 (3.8369) class_acc: 0.3555 (0.3550) weight_decay: 0.0500 (0.0500) grad_norm: 1.8312 (inf) time: 1.8386 data: 1.6031 max mem: 2905
Epoch: [17] [600/625] eta: 0:00:47 lr: 0.003593 min_lr: 0.003593 loss: 3.8457 (3.8430) class_acc: 0.3516 (0.3534) weight_decay: 0.0500 (0.0500) grad_norm: 1.5092 (inf) time: 1.8567 data: 1.6188 max mem: 2905
Epoch: [17] [624/625] eta: 0:00:01 lr: 0.003600 min_lr: 0.003600 loss: 3.8486 (3.8432) class_acc: 0.3477 (0.3534) weight_decay: 0.0500 (0.0500) grad_norm: 2.1188 (inf) time: 0.7762 data: 0.6235 max mem: 2905
Epoch: [17] Total time: 0:19:21 (1.8588 s / it)
Averaged stats: lr: 0.003600 min_lr: 0.003600 loss: 3.8486 (3.8353) class_acc: 0.3477 (0.3548) weight_decay: 0.0500 (0.0500) grad_norm: 2.1188 (inf)
Test: [ 0/50] eta: 0:10:14 loss: 2.5408 (2.5408) acc1: 40.8000 (40.8000) acc5: 71.2000 (71.2000) time: 12.2809 data: 12.2577 max mem: 2905
Test: [10/50] eta: 0:01:22 loss: 2.5408 (2.5432) acc1: 46.4000 (45.6727) acc5: 68.8000 (69.8182) time: 2.0601 data: 2.0388 max mem: 2905
Test: [20/50] eta: 0:00:48 loss: 2.7378 (2.7221) acc1: 42.4000 (41.4095) acc5: 67.2000 (67.0476) time: 1.0946 data: 1.0739 max mem: 2905
Test: [30/50] eta: 0:00:28 loss: 2.8656 (2.7270) acc1: 39.2000 (41.2903) acc5: 64.0000 (66.9936) time: 1.0673 data: 1.0479 max mem: 2905
Test: [40/50] eta: 0:00:11 loss: 2.8656 (2.7829) acc1: 39.2000 (40.3902) acc5: 64.0000 (65.8146) time: 0.6944 data: 0.6748 max mem: 2905
Test: [49/50] eta: 0:00:00 loss: 2.8625 (2.7906) acc1: 40.0000 (40.6880) acc5: 64.0000 (65.7600) time: 0.6024 data: 0.5829 max mem: 2905
Test: Total time: 0:00:48 (0.9757 s / it)
* Acc@1 41.648 Acc@5 66.760 loss 2.742
Accuracy of the model on the 50000 test images: 41.6%
Max accuracy: 46.04%
Epoch: [18] [ 0/625] eta: 3:33:38 lr: 0.003601 min_lr: 0.003601 loss: 3.7695 (3.7695) class_acc: 0.3867 (0.3867) weight_decay: 0.0500 (0.0500) time: 20.5104 data: 18.1215 max mem: 2905
Epoch: [18] [200/625] eta: 0:13:40 lr: 0.003665 min_lr: 0.003665 loss: 3.7912 (3.8065) class_acc: 0.3633 (0.3593) weight_decay: 0.0500 (0.0500) grad_norm: 1.4906 (1.9935) time: 1.7403 data: 0.0926 max mem: 2905
Epoch: [18] [400/625] eta: 0:07:02 lr: 0.003729 min_lr: 0.003729 loss: 3.8296 (3.8077) class_acc: 0.3594 (0.3597) weight_decay: 0.0500 (0.0500) grad_norm: 1.6810 (2.0978) time: 1.9347 data: 0.0006 max mem: 2905
Epoch: [18] [600/625] eta: 0:00:47 lr: 0.003793 min_lr: 0.003793 loss: 3.7772 (3.8113) class_acc: 0.3438 (0.3588) weight_decay: 0.0500 (0.0500) grad_norm: 1.7198 (2.0771) time: 2.0009 data: 0.0006 max mem: 2905
Epoch: [18] [624/625] eta: 0:00:01 lr: 0.003800 min_lr: 0.003800 loss: 3.8208 (3.8127) class_acc: 0.3516 (0.3589) weight_decay: 0.0500 (0.0500) grad_norm: 2.2762 (2.1282) time: 0.7129 data: 0.0014 max mem: 2905
Epoch: [18] Total time: 0:19:17 (1.8526 s / it)
Averaged stats: lr: 0.003800 min_lr: 0.003800 loss: 3.8208 (3.8136) class_acc: 0.3516 (0.3592) weight_decay: 0.0500 (0.0500) grad_norm: 2.2762 (2.1282)
Test: [ 0/50] eta: 0:09:54 loss: 2.6247 (2.6247) acc1: 45.6000 (45.6000) acc5: 70.4000 (70.4000) time: 11.8913 data: 11.8576 max mem: 2905
Test: [10/50] eta: 0:01:20 loss: 2.6708 (2.6645) acc1: 44.0000 (43.0545) acc5: 68.8000 (68.4364) time: 2.0214 data: 2.0011 max mem: 2905
Test: [20/50] eta: 0:00:47 loss: 2.8277 (2.7971) acc1: 38.4000 (40.0381) acc5: 66.4000 (66.6667) time: 1.0846 data: 1.0656 max mem: 2905
Test: [30/50] eta: 0:00:27 loss: 2.8516 (2.7834) acc1: 38.4000 (40.4903) acc5: 64.8000 (66.4000) time: 1.0508 data: 1.0310 max mem: 2905
Test: [40/50] eta: 0:00:11 loss: 2.7912 (2.8077) acc1: 40.8000 (39.9220) acc5: 63.2000 (66.0293) time: 0.7072 data: 0.6882 max mem: 2905
Test: [49/50] eta: 0:00:00 loss: 2.8734 (2.8130) acc1: 36.8000 (39.8080) acc5: 65.6000 (66.0320) time: 0.6628 data: 0.6440 max mem: 2905
Test: Total time: 0:00:48 (0.9739 s / it)
* Acc@1 41.306 Acc@5 66.252 loss 2.793
Accuracy of the model on the 50000 test images: 41.3%
Max accuracy: 46.04%
Epoch: [19] [ 0/625] eta: 3:27:37 lr: 0.003801 min_lr: 0.003801 loss: 3.7461 (3.7461) class_acc: 0.3789 (0.3789) weight_decay: 0.0500 (0.0500) time: 19.9313 data: 19.0259 max mem: 2905
Epoch: [19] [200/625] eta: 0:13:18 lr: 0.003865 min_lr: 0.003865 loss: 3.7814 (3.7913) class_acc: 0.3633 (0.3657) weight_decay: 0.0500 (0.0500) grad_norm: 1.7745 (2.0692) time: 1.7525 data: 0.0081 max mem: 2905
Epoch: [19] [400/625] eta: 0:06:59 lr: 0.003929 min_lr: 0.003929 loss: 3.7692 (3.7926) class_acc: 0.3672 (0.3659) weight_decay: 0.0500 (0.0500) grad_norm: 1.5919 (2.0602) time: 1.7423 data: 0.0006 max mem: 2905
Epoch: [19] [600/625] eta: 0:00:46 lr: 0.003993 min_lr: 0.003993 loss: 3.7743 (3.7874) class_acc: 0.3672 (0.3665) weight_decay: 0.0500 (0.0500) grad_norm: 1.8381 (2.0593) time: 1.7971 data: 0.0006 max mem: 2905
Epoch: [19] [624/625] eta: 0:00:01 lr: 0.004000 min_lr: 0.004000 loss: 3.7813 (3.7878) class_acc: 0.3633 (0.3663) weight_decay: 0.0500 (0.0500) grad_norm: 1.7304 (2.0414) time: 0.9912 data: 0.0013 max mem: 2905
Epoch: [19] Total time: 0:19:13 (1.8450 s / it)
Averaged stats: lr: 0.004000 min_lr: 0.004000 loss: 3.7813 (3.7908) class_acc: 0.3633 (0.3636) weight_decay: 0.0500 (0.0500) grad_norm: 1.7304 (2.0414)
Test: [ 0/50] eta: 0:10:33 loss: 2.4847 (2.4847) acc1: 39.2000 (39.2000) acc5: 69.6000 (69.6000) time: 12.6661 data: 12.6366 max mem: 2905
Test: [10/50] eta: 0:01:30 loss: 2.3167 (2.3055) acc1: 51.2000 (48.5818) acc5: 73.6000 (74.7636) time: 2.2625 data: 2.2422 max mem: 2905
Test: [20/50] eta: 0:00:54 loss: 2.5198 (2.4631) acc1: 45.6000 (45.0286) acc5: 72.0000 (72.3048) time: 1.2663 data: 1.2454 max mem: 2905
Test: [30/50] eta: 0:00:29 loss: 2.6153 (2.4630) acc1: 43.2000 (45.3161) acc5: 69.6000 (71.8194) time: 1.0732 data: 1.0521 max mem: 2905
Test: [40/50] eta: 0:00:12 loss: 2.5907 (2.4913) acc1: 44.8000 (45.3659) acc5: 68.0000 (71.2781) time: 0.5919 data: 0.5697 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 2.6554 (2.5000) acc1: 44.8000 (45.2480) acc5: 68.0000 (70.8800) time: 0.5238 data: 0.5017 max mem: 2905
Test: Total time: 0:00:50 (1.0104 s / it)
* Acc@1 46.238 Acc@5 71.376 loss 2.472
Accuracy of the model on the 50000 test images: 46.2%
Max accuracy: 46.24%
Epoch: [20] [ 0/625] eta: 3:41:23 lr: 0.004000 min_lr: 0.004000 loss: 3.5793 (3.5793) class_acc: 0.3906 (0.3906) weight_decay: 0.0500 (0.0500) time: 21.2543 data: 16.9177 max mem: 2905
Epoch: [20] [200/625] eta: 0:14:01 lr: 0.004000 min_lr: 0.004000 loss: 3.7860 (3.7508) class_acc: 0.3516 (0.3733) weight_decay: 0.0500 (0.0500) grad_norm: 1.7203 (1.9910) time: 1.8292 data: 0.0011 max mem: 2905
Epoch: [20] [400/625] eta: 0:07:15 lr: 0.004000 min_lr: 0.004000 loss: 3.7795 (3.7539) class_acc: 0.3711 (0.3730) weight_decay: 0.0500 (0.0500) grad_norm: 1.7622 (2.0548) time: 1.9608 data: 0.0008 max mem: 2905
Epoch: [20] [600/625] eta: 0:00:48 lr: 0.004000 min_lr: 0.004000 loss: 3.7585 (3.7597) class_acc: 0.3672 (0.3716) weight_decay: 0.0500 (0.0500) grad_norm: 1.5971 (2.0682) time: 1.8815 data: 0.0011 max mem: 2905
Epoch: [20] [624/625] eta: 0:00:01 lr: 0.004000 min_lr: 0.004000 loss: 3.8036 (3.7596) class_acc: 0.3633 (0.3715) weight_decay: 0.0500 (0.0500) grad_norm: 1.7657 (2.0750) time: 0.7967 data: 0.0013 max mem: 2905
Epoch: [20] Total time: 0:19:32 (1.8759 s / it)
Averaged stats: lr: 0.004000 min_lr: 0.004000 loss: 3.8036 (3.7625) class_acc: 0.3633 (0.3694) weight_decay: 0.0500 (0.0500) grad_norm: 1.7657 (2.0750)
Test: [ 0/50] eta: 0:10:29 loss: 3.0842 (3.0842) acc1: 32.8000 (32.8000) acc5: 60.8000 (60.8000) time: 12.5883 data: 12.5592 max mem: 2905
Test: [10/50] eta: 0:01:21 loss: 2.8227 (2.7515) acc1: 44.0000 (41.7455) acc5: 65.6000 (67.2000) time: 2.0425 data: 2.0229 max mem: 2905
Test: [20/50] eta: 0:00:46 loss: 2.8696 (2.9127) acc1: 39.2000 (38.8571) acc5: 64.8000 (65.2571) time: 0.9919 data: 0.9734 max mem: 2905
Test: [30/50] eta: 0:00:27 loss: 3.0171 (2.8860) acc1: 36.8000 (39.7419) acc5: 62.4000 (64.8258) time: 0.9784 data: 0.9600 max mem: 2905
Test: [40/50] eta: 0:00:11 loss: 2.8158 (2.9152) acc1: 36.8000 (38.9073) acc5: 61.6000 (64.4293) time: 0.7555 data: 0.7357 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 2.8438 (2.9174) acc1: 36.8000 (38.8960) acc5: 64.8000 (64.5600) time: 0.5881 data: 0.5677 max mem: 2905
Test: Total time: 0:00:50 (1.0061 s / it)
* Acc@1 39.712 Acc@5 64.622 loss 2.871
Accuracy of the model on the 50000 test images: 39.7%
Max accuracy: 46.24%
Epoch: [21] [ 0/625] eta: 4:05:07 lr: 0.004000 min_lr: 0.004000 loss: 3.7904 (3.7904) class_acc: 0.3672 (0.3672) weight_decay: 0.0500 (0.0500) time: 23.5315 data: 17.3222 max mem: 2905
Epoch: [21] [200/625] eta: 0:14:13 lr: 0.004000 min_lr: 0.004000 loss: 3.6553 (3.7189) class_acc: 0.3828 (0.3741) weight_decay: 0.0500 (0.0500) grad_norm: 1.8959 (1.9880) time: 2.0153 data: 0.0016 max mem: 2905
Epoch: [21] [400/625] eta: 0:07:19 lr: 0.004000 min_lr: 0.004000 loss: 3.6694 (3.7235) class_acc: 0.3750 (0.3756) weight_decay: 0.0500 (0.0500) grad_norm: 2.2716 (2.1337) time: 1.9323 data: 0.0006 max mem: 2905
Epoch: [21] [600/625] eta: 0:00:48 lr: 0.004000 min_lr: 0.004000 loss: 3.7389 (3.7282) class_acc: 0.3750 (0.3741) weight_decay: 0.0500 (0.0500) grad_norm: 1.7289 (2.0225) time: 1.9361 data: 0.0006 max mem: 2905
Epoch: [21] [624/625] eta: 0:00:01 lr: 0.003999 min_lr: 0.003999 loss: 3.6977 (3.7290) class_acc: 0.3750 (0.3740) weight_decay: 0.0500 (0.0500) grad_norm: 1.8270 (2.0189) time: 0.8300 data: 0.0016 max mem: 2905
Epoch: [21] Total time: 0:19:52 (1.9078 s / it)
Averaged stats: lr: 0.003999 min_lr: 0.003999 loss: 3.6977 (3.7321) class_acc: 0.3750 (0.3755) weight_decay: 0.0500 (0.0500) grad_norm: 1.8270 (2.0189)
Test: [ 0/50] eta: 0:10:33 loss: 2.4033 (2.4033) acc1: 46.4000 (46.4000) acc5: 70.4000 (70.4000) time: 12.6693 data: 12.6438 max mem: 2905
Test: [10/50] eta: 0:01:30 loss: 2.3841 (2.4315) acc1: 47.2000 (47.6364) acc5: 71.2000 (70.9818) time: 2.2617 data: 2.2409 max mem: 2905
Test: [20/50] eta: 0:00:53 loss: 2.5151 (2.5363) acc1: 46.4000 (45.3333) acc5: 71.2000 (70.4381) time: 1.2458 data: 1.2264 max mem: 2905
Test: [30/50] eta: 0:00:29 loss: 2.6047 (2.5368) acc1: 44.0000 (45.2129) acc5: 70.4000 (70.3226) time: 1.0357 data: 1.0170 max mem: 2905
Test: [40/50] eta: 0:00:11 loss: 2.6984 (2.5929) acc1: 43.2000 (44.3317) acc5: 68.0000 (69.2293) time: 0.5806 data: 0.5621 max mem: 2905
Test: [49/50] eta: 0:00:00 loss: 2.5740 (2.5983) acc1: 44.0000 (44.4480) acc5: 68.8000 (69.2640) time: 0.5218 data: 0.5029 max mem: 2905
Test: Total time: 0:00:50 (1.0020 s / it)
* Acc@1 44.660 Acc@5 69.776 loss 2.574
Accuracy of the model on the 50000 test images: 44.7%
Max accuracy: 46.24%
Epoch: [22] [ 0/625] eta: 3:27:10 lr: 0.003999 min_lr: 0.003999 loss: 3.8757 (3.8757) class_acc: 0.3516 (0.3516) weight_decay: 0.0500 (0.0500) time: 19.8884 data: 18.6958 max mem: 2905
Epoch: [22] [200/625] eta: 0:13:47 lr: 0.003999 min_lr: 0.003999 loss: 3.7881 (3.7122) class_acc: 0.3672 (0.3787) weight_decay: 0.0500 (0.0500) grad_norm: 1.7065 (1.9623) time: 1.8335 data: 0.0006 max mem: 2905
Epoch: [22] [400/625] eta: 0:07:11 lr: 0.003999 min_lr: 0.003999 loss: 3.6754 (3.7072) class_acc: 0.3750 (0.3799) weight_decay: 0.0500 (0.0500) grad_norm: 1.7173 (1.9558) time: 1.8211 data: 0.0008 max mem: 2905
Epoch: [22] [600/625] eta: 0:00:47 lr: 0.003999 min_lr: 0.003999 loss: 3.6931 (3.7025) class_acc: 0.3828 (0.3805) weight_decay: 0.0500 (0.0500) grad_norm: 1.7012 (1.9935) time: 1.7528 data: 0.0110 max mem: 2905
Epoch: [22] [624/625] eta: 0:00:01 lr: 0.003999 min_lr: 0.003999 loss: 3.6958 (3.7020) class_acc: 0.3906 (0.3807) weight_decay: 0.0500 (0.0500) grad_norm: 1.8089 (1.9925) time: 0.8893 data: 0.0082 max mem: 2905
Epoch: [22] Total time: 0:19:43 (1.8940 s / it)
Averaged stats: lr: 0.003999 min_lr: 0.003999 loss: 3.6958 (3.7013) class_acc: 0.3906 (0.3812) weight_decay: 0.0500 (0.0500) grad_norm: 1.8089 (1.9925)
Test: [ 0/50] eta: 0:09:19 loss: 2.1642 (2.1642) acc1: 53.6000 (53.6000) acc5: 73.6000 (73.6000) time: 11.1931 data: 11.1673 max mem: 2905
Test: [10/50] eta: 0:01:08 loss: 2.2309 (2.1733) acc1: 53.6000 (53.8182) acc5: 75.2000 (76.0000) time: 1.7047 data: 1.6841 max mem: 2905
Test: [20/50] eta: 0:00:41 loss: 2.2977 (2.3197) acc1: 49.6000 (49.9048) acc5: 72.8000 (74.2857) time: 0.8950 data: 0.8757 max mem: 2905
Test: [30/50] eta: 0:00:26 loss: 2.4963 (2.3411) acc1: 45.6000 (49.4452) acc5: 72.0000 (73.4710) time: 1.0874 data: 1.0689 max mem: 2905
Test: [40/50] eta: 0:00:11 loss: 2.4343 (2.3794) acc1: 47.2000 (48.4293) acc5: 71.2000 (72.8390) time: 0.9650 data: 0.9471 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 2.3533 (2.3700) acc1: 48.0000 (48.4800) acc5: 72.0000 (73.0880) time: 0.5751 data: 0.5571 max mem: 2905
Test: Total time: 0:00:52 (1.0429 s / it)
* Acc@1 48.866 Acc@5 73.468 loss 2.344
Accuracy of the model on the 50000 test images: 48.9%
Max accuracy: 48.87%
Epoch: [23] [ 0/625] eta: 3:39:53 lr: 0.003999 min_lr: 0.003999 loss: 3.5973 (3.5973) class_acc: 0.4336 (0.4336) weight_decay: 0.0500 (0.0500) time: 21.1095 data: 20.3723 max mem: 2905
Epoch: [23] [200/625] eta: 0:14:14 lr: 0.003999 min_lr: 0.003999 loss: 3.6491 (3.6711) class_acc: 0.3789 (0.3890) weight_decay: 0.0500 (0.0500) grad_norm: 1.9880 (2.2326) time: 1.9598 data: 0.6415 max mem: 2905
Epoch: [23] [400/625] eta: 0:07:16 lr: 0.003998 min_lr: 0.003998 loss: 3.7368 (3.6840) class_acc: 0.3750 (0.3860) weight_decay: 0.0500 (0.0500) grad_norm: 1.6813 (2.1713) time: 1.8327 data: 0.0009 max mem: 2905
Epoch: [23] [600/625] eta: 0:00:48 lr: 0.003998 min_lr: 0.003998 loss: 3.6761 (3.6815) class_acc: 0.3789 (0.3858) weight_decay: 0.0500 (0.0500) grad_norm: 1.5312 (inf) time: 1.8845 data: 0.0417 max mem: 2905
Epoch: [23] [624/625] eta: 0:00:01 lr: 0.003998 min_lr: 0.003998 loss: 3.6482 (3.6813) class_acc: 0.3789 (0.3858) weight_decay: 0.0500 (0.0500) grad_norm: 1.7138 (inf) time: 0.8022 data: 0.0152 max mem: 2905
Epoch: [23] Total time: 0:19:34 (1.8799 s / it)
Averaged stats: lr: 0.003998 min_lr: 0.003998 loss: 3.6482 (3.6764) class_acc: 0.3789 (0.3861) weight_decay: 0.0500 (0.0500) grad_norm: 1.7138 (inf)
Test: [ 0/50] eta: 0:10:17 loss: 2.3520 (2.3520) acc1: 49.6000 (49.6000) acc5: 72.8000 (72.8000) time: 12.3509 data: 12.3117 max mem: 2905
Test: [10/50] eta: 0:01:09 loss: 2.3520 (2.3535) acc1: 48.8000 (48.9455) acc5: 72.0000 (73.0909) time: 1.7484 data: 1.7276 max mem: 2905
Test: [20/50] eta: 0:00:38 loss: 2.5317 (2.5242) acc1: 44.0000 (45.4476) acc5: 71.2000 (70.1714) time: 0.7383 data: 0.7184 max mem: 2905
Test: [30/50] eta: 0:00:24 loss: 2.6113 (2.5221) acc1: 43.2000 (45.2903) acc5: 68.0000 (70.1419) time: 0.9622 data: 0.9422 max mem: 2905
Test: [40/50] eta: 0:00:10 loss: 2.6022 (2.5431) acc1: 43.2000 (44.9561) acc5: 68.0000 (70.0488) time: 0.8729 data: 0.8520 max mem: 2905
Test: [49/50] eta: 0:00:00 loss: 2.6396 (2.5664) acc1: 43.2000 (44.6560) acc5: 68.8000 (69.9200) time: 0.4563 data: 0.4363 max mem: 2905
Test: Total time: 0:00:47 (0.9547 s / it)
* Acc@1 45.160 Acc@5 70.234 loss 2.548
Accuracy of the model on the 50000 test images: 45.2%
Max accuracy: 48.87%
Epoch: [24] [ 0/625] eta: 4:03:15 lr: 0.003998 min_lr: 0.003998 loss: 3.6114 (3.6114) class_acc: 0.4141 (0.4141) weight_decay: 0.0500 (0.0500) time: 23.3534 data: 19.9394 max mem: 2905
Epoch: [24] [200/625] eta: 0:14:07 lr: 0.003998 min_lr: 0.003998 loss: 3.6764 (3.6450) class_acc: 0.3828 (0.3905) weight_decay: 0.0500 (0.0500) grad_norm: 1.4303 (2.1383) time: 1.8211 data: 0.0007 max mem: 2905
Epoch: [24] [400/625] eta: 0:07:12 lr: 0.003997 min_lr: 0.003997 loss: 3.6918 (3.6442) class_acc: 0.3789 (0.3912) weight_decay: 0.0500 (0.0500) grad_norm: 1.6180 (2.1398) time: 1.8983 data: 0.0186 max mem: 2905
Epoch: [24] [600/625] eta: 0:00:48 lr: 0.003997 min_lr: 0.003997 loss: 3.5939 (3.6418) class_acc: 0.3828 (0.3918) weight_decay: 0.0500 (0.0500) grad_norm: 1.8641 (2.1125) time: 2.1217 data: 0.0462 max mem: 2905
Epoch: [24] [624/625] eta: 0:00:01 lr: 0.003997 min_lr: 0.003997 loss: 3.6427 (3.6419) class_acc: 0.3906 (0.3918) weight_decay: 0.0500 (0.0500) grad_norm: 1.6812 (2.1027) time: 0.6184 data: 0.0013 max mem: 2905
Epoch: [24] Total time: 0:20:08 (1.9335 s / it)
Averaged stats: lr: 0.003997 min_lr: 0.003997 loss: 3.6427 (3.6528) class_acc: 0.3906 (0.3904) weight_decay: 0.0500 (0.0500) grad_norm: 1.6812 (2.1027)
Test: [ 0/50] eta: 0:11:01 loss: 2.1656 (2.1656) acc1: 52.8000 (52.8000) acc5: 79.2000 (79.2000) time: 13.2280 data: 13.1921 max mem: 2905
Test: [10/50] eta: 0:01:27 loss: 2.1610 (2.1598) acc1: 54.4000 (54.2545) acc5: 78.4000 (76.4364) time: 2.1764 data: 2.1563 max mem: 2905
Test: [20/50] eta: 0:00:51 loss: 2.3438 (2.3316) acc1: 50.4000 (49.6381) acc5: 73.6000 (74.5143) time: 1.1365 data: 1.1169 max mem: 2905
Test: [30/50] eta: 0:00:30 loss: 2.4677 (2.3443) acc1: 44.8000 (48.9290) acc5: 72.8000 (74.0645) time: 1.1834 data: 1.1620 max mem: 2905
Test: [40/50] eta: 0:00:13 loss: 2.3905 (2.3670) acc1: 46.4000 (48.8976) acc5: 72.0000 (73.4634) time: 0.9036 data: 0.8819 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 2.3807 (2.3688) acc1: 48.8000 (49.2160) acc5: 72.0000 (73.4560) time: 0.8287 data: 0.8077 max mem: 2905
Test: Total time: 0:00:54 (1.0976 s / it)
* Acc@1 48.894 Acc@5 73.396 loss 2.367
Accuracy of the model on the 50000 test images: 48.9%
Max accuracy: 48.89%
Epoch: [25] [ 0/625] eta: 4:01:03 lr: 0.003997 min_lr: 0.003997 loss: 3.5686 (3.5686) class_acc: 0.3945 (0.3945) weight_decay: 0.0500 (0.0500) time: 23.1420 data: 22.6885 max mem: 2905
Epoch: [25] [200/625] eta: 0:14:33 lr: 0.003996 min_lr: 0.003996 loss: 3.6051 (3.6239) class_acc: 0.4023 (0.3945) weight_decay: 0.0500 (0.0500) grad_norm: 1.9610 (2.0712) time: 1.9828 data: 0.0008 max mem: 2905
Epoch: [25] [400/625] eta: 0:07:28 lr: 0.003996 min_lr: 0.003996 loss: 3.6724 (3.6338) class_acc: 0.3750 (0.3932) weight_decay: 0.0500 (0.0500) grad_norm: 2.1317 (2.0402) time: 1.9872 data: 0.0009 max mem: 2905
Epoch: [25] [600/625] eta: 0:00:49 lr: 0.003996 min_lr: 0.003996 loss: 3.6125 (3.6347) class_acc: 0.3711 (0.3930) weight_decay: 0.0500 (0.0500) grad_norm: 2.1590 (2.0136) time: 1.9020 data: 0.0012 max mem: 2905
Epoch: [25] [624/625] eta: 0:00:01 lr: 0.003995 min_lr: 0.003995 loss: 3.6628 (3.6352) class_acc: 0.3867 (0.3928) weight_decay: 0.0500 (0.0500) grad_norm: 1.7407 (2.0130) time: 0.7558 data: 0.0026 max mem: 2905
Epoch: [25] Total time: 0:20:02 (1.9239 s / it)
Averaged stats: lr: 0.003995 min_lr: 0.003995 loss: 3.6628 (3.6307) class_acc: 0.3867 (0.3951) weight_decay: 0.0500 (0.0500) grad_norm: 1.7407 (2.0130)
Test: [ 0/50] eta: 0:09:56 loss: 2.2252 (2.2252) acc1: 52.0000 (52.0000) acc5: 78.4000 (78.4000) time: 11.9393 data: 11.9016 max mem: 2905
Test: [10/50] eta: 0:01:23 loss: 2.3495 (2.3328) acc1: 49.6000 (50.1091) acc5: 73.6000 (73.4545) time: 2.0886 data: 2.0668 max mem: 2905
Test: [20/50] eta: 0:00:50 loss: 2.4747 (2.4821) acc1: 47.2000 (46.3238) acc5: 71.2000 (72.1143) time: 1.1564 data: 1.1366 max mem: 2905
Test: [30/50] eta: 0:00:28 loss: 2.5827 (2.4736) acc1: 43.2000 (46.0645) acc5: 69.6000 (72.1032) time: 1.0338 data: 1.0141 max mem: 2905
Test: [40/50] eta: 0:00:11 loss: 2.5845 (2.4907) acc1: 44.8000 (46.1463) acc5: 68.0000 (71.5902) time: 0.5820 data: 0.5610 max mem: 2905
Test: [49/50] eta: 0:00:00 loss: 2.4665 (2.4988) acc1: 45.6000 (46.0480) acc5: 71.2000 (71.3280) time: 0.4792 data: 0.4575 max mem: 2905
Test: Total time: 0:00:47 (0.9536 s / it)
* Acc@1 46.482 Acc@5 71.380 loss 2.475
Accuracy of the model on the 50000 test images: 46.5%
Max accuracy: 48.89%
Epoch: [26] [ 0/625] eta: 3:26:09 lr: 0.003995 min_lr: 0.003995 loss: 3.7965 (3.7965) class_acc: 0.3867 (0.3867) weight_decay: 0.0500 (0.0500) time: 19.7910 data: 18.9403 max mem: 2905
Epoch: [26] [200/625] eta: 0:13:49 lr: 0.003995 min_lr: 0.003995 loss: 3.6161 (3.5845) class_acc: 0.4062 (0.4035) weight_decay: 0.0500 (0.0500) grad_norm: 1.5103 (2.1486) time: 2.0134 data: 0.7893 max mem: 2905
Epoch: [26] [400/625] eta: 0:07:11 lr: 0.003994 min_lr: 0.003994 loss: 3.5623 (3.5993) class_acc: 0.4023 (0.4004) weight_decay: 0.0500 (0.0500) grad_norm: 1.4205 (2.0950) time: 1.9136 data: 0.5641 max mem: 2905
Epoch: [26] [600/625] eta: 0:00:48 lr: 0.003994 min_lr: 0.003994 loss: 3.5630 (3.6026) class_acc: 0.3984 (0.3999) weight_decay: 0.0500 (0.0500) grad_norm: 1.7351 (2.1099) time: 1.8916 data: 0.0527 max mem: 2905
Epoch: [26] [624/625] eta: 0:00:01 lr: 0.003994 min_lr: 0.003994 loss: 3.5775 (3.6028) class_acc: 0.3984 (0.3997) weight_decay: 0.0500 (0.0500) grad_norm: 1.6100 (2.0984) time: 0.6861 data: 0.0291 max mem: 2905
Epoch: [26] Total time: 0:19:50 (1.9047 s / it)
Averaged stats: lr: 0.003994 min_lr: 0.003994 loss: 3.5775 (3.6107) class_acc: 0.3984 (0.3985) weight_decay: 0.0500 (0.0500) grad_norm: 1.6100 (2.0984)
Test: [ 0/50] eta: 0:10:45 loss: 2.3278 (2.3278) acc1: 48.8000 (48.8000) acc5: 74.4000 (74.4000) time: 12.9197 data: 12.8940 max mem: 2905
Test: [10/50] eta: 0:01:25 loss: 2.3480 (2.3127) acc1: 48.8000 (49.0182) acc5: 74.4000 (74.4000) time: 2.1292 data: 2.1102 max mem: 2905
Test: [20/50] eta: 0:00:49 loss: 2.4423 (2.4436) acc1: 46.4000 (46.5524) acc5: 71.2000 (72.4952) time: 1.0935 data: 1.0744 max mem: 2905
Test: [30/50] eta: 0:00:29 loss: 2.5074 (2.4321) acc1: 45.6000 (46.9936) acc5: 70.4000 (72.2839) time: 1.1002 data: 1.0800 max mem: 2905
Test: [40/50] eta: 0:00:12 loss: 2.4564 (2.4516) acc1: 45.6000 (46.9268) acc5: 68.8000 (71.8439) time: 0.7694 data: 0.7475 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 2.5118 (2.4673) acc1: 44.8000 (46.4320) acc5: 68.0000 (71.2640) time: 0.7267 data: 0.7061 max mem: 2905
Test: Total time: 0:00:50 (1.0174 s / it)
* Acc@1 47.270 Acc@5 72.132 loss 2.443
Accuracy of the model on the 50000 test images: 47.3%
Max accuracy: 48.89%
Epoch: [27] [ 0/625] eta: 3:49:47 lr: 0.003994 min_lr: 0.003994 loss: 3.5339 (3.5339) class_acc: 0.4141 (0.4141) weight_decay: 0.0500 (0.0500) time: 22.0595 data: 15.4624 max mem: 2905
Epoch: [27] [200/625] eta: 0:14:15 lr: 0.003993 min_lr: 0.003993 loss: 3.5980 (3.5782) class_acc: 0.3984 (0.4067) weight_decay: 0.0500 (0.0500) grad_norm: 2.2039 (inf) time: 1.8993 data: 0.0070 max mem: 2905
Epoch: [27] [400/625] eta: 0:07:14 lr: 0.003993 min_lr: 0.003993 loss: 3.6139 (3.5753) class_acc: 0.3867 (0.4067) weight_decay: 0.0500 (0.0500) grad_norm: 1.8785 (inf) time: 1.8946 data: 0.0008 max mem: 2905
Epoch: [27] [600/625] eta: 0:00:47 lr: 0.003992 min_lr: 0.003992 loss: 3.6045 (3.5791) class_acc: 0.3984 (0.4064) weight_decay: 0.0500 (0.0500) grad_norm: 1.4802 (inf) time: 2.0196 data: 0.0316 max mem: 2905
Epoch: [27] [624/625] eta: 0:00:01 lr: 0.003992 min_lr: 0.003992 loss: 3.5842 (3.5794) class_acc: 0.3867 (0.4060) weight_decay: 0.0500 (0.0500) grad_norm: 1.4526 (inf) time: 0.3925 data: 0.0258 max mem: 2905
Epoch: [27] Total time: 0:19:41 (1.8902 s / it)
Averaged stats: lr: 0.003992 min_lr: 0.003992 loss: 3.5842 (3.5927) class_acc: 0.3867 (0.4023) weight_decay: 0.0500 (0.0500) grad_norm: 1.4526 (inf)
Test: [ 0/50] eta: 0:10:33 loss: 2.0578 (2.0578) acc1: 54.4000 (54.4000) acc5: 79.2000 (79.2000) time: 12.6712 data: 12.6441 max mem: 2905
Test: [10/50] eta: 0:01:26 loss: 2.1337 (2.1419) acc1: 54.4000 (53.8182) acc5: 76.0000 (76.3636) time: 2.1702 data: 2.1514 max mem: 2905
Test: [20/50] eta: 0:00:52 loss: 2.2473 (2.2373) acc1: 51.2000 (51.1238) acc5: 76.0000 (75.6191) time: 1.1922 data: 1.1741 max mem: 2905
Test: [30/50] eta: 0:00:30 loss: 2.2980 (2.2494) acc1: 48.0000 (50.6065) acc5: 74.4000 (75.2000) time: 1.1977 data: 1.1790 max mem: 2905
Test: [40/50] eta: 0:00:12 loss: 2.2357 (2.2689) acc1: 47.2000 (50.3415) acc5: 72.8000 (74.7707) time: 0.7709 data: 0.7522 max mem: 2905
Test: [49/50] eta: 0:00:01 loss: 2.2537 (2.2768) acc1: 51.2000 (50.1760) acc5: 73.6000 (74.5120) time: 0.7491 data: 0.7296 max mem: 2905
Test: Total time: 0:00:52 (1.0513 s / it)
* Acc@1 50.538 Acc@5 75.020 loss 2.257
Accuracy of the model on the 50000 test images: 50.5%
Max accuracy: 50.54%
Epoch: [28] [ 0/625] eta: 3:31:58 lr: 0.003992 min_lr: 0.003992 loss: 3.6734 (3.6734) class_acc: 0.3750 (0.3750) weight_decay: 0.0500 (0.0500) time: 20.3503 data: 20.0512 max mem: 2905
Epoch: [28] [200/625] eta: 0:14:28 lr: 0.003991 min_lr: 0.003991 loss: 3.6027 (3.5692) class_acc: 0.3906 (0.4049) weight_decay: 0.0500 (0.0500) grad_norm: 1.6694 (2.1667) time: 1.9741 data: 0.4945 max mem: 2905
Epoch: [28] [400/625] eta: 0:07:31 lr: 0.003991 min_lr: 0.003991 loss: 3.5839 (3.5724) class_acc: 0.4141 (0.4070) weight_decay: 0.0500 (0.0500) grad_norm: 1.7739 (2.0792) time: 2.1572 data: 0.0009 max mem: 2905
Epoch: [28] [600/625] eta: 0:00:49 lr: 0.003990 min_lr: 0.003990 loss: 3.5119 (3.5714) class_acc: 0.4219 (0.4068) weight_decay: 0.0500 (0.0500) grad_norm: 1.7313 (2.2045) time: 1.8744 data: 0.0008 max mem: 2905
Epoch: [28] [624/625] eta: 0:00:01 lr: 0.003990 min_lr: 0.003990 loss: 3.5770 (3.5711) class_acc: 0.4102 (0.4070) weight_decay: 0.0500 (0.0500) grad_norm: 1.5188 (2.1853) time: 0.8201 data: 0.0022 max mem: 2905
Epoch: [28] Total time: 0:20:14 (1.9439 s / it)
Averaged stats: lr: 0.003990 min_lr: 0.003990 loss: 3.5770 (3.5763) class_acc: 0.4102 (0.4058) weight_decay: 0.0500 (0.0500) grad_norm: 1.5188 (2.1853)
Test: [ 0/50] eta: 0:10:08 loss: 2.6075 (2.6075) acc1: 41.6000 (41.6000) acc5: 72.0000 (72.0000) time: 12.1648 data: 12.1361 max mem: 2905
Test: [10/50] eta: 0:01:18 loss: 2.3099 (2.2855) acc1: 52.0000 (51.3455) acc5: 73.6000 (74.2545) time: 1.9731 data: 1.9537 max mem: 2905
Test: [20/50] eta: 0:00:45 loss: 2.4136 (2.4097) acc1: 51.2000 (48.9905) acc5: 72.0000 (72.5714) time: 0.9957 data: 0.9773 max mem: 2905
Test: [30/50] eta: 0:00:26 loss: 2.4623 (2.4255) acc1: 45.6000 (47.8968) acc5: 71.2000 (72.2323) time: 0.9975 data: 0.9782 max mem: 2905
Test: [40/50] eta: 0:00:11 loss: 2.4623 (2.4689) acc1: 44.0000 (46.8683) acc5: 69.6000 (71.4537) time: 0.7210 data: 0.7008 max mem: 2905
Test: [49/50] eta: 0:00:00 loss: 2.5017 (2.4676) acc1: 44.0000 (46.8640) acc5: 72.0000 (71.5840) time: 0.6412 data: 0.6217 max mem: 2905
Test: Total time: 0:00:48 (0.9798 s / it)
* Acc@1 47.210 Acc@5 72.040 loss 2.436
Accuracy of the model on the 50000 test images: 47.2%
Max accuracy: 50.54%
Epoch: [29] [ 0/625] eta: 3:32:55 lr: 0.003990 min_lr: 0.003990 loss: 3.7204 (3.7204) class_acc: 0.3633 (0.3633) weight_decay: 0.0500 (0.0500) time: 20.4412 data: 17.6682 max mem: 2905
Epoch: [29] [200/625] eta: 0:13:53 lr: 0.003989 min_lr: 0.003989 loss: 3.6205 (3.5465) class_acc: 0.4062 (0.4141) weight_decay: 0.0500 (0.0500) grad_norm: 1.8717 (2.3388) time: 1.8569 data: 0.0008 max mem: 2905
Epoch: [29] [400/625] eta: 0:07:16 lr: 0.003988 min_lr: 0.003988 loss: 3.6023 (3.5596) class_acc: 0.4102 (0.4110) weight_decay: 0.0500 (0.0500) grad_norm: 1.2989 (2.1954) time: 1.9144 data: 0.0195 max mem: 2905
Epoch: [29] [600/625] eta: 0:00:49 lr: 0.003988 min_lr: 0.003988 loss: 3.5655 (3.5613) class_acc: 0.3945 (0.4096) weight_decay: 0.0500 (0.0500) grad_norm: 1.4800 (2.1547) time: 2.0340 data: 0.0050 max mem: 2905
Epoch: [29] [624/625] eta: 0:00:01 lr: 0.003987 min_lr: 0.003987 loss: 3.5300 (3.5614) class_acc: 0.4102 (0.4095) weight_decay: 0.0500 (0.0500) grad_norm: 1.5709 (2.1348) time: 0.4527 data: 0.0014 max mem: 2905