The following papers are implemented using PyTorch.
pip install -r requirements.txt
python train.py --config configs/cifar/resnet_preact.yaml
Model | Test Error (median of 3 runs) | Test Error (in paper) | Training Time |
---|---|---|---|
VGG-like (depth 15, w/ BN, channel 64) | 7.29 | N/A | 1h20m |
ResNet-110 | 6.52 | 6.43 (best), 6.61 +/- 0.16 | 3h06m |
ResNet-preact-110 | 6.47 | 6.37 (median of 5 runs) | 3h05m |
ResNet-preact-164 bottleneck | 5.90 | 5.46 (median of 5 runs) | 4h01m |
ResNet-preact-1001 bottleneck | | 4.62 (median of 5 runs), 4.69 +/- 0.20 | |
WRN-28-10 | 4.03 | 4.00 (median of 5 runs) | 16h10m |
WRN-28-10 w/ dropout | | 3.89 (median of 5 runs) | |
DenseNet-100 (k=12) | 3.87 (1 run) | 4.10 (1 run) | 24h28m* |
DenseNet-100 (k=24) | | 3.74 (1 run) | |
DenseNet-BC-100 (k=12) | 4.69 | 4.51 (1 run) | 15h20m |
DenseNet-BC-250 (k=24) | | 3.62 (1 run) | |
DenseNet-BC-190 (k=40) | | 3.46 (1 run) | |
PyramidNet-110 (alpha=84) | 4.40 | 4.26 +/- 0.23 | 11h40m |
PyramidNet-110 (alpha=270) | 3.92 (1 run) | 3.73 +/- 0.04 | 24h12m* |
PyramidNet-164 bottleneck (alpha=270) | 3.44 (1 run) | 3.48 +/- 0.20 | 32h37m* |
PyramidNet-272 bottleneck (alpha=200) | | 3.31 +/- 0.08 | |
ResNeXt-29 4x64d | 3.89 | ~3.75 (from Figure 7) | 31h17m |
ResNeXt-29 8x64d | 3.97 (1 run) | 3.65 (average of 10 runs) | 42h50m* |
ResNeXt-29 16x64d | | 3.58 (average of 10 runs) | |
shake-shake-26 2x32d (S-S-I) | 3.68 | 3.55 (average of 3 runs) | 33h49m |
shake-shake-26 2x64d (S-S-I) | 2.88 (1 run) | 2.98 (average of 3 runs) | 78h48m |
shake-shake-26 2x96d (S-S-I) | 2.90 (1 run) | 2.86 (average of 5 runs) | 101h32m* |
python train.py --config configs/cifar/vgg.yaml
python train.py --config configs/cifar/resnet.yaml
python train.py --config configs/cifar/resnet_preact.yaml \
train.output_dir experiments/resnet_preact_basic_110/exp00
python train.py --config configs/cifar/resnet_preact.yaml \
model.resnet_preact.depth 164 \
model.resnet_preact.block_type bottleneck \
train.output_dir experiments/resnet_preact_bottleneck_164/exp00
python train.py --config configs/cifar/wrn.yaml
python train.py --config configs/cifar/densenet.yaml
python train.py --config configs/cifar/pyramidnet.yaml \
model.pyramidnet.depth 110 \
model.pyramidnet.block_type basic \
model.pyramidnet.alpha 84 \
train.output_dir experiments/pyramidnet_basic_110_84/exp00
python train.py --config configs/cifar/pyramidnet.yaml \
model.pyramidnet.depth 110 \
model.pyramidnet.block_type basic \
model.pyramidnet.alpha 270 \
train.output_dir experiments/pyramidnet_basic_110_270/exp00
python train.py --config configs/cifar/resnext.yaml \
model.resnext.cardinality 4 \
train.batch_size 32 \
train.base_lr 0.025 \
train.output_dir experiments/resnext_29_4x64d/exp00
python train.py --config configs/cifar/resnext.yaml \
train.batch_size 64 \
train.base_lr 0.05 \
train.output_dir experiments/resnext_29_8x64d/exp00
python train.py --config configs/cifar/shake_shake.yaml \
model.shake_shake.initial_channels 32 \
train.output_dir experiments/shake_shake_26_2x32d_SSI/exp00
python train.py --config configs/cifar/shake_shake.yaml \
model.shake_shake.initial_channels 64 \
train.batch_size 64 \
train.base_lr 0.1 \
train.output_dir experiments/shake_shake_26_2x64d_SSI/exp00
python train.py --config configs/cifar/shake_shake.yaml \
model.shake_shake.initial_channels 96 \
train.batch_size 64 \
train.base_lr 0.1 \
train.output_dir experiments/shake_shake_26_2x96d_SSI/exp00
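
The commands above pass configuration overrides as alternating dotted keys and values after `--config`. The sketch below illustrates how such pairs could be folded into a nested configuration. It is not the parser used by `train.py`: `apply_overrides` is a hypothetical helper, the config is a plain dict, and the starting values (`depth` 110, `block_type` basic) are assumed for illustration.

```python
import ast


def apply_overrides(config, overrides):
    """Apply alternating dotted-key/value pairs to a nested dict (illustrative only)."""
    if len(overrides) % 2 != 0:
        raise ValueError("overrides must come in key/value pairs")
    for key, raw in zip(overrides[0::2], overrides[1::2]):
        node = config
        *parents, leaf = key.split(".")
        for name in parents:
            node = node.setdefault(name, {})
        try:
            # "164" -> 164, "0.025" -> 0.025, "True" -> True, "[True, False]" -> list
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a Python literal (paths, names) stays a string.
            value = raw
        node[leaf] = value
    return config


# Assumed starting values, standing in for the contents of a YAML config file.
config = {"model": {"resnet_preact": {"depth": 110, "block_type": "basic"}}, "train": {}}
apply_overrides(config, [
    "model.resnet_preact.depth", "164",
    "model.resnet_preact.block_type", "bottleneck",
    "train.output_dir", "experiments/resnet_preact_bottleneck_164/exp00",
])
print(config)
```

Keys that are not overridden keep whatever the config file specifies, which is why the shorter commands only list the settings that differ from the defaults.
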
Model | Test Error (1 run) | # of Epochs | Training Time |
---|---|---|---|
ResNet-preact-20, widening factor 4 | 4.91 | 200 | 1h26m |
ResNet-preact-20, widening factor 4 | 4.01 | 400 | 2h53m |
ResNet-preact-20, widening factor 4 | 3.99 | 1800 | 12h53m |
ResNet-preact-20, widening factor 4, Cutout 16 | 3.71 | 200 | 1h26m |
ResNet-preact-20, widening factor 4, Cutout 16 | 3.46 | 400 | 2h53m |
ResNet-preact-20, widening factor 4, Cutout 16 | 3.76 | 1800 | 12h53m |
ResNet-preact-20, widening factor 4, RICAP (beta=0.3) | 3.45 | 200 | 1h26m |
ResNet-preact-20, widening factor 4, RICAP (beta=0.3) | 3.11 | 400 | 2h53m |
ResNet-preact-20, widening factor 4, RICAP (beta=0.3) | 3.15 | 1800 | 12h53m |
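
For readers unfamiliar with Cutout, the idea can be sketched in a few lines: one square region of each training image is masked out at a random position. The sketch assumes that "Cutout 16" means a mask with a 16-pixel side, uses a single-channel image stored as a list of rows, and fills the mask with zeros. The function name and these choices are illustrative and may differ from the implementation in this repository.

```python
import random


def cutout(image, mask_size, rng=random):
    """Return a copy of a 2-D image (list of rows) with one square region zeroed."""
    height, width = len(image), len(image[0])
    # The mask centre is sampled anywhere in the image, so the square may be
    # clipped at the borders.
    cy, cx = rng.randrange(height), rng.randrange(width)
    top, left = cy - mask_size // 2, cx - mask_size // 2
    y0, y1 = max(0, top), min(height, top + mask_size)
    x0, x1 = max(0, left), min(width, left + mask_size)
    out = [row[:] for row in image]
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = 0.0
    return out


image = [[1.0] * 32 for _ in range(32)]
masked = cutout(image, mask_size=16, rng=random.Random(0))
zeroed = sum(row.count(0.0) for row in masked)
print(f"{zeroed} of {32 * 32} pixels masked")
```
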
Model | Test Error (1 run) | # of Epochs | Training Time |
---|---|---|---|
WRN-28-10, Cutout 16 | 3.19 | 200 | 6h35m |
WRN-28-10, mixup (alpha=1) | 3.32 | 200 | 6h35m |
WRN-28-10, RICAP (beta=0.3) | 2.83 | 200 | 6h35m |
WRN-28-10, Dual-Cutout (alpha=0.1) | 2.87 | 200 | 12h42m |
WRN-28-10, Cutout 16 | 3.07 | 400 | 13h10m |
WRN-28-10, mixup (alpha=1) | 3.04 | 400 | 13h08m |
WRN-28-10, RICAP (beta=0.3) | 2.71 | 400 | 13h08m |
WRN-28-10, Dual-Cutout (alpha=0.1) | 2.76 | 400 | 25h20m |
shake-shake-26 2x64d, Cutout 16 | 2.64 | 1800 | 78h55m* |
shake-shake-26 2x64d, mixup (alpha=1) | 2.63 | 1800 | 35h56m |
shake-shake-26 2x64d, RICAP (beta=0.3) | 2.29 | 1800 | 35h10m |
shake-shake-26 2x64d, Dual-Cutout (alpha=0.1) | 2.64 | 1800 | 68h34m |
shake-shake-26 2x96d, Cutout 16 | 2.50 | 1800 | 60h20m |
shake-shake-26 2x96d, mixup (alpha=1) | 2.36 | 1800 | 60h20m |
shake-shake-26 2x96d, RICAP (beta=0.3) | 2.10 | 1800 | 60h20m |
shake-shake-26 2x96d, Dual-Cutout (alpha=0.1) | 2.41 | 1800 | 113h09m |
shake-shake-26 2x128d, Cutout 16 | 2.58 | 1800 | 85h04m |
shake-shake-26 2x128d, RICAP (beta=0.3) | 1.97 | 1800 | 85h06m |
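
The mixup rows in the table use `alpha=1`. As a rough sketch of what that parameter controls: mixup blends two training examples and their labels with a weight drawn from a Beta(alpha, alpha) distribution, which is uniform on [0, 1] when alpha is 1. The function below is illustrative (flat lists instead of tensors, one pair instead of a batch) and is not taken from this repository.

```python
import random


def mixup(x1, y1, x2, y2, alpha=1.0, rng=random):
    """Blend two (input, one-hot label) pairs with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


rng = random.Random(0)
x, y, lam = mixup([0.0, 1.0, 2.0], [1.0, 0.0], [2.0, 1.0, 0.0], [0.0, 1.0], alpha=1.0, rng=rng)
print(lam, x, y)
```

The mixed label `y` is a soft target, so the loss has to accept class probabilities rather than a single class index.
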
python train.py --config configs/cifar/wrn.yaml \
train.batch_size 64 \
train.output_dir experiments/wrn_28_10_cutout16 \
scheduler.type cosine \
augmentation.use_cutout True
python train.py --config configs/cifar/shake_shake.yaml \
model.shake_shake.initial_channels 64 \
train.batch_size 64 \
train.base_lr 0.1 \
scheduler.epochs 300 \
train.output_dir experiments/shake_shake_26_2x64d_SSI_cutout16/exp00 \
augmentation.use_cutout True
Model | batch size | #GPUs | Test Error (1 run) | # of Epochs | Training Time* |
---|---|---|---|---|---|
WRN-28-10, RICAP (beta=0.3) | 512 | 1 | 2.63 | 200 | 3h41m |
WRN-28-10, RICAP (beta=0.3) | 256 | 2 | 2.71 | 200 | 2h14m |
WRN-28-10, RICAP (beta=0.3) | 128 | 4 | 2.89 | 200 | 1h01m |
WRN-28-10, RICAP (beta=0.3) | 64 | 8 | 2.75 | 200 | 34m |
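
The table can be read as a scaling experiment. The short script below computes the total batch size and the speedup relative to the single-GPU run from the values listed above. It assumes that the "batch size" column is the per-process value, which would make the total 512 in every row; `to_minutes` is a helper written for this example.

```python
import re


def to_minutes(text):
    """Convert strings such as '3h41m' or '34m' to minutes."""
    match = re.fullmatch(r"(?:(\d+)h)?(\d+)m", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = match.groups()
    return int(hours or 0) * 60 + int(minutes)


# (batch size, #GPUs, training time) copied from the table above.
runs = [(512, 1, "3h41m"), (256, 2, "2h14m"), (128, 4, "1h01m"), (64, 8, "34m")]
baseline = to_minutes(runs[0][2])
for batch_size, gpus, duration in runs:
    total_batch = batch_size * gpus
    speedup = baseline / to_minutes(duration)
    print(f"{gpus} GPU(s): total batch {total_batch}, speedup {speedup:.2f}x")
```
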
python train.py --config configs/cifar/wrn.yaml \
train.base_lr 0.2 \
train.batch_size 512 \
scheduler.epochs 200 \
scheduler.type cosine \
train.output_dir experiments/wrn_28_10_ricap_1gpu/exp00 \
augmentation.use_ricap True \
augmentation.use_random_crop False
python -m torch.distributed.launch --nproc_per_node 2 \
train.py --config configs/cifar/wrn.yaml \
train.distributed True \
train.base_lr 0.2 \
train.batch_size 256 \
scheduler.epochs 200 \
scheduler.type cosine \
train.output_dir experiments/wrn_28_10_ricap_2gpus/exp00 \
augmentation.use_ricap True \
augmentation.use_random_crop False
python -m torch.distributed.launch --nproc_per_node 4 \
train.py --config configs/cifar/wrn.yaml \
train.distributed True \
train.base_lr 0.2 \
train.batch_size 128 \
scheduler.epochs 200 \
scheduler.type cosine \
train.output_dir experiments/wrn_28_10_ricap_4gpus/exp00 \
augmentation.use_ricap True \
augmentation.use_random_crop False
python -m torch.distributed.launch --nproc_per_node 8 \
train.py --config configs/cifar/wrn.yaml \
train.distributed True \
train.base_lr 0.2 \
train.batch_size 64 \
scheduler.epochs 200 \
scheduler.type cosine \
train.output_dir experiments/wrn_28_10_ricap_8gpus/exp00 \
augmentation.use_ricap True \
augmentation.use_random_crop False
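
The commands above enable RICAP with `augmentation.use_ricap True`. As a hedged sketch of the core idea: RICAP splits the output image at a random boundary point, fills the four resulting regions with patches cropped from four different training images, and mixes the labels in proportion to the patch areas. The code below covers only the boundary sampling and the label weighting; the function names are illustrative and the actual implementation in this repository may differ.

```python
import random


def ricap_layout(width, height, beta=0.3, rng=random):
    """Sample the RICAP boundary and return the four patch sizes and label weights."""
    w = int(round(width * rng.betavariate(beta, beta)))
    h = int(round(height * rng.betavariate(beta, beta)))
    # Patch sizes in the order: top-left, top-right, bottom-left, bottom-right.
    sizes = [(w, h), (width - w, h), (w, height - h), (width - w, height - h)]
    weights = [pw * ph / (width * height) for pw, ph in sizes]
    return sizes, weights


def mix_labels(one_hot_labels, weights):
    """Combine four one-hot labels in proportion to the patch areas."""
    classes = len(one_hot_labels[0])
    return [sum(wt * lab[c] for wt, lab in zip(weights, one_hot_labels)) for c in range(classes)]


sizes, weights = ricap_layout(32, 32, beta=0.3, rng=random.Random(0))
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
target = mix_labels(labels, weights)
print(sizes, weights, target)
```

With a small `beta` such as 0.3, the Beta distribution concentrates near 0 and 1, so one patch often dominates the image.
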
Model | Test Error (1 run) | # of Epochs | Training Time |
---|---|---|---|
ResNet-preact-20, widening factor 4, Cutout 12 | 4.17 | 200 | 1h32m |
ResNet-preact-20, widening factor 4, Cutout 14 | 4.11 | 200 | 1h32m |
ResNet-preact-50, Cutout 12 | 4.45 | 200 | 57m |
ResNet-preact-50, Cutout 14 | 4.38 | 200 | 57m |
ResNet-preact-50, widening factor 4, Cutout 12 | 4.07 | 200 | 3h37m |
ResNet-preact-50, widening factor 4, Cutout 14 | 4.13 | 200 | 3h39m |
shake-shake-26 2x32d (S-S-I), Cutout 12 | 4.08 | 400 | 3h41m |
shake-shake-26 2x32d (S-S-I), Cutout 14 | 4.05 | 400 | 3h39m |
shake-shake-26 2x96d (S-S-I), Cutout 12 | 3.72 | 400 | 13h46m |
shake-shake-26 2x96d (S-S-I), Cutout 14 | 3.85 | 400 | 13h39m |
shake-shake-26 2x96d (S-S-I), Cutout 12 | 3.65 | 800 | 26h42m |
shake-shake-26 2x96d (S-S-I), Cutout 14 | 3.60 | 800 | 26h42m |
Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
---|---|---|---|
ResNet-preact-20 | 5.04 | 200 | 26m |
ResNet-preact-20, Cutout 6 | 4.84 | 200 | 26m |
ResNet-preact-20, Cutout 8 | 4.64 | 200 | 26m |
ResNet-preact-20, Cutout 10 | 4.74 | 200 | 26m |
ResNet-preact-20, Cutout 12 | 4.68 | 200 | 26m |
ResNet-preact-20, Cutout 14 | 4.64 | 200 | 26m |
ResNet-preact-20, Cutout 16 | 4.49 | 200 | 26m |
ResNet-preact-20, RandomErasing | 4.61 | 200 | 26m |
ResNet-preact-20, Mixup | 4.92 | 200 | 26m |
ResNet-preact-20, Mixup | 4.64 | 400 | 52m |
Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
---|---|---|---|
ResNet-preact-20 | 0.40 | 100 | 12m |
ResNet-preact-20, Cutout 6 | 0.32 | 100 | 12m |
ResNet-preact-20, Cutout 8 | 0.25 | 100 | 12m |
ResNet-preact-20, Cutout 10 | 0.27 | 100 | 12m |
ResNet-preact-20, Cutout 12 | 0.26 | 100 | 12m |
ResNet-preact-20, Cutout 14 | 0.26 | 100 | 12m |
ResNet-preact-20, Cutout 16 | 0.25 | 100 | 12m |
ResNet-preact-20, Mixup (alpha=1) | 0.40 | 100 | 12m |
ResNet-preact-20, Mixup (alpha=0.5) | 0.38 | 100 | 12m |
ResNet-preact-20, widening factor 4, Cutout 14 | 0.26 | 100 | 45m |
ResNet-preact-50, Cutout 14 | 0.29 | 100 | 28m |
ResNet-preact-50, widening factor 4, Cutout 14 | 0.25 | 100 | 1h50m |
shake-shake-26 2x96d (S-S-I), Cutout 14 | 0.24 | 100 | 3h22m |
Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
---|---|---|---|
ResNet-preact-20, Cutout 14 | 0.82 (best 0.67) | 200 | 24m |
ResNet-preact-20, widening factor 4, Cutout 14 | 0.72 (best 0.67) | 200 | 1h30m |
PyramidNet-110-270, Cutout 14 | 0.72 (best 0.70) | 200 | 10h05m |
shake-shake-26 2x96d (S-S-I), Cutout 14 | 0.66 (best 0.63) | 200 | 6h46m |
In this experiment, the effects of the following on classification accuracy are investigated:
ResNet-preact-56 is trained on CIFAR-10 with an initial learning rate of 0.2 in this experiment.
Model | Test Error (median of 5 runs) | Training Time |
---|---|---|
w/ 1st ReLU, w/o last BN, preactivate shortcut after downsampling | 6.45 | 95 min |
w/ 1st ReLU, w/o last BN | 6.47 | 95 min |
w/o 1st ReLU, w/o last BN | 6.14 | 89 min |
w/ 1st ReLU, w/ last BN | 6.43 | 104 min |
w/o 1st ReLU, w/ last BN | 5.85 | 98 min |
w/o 1st ReLU, w/ last BN, preactivate shortcut after downsampling | 6.27 | 98 min |
w/o 1st ReLU, w/ last BN, Cosine annealing | 5.72 | 98 min |
w/o 1st ReLU, w/ last BN, Cutout | 4.96 | 98 min |
w/o 1st ReLU, w/ last BN, RandomErasing | 5.22 | 98 min |
w/o 1st ReLU, w/ last BN, Mixup (300 epochs) | 5.11 | 191 min |
python train.py --config configs/cifar/resnet_preact.yaml \
train.base_lr 0.2 \
model.resnet_preact.depth 56 \
model.resnet_preact.preact_stage '[True, True, True]' \
model.resnet_preact.remove_first_relu False \
model.resnet_preact.add_last_bn False \
train.output_dir experiments/resnet_preact_after_downsampling/exp00
python train.py --config configs/cifar/resnet_preact.yaml \
train.base_lr 0.2 \
model.resnet_preact.depth 56 \
model.resnet_preact.preact_stage '[True, False, False]' \
model.resnet_preact.remove_first_relu False \
model.resnet_preact.add_last_bn False \
train.output_dir experiments/resnet_preact_w_relu_wo_bn/exp00
python train.py --config configs/cifar/resnet_preact.yaml \
train.base_lr 0.2 \
model.resnet_preact.depth 56 \
model.resnet_preact.preact_stage '[True, False, False]' \
model.resnet_preact.remove_first_relu True \
model.resnet_preact.add_last_bn False \
train.output_dir experiments/resnet_preact_wo_relu_wo_bn/exp00
python train.py --config configs/cifar/resnet_preact.yaml \
train.base_lr 0.2 \
model.resnet_preact.depth 56 \
model.resnet_preact.preact_stage '[True, False, False]' \
model.resnet_preact.remove_first_relu False \
model.resnet_preact.add_last_bn True \
train.output_dir experiments/resnet_preact_w_relu_w_bn/exp00
python train.py --config configs/cifar/resnet_preact.yaml \
train.base_lr 0.2 \
model.resnet_preact.depth 56 \
model.resnet_preact.preact_stage '[True, False, False]' \
model.resnet_preact.remove_first_relu True \
model.resnet_preact.add_last_bn True \
train.output_dir experiments/resnet_preact_wo_relu_w_bn/exp00
python train.py --config configs/cifar/resnet_preact.yaml \
train.base_lr 0.2 \
model.resnet_preact.depth 56 \
model.resnet_preact.preact_stage '[True, True, True]' \
model.resnet_preact.remove_first_relu True \
model.resnet_preact.add_last_bn True \
train.output_dir experiments/resnet_preact_after_downsampling_wo_relu_w_bn/exp00
python train.py --config configs/cifar/resnet_preact.yaml \
train.base_lr 0.2 \
model.resnet_preact.depth 56 \
model.resnet_preact.preact_stage '[True, False, False]' \
model.resnet_preact.remove_first_relu True \
model.resnet_preact.add_last_bn True \
scheduler.type cosine \
train.output_dir experiments/resnet_preact_wo_relu_w_bn_cosine/exp00
python train.py --config configs/cifar/resnet_preact.yaml \
train.base_lr 0.2 \
model.resnet_preact.depth 56 \
model.resnet_preact.preact_stage '[True, False, False]' \
model.resnet_preact.remove_first_relu True \
model.resnet_preact.add_last_bn True \
augmentation.use_cutout True \
train.output_dir experiments/resnet_preact_wo_relu_w_bn_cutout/exp00
python train.py --config configs/cifar/resnet_preact.yaml \
train.base_lr 0.2 \
model.resnet_preact.depth 56 \
model.resnet_preact.preact_stage '[True, False, False]' \
model.resnet_preact.remove_first_relu True \
model.resnet_preact.add_last_bn True \
augmentation.use_random_erasing True \
train.output_dir experiments/resnet_preact_wo_relu_w_bn_random_erasing/exp00
python train.py --config configs/cifar/resnet_preact.yaml \
train.base_lr 0.2 \
model.resnet_preact.depth 56 \
model.resnet_preact.preact_stage '[True, False, False]' \
model.resnet_preact.remove_first_relu True \
model.resnet_preact.add_last_bn True \
augmentation.use_mixup True \
train.output_dir experiments/resnet_preact_wo_relu_w_bn_mixup/exp00
Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
---|---|---|---|
ResNet-preact-20 | 7.60 | 200 | 24m |
ResNet-preact-20, label smoothing (epsilon=0.001) | 7.51 | 200 | 25m |
ResNet-preact-20, label smoothing (epsilon=0.01) | 7.21 | 200 | 25m |
ResNet-preact-20, label smoothing (epsilon=0.1) | 7.57 | 200 | 25m |
ResNet-preact-20, mixup (alpha=1) | 7.24 | 200 | 26m |
ResNet-preact-20, RICAP (beta=0.3), w/ random crop | 6.88 | 200 | 28m |
ResNet-preact-20, RICAP (beta=0.3) | 6.77 | 200 | 28m |
ResNet-preact-20, Dual-Cutout 16 (alpha=0.1) | 6.24 | 200 | 45m |
ResNet-preact-20 | 7.05 | 400 | 49m |
ResNet-preact-20, label smoothing (epsilon=0.001) | 7.20 | 400 | 49m |
ResNet-preact-20, label smoothing (epsilon=0.01) | 6.97 | 400 | 49m |
ResNet-preact-20, label smoothing (epsilon=0.1) | 7.16 | 400 | 49m |
ResNet-preact-20, mixup (alpha=1) | 6.66 | 400 | 51m |
ResNet-preact-20, RICAP (beta=0.3), w/ random crop | 6.30 | 400 | 56m |
ResNet-preact-20, RICAP (beta=0.3) | 6.19 | 400 | 56m |
ResNet-preact-20, Dual-Cutout 16 (alpha=0.1) | 5.55 | 400 | 1h36m |
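
The label smoothing rows vary `epsilon`. A minimal sketch of what that parameter does, assuming the common convention in which the one-hot target is mixed with a uniform distribution over all classes (some implementations spread `epsilon` over the other classes only). The choice of 10 classes is for illustration.

```python
import math


def smooth_targets(label, num_classes, epsilon):
    """One-hot target mixed with a uniform distribution over all classes."""
    uniform = epsilon / num_classes
    return [(1.0 - epsilon) + uniform if k == label else uniform for k in range(num_classes)]


def cross_entropy(log_probs, targets):
    """Cross entropy between a soft target and predicted log-probabilities."""
    return -sum(t * lp for t, lp in zip(targets, log_probs))


targets = smooth_targets(label=3, num_classes=10, epsilon=0.1)
uniform_log_probs = [math.log(0.1)] * 10
loss = cross_entropy(uniform_log_probs, targets)
print(targets, loss)
```
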
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 4096 | 3.2 | cosine | 200 | 10.57 | 22m |
ResNet-preact-20 | 2048 | 1.6 | cosine | 200 | 8.87 | 21m |
ResNet-preact-20 | 1024 | 0.8 | cosine | 200 | 8.40 | 21m |
ResNet-preact-20 | 512 | 0.4 | cosine | 200 | 8.22 | 20m |
ResNet-preact-20 | 256 | 0.2 | cosine | 200 | 8.61 | 22m |
ResNet-preact-20 | 128 | 0.1 | cosine | 200 | 8.09 | 24m |
ResNet-preact-20 | 64 | 0.05 | cosine | 200 | 8.22 | 28m |
ResNet-preact-20 | 32 | 0.025 | cosine | 200 | 8.00 | 43m |
ResNet-preact-20 | 16 | 0.0125 | cosine | 200 | 7.75 | 1h17m |
ResNet-preact-20 | 8 | 0.006125 | cosine | 200 | 7.70 | 2h32m |
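
The initial learning rates in this table appear to follow a linear scaling rule anchored at 0.1 for batch size 128. The sketch below reproduces the listed values under that assumption. Note that the rule gives 0.00625 for batch size 8, whereas the table lists 0.006125.

```python
def scaled_lr(batch_size, base_lr=0.1, base_batch_size=128):
    """Scale the learning rate in proportion to the batch size."""
    return base_lr * batch_size / base_batch_size


# (batch size, initial lr) pairs as printed in the table above.
table = [(4096, 3.2), (2048, 1.6), (1024, 0.8), (512, 0.4), (256, 0.2),
         (128, 0.1), (64, 0.05), (32, 0.025), (16, 0.0125), (8, 0.006125)]
for batch_size, listed in table:
    print(batch_size, listed, scaled_lr(batch_size))
```
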
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 4096 | 3.2 | multistep | 200 | 28.97 | 22m |
ResNet-preact-20 | 2048 | 1.6 | multistep | 200 | 9.07 | 21m |
ResNet-preact-20 | 1024 | 0.8 | multistep | 200 | 8.62 | 21m |
ResNet-preact-20 | 512 | 0.4 | multistep | 200 | 8.23 | 20m |
ResNet-preact-20 | 256 | 0.2 | multistep | 200 | 8.40 | 21m |
ResNet-preact-20 | 128 | 0.1 | multistep | 200 | 8.28 | 24m |
ResNet-preact-20 | 64 | 0.05 | multistep | 200 | 8.13 | 28m |
ResNet-preact-20 | 32 | 0.025 | multistep | 200 | 7.58 | 43m |
ResNet-preact-20 | 16 | 0.0125 | multistep | 200 | 7.93 | 1h18m |
ResNet-preact-20 | 8 | 0.006125 | multistep | 200 | 8.31 | 2h34m |
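
The two tables above differ only in the learning rate schedule. As a sketch of the difference: cosine annealing decays the rate smoothly towards zero, while a multistep schedule keeps it constant and drops it by a fixed factor at chosen epochs. The milestones and `gamma` below are hypothetical, since the values used for these experiments are not stated here.

```python
import math


def cosine_lr(base_lr, epoch, total_epochs, min_lr=0.0):
    """Cosine annealing from base_lr down to min_lr over total_epochs."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def multistep_lr(base_lr, epoch, milestones, gamma=0.1):
    """Multiply the rate by gamma for each milestone epoch already reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


total = 200
milestones = [100, 150]  # hypothetical; the schedule behind the table is not stated
for epoch in (0, 50, 100, 150, 199):
    print(epoch, round(cosine_lr(0.1, epoch, total), 5),
          round(multistep_lr(0.1, epoch, milestones), 5))
```
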
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 4096 | 3.2 | cosine | 400 | 8.97 | 44m |
ResNet-preact-20 | 2048 | 1.6 | cosine | 400 | 7.85 | 43m |
ResNet-preact-20 | 1024 | 0.8 | cosine | 400 | 7.20 | 42m |
ResNet-preact-20 | 512 | 0.4 | cosine | 400 | 7.83 | 40m |
ResNet-preact-20 | 256 | 0.2 | cosine | 400 | 7.65 | 42m |
ResNet-preact-20 | 128 | 0.1 | cosine | 400 | 7.09 | 47m |
ResNet-preact-20 | 64 | 0.05 | cosine | 400 | 7.17 | 44m |
ResNet-preact-20 | 32 | 0.025 | cosine | 400 | 7.24 | 2h11m |
ResNet-preact-20 | 16 | 0.0125 | cosine | 400 | 7.26 | 4h10m |
ResNet-preact-20 | 8 | 0.006125 | cosine | 400 | 7.02 | 7h53m |
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 4096 | 3.2 | cosine | 800 | 8.14 | 1h29m |
ResNet-preact-20 | 2048 | 1.6 | cosine | 800 | 7.74 | 1h23m |
ResNet-preact-20 | 1024 | 0.8 | cosine | 800 | 7.15 | 1h31m |
ResNet-preact-20 | 512 | 0.4 | cosine | 800 | 7.27 | 1h25m |
ResNet-preact-20 | 256 | 0.2 | cosine | 800 | 7.22 | 1h26m |
ResNet-preact-20 | 128 | 0.1 | cosine | 800 | 6.68 | 1h35m |
ResNet-preact-20 | 64 | 0.05 | cosine | 800 | 7.18 | 2h20m |
ResNet-preact-20 | 32 | 0.025 | cosine | 800 | 7.03 | 4h16m |
ResNet-preact-20 | 16 | 0.0125 | cosine | 800 | 6.78 | 8h37m |
ResNet-preact-20 | 8 | 0.006125 | cosine | 800 | 6.89 | 16h47m |
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 4096 | 3.2 | cosine | 200 | 10.57 | 22m |
ResNet-preact-20 | 4096 | 1.6 | cosine | 200 | 10.32 | 22m |
ResNet-preact-20 | 4096 | 0.8 | cosine | 200 | 10.71 | 22m |
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 2048 | 3.2 | cosine | 200 | 11.34 | 21m |
ResNet-preact-20 | 2048 | 2.4 | cosine | 200 | 8.69 | 21m |
ResNet-preact-20 | 2048 | 2.0 | cosine | 200 | 8.81 | 21m |
ResNet-preact-20 | 2048 | 1.6 | cosine | 200 | 8.73 | 22m |
ResNet-preact-20 | 2048 | 0.8 | cosine | 200 | 9.62 | 21m |
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 1024 | 3.2 | cosine | 200 | 9.12 | 21m |
ResNet-preact-20 | 1024 | 2.4 | cosine | 200 | 8.42 | 22m |
ResNet-preact-20 | 1024 | 2.0 | cosine | 200 | 8.38 | 22m |
ResNet-preact-20 | 1024 | 1.6 | cosine | 200 | 8.07 | 22m |
ResNet-preact-20 | 1024 | 1.2 | cosine | 200 | 8.25 | 21m |
ResNet-preact-20 | 1024 | 0.8 | cosine | 200 | 8.08 | 22m |
ResNet-preact-20 | 1024 | 0.4 | cosine | 200 | 8.49 | 22m |
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 512 | 3.2 | cosine | 200 | 8.51 | 21m |
ResNet-preact-20 | 512 | 1.6 | cosine | 200 | 7.73 | 20m |
ResNet-preact-20 | 512 | 0.8 | cosine | 200 | 7.73 | 21m |
ResNet-preact-20 | 512 | 0.4 | cosine | 200 | 8.22 | 20m |
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 256 | 3.2 | cosine | 200 | 9.64 | 22m |
ResNet-preact-20 | 256 | 1.6 | cosine | 200 | 8.32 | 22m |
ResNet-preact-20 | 256 | 0.8 | cosine | 200 | 7.45 | 21m |
ResNet-preact-20 | 256 | 0.4 | cosine | 200 | 7.68 | 22m |
ResNet-preact-20 | 256 | 0.2 | cosine | 200 | 8.61 | 22m |
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 128 | 1.6 | cosine | 200 | 9.03 | 24m |
ResNet-preact-20 | 128 | 0.8 | cosine | 200 | 7.54 | 24m |
ResNet-preact-20 | 128 | 0.4 | cosine | 200 | 7.28 | 24m |
ResNet-preact-20 | 128 | 0.2 | cosine | 200 | 7.96 | 24m |
ResNet-preact-20 | 128 | 0.1 | cosine | 200 | 8.09 | 24m |
ResNet-preact-20 | 128 | 0.05 | cosine | 200 | 8.81 | 24m |
ResNet-preact-20 | 128 | 0.025 | cosine | 200 | 10.07 | 24m |
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 64 | 0.4 | cosine | 200 | 7.42 | 35m |
ResNet-preact-20 | 64 | 0.2 | cosine | 200 | 7.52 | 36m |
ResNet-preact-20 | 64 | 0.1 | cosine | 200 | 7.78 | 37m |
ResNet-preact-20 | 64 | 0.05 | cosine | 200 | 8.22 | 28m |
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 32 | 0.2 | cosine | 200 | 7.64 | 1h05m |
ResNet-preact-20 | 32 | 0.1 | cosine | 200 | 7.25 | 1h08m |
ResNet-preact-20 | 32 | 0.05 | cosine | 200 | 7.45 | 1h07m |
ResNet-preact-20 | 32 | 0.025 | cosine | 200 | 8.00 | 43m |
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 4096 | 1.6 | cosine | 200 | 10.32 | 22m |
ResNet-preact-20 | 2048 | 1.6 | cosine | 200 | 8.73 | 22m |
ResNet-preact-20 | 1024 | 1.6 | cosine | 200 | 8.07 | 22m |
ResNet-preact-20 | 1024 | 0.8 | cosine | 200 | 8.08 | 22m |
ResNet-preact-20 | 512 | 1.6 | cosine | 200 | 7.73 | 20m |
ResNet-preact-20 | 512 | 0.8 | cosine | 200 | 7.73 | 21m |
ResNet-preact-20 | 256 | 0.8 | cosine | 200 | 7.45 | 21m |
ResNet-preact-20 | 128 | 0.4 | cosine | 200 | 7.28 | 24m |
ResNet-preact-20 | 128 | 0.2 | cosine | 200 | 7.96 | 24m |
ResNet-preact-20 | 128 | 0.1 | cosine | 200 | 8.09 | 24m |
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 4096 | 1.6 | cosine | 800 | 8.36 | 1h33m |
ResNet-preact-20 | 2048 | 1.6 | cosine | 800 | 7.53 | 1h27m |
ResNet-preact-20 | 1024 | 1.6 | cosine | 800 | 7.30 | 1h30m |
ResNet-preact-20 | 1024 | 0.8 | cosine | 800 | 7.42 | 1h30m |
ResNet-preact-20 | 512 | 1.6 | cosine | 800 | 6.69 | 1h26m |
ResNet-preact-20 | 512 | 0.8 | cosine | 800 | 6.77 | 1h26m |
ResNet-preact-20 | 256 | 0.8 | cosine | 800 | 6.84 | 1h28m |
ResNet-preact-20 | 128 | 0.4 | cosine | 800 | 6.86 | 1h35m |
ResNet-preact-20 | 128 | 0.2 | cosine | 800 | 7.05 | 1h38m |
ResNet-preact-20 | 128 | 0.1 | cosine | 800 | 6.68 | 1h35m |
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 4096 | 1.6 | cosine | 1600 | 8.25 | 3h10m |
ResNet-preact-20 | 2048 | 1.6 | cosine | 1600 | 7.34 | 2h50m |
ResNet-preact-20 | 1024 | 1.6 | cosine | 1600 | 6.94 | 2h52m |
ResNet-preact-20 | 512 | 1.6 | cosine | 1600 | 6.99 | 2h44m |
ResNet-preact-20 | 256 | 0.8 | cosine | 1600 | 6.95 | 2h50m |
ResNet-preact-20 | 128 | 0.4 | cosine | 1600 | 6.64 | 3h09m |
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 4096 | 1.6 | cosine | 3200 | 9.52 | 6h15m |
ResNet-preact-20 | 2048 | 1.6 | cosine | 3200 | 6.92 | 5h42m |
ResNet-preact-20 | 1024 | 1.6 | cosine | 3200 | 6.96 | 5h43m |
Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|
ResNet-preact-20 | 2048 | 1.6 | cosine | 6400 | 7.45 | 11h44m |
python train.py --config configs/cifar/resnet_preact.yaml \
model.resnet_preact.depth 20 \
train.optimizer lars \
train.base_lr 0.02 \
train.batch_size 4096 \
scheduler.type cosine \
train.output_dir experiments/resnet_preact_lars/exp00
Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
---|---|---|---|---|---|---|---|
ResNet-preact-20 | SGD | 4096 | 3.2 | cosine | 200 | 10.57 (1 run) | 22m |
ResNet-preact-20 | SGD | 4096 | 1.6 | cosine | 200 | 10.20 | 22m |
ResNet-preact-20 | SGD | 4096 | 0.8 | cosine | 200 | 10.71 (1 run) | 22m |
ResNet-preact-20 | LARS | 4096 | 0.04 | cosine | 200 | 9.58 | 22m |
ResNet-preact-20 | LARS | 4096 | 0.03 | cosine | 200 | 8.46 | 22m |
ResNet-preact-20 | LARS | 4096 | 0.02 | cosine | 200 | 8.21 | 22m |
ResNet-preact-20 | LARS | 4096 | 0.015 | cosine | 200 | 8.47 | 22m |
ResNet-preact-20 | LARS | 4096 | 0.01 | cosine | 200 | 9.33 | 22m |
ResNet-preact-20 | LARS | 4096 | 0.005 | cosine | 200 | 14.31 | 22m |
Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
---|---|---|---|---|---|---|---|
ResNet-preact-20 | SGD | 2048 | 3.2 | cosine | 200 | 11.34 (1 run) | 21m |
ResNet-preact-20 | SGD | 2048 | 2.4 | cosine | 200 | 8.69 (1 run) | 21m |
ResNet-preact-20 | SGD | 2048 | 2.0 | cosine | 200 | 8.81 (1 run) | 21m |
ResNet-preact-20 | SGD | 2048 | 1.6 | cosine | 200 | 8.73 (1 run) | 22m |
ResNet-preact-20 | SGD | 2048 | 0.8 | cosine | 200 | 9.62 (1 run) | 21m |
ResNet-preact-20 | LARS | 2048 | 0.04 | cosine | 200 | 11.58 | 21m |
ResNet-preact-20 | LARS | 2048 | 0.02 | cosine | 200 | 8.05 | 22m |
ResNet-preact-20 | LARS | 2048 | 0.01 | cosine | 200 | 8.07 | 22m |
ResNet-preact-20 | LARS | 2048 | 0.005 | cosine | 200 | 9.65 | 22m |
Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
---|---|---|---|---|---|---|---|
ResNet-preact-20 | SGD | 1024 | 3.2 | cosine | 200 | 9.12 (1 run) | 21m |
ResNet-preact-20 | SGD | 1024 | 2.4 | cosine | 200 | 8.42 (1 run) | 22m |
ResNet-preact-20 | SGD | 1024 | 2.0 | cosine | 200 | 8.38 (1 run) | 22m |
ResNet-preact-20 | SGD | 1024 | 1.6 | cosine | 200 | 8.07 (1 run) | 22m |
ResNet-preact-20 | SGD | 1024 | 1.2 | cosine | 200 | 8.25 (1 run) | 21m |
ResNet-preact-20 | SGD | 1024 | 0.8 | cosine | 200 | 8.08 (1 run) | 22m |
ResNet-preact-20 | SGD | 1024 | 0.4 | cosine | 200 | 8.49 (1 run) | 22m |
ResNet-preact-20 | LARS | 1024 | 0.02 | cosine | 200 | 9.30 | 22m |
ResNet-preact-20 | LARS | 1024 | 0.01 | cosine | 200 | 7.68 | 22m |
ResNet-preact-20 | LARS | 1024 | 0.005 | cosine | 200 | 8.88 | 23m |
Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
---|---|---|---|---|---|---|---|
ResNet-preact-20 | SGD | 512 | 3.2 | cosine | 200 | 8.51 (1 run) | 21m |
ResNet-preact-20 | SGD | 512 | 1.6 | cosine | 200 | 7.73 (1 run) | 20m |
ResNet-preact-20 | SGD | 512 | 0.8 | cosine | 200 | 7.73 (1 run) | 21m |
ResNet-preact-20 | SGD | 512 | 0.4 | cosine | 200 | 8.22 (1 run) | 20m |
ResNet-preact-20 | LARS | 512 | 0.015 | cosine | 200 | 9.84 | 23m |
ResNet-preact-20 | LARS | 512 | 0.01 | cosine | 200 | 8.05 | 23m |
ResNet-preact-20 | LARS | 512 | 0.0075 | cosine | 200 | 7.58 | 23m |
ResNet-preact-20 | LARS | 512 | 0.005 | cosine | 200 | 7.96 | 23m |
ResNet-preact-20 | LARS | 512 | 0.0025 | cosine | 200 | 8.83 | 23m |
Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
---|---|---|---|---|---|---|---|
ResNet-preact-20 | SGD | 256 | 3.2 | cosine | 200 | 9.64 (1 run) | 22m |
ResNet-preact-20 | SGD | 256 | 1.6 | cosine | 200 | 8.32 (1 run) | 22m |
ResNet-preact-20 | SGD | 256 | 0.8 | cosine | 200 | 7.45 (1 run) | 21m |
ResNet-preact-20 | SGD | 256 | 0.4 | cosine | 200 | 7.68 (1 run) | 22m |
ResNet-preact-20 | SGD | 256 | 0.2 | cosine | 200 | 8.61 (1 run) | 22m |
ResNet-preact-20 | LARS | 256 | 0.01 | cosine | 200 | 8.95 | 27m |
ResNet-preact-20 | LARS | 256 | 0.005 | cosine | 200 | 7.75 | 28m |
ResNet-preact-20 | LARS | 256 | 0.0025 | cosine | 200 | 8.21 | 28m |
Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
---|---|---|---|---|---|---|---|
ResNet-preact-20 | SGD | 128 | 1.6 | cosine | 200 | 9.03 (1 run) | 24m |
ResNet-preact-20 | SGD | 128 | 0.8 | cosine | 200 | 7.54 (1 run) | 24m |
ResNet-preact-20 | SGD | 128 | 0.4 | cosine | 200 | 7.28 (1 run) | 24m |
ResNet-preact-20 | SGD | 128 | 0.2 | cosine | 200 | 7.96 (1 run) | 24m |
ResNet-preact-20 | LARS | 128 | 0.005 | cosine | 200 | 7.96 | 37m |
ResNet-preact-20 | LARS | 128 | 0.0025 | cosine | 200 | 7.98 | 37m |
ResNet-preact-20 | LARS | 128 | 0.00125 | cosine | 200 | 9.21 | 37m |
Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
---|---|---|---|---|---|---|---|
ResNet-preact-20 | SGD | 4096 | 1.6 | cosine | 200 | 10.20 | 22m |
ResNet-preact-20 | SGD | 4096 | 1.6 | cosine | 800 | 8.36 (1 run) | 1h33m |
ResNet-preact-20 | SGD | 4096 | 1.6 | cosine | 1600 | 8.25 (1 run) | 3h10m |
ResNet-preact-20 | LARS | 4096 | 0.02 | cosine | 200 | 8.21 | 22m |
ResNet-preact-20 | LARS | 4096 | 0.02 | cosine | 400 | 7.53 | 44m |
ResNet-preact-20 | LARS | 4096 | 0.02 | cosine | 800 | 7.48 | 1h29m |
ResNet-preact-20 | LARS | 4096 | 0.02 | cosine | 1600 | 7.37 (1 run) | 2h58m |
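
The LARS rows use much smaller initial learning rates than the SGD rows. One way to see why: LARS rescales each layer's update by the ratio of the weight norm to the gradient norm, so the global learning rate plays a different role. The sketch below shows a single momentum-free update for one layer. The trust coefficient, weight decay, and `eps` are assumed values for illustration, and the optimizer selected by `train.optimizer lars` may differ in its details.

```python
import math


def lars_step(weights, grads, lr, trust_coef=0.001, weight_decay=5e-4, eps=1e-9):
    """One momentum-free LARS update for a single layer."""
    w_norm = math.sqrt(sum(w * w for w in weights))
    g_norm = math.sqrt(sum(g * g for g in grads))
    # Layer-wise rate: proportional to the weight norm, inversely to the gradient norm.
    local_lr = trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)
    return [w - lr * local_lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


w = [3.0, 4.0]
g = [0.6, 0.8]
step_small = lars_step(w, g, lr=0.02, weight_decay=0.0)
step_large = lars_step(w, [10 * v for v in g], lr=0.02, weight_decay=0.0)
print(step_small, step_large)
```

Without weight decay the two updates are almost identical even though the second gradient is ten times larger, which is the normalising effect that makes LARS less sensitive to gradient scale at large batch sizes.
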
python train.py --config configs/cifar/resnet_preact.yaml \
model.resnet_preact.depth 20 \
train.base_lr 1.5 \
train.batch_size 4096 \
train.subdivision 32 \
scheduler.type cosine \
train.output_dir experiments/resnet_preact_ghost_batch/exp00
Model | batch size | ghost batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|---|
ResNet-preact-20 | 8192 | N/A | 1.6 | cosine | 200 | 12.35 | 25m* |
ResNet-preact-20 | 4096 | N/A | 1.6 | cosine | 200 | 10.32 | 22m |
ResNet-preact-20 | 2048 | N/A | 1.6 | cosine | 200 | 8.73 | 22m |
ResNet-preact-20 | 1024 | N/A | 1.6 | cosine | 200 | 8.07 | 22m |
ResNet-preact-20 | 128 | N/A | 0.4 | cosine | 200 | 7.28 | 24m |
Model | batch size | ghost batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|---|
ResNet-preact-20 | 8192 | 128 | 1.6 | cosine | 200 | 11.51 | 27m |
ResNet-preact-20 | 4096 | 128 | 1.6 | cosine | 200 | 9.73 | 25m |
ResNet-preact-20 | 2048 | 128 | 1.6 | cosine | 200 | 8.77 | 24m |
ResNet-preact-20 | 1024 | 128 | 1.6 | cosine | 200 | 7.82 | 22m |
Model | batch size | ghost batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|---|
ResNet-preact-20 | 8192 | N/A | 1.6 | cosine | 1600 | | |
ResNet-preact-20 | 4096 | N/A | 1.6 | cosine | 1600 | 8.25 | 3h10m |
ResNet-preact-20 | 2048 | N/A | 1.6 | cosine | 1600 | 7.34 | 2h50m |
ResNet-preact-20 | 1024 | N/A | 1.6 | cosine | 1600 | 6.94 | 2h52m |
Model | batch size | ghost batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|---|
ResNet-preact-20 | 8192 | 128 | 1.6 | cosine | 1600 | 11.83 | 3h37m |
ResNet-preact-20 | 4096 | 128 | 1.6 | cosine | 1600 | 8.95 | 3h15m |
ResNet-preact-20 | 2048 | 128 | 1.6 | cosine | 1600 | 7.23 | 3h05m |
ResNet-preact-20 | 1024 | 128 | 1.6 | cosine | 1600 | 7.08 | 2h59m |
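
The ghost batch size of 128 in these tables matches a batch of 4096 split with `train.subdivision 32`. A minimal sketch of ghost batch normalization under that assumption: the large batch is cut into smaller ghost batches and each one is normalised with its own mean and variance. The example handles a single feature without learnable scale, shift, or running statistics, and it is not the code used in this repository.

```python
def normalize(values, eps=1e-5):
    """Normalise a list of scalars to zero mean and (nearly) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def ghost_batch_norm(batch, ghost_batch_size):
    """Normalise each ghost batch with its own statistics."""
    out = []
    for start in range(0, len(batch), ghost_batch_size):
        out.extend(normalize(batch[start:start + ghost_batch_size]))
    return out


batch_size, subdivision = 4096, 32
ghost_batch_size = batch_size // subdivision
# Synthetic activations whose mean drifts from one ghost batch to the next.
batch = [float(i % 7) + (i // 128) for i in range(batch_size)]
normalized = ghost_batch_norm(batch, ghost_batch_size)
print(ghost_batch_size, sum(normalized[:128]) / 128)
```
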
python train.py --config configs/cifar/resnet_preact.yaml \
model.resnet_preact.depth 20 \
train.base_lr 1.6 \
train.batch_size 4096 \
train.no_weight_decay_on_bn True \
train.weight_decay 5e-4 \
scheduler.type cosine \
train.output_dir experiments/resnet_preact_no_weight_decay_on_bn/exp00
Model | weight decay on BN | weight decay | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
---|---|---|---|---|---|---|---|---|
ResNet-preact-20 | yes | 5e-4 | 4096 | 1.6 | cosine | 200 | 10.81 | 22m |
ResNet-preact-20 | yes | 4e-4 | 4096 | 1.6 | cosine | 200 | 10.88 | 22m |
ResNet-preact-20 | yes | 3e-4 | 4096 | 1.6 | cosine | 200 | 10.96 | 22m |
ResNet-preact-20 | yes | 2e-4 | 4096 | 1.6 | cosine | 200 | 9.30 | 22m |
ResNet-preact-20 | yes | 1e-4 | 4096 | 1.6 | cosine | 200 | 10.20 | 22m |
ResNet-preact-20 | no | 5e-4 | 4096 | 1.6 | cosine | 200 | 8.78 | 22m |
ResNet-preact-20 | no | 4e-4 | 4096 | 1.6 | cosine | 200 | 9.83 | 22m |
ResNet-preact-20 | no | 3e-4 | 4096 | 1.6 | cosine | 200 | 9.90 | 22m |
ResNet-preact-20 | no | 2e-4 | 4096 | 1.6 | cosine | 200 | 9.64 | 22m |
ResNet-preact-20 | no | 1e-4 | 4096 | 1.6 | cosine | 200 | 10.38 | 22m |
Model | weight decay on BN | weight decay | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
---|---|---|---|---|---|---|---|---|
ResNet-preact-20 | yes | 5e-4 | 2048 | 1.6 | cosine | 200 | 8.46 | 20m |
ResNet-preact-20 | yes | 4e-4 | 2048 | 1.6 | cosine | 200 | 8.35 | 20m |
ResNet-preact-20 | yes | 3e-4 | 2048 | 1.6 | cosine | 200 | 7.76 | 20m |
ResNet-preact-20 | yes | 2e-4 | 2048 | 1.6 | cosine | 200 | 8.09 | 20m |
ResNet-preact-20 | yes | 1e-4 | 2048 | 1.6 | cosine | 200 | 8.83 | 20m |
ResNet-preact-20 | no | 5e-4 | 2048 | 1.6 | cosine | 200 | 8.49 | 20m |
ResNet-preact-20 | no | 4e-4 | 2048 | 1.6 | cosine | 200 | 7.98 | 20m |
ResNet-preact-20 | no | 3e-4 | 2048 | 1.6 | cosine | 200 | 8.26 | 20m |
ResNet-preact-20 | no | 2e-4 | 2048 | 1.6 | cosine | 200 | 8.47 | 20m |
ResNet-preact-20 | no | 1e-4 | 2048 | 1.6 | cosine | 200 | 9.27 | 20m |
Model | weight decay on BN | weight decay | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
---|---|---|---|---|---|---|---|---|
ResNet-preact-20 | yes | 5e-4 | 1024 | 1.6 | cosine | 200 | 8.45 | 21m |
ResNet-preact-20 | yes | 4e-4 | 1024 | 1.6 | cosine | 200 | 7.91 | 21m |
ResNet-preact-20 | yes | 3e-4 | 1024 | 1.6 | cosine | 200 | 7.81 | 21m |
ResNet-preact-20 | yes | 2e-4 | 1024 | 1.6 | cosine | 200 | 7.69 | 21m |
ResNet-preact-20 | yes | 1e-4 | 1024 | 1.6 | cosine | 200 | 8.26 | 21m |
ResNet-preact-20 | no | 5e-4 | 1024 | 1.6 | cosine | 200 | 8.08 | 21m |
ResNet-preact-20 | no | 4e-4 | 1024 | 1.6 | cosine | 200 | 7.73 | 21m |
ResNet-preact-20 | no | 3e-4 | 1024 | 1.6 | cosine | 200 | 7.92 | 21m |
ResNet-preact-20 | no | 2e-4 | 1024 | 1.6 | cosine | 200 | 7.93 | 21m |
ResNet-preact-20 | no | 1e-4 | 1024 | 1.6 | cosine | 200 | 8.53 | 21m |
python train.py --config configs/cifar/resnet_preact.yaml \
model.resnet_preact.depth 20 \
train.base_lr 1.6 \
train.batch_size 4096 \
train.precision O3 \
scheduler.type cosine \
train.output_dir experiments/resnet_preact_fp16/exp00
python train.py --config configs/cifar/resnet_preact.yaml \
model.resnet_preact.depth 20 \
train.base_lr 1.6 \
train.batch_size 4096 \
train.precision O1 \
scheduler.type cosine \
train.output_dir experiments/resnet_preact_mixed_precision/exp00
Model | precision | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|---|
ResNet-preact-20 | FP32 | 8192 | 1.6 | cosine | 200 | | |
ResNet-preact-20 | FP32 | 4096 | 1.6 | cosine | 200 | 10.32 | 22m |
ResNet-preact-20 | FP32 | 2048 | 1.6 | cosine | 200 | 8.73 | 22m |
ResNet-preact-20 | FP32 | 1024 | 1.6 | cosine | 200 | 8.07 | 22m |
ResNet-preact-20 | FP32 | 512 | 0.8 | cosine | 200 | 7.73 | 21m |
ResNet-preact-20 | FP32 | 256 | 0.8 | cosine | 200 | 7.45 | 21m |
ResNet-preact-20 | FP32 | 128 | 0.4 | cosine | 200 | 7.28 | 24m |
Model | precision | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|---|
ResNet-preact-20 | FP16 | 8192 | 1.6 | cosine | 200 | 48.52 | 33m |
ResNet-preact-20 | FP16 | 4096 | 1.6 | cosine | 200 | 49.84 | 28m |
ResNet-preact-20 | FP16 | 2048 | 1.6 | cosine | 200 | 75.63 | 27m |
ResNet-preact-20 | FP16 | 1024 | 1.6 | cosine | 200 | 19.09 | 27m |
ResNet-preact-20 | FP16 | 512 | 0.8 | cosine | 200 | 7.89 | 26m |
ResNet-preact-20 | FP16 | 256 | 0.8 | cosine | 200 | 7.40 | 28m |
ResNet-preact-20 | FP16 | 128 | 0.4 | cosine | 200 | 7.59 | 32m |
Model | precision | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|---|
ResNet-preact-20 | mixed | 8192 | 1.6 | cosine | 200 | 11.78 | 28m |
ResNet-preact-20 | mixed | 4096 | 1.6 | cosine | 200 | 10.48 | 27m |
ResNet-preact-20 | mixed | 2048 | 1.6 | cosine | 200 | 8.98 | 26m |
ResNet-preact-20 | mixed | 1024 | 1.6 | cosine | 200 | 8.05 | 26m |
ResNet-preact-20 | mixed | 512 | 0.8 | cosine | 200 | 7.81 | 28m |
ResNet-preact-20 | mixed | 256 | 0.8 | cosine | 200 | 7.58 | 32m |
ResNet-preact-20 | mixed | 128 | 0.4 | cosine | 200 | 7.37 | 41m |
Model | precision | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
---|---|---|---|---|---|---|---|
ResNet-preact-20 | FP32 | 8192 | 1.6 | cosine | 200 | 12.35 | 25m |
ResNet-preact-20 | FP32 | 4096 | 1.6 | cosine | 200 | 9.88 | 19m |
ResNet-preact-20 | FP32 | 2048 | 1.6 | cosine | 200 | 8.87 | 17m |
ResNet-preact-20 | FP32 | 1024 | 1.6 | cosine | 200 | 8.45 | 18m |
ResNet-preact-20 | mixed | 8192 | 1.6 | cosine | 200 | 11.92 | 25m |
ResNet-preact-20 | mixed | 4096 | 1.6 | cosine | 200 | 10.16 | 19m |
ResNet-preact-20 | mixed | 2048 | 1.6 | cosine | 200 | 9.10 | 17m |
ResNet-preact-20 | mixed | 1024 | 1.6 | cosine | 200 | 7.84 | 16m |
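
In the tables above, the FP16 runs degrade sharply at batch sizes of 1024 and larger, while the mixed-precision runs stay close to FP32. The `O1` and `O3` values of `train.precision` look like NVIDIA Apex AMP opt levels, where mixed precision typically adds loss scaling. One commonly cited failure mode of pure FP16 is gradient underflow. The standard-library sketch below shows that effect in isolation; the gradient value and loss scale are illustrative assumptions, and it does not establish that underflow is what caused the FP16 results here.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


grad = 1e-8          # illustrative small gradient value (assumption)
loss_scale = 1024.0  # illustrative static loss scale (assumption)

# Stored directly in FP16, the gradient is too small and becomes zero.
naive = to_fp16(grad)

# Scaling before the FP16 cast keeps the value representable; the result is
# unscaled afterwards in higher precision.
recovered = to_fp16(grad * loss_scale) / loss_scale

print(naive, recovered)
```
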