Gradient checkpointing #711
Changes from 73 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -56,15 +56,15 @@ We have provided a list of EfficientDet checkpoints and results as follows: | |
|
||
| Model | AP<sup>test</sup> | AP<sub>50</sub> | AP<sub>75</sub> |AP<sub>S</sub> | AP<sub>M</sub> | AP<sub>L</sub> | AP<sup>val</sup> | | #params | #FLOPs | | ||
|---------- |------ |------ |------ | -------- | ------| ------| ------ |------ |------ | :------: | | ||
| EfficientDet-D0 ([h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d0.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d0.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d0_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d0_coco_test-dev2017.txt)) | 34.6 | 53.0 | 37.1 | 12.4 | 39.0 | 52.7 | 34.3 | | 3.9M | 2.54B | | ||
| EfficientDet-D1 ([h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d1.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d1.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d1_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d1_coco_test-dev2017.txt)) | 40.5 | 59.1 | 43.7 | 18.3 | 45.0 | 57.5 | 40.2 | | 6.6M | 6.10B | | ||
| EfficientDet-D2 ([h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d2.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d2.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d2_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d2_coco_test-dev2017.txt)) | 43.0 | 62.3 | 46.2 | 22.5 | 47.0 | 58.4 | 42.5 | | 8.1M | 11.0B | | ||
| EfficientDet-D3 ([h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d3.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d3.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d3_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d3_coco_test-dev2017.txt)) | 47.5 | 66.2 | 51.5 | 27.9 | 51.4 | 62.0 | 47.2 | | 12.0M | 24.9B | | ||
| EfficientDet-D4 ([h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d4.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d4.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d4_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d4_coco_test-dev2017.txt)) | 49.7 | 68.4 | 53.9 | 30.7 | 53.2 | 63.2 | 49.3 | | 20.7M | 55.2B | | ||
| EfficientDet-D5 ([h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d5.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d5.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d5_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d5_coco_test-dev2017.txt)) | 51.5 | 70.5 | 56.1 | 33.9 | 54.7 | 64.1 | 51.2 | | 33.7M | 130B | | ||
| EfficientDet-D6 ([h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d6.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d6.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d6_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d6_coco_test-dev2017.txt)) | 52.6 | 71.5 | 57.2 | 34.9 | 56.0 | 65.4 | 52.1 | | 51.9M | 226B | | ||
| EfficientDet-D7 ([h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d7.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d7.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d7_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d7_coco_test-dev2017.txt)) | 53.7 | 72.4 | 58.4 | 35.8 | 57.0 | 66.3 | 53.4 | | 51.9M | 325B | | ||
| EfficientDet-D7x ([h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d7x.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d7x.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d7x_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d7x_coco_test-dev2017.txt)) | 55.1 | 74.3 | 59.9 | 37.2 | 57.9 | 68.0 | 54.4 | | 77.0M | 410B | | ||
| EfficientDet-D0 ([ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d0.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d0_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d0_coco_test-dev2017.txt)) | 34.6 | 53.0 | 37.1 | 12.4 | 39.0 | 52.7 | 34.3 | | 3.9M | 2.54B | | ||
| EfficientDet-D1 ([ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d1.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d1_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d1_coco_test-dev2017.txt)) | 40.5 | 59.1 | 43.7 | 18.3 | 45.0 | 57.5 | 40.2 | | 6.6M | 6.10B | | ||
| EfficientDet-D2 ([ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d2.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d2_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d2_coco_test-dev2017.txt)) | 43.0 | 62.3 | 46.2 | 22.5 | 47.0 | 58.4 | 42.5 | | 8.1M | 11.0B | | ||
| EfficientDet-D3 ([ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d3.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d3_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d3_coco_test-dev2017.txt)) | 47.5 | 66.2 | 51.5 | 27.9 | 51.4 | 62.0 | 47.2 | | 12.0M | 24.9B | | ||
| EfficientDet-D4 ([ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d4.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d4_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d4_coco_test-dev2017.txt)) | 49.7 | 68.4 | 53.9 | 30.7 | 53.2 | 63.2 | 49.3 | | 20.7M | 55.2B | | ||
| EfficientDet-D5 ([ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d5.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d5_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d5_coco_test-dev2017.txt)) | 51.5 | 70.5 | 56.1 | 33.9 | 54.7 | 64.1 | 51.2 | | 33.7M | 130B | | ||
| EfficientDet-D6 ([ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d6.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d6_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d6_coco_test-dev2017.txt)) | 52.6 | 71.5 | 57.2 | 34.9 | 56.0 | 65.4 | 52.1 | | 51.9M | 226B | | ||
| EfficientDet-D7 ([ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d7.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d7_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d7_coco_test-dev2017.txt)) | 53.7 | 72.4 | 58.4 | 35.8 | 57.0 | 66.3 | 53.4 | | 51.9M | 325B | | ||
| EfficientDet-D7x ([ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/efficientdet-d7x.tar.gz), [val](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/val/d7x_coco_val.txt), [test-dev](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco2/testdev/d7x_coco_test-dev2017.txt)) | 55.1 | 74.3 | 59.9 | 37.2 | 57.9 | 68.0 | 54.4 | | 77.0M | 410B | | ||
|
||
<sup><em>val</em> denotes validation results, <em>test-dev</em> denotes test-dev2017 results. AP<sup>val</sup> is for validation accuracy, all other AP results in the table are for COCO test-dev2017. All accuracy numbers are for single-model single-scale without ensemble or test-time augmentation. EfficientDet-D0 to D6 are trained for 300 epochs and D7/D7x are trained for 600 epochs.</sup> | ||
|
||
|
@@ -73,11 +73,11 @@ In addition, the following table includes a list of models trained with fixed 64 | |
|
||
| Model | mAP | Latency | | ||
| ------ | ------ | ------ | | ||
| D2(640) [h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d2-640.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d2-640.tar.gz) | 41.7 | 14.8ms | | ||
| D3(640) [h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d3-640.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d3-640.tar.gz) | 44.0 | 18.7ms | | ||
| D4(640) [h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d4-640.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d4-640.tar.gz) | 45.7 | 21.7ms | | ||
| D5(640 [h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d5-640.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d5-640.tar.gz) | 46.6 | 26.6ms | | ||
| D6(640) [h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d6-640.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d6-640.tar.gz) | 47.9 | 33.8ms | | ||
| D2(640) [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d2-640.tar.gz) | 41.7 | 14.8ms | | ||
| D3(640) [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d3-640.tar.gz) | 44.0 | 18.7ms | | ||
| D4(640) [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d4-640.tar.gz) | 45.7 | 21.7ms | | ||
| D5(640) [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d5-640.tar.gz) | 46.6 | 26.6ms | | ||
| D6(640) [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d6-640.tar.gz) | 47.9 | 33.8ms | | ||
|
||
|
||
|
||
|
@@ -369,4 +369,26 @@ For more instructions about training on TPUs, please refer to the following tuto | |
|
||
* EfficientNet tutorial: https://cloud.google.com/tpu/docs/tutorials/efficientnet | ||
|
||
## 11. Reducing Memory Usage when Training EfficientDets on GPU. | ||
|
||
EfficientDets use a lot of GPU memory for a few reasons: | ||
|
||
* Large input resolution: because resolution is one of the scaling dimensions, our resolution tends to be higher, which significantly increases activations (although the parameter count does not increase). | ||
* Large internal activations for backbone: our backbone uses a relatively large expansion ratio (6), which produces large expanded activations. | ||
* Deep BiFPN: our BiFPN has multiple top-down and bottom-up paths, which leads to a lot of intermediate memory usage during training. | ||
|
||
To train this model on a GPU with limited memory, there is an experimental option, gradient_checkpointing. | ||
|
||
Check these links for a high-level idea of what gradient checkpointing is doing: | ||
1. https://github.com/cybertronai/gradient-checkpointing | ||
2. https://medium.com/tensorflow/fitting-larger-networks-into-memory-583e3c758ff9 | ||
|
||
**gradient_checkpointing: True** | ||
|
||
If set to True, the strings in gradient_checkpointing_list (["Add"] by default) are searched for in the tensor names, and any tensor whose name matches a string from the list is kept as a checkpoint. When this option is used, the standard tensorflow.python.ops.gradients method is replaced with a custom method. | ||
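The selection rule described above can be sketched in plain Python. This is only an illustration: it assumes the match is a simple substring test on the tensor name, and the tensor names and the `select_checkpoints` helper are hypothetical, not taken from the PR or from `memory_saving_gradients`.

```python
def select_checkpoints(tensor_names, patterns=("Add",)):
    """Return the names that contain at least one of the given substrings."""
    return [name for name in tensor_names if any(p in name for p in patterns)]


# Hypothetical tensor names, for illustration only.
names = [
    "efficientnet-b0/blocks_0/conv2d/Conv2D",
    "efficientnet-b0/blocks_1/Add",
    "fpn_cells/cell_0/fnode0/Add_1",
    "box_net/box-predict/BiasAdd",
]

# With the default ["Add"] list, every name containing "Add" is kept.
selected = select_checkpoints(names)
print(selected)
```

Under this assumption a pattern such as "Add" also matches names like "BiasAdd", so a short pattern can select more tensors than intended.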
|
||
Testing shows that: | ||
* On a d4 network with a batch size of 1 (mixed precision enabled), it takes only 1/3.2 of the memory, with roughly 32% slower computation | ||
Review comment: Nice document!
||
* It also allows training a d6 network with a batch size of 2 (mixed precision enabled) on an 11 GB (2080 Ti) GPU | ||
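The trade-off in the d4 measurement can be restated as a short calculation. The sketch below assumes that "32% slower" means the step time grows by a factor of 1.32; the README text does not say this explicitly, so treat the throughput figure as an interpretation rather than a reported result.

```python
# Reported: checkpointing uses 1/3.2 of the baseline memory on d4.
memory_fraction = 1 / 3.2
memory_saved = 1 - memory_fraction

# Assumption: "32% slower" means each step takes 1.32x as long.
step_time_factor = 1.32
relative_throughput = 1 / step_time_factor

print(f"memory used vs. baseline: {memory_fraction:.4f}")
print(f"memory saved vs. baseline: {memory_saved:.4f}")
print(f"throughput vs. baseline: {relative_throughput:.4f}")
```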
|
||
NOTE: this is not an official Google product. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,7 +18,6 @@ | |
from absl import logging | ||
import numpy as np | ||
import tensorflow.compat.v1 as tf | ||
|
||
import coco_metric | ||
import efficientdet_arch | ||
import hparams_config | ||
|
@@ -153,7 +152,7 @@ def focal_loss(y_pred, y_true, alpha, gamma, normalizer, label_smoothing=0.0): | |
pred_prob = tf.sigmoid(y_pred) | ||
p_t = (y_true * pred_prob) + ((1 - y_true) * (1 - pred_prob)) | ||
alpha_factor = y_true * alpha + (1 - y_true) * (1 - alpha) | ||
modulating_factor = (1.0 - p_t) ** gamma | ||
modulating_factor = (1.0 - p_t)**gamma | ||
|
||
# apply label smoothing for cross_entropy for each entry. | ||
y_true = y_true * (1.0 - label_smoothing) + 0.5 * label_smoothing | ||
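For readers following this hunk, the visible lines can be traced with a scalar sketch in plain Python. Only the first five assignments mirror the diff; the cross-entropy term, the way the factors are combined, and the division by `normalizer` are assumptions, because the rest of `focal_loss` is not shown here. The name `focal_loss_scalar` is hypothetical.

```python
import math


def focal_loss_scalar(logit, target, alpha, gamma, normalizer=1.0,
                      label_smoothing=0.0):
    """Scalar focal loss for one logit/target pair (illustrative only)."""
    # These five lines follow the diff above.
    pred_prob = 1.0 / (1.0 + math.exp(-logit))
    p_t = target * pred_prob + (1 - target) * (1 - pred_prob)
    alpha_factor = target * alpha + (1 - target) * (1 - alpha)
    modulating_factor = (1.0 - p_t) ** gamma
    smoothed = target * (1.0 - label_smoothing) + 0.5 * label_smoothing

    # Assumption: sigmoid cross entropy against the smoothed target,
    # scaled by both factors and divided by the normalizer.
    ce = -(smoothed * math.log(pred_prob)
           + (1 - smoothed) * math.log(1 - pred_prob))
    return alpha_factor * modulating_factor * ce / normalizer


# A confident, correct prediction is down-weighted when gamma > 0.
easy_plain = focal_loss_scalar(4.0, 1.0, alpha=0.25, gamma=0.0)
easy_focal = focal_loss_scalar(4.0, 1.0, alpha=0.25, gamma=2.0)
print(easy_plain, easy_focal)
```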
|
@@ -302,8 +301,7 @@ class and box losses from all levels. | |
box_loss = tf.add_n(box_losses) if box_losses else 0 | ||
|
||
total_loss = ( | ||
cls_loss + | ||
params['box_loss_weight'] * box_loss + | ||
cls_loss + params['box_loss_weight'] * box_loss + | ||
params['iou_loss_weight'] * box_iou_loss) | ||
|
||
return total_loss, cls_loss, box_loss, box_iou_loss | ||
|
@@ -347,6 +345,7 @@ def _model_fn(features, labels, mode, params, model, variable_filter_fn=None): | |
params['is_training_bn'] = (mode == tf.estimator.ModeKeys.TRAIN) | ||
|
||
if params['use_keras_model']: | ||
|
||
def model_fn(inputs): | ||
model = efficientdet_keras.EfficientDetNet( | ||
config=hparams_config.Config(params)) | ||
|
@@ -418,6 +417,23 @@ def model_fn(inputs): | |
|
||
if params['strategy'] == 'tpu': | ||
optimizer = tf.tpu.CrossShardOptimizer(optimizer) | ||
if params['gradient_checkpointing']: | ||
from third_party.grad_checkpoint \ | ||
import memory_saving_gradients # pylint: disable=g-import-not-at-top | ||
from tensorflow.python.ops \ | ||
import gradients # pylint: disable=g-import-not-at-top | ||
Review comment: These imports can probably fit into a single line (try to avoid "\").
||
|
||
# monkey patch tf.gradients to point to our custom version, | ||
# with automatic checkpoint selection | ||
def gradients_(ys, xs, grad_ys=None, **kwargs): | ||
return memory_saving_gradients.gradients( | ||
ys, | ||
xs, | ||
grad_ys, | ||
checkpoints=params['gradient_checkpointing_list'], | ||
**kwargs) | ||
|
||
gradients.__dict__["gradients"] = gradients_ | ||
|
||
# Batch norm requires update_ops to be added as a train_op dependency. | ||
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) | ||
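The monkey patch in this hunk can be illustrated without TensorFlow. The sketch below uses a stand-in module built with `types.ModuleType`; the names `ops`, `_original_gradients`, and the stub `memory_saving_gradients` function are hypothetical and only mimic the shape of the calls in the diff.

```python
import types

# Stand-in for the module being patched (hypothetical; not TensorFlow).
ops = types.ModuleType("ops")


def _original_gradients(ys, xs, grad_ys=None, **kwargs):
    return ("original", ys, xs, grad_ys, kwargs)


ops.gradients = _original_gradients

# A reference taken before the patch keeps pointing at the old function.
early_ref = ops.gradients


def memory_saving_gradients(ys, xs, grad_ys=None, checkpoints=None, **kwargs):
    """Stub for a checkpoint-aware gradient function."""
    return ("checkpointed", ys, xs, grad_ys, checkpoints, kwargs)


params = {"gradient_checkpointing": True,
          "gradient_checkpointing_list": ["Add"]}

if params["gradient_checkpointing"]:
    def gradients_(ys, xs, grad_ys=None, **kwargs):
        # Same signature as the original, with the checkpoint list injected.
        return memory_saving_gradients(
            ys, xs, grad_ys,
            checkpoints=params["gradient_checkpointing_list"],
            **kwargs)

    # Same effect as ops.__dict__["gradients"] = gradients_
    ops.gradients = gradients_

result = ops.gradients("loss", ["w"])
print(result)
```

One consequence of this pattern is that code which bound the function earlier (for example through `from module import gradients`) keeps the unpatched version, so a patch like this only affects callers that look the name up on the module at call time.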
|
@@ -615,6 +631,70 @@ def before_run(self, run_context): | |
every_n_iter=params.get('iterations_per_loop', 100), | ||
) | ||
training_hooks.append(logging_hook) | ||
|
||
if params["nvgpu_logging"]: | ||
try: | ||
from third_party.tools import nvgpu # pylint: disable=g-import-not-at-top | ||
from functools import reduce # pylint: disable=g-import-not-at-top | ||
Review comment: just import functools, and use functools.reduce
||
|
||
def get_nested_value(d, path): | ||
return reduce(dict.get, path, d) | ||
Review comment: Can we move most of the code to nvgpu, so this file can be clean? thanks. For example: nvgpu_gpu_info and commonsize and formatter_log can be moved to nvgpu.
||
|
||
def nvgpu_gpu_info(inp): | ||
inp = inp.decode("utf-8") | ||
inp = inp.split(",") | ||
inp = [x.strip() for x in inp] | ||
value = get_nested_value(nvgpu.gpu_info(), inp) | ||
return str(value) | ||
|
||
def commonsize(inp): | ||
const_sizes = { | ||
'B': 1, | ||
'KB': 1e3, | ||
'MB': 1e6, | ||
'GB': 1e9, | ||
'TB': 1e12, | ||
'PB': 1e15, | ||
'KiB': 1024, | ||
'MiB': 1048576, | ||
'GiB': 1073741824 | ||
} | ||
inp = inp.split(" ") | ||
# convert all to MiB | ||
if inp[1] != 'MiB': | ||
inp_ = float(inp[0]) * (const_sizes[inp[1]] / 1048576.0) | ||
else: | ||
inp_ = float(inp[0]) | ||
|
||
return inp_ | ||
|
||
def formatter_log(tensors): | ||
"""Format the output.""" | ||
mem_used = tensors["memory used"].decode("utf-8") | ||
mem_total = tensors["memory total"].decode("utf-8") | ||
mem_util = commonsize(mem_used) / commonsize(mem_total) | ||
logstring = ( | ||
"GPU memory used: {} = {:.1%} ".format(mem_used, mem_util) + | ||
"of total GPU memory: {}".format(mem_total)) | ||
return logstring | ||
|
||
mem_used = tf.py_func(nvgpu_gpu_info, ['gpu, fb_memory_usage, used'], | ||
[tf.string])[0] | ||
mem_total = tf.py_func(nvgpu_gpu_info, ['gpu, fb_memory_usage, total'], | ||
[tf.string])[0] | ||
|
||
logging_hook3 = tf.estimator.LoggingTensorHook( | ||
tensors={ | ||
"memory used": mem_used, | ||
"memory total": mem_total, | ||
}, | ||
every_n_iter=params.get('iterations_per_loop', 100), | ||
formatter=formatter_log, | ||
) | ||
training_hooks.append(logging_hook3) | ||
except Exception: # pylint: disable=broad-except | ||
logging.error("nvgpu error: nvidia-smi format not recognized") | ||
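As a rough illustration of the two review suggestions above (use `functools.reduce`, and keep the helpers separate from the model function), the lookup, unit conversion, and formatting can be written as standalone functions. This is a sketch, not the code that was merged: the function names `to_mib` and `format_memory` and the sample dictionary are hypothetical, and the sample only imitates the `gpu -> fb_memory_usage -> used/total` path used in the diff.

```python
import functools

# Bytes per unit, using the same table as the diff.
_UNIT_BYTES = {
    "B": 1, "KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12, "PB": 1e15,
    "KiB": 1024, "MiB": 1048576, "GiB": 1073741824,
}


def get_nested_value(d, path):
    """Follow a sequence of keys into nested dicts; raises KeyError if absent."""
    return functools.reduce(lambda node, key: node[key], path, d)


def to_mib(size_str):
    """Convert a string such as '11 GiB' to a float number of MiB."""
    value, unit = size_str.split()
    return float(value) * _UNIT_BYTES[unit] / _UNIT_BYTES["MiB"]


def format_memory(mem_used, mem_total):
    """Build the log line from two size strings."""
    util = to_mib(mem_used) / to_mib(mem_total)
    return "GPU memory used: {} = {:.1%} of total GPU memory: {}".format(
        mem_used, util, mem_total)


# Hypothetical sample with the nesting the diff indexes into.
sample = {"gpu": {"fb_memory_usage": {"used": "2750 MiB",
                                      "total": "11000 MiB"}}}
used = get_nested_value(sample, ["gpu", "fb_memory_usage", "used"])
total = get_nested_value(sample, ["gpu", "fb_memory_usage", "total"])
message = format_memory(used, total)
print(message)
```

Indexing with `node[key]` instead of `dict.get` makes a missing key fail with a `KeyError` that names the key, which is easier to diagnose than the `TypeError` raised when `dict.get` is applied to `None`.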
|
||
if params['strategy'] == 'tpu': | ||
return tf.estimator.tpu.TPUEstimatorSpec( | ||
mode=mode, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -136,7 +136,7 @@ def add_kv_recursive(k, v): | |
return {k: [eval_str_fn(vv) for vv in v.split('*')]} | ||
return {k: eval_str_fn(v)} | ||
pos = k.index('.') | ||
return {k[:pos]: add_kv_recursive(k[pos+1:], v)} | ||
return {k[:pos]: add_kv_recursive(k[pos + 1:], v)} | ||
|
||
def merge_dict_recursive(target, src): | ||
"""Recursively merge two nested dictionary.""" | ||
|
@@ -161,6 +161,8 @@ def as_dict(self): | |
else: | ||
config_dict[k] = copy.deepcopy(v) | ||
return config_dict | ||
|
||
|
||
# pylint: enable=protected-access | ||
|
||
Review comment: Maybe you can move "# pylint: enable=protected-access" right after return (with same indent), to avoid too many empty lines.
||
|
||
|
@@ -281,6 +283,13 @@ def default_detection_configs(): | |
h.dataset_type = None | ||
h.positives_momentum = None | ||
|
||
# Reduces memory during training | ||
h.gradient_checkpointing = False | ||
h.gradient_checkpointing_list = ["Add"] | ||
Review comment: Could you add a comment to explain what values can be used other than "Add"?
Review comment: Thanks for adding more details. Could you explain a little bit more: what's the impact of this list? If I use ["Add"], does it mean it would automatically checkpoint all "Add" operation? If so, what's the pros and cons for adding more ops, and why the default is 'Add'? Sorry if these questions annoy you, but I am hoping to make it clear as this is a greatly useful feature. Thanks!
||
|
||
# enable memory logging for NVIDIA cards | ||
h.nvgpu_logging = False | ||
|
||
return h | ||
|
||
|
||
|
Review comment: Are these changes intended?