Gradient checkpointing #711
Conversation
I mean, this could help,
@NikZak Hi
Also, above is a naive example for recompute_grad; you might need to split differently.
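For anyone who has not used it, here is a minimal, hedged sketch of wrapping a block with tf.recompute_grad (the function and tensor names are illustrative, not the example referenced above):

```python
import tensorflow as tf

# Hedged sketch: the block wrapped with tf.recompute_grad has its intermediate
# activations recomputed during backprop instead of being kept in memory.
w1 = tf.random.normal([256, 256])
w2 = tf.random.normal([256, 256])

@tf.recompute_grad
def block(x, w1, w2):
  h = tf.nn.relu(tf.matmul(x, w1))
  return tf.nn.relu(tf.matmul(h, w2))

x = tf.random.normal([8, 256])
with tf.GradientTape() as tape:
  tape.watch([w1, w2])
  loss = tf.reduce_sum(block(x, w1, w2))
grads = tape.gradient(loss, [w1, w2])  # gradients computed by recomputing `block`
```

In real training the block would wrap layers holding tf.Variables, which is where the caveats about recompute_grad discussed below come in.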
Awesome! But I have the same issue as @LaurensHagendoorn. I prefer to fix the network.
@kartik4949 In principle, my algorithm does the same thing, but instead of specifying the parts explicitly I looked at the graph of EfficientNet, decided that 'Add' would be a good node at which to split the graph, and added functionality to split the graph by node name. I did not try recompute_grad because of this [thread](tensorflow/tensorflow#36981) and some other threads mentioning that recompute_grad does not help in reimplementing gradient checkpointing. I suggest adding this capability, since it allows training on smaller GPUs, and then making enhancements to it, or implementing another method like recompute_grad and replacing this functionality.
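For concreteness, a hedged sketch of the name-matching idea (the helper and its names are illustrative, not the PR's actual code; the excluded op names follow the list in the PR description):

```python
import tensorflow.compat.v1 as tf

def find_checkpoint_tensors(graph,
                            include=("Add",),
                            exclude=("L2Loss", "entropy", "FusedBatchNorm",
                                     "Switch", "dropout", "Cast")):
  """Collect tensors whose op names contain one of the `include` substrings."""
  tensors = []
  for op in graph.get_operations():
    if (any(s in op.name for s in include)
        and not any(s in op.name for s in exclude)):
      tensors.extend(op.outputs)
  return tensors

# Example usage in graph mode:
# checkpoints = find_checkpoint_tensors(tf.get_default_graph(), include=["Add"])
```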
@NikZak Sure, gradient checkpointing is worth working on, but I would also prefer what @fsx950223 suggested.
efficientdet/graph_editor/BUILD
Outdated
@@ -0,0 +1,162 @@
# Description:
# contains parts of TensorFlow that are experimental or unstable and which are not supported.
Do we need Bazel to build it?
@fsx950223 you don't need to build it for this gradient checkpointing, and I never tried. The graph editor is not fully tested as a standalone TF 2.0 library, but the functionality needed for gradient checkpointing works.
@NikZak also have you looked at this -> cybertronai/gradient-checkpointing#29
@kartik4949 this implementation of gradient checkpointing, together with the port of the graph editor, does work though.
Oh, I see!
@NikZak Fantastic work! Thanks a lot for adding this.
A high-level comment: since this CL is large, could you split it into 2 PRs:
PR1: just add graph_editor and memory_saving_gradients, without changing any existing files (This PR is large, but safe)
PR2: hook up memory_saving_gradients with existing files (this PR would be small, but with some risks)
Thank you!
@@ -0,0 +1,41 @@
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
This doesn't seem to be the right copyright. Could you use the same copyright as the existing files (Google Research)?
fixed
efficientdet/graph_editor/README.md
Outdated
The TensorFlow Graph Editor library allows for modification of an existing
tf.Graph instance in-place.

The author's github username is [purpledog](https://github.com/purpledog).
Where is the source code for this lib? What's the original copyright?
The source code is the one embedded in Tensorflow.contrib. https://github.com/tensorflow/tensorflow/tree/r1.15/tensorflow/contrib/graph_editor
In the __init__.py it says "Licensed under the Apache License, Version 2.0" (same as TensorFlow), which means we can copy and modify it.
Apache License 2.0: a permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code.
Permissions: commercial use, modification, distribution, patent use, private use.
Limitations: trademark use, liability, warranty.
@NikZak @LaurensHagendoorn @fsx950223 @kartik4949 For CNNs, memory usage is mostly dominated by activations rather than parameters. EfficientDets use more memory for a few reasons:
It is great to see that gradient checkpointing can significantly reduce memory without increasing training time much! Well done, @NikZak
@mingxingtan makes sense, got it now.
@mingxingtan thanks for the comments! I will rectify this on Monday.
Hi NikZak, I have tried your method with clip_gradients_norm: 5.0. https://colab.research.google.com/drive/1wQBc2ukZ4gU9PryPOQKmUUuqyRPW7tJa?usp=sharing I find that many places in this EfficientDet official repo are written with the default COCO class count of 90 (100); if the number of classes is more than 100, it may cause problems, such as the problem in the eval module.
I needed to change the initialization of ap_perclasses: then it worked. So I don't know whether they have a similar cause (number of classes more than 100).
@williamhyin
I saw cls_loss = 10.0, det_loss = 10.02 for the first few hundred steps, which is the same as with gradient_checkpointing: False. So shall I wait longer? Does it start to converge earlier with gradient_checkpointing set to False? At what step does it start to converge with gradient_checkpointing: True, and at what step with gradient_checkpointing: False?
You should wait more than 9000 steps... With gradient_checkpointing: false, it starts converging from the beginning.
@williamhyin thanks. Same here: cls_loss = 10.0, det_loss = 10.03. Could you create two identical colabs whose only difference is the dfg.yaml file, with gradient_checkpointing set to true in one and false in the other? Then run them one by one and share the results. At the moment I do not see a difference from the beginning.
https://colab.research.google.com/drive/1wQBc2ukZ4gU9PryPOQKmUUuqyRPW7tJa?usp=sharing
Is this compatible with XLA? XLA gives massive speedups on efficientdet (2-3x) and I'd hate to lose that.
Hi LucasSloan, thank you for your great suggestion! It works. I have set use_xla to true. It returned a warning,
which made me think XLA was not enabled. Before:
After:
I'm confused about whether XLA is actually turned on, and if not, how I should set it. Thanks
@LucasSloan In short, XLA and gradient checkpointing work together like a charm and give you a bit of both worlds: reduced memory consumption and increased speed. Switching on XLA and gradient checkpointing together uses a little more memory than pure gradient checkpointing without XLA, but significantly less memory than pure XLA without gradient checkpointing. I will probably provide more detailed stats next week.
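For anyone unsure how to switch XLA on outside this repo's --use_xla flag, a hedged sketch of the standard TF 2.x switches (jit_compile requires TF 2.5+; older versions call it experimental_compile):

```python
import tensorflow as tf

# Enable XLA auto-clustering globally for the program.
tf.config.optimizer.set_jit(True)

# Or compile a single function with XLA explicitly.
@tf.function(jit_compile=True)
def scaled_sum(x):
  return tf.reduce_sum(x * 2.0)

print(scaled_sum(tf.ones([4, 4])))
```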
Looks great! Thanks @NikZak
efficientdet/README.md
Outdated
| D4(640) [h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d4-640.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d4-640.tar.gz) | 45.7 | 21.7ms |
| D5(640) [h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d5-640.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d5-640.tar.gz) | 46.6 | 26.6ms |
| D6(640) [h5](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d6-640.h5), [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d6-640.tar.gz) | 47.9 | 33.8ms |
| D2(640) [ckpt](https://storage.googleapis.com/cloud-tpu-checkpoints/efficientdet/coco640/efficientdet-d2-640.tar.gz) | 41.7 | 14.8ms |
Are these changes intended?
If set to True, the strings defined by gradient_checkpointing_list (["Add"] by default) are searched for in the tensor names, and any tensors that match a string from the list are kept as checkpoints. When this option is used, the standard tensorflow.python.ops.gradients method is replaced with a custom method.

Testing shows that:
* On the d4 network with a batch size of 1 (mixed precision enabled) it uses only 1/3.2 of the memory, with roughly 32% slower computation
Nice document!
efficientdet/det_model_fn.py
Outdated
from third_party.grad_checkpoint \
    import memory_saving_gradients  # pylint: disable=g-import-not-at-top
from tensorflow.python.ops \
    import gradients  # pylint: disable=g-import-not-at-top
These imports can probably fit into a single line each (try to avoid the backslash continuations).
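Presumably the single-line form the reviewer has in mind is simply:

```python
from third_party.grad_checkpoint import memory_saving_gradients  # pylint: disable=g-import-not-at-top
from tensorflow.python.ops import gradients  # pylint: disable=g-import-not-at-top
```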
efficientdet/det_model_fn.py
Outdated
if params["nvgpu_logging"]:
  try:
    from third_party.tools import nvgpu  # pylint: disable=g-import-not-at-top
    from functools import reduce  # pylint: disable=g-import-not-at-top
just import functools, and use functools.reduce
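Concretely, the suggestion amounts to something like the sketch below (standalone for illustration; in det_model_fn.py the import would stay inside the guarded block shown in the diff):

```python
import functools

def get_nested_value(d, path):
  # Walk a nested dict by a list of keys, e.g.
  # get_nested_value({"gpu": {"memory": {"used": 123}}}, ["gpu", "memory", "used"]) -> 123
  return functools.reduce(dict.get, path, d)
```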
efficientdet/det_model_fn.py
Outdated
from functools import reduce  # pylint: disable=g-import-not-at-top

def get_nested_value(d, path):
  return reduce(dict.get, path, d)
Can we move most of the code to nvgpu, so this file can be clean? thanks.
For example: nvgpu_gpu_info and commonsize and formatter_log can be moved to nvgpu.
@@ -161,6 +161,8 @@ def as_dict(self):
      else:
        config_dict[k] = copy.deepcopy(v)
    return config_dict


  # pylint: enable=protected-access
Maybe you can move "# pylint: enable=protected-access" right after return (with same indent), to avoid too many empty lines.
@@ -281,6 +283,13 @@ def default_detection_configs():
  h.dataset_type = None
  h.positives_momentum = None

  # Reduces memory during training
  h.gradient_checkpointing = False
  h.gradient_checkpointing_list = ["Add"]
Could you add a comment to explain what values can be used other than "Add"?
Thanks for adding more details. Could you explain a little bit more: what's the impact of this list?
If I use ["Add"], does it mean it would automatically checkpoint all "Add" operation?
If I use ['Add', 'Sigmoid'], does it mean it would automatically checkpoint all 'Add' and 'Sigmoid' ops?
If so, what are the pros and cons of adding more ops, and why is the default 'Add'?
Sorry if these questions annoy you, but I am hoping to make it clear as this is a greatly useful feature. Thanks!
efficientdet/main.py
Outdated
@@ -117,6 +116,38 @@
    'run in a separate process for train and eval and memory will be cleared.'
    'Drawback: need to kill 2 processes if trainining needs to be interrupted.')

flags.DEFINE_bool(
    'gradient_checkpointing', False,
You don't need to define flags here since they are already in hparams.
Just some minor comment. Overall looks good. Thanks!
def commonsize(inp):
  """Convert all to MiB."""
How about a more informative name such as 'input_size'? Similarly, you can rename 'inp_' to 'converted_size' or 'output_size'.
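For illustration, one plausible shape of the renamed helper (a hedged sketch assuming nvidia-smi style size strings; the PR's actual implementation may differ):

```python
def commonsize(input_size):
  """Hedged sketch: convert a size string such as '512 KiB' or '2 GiB' to MiB."""
  units = {"KiB": 1.0 / 1024, "MiB": 1.0, "GiB": 1024.0}
  value, unit = input_size.split()
  converted_size = float(value) * units[unit]
  return converted_size
```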
* add option each_epoch_in_separate_process
* typos in description
* comments wording
* h.each_epoch_in_separate_process = True in default
* renamed option to run_epoch_in_child_process to avoid confusion
* flags.run_epoch_in_child_process also set to True in default
* h.run_epoch_in_child_process = True : don't need this config
* replaced lambda function with functools.partial to get read of pylint warning
* gradient checkpointing
* gradient checkpointing
* gradient checkpointing
* remove .ropeproject
* description enhancement
* description cleanup
* gradient checkpoint libraries
* deleted graph edtor and gradient checkpointing libraris from this branch
* log message
* remove BUILD
* added back to master
* logging
* graph_editor and gradient checkpointing libs
* deleted: graph_editor/BUILD
* readme
* readme
* Copyright of gradient checkpointing
* redo
* redo
* third_party linted
* README
* README
* merge conflict typo
* merge conflict typo
* renaming
* no log level reset
* no log level reset
* logging of step per epoch is no longer correct in the latest train_and_eval mode
* add a bit of verbosity to avoid frustration during graph rebuld
* readme
* readme
* less user discretion
* replaced third party nvgpu with intenal module
* replaced third party nvgpu with intenal module
* replaced third party nvgpu with intenal module
* comments added
* carve out toposort and include it here
* refactor toposort based on this repo reqs
* checkout third party
* minor typo
* cleanup
* cleanup, comments
@mingxingtan
@kartik4949
Closes #85, closes #368, closes #459, closes #737
Depends on #716
This is a slightly augmented version of the gradient checkpointing algorithm for the EfficientDet network. It also includes a port of the graph editor from tensorflow.contrib 1.15.
This is an experimental option. It helps to save GPU memory while training.
As input, you need to provide a list of strings that indicates which layers of the network should be kept as checkpoints.
When this option is used, the standard tensorflow.python.ops.gradients method is replaced with a custom method. The parameters you use are important, and this requires further optimization. It takes time to reassemble the computation graph with the new checkpoints, and this operation is not multi-threaded at the moment; this could be improved. The graph reassembly only happens once per epoch, at the beginning of the training epoch. Another possible improvement is caching the graph between epochs (which may not be straightforward, given that every epoch runs in a separate process).
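A hedged sketch of how that replacement is typically wired up (the import paths match this PR's diff; the monkey-patch pattern follows the cybertronai/gradient-checkpointing README linked earlier, and the exact hookup in this PR may differ):

```python
# Assumes the modules added in this PR are importable; illustrative only.
from third_party.grad_checkpoint import memory_saving_gradients
from tensorflow.python.ops import gradients


def _checkpointed_gradients(ys, xs, grad_ys=None, **kwargs):
  # Delegate to the memory-saving implementation. An explicit list of
  # checkpoint tensors (e.g. those whose names contain "Add") could also be
  # passed through here.
  return memory_saving_gradients.gradients(ys, xs, grad_ys, **kwargs)


# Swap out the standard gradients function so optimizer.compute_gradients()
# transparently picks up the memory-saving version.
gradients.__dict__["gradients"] = _checkpointed_gradients
```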
You have to provide a list of strings. These strings are searched for in the tensor names, and only the tensors that match are kept as checkpoints.
['L2Loss', 'entropy', 'FusedBatchNorm', 'Switch', 'dropout', 'Cast'] layers are always removed.
gradient_checkpointing_list: ["Add"] (keep only tensors with Add in the name) is an option that has been tested and works reasonably well.
There were also some logging improvements added, in particular memory logging for Nvidia GPUs (disabled by default). At the moment this covers a single GPU only, as I don't have a multi-GPU machine to test on.
I suggest adding this option as it does not break the main process flow. The memory improvement on GPU is very substantial and could be further optimized, either by choosing the right parameters or by changing the algorithm.