From e0e0d8172add589feb8240fd371acfaab856d9c9 Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Fri, 4 Dec 2020 14:02:46 -0800
Subject: [PATCH 1/2] document the caveat of leaky native amp

---
 examples/seq2seq/README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/examples/seq2seq/README.md b/examples/seq2seq/README.md
index d025d46c973475..6cf50cba9b4aae 100644
--- a/examples/seq2seq/README.md
+++ b/examples/seq2seq/README.md
@@ -79,6 +79,11 @@ test.target
 ```
 The `.source` files are the input, the `.target` files are the desired output.
 
+### Potential issues
+
+- native AMP (`--fp16` and no apex) may lead to a huge memory leak and require 10x gpu memory. This has been fixed in pytorch-nightly and the minimal official version to have this fix will be pytorch-1.8. Until then if you have to use amp please use NVIDIA's apex. Reference: https://github.com/huggingface/transformers/issues/8403
+
+
 ### Tips and Tricks
 
 General Tips:

From 9b60185b8c5d0df7de4c5222ad66fb859a761fde Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Fri, 4 Dec 2020 15:43:09 -0800
Subject: [PATCH 2/2] Update examples/seq2seq/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---
 examples/seq2seq/README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/examples/seq2seq/README.md b/examples/seq2seq/README.md
index 6cf50cba9b4aae..6ac3cf8d7df4ab 100644
--- a/examples/seq2seq/README.md
+++ b/examples/seq2seq/README.md
@@ -81,7 +81,7 @@ The `.source` files are the input, the `.target` files are the desired output.
 
 ### Potential issues
 
-- native AMP (`--fp16` and no apex) may lead to a huge memory leak and require 10x gpu memory. This has been fixed in pytorch-nightly and the minimal official version to have this fix will be pytorch-1.8. Until then if you have to use amp please use NVIDIA's apex. Reference: https://github.com/huggingface/transformers/issues/8403
+- native AMP (`--fp16` and no apex) may lead to a huge memory leak and require 10x gpu memory. This has been fixed in pytorch-nightly and the minimal official version to have this fix will be pytorch-1.8. Until then if you have to use mixed precision please use AMP only with pytorch-nightly or NVIDIA's apex. Reference: https://github.com/huggingface/transformers/issues/8403
 
 
 ### Tips and Tricks
@@ -597,4 +597,3 @@ The feature is still experimental, because:
 + we can make it much more robust if we have memory mapped/preprocessed datasets.
 + The speedup over sortish sampler is not that large at the moment.
 
-
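
For reference, the apex workaround recommended in the note above looks roughly like the following minimal sketch. It is not taken from the seq2seq scripts themselves: the toy linear model, the learning rate, and the `opt_level="O1"` choice are illustrative assumptions, and it assumes NVIDIA apex is installed and a CUDA device is available.

```python
import torch
from torch import nn
from apex import amp  # requires NVIDIA apex to be installed

# Toy model/optimizer pair standing in for whatever the training script builds.
model = nn.Linear(16, 2).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)

# apex patches the model and optimizer for mixed precision;
# opt_level "O1" (patch-level mixed precision) is an illustrative choice.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

inputs = torch.randn(8, 16, device="cuda")
labels = torch.randint(0, 2, (8,), device="cuda")
loss = nn.functional.cross_entropy(model(inputs), labels)

# Scale the loss so fp16 gradients do not underflow, then step as usual.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```

With pytorch-1.8 and later, the same training loop can instead rely on native AMP (`torch.cuda.amp.autocast` plus `GradScaler`) without the leak the note describes.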