lvwerra HF staff committed on
Commit 7c1ce28 · 1 Parent(s): 68a01c2
Files changed (2)
  1. dist/index.html +1 -1
  2. src/index.html +1 -1
dist/index.html CHANGED
@@ -1122,7 +1122,7 @@
 
  <p><img alt="image.png" src="/assets/images/placeholder.png" /></p>
 
- The bubble still has the same size so our training efficiency is not significantly improved. However we only need to store activations for <d-math>p</d-math> micro-batches instead of <d-math>m</d-math> which quite reduce the activation memory explosion we had in the AFAB schedule. As a consequence we can add more microbatches which then will actually reduce the bubble.
+ <p>The bubble still has the same size so our training efficiency is not significantly improved. However we only need to store activations for <d-math>p</d-math> micro-batches instead of <d-math>m</d-math> which quite reduce the activation memory explosion we had in the AFAB schedule. As a consequence we can add more microbatches which then will actually reduce the bubble.</p>
 
  <p><img alt="image.png" src="/assets/images/placeholder.png" /></p>
 
src/index.html CHANGED
@@ -1122,7 +1122,7 @@
 
  <p><img alt="image.png" src="/assets/images/placeholder.png" /></p>
 
- The bubble still has the same size so our training efficiency is not significantly improved. However we only need to store activations for <d-math>p</d-math> micro-batches instead of <d-math>m</d-math> which quite reduce the activation memory explosion we had in the AFAB schedule. As a consequence we can add more microbatches which then will actually reduce the bubble.
+ <p>The bubble still has the same size so our training efficiency is not significantly improved. However we only need to store activations for <d-math>p</d-math> micro-batches instead of <d-math>m</d-math> which quite reduce the activation memory explosion we had in the AFAB schedule. As a consequence we can add more microbatches which then will actually reduce the bubble.</p>
 
  <p><img alt="image.png" src="/assets/images/placeholder.png" /></p>
 
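
Note on the paragraph wrapped by this commit: the trade-off it describes can be illustrated with a small sketch. This is not code from the repository. It assumes an idealized pipeline where every stage spends the same time on each micro-batch's forward and backward pass and communication is free, in which case the bubble (idle time relative to useful compute) is commonly written as (p - 1) / m for p pipeline stages and m micro-batches. The function names `bubble_ratio` and `peak_microbatches_in_flight` are hypothetical, and the 1F1B peak of min(p, m) stored micro-batches follows the paragraph's "p micro-batches instead of m".

```python
def bubble_ratio(p: int, m: int) -> float:
    """Idle time divided by useful compute time, assuming the idealized
    model (equal-cost stages, no communication): (p - 1) / m."""
    if p < 1 or m < 1:
        raise ValueError("p and m must be positive integers")
    return (p - 1) / m


def peak_microbatches_in_flight(p: int, m: int, schedule: str) -> int:
    """Number of micro-batches whose activations a stage may hold at once.

    Assumption: AFAB keeps all m micro-batches until the backward phase,
    while 1F1B keeps at most p (or m, if there are fewer micro-batches).
    """
    if p < 1 or m < 1:
        raise ValueError("p and m must be positive integers")
    if schedule == "afab":
        return m
    if schedule == "1f1b":
        return min(p, m)
    raise ValueError(f"unknown schedule: {schedule!r}")


if __name__ == "__main__":
    p = 4  # pipeline stages (illustrative value)
    for m in (4, 8, 32):  # micro-batches (illustrative values)
        print(
            f"m={m:>2}  bubble={bubble_ratio(p, m):.3f}  "
            f"afab_peak={peak_microbatches_in_flight(p, m, 'afab')}  "
            f"1f1b_peak={peak_microbatches_in_flight(p, m, '1f1b')}"
        )
```

Under these assumptions, for a fixed m both schedules share the same bubble, but only the AFAB activation count grows with m. That is why 1F1B can afford a larger m, which in turn shrinks the bubble.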