Spaces: nanotron/ultrascale-playbook

lvwerra (HF staff) committed 12 days ago

Commit 7c1ce28

1 Parent(s): 68a01c2

fix

Files changed (2)

dist/index.html CHANGED

@@ -1122,7 +1122,7 @@
         <p><img alt="image.png" src="/assets/images/placeholder.png" /></p>
-        The bubble still has the same size so our training efficiency is not significantly improved. However we only need to store activations for <d-math>p</d-math> micro-batches instead of <d-math>m</d-math> which quite reduce the activation memory explosion we had in the AFAB schedule. As a consequence we can add more microbatches which then will actually reduce the bubble.
+        <p>The bubble still has the same size so our training efficiency is not significantly improved. However we only need to store activations for <d-math>p</d-math> micro-batches instead of <d-math>m</d-math> which quite reduce the activation memory explosion we had in the AFAB schedule. As a consequence we can add more microbatches which then will actually reduce the bubble.</p>
         <p><img alt="image.png" src="/assets/images/placeholder.png" /></p>

src/index.html CHANGED

@@ -1122,7 +1122,7 @@
         <p><img alt="image.png" src="/assets/images/placeholder.png" /></p>
-        The bubble still has the same size so our training efficiency is not significantly improved. However we only need to store activations for <d-math>p</d-math> micro-batches instead of <d-math>m</d-math> which quite reduce the activation memory explosion we had in the AFAB schedule. As a consequence we can add more microbatches which then will actually reduce the bubble.
+        <p>The bubble still has the same size so our training efficiency is not significantly improved. However we only need to store activations for <d-math>p</d-math> micro-batches instead of <d-math>m</d-math> which quite reduce the activation memory explosion we had in the AFAB schedule. As a consequence we can add more microbatches which then will actually reduce the bubble.</p>
         <p><img alt="image.png" src="/assets/images/placeholder.png" /></p>
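
Note on the paragraph wrapped by this commit: the trade-off it describes can be sketched numerically. The snippet below is a minimal illustration, not code from the playbook. It assumes the commonly used bubble ratio (p - 1) / m for a non-interleaved pipeline with p stages and m micro-batches (equal time per micro-batch), and it assumes the peak number of stored micro-batches is m for AFAB and at most p for the one-forward-one-backward schedule the paragraph refers to. The function names are hypothetical.

```python
def bubble_ratio(p: int, m: int) -> float:
    """Idle time relative to useful compute, assuming the ratio (p - 1) / m.

    Under this assumption the ratio is the same for AFAB and for the
    non-interleaved one-forward-one-backward schedule.
    """
    return (p - 1) / m


def peak_stored_microbatches(schedule: str, p: int, m: int) -> int:
    """Peak number of micro-batches whose activations are held at once.

    Assumption: AFAB keeps all m micro-batches alive until the backward
    phase starts, while "1f1b" keeps at most p (capped at m when m < p).
    """
    if schedule == "afab":
        return m
    if schedule == "1f1b":
        return min(p, m)
    raise ValueError(f"unknown schedule: {schedule!r}")


if __name__ == "__main__":
    p = 4  # pipeline stages (illustrative value)
    for m in (8, 32):  # number of micro-batches (illustrative values)
        print(
            f"m={m}: bubble={bubble_ratio(p, m):.3f}, "
            f"afab stores {peak_stored_microbatches('afab', p, m)}, "
            f"1f1b stores {peak_stored_microbatches('1f1b', p, m)}"
        )
```

With p fixed, raising m from 8 to 32 shrinks the assumed bubble ratio while the 1f1b activation count stays at p. This is why the paragraph says that adding micro-batches becomes affordable once the activation memory no longer grows with m.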