fix
- dist/index.html +1 -1
- src/index.html +1 -1
dist/index.html
CHANGED
@@ -1122,7 +1122,7 @@
 
 <p><img alt="image.png" src="/assets/images/placeholder.png" /></p>
 
-The bubble still has the same size so our training efficiency is not significantly improved. However we only need to store activations for <d-math>p</d-math> micro-batches instead of <d-math>m</d-math> which quite reduce the activation memory explosion we had in the AFAB schedule. As a consequence we can add more microbatches which then will actually reduce the bubble
+<p>The bubble still has the same size so our training efficiency is not significantly improved. However we only need to store activations for <d-math>p</d-math> micro-batches instead of <d-math>m</d-math> which quite reduce the activation memory explosion we had in the AFAB schedule. As a consequence we can add more microbatches which then will actually reduce the bubble.</p>
 
 <p><img alt="image.png" src="/assets/images/placeholder.png" /></p>
 
src/index.html
CHANGED
@@ -1122,7 +1122,7 @@
 
 <p><img alt="image.png" src="/assets/images/placeholder.png" /></p>
 
-The bubble still has the same size so our training efficiency is not significantly improved. However we only need to store activations for <d-math>p</d-math> micro-batches instead of <d-math>m</d-math> which quite reduce the activation memory explosion we had in the AFAB schedule. As a consequence we can add more microbatches which then will actually reduce the bubble
+<p>The bubble still has the same size so our training efficiency is not significantly improved. However we only need to store activations for <d-math>p</d-math> micro-batches instead of <d-math>m</d-math> which quite reduce the activation memory explosion we had in the AFAB schedule. As a consequence we can add more microbatches which then will actually reduce the bubble.</p>
 
 <p><img alt="image.png" src="/assets/images/placeholder.png" /></p>
 
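Note on the changed paragraph: its claim can be checked with a rough numerical sketch. The code below is not part of the commit. It assumes the usual idealized model in which every micro-batch takes the same forward and backward time on each of the p pipeline stages, so the bubble is (p - 1) / m of the ideal compute time for both schedules. The function names and the schedule labels "afab" and "1f1b" are hypothetical, chosen only for this illustration.

```python
def bubble_ratio(p: int, m: int) -> float:
    """Idle (bubble) time relative to ideal compute time.

    Assumption: idealized pipeline with p stages and m micro-batches,
    equal per-micro-batch cost on every stage, giving (p - 1) / m.
    The ratio is the same for AFAB and 1F1B.
    """
    return (p - 1) / m


def peak_stored_microbatches(p: int, m: int, schedule: str) -> int:
    """Micro-batches whose activations the first stage holds at peak.

    Assumption: AFAB keeps all m micro-batches alive until the backward
    phase starts, while 1F1B keeps at most p in flight.
    """
    if schedule == "afab":
        return m
    if schedule == "1f1b":
        return min(p, m)
    raise ValueError(f"unknown schedule: {schedule!r}")


if __name__ == "__main__":
    p = 4
    for m in (8, 32):
        print(
            f"p={p} m={m} "
            f"bubble={bubble_ratio(p, m):.3f} "
            f"afab_peak={peak_stored_microbatches(p, m, 'afab')} "
            f"1f1b_peak={peak_stored_microbatches(p, m, '1f1b')}"
        )
```

Under these assumptions, raising m from 8 to 32 with p = 4 shrinks the bubble while the 1F1B activation footprint stays at 4 micro-batches, which is the trade-off the paragraph describes.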