hynky HF staff committed on
Commit f869662 · 2 Parent(s): d3b8b05 894c434

Merge remote-tracking branch 'origin/main' into small_optims

assets/images/5D_nutshell_tp_sp.svg CHANGED
assets/images/5d_nutshell_cp.svg CHANGED
assets/images/5d_nutshell_ep.svg CHANGED
dist/index.html CHANGED
@@ -1660,7 +1660,7 @@
 
  <p><strong>Tensor Parallelism</strong> (with Sequence Parallelism) is naturally complementary and can be combined with both Pipeline Parallelism and ZeRO-3 as it relies on the distributive property of matrix multiplications which allows weights and activations to be sharded and computed independently before being combined.</p>
 
- <img alt="TP & SP diagram" src="/assets/images/5D_nutshell_tp_sp.svg" style="width: 1000px; max-width: none;" />
+ <img alt="TP & SP diagram" src="/assets/images/5d_nutshell_tp_sp.svg" style="width: 1000px; max-width: none;" />
  <!-- <p><img alt="image.png" src="/assets/images/placeholder.png" /></p> -->
 
 
src/index.html CHANGED
@@ -1660,7 +1660,7 @@
 
  <p><strong>Tensor Parallelism</strong> (with Sequence Parallelism) is naturally complementary and can be combined with both Pipeline Parallelism and ZeRO-3 as it relies on the distributive property of matrix multiplications which allows weights and activations to be sharded and computed independently before being combined.</p>
 
- <img alt="TP & SP diagram" src="/assets/images/5D_nutshell_tp_sp.svg" style="width: 1000px; max-width: none;" />
+ <img alt="TP & SP diagram" src="/assets/images/5d_nutshell_tp_sp.svg" style="width: 1000px; max-width: none;" />
  <!-- <p><img alt="image.png" src="/assets/images/placeholder.png" /></p> -->
 
 
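
Note on the context paragraph: the Tensor Parallelism text carried in both hunks says that weights and activations can be sharded, computed independently, and then combined. The toy sketch below is not part of this commit; it simulates two "devices" as list entries in plain Python to show the two usual sharding patterns. The helper names (`matmul`, `split_columns`, `split_rows`, `column_parallel`, `row_parallel`) and the sample matrices are hypothetical and chosen only for illustration.

```python
# Toy illustration only: shards live in ordinary Python lists, and the
# "all-gather" / "all-reduce" steps are a plain concatenation / sum.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_columns(m, parts):
    """Split a matrix into `parts` equal blocks of columns."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]

def split_rows(m, parts):
    """Split a matrix into `parts` equal blocks of rows."""
    height = len(m) // parts
    return [m[i * height:(i + 1) * height] for i in range(parts)]

def column_parallel(x, w, parts=2):
    """Each shard holds a column block of W; outputs are concatenated."""
    partials = [matmul(x, w_shard) for w_shard in split_columns(w, parts)]
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

def row_parallel(x, w, parts=2):
    """Each shard holds a row block of W and the matching column block of X;
    the partial outputs are summed elementwise."""
    partials = [
        matmul(x_shard, w_shard)
        for x_shard, w_shard in zip(split_columns(x, parts), split_rows(w, parts))
    ]
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]

X = [[1, 2, 3, 4], [5, 6, 7, 8]]       # activations, shape 2 x 4
W = [[1, 0], [0, 1], [2, 1], [1, 2]]   # weights, shape 4 x 2

reference = matmul(X, W)
print(column_parallel(X, W) == reference)
print(row_parallel(X, W) == reference)
```

Both sharded variants reproduce the unsharded product. In the row-sharded case the sum over partial products is where distributivity is used: X·W = X₁·W₁ + X₂·W₂ when X is split by columns and W by rows.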