Spaces:
Running
Running
Merge branch 'main' of hf.co:spaces/nanotron/Nanotron-Gigablogpost
Browse files- dist/fragments/memory-profile.html +0 -0
- dist/fragments/memory-recomputation.html +1 -1
- dist/index.html +26 -35
- src/fragments/cp_8Bmemoryusage.html +1 -0
- src/fragments/dp_ourjourney_memoryusage.html +1 -0
- src/fragments/dp_scaling.html +1 -0
- src/fragments/memory-profile.html +0 -0
- src/fragments/memory-recomputation.html +1 -1
- src/fragments/memusage_activations.html +1 -0
- src/fragments/pp_bubblesize.html +1 -0
- src/fragments/pp_comm_bandwidth.html +1 -0
- src/fragments/pp_memoryusage.html +1 -0
- src/fragments/tp_memoryusage.html +1 -0
- src/fragments/tp_scaling.html +1 -0
- src/fragments/tp_sp_memoryusage.html +1 -0
- src/fragments/tp_sp_scaling.html +1 -0
- src/fragments/zero3_memoryusage.html +1 -0
- src/index.html +26 -35
dist/fragments/memory-profile.html
CHANGED
The diff for this file is too large to render.
See raw diff
|
|
dist/fragments/memory-recomputation.html
CHANGED
@@ -1 +1 @@
|
|
1 |
-
<div> <div id=3a29a7b6-5a6e-4e51-aabe-515dafefbd63 class=plotly-graph-div style="height:500px; width:850px;"></div> <script>window.PLOTLYENV=window.PLOTLYENV||{},document.getElementById("3a29a7b6-5a6e-4e51-aabe-515dafefbd63")&&Plotly.newPlot("3a29a7b6-5a6e-4e51-aabe-515dafefbd63",[{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(78, 165, 183)"},name:"parameters",showlegend:!0,visible:!0,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAEjnD0AAAAAASPcPQAAAAACkCxBAAAAAAKQrEEA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(227, 138, 66)"},name:"gradients",showlegend:!0,visible:!0,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAEjnD0AAAAAASPcPQAAAAACkCxBAAAAAAKQrEEA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(232, 137, 171)"},name:"optimizer",showlegend:!0,visible:!0,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAEjnH0AAAAAASPcfQAAAAACkCyBAAAAAAKQrIEA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(206, 192, 250)"},name:"activations",showlegend:!0,visible:!0,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAACEDEAAAAAAAEIoQAAAAAAAIUZAAAAAAIAQZUA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(78, 165, 183)"},name:"parameters",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAKWXKkAAAAAApZ0qQAAAAAClqSpAAAAAAKXBKkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(227, 138, 66)"},name:"gradients",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAKWXKkAAAAAApZ0qQAAAAAClqSpAAAAAAKXBKkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(232, 137, 171)"},name:"optimizer",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAKWXOkAAAAAApZ06QAAAAAClqTpAAAAAAKXBOkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(206, 192, 250)"},name:"activations",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAABLGEAAAAAAgLUyQAAAAACA1U9AAAAAAMAKbUA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(78, 165, 183)"},name:"parameters",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAKL6OUAAAAAAov45QAAAAACiBjpAAAAAAKIWOkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(227, 138, 66)"},name:"gradients",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAKL6OUAAAAAAov45QAAAAACiBjpAAAAAAKIWOkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(232, 137, 171)"},name:"optimizer",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAKL6SUAAAAAAov5JQAAAAACiBkpAAAAAAKIWSkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(206, 192, 250)"},name:"activations",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAACCIkAAAAAAAII8QAAAAAAAQVhAAAAAAIAgdkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(78, 165, 183)"},name:"parameters",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAgER/bkAAAACARIBuQAAAAIBEgm5AAAAAgESGbkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(227, 138, 66)"},name:"gradients",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAgER/bkAAAACARIBuQAAAAIBEgm5AAAAAgESGbkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(232, 137, 171)"},name:"optimizer",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAgER/fkAAAACARIB+QAAAAIBEgn5AAAAAgESGfkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(206, 192, 250)"},name:"activations",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAAAhR0AAAAAAgNBhQAAAAACAUH5AAAAAAECom0A="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(78, 165, 183)"},name:"parameters",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAgPa/l0AAAACANsCXQAAAAIC2wJdAAAAAgLbBl0A="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(227, 138, 66)"},name:"gradients",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAgPa/l0AAAACANsCXQAAAAIC2wJdAAAAAgLbBl0A="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(232, 137, 171)"},name:"optimizer",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAgPa/p0AAAACANsCnQAAAAIC2wKdAAAAAgLbBp0A="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(206, 192, 250)"},name:"activations",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAIA2YkAAAAAAgA58QAAAAABA35dAAAAAAKDHtUA="},type:"bar"}],{template:{data:{barpolar:[{marker:{line:{color:"white",width:.5},pattern:{fillmode:"overlay",size:10,solidity:.2}},type:"barpolar"}],bar:[{error_x:{color:"#2a3f5f"},error_y:{color:"#2a3f5f"},marker:{line:{color:"white",width:.5},pattern:{fillmode:"overlay",size:10,solidity:.2}},type:"bar"}],carpet:[{aaxis:{endlinecolor:"#2a3f5f",gridcolor:"#C8D4E3",linecolor:"#C8D4E3",minorgridcolor:"#C8D4E3",startlinecolor:"#2a3f5f"},baxis:{endlinecolor:"#2a3f5f",gridcolor:"#C8D4E3",linecolor:"#C8D4E3",minorgridcolor:"#C8D4E3",startlinecolor:"#2a3f5f"},type:"carpet"}],choropleth:[{colorbar:{outlinewidth:0,ticks:""},type:"choropleth"}],contourcarpet:[{colorbar:{outlinewidth:0,ticks:""},type:"contourcarpet"}],contour:[{colorbar:{outlinewidth:0,ticks:""},colorscale:[[0,"#0d0887"],[.1111111111111111,"#46039f"],[.2222222222222222,"#7201a8"],[.3333333333333333,"#9c179e"],[.4444444444444444,"#bd3786"],[.5555555555555556,"#d8576b"],[.6666666666666666,"#ed7953"],[.7777777777777778,"#fb9f3a"],[.8888888888888888,"#fdca26"],[1,"#f0f921"]],type:"contour"}],heatmap:[{colorbar:{outlinewidth:0,ticks:""},colorscale:[[0,"#0d0887"],[.1111111111111111,"#46039f"],[.2222222222222222,"#7201a8"],[.3333333333333333,"#9c179e"],[.4444444444444444,"#bd3786"],[.5555555555555556,"#d8576b"],[.6666666666666666,"#ed7953"],[.7777777777777778,"#fb9f3a"],[.8888888888888888,"#fdca26"],[1,"#f0f921"]],type:"heatmap"}],histogram2dcontour:[{colorbar:{outlinewidth:0,ticks:""},colorscale:[[0,"#0d0887"],[.1111111111111111,"#46039f"],[.2222222222222222,"#7201a8"],[.3333333333333333,"#9c179e"],[.4444444444444444,"#bd3786"],[.5555555555555556,"#d8576b"],[.6666666666666666,"#ed7953"],[.7777777777777778,"#fb9f3a"],[.8888888888888888,"#fdca26"],[1,"#f0f921"]],type:"histogram2dcontour"}],histogram2d:[{colorbar:{outlinewidth:0,ticks:""},colorscale:[[0,"#0d0887"],[.1111111111111111,"#46039f"],[.2222222222222222,"#7201a8"],[.3333333333333333,"#9c179e"],[.4444444444444444,"#bd3786"],[.5555555555555556,"#d8576b"],[.6666666666666666,"#ed7953"],[.7777777777777778,"#fb9f3a"],[.8888888888888888,"#fdca26"],[1,"#f0f921"]],type:"histogram2d"}],histogram:[{marker:{pattern:{fillmode:"overlay",size:10,solidity:.2}},type:"histogram"}],mesh3d:[{colorbar:{outlinewidth:0,ticks:""},type:"mesh3d"}],parcoords:[{line:{colorbar:{outlinewidth:0,ticks:""}},type:"parcoords"}],pie:[{automargin:!0,type:"pie"}],scatter3d:[{line:{colorbar:{outlinewidth:0,ticks:""}},marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scatter3d"}],scattercarpet:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scattercarpet"}],scattergeo:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scattergeo"}],scattergl:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scattergl"}],scattermapbox:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scattermapbox"}],scattermap:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scattermap"}],scatterpolargl:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scatterpolargl"}],scatterpolar:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scatterpolar"}],scatter:[{fillpattern:{fillmode:"overlay",size:10,solidity:.2},type:"scatter"}],scatterternary:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scatterternary"}],surface:[{colorbar:{outlinewidth:0,ticks:""},colorscale:[[0,"#0d0887"],[.1111111111111111,"#46039f"],[.2222222222222222,"#7201a8"],[.3333333333333333,"#9c179e"],[.4444444444444444,"#bd3786"],[.5555555555555556,"#d8576b"],[.6666666666666666,"#ed7953"],[.7777777777777778,"#fb9f3a"],[.8888888888888888,"#fdca26"],[1,"#f0f921"]],type:"surface"}],table:[{cells:{fill:{color:"#EBF0F8"},line:{color:"white"}},header:{fill:{color:"#C8D4E3"},line:{color:"white"}},type:"table"}]},layout:{annotationdefaults:{arrowcolor:"#2a3f5f",arrowhead:0,arrowwidth:1},autotypenumbers:"strict",coloraxis:{colorbar:{outlinewidth:0,ticks:""}},colorscale:{diverging:[[0,"#8e0152"],[.1,"#c51b7d"],[.2,"#de77ae"],[.3,"#f1b6da"],[.4,"#fde0ef"],[.5,"#f7f7f7"],[.6,"#e6f5d0"],[.7,"#b8e186"],[.8,"#7fbc41"],[.9,"#4d9221"],[1,"#276419"]],sequential:[[0,"#0d0887"],[.1111111111111111,"#46039f"],[.2222222222222222,"#7201a8"],[.3333333333333333,"#9c179e"],[.4444444444444444,"#bd3786"],[.5555555555555556,"#d8576b"],[.6666666666666666,"#ed7953"],[.7777777777777778,"#fb9f3a"],[.8888888888888888,"#fdca26"],[1,"#f0f921"]],sequentialminus:[[0,"#0d0887"],[.1111111111111111,"#46039f"],[.2222222222222222,"#7201a8"],[.3333333333333333,"#9c179e"],[.4444444444444444,"#bd3786"],[.5555555555555556,"#d8576b"],[.6666666666666666,"#ed7953"],[.7777777777777778,"#fb9f3a"],[.8888888888888888,"#fdca26"],[1,"#f0f921"]]},colorway:["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],font:{color:"#2a3f5f"},geo:{bgcolor:"white",lakecolor:"white",landcolor:"white",showlakes:!0,showland:!0,subunitcolor:"#C8D4E3"},hoverlabel:{align:"left"},hovermode:"closest",mapbox:{style:"light"},paper_bgcolor:"white",plot_bgcolor:"white",polar:{angularaxis:{gridcolor:"#EBF0F8",linecolor:"#EBF0F8",ticks:""},bgcolor:"white",radialaxis:{gridcolor:"#EBF0F8",linecolor:"#EBF0F8",ticks:""}},scene:{xaxis:{backgroundcolor:"white",gridcolor:"#DFE8F3",gridwidth:2,linecolor:"#EBF0F8",showbackground:!0,ticks:"",zerolinecolor:"#EBF0F8"},yaxis:{backgroundcolor:"white",gridcolor:"#DFE8F3",gridwidth:2,linecolor:"#EBF0F8",showbackground:!0,ticks:"",zerolinecolor:"#EBF0F8"},zaxis:{backgroundcolor:"white",gridcolor:"#DFE8F3",gridwidth:2,linecolor:"#EBF0F8",showbackground:!0,ticks:"",zerolinecolor:"#EBF0F8"}},shapedefaults:{line:{color:"#2a3f5f"}},ternary:{aaxis:{gridcolor:"#DFE8F3",linecolor:"#A2B1C6",ticks:""},baxis:{gridcolor:"#DFE8F3",linecolor:"#A2B1C6",ticks:""},bgcolor:"white",caxis:{gridcolor:"#DFE8F3",linecolor:"#A2B1C6",ticks:""}},title:{x:.05},xaxis:{automargin:!0,gridcolor:"#EBF0F8",linecolor:"#EBF0F8",ticks:"",title:{standoff:15},zerolinecolor:"#EBF0F8",zerolinewidth:2},yaxis:{automargin:!0,gridcolor:"#EBF0F8",linecolor:"#EBF0F8",ticks:"",title:{standoff:15},zerolinecolor:"#EBF0F8",zerolinewidth:2}}},title:{text:"Memory Usage with Recomputation",x:.5,y:.95},margin:{r:80},legend:{y:.95,x:1.02,xanchor:"left",yanchor:"top"},updatemenus:[{active:0,buttons:[{args:[{visible:[!0,!0,!0,!0,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1]},{"yaxis.range":[0,203.1547058105469]}],label:"1B",method:"update"},{args:[{visible:[!1,!1,!1,!1,!0,!0,!0,!0,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1]},{"yaxis.range":[0,314.4336639404297]}],label:"3B",method:"update"},{args:[{visible:[!1,!1,!1,!1,!1,!1,!1,!1,!0,!0,!0,!0,!1,!1,!1,!1,!1,!1,!1,!1]},{"yaxis.range":[0,504.2233764648438]}],label:"8B",method:"update"},{args:[{visible:[!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!0,!0,!0,!0,!1,!1,!1,!1]},{"yaxis.range":[0,3021.530541992188]}],label:"70B",method:"update"},{args:[{visible:[!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!0,!0,!0,!0]},{"yaxis.range":[0,12823.0716796875]}],label:"405B",method:"update"}],direction:"down",showactive:!0,x:1.035,xanchor:"left",y:.6,yanchor:"top"},{active:0,buttons:[{args:[{y:[[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[7.975860595703125,7.991485595703125,8.022735595703125,8.085235595703125],[3.564453125,12.12890625,44.2578125,168.515625],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[26.592361450195313,26.615798950195313,26.662673950195313,26.756423950195313],[6.0732421875,18.708984375,63.66796875,232.3359375],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[51.95806884765625,51.98931884765625,52.05181884765625,52.17681884765625],[9.25390625,28.5078125,97.015625,354.03125],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[487.9542236328125,488.0167236328125,488.1417236328125,488.3917236328125],[46.2578125,142.515625,485.03125,1770.0625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[3039.9814453125,3040.1064453125,3040.3564453125,3040.8564453125],[145.703125,448.90625,1527.8125,5575.625]]}],label:"None",method:"restyle"},{args:[{y:[[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[7.975860595703125,7.991485595703125,8.022735595703125,8.085235595703125],[1.064453125,2.12890625,4.2578125,8.515625],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[26.592361450195313,26.615798950195313,26.662673950195313,26.756423950195313],[2.7919921875,5.583984375,11.16796875,22.3359375],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[51.95806884765625,51.98931884765625,52.05181884765625,52.17681884765625],[4.25390625,8.5078125,17.015625,34.03125],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[487.9542236328125,488.0167236328125,488.1417236328125,488.3917236328125],[21.2578125,42.515625,85.03125,170.0625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[3039.9814453125,3040.1064453125,3040.3564453125,3040.8564453125],[66.953125,133.90625,267.8125,535.625]]}],label:"selective",method:"restyle"},{args:[{y:[[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[7.975860595703125,7.991485595703125,8.022735595703125,8.085235595703125],[.064453125,.12890625,.2578125,.515625],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[26.592361450195313,26.615798950195313,26.662673950195313,26.756423950195313],[.1669921875,.333984375,.66796875,1.3359375],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[51.95806884765625,51.98931884765625,52.05181884765625,52.17681884765625],[.25390625,.5078125,1.015625,2.03125],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[487.9542236328125,488.0167236328125,488.1417236328125,488.3917236328125],[1.2578125,2.515625,5.03125,10.0625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[3039.9814453125,3040.1064453125,3040.3564453125,3040.8564453125],[3.953125,7.90625,15.8125,31.625]]}],label:"full",method:"restyle"}],direction:"down",showactive:!0,x:1.035,xanchor:"left",y:.4,yanchor:"top"}],barmode:"stack",xaxis:{title:{text:"Sequence Length"}},yaxis:{title:{text:"Memory (GB)"},range:[0,203.1547058105469]},width:850,height:500,annotations:[{showarrow:!1,text:"Model Size:",x:1.035,xanchor:"left",xref:"paper",y:.6,yanchor:"bottom",yref:"paper"},{showarrow:!1,text:"Recomputation:",x:1.035,xanchor:"left",xref:"paper",y:.4,yanchor:"bottom",yref:"paper"}]},{responsive:!0,scrollZoom:!1})</script> </div>
|
|
|
1 |
+
<div> <div id=3a29a7b6-5a6e-4e51-aabe-515dafefbd63 class=plotly-graph-div style="height:500px; width:850px;"></div> <script>window.PLOTLYENV=window.PLOTLYENV||{},document.getElementById("3a29a7b6-5a6e-4e51-aabe-515dafefbd63")&&Plotly.newPlot("3a29a7b6-5a6e-4e51-aabe-515dafefbd63",[{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(78, 165, 183)"},name:"parameters",showlegend:!0,visible:!0,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAEjnD0AAAAAASPcPQAAAAACkCxBAAAAAAKQrEEA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(227, 138, 66)"},name:"gradients",showlegend:!0,visible:!0,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAEjnD0AAAAAASPcPQAAAAACkCxBAAAAAAKQrEEA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(232, 137, 171)"},name:"optimizer",showlegend:!0,visible:!0,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAEjnH0AAAAAASPcfQAAAAACkCyBAAAAAAKQrIEA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(206, 192, 250)"},name:"activations",showlegend:!0,visible:!0,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAACEDEAAAAAAAEIoQAAAAAAAIUZAAAAAAIAQZUA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(78, 165, 183)"},name:"parameters",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAKWXKkAAAAAApZ0qQAAAAAClqSpAAAAAAKXBKkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(227, 138, 66)"},name:"gradients",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAKWXKkAAAAAApZ0qQAAAAAClqSpAAAAAAKXBKkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(232, 137, 171)"},name:"optimizer",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAKWXOkAAAAAApZ06QAAAAAClqTpAAAAAAKXBOkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(206, 192, 250)"},name:"activations",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAABLGEAAAAAAgLUyQAAAAACA1U9AAAAAAMAKbUA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(78, 165, 183)"},name:"parameters",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAKL6OUAAAAAAov45QAAAAACiBjpAAAAAAKIWOkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(227, 138, 66)"},name:"gradients",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAKL6OUAAAAAAov45QAAAAACiBjpAAAAAAKIWOkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(232, 137, 171)"},name:"optimizer",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAKL6SUAAAAAAov5JQAAAAACiBkpAAAAAAKIWSkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(206, 192, 250)"},name:"activations",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAACCIkAAAAAAAII8QAAAAAAAQVhAAAAAAIAgdkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(78, 165, 183)"},name:"parameters",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAgER/bkAAAACARIBuQAAAAIBEgm5AAAAAgESGbkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(227, 138, 66)"},name:"gradients",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAgER/bkAAAACARIBuQAAAAIBEgm5AAAAAgESGbkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(232, 137, 171)"},name:"optimizer",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAgER/fkAAAACARIB+QAAAAIBEgn5AAAAAgESGfkA="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(206, 192, 250)"},name:"activations",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAAAhR0AAAAAAgNBhQAAAAACAUH5AAAAAAECom0A="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(78, 165, 183)"},name:"parameters",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAgPa/l0AAAACANsCXQAAAAIC2wJdAAAAAgLbBl0A="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(227, 138, 66)"},name:"gradients",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAgPa/l0AAAACANsCXQAAAAIC2wJdAAAAAgLbBl0A="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(232, 137, 171)"},name:"optimizer",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAgPa/p0AAAACANsCnQAAAAIC2wKdAAAAAgLbBp0A="},type:"bar"},{hovertemplate:"Seq len=%{x}<br>Mem=%{y:.1f}GB<br>%{data.name}<extra></extra>",marker:{color:"rgb(206, 192, 250)"},name:"activations",showlegend:!0,visible:!1,x:["1024","2048","4096","8192"],y:{dtype:"f8",bdata:"AAAAAIA2YkAAAAAAgA58QAAAAABA35dAAAAAAKDHtUA="},type:"bar"}],{template:{data:{barpolar:[{marker:{line:{color:"white",width:.5},pattern:{fillmode:"overlay",size:10,solidity:.2}},type:"barpolar"}],bar:[{error_x:{color:"#2a3f5f"},error_y:{color:"#2a3f5f"},marker:{line:{color:"white",width:.5},pattern:{fillmode:"overlay",size:10,solidity:.2}},type:"bar"}],carpet:[{aaxis:{endlinecolor:"#2a3f5f",gridcolor:"#C8D4E3",linecolor:"#C8D4E3",minorgridcolor:"#C8D4E3",startlinecolor:"#2a3f5f"},baxis:{endlinecolor:"#2a3f5f",gridcolor:"#C8D4E3",linecolor:"#C8D4E3",minorgridcolor:"#C8D4E3",startlinecolor:"#2a3f5f"},type:"carpet"}],choropleth:[{colorbar:{outlinewidth:0,ticks:""},type:"choropleth"}],contourcarpet:[{colorbar:{outlinewidth:0,ticks:""},type:"contourcarpet"}],contour:[{colorbar:{outlinewidth:0,ticks:""},colorscale:[[0,"#0d0887"],[.1111111111111111,"#46039f"],[.2222222222222222,"#7201a8"],[.3333333333333333,"#9c179e"],[.4444444444444444,"#bd3786"],[.5555555555555556,"#d8576b"],[.6666666666666666,"#ed7953"],[.7777777777777778,"#fb9f3a"],[.8888888888888888,"#fdca26"],[1,"#f0f921"]],type:"contour"}],heatmap:[{colorbar:{outlinewidth:0,ticks:""},colorscale:[[0,"#0d0887"],[.1111111111111111,"#46039f"],[.2222222222222222,"#7201a8"],[.3333333333333333,"#9c179e"],[.4444444444444444,"#bd3786"],[.5555555555555556,"#d8576b"],[.6666666666666666,"#ed7953"],[.7777777777777778,"#fb9f3a"],[.8888888888888888,"#fdca26"],[1,"#f0f921"]],type:"heatmap"}],histogram2dcontour:[{colorbar:{outlinewidth:0,ticks:""},colorscale:[[0,"#0d0887"],[.1111111111111111,"#46039f"],[.2222222222222222,"#7201a8"],[.3333333333333333,"#9c179e"],[.4444444444444444,"#bd3786"],[.5555555555555556,"#d8576b"],[.6666666666666666,"#ed7953"],[.7777777777777778,"#fb9f3a"],[.8888888888888888,"#fdca26"],[1,"#f0f921"]],type:"histogram2dcontour"}],histogram2d:[{colorbar:{outlinewidth:0,ticks:""},colorscale:[[0,"#0d0887"],[.1111111111111111,"#46039f"],[.2222222222222222,"#7201a8"],[.3333333333333333,"#9c179e"],[.4444444444444444,"#bd3786"],[.5555555555555556,"#d8576b"],[.6666666666666666,"#ed7953"],[.7777777777777778,"#fb9f3a"],[.8888888888888888,"#fdca26"],[1,"#f0f921"]],type:"histogram2d"}],histogram:[{marker:{pattern:{fillmode:"overlay",size:10,solidity:.2}},type:"histogram"}],mesh3d:[{colorbar:{outlinewidth:0,ticks:""},type:"mesh3d"}],parcoords:[{line:{colorbar:{outlinewidth:0,ticks:""}},type:"parcoords"}],pie:[{automargin:!0,type:"pie"}],scatter3d:[{line:{colorbar:{outlinewidth:0,ticks:""}},marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scatter3d"}],scattercarpet:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scattercarpet"}],scattergeo:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scattergeo"}],scattergl:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scattergl"}],scattermapbox:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scattermapbox"}],scattermap:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scattermap"}],scatterpolargl:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scatterpolargl"}],scatterpolar:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scatterpolar"}],scatter:[{fillpattern:{fillmode:"overlay",size:10,solidity:.2},type:"scatter"}],scatterternary:[{marker:{colorbar:{outlinewidth:0,ticks:""}},type:"scatterternary"}],surface:[{colorbar:{outlinewidth:0,ticks:""},colorscale:[[0,"#0d0887"],[.1111111111111111,"#46039f"],[.2222222222222222,"#7201a8"],[.3333333333333333,"#9c179e"],[.4444444444444444,"#bd3786"],[.5555555555555556,"#d8576b"],[.6666666666666666,"#ed7953"],[.7777777777777778,"#fb9f3a"],[.8888888888888888,"#fdca26"],[1,"#f0f921"]],type:"surface"}],table:[{cells:{fill:{color:"#EBF0F8"},line:{color:"white"}},header:{fill:{color:"#C8D4E3"},line:{color:"white"}},type:"table"}]},layout:{annotationdefaults:{arrowcolor:"#2a3f5f",arrowhead:0,arrowwidth:1},autotypenumbers:"strict",coloraxis:{colorbar:{outlinewidth:0,ticks:""}},colorscale:{diverging:[[0,"#8e0152"],[.1,"#c51b7d"],[.2,"#de77ae"],[.3,"#f1b6da"],[.4,"#fde0ef"],[.5,"#f7f7f7"],[.6,"#e6f5d0"],[.7,"#b8e186"],[.8,"#7fbc41"],[.9,"#4d9221"],[1,"#276419"]],sequential:[[0,"#0d0887"],[.1111111111111111,"#46039f"],[.2222222222222222,"#7201a8"],[.3333333333333333,"#9c179e"],[.4444444444444444,"#bd3786"],[.5555555555555556,"#d8576b"],[.6666666666666666,"#ed7953"],[.7777777777777778,"#fb9f3a"],[.8888888888888888,"#fdca26"],[1,"#f0f921"]],sequentialminus:[[0,"#0d0887"],[.1111111111111111,"#46039f"],[.2222222222222222,"#7201a8"],[.3333333333333333,"#9c179e"],[.4444444444444444,"#bd3786"],[.5555555555555556,"#d8576b"],[.6666666666666666,"#ed7953"],[.7777777777777778,"#fb9f3a"],[.8888888888888888,"#fdca26"],[1,"#f0f921"]]},colorway:["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],font:{color:"#2a3f5f"},geo:{bgcolor:"white",lakecolor:"white",landcolor:"white",showlakes:!0,showland:!0,subunitcolor:"#C8D4E3"},hoverlabel:{align:"left"},hovermode:"closest",mapbox:{style:"light"},paper_bgcolor:"white",plot_bgcolor:"white",polar:{angularaxis:{gridcolor:"#EBF0F8",linecolor:"#EBF0F8",ticks:""},bgcolor:"white",radialaxis:{gridcolor:"#EBF0F8",linecolor:"#EBF0F8",ticks:""}},scene:{xaxis:{backgroundcolor:"white",gridcolor:"#DFE8F3",gridwidth:2,linecolor:"#EBF0F8",showbackground:!0,ticks:"",zerolinecolor:"#EBF0F8"},yaxis:{backgroundcolor:"white",gridcolor:"#DFE8F3",gridwidth:2,linecolor:"#EBF0F8",showbackground:!0,ticks:"",zerolinecolor:"#EBF0F8"},zaxis:{backgroundcolor:"white",gridcolor:"#DFE8F3",gridwidth:2,linecolor:"#EBF0F8",showbackground:!0,ticks:"",zerolinecolor:"#EBF0F8"}},shapedefaults:{line:{color:"#2a3f5f"}},ternary:{aaxis:{gridcolor:"#DFE8F3",linecolor:"#A2B1C6",ticks:""},baxis:{gridcolor:"#DFE8F3",linecolor:"#A2B1C6",ticks:""},bgcolor:"white",caxis:{gridcolor:"#DFE8F3",linecolor:"#A2B1C6",ticks:""}},title:{x:.05},xaxis:{automargin:!0,gridcolor:"#EBF0F8",linecolor:"#EBF0F8",ticks:"",title:{standoff:15},zerolinecolor:"#EBF0F8",zerolinewidth:2},yaxis:{automargin:!0,gridcolor:"#EBF0F8",linecolor:"#EBF0F8",ticks:"",title:{standoff:15},zerolinecolor:"#EBF0F8",zerolinewidth:2}}},title:{text:"Memory Usage with Recomputation",x:.5,y:.95},margin:{r:80},legend:{y:.95,x:1.02,xanchor:"left",yanchor:"top"},updatemenus:[{active:0,buttons:[{args:[{visible:[!0,!0,!0,!0,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1]},{"yaxis.range":[0,203.1547058105469]}],label:"1B",method:"update"},{args:[{visible:[!1,!1,!1,!1,!0,!0,!0,!0,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1]},{"yaxis.range":[0,314.4336639404297]}],label:"3B",method:"update"},{args:[{visible:[!1,!1,!1,!1,!1,!1,!1,!1,!0,!0,!0,!0,!1,!1,!1,!1,!1,!1,!1,!1]},{"yaxis.range":[0,504.2233764648438]}],label:"8B",method:"update"},{args:[{visible:[!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!0,!0,!0,!0,!1,!1,!1,!1]},{"yaxis.range":[0,3021.530541992188]}],label:"70B",method:"update"},{args:[{visible:[!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!1,!0,!0,!0,!0]},{"yaxis.range":[0,12823.0716796875]}],label:"405B",method:"update"}],direction:"down",showactive:!0,x:1.035,xanchor:"left",y:.6,yanchor:"top"},{active:0,buttons:[{args:[{y:[[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[7.975860595703125,7.991485595703125,8.022735595703125,8.085235595703125],[3.564453125,12.12890625,44.2578125,168.515625],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[26.592361450195313,26.615798950195313,26.662673950195313,26.756423950195313],[6.0732421875,18.708984375,63.66796875,232.3359375],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[51.95806884765625,51.98931884765625,52.05181884765625,52.17681884765625],[9.25390625,28.5078125,97.015625,354.03125],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[487.9542236328125,488.0167236328125,488.1417236328125,488.3917236328125],[46.2578125,142.515625,485.03125,1770.0625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[3039.9814453125,3040.1064453125,3040.3564453125,3040.8564453125],[145.703125,448.90625,1527.8125,5575.625]]}],label:"None",method:"restyle"},{args:[{y:[[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[7.975860595703125,7.991485595703125,8.022735595703125,8.085235595703125],[1.064453125,2.12890625,4.2578125,8.515625],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[26.592361450195313,26.615798950195313,26.662673950195313,26.756423950195313],[2.7919921875,5.583984375,11.16796875,22.3359375],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[51.95806884765625,51.98931884765625,52.05181884765625,52.17681884765625],[4.25390625,8.5078125,17.015625,34.03125],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[487.9542236328125,488.0167236328125,488.1417236328125,488.3917236328125],[21.2578125,42.515625,85.03125,170.0625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[3039.9814453125,3040.1064453125,3040.3564453125,3040.8564453125],[66.953125,133.90625,267.8125,535.625]]}],label:"selective",method:"restyle"},{args:[{y:[[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[7.975860595703125,7.991485595703125,8.022735595703125,8.085235595703125],[.064453125,.12890625,.2578125,.515625],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[26.592361450195313,26.615798950195313,26.662673950195313,26.756423950195313],[.1669921875,.333984375,.66796875,1.3359375],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[51.95806884765625,51.98931884765625,52.05181884765625,52.17681884765625],[.25390625,.5078125,1.015625,2.03125],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[487.9542236328125,488.0167236328125,488.1417236328125,488.3917236328125],[1.2578125,2.515625,5.03125,10.0625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[3039.9814453125,3040.1064453125,3040.3564453125,3040.8564453125],[3.953125,7.90625,15.8125,31.625]]}],label:"full",method:"restyle"}],direction:"down",showactive:!0,x:1.035,xanchor:"left",y:.4,yanchor:"top"}],barmode:"stack",xaxis:{title:{text:"Sequence Length"}},yaxis:{title:{text:"Memory (GB)"},range:[0,203.1547058105469]},width:850,height:500,annotations:[{showarrow:!1,text:"Model Size:",x:1.035,xanchor:"left",xref:"paper",y:.6,yanchor:"bottom",yref:"paper"},{showarrow:!1,text:"Recomputation:",x:1.035,xanchor:"left",xref:"paper",y:.4,yanchor:"bottom",yref:"paper"}]},{responsive:!0,scrollZoom:!1})</script> </div>
|
dist/index.html
CHANGED
@@ -2407,53 +2407,45 @@
|
|
2407 |
<h2>Conclusion</h2>
|
2408 |
|
2409 |
|
2410 |
-
<p>Congratulations!
|
2411 |
|
2412 |
<p><img alt="image.png" src="/assets/images/conclusion_llama3_parallelism.png" /></p>
|
2413 |
|
2414 |
-
<p>
|
2415 |
|
2416 |
-
<p>
|
2417 |
-
|
2418 |
-
<p>Let’s start with a brief recap of all the things we covered in these past hours and days!</p>
|
2419 |
-
|
2420 |
-
<h3>What you learned</h3>
|
2421 |
-
|
2422 |
-
<p>Working through this whole blog post you mastered a ranged of concepts:</p>
|
2423 |
-
|
2424 |
-
<ul>
|
2425 |
-
<li>Basic principle of model training</li>
|
2426 |
-
<li>Collective communication primitives </li>
|
2427 |
-
<li>Memory anatomy of a LLM</li>
|
2428 |
-
<li>Distributed training with DP and ZeRO </li>
|
2429 |
-
<li>Model parallelism with TP, SP, CP and PP</li>
|
2430 |
-
<li>Fast kernels and mixed precision training</li>
|
2431 |
-
<li>Overlapping communication and computation</li>
|
2432 |
-
<li>Profiling distributed training</li>
|
2433 |
-
</ul>
|
2434 |
|
2435 |
-
<p>
|
2436 |
|
2437 |
<h3>What we learned</h3>
|
2438 |
|
2439 |
-
<p>
|
|
|
|
|
|
|
|
|
|
|
2440 |
</p>
|
2441 |
|
2442 |
<ul>
|
2443 |
<li>PyTorch processes would sometimes fail to clean up properly</li>
|
2444 |
<li>Slurm job manager would forcefully terminate jobs, leading to node failures </li>
|
2445 |
<li>Simple benchmarks that should take minutes would stretch into hours</li>
|
2446 |
-
<li>
|
2447 |
-
|
2448 |
-
|
2449 |
-
|
2450 |
-
|
2451 |
-
|
2452 |
-
</
|
|
|
|
|
|
|
2453 |
</ul>
|
2454 |
|
2455 |
<p>These challenges deserve their own story, but they taught us valuable lessons about the complexities of distributed training infrastructure. What looks simple in theory often requires careful attention to many moving parts in practice.</p>
|
2456 |
|
|
|
2457 |
<p>Let's analyze the results of our benchmarks and understand how different configurations affect each other. All benchmarks were run with a sequence length of 4096 and a global batch size of 1M tokens. We'll look at two key visualizations that help illustrate our findings.
|
2458 |
</p>
|
2459 |
|
@@ -2464,7 +2456,6 @@
|
|
2464 |
|
2465 |
<p>To complement this, let's look at the relationships between different parameters:</p>
|
2466 |
|
2467 |
-
<!-- <p><img alt="image.png" src="/assets/images/what_we_learnt_parallel_coordinates.html" /></p> -->
|
2468 |
<iframe id="plotFrame" src="/assets/images/what_we_learnt_parallel_coordinates.html" height="540" width="1000" scrolling="no" frameborder="0"></iframe>
|
2469 |
|
2470 |
<p>Parallel coordinates plot showing the relationship between different model parallelism configurations (Data Parallel degree, Tensor Parallel degree, Pipeline Parallel degree), training hyperparameters (gradient accumulation steps, micro batch size), ZeRO stage and the resulting Model FLOPs Utilization (MFU). Each line represents a different training configuration, with colors indicating the MFU value - warmer colors show higher efficiency.</p>
|
@@ -2478,19 +2469,19 @@
|
|
2478 |
<li>Larger models present a different challenge. As model size increases, memory requirements grow substantially. This creates two scenarios with fewer nodes: either the model doesn't fit at all, or it barely fits but runs inefficiently due to operating near the GPU memory limits.</li>
|
2479 |
<li>Our benchmarks demonstrate how performance heavily depends on implementation quality. When we first implemented both parallelism strategies, Tensor Parallelism (TP) outperformed Pipeline Parallelism (PP). After optimizing our PP code, it became the faster option. Now that we're improving the communication overlap in our TP implementation, we expect it to regain the performance lead.</li>
|
2480 |
</ol>
|
|
|
|
|
2481 |
|
2482 |
-
<
|
2483 |
-
|
2484 |
-
<h3>What’s next?</h3>
|
2485 |
|
2486 |
-
<p>You
|
2487 |
<ul>
|
2488 |
<li>Carefully read some of the landmark or very recent papers. You can find a list of some of the most impactful papers in <a target="_self" href="#references" class="">References</a>.</li>
|
2489 |
<li>Start from scratch and implement an algorithm yourself. Often a method only fully “clicks” if you implemented it yourself.</li>
|
2490 |
<li>Dive into one of the widely used frameworks and start contributing: fix bugs, answer issues, or implement a new feature. That’s the best way to get in any ML field!</li>
|
2491 |
</ul>
|
2492 |
|
2493 |
-
<p>We hope this
|
2494 |
|
2495 |
<h2>References</h2>
|
2496 |
|
|
|
2407 |
<h2>Conclusion</h2>
|
2408 |
|
2409 |
|
2410 |
+
<p>Congratulations, dear reader, you made it to the end! We've completed quite a journey: we started from understanding how to train a simple model on a single GPU, all the way to mastering all the intricate techniques used to efficiently train massive language models like Llama-405B and DeepSeek-V3 on thousands of GPUs. By now, you can read a diagram, like Llama-3's 4D parallel setup, with ease:</p>
|
2411 |
|
2412 |
<p><img alt="image.png" src="/assets/images/conclusion_llama3_parallelism.png" /></p>
|
2413 |
|
2414 |
+
<p>Orchestrating large clusters of GPUs to train LLMs efficiently is no easy feat. We learned how to optimize computations and communications between GPUs such that they run with maximum utilization at all times. It involves choosing the right parallelization strategy for a given model and cluster size, overlapping communication and computation where possible, and writing custom kernels that take into account the hardware layout to perform an operation as fast as possible on the GPU.</p>
|
2415 |
|
2416 |
+
<p>You might still believe that this knowledge is a bit niche and only concerns the small set of people that pretrain LLMs. Historically, that mayb be true, but as models are growing rapidly even people who want to fine-tune models require distributd training setups. So diving deeper into all things distributed might prove very timely.</p>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
2417 |
|
2418 |
+
<p>This has been a long learning journey, but not just for you! Running thousands of benchmarks on a GPU cluster was more challenging than we anticipated and we want to share a few highlights of our learning experience.</p>
|
2419 |
|
2420 |
<h3>What we learned</h3>
|
2421 |
|
2422 |
+
<p>Our goal for this blogpost was not only to discuss theory and implementations but provide actual data points as well. So the plan was simple: lets run every possible distributed configuration for every model and a number of cluster sizes (namely 1-64 nodes of 8xH100s). Even after excluding impossible configuration we still needed to run thousands of experiments. </p>
|
2423 |
+
|
2424 |
+
<aside>We want to take this opportunity to apologize to our co-workers for blocking most of the science cluster and in turn forgive any threats that may have been whispered.</aside>
|
2425 |
+
|
2426 |
+
<p>
|
2427 |
+
On paper this sounds easy enough: we can easily launch big arrays of jobs on our cluster. However, when we launched the first batches is when the troubles began:
|
2428 |
</p>
|
2429 |
|
2430 |
<ul>
|
2431 |
<li>PyTorch processes would sometimes fail to clean up properly</li>
|
2432 |
<li>Slurm job manager would forcefully terminate jobs, leading to node failures </li>
|
2433 |
<li>Simple benchmarks that should take minutes would stretch into hours</li>
|
2434 |
+
<li>Some jobs would hang indefinitely</li>
|
2435 |
+
</ul>
|
2436 |
+
|
2437 |
+
<p>So in order to run all experiments in a finite amount of time required some additional engineering. In particular we spent a significant amount of time on the following:</p>
|
2438 |
+
|
2439 |
+
<ul>
|
2440 |
+
<li>Minimizing cluster restart times and optimize idle time</li>
|
2441 |
+
<li>Analyzing detailed NCCL debug logs</li>
|
2442 |
+
<li>Understand memory usage patterns and CUDA memory allocator behaviors</li>
|
2443 |
+
<li>Improving pipeline parallelism performance on multi-node</li>
|
2444 |
</ul>
|
2445 |
|
2446 |
<p>These challenges deserve their own story, but they taught us valuable lessons about the complexities of distributed training infrastructure. What looks simple in theory often requires careful attention to many moving parts in practice.</p>
|
2447 |
|
2448 |
+
<!--
|
2449 |
<p>Let's analyze the results of our benchmarks and understand how different configurations affect each other. All benchmarks were run with a sequence length of 4096 and a global batch size of 1M tokens. We'll look at two key visualizations that help illustrate our findings.
|
2450 |
</p>
|
2451 |
|
|
|
2456 |
|
2457 |
<p>To complement this, let's look at the relationships between different parameters:</p>
|
2458 |
|
|
|
2459 |
<iframe id="plotFrame" src="/assets/images/what_we_learnt_parallel_coordinates.html" height="540" width="1000" scrolling="no" frameborder="0"></iframe>
|
2460 |
|
2461 |
<p>Parallel coordinates plot showing the relationship between different model parallelism configurations (Data Parallel degree, Tensor Parallel degree, Pipeline Parallel degree), training hyperparameters (gradient accumulation steps, micro batch size), ZeRO stage and the resulting Model FLOPs Utilization (MFU). Each line represents a different training configuration, with colors indicating the MFU value - warmer colors show higher efficiency.</p>
|
|
|
2469 |
<li>Larger models present a different challenge. As model size increases, memory requirements grow substantially. This creates two scenarios with fewer nodes: either the model doesn't fit at all, or it barely fits but runs inefficiently due to operating near the GPU memory limits.</li>
|
2470 |
<li>Our benchmarks demonstrate how performance heavily depends on implementation quality. When we first implemented both parallelism strategies, Tensor Parallelism (TP) outperformed Pipeline Parallelism (PP). After optimizing our PP code, it became the faster option. Now that we're improving the communication overlap in our TP implementation, we expect it to regain the performance lead.</li>
|
2471 |
</ol>
|
2472 |
+
-->
|
2473 |
+
<p>Reproducing theoretical results in practice is challenging, especially given the limited availability of production training code. Through open-source projects like picotron and nanotron, we hope to make these distributed training techniques more accessible and foster collaboration on simpler, more efficient codebases that help researchers and practitioners make the most of their hardware resources.</p>
|
2474 |
|
2475 |
+
<h3>So, what’s next?</h3>
|
|
|
|
|
2476 |
|
2477 |
+
<p>You now have good overview of the main distributed training concepts but at the same time we just scratched to surface of on some aspects. There are many ways to dive deep into a subject but here are some steps that we recommend:</p>
|
2478 |
<ul>
|
2479 |
<li>Carefully read some of the landmark or very recent papers. You can find a list of some of the most impactful papers in <a target="_self" href="#references" class="">References</a>.</li>
|
2480 |
<li>Start from scratch and implement an algorithm yourself. Often a method only fully “clicks” if you implemented it yourself.</li>
|
2481 |
<li>Dive into one of the widely used frameworks and start contributing: fix bugs, answer issues, or implement a new feature. That’s the best way to get in any ML field!</li>
|
2482 |
</ul>
|
2483 |
|
2484 |
+
<p>We hope this book helps you get started in distributed training and that you will train the next generation of awesome models to the hum of your GPU cluster!</p>
|
2485 |
|
2486 |
<h2>References</h2>
|
2487 |
|
src/fragments/cp_8Bmemoryusage.html
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
<div> <div id="995b326f-ddf4-41a8-b9e5-fce1da175b3f" class="plotly-graph-div" style="height:410px; width:1000px;"></div> <script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("995b326f-ddf4-41a8-b9e5-fce1da175b3f")) { Plotly.newPlot( "995b326f-ddf4-41a8-b9e5-fce1da175b3f", [{"legendgroup":"Model Parameters","marker":{"color":"#4ea5b7"},"name":"Model Parameters","showlegend":true,"x":["1024","4096","16384","65536","131072"],"y":[14.95703125,14.95703125,14.95703125,14.95703125,14.95703125],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Gradients","marker":{"color":"#e889ab"},"name":"Gradients","showlegend":true,"x":["1024","4096","16384","65536","131072"],"y":[14.95703125,14.95703125,14.95703125,14.95703125,14.95703125],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Optimizer States","marker":{"color":"#cec0fa"},"name":"Optimizer States","showlegend":true,"x":["1024","4096","16384","65536","131072"],"y":[59.828125,59.828125,59.828125,59.828125,59.828125],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Activations","marker":{"color":"#e38a42"},"name":"Activations","showlegend":true,"x":["1024","4096","16384","65536","131072"],"y":[4.25,17.0,68.0,272.0,544.0],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Model Parameters","marker":{"color":"#4ea5b7"},"name":"Model Parameters","showlegend":false,"x":["1024","4096","16384","65536","131072"],"y":[7.478515625,7.478515625,7.478515625,7.478515625,7.478515625],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Gradients","marker":{"color":"#e889ab"},"name":"Gradients","showlegend":false,"x":["1024","4096","16384","65536","131072"],"y":[7.478515625,7.478515625,7.478515625,7.478515625,7.478515625],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Optimizer States","marker":{"color":"#cec0fa"},"name":"Optimizer States","showlegend":false,"x":["1024","4096","16384","65536","131072"],"y":[29.9140625,29.9140625,29.9140625,29.9140625,29.9140625],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Activations","marker":{"color":"#e38a42"},"name":"Activations","showlegend":false,"x":["1024","4096","16384","65536","131072"],"y":[2.75,11.0,44.0,176.0,352.0],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Model Parameters","marker":{"color":"#4ea5b7"},"name":"Model Parameters","showlegend":false,"x":["1024","4096","16384","65536","131072"],"y":[7.478515625,7.478515625,7.478515625,7.478515625,7.478515625],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"Gradients","marker":{"color":"#e889ab"},"name":"Gradients","showlegend":false,"x":["1024","4096","16384","65536","131072"],"y":[7.478515625,7.478515625,7.478515625,7.478515625,7.478515625],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"Optimizer States","marker":{"color":"#cec0fa"},"name":"Optimizer States","showlegend":false,"x":["1024","4096","16384","65536","131072"],"y":[29.9140625,29.9140625,29.9140625,29.9140625,29.9140625],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"Activations","marker":{"color":"#e38a42"},"name":"Activations","showlegend":false,"x":["1024","4096","16384","65536","131072"],"y":[0.6875,2.75,11.0,44.0,88.0],"type":"bar","xaxis":"x3","yaxis":"y3"}], {"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"}}},"xaxis":{"anchor":"y","domain":[0.0,0.2888888888888889],"title":{"text":"Sequence Length"},"showgrid":true,"gridcolor":"LightGray"},"yaxis":{"anchor":"x","domain":[0.0,1.0],"range":[0,150],"dtick":20,"showgrid":true,"gridcolor":"LightGray","title":{"text":"Memory Usage (GB)"}},"xaxis2":{"anchor":"y2","domain":[0.35555555555555557,0.6444444444444445],"title":{"text":"Sequence Length"},"showgrid":true,"gridcolor":"LightGray"},"yaxis2":{"anchor":"x2","domain":[0.0,1.0],"matches":"y","showticklabels":false,"range":[0,150],"dtick":20,"showgrid":true,"gridcolor":"LightGray"},"xaxis3":{"anchor":"y3","domain":[0.7111111111111111,1.0],"title":{"text":"Sequence Length"},"showgrid":true,"gridcolor":"LightGray"},"yaxis3":{"anchor":"x3","domain":[0.0,1.0],"matches":"y","showticklabels":false,"range":[0,150],"dtick":20,"showgrid":true,"gridcolor":"LightGray"},"annotations":[{"font":{"size":16},"showarrow":false,"text":"No Parallelism","x":0.14444444444444446,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"TP=2 CP=1","x":0.5,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"TP=2 CP=4","x":0.8555555555555556,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"}],"shapes":[{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x domain","y0":80,"y1":80,"yref":"y"},{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x2 domain","y0":80,"y1":80,"yref":"y2"},{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x3 domain","y0":80,"y1":80,"yref":"y3"}],"title":{"text":"Memory Usage for 8B Model"},"legend":{"orientation":"v","x":1.02,"y":0.5},"margin":{"r":150},"barmode":"stack","width":1000,"height":410}, {"responsive": true} ) }; </script> </div>
|
src/fragments/dp_ourjourney_memoryusage.html
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
<div> <div id="9f507a17-fb27-4b9a-9224-34ffad9cd0d4" class="plotly-graph-div" style="height:410px; width:1000px;"></div> <script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("9f507a17-fb27-4b9a-9224-34ffad9cd0d4")) { Plotly.newPlot( "9f507a17-fb27-4b9a-9224-34ffad9cd0d4", [{"legendgroup":"parameters","marker":{"color":"#4ea5b7"},"name":"parameters","showlegend":true,"x":["1024","2048","4096","8192","16384"],"y":[2.3017578125,2.3017578125,2.3017578125,2.3017578125,2.3017578125],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"gradients","marker":{"color":"#e889ab"},"name":"gradients","showlegend":true,"x":["1024","2048","4096","8192","16384"],"y":[2.3017578125,2.3017578125,2.3017578125,2.3017578125,2.3017578125],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"optimizer states","marker":{"color":"#cec0fa"},"name":"optimizer states","showlegend":true,"x":["1024","2048","4096","8192","16384"],"y":[9.20703125,9.20703125,9.20703125,9.20703125,9.20703125],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"activations","marker":{"color":"#e38a42"},"name":"activations","showlegend":true,"x":["1024","2048","4096","8192","16384"],"y":[1.0625,2.125,4.25,8.5,17.0],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"parameters","marker":{"color":"#4ea5b7"},"name":"parameters","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[14.95703125,14.95703125,14.95703125,14.95703125,14.95703125],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"gradients","marker":{"color":"#e889ab"},"name":"gradients","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[14.95703125,14.95703125,14.95703125,14.95703125,14.95703125],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"optimizer states","marker":{"color":"#cec0fa"},"name":"optimizer states","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[59.828125,59.828125,59.828125,59.828125,59.828125],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"activations","marker":{"color":"#e38a42"},"name":"activations","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[4.25,8.5,17.0,34.0,68.0],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"parameters","marker":{"color":"#4ea5b7"},"name":"parameters","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[131.4140625,131.4140625,131.4140625,131.4140625,131.4140625],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"gradients","marker":{"color":"#e889ab"},"name":"gradients","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[131.4140625,131.4140625,131.4140625,131.4140625,131.4140625],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"optimizer states","marker":{"color":"#cec0fa"},"name":"optimizer states","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[525.65625,525.65625,525.65625,525.65625,525.65625],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"activations","marker":{"color":"#e38a42"},"name":"activations","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[21.25,42.5,85.0,170.0,340.0],"type":"bar","xaxis":"x3","yaxis":"y3"}], {"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"}}},"xaxis":{"anchor":"y","domain":[0.0,0.2888888888888889]},"yaxis":{"anchor":"x","domain":[0.0,1.0],"range":[0,150],"title":{"text":"GB memory"}},"xaxis2":{"anchor":"y2","domain":[0.35555555555555557,0.6444444444444445]},"yaxis2":{"anchor":"x2","domain":[0.0,1.0],"matches":"y","showticklabels":false,"range":[0,150]},"xaxis3":{"anchor":"y3","domain":[0.7111111111111111,1.0]},"yaxis3":{"anchor":"x3","domain":[0.0,1.0],"matches":"y","showticklabels":false,"range":[0,150]},"annotations":[{"font":{"size":16},"showarrow":false,"text":"1B model","x":0.14444444444444446,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"8B model","x":0.5,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"70B model","x":0.8555555555555556,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"}],"shapes":[{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x domain","y0":80,"y1":80,"yref":"y"},{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x2 domain","y0":80,"y1":80,"yref":"y2"},{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x3 domain","y0":80,"y1":80,"yref":"y3"}],"title":{"text":"Memory Usage vs Sequence Length for Different Model Sizes"},"legend":{"orientation":"v","x":1.02,"y":0.5},"margin":{"r":150},"barmode":"stack","width":1000,"height":410}, {"responsive": true} ) }; </script> </div>
|
src/fragments/dp_scaling.html
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
<div> <div id="f6b00dd8-6230-46cf-9b38-f7fd425a1dd3" class="plotly-graph-div" style="height:400px; width:1000px;"></div> <script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("f6b00dd8-6230-46cf-9b38-f7fd425a1dd3")) { Plotly.newPlot( "f6b00dd8-6230-46cf-9b38-f7fd425a1dd3", [{"marker":{"color":"#4ea5b7"},"name":"Throughput (tokens\u002fsec\u002fGPU)","width":0.7,"x":["8","16","32","64","128","256"],"y":[40149.94,37609.69,35367.61,31112.23,26446.44,15700.38],"type":"bar","xaxis":"x","yaxis":"y"},{"base":[37609.69],"marker":{"color":"#e889ab"},"name":"Performance Drop","showlegend":true,"width":0.0875,"x":["16"],"y":[2540.25],"type":"bar","xaxis":"x","yaxis":"y"},{"base":[35367.61],"marker":{"color":"#e889ab"},"showlegend":false,"width":0.0875,"x":["32"],"y":[2242.0800000000017],"type":"bar","xaxis":"x","yaxis":"y"},{"base":[31112.23],"marker":{"color":"#e889ab"},"showlegend":false,"width":0.0875,"x":["64"],"y":[4255.380000000001],"type":"bar","xaxis":"x","yaxis":"y"},{"base":[26446.44],"marker":{"color":"#e889ab"},"showlegend":false,"width":0.0875,"x":["128"],"y":[4665.790000000001],"type":"bar","xaxis":"x","yaxis":"y"},{"base":[15700.38],"marker":{"color":"#e889ab"},"showlegend":false,"width":0.0875,"x":["256"],"y":[10746.06],"type":"bar","xaxis":"x","yaxis":"y"},{"line":{"color":"#e889ab"},"marker":{"color":"#e889ab"},"mode":"lines+markers","name":"Memory Usage (GB)","x":["8","16","32","64","128","256"],"y":[36.66,36.66,36.66,36.66,36.66,36.66],"type":"scatter","xaxis":"x2","yaxis":"y2"}], {"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"}}},"xaxis":{"anchor":"y","domain":[0.0,0.45],"title":{"text":"Data Parallelism (DP)"},"showgrid":true,"gridcolor":"LightGray"},"yaxis":{"anchor":"x","domain":[0.0,1.0],"title":{"text":"Throughput (tokens\u002fsec\u002fGPU)"},"showgrid":true,"gridcolor":"LightGray"},"xaxis2":{"anchor":"y2","domain":[0.55,1.0],"title":{"text":"Data Parallelism (DP)"},"showgrid":true,"gridcolor":"LightGray"},"yaxis2":{"anchor":"x2","domain":[0.0,1.0],"title":{"text":"Memory Usage (GB)"},"showgrid":true,"gridcolor":"LightGray"},"annotations":[{"font":{"size":16},"showarrow":false,"text":"Throughput Scaling with Data Parallelism","x":0.225,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Memory Usage Scaling with Data Parallelism","x":0.775,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"color":"#e889ab"},"showarrow":false,"text":"-6.3%","x":1,"xanchor":"center","xref":"x","xshift":30,"y":38879.815,"yanchor":"middle","yref":"y"},{"font":{"color":"#e889ab"},"showarrow":false,"text":"-6.0%","x":2,"xanchor":"center","xref":"x","xshift":30,"y":36488.65,"yanchor":"middle","yref":"y"},{"font":{"color":"#e889ab"},"showarrow":false,"text":"-12.0%","x":3,"xanchor":"center","xref":"x","xshift":30,"y":33239.92,"yanchor":"middle","yref":"y"},{"font":{"color":"#e889ab"},"showarrow":false,"text":"-15.0%","x":4,"xanchor":"center","xref":"x","xshift":30,"y":28779.335,"yanchor":"middle","yref":"y"},{"font":{"color":"#e889ab"},"showarrow":false,"text":"-40.6%","x":5,"xanchor":"center","xref":"x","xshift":30,"y":21073.41,"yanchor":"middle","yref":"y"}],"legend":{"x":0.55,"y":1.0},"width":1000,"height":400,"barmode":"stack"}, {"responsive": true} ) }; </script> </div>
|
src/fragments/memory-profile.html
CHANGED
The diff for this file is too large to render.
See raw diff
|
|
src/fragments/memory-recomputation.html
CHANGED
@@ -1 +1 @@
|
|
1 |
-
<div> <div id="3a29a7b6-5a6e-4e51-aabe-515dafefbd63" class="plotly-graph-div" style="height:500px; width:850px;"></div> <script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("3a29a7b6-5a6e-4e51-aabe-515dafefbd63")) { Plotly.newPlot( "3a29a7b6-5a6e-4e51-aabe-515dafefbd63", [{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(78, 165, 183)"},"name":"parameters","showlegend":true,"visible":true,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAEjnD0AAAAAASPcPQAAAAACkCxBAAAAAAKQrEEA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(227, 138, 66)"},"name":"gradients","showlegend":true,"visible":true,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAEjnD0AAAAAASPcPQAAAAACkCxBAAAAAAKQrEEA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(232, 137, 171)"},"name":"optimizer","showlegend":true,"visible":true,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAEjnH0AAAAAASPcfQAAAAACkCyBAAAAAAKQrIEA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(206, 192, 250)"},"name":"activations","showlegend":true,"visible":true,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAACEDEAAAAAAAEIoQAAAAAAAIUZAAAAAAIAQZUA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(78, 165, 183)"},"name":"parameters","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAKWXKkAAAAAApZ0qQAAAAAClqSpAAAAAAKXBKkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(227, 138, 66)"},"name":"gradients","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAKWXKkAAAAAApZ0qQAAAAAClqSpAAAAAAKXBKkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(232, 137, 171)"},"name":"optimizer","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAKWXOkAAAAAApZ06QAAAAAClqTpAAAAAAKXBOkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(206, 192, 250)"},"name":"activations","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAABLGEAAAAAAgLUyQAAAAACA1U9AAAAAAMAKbUA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(78, 165, 183)"},"name":"parameters","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAKL6OUAAAAAAov45QAAAAACiBjpAAAAAAKIWOkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(227, 138, 66)"},"name":"gradients","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAKL6OUAAAAAAov45QAAAAACiBjpAAAAAAKIWOkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(232, 137, 171)"},"name":"optimizer","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAKL6SUAAAAAAov5JQAAAAACiBkpAAAAAAKIWSkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(206, 192, 250)"},"name":"activations","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAACCIkAAAAAAAII8QAAAAAAAQVhAAAAAAIAgdkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(78, 165, 183)"},"name":"parameters","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAgER\u002fbkAAAACARIBuQAAAAIBEgm5AAAAAgESGbkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(227, 138, 66)"},"name":"gradients","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAgER\u002fbkAAAACARIBuQAAAAIBEgm5AAAAAgESGbkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(232, 137, 171)"},"name":"optimizer","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAgER\u002ffkAAAACARIB+QAAAAIBEgn5AAAAAgESGfkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(206, 192, 250)"},"name":"activations","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAAAhR0AAAAAAgNBhQAAAAACAUH5AAAAAAECom0A="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(78, 165, 183)"},"name":"parameters","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAgPa\u002fl0AAAACANsCXQAAAAIC2wJdAAAAAgLbBl0A="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(227, 138, 66)"},"name":"gradients","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAgPa\u002fl0AAAACANsCXQAAAAIC2wJdAAAAAgLbBl0A="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(232, 137, 171)"},"name":"optimizer","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAgPa\u002fp0AAAACANsCnQAAAAIC2wKdAAAAAgLbBp0A="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(206, 192, 250)"},"name":"activations","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAIA2YkAAAAAAgA58QAAAAABA35dAAAAAAKDHtUA="},"type":"bar"}], {"template":{"data":{"barpolar":[{"marker":{"line":{"color":"white","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"white","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"#C8D4E3","linecolor":"#C8D4E3","minorgridcolor":"#C8D4E3","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"#C8D4E3","linecolor":"#C8D4E3","minorgridcolor":"#C8D4E3","startlinecolor":"#2a3f5f"},"type":"carpet"}],"choropleth":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"choropleth"}],"contourcarpet":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"contourcarpet"}],"contour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"contour"}],"heatmap":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"heatmap"}],"histogram2dcontour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2dcontour"}],"histogram2d":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2d"}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"mesh3d":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"mesh3d"}],"parcoords":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"parcoords"}],"pie":[{"automargin":true,"type":"pie"}],"scatter3d":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatter3d"}],"scattercarpet":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattercarpet"}],"scattergeo":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattergeo"}],"scattergl":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattergl"}],"scattermapbox":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattermapbox"}],"scattermap":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattermap"}],"scatterpolargl":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolargl"}],"scatterpolar":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolar"}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"scatterternary":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterternary"}],"surface":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"surface"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}]},"layout":{"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"autotypenumbers":"strict","coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]],"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]},"colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"geo":{"bgcolor":"white","lakecolor":"white","landcolor":"white","showlakes":true,"showland":true,"subunitcolor":"#C8D4E3"},"hoverlabel":{"align":"left"},"hovermode":"closest","mapbox":{"style":"light"},"paper_bgcolor":"white","plot_bgcolor":"white","polar":{"angularaxis":{"gridcolor":"#EBF0F8","linecolor":"#EBF0F8","ticks":""},"bgcolor":"white","radialaxis":{"gridcolor":"#EBF0F8","linecolor":"#EBF0F8","ticks":""}},"scene":{"xaxis":{"backgroundcolor":"white","gridcolor":"#DFE8F3","gridwidth":2,"linecolor":"#EBF0F8","showbackground":true,"ticks":"","zerolinecolor":"#EBF0F8"},"yaxis":{"backgroundcolor":"white","gridcolor":"#DFE8F3","gridwidth":2,"linecolor":"#EBF0F8","showbackground":true,"ticks":"","zerolinecolor":"#EBF0F8"},"zaxis":{"backgroundcolor":"white","gridcolor":"#DFE8F3","gridwidth":2,"linecolor":"#EBF0F8","showbackground":true,"ticks":"","zerolinecolor":"#EBF0F8"}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"ternary":{"aaxis":{"gridcolor":"#DFE8F3","linecolor":"#A2B1C6","ticks":""},"baxis":{"gridcolor":"#DFE8F3","linecolor":"#A2B1C6","ticks":""},"bgcolor":"white","caxis":{"gridcolor":"#DFE8F3","linecolor":"#A2B1C6","ticks":""}},"title":{"x":0.05},"xaxis":{"automargin":true,"gridcolor":"#EBF0F8","linecolor":"#EBF0F8","ticks":"","title":{"standoff":15},"zerolinecolor":"#EBF0F8","zerolinewidth":2},"yaxis":{"automargin":true,"gridcolor":"#EBF0F8","linecolor":"#EBF0F8","ticks":"","title":{"standoff":15},"zerolinecolor":"#EBF0F8","zerolinewidth":2}}},"title":{"text":"Memory Usage with Recomputation","x":0.5,"y":0.95},"margin":{"r":80},"legend":{"y":0.95,"x":1.02,"xanchor":"left","yanchor":"top"},"updatemenus":[{"active":0,"buttons":[{"args":[{"visible":[true,true,true,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false]},{"yaxis.range":[0,203.1547058105469]}],"label":"1B","method":"update"},{"args":[{"visible":[false,false,false,false,true,true,true,true,false,false,false,false,false,false,false,false,false,false,false,false]},{"yaxis.range":[0,314.4336639404297]}],"label":"3B","method":"update"},{"args":[{"visible":[false,false,false,false,false,false,false,false,true,true,true,true,false,false,false,false,false,false,false,false]},{"yaxis.range":[0,504.2233764648438]}],"label":"8B","method":"update"},{"args":[{"visible":[false,false,false,false,false,false,false,false,false,false,false,false,true,true,true,true,false,false,false,false]},{"yaxis.range":[0,3021.530541992188]}],"label":"70B","method":"update"},{"args":[{"visible":[false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,true,true,true]},{"yaxis.range":[0,12823.0716796875]}],"label":"405B","method":"update"}],"direction":"down","showactive":true,"x":1.035,"xanchor":"left","y":0.6,"yanchor":"top"},{"active":0,"buttons":[{"args":[{"y":[[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[7.975860595703125,7.991485595703125,8.022735595703125,8.085235595703125],[3.564453125,12.12890625,44.2578125,168.515625],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[26.592361450195312,26.615798950195312,26.662673950195312,26.756423950195312],[6.0732421875,18.708984375,63.66796875,232.3359375],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[51.95806884765625,51.98931884765625,52.05181884765625,52.17681884765625],[9.25390625,28.5078125,97.015625,354.03125],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[487.9542236328125,488.0167236328125,488.1417236328125,488.3917236328125],[46.2578125,142.515625,485.03125,1770.0625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[3039.9814453125,3040.1064453125,3040.3564453125,3040.8564453125],[145.703125,448.90625,1527.8125,5575.625]]}],"label":"None","method":"restyle"},{"args":[{"y":[[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[7.975860595703125,7.991485595703125,8.022735595703125,8.085235595703125],[1.064453125,2.12890625,4.2578125,8.515625],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[26.592361450195312,26.615798950195312,26.662673950195312,26.756423950195312],[2.7919921875,5.583984375,11.16796875,22.3359375],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[51.95806884765625,51.98931884765625,52.05181884765625,52.17681884765625],[4.25390625,8.5078125,17.015625,34.03125],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[487.9542236328125,488.0167236328125,488.1417236328125,488.3917236328125],[21.2578125,42.515625,85.03125,170.0625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[3039.9814453125,3040.1064453125,3040.3564453125,3040.8564453125],[66.953125,133.90625,267.8125,535.625]]}],"label":"selective","method":"restyle"},{"args":[{"y":[[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[7.975860595703125,7.991485595703125,8.022735595703125,8.085235595703125],[0.064453125,0.12890625,0.2578125,0.515625],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[26.592361450195312,26.615798950195312,26.662673950195312,26.756423950195312],[0.1669921875,0.333984375,0.66796875,1.3359375],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[51.95806884765625,51.98931884765625,52.05181884765625,52.17681884765625],[0.25390625,0.5078125,1.015625,2.03125],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[487.9542236328125,488.0167236328125,488.1417236328125,488.3917236328125],[1.2578125,2.515625,5.03125,10.0625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[3039.9814453125,3040.1064453125,3040.3564453125,3040.8564453125],[3.953125,7.90625,15.8125,31.625]]}],"label":"full","method":"restyle"}],"direction":"down","showactive":true,"x":1.035,"xanchor":"left","y":0.4,"yanchor":"top"}],"barmode":"stack","xaxis":{"title":{"text":"Sequence Length"}},"yaxis":{"title":{"text":"Memory (GB)"},"range":[0,203.1547058105469]},"width":850,"height":500,"annotations":[{"showarrow":false,"text":"Model Size:","x":1.035,"xanchor":"left","xref":"paper","y":0.6,"yanchor":"bottom","yref":"paper"},{"showarrow":false,"text":"Recomputation:","x":1.035,"xanchor":"left","xref":"paper","y":0.4,"yanchor":"bottom","yref":"paper"}]}, {"responsive": true, "scrollZoom": false} ) }; </script> </div>
|
|
|
1 |
+
<div> <div id="3a29a7b6-5a6e-4e51-aabe-515dafefbd63" class="plotly-graph-div" style="height:500px; width:850px;"></div> <script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("3a29a7b6-5a6e-4e51-aabe-515dafefbd63")) { Plotly.newPlot( "3a29a7b6-5a6e-4e51-aabe-515dafefbd63", [{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(78, 165, 183)"},"name":"parameters","showlegend":true,"visible":true,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAEjnD0AAAAAASPcPQAAAAACkCxBAAAAAAKQrEEA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(227, 138, 66)"},"name":"gradients","showlegend":true,"visible":true,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAEjnD0AAAAAASPcPQAAAAACkCxBAAAAAAKQrEEA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(232, 137, 171)"},"name":"optimizer","showlegend":true,"visible":true,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAEjnH0AAAAAASPcfQAAAAACkCyBAAAAAAKQrIEA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(206, 192, 250)"},"name":"activations","showlegend":true,"visible":true,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAACEDEAAAAAAAEIoQAAAAAAAIUZAAAAAAIAQZUA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(78, 165, 183)"},"name":"parameters","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAKWXKkAAAAAApZ0qQAAAAAClqSpAAAAAAKXBKkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(227, 138, 66)"},"name":"gradients","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAKWXKkAAAAAApZ0qQAAAAAClqSpAAAAAAKXBKkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(232, 137, 171)"},"name":"optimizer","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAKWXOkAAAAAApZ06QAAAAAClqTpAAAAAAKXBOkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(206, 192, 250)"},"name":"activations","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAABLGEAAAAAAgLUyQAAAAACA1U9AAAAAAMAKbUA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(78, 165, 183)"},"name":"parameters","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAKL6OUAAAAAAov45QAAAAACiBjpAAAAAAKIWOkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(227, 138, 66)"},"name":"gradients","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAKL6OUAAAAAAov45QAAAAACiBjpAAAAAAKIWOkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(232, 137, 171)"},"name":"optimizer","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAKL6SUAAAAAAov5JQAAAAACiBkpAAAAAAKIWSkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(206, 192, 250)"},"name":"activations","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAACCIkAAAAAAAII8QAAAAAAAQVhAAAAAAIAgdkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(78, 165, 183)"},"name":"parameters","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAgER\u002fbkAAAACARIBuQAAAAIBEgm5AAAAAgESGbkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(227, 138, 66)"},"name":"gradients","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAgER\u002fbkAAAACARIBuQAAAAIBEgm5AAAAAgESGbkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(232, 137, 171)"},"name":"optimizer","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAgER\u002ffkAAAACARIB+QAAAAIBEgn5AAAAAgESGfkA="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(206, 192, 250)"},"name":"activations","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAAAhR0AAAAAAgNBhQAAAAACAUH5AAAAAAECom0A="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(78, 165, 183)"},"name":"parameters","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAgPa\u002fl0AAAACANsCXQAAAAIC2wJdAAAAAgLbBl0A="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(227, 138, 66)"},"name":"gradients","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAgPa\u002fl0AAAACANsCXQAAAAIC2wJdAAAAAgLbBl0A="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(232, 137, 171)"},"name":"optimizer","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAgPa\u002fp0AAAACANsCnQAAAAIC2wKdAAAAAgLbBp0A="},"type":"bar"},{"hovertemplate":"Seq len=%{x}\u003cbr\u003eMem=%{y:.1f}GB\u003cbr\u003e%{data.name}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"rgb(206, 192, 250)"},"name":"activations","showlegend":true,"visible":false,"x":["1024","2048","4096","8192"],"y":{"dtype":"f8","bdata":"AAAAAIA2YkAAAAAAgA58QAAAAABA35dAAAAAAKDHtUA="},"type":"bar"}], {"template":{"data":{"barpolar":[{"marker":{"line":{"color":"white","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"white","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"#C8D4E3","linecolor":"#C8D4E3","minorgridcolor":"#C8D4E3","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"#C8D4E3","linecolor":"#C8D4E3","minorgridcolor":"#C8D4E3","startlinecolor":"#2a3f5f"},"type":"carpet"}],"choropleth":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"choropleth"}],"contourcarpet":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"contourcarpet"}],"contour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"contour"}],"heatmap":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"heatmap"}],"histogram2dcontour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2dcontour"}],"histogram2d":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2d"}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"mesh3d":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"mesh3d"}],"parcoords":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"parcoords"}],"pie":[{"automargin":true,"type":"pie"}],"scatter3d":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatter3d"}],"scattercarpet":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattercarpet"}],"scattergeo":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattergeo"}],"scattergl":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattergl"}],"scattermapbox":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattermapbox"}],"scattermap":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattermap"}],"scatterpolargl":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolargl"}],"scatterpolar":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolar"}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"scatterternary":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterternary"}],"surface":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"surface"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}]},"layout":{"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"autotypenumbers":"strict","coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]],"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]},"colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"geo":{"bgcolor":"white","lakecolor":"white","landcolor":"white","showlakes":true,"showland":true,"subunitcolor":"#C8D4E3"},"hoverlabel":{"align":"left"},"hovermode":"closest","mapbox":{"style":"light"},"paper_bgcolor":"white","plot_bgcolor":"white","polar":{"angularaxis":{"gridcolor":"#EBF0F8","linecolor":"#EBF0F8","ticks":""},"bgcolor":"white","radialaxis":{"gridcolor":"#EBF0F8","linecolor":"#EBF0F8","ticks":""}},"scene":{"xaxis":{"backgroundcolor":"white","gridcolor":"#DFE8F3","gridwidth":2,"linecolor":"#EBF0F8","showbackground":true,"ticks":"","zerolinecolor":"#EBF0F8"},"yaxis":{"backgroundcolor":"white","gridcolor":"#DFE8F3","gridwidth":2,"linecolor":"#EBF0F8","showbackground":true,"ticks":"","zerolinecolor":"#EBF0F8"},"zaxis":{"backgroundcolor":"white","gridcolor":"#DFE8F3","gridwidth":2,"linecolor":"#EBF0F8","showbackground":true,"ticks":"","zerolinecolor":"#EBF0F8"}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"ternary":{"aaxis":{"gridcolor":"#DFE8F3","linecolor":"#A2B1C6","ticks":""},"baxis":{"gridcolor":"#DFE8F3","linecolor":"#A2B1C6","ticks":""},"bgcolor":"white","caxis":{"gridcolor":"#DFE8F3","linecolor":"#A2B1C6","ticks":""}},"title":{"x":0.05},"xaxis":{"automargin":true,"gridcolor":"#EBF0F8","linecolor":"#EBF0F8","ticks":"","title":{"standoff":15},"zerolinecolor":"#EBF0F8","zerolinewidth":2},"yaxis":{"automargin":true,"gridcolor":"#EBF0F8","linecolor":"#EBF0F8","ticks":"","title":{"standoff":15},"zerolinecolor":"#EBF0F8","zerolinewidth":2}}},"title":{"text":"Memory Usage with Recomputation","x":0.5,"y":0.95},"margin":{"r":80},"legend":{"y":0.95,"x":1.02,"xanchor":"left","yanchor":"top"},"updatemenus":[{"active":0,"buttons":[{"args":[{"visible":[true,true,true,true,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false]},{"yaxis.range":[0,203.1547058105469]}],"label":"1B","method":"update"},{"args":[{"visible":[false,false,false,false,true,true,true,true,false,false,false,false,false,false,false,false,false,false,false,false]},{"yaxis.range":[0,314.4336639404297]}],"label":"3B","method":"update"},{"args":[{"visible":[false,false,false,false,false,false,false,false,true,true,true,true,false,false,false,false,false,false,false,false]},{"yaxis.range":[0,504.2233764648438]}],"label":"8B","method":"update"},{"args":[{"visible":[false,false,false,false,false,false,false,false,false,false,false,false,true,true,true,true,false,false,false,false]},{"yaxis.range":[0,3021.530541992188]}],"label":"70B","method":"update"},{"args":[{"visible":[false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,false,true,true,true,true]},{"yaxis.range":[0,12823.0716796875]}],"label":"405B","method":"update"}],"direction":"down","showactive":true,"x":1.035,"xanchor":"left","y":0.6,"yanchor":"top"},{"active":0,"buttons":[{"args":[{"y":[[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[7.975860595703125,7.991485595703125,8.022735595703125,8.085235595703125],[3.564453125,12.12890625,44.2578125,168.515625],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[26.592361450195312,26.615798950195312,26.662673950195312,26.756423950195312],[6.0732421875,18.708984375,63.66796875,232.3359375],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[51.95806884765625,51.98931884765625,52.05181884765625,52.17681884765625],[9.25390625,28.5078125,97.015625,354.03125],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[487.9542236328125,488.0167236328125,488.1417236328125,488.3917236328125],[46.2578125,142.515625,485.03125,1770.0625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[3039.9814453125,3040.1064453125,3040.3564453125,3040.8564453125],[145.703125,448.90625,1527.8125,5575.625]]}],"label":"None","method":"restyle"},{"args":[{"y":[[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[7.975860595703125,7.991485595703125,8.022735595703125,8.085235595703125],[1.064453125,2.12890625,4.2578125,8.515625],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[26.592361450195312,26.615798950195312,26.662673950195312,26.756423950195312],[2.7919921875,5.583984375,11.16796875,22.3359375],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[51.95806884765625,51.98931884765625,52.05181884765625,52.17681884765625],[4.25390625,8.5078125,17.015625,34.03125],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[487.9542236328125,488.0167236328125,488.1417236328125,488.3917236328125],[21.2578125,42.515625,85.03125,170.0625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[3039.9814453125,3040.1064453125,3040.3564453125,3040.8564453125],[66.953125,133.90625,267.8125,535.625]]}],"label":"selective","method":"restyle"},{"args":[{"y":[[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[3.9879302978515625,3.9957427978515625,4.0113677978515625,4.0426177978515625],[7.975860595703125,7.991485595703125,8.022735595703125,8.085235595703125],[0.064453125,0.12890625,0.2578125,0.515625],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[13.296180725097656,13.307899475097656,13.331336975097656,13.378211975097656],[26.592361450195312,26.615798950195312,26.662673950195312,26.756423950195312],[0.1669921875,0.333984375,0.66796875,1.3359375],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125],[51.95806884765625,51.98931884765625,52.05181884765625,52.17681884765625],[0.25390625,0.5078125,1.015625,2.03125],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625],[487.9542236328125,488.0167236328125,488.1417236328125,488.3917236328125],[1.2578125,2.515625,5.03125,10.0625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625],[3039.9814453125,3040.1064453125,3040.3564453125,3040.8564453125],[3.953125,7.90625,15.8125,31.625]]}],"label":"full","method":"restyle"}],"direction":"down","showactive":true,"x":1.035,"xanchor":"left","y":0.4,"yanchor":"top"}],"barmode":"stack","xaxis":{"title":{"text":"Sequence Length"}},"yaxis":{"title":{"text":"Memory (GB)"},"range":[0,203.1547058105469]},"width":850,"height":500,"annotations":[{"showarrow":false,"text":"Model Size:","x":1.035,"xanchor":"left","xref":"paper","y":0.6,"yanchor":"bottom","yref":"paper"},{"showarrow":false,"text":"Recomputation:","x":1.035,"xanchor":"left","xref":"paper","y":0.4,"yanchor":"bottom","yref":"paper"}]}, {"responsive": true, "scrollZoom": false} ) }; </script> </div>
|
src/fragments/memusage_activations.html
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
<div> <div id="121faba8-d8ec-447e-9ab1-9a8fa34c1f63" class="plotly-graph-div" style="height:400px; width:1000px;"></div> <script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("121faba8-d8ec-447e-9ab1-9a8fa34c1f63")) { Plotly.newPlot( "121faba8-d8ec-447e-9ab1-9a8fa34c1f63", [{"legendgroup":"parameters","marker":{"color":"#4ea5b7"},"name":"parameters","showlegend":true,"x":["1024","2048","4096","8192","16384"],"y":[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125,26.213409423828125],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"gradients","marker":{"color":"#e889ab"},"name":"gradients","showlegend":true,"x":["1024","2048","4096","8192","16384"],"y":[25.979034423828125,25.994659423828125,26.025909423828125,26.088409423828125,26.213409423828125],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"optimizer states","marker":{"color":"#cec0fa"},"name":"optimizer states","showlegend":true,"x":["1024","2048","4096","8192","16384"],"y":[51.95806884765625,51.98931884765625,52.05181884765625,52.17681884765625,52.42681884765625],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"activations","marker":{"color":"#e38a42"},"name":"activations","showlegend":true,"x":["1024","2048","4096","8192","16384"],"y":[9.25390625,28.5078125,97.015625,354.03125,1348.0625],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"parameters","marker":{"color":"#4ea5b7"},"name":"parameters","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625,244.44586181640625],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"gradients","marker":{"color":"#e889ab"},"name":"gradients","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[243.97711181640625,244.00836181640625,244.07086181640625,244.19586181640625,244.44586181640625],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"optimizer states","marker":{"color":"#cec0fa"},"name":"optimizer states","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[487.9542236328125,488.0167236328125,488.1417236328125,488.3917236328125,488.8917236328125],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"activations","marker":{"color":"#e38a42"},"name":"activations","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[46.2578125,142.515625,485.03125,1770.0625,6740.125],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"parameters","marker":{"color":"#4ea5b7"},"name":"parameters","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625,1520.92822265625],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"gradients","marker":{"color":"#e889ab"},"name":"gradients","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[1519.99072265625,1520.05322265625,1520.17822265625,1520.42822265625,1520.92822265625],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"optimizer states","marker":{"color":"#cec0fa"},"name":"optimizer states","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[3039.9814453125,3040.1064453125,3040.3564453125,3040.8564453125,3041.8564453125],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"activations","marker":{"color":"#e38a42"},"name":"activations","showlegend":false,"x":["1024","2048","4096","8192","16384"],"y":[145.703125,448.90625,1527.8125,5575.625,21231.25],"type":"bar","xaxis":"x3","yaxis":"y3"}], {"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"}}},"xaxis":{"anchor":"y","domain":[0.0,0.2888888888888889],"showgrid":true,"gridwidth":1,"gridcolor":"LightGray"},"yaxis":{"anchor":"x","domain":[0.0,1.0],"title":{"text":"GB memory"},"showgrid":true,"gridwidth":1,"gridcolor":"LightGray"},"xaxis2":{"anchor":"y2","domain":[0.35555555555555557,0.6444444444444445],"showgrid":true,"gridwidth":1,"gridcolor":"LightGray"},"yaxis2":{"anchor":"x2","domain":[0.0,1.0],"showgrid":true,"gridwidth":1,"gridcolor":"LightGray"},"xaxis3":{"anchor":"y3","domain":[0.7111111111111111,1.0],"showgrid":true,"gridwidth":1,"gridcolor":"LightGray"},"yaxis3":{"anchor":"x3","domain":[0.0,1.0],"showgrid":true,"gridwidth":1,"gridcolor":"LightGray"},"annotations":[{"font":{"size":16},"showarrow":false,"text":"Meta-Llama-3.1-8B","x":0.14444444444444446,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Meta-Llama-3.1-70B","x":0.5,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Meta-Llama-3.1-405B","x":0.8555555555555556,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"}],"barmode":"stack","width":1000,"height":400,"legend":{"title":{}}}, {"responsive": true} ) }; </script> </div>
|
src/fragments/pp_bubblesize.html
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
<div> <div id="3997052d-7390-4347-abf3-6c8b08e53c31" class="plotly-graph-div" style="height:650px; width:900px;"></div> <script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("3997052d-7390-4347-abf3-6c8b08e53c31")) { Plotly.newPlot( "3997052d-7390-4347-abf3-6c8b08e53c31", [{"marker":{"color":"#4ea5b7"},"orientation":"h","text":["0.03","0.05","0.05","0.11","0.11","0.11","0.22","0.22","0.22","0.22","0.44","0.44","0.44","0.44","0.88","0.88","0.88","0.88","1.75","1.75","1.75","3.50","3.50","7.00"],"textposition":"outside","x":[0.02734375,0.0546875,0.0546875,0.109375,0.109375,0.109375,0.21875,0.21875,0.21875,0.21875,0.4375,0.4375,0.4375,0.4375,0.875,0.875,0.875,0.875,1.75,1.75,1.75,3.5,3.5,7.0],"y":["m=32, v=8","m=16, v=8","m=32, v=4","m=32, v=2","m=16, v=4","m=8, v=8","m=32, v=1","m=16, v=2","m=8, v=4","m=4, v=8","m=4, v=4","m=8, v=2","m=2, v=8","m=16, v=1","m=8, v=1","m=2, v=4","m=1, v=8","m=4, v=2","m=4, v=1","m=2, v=2","m=1, v=4","m=2, v=1","m=1, v=2","m=1, v=1"],"type":"bar"}], {"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"}}},"yaxis":{"tickmode":"array","tickvals":["m=32, v=8","m=16, v=8","m=32, v=4","m=32, v=2","m=16, v=4","m=8, v=8","m=32, v=1","m=16, v=2","m=8, v=4","m=4, v=8","m=4, v=4","m=8, v=2","m=2, v=8","m=16, v=1","m=8, v=1","m=2, v=4","m=1, v=8","m=4, v=2","m=4, v=1","m=2, v=2","m=1, v=4","m=2, v=1","m=1, v=2","m=1, v=1"],"ticktext":["m=32, v=8","m=16, v=8","m=32, v=4","m=32, v=2","m=16, v=4","m=8, v=8","m=32, v=1","m=16, v=2","m=8, v=4","m=4, v=8","m=4, v=4","m=8, v=2","m=2, v=8","m=16, v=1","m=8, v=1","m=2, v=4","m=1, v=8","m=4, v=2","m=4, v=1","m=2, v=2","m=1, v=4","m=2, v=1","m=1, v=2","m=1, v=1"],"title":{"text":"PP configuration"}},"margin":{"l":150,"r":100,"t":100,"b":100},"title":{"text":"Bubble size for PP=8"},"xaxis":{"title":{"text":"Bubble size"}},"width":900,"height":650}, {"responsive": true} ) }; </script> </div>
|
src/fragments/pp_comm_bandwidth.html
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
<div> <div id="91842954-4418-43b7-bc03-81a893ff71fe" class="plotly-graph-div" style="height:410px; width:1000px;"></div> <script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("91842954-4418-43b7-bc03-81a893ff71fe")) { Plotly.newPlot( "91842954-4418-43b7-bc03-81a893ff71fe", [{"fill":"toself","fillcolor":"rgba(78,165,183,0.2)","hoverinfo":"skip","line":{"color":"rgba(255,255,255,0)"},"showlegend":false,"x":[0,1,2,3,4,5,6,6,5,4,3,2,1,0],"y":[461.538,436.13,219.09099999999998,192.73399999999992,188.42249999999999,194.227,177.35999999999999,12.493500000000001,24.965,31.34,41.587,44.509,302.33500000000004,422.288],"type":"scatter"},{"line":{"color":"#4ea5b7","width":3},"marker":{"size":10,"symbol":"circle"},"mode":"lines+markers+text","name":"AllReduce","text":["436.0","361.7","160.1","99.6","84.7","64.9","32.9"],"textposition":"bottom center","x":[0,1,2,3,4,5,6],"y":[435.9668115942029,361.74920529801324,160.13950738916256,99.56427561837455,84.74052884615384,64.92543661971831,32.937847222222224],"type":"scatter"},{"fill":"toself","fillcolor":"rgba(232,137,171,0.2)","hoverinfo":"skip","line":{"color":"rgba(255,255,255,0)"},"showlegend":false,"x":[0,1,2,3,4,5,6,6,5,4,3,2,1,0],"y":[264.93,226.26999999999998,229.40999999999997,178.47899999999998,126.6575,77.026,44.4165,6.1535,12.314,24.525000000000002,47.147,40.757000000000005,147.97500000000002,239.55200000000002],"type":"scatter"},{"line":{"color":"#e889ab","width":3},"marker":{"size":10,"symbol":"square"},"mode":"lines+markers","name":"AllGather","x":[0,1,2,3,4,5,6],"y":[249.84884057971013,184.61324503311258,118.96753694581281,68.99752650176679,54.972283653846155,27.969183098591547,11.038298611111111],"type":"scatter"},{"fill":"toself","fillcolor":"rgba(206,192,250,0.2)","hoverinfo":"skip","line":{"color":"rgba(255,255,255,0)"},"showlegend":false,"x":[0,1,2,3,4,5,6,6,5,4,3,2,1,0],"y":[264.64599999999996,226.37,215.492,177.54299999999998,126.4825,77.289,45.1295,6.1005,12.39,24.544999999999998,46.802,41.176,146.1,240.804],"type":"scatter"},{"line":{"color":"#cec0fa","width":3},"marker":{"size":10,"symbol":"triangle-up"},"mode":"lines+markers","name":"ReduceScatter","x":[0,1,2,3,4,5,6],"y":[249.72898550724636,181.5535761589404,115.7576354679803,68.55106007067138,54.524230769230776,27.944281690140844,11.069652777777778],"type":"scatter"}], {"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"}}},"xaxis":{"title":{"text":"Number of Nodes"},"tickmode":"array","tickvals":[0,1,2,3,4,5,6],"ticktext":["1","2","4","8","16","32","64"]},"yaxis":{"title":{"text":"Bandwidth (GB\u002fs)"},"range":[0,480.0],"gridcolor":"rgba(0,0,0,0.1)"},"legend":{"x":0.85,"y":1,"bgcolor":"rgba(255,255,255,0.5)"},"margin":{"l":80,"r":80,"t":80,"b":80},"title":{"text":"Communication Bandwidth by Number of Nodes (size=256MB)"},"width":1000,"height":410}, {"responsive": true} ) }; </script> </div>
|
src/fragments/pp_memoryusage.html
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
<div> <div id="8db330ae-52ab-4193-8ee5-dfa289fe4609" class="plotly-graph-div" style="height:410px; width:800px;"></div> <script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("8db330ae-52ab-4193-8ee5-dfa289fe4609")) { Plotly.newPlot( "8db330ae-52ab-4193-8ee5-dfa289fe4609", [{"legendgroup":"Model Parameters","marker":{"color":"#4ea5b7"},"name":"Model Parameters","showlegend":true,"x":["1024","4096","16384"],"y":[14.95703125,14.95703125,14.95703125],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Gradients","marker":{"color":"#e889ab"},"name":"Gradients","showlegend":true,"x":["1024","4096","16384"],"y":[14.95703125,14.95703125,14.95703125],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Optimizer States","marker":{"color":"#cec0fa"},"name":"Optimizer States","showlegend":true,"x":["1024","4096","16384"],"y":[59.828125,59.828125,59.828125],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Activations","marker":{"color":"#e38a42"},"name":"Activations","showlegend":true,"x":["1024","4096","16384"],"y":[4.25,17.0,68.0],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Model Parameters","marker":{"color":"#4ea5b7"},"name":"Model Parameters","showlegend":false,"x":["1024","4096","16384"],"y":[2.603515625,2.603515625,2.603515625],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Gradients","marker":{"color":"#e889ab"},"name":"Gradients","showlegend":false,"x":["1024","4096","16384"],"y":[2.603515625,2.603515625,2.603515625],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Optimizer States","marker":{"color":"#cec0fa"},"name":"Optimizer States","showlegend":false,"x":["1024","4096","16384"],"y":[10.4140625,10.4140625,10.4140625],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Activations","marker":{"color":"#e38a42"},"name":"Activations","showlegend":false,"x":["1024","4096","16384"],"y":[4.25,17.0,68.0],"type":"bar","xaxis":"x2","yaxis":"y2"}], {"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"}}},"xaxis":{"anchor":"y","domain":[0.0,0.45],"title":{"text":"Sequence Length"},"showgrid":true,"gridcolor":"LightGray"},"yaxis":{"anchor":"x","domain":[0.0,1.0],"range":[0,150],"dtick":20,"showgrid":true,"gridcolor":"LightGray","title":{"text":"Memory Usage (GB)"}},"xaxis2":{"anchor":"y2","domain":[0.55,1.0],"title":{"text":"Sequence Length"},"showgrid":true,"gridcolor":"LightGray"},"yaxis2":{"anchor":"x2","domain":[0.0,1.0],"matches":"y","showticklabels":false,"range":[0,150],"dtick":20,"showgrid":true,"gridcolor":"LightGray"},"annotations":[{"font":{"size":16},"showarrow":false,"text":"No Parallelism","x":0.225,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"PP=8","x":0.775,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"}],"shapes":[{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x domain","y0":80,"y1":80,"yref":"y"},{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x2 domain","y0":80,"y1":80,"yref":"y2"}],"title":{"text":"Memory Usage for 8B Model"},"legend":{"orientation":"v","x":1.02,"y":0.5},"margin":{"r":150},"barmode":"stack","width":800,"height":410}, {"responsive": true} ) }; </script> </div>
|
src/fragments/tp_memoryusage.html
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
<div> <div id="eed45cdb-8f9e-4c54-b31f-85062d3362e8" class="plotly-graph-div" style="height:400px; width:1000px;"></div> <script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("eed45cdb-8f9e-4c54-b31f-85062d3362e8")) { Plotly.newPlot( "eed45cdb-8f9e-4c54-b31f-85062d3362e8", [{"legendgroup":"Model Parameters","marker":{"color":"#4ea5b7"},"name":"Model Parameters","showlegend":true,"x":["1024","4096","16384"],"y":[131.5,131.5,131.5],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Gradients","marker":{"color":"#e889ab"},"name":"Gradients","showlegend":true,"x":["1024","4096","16384"],"y":[131.5,131.5,131.5],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Optimizer States","marker":{"color":"#cec0fa"},"name":"Optimizer States","showlegend":true,"x":["1024","4096","16384"],"y":[526.0,526.0,526.0],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Activations","marker":{"color":"#e38a42"},"name":"Activations","showlegend":true,"x":["1024","4096","16384"],"y":[21.25,85.0,340.0],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Model Parameters","marker":{"color":"#4ea5b7"},"name":"Model Parameters","showlegend":false,"x":["1024","4096","16384"],"y":[16.4375,16.4375,16.4375],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Gradients","marker":{"color":"#e889ab"},"name":"Gradients","showlegend":false,"x":["1024","4096","16384"],"y":[16.4375,16.4375,16.4375],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Optimizer States","marker":{"color":"#cec0fa"},"name":"Optimizer States","showlegend":false,"x":["1024","4096","16384"],"y":[65.75,65.75,65.75],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Activations","marker":{"color":"#e38a42"},"name":"Activations","showlegend":false,"x":["1024","4096","16384"],"y":[8.125,32.5,130.0],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Model Parameters","marker":{"color":"#4ea5b7"},"name":"Model Parameters","showlegend":false,"x":["1024","4096","16384"],"y":[8.21875,8.21875,8.21875],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"Gradients","marker":{"color":"#e889ab"},"name":"Gradients","showlegend":false,"x":["1024","4096","16384"],"y":[8.21875,8.21875,8.21875],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"Optimizer States","marker":{"color":"#cec0fa"},"name":"Optimizer States","showlegend":false,"x":["1024","4096","16384"],"y":[32.875,32.875,32.875],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"Activations","marker":{"color":"#e38a42"},"name":"Activations","showlegend":false,"x":["1024","4096","16384"],"y":[7.1875,28.75,115.0],"type":"bar","xaxis":"x3","yaxis":"y3"}], {"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"}}},"xaxis":{"anchor":"y","domain":[0.0,0.2888888888888889],"title":{"text":"Sequence Length"},"showgrid":true,"gridcolor":"LightGray"},"yaxis":{"anchor":"x","domain":[0.0,1.0],"range":[0,150],"dtick":20,"title":{"text":"Memory Usage (GB)"},"showgrid":true,"gridcolor":"LightGray"},"xaxis2":{"anchor":"y2","domain":[0.35555555555555557,0.6444444444444445],"title":{"text":"Sequence Length"},"showgrid":true,"gridcolor":"LightGray"},"yaxis2":{"anchor":"x2","domain":[0.0,1.0],"matches":"y","showticklabels":false,"range":[0,150],"dtick":20,"showgrid":true,"gridcolor":"LightGray"},"xaxis3":{"anchor":"y3","domain":[0.7111111111111111,1.0],"title":{"text":"Sequence Length"},"showgrid":true,"gridcolor":"LightGray"},"yaxis3":{"anchor":"x3","domain":[0.0,1.0],"matches":"y","showticklabels":false,"range":[0,150],"dtick":20,"showgrid":true,"gridcolor":"LightGray"},"annotations":[{"font":{"size":16},"showarrow":false,"text":"No Parallelism (TP-1)","x":0.14444444444444446,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"TP=8","x":0.5,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"TP=16","x":0.8555555555555556,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"}],"shapes":[{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x domain","y0":80,"y1":80,"yref":"y"},{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x2 domain","y0":80,"y1":80,"yref":"y2"},{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x3 domain","y0":80,"y1":80,"yref":"y3"}],"title":{"text":"Memory Usage for 70B Model"},"legend":{"orientation":"v","x":1.02,"y":0.5},"margin":{"r":150},"barmode":"stack","width":1000,"height":400}, {"responsive": true} ) }; </script> </div>
|
src/fragments/tp_scaling.html
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
<div> <div id="8de1ceac-2d60-4368-8691-415db492ed15" class="plotly-graph-div" style="height:400px; width:1000px;"></div> <script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("8de1ceac-2d60-4368-8691-415db492ed15")) { Plotly.newPlot( "8de1ceac-2d60-4368-8691-415db492ed15", [{"marker":{"color":"#4ea5b7"},"name":"Tokens\u002fsec\u002fGPU","width":0.7,"x":["2","4","8","16","32"],"y":[13923.18,12420.76,10903.32,6245.6,2146.44],"type":"bar","xaxis":"x","yaxis":"y"},{"base":[12420.76],"marker":{"color":"#e889ab"},"name":"Performance Drop","showlegend":true,"width":0.0875,"x":["4"],"y":[1502.42],"type":"bar","xaxis":"x","yaxis":"y"},{"base":[10903.32],"marker":{"color":"#e889ab"},"showlegend":false,"width":0.0875,"x":["8"],"y":[1517.4400000000005],"type":"bar","xaxis":"x","yaxis":"y"},{"base":[6245.6],"marker":{"color":"#e889ab"},"showlegend":false,"width":0.0875,"x":["16"],"y":[4657.719999999999],"type":"bar","xaxis":"x","yaxis":"y"},{"base":[2146.44],"marker":{"color":"#e889ab"},"showlegend":false,"width":0.0875,"x":["32"],"y":[4099.16],"type":"bar","xaxis":"x","yaxis":"y"},{"marker":{"color":"#cec0fa"},"name":"Max Batch Size","text":["3","8","12","16","20"],"textposition":"inside","width":0.7,"x":["2","4","8","16","32"],"y":[3,8,12,16,20],"type":"bar","xaxis":"x2","yaxis":"y2"}], {"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"}}},"xaxis":{"anchor":"y","domain":[0.0,0.45],"title":{"text":"Tensor Parallelism (TP)"},"showgrid":true,"gridcolor":"LightGray"},"yaxis":{"anchor":"x","domain":[0.0,1.0],"title":{"text":"Tokens\u002fsec\u002fGPU"},"showgrid":true,"gridcolor":"LightGray"},"xaxis2":{"anchor":"y2","domain":[0.55,1.0],"title":{"text":"Tensor Parallelism (TP)"},"showgrid":true,"gridcolor":"LightGray"},"yaxis2":{"anchor":"x2","domain":[0.0,1.0],"title":{"text":"Maximum Batch Size"},"showgrid":true,"gridcolor":"LightGray"},"annotations":[{"font":{"size":16},"showarrow":false,"text":"Throughput Scaling with TP (3B Model)","x":0.225,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Maximum Batch Size per TP Value","x":0.775,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"color":"#e889ab"},"showarrow":false,"text":"-10.8%","x":1,"xanchor":"center","xref":"x","xshift":30,"y":13171.970000000001,"yanchor":"middle","yref":"y"},{"font":{"color":"#e889ab"},"showarrow":false,"text":"-12.2%","x":2,"xanchor":"center","xref":"x","xshift":30,"y":11662.04,"yanchor":"middle","yref":"y"},{"font":{"color":"#e889ab"},"showarrow":false,"text":"-42.7%","x":3,"xanchor":"center","xref":"x","xshift":30,"y":8574.46,"yanchor":"middle","yref":"y"},{"font":{"color":"#e889ab"},"showarrow":false,"text":"-65.6%","x":4,"xanchor":"center","xref":"x","xshift":30,"y":4196.02,"yanchor":"middle","yref":"y"}],"legend":{"x":0.55,"y":1.0},"width":1000,"height":400,"barmode":"stack"}, {"responsive": true} ) }; </script> </div>
|
src/fragments/tp_sp_memoryusage.html
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
<div> <div id="16242941-9fab-40ed-996e-ddd93a5a627c" class="plotly-graph-div" style="height:410px; width:1000px;"></div> <script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("16242941-9fab-40ed-996e-ddd93a5a627c")) { Plotly.newPlot( "16242941-9fab-40ed-996e-ddd93a5a627c", [{"legendgroup":"Model Parameters","marker":{"color":"#4ea5b7"},"name":"Model Parameters","showlegend":true,"x":["1024","4096","16384"],"y":[131.5,131.5,131.5],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Model Parameters","marker":{"color":"#4ea5b7"},"name":"Model Parameters","showlegend":false,"x":["1024","4096","16384"],"y":[16.4375,16.4375,16.4375],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Model Parameters","marker":{"color":"#4ea5b7"},"name":"Model Parameters","showlegend":false,"x":["1024","4096","16384"],"y":[8.21875,8.21875,8.21875],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"Gradients","marker":{"color":"#e889ab"},"name":"Gradients","showlegend":true,"x":["1024","4096","16384"],"y":[131.5,131.5,131.5],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Gradients","marker":{"color":"#e889ab"},"name":"Gradients","showlegend":false,"x":["1024","4096","16384"],"y":[16.4375,16.4375,16.4375],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Gradients","marker":{"color":"#e889ab"},"name":"Gradients","showlegend":false,"x":["1024","4096","16384"],"y":[8.21875,8.21875,8.21875],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"Optimizer States","marker":{"color":"#cec0fa"},"name":"Optimizer States","showlegend":true,"x":["1024","4096","16384"],"y":[526.0,526.0,526.0],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Optimizer States","marker":{"color":"#cec0fa"},"name":"Optimizer States","showlegend":false,"x":["1024","4096","16384"],"y":[65.75,65.75,65.75],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Optimizer States","marker":{"color":"#cec0fa"},"name":"Optimizer States","showlegend":false,"x":["1024","4096","16384"],"y":[32.875,32.875,32.875],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"Activations","marker":{"color":"#e38a42"},"name":"Activations","showlegend":true,"x":["1024","4096","16384"],"y":[21.25,85.0,340.0],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Activations","marker":{"color":"#e38a42"},"name":"Activations","showlegend":false,"x":["1024","4096","16384"],"y":[2.65625,10.625,42.5],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Activations","marker":{"color":"#e38a42"},"name":"Activations","showlegend":false,"x":["1024","4096","16384"],"y":[1.328125,5.3125,21.25],"type":"bar","xaxis":"x3","yaxis":"y3"}], {"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"}}},"xaxis":{"anchor":"y","domain":[0.0,0.2888888888888889],"title":{"text":"Sequence Length"},"showgrid":true,"gridcolor":"LightGray"},"yaxis":{"anchor":"x","domain":[0.0,1.0],"range":[0,150],"dtick":20,"showgrid":true,"gridcolor":"LightGray","title":{"text":"Memory Usage (GB)"}},"xaxis2":{"anchor":"y2","domain":[0.35555555555555557,0.6444444444444445],"title":{"text":"Sequence Length"},"showgrid":true,"gridcolor":"LightGray"},"yaxis2":{"anchor":"x2","domain":[0.0,1.0],"matches":"y","showticklabels":false,"range":[0,150],"dtick":20,"showgrid":true,"gridcolor":"LightGray"},"xaxis3":{"anchor":"y3","domain":[0.7111111111111111,1.0],"title":{"text":"Sequence Length"},"showgrid":true,"gridcolor":"LightGray"},"yaxis3":{"anchor":"x3","domain":[0.0,1.0],"matches":"y","showticklabels":false,"range":[0,150],"dtick":20,"showgrid":true,"gridcolor":"LightGray"},"annotations":[{"font":{"size":16},"showarrow":false,"text":"No Parallelism","x":0.14444444444444446,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"TP=8 (with SP)","x":0.5,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"TP=16 (with SP)","x":0.8555555555555556,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"}],"shapes":[{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x domain","y0":80,"y1":80,"yref":"y"},{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x2 domain","y0":80,"y1":80,"yref":"y2"},{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x3 domain","y0":80,"y1":80,"yref":"y3"}],"title":{"text":"Memory Usage for 70B Model"},"legend":{"orientation":"v","x":1.02,"y":0.5},"margin":{"r":150},"barmode":"stack","width":1000,"height":410}, {"responsive": true} ) }; </script> </div>
|
src/fragments/tp_sp_scaling.html
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
<div> <div id="e996eac5-dceb-42af-9cc6-d21f247621f5" class="plotly-graph-div" style="height:400px; width:1000px;"></div> <script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("e996eac5-dceb-42af-9cc6-d21f247621f5")) { Plotly.newPlot( "e996eac5-dceb-42af-9cc6-d21f247621f5", [{"marker":{"color":"#4ea5b7"},"name":"Tokens\u002fsec\u002fGPU","width":0.7,"x":["2","4","8","16","32"],"y":[14167.25,13460.16,10888.53,6159.3,3609.73],"type":"bar","xaxis":"x","yaxis":"y"},{"base":[13460.16],"marker":{"color":"#e889ab"},"name":"Performance Drop","showlegend":true,"width":0.0875,"x":["4"],"y":[707.0900000000001],"type":"bar","xaxis":"x","yaxis":"y"},{"base":[10888.53],"marker":{"color":"#e889ab"},"showlegend":false,"width":0.0875,"x":["8"],"y":[2571.629999999999],"type":"bar","xaxis":"x","yaxis":"y"},{"base":[6159.3],"marker":{"color":"#e889ab"},"showlegend":false,"width":0.0875,"x":["16"],"y":[4729.2300000000005],"type":"bar","xaxis":"x","yaxis":"y"},{"base":[3609.73],"marker":{"color":"#e889ab"},"showlegend":false,"width":0.0875,"x":["32"],"y":[2549.57],"type":"bar","xaxis":"x","yaxis":"y"},{"marker":{"color":"#cec0fa"},"name":"Max Batch Size","text":["4","10","20","40","100"],"textposition":"inside","width":0.7,"x":["2","4","8","16","32"],"y":[4,10,20,40,100],"type":"bar","xaxis":"x2","yaxis":"y2"}], {"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"}}},"xaxis":{"anchor":"y","domain":[0.0,0.45],"title":{"text":"Tensor Parallelism (TP)"},"showgrid":true,"gridcolor":"LightGray"},"yaxis":{"anchor":"x","domain":[0.0,1.0],"title":{"text":"Tokens\u002fsec\u002fGPU"},"showgrid":true,"gridcolor":"LightGray"},"xaxis2":{"anchor":"y2","domain":[0.55,1.0],"title":{"text":"Tensor Parallelism (TP)"},"showgrid":true,"gridcolor":"LightGray"},"yaxis2":{"anchor":"x2","domain":[0.0,1.0],"title":{"text":"Maximum Batch Size"},"showgrid":true,"gridcolor":"LightGray"},"annotations":[{"font":{"size":16},"showarrow":false,"text":"Throughput Scaling with TP\u002fSP (3B Model)","x":0.225,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"Maximum Batch Size per TP Value","x":0.775,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"color":"#e889ab"},"showarrow":false,"text":"-5.0%","x":1,"xanchor":"center","xref":"x","xshift":30,"y":13813.705,"yanchor":"middle","yref":"y"},{"font":{"color":"#e889ab"},"showarrow":false,"text":"-19.1%","x":2,"xanchor":"center","xref":"x","xshift":30,"y":12174.345000000001,"yanchor":"middle","yref":"y"},{"font":{"color":"#e889ab"},"showarrow":false,"text":"-43.4%","x":3,"xanchor":"center","xref":"x","xshift":30,"y":8523.915,"yanchor":"middle","yref":"y"},{"font":{"color":"#e889ab"},"showarrow":false,"text":"-41.4%","x":4,"xanchor":"center","xref":"x","xshift":30,"y":4884.515,"yanchor":"middle","yref":"y"}],"legend":{"x":0.55,"y":1.0},"width":1000,"height":400,"barmode":"stack"}, {"responsive": true} ) }; </script> </div>
|
src/fragments/zero3_memoryusage.html
ADDED
@@ -0,0 +1 @@
|
|
|
|
|
1 |
+
<div> <div id="be6d1423-335a-45b6-a848-dc4d0d70706f" class="plotly-graph-div" style="height:400px; width:1000px;"></div> <script type="text/javascript"> window.PLOTLYENV=window.PLOTLYENV || {}; if (document.getElementById("be6d1423-335a-45b6-a848-dc4d0d70706f")) { Plotly.newPlot( "be6d1423-335a-45b6-a848-dc4d0d70706f", [{"legendgroup":"Model Parameters","marker":{"color":"#4ea5b7"},"name":"Model Parameters","showlegend":true,"x":["1024","4096","16384"],"y":[15.0,15.0,15.0],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Gradients","marker":{"color":"#e889ab"},"name":"Gradients","showlegend":true,"x":["1024","4096","16384"],"y":[15.0,15.0,15.0],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Optimizer States","marker":{"color":"#cec0fa"},"name":"Optimizer States","showlegend":true,"x":["1024","4096","16384"],"y":[60.0,60.0,60.0],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Activations","marker":{"color":"#e38a42"},"name":"Activations","showlegend":true,"x":["1024","4096","16384"],"y":[4.25,17.0,68.0],"type":"bar","xaxis":"x","yaxis":"y"},{"legendgroup":"Model Parameters","marker":{"color":"#4ea5b7"},"name":"Model Parameters","showlegend":false,"x":["1024","4096","16384"],"y":[15.0,15.0,15.0],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Gradients","marker":{"color":"#e889ab"},"name":"Gradients","showlegend":false,"x":["1024","4096","16384"],"y":[15.0,15.0,15.0],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Optimizer States","marker":{"color":"#cec0fa"},"name":"Optimizer States","showlegend":false,"x":["1024","4096","16384"],"y":[7.5,7.5,7.5],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Activations","marker":{"color":"#e38a42"},"name":"Activations","showlegend":false,"x":["1024","4096","16384"],"y":[4.25,17.0,68.0],"type":"bar","xaxis":"x2","yaxis":"y2"},{"legendgroup":"Model Parameters","marker":{"color":"#4ea5b7"},"name":"Model Parameters","showlegend":false,"x":["1024","4096","16384"],"y":[15.0,15.0,15.0],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"Gradients","marker":{"color":"#e889ab"},"name":"Gradients","showlegend":false,"x":["1024","4096","16384"],"y":[1.875,1.875,1.875],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"Optimizer States","marker":{"color":"#cec0fa"},"name":"Optimizer States","showlegend":false,"x":["1024","4096","16384"],"y":[7.5,7.5,7.5],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"Activations","marker":{"color":"#e38a42"},"name":"Activations","showlegend":false,"x":["1024","4096","16384"],"y":[4.25,17.0,68.0],"type":"bar","xaxis":"x3","yaxis":"y3"},{"legendgroup":"Model Parameters","marker":{"color":"#4ea5b7"},"name":"Model Parameters","showlegend":false,"x":["1024","4096","16384"],"y":[1.875,1.875,1.875],"type":"bar","xaxis":"x4","yaxis":"y4"},{"legendgroup":"Gradients","marker":{"color":"#e889ab"},"name":"Gradients","showlegend":false,"x":["1024","4096","16384"],"y":[1.875,1.875,1.875],"type":"bar","xaxis":"x4","yaxis":"y4"},{"legendgroup":"Optimizer States","marker":{"color":"#cec0fa"},"name":"Optimizer States","showlegend":false,"x":["1024","4096","16384"],"y":[7.5,7.5,7.5],"type":"bar","xaxis":"x4","yaxis":"y4"},{"legendgroup":"Activations","marker":{"color":"#e38a42"},"name":"Activations","showlegend":false,"x":["1024","4096","16384"],"y":[4.25,17.0,68.0],"type":"bar","xaxis":"x4","yaxis":"y4"}], {"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"}}},"xaxis":{"anchor":"y","domain":[0.0,0.2125],"title":{"text":"Sequence Length"},"showgrid":true,"gridcolor":"LightGray"},"yaxis":{"anchor":"x","domain":[0.0,1.0],"title":{"text":"Memory Usage (GB)"},"dtick":20,"showgrid":true,"gridcolor":"LightGray"},"xaxis2":{"anchor":"y2","domain":[0.2625,0.475],"title":{"text":"Sequence Length"},"showgrid":true,"gridcolor":"LightGray"},"yaxis2":{"anchor":"x2","domain":[0.0,1.0],"matches":"y","showticklabels":false,"showgrid":true,"gridcolor":"LightGray"},"xaxis3":{"anchor":"y3","domain":[0.525,0.7375],"title":{"text":"Sequence Length"},"showgrid":true,"gridcolor":"LightGray"},"yaxis3":{"anchor":"x3","domain":[0.0,1.0],"matches":"y","showticklabels":false,"showgrid":true,"gridcolor":"LightGray"},"xaxis4":{"anchor":"y4","domain":[0.7875,1.0],"title":{"text":"Sequence Length"},"showgrid":true,"gridcolor":"LightGray"},"yaxis4":{"anchor":"x4","domain":[0.0,1.0],"matches":"y","showticklabels":false,"showgrid":true,"gridcolor":"LightGray"},"annotations":[{"font":{"size":16},"showarrow":false,"text":"DP=8","x":0.10625,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"DP=8 Zero-1","x":0.36875,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"DP=8 Zero-2","x":0.6312500000000001,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"DP=8 Zero-3","x":0.89375,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"}],"shapes":[{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x domain","y0":80,"y1":80,"yref":"y"},{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x2 domain","y0":80,"y1":80,"yref":"y2"},{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x3 domain","y0":80,"y1":80,"yref":"y3"},{"line":{"color":"red","dash":"dash"},"type":"line","x0":0,"x1":1,"xref":"x4 domain","y0":80,"y1":80,"yref":"y4"}],"title":{"text":"Memory Usage for 8B Model"},"legend":{"orientation":"v","x":1.02,"y":0.5},"margin":{"r":150},"barmode":"stack","width":1000,"height":400}, {"responsive": true} ) }; </script> </div>
|
src/index.html
CHANGED
@@ -2408,53 +2408,45 @@
|
|
2408 |
<h2>Conclusion</h2>
|
2409 |
|
2410 |
|
2411 |
-
<p>Congratulations!
|
2412 |
|
2413 |
<p><img alt="image.png" src="/assets/images/conclusion_llama3_parallelism.png" /></p>
|
2414 |
|
2415 |
-
<p>
|
2416 |
|
2417 |
-
<p>
|
2418 |
-
|
2419 |
-
<p>Let’s start with a brief recap of all the things we covered in these past hours and days!</p>
|
2420 |
-
|
2421 |
-
<h3>What you learned</h3>
|
2422 |
-
|
2423 |
-
<p>Working through this whole blog post you mastered a ranged of concepts:</p>
|
2424 |
-
|
2425 |
-
<ul>
|
2426 |
-
<li>Basic principle of model training</li>
|
2427 |
-
<li>Collective communication primitives </li>
|
2428 |
-
<li>Memory anatomy of a LLM</li>
|
2429 |
-
<li>Distributed training with DP and ZeRO </li>
|
2430 |
-
<li>Model parallelism with TP, SP, CP and PP</li>
|
2431 |
-
<li>Fast kernels and mixed precision training</li>
|
2432 |
-
<li>Overlapping communication and computation</li>
|
2433 |
-
<li>Profiling distributed training</li>
|
2434 |
-
</ul>
|
2435 |
|
2436 |
-
<p>
|
2437 |
|
2438 |
<h3>What we learned</h3>
|
2439 |
|
2440 |
-
<p>
|
|
|
|
|
|
|
|
|
|
|
2441 |
</p>
|
2442 |
|
2443 |
<ul>
|
2444 |
<li>PyTorch processes would sometimes fail to clean up properly</li>
|
2445 |
<li>Slurm job manager would forcefully terminate jobs, leading to node failures </li>
|
2446 |
<li>Simple benchmarks that should take minutes would stretch into hours</li>
|
2447 |
-
<li>
|
2448 |
-
|
2449 |
-
|
2450 |
-
|
2451 |
-
|
2452 |
-
|
2453 |
-
</
|
|
|
|
|
|
|
2454 |
</ul>
|
2455 |
|
2456 |
<p>These challenges deserve their own story, but they taught us valuable lessons about the complexities of distributed training infrastructure. What looks simple in theory often requires careful attention to many moving parts in practice.</p>
|
2457 |
|
|
|
2458 |
<p>Let's analyze the results of our benchmarks and understand how different configurations affect each other. All benchmarks were run with a sequence length of 4096 and a global batch size of 1M tokens. We'll look at two key visualizations that help illustrate our findings.
|
2459 |
</p>
|
2460 |
|
@@ -2465,7 +2457,6 @@
|
|
2465 |
|
2466 |
<p>To complement this, let's look at the relationships between different parameters:</p>
|
2467 |
|
2468 |
-
<!-- <p><img alt="image.png" src="/assets/images/what_we_learnt_parallel_coordinates.html" /></p> -->
|
2469 |
<iframe id="plotFrame" src="/assets/images/what_we_learnt_parallel_coordinates.html" height="540" width="1000" scrolling="no" frameborder="0"></iframe>
|
2470 |
|
2471 |
<p>Parallel coordinates plot showing the relationship between different model parallelism configurations (Data Parallel degree, Tensor Parallel degree, Pipeline Parallel degree), training hyperparameters (gradient accumulation steps, micro batch size), ZeRO stage and the resulting Model FLOPs Utilization (MFU). Each line represents a different training configuration, with colors indicating the MFU value - warmer colors show higher efficiency.</p>
|
@@ -2479,19 +2470,19 @@
|
|
2479 |
<li>Larger models present a different challenge. As model size increases, memory requirements grow substantially. This creates two scenarios with fewer nodes: either the model doesn't fit at all, or it barely fits but runs inefficiently due to operating near the GPU memory limits.</li>
|
2480 |
<li>Our benchmarks demonstrate how performance heavily depends on implementation quality. When we first implemented both parallelism strategies, Tensor Parallelism (TP) outperformed Pipeline Parallelism (PP). After optimizing our PP code, it became the faster option. Now that we're improving the communication overlap in our TP implementation, we expect it to regain the performance lead.</li>
|
2481 |
</ol>
|
|
|
|
|
2482 |
|
2483 |
-
<
|
2484 |
-
|
2485 |
-
<h3>What’s next?</h3>
|
2486 |
|
2487 |
-
<p>You
|
2488 |
<ul>
|
2489 |
<li>Carefully read some of the landmark or very recent papers. You can find a list of some of the most impactful papers in <a target="_self" href="#references" class="">References</a>.</li>
|
2490 |
<li>Start from scratch and implement an algorithm yourself. Often a method only fully “clicks” if you implemented it yourself.</li>
|
2491 |
<li>Dive into one of the widely used frameworks and start contributing: fix bugs, answer issues, or implement a new feature. That’s the best way to get in any ML field!</li>
|
2492 |
</ul>
|
2493 |
|
2494 |
-
<p>We hope this
|
2495 |
|
2496 |
<h2>References</h2>
|
2497 |
|
|
|
2408 |
<h2>Conclusion</h2>
|
2409 |
|
2410 |
|
2411 |
+
<p>Congratulations, dear reader, you made it to the end! We've completed quite a journey: we started from understanding how to train a simple model on a single GPU, all the way to mastering all the intricate techniques used to efficiently train massive language models like Llama-405B and DeepSeek-V3 on thousands of GPUs. By now, you can read a diagram, like Llama-3's 4D parallel setup, with ease:</p>
|
2412 |
|
2413 |
<p><img alt="image.png" src="/assets/images/conclusion_llama3_parallelism.png" /></p>
|
2414 |
|
2415 |
+
<p>Orchestrating large clusters of GPUs to train LLMs efficiently is no easy feat. We learned how to optimize computations and communications between GPUs such that they run with maximum utilization at all times. It involves choosing the right parallelization strategy for a given model and cluster size, overlapping communication and computation where possible, and writing custom kernels that take into account the hardware layout to perform an operation as fast as possible on the GPU.</p>
|
2416 |
|
2417 |
+
<p>You might still believe that this knowledge is a bit niche and only concerns the small set of people that pretrain LLMs. Historically, that mayb be true, but as models are growing rapidly even people who want to fine-tune models require distributd training setups. So diving deeper into all things distributed might prove very timely.</p>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
2418 |
|
2419 |
+
<p>This has been a long learning journey, but not just for you! Running thousands of benchmarks on a GPU cluster was more challenging than we anticipated and we want to share a few highlights of our learning experience.</p>
|
2420 |
|
2421 |
<h3>What we learned</h3>
|
2422 |
|
2423 |
+
<p>Our goal for this blogpost was not only to discuss theory and implementations but provide actual data points as well. So the plan was simple: lets run every possible distributed configuration for every model and a number of cluster sizes (namely 1-64 nodes of 8xH100s). Even after excluding impossible configuration we still needed to run thousands of experiments. </p>
|
2424 |
+
|
2425 |
+
<aside>We want to take this opportunity to apologize to our co-workers for blocking most of the science cluster and in turn forgive any threats that may have been whispered.</aside>
|
2426 |
+
|
2427 |
+
<p>
|
2428 |
+
On paper this sounds easy enough: we can easily launch big arrays of jobs on our cluster. However, when we launched the first batches is when the troubles began:
|
2429 |
</p>
|
2430 |
|
2431 |
<ul>
|
2432 |
<li>PyTorch processes would sometimes fail to clean up properly</li>
|
2433 |
<li>Slurm job manager would forcefully terminate jobs, leading to node failures </li>
|
2434 |
<li>Simple benchmarks that should take minutes would stretch into hours</li>
|
2435 |
+
<li>Some jobs would hang indefinitely</li>
|
2436 |
+
</ul>
|
2437 |
+
|
2438 |
+
<p>So in order to run all experiments in a finite amount of time required some additional engineering. In particular we spent a significant amount of time on the following:</p>
|
2439 |
+
|
2440 |
+
<ul>
|
2441 |
+
<li>Minimizing cluster restart times and optimize idle time</li>
|
2442 |
+
<li>Analyzing detailed NCCL debug logs</li>
|
2443 |
+
<li>Understand memory usage patterns and CUDA memory allocator behaviors</li>
|
2444 |
+
<li>Improving pipeline parallelism performance on multi-node</li>
|
2445 |
</ul>
|
2446 |
|
2447 |
<p>These challenges deserve their own story, but they taught us valuable lessons about the complexities of distributed training infrastructure. What looks simple in theory often requires careful attention to many moving parts in practice.</p>
|
2448 |
|
2449 |
+
<!--
|
2450 |
<p>Let's analyze the results of our benchmarks and understand how different configurations affect each other. All benchmarks were run with a sequence length of 4096 and a global batch size of 1M tokens. We'll look at two key visualizations that help illustrate our findings.
|
2451 |
</p>
|
2452 |
|
|
|
2457 |
|
2458 |
<p>To complement this, let's look at the relationships between different parameters:</p>
|
2459 |
|
|
|
2460 |
<iframe id="plotFrame" src="/assets/images/what_we_learnt_parallel_coordinates.html" height="540" width="1000" scrolling="no" frameborder="0"></iframe>
|
2461 |
|
2462 |
<p>Parallel coordinates plot showing the relationship between different model parallelism configurations (Data Parallel degree, Tensor Parallel degree, Pipeline Parallel degree), training hyperparameters (gradient accumulation steps, micro batch size), ZeRO stage and the resulting Model FLOPs Utilization (MFU). Each line represents a different training configuration, with colors indicating the MFU value - warmer colors show higher efficiency.</p>
|
|
|
2470 |
<li>Larger models present a different challenge. As model size increases, memory requirements grow substantially. This creates two scenarios with fewer nodes: either the model doesn't fit at all, or it barely fits but runs inefficiently due to operating near the GPU memory limits.</li>
|
2471 |
<li>Our benchmarks demonstrate how performance heavily depends on implementation quality. When we first implemented both parallelism strategies, Tensor Parallelism (TP) outperformed Pipeline Parallelism (PP). After optimizing our PP code, it became the faster option. Now that we're improving the communication overlap in our TP implementation, we expect it to regain the performance lead.</li>
|
2472 |
</ol>
|
2473 |
+
-->
|
2474 |
+
<p>Reproducing theoretical results in practice is challenging, especially given the limited availability of production training code. Through open-source projects like picotron and nanotron, we hope to make these distributed training techniques more accessible and foster collaboration on simpler, more efficient codebases that help researchers and practitioners make the most of their hardware resources.</p>
|
2475 |
|
2476 |
+
<h3>So, what’s next?</h3>
|
|
|
|
|
2477 |
|
2478 |
+
<p>You now have good overview of the main distributed training concepts but at the same time we just scratched to surface of on some aspects. There are many ways to dive deep into a subject but here are some steps that we recommend:</p>
|
2479 |
<ul>
|
2480 |
<li>Carefully read some of the landmark or very recent papers. You can find a list of some of the most impactful papers in <a target="_self" href="#references" class="">References</a>.</li>
|
2481 |
<li>Start from scratch and implement an algorithm yourself. Often a method only fully “clicks” if you implemented it yourself.</li>
|
2482 |
<li>Dive into one of the widely used frameworks and start contributing: fix bugs, answer issues, or implement a new feature. That’s the best way to get in any ML field!</li>
|
2483 |
</ul>
|
2484 |
|
2485 |
+
<p>We hope this book helps you get started in distributed training and that you will train the next generation of awesome models to the hum of your GPU cluster!</p>
|
2486 |
|
2487 |
<h2>References</h2>
|
2488 |
|