How did you distribute the weights in your Qwen 72B experiment? Did you just have it running w/ TP=8 on a single node, or did each node have its own copy of Qwen 72B?
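For anyone else reading, here is a minimal sketch of the two layouts the question contrasts, written against plain vLLM rather than the article's TRL co-location wiring; the model id and the TP degree are assumptions, not the authors' confirmed configuration.

```python
# Minimal sketch of the two layouts the question asks about, using plain vLLM.
# The model id is a placeholder; this is NOT the article's confirmed setup.
from vllm import LLM, SamplingParams

MODEL = "Qwen/Qwen2.5-72B-Instruct"  # assumption: whichever Qwen 72B checkpoint was used

# Layout A: a single copy of the weights, sharded with tensor parallelism
# across the 8 GPUs of one node (each GPU holds roughly 1/8 of every layer).
llm = LLM(model=MODEL, tensor_parallel_size=8)

# Layout B would instead launch one such instance per node (e.g. via the job
# scheduler), so every node holds its own full TP-sharded replica of the 72B
# weights and generates for its own slice of the prompts.

prompts = ["Summarize tensor parallelism in one sentence."]
outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```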
Jordan Conragan
JVP15
AI & ML interests
None yet
Recent Activity
commented on an article 2 months ago: No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL
commented on a paper 4 months ago: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Organizations
None yet