---
title: Post Training Techniques Guide
emoji: πŸš€
colorFrom: purple
colorTo: yellow
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: A visual guide to post-training techniques for LLMs
license: mit
---

# πŸ”§ Beyond Pretraining: A Visual Guide to Post-Training Techniques for LLMs

This deck summarizes the key trade-offs between different post-training strategies for large language models β€” including:

- πŸ“š Supervised Fine-Tuning (SFT)
- 🀝 Preference Optimization (DPO, APO, GRPO)
- 🎯 Reinforcement Learning (PPO)

It also introduces a reward spectrum ranging from rule-based to subjective feedback, and compares how real-world models like **SmolLM3**, **Tulu 2/3**, and **DeepSeek-R1** implement these strategies.

> This is a companion resource to my ReTool rollout implementation and blog post.
>
> πŸ“– [Medium blog post](https://medium.com/@jenwei0312/beyond-generate-a-deep-dive-into-stateful-multi-turn-llm-rollouts-for-tool-use-336b00c99ac0)
> πŸ’» [ReTool Hugging Face Space](https://huggingface.co/spaces/bird-of-paradise/ReTool-Implementation)

---

### πŸ“Ž Download the Slides

πŸ‘‰ [PDF version](https://huggingface.co/spaces/bird-of-paradise/post-training-techniques-guide/blob/main/src/Post%20Training%20Techniques.pdf)

---

### 🀝 Reuse & Attribution

This deck is free to share in talks, posts, or documentation β€” **with attribution**. Please credit:

**Jen Wei β€” [Hugging Face πŸ€—](https://huggingface.co/bird-of-paradise) | [X/Twitter](https://x.com/JenniferWe17599)**

Optional citation: *β€œBeyond Pretraining: Post-Training Techniques for LLMs (2025)”*

Licensed under the MIT License.

β€” *Made with 🧠 by Jen Wei, July 2025*