Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published 9 days ago • 51
Towards General-Purpose Model-Free Reinforcement Learning Paper • 2501.16142 • Published 11 days ago • 24
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding Paper • 2501.13200 • Published 16 days ago • 61
Post: Running model pre-training on only a single RTX 4090 is really slow, even for small language models! (https://huggingface.co/collections/JingzeShi/doge-slm-677fd879f8c4fd0f43e05458)
Control LLM: Controlled Evolution for Intelligence Retention in LLM Paper • 2501.10979 • Published 19 days ago • 6
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning Paper • 2501.12570 • Published 16 days ago • 23
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 16 days ago • 302
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published 18 days ago • 90
MangaNinja: Line Art Colorization with Precise Reference Following Paper • 2501.08332 • Published 24 days ago • 56