Step-Audio-R1 Collection Step-Audio-R1 is the first audio language model to successfully unlock test-time compute scaling. • 3 items • Updated 16 days ago • 15
LightOnOCR Collection The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR • 7 items • Updated 24 days ago • 14
Article LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR Oct 23 • 62
Qwen3 VL HF Demo 🔥 Space: object detection, visual grounding, keypoint detection • Running on Zero • MCP • Featured • 32
MathCanvas Collection Datasets and models for the paper "MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning" • 5 items • Updated 18 days ago • 3
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning Paper • 2510.14958 • Published Oct 16 • 22