LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR Paper • 2601.14251 • Published 10 days ago • 23
LightOnOCR-2 🦉 Collection LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family • 12 items • Updated 9 days ago • 20
A BERTology View of LLM Orchestrations: Token- and Layer-Selective Probes for Efficient Single-Pass Classification Paper • 2601.13288 • Published 11 days ago • 12
Beyond Cosine Similarity: Taming Semantic Drift and Antonym Intrusion in a 15-Million Node Turkish Synonym Graph Paper • 2601.13251 • Published 11 days ago • 4
A Hybrid Protocol for Large-Scale Semantic Dataset Generation in Low-Resource Languages: The Turkish Semantic Relations Corpus Paper • 2601.13253 • Published 11 days ago • 4
Searching for Better ViT Baselines Collection Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 36 items • Updated 2 days ago • 20
neutts-nano Collection NeuTTS Nano is a speech foundation model, 3x smaller than NeuTTS Air, that runs on CPU in real-time, with instant voice cloning. • 3 items • Updated 16 days ago • 6
neutts-air Collection NeuTTS Air is a speech foundation model that runs on CPU in real-time, with instant voice cloning. • 3 items • Updated Oct 9, 2025 • 19
🌏 Multilingual retrievers Collection A collection of multilingual retrieval models. • 4 items • Updated Oct 6, 2024 • 1
Article Introducing Falcon-H1-Arabic: Pushing the Boundaries of Arabic Language AI with Hybrid Architecture 25 days ago • 36
Arabic Matryoshka & GATE Embedding Models Collection A collection of advanced Arabic Matryoshka Embedding Models designed for efficient and high-performance Arabic NLP, available publicly on Hugging Face • 12 items • Updated Nov 28, 2025 • 15
ARCADE: A City-Scale Corpus for Fine-Grained Arabic Dialect Tagging Paper • 2601.02209 • Published 25 days ago • 3
Arabic Speech Datasets Collection Best Datasets for Arabic Speech Tasks • 16 items • Updated 29 days ago • 15
Article Small Yet Mighty: Improve Accuracy In Multimodal Search and Visual Document Retrieval with Llama Nemotron RAG Models 24 days ago • 19