Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval Mar 22, 2024 • 70
Article KV Caching Explained: Optimizing Transformer Inference Efficiency By not-lain • 1 day ago • 19
Article 🚀 Build a Qwen 2.5 VL API endpoint with Hugging Face spaces and Docker! By ariG23498 • 2 days ago • 13
Language Detection Collection StaticVectors models to detect language. Exports of FastText that run in NumPy without needing FastText • 2 items • Updated 4 days ago • 3
Article Hugging Face and FriendliAI partner to supercharge model deployment on the Hub 9 days ago • 29
CodeXEmbed: A Generalist Embedding Model Family for Multiligual and Multi-task Code Retrieval Paper • 2411.12644 • Published Nov 19, 2024 • 3
Article Train 400x faster Static Embedding Models with Sentence Transformers 16 days ago • 129
Article Python Is All You Need? Introducing Dria-Agent-α By andthattoo • 20 days ago • 22
Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published Jul 1, 2024 • 59
Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP Paper • 2408.04303 • Published Aug 8, 2024 • 17
Article Announcing NVIDIA Cosmos World Foundation Models By mingyuliutw • 24 days ago • 23