Article Better RAG 2: Single-shot is not good enough By hrishioa • Mar 14, 2024 • 11
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Paper • 2412.07626 • Published Dec 10, 2024 • 22
Cosmos Tokenizer Collection A suite of image and video tokenizers • 13 items • Updated Jan 17 • 39
Article BM25 for Python: Achieving high performance while simplifying dependencies with *BM25S*⚡ By xhluca • Jul 9, 2024 • 43
AMD-OLMo Collection AMD-OLMo are a series of 1 billion parameter language models trained by AMD on AMD Instinct™ MI250 GPUs based on OLMo. • 4 items • Updated Oct 31, 2024 • 18
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated 5 days ago • 240
C4AI Aya Expanse Collection Aya Expanse is an open-weight research release of a model with highly advanced multilingual capabilities. • 3 items • Updated Dec 16, 2024 • 34
Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28, 2024 • 188
Awesome Document AI Collection A collection of open-source document AI • 27 items • Updated Mar 11, 2024 • 80
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding Paper • 2407.12594 • Published Jul 17, 2024 • 19
Article Llama can now see and run on your device - welcome Llama 3.2 Sep 25, 2024 • 183
💻 Local SmolLMs Collection SmolLM models in MLC, ONNX and GGUF format for local applications + in-browser demos • 14 items • Updated 5 days ago • 49