Collections
Collections including paper arxiv:2401.04088
-
Multilingual Instruction Tuning With Just a Pinch of Multilinguality
Paper • 2401.01854 • Published • 11
LLaMA Beyond English: An Empirical Study on Language Capability Transfer
Paper • 2401.01055 • Published • 54
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper • 2401.01325 • Published • 27
Improving Text Embeddings with Large Language Models
Paper • 2401.00368 • Published • 80
-
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper • 2312.16862 • Published • 31
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Paper • 2312.17172 • Published • 28
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Paper • 2401.01974 • Published • 7
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Paper • 2401.01885 • Published • 28
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 180
Mixtral of Experts
Paper • 2401.04088 • Published • 158
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
Paper • 2401.08406 • Published • 37
-
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • 2312.15166 • Published • 58
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Paper • 2312.12456 • Published • 42
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Paper • 2312.12742 • Published • 14
Mini-GPTs: Efficient Large Language Models through Contextual Pruning
Paper • 2312.12682 • Published • 10
-
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Paper • 2312.09390 • Published • 33
OneLLM: One Framework to Align All Modalities with Language
Paper • 2312.03700 • Published • 24
Generative Multimodal Models are In-Context Learners
Paper • 2312.13286 • Published • 36
The LLM Surgeon
Paper • 2312.17244 • Published • 9
-
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
Paper • 2312.10003 • Published • 40
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets
Paper • 2312.10253 • Published • 8
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 71
Mixtral of Experts
Paper • 2401.04088 • Published • 158
-
Memory Augmented Language Models through Mixture of Word Experts
Paper • 2311.10768 • Published • 18
Mixtral of Experts
Paper • 2401.04088 • Published • 158
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 71
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 52
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Paper • 2211.05100 • Published • 29
CsFEVER and CTKFacts: Acquiring Czech data for fact verification
Paper • 2201.11115 • Published
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 17
FinGPT: Large Generative Models for a Small Language
Paper • 2311.05640 • Published • 32
-
Attention Is All You Need
Paper • 1706.03762 • Published • 55
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Paper • 2307.08691 • Published • 8
Mixtral of Experts
Paper • 2401.04088 • Published • 158
Mistral 7B
Paper • 2310.06825 • Published • 46