Collections including paper arxiv:2401.04081

- ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
  Paper • 2312.10003 • Published • 40
- Catwalk: A Unified Language Model Evaluation Framework for Many Datasets
  Paper • 2312.10253 • Published • 8
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 71
- Mixtral of Experts
  Paper • 2401.04088 • Published • 158

- openchat/openchat-3.5-1210
  Text Generation • Updated • 1.27k • 273
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 71
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 108
- Babelscape/rebel-large
  Text2Text Generation • Updated • 30k • 216

- A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
  Paper • 2312.08578 • Published • 20
- ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
  Paper • 2312.08583 • Published • 12
- Vision-Language Models as a Source of Rewards
  Paper • 2312.09187 • Published • 14
- StemGen: A music generation model that listens
  Paper • 2312.08723 • Published • 48

- havenhq/mamba-chat
  Updated • 291 • 100
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 71
- VMamba: Visual State Space Model
  Paper • 2401.10166 • Published • 40
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 108

- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 18
- Mixtral of Experts
  Paper • 2401.04088 • Published • 158
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 71
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 52

- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
  Paper • 2310.16795 • Published • 27
- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
  Paper • 2310.13961 • Published • 5
- The Consensus Game: Language Model Generation via Equilibrium Search
  Paper • 2310.09139 • Published • 14
- Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
  Paper • 2310.03094 • Published • 13

- Table-GPT: Table-tuned GPT for Diverse Table Tasks
  Paper • 2310.09263 • Published • 40
- A Zero-Shot Language Agent for Computer Control with Structured Reflection
  Paper • 2310.08740 • Published • 16
- The Consensus Game: Language Model Generation via Equilibrium Search
  Paper • 2310.09139 • Published • 14
- PaLI-3 Vision Language Models: Smaller, Faster, Stronger
  Paper • 2310.09199 • Published • 27

- MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
  Paper • 2310.09478 • Published • 21
- Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams
  Paper • 2310.08678 • Published • 14
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 244
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 14

- FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning
  Paper • 2309.04663 • Published • 6
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 87
- Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
  Paper • 2310.08541 • Published • 18
- Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
  Paper • 2310.13671 • Published • 19