-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 147 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 13 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 54 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 46
Collections
Discover the best community collections!
Collections including paper arxiv:2401.13660
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 17 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 9 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 12 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 48
-
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
Paper • 2404.15420 • Published • 8 -
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper • 2404.14619 • Published • 127 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 256 -
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
Paper • 2404.14047 • Published • 45
-
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Paper • 2105.13626 • Published • 3 -
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 50 -
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Paper • 2305.07185 • Published • 9 -
Byte-Level Recursive Convolutional Auto-Encoder for Text
Paper • 1802.01817 • Published
-
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 140 -
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper • 2401.09417 • Published • 60 -
Vivim: a Video Vision Mamba for Medical Video Object Segmentation
Paper • 2401.14168 • Published • 2 -
HiPPO: Recurrent Memory with Optimal Polynomial Projections
Paper • 2008.07669 • Published • 1
-
Graph Mamba: Towards Learning on Graphs with State Space Models
Paper • 2402.08678 • Published • 15 -
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Paper • 2402.04248 • Published • 31 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 54 -
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper • 2401.09417 • Published • 60
-
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 54 -
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 140 -
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 70 -
hustvl/Vim-tiny
Updated • 21