Collections including paper arxiv:2404.07143
- STaR: Bootstrapping Reasoning With Reasoning
  Paper • 2203.14465 • Published • 8
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 47
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 17
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 29

- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 140
- Elucidating the Design Space of Diffusion-Based Generative Models
  Paper • 2206.00364 • Published • 15
- GLU Variants Improve Transformer
  Paper • 2002.05202 • Published • 2
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 137

- LLoCO: Learning Long Contexts Offline
  Paper • 2404.07979 • Published • 21
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 115
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 17
- LongAlign: A Recipe for Long Context Alignment of Large Language Models
  Paper • 2401.18058 • Published • 21

- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 66
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 43
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
  Paper • 2404.07143 • Published • 105
- Block Transformer: Global-to-Local Language Modeling for Fast Inference
  Paper • 2406.02657 • Published • 38

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 608
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 96
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 104
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 43