Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Abstract
We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters, finding that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license.
Models at: https://huggingface.co/RWKV
Training code at: https://github.com/RWKV/RWKV-LM
Inference code at: https://github.com/RWKV/ChatRWKV
Time-parallel training code at: https://github.com/RWKV/RWKV-infctx-trainer
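As a minimal illustration of the matrix-valued state and dynamic recurrence mentioned in the abstract, the sketch below runs a single attention-like head in recurrent form. This is our own simplification under stated assumptions, not the released implementation: the variable names and shapes are ours, and token-shift, gating, normalization, and output projections of the full blocks are omitted. The per-token decay `w` of shape (T, d) stands in for both variants: for Eagle (RWKV-5) every row is the same fixed per-channel decay, while for Finch (RWKV-6) it is data-dependent and varies per token.

```python
import numpy as np

def rwkv_head(r, k, v, w, u):
    """Recurrent form of one Eagle/Finch head (illustrative sketch only).

    r, k, v : (T, d) receptance, key, and value rows for one head.
    w       : (T, d) per-channel decay in (0, 1); rows constant for Eagle,
              data-dependent per token for Finch.
    u       : (d,) "bonus" weighting applied to the current token only.
    Returns : (T, d) head outputs.
    """
    T, d = r.shape
    S = np.zeros((d, d))                       # matrix-valued state, one per head
    out = np.zeros((T, d))
    for t in range(T):
        kv = np.outer(k[t], v[t])              # rank-1 update k_t^T v_t
        out[t] = r[t] @ (np.diag(u) @ kv + S)  # read decayed history + current-token bonus
        S = np.diag(w[t]) @ S + kv             # decay each channel, then accumulate
    return out
```

Because this state update is a linear recurrence, the same computation can be rearranged to run in parallel across the time dimension during training (as in the time-parallel training code linked above) while keeping constant-size recurrent state at inference.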
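The abstract also mentions a fast tokenizer based on greedy matching. Below is a hedged sketch of that idea, assuming a hypothetical `vocab` dict mapping token strings to integer ids that contains every single character as a fallback; the released RWKV tokenizer matches over bytes with a trie, so this is an illustration of the matching rule, not its actual code.

```python
def greedy_tokenize(text, vocab):
    """Greedy longest-match tokenization sketch.

    Assumes `vocab` maps token strings to ids and contains every single
    character, so matching can always fall back to a length-1 token.
    """
    max_len = max(map(len, vocab))
    ids, i = [], 0
    while i < len(text):
        # Try the longest candidate substring first, shrinking toward length 1.
        for j in range(min(len(text), i + max_len), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token matches at position {i}")
    return ids
```

For example, with `vocab = {"a": 1, "ab": 2, "b": 3}`, `greedy_tokenize("abb", vocab)` returns `[2, 3]`: the longer match "ab" wins over "a". A trie over the vocabulary turns the inner scan into a single walk per position, which is what makes the greedy scheme fast in practice.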
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Sliding Window Attention Training for Efficient Large Language Models (2025)
- Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing (2025)
- ReGLA: Refining Gated Linear Attention (2025)
- Training Sparse Mixture Of Experts Text Embedding Models (2025)
- Multilingual State Space Models for Structured Question Answering in Indic Languages (2025)
- LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs (2025)
- ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer (2025)