UltraRAG: A Modular and Automated Toolkit for Adaptive Retrieval-Augmented Generation Paper • 2504.08761 • Published Mar 31, 2025 • 7
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Paper • 2409.16040 • Published Sep 24, 2024 • 16
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing Paper • 2412.14711 • Published Dec 19, 2024 • 16
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 111
Introducing RWKV - An RNN with the advantages of a transformer Article • Published May 15, 2023 • 23
TransformerFAM: Feedback attention is working memory Paper • 2404.09173 • Published Apr 14, 2024 • 43
A Time Series is Worth 64 Words: Long-term Forecasting with Transformers Paper • 2211.14730 • Published Nov 27, 2022 • 3