Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.07129

Papers - ResNet

Wide Residual Networks

Paper • 1605.07146 • Published May 23, 2016 • 2
Characterizing signal propagation to close the performance gap in unnormalized ResNets

Paper • 2101.08692 • Published Jan 21, 2021 • 2
Pareto-Optimal Quantized ResNet Is Mostly 4-bit

Paper • 2105.03536 • Published May 7, 2021 • 2
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations

Paper • 2106.01548 • Published Jun 3, 2021 • 2

Resonance RoPE: Improving Context Length Generalization of Large Language Models

Paper • 2403.00071 • Published Feb 29, 2024 • 24
Scaling Laws of RoPE-based Extrapolation

Paper • 2310.05209 • Published Oct 8, 2023 • 7
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

Paper • 2404.12387 • Published Apr 18, 2024 • 39
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22, 2024 • 127

Papers - Attention

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Paper • 2402.10644 • Published Feb 16, 2024 • 81
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Paper • 2305.13245 • Published May 22, 2023 • 5
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

Paper • 2402.15220 • Published Feb 23, 2024 • 21
Sequence Parallelism: Long Sequence Training from System Perspective

Paper • 2105.13120 • Published May 26, 2021 • 5

🔍 Interpretability & Analysis of LMs

Outstanding research in LM interpretability and evaluation, summarized

A Close Look at Decomposition-based XAI-Methods for Transformer Language Models

Paper • 2502.15886 • Published 21 days ago • 1
We Can't Understand AI Using our Existing Vocabulary

Paper • 2502.07586 • Published Feb 11 • 10
Position-aware Automatic Circuit Discovery

Paper • 2502.04577 • Published Feb 7 • 1
Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution

Paper • 2501.18887 • Published Jan 31 • 1

Previous
1
2
3
4
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs