- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
  Paper • 2311.00871 • Published • 2
- Can large language models explore in-context?
  Paper • 2403.15371 • Published • 33
- Data Distributional Properties Drive Emergent In-Context Learning in Transformers
  Paper • 2205.05055 • Published • 2
- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 37
Collections including paper arxiv:2403.07809
- Neural networks behave as hash encoders: An empirical study
  Paper • 2101.05490 • Published • 2
- A Multiscale Visualization of Attention in the Transformer Model
  Paper • 1906.05714 • Published • 2
- BERT Rediscovers the Classical NLP Pipeline
  Paper • 1905.05950 • Published • 2
- Using Explainable AI and Transfer Learning to understand and predict the maintenance of Atlantic blocking with limited observational data
  Paper • 2404.08613 • Published • 1
- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 18
- JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
  Paper • 2310.00535 • Published • 2
- Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
  Paper • 2307.09458 • Published • 11
- The Impact of Depth and Width on Transformer Language Model Generalization
  Paper • 2310.19956 • Published • 10