- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
  Paper • 2311.00871 • Published • 2
- Can large language models explore in-context?
  Paper • 2403.15371 • Published • 33
- Data Distributional Properties Drive Emergent In-Context Learning in Transformers
  Paper • 2205.05055 • Published • 2
- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 37
Collections including paper arxiv:2403.07809
- Neural networks behave as hash encoders: An empirical study
  Paper • 2101.05490 • Published • 2
- A Multiscale Visualization of Attention in the Transformer Model
  Paper • 1906.05714 • Published • 2
- BERT Rediscovers the Classical NLP Pipeline
  Paper • 1905.05950 • Published • 2
- Using Explainable AI and Transfer Learning to understand and predict the maintenance of Atlantic blocking with limited observational data
  Paper • 2404.08613 • Published • 1
- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 18
- JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
  Paper • 2310.00535 • Published • 2
- Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
  Paper • 2307.09458 • Published • 11
- The Impact of Depth and Width on Transformer Language Model Generalization
  Paper • 2310.19956 • Published • 10