- Selective Attention Improves Transformer
  Paper • 2410.02703 • Published • 24
- Differential Transformer
  Paper • 2410.05258 • Published • 169
- TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
  Paper • 2410.05076 • Published • 8
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
  Paper • 2410.13276 • Published • 26
Collections including paper arxiv:2405.14860

- Just How Flexible are Neural Networks in Practice?
  Paper • 2406.11463 • Published • 7
- Not All Language Model Features Are Linear
  Paper • 2405.14860 • Published • 39
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 109
- An Interactive Agent Foundation Model
  Paper • 2402.05929 • Published • 28

- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
  Paper • 2406.06525 • Published • 67
- Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
  Paper • 2406.06469 • Published • 25
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
  Paper • 2406.04271 • Published • 29
- Block Transformer: Global-to-Local Language Modeling for Fast Inference
  Paper • 2406.02657 • Published • 38

- Not All Language Model Features Are Linear
  Paper • 2405.14860 • Published • 39
- LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
  Paper • 2410.02707 • Published • 48
- RepVideo: Rethinking Cross-Layer Representation for Video Generation
  Paper • 2501.08994 • Published • 15

- Not All Language Model Features Are Linear
  Paper • 2405.14860 • Published • 39
- TimeGPT-1
  Paper • 2310.03589 • Published • 5
- A Careful Examination of Large Language Model Performance on Grade School Arithmetic
  Paper • 2405.00332 • Published • 32
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
  Paper • 2405.18991 • Published • 12

- AtP*: An efficient and scalable method for localizing LLM behaviour to components
  Paper • 2403.00745 • Published • 13
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 608
- MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
  Paper • 2402.16840 • Published • 24
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 115