Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2310.01889

Simple linear attention language models balance the recall-throughput tradeoff

Paper • 2402.18668 • Published Feb 28, 2024 • 20
Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Paper • 2402.10644 • Published Feb 16, 2024 • 81
Repeat After Me: Transformers are Better than State Space Models at Copying

Paper • 2402.01032 • Published Feb 1, 2024 • 24
Zoology: Measuring and Improving Recall in Efficient Language Models

Paper • 2312.04927 • Published Dec 8, 2023 • 2

Transformer Arch

Checkout: https://bbycroft.net/llm and http://nlp.seas.harvard.edu/2018/04/03/attention.html

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 55
ImageNet Large Scale Visual Recognition Challenge

Paper • 1409.0575 • Published Sep 1, 2014 • 8
Sequence to Sequence Learning with Neural Networks

Paper • 1409.3215 • Published Sep 10, 2014 • 3
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 13

Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon

Paper • 2401.03462 • Published Jan 7, 2024 • 27
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

Paper • 2305.07185 • Published May 12, 2023 • 9
YaRN: Efficient Context Window Extension of Large Language Models

Paper • 2309.00071 • Published Aug 31, 2023 • 68
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache

Paper • 2401.02669 • Published Jan 5, 2024 • 16

attention and long context

Efficient Streaming Language Models with Attention Sinks

Paper • 2309.17453 • Published Sep 29, 2023 • 13
Effective Long-Context Scaling of Foundation Models

Paper • 2309.16039 • Published Sep 27, 2023 • 30
allenai/longformer-base-4096

Updated Apr 5, 2023 • 5.88M • 189
google/bigbird-roberta-base

Updated Jun 2, 2021 • 37.5k • 52

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs