Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:1911.02150

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 17
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 1

inference optimization

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Paper • 2205.14135 • Published May 27, 2022 • 13
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Paper • 2307.08691 • Published Jul 17, 2023 • 8
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Paper • 2407.08608 • Published Jul 11, 2024 • 1
Fast Transformer Decoding: One Write-Head is All You Need

Paper • 1911.02150 • Published Nov 6, 2019 • 6

code-assistant-llm

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Paper • 2205.14135 • Published May 27, 2022 • 13
Fast Transformer Decoding: One Write-Head is All You Need

Paper • 1911.02150 • Published Nov 6, 2019 • 6
Efficient Training of Language Models to Fill in the Middle

Paper • 2207.14255 • Published Jul 28, 2022 • 1

Papers - Decoders

Lossless Acceleration for Seq2seq Generation with Aggressive Decoding

Paper • 2205.10350 • Published May 20, 2022 • 2
Blockwise Parallel Decoding for Deep Autoregressive Models

Paper • 1811.03115 • Published Nov 7, 2018 • 2
Fast Transformer Decoding: One Write-Head is All You Need

Paper • 1911.02150 • Published Nov 6, 2019 • 6
Sequence-Level Knowledge Distillation

Paper • 1606.07947 • Published Jun 25, 2016 • 2

The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

Paper • 2306.01116 • Published Jun 1, 2023 • 34
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Paper • 2205.14135 • Published May 27, 2022 • 13
RoFormer: Enhanced Transformer with Rotary Position Embedding

Paper • 2104.09864 • Published Apr 20, 2021 • 12
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 13

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs