Collections
Collections including paper arxiv:2310.19102

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 608
- Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
  Paper • 2310.19102 • Published • 11
- AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning
  Paper • 2311.00257 • Published • 9
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
  Paper • 2402.04291 • Published • 49

- QuIP: 2-Bit Quantization of Large Language Models With Guarantees
  Paper • 2307.13304 • Published • 2
- SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
  Paper • 2306.03078 • Published • 3
- OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
  Paper • 2308.13137 • Published • 18
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
  Paper • 2306.00978 • Published • 9

- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
  Paper • 2310.08659 • Published • 25
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 44
- Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
  Paper • 2309.02784 • Published • 1
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
  Paper • 2309.16119 • Published • 1

- Large Language Models for Compiler Optimization
  Paper • 2309.07062 • Published • 23
- Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
  Paper • 2310.17157 • Published • 13
- FP8-LM: Training FP8 Large Language Models
  Paper • 2310.18313 • Published • 33
- Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
  Paper • 2310.19102 • Published • 11

- Approximating Two-Layer Feedforward Networks for Efficient Transformers
  Paper • 2310.10837 • Published • 11
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 96
- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
  Paper • 2310.16795 • Published • 27
- LLM-FP4: 4-Bit Floating-Point Quantized Transformers
  Paper • 2310.16836 • Published • 14

- Chain-of-Verification Reduces Hallucination in Large Language Models
  Paper • 2309.11495 • Published • 37
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 77
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 85
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 83

- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
  Paper • 2309.04662 • Published • 23
- Neurons in Large Language Models: Dead, N-gram, Positional
  Paper • 2309.04827 • Published • 16
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
  Paper • 2309.05516 • Published • 9
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
  Paper • 2309.03907 • Published • 11