- Approximating Two-Layer Feedforward Networks for Efficient Transformers
  Paper • 2310.10837 • Published • 11
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 96
- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
  Paper • 2310.16795 • Published • 27
- LLM-FP4: 4-Bit Floating-Point Quantized Transformers
  Paper • 2310.16836 • Published • 14
Collections including paper arxiv:2401.15024
- XGen-7B Technical Report
  Paper • 2309.03450 • Published • 8
- FLM-101B: An Open LLM and How to Train It with $100K Budget
  Paper • 2309.03852 • Published • 44
- Robotic Table Tennis: A Case Study into a High Speed Learning System
  Paper • 2309.03315 • Published • 7
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 80