-
meta-llama/Llama-2-13b
Text Generation • Updated • 337
mistralai/Mixtral-8x7B-v0.1
Text Generation • Updated • 461k • 1.67k
mistralai/Mixtral-8x7B-Instruct-v0.1
Text Generation • Updated • 912k • 4.29k
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 77
Collections
Collections including paper arxiv:2402.13753
-
Exponentially Faster Language Modelling
Paper • 2311.10770 • Published • 118
stabilityai/stable-video-diffusion-img2vid-xt
Image-to-Video • Updated • 394k • 2.85k
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
Paper • 2311.13384 • Published • 51
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Paper • 2311.12454 • Published • 30
-
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Paper • 2309.10400 • Published • 26
CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling
Paper • 2309.05270 • Published • 1
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper • 2402.13753 • Published • 115
-
Attention Is All You Need
Paper • 1706.03762 • Published • 49
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Paper • 2307.08691 • Published • 8
Mixtral of Experts
Paper • 2401.04088 • Published • 157
Mistral 7B
Paper • 2310.06825 • Published • 46
-
TRAMS: Training-free Memory Selection for Long-range Language Modeling
Paper • 2310.15494 • Published • 2
A Long Way to Go: Investigating Length Correlations in RLHF
Paper • 2310.03716 • Published • 10
YaRN: Efficient Context Window Extension of Large Language Models
Paper • 2309.00071 • Published • 66
Giraffe: Adventures in Expanding Context Lengths in LLMs
Paper • 2308.10882 • Published • 1
-
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Paper • 2309.14509 • Published • 18
LLM Augmented LLMs: Expanding Capabilities through Composition
Paper • 2401.02412 • Published • 37
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 47
Tuning Language Models by Proxy
Paper • 2401.08565 • Published • 22
-
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper • 2309.12307 • Published • 88
LMDX: Language Model-based Document Information Extraction and Localization
Paper • 2309.10952 • Published • 65
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper • 2310.09263 • Published • 39
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 96