Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2402.13753

meta-llama/Llama-2-13b

Text Generation • Updated Apr 17, 2024 • 337
mistralai/Mixtral-8x7B-v0.1

Text Generation • Updated Jul 24, 2024 • 461k • • 1.67k
mistralai/Mixtral-8x7B-Instruct-v0.1

Text Generation • Updated Aug 19, 2024 • 912k • • 4.29k
Adapting Large Language Models via Reading Comprehension

Paper • 2309.09530 • Published Sep 18, 2023 • 77

WizardLMTeam/WizardCoder-15B-V1.0

Text Generation • Updated Jan 19, 2024 • 981 • 750
mistralai/Mixtral-8x7B-v0.1

Text Generation • Updated Jul 24, 2024 • 461k • • 1.67k
TheBloke/Mixtral-8x7B-v0.1-GPTQ

Text Generation • Updated Dec 14, 2023 • 558 • 127
zhangtao103239/Qwen-1.8B-GGUF

Updated Dec 20, 2023 • 22 • 4

Language Models

Exponentially Faster Language Modelling

Paper • 2311.10770 • Published Nov 15, 2023 • 118
stabilityai/stable-video-diffusion-img2vid-xt

Image-to-Video • Updated Jul 10, 2024 • 394k • 2.85k
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

Paper • 2311.13384 • Published Nov 22, 2023 • 51
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

Paper • 2311.12454 • Published Nov 21, 2023 • 30

01-ai/Yi-6B

Text Generation • Updated Nov 11, 2024 • 6.98k • 371
meta-llama/Llama-2-7b

Text Generation • Updated Apr 17, 2024 • 4.23k
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21, 2024 • 115

PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training

Paper • 2309.10400 • Published Sep 19, 2023 • 26
CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling

Paper • 2309.05270 • Published Sep 11, 2023 • 1
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21, 2024 • 115

Training & Architectures

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 49
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Paper • 2307.08691 • Published Jul 17, 2023 • 8
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8, 2024 • 157
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 46

TRAMS: Training-free Memory Selection for Long-range Language Modeling

Paper • 2310.15494 • Published Oct 24, 2023 • 2
A Long Way to Go: Investigating Length Correlations in RLHF

Paper • 2310.03716 • Published Oct 5, 2023 • 10
YaRN: Efficient Context Window Extension of Large Language Models

Paper • 2309.00071 • Published Aug 31, 2023 • 66
Giraffe: Adventures in Expanding Context Lengths in LLMs

Paper • 2308.10882 • Published Aug 21, 2023 • 1

attention and long context

Efficient Streaming Language Models with Attention Sinks

Paper • 2309.17453 • Published Sep 29, 2023 • 13
Effective Long-Context Scaling of Foundation Models

Paper • 2309.16039 • Published Sep 27, 2023 • 30
allenai/longformer-base-4096

Updated Apr 5, 2023 • 2.8M • 185
google/bigbird-roberta-base

Updated Jun 2, 2021 • 35k • 51

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

Paper • 2309.14509 • Published Sep 25, 2023 • 18
LLM Augmented LLMs: Expanding Capabilities through Composition

Paper • 2401.02412 • Published Jan 4, 2024 • 37
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 47
Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16, 2024 • 22

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

Paper • 2309.12307 • Published Sep 21, 2023 • 88
LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 65
Table-GPT: Table-tuned GPT for Diverse Table Tasks

Paper • 2310.09263 • Published Oct 13, 2023 • 39
BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 96

Previous
1
2
3
4
5
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs