- Scaling Instruction-Finetuned Language Models
  Paper • 2210.11416 • Published • 7
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 143
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 64
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 63

Collections including paper arxiv:2403.03507
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 186
- Mixture-of-Subspaces in Low-Rank Adaptation
  Paper • 2406.11909 • Published • 3
- Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
  Paper • 2406.17660 • Published • 5
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients
  Paper • 2407.11239 • Published • 8

- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 186
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 63
- RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
  Paper • 2407.02552 • Published • 4
- OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
  Paper • 2407.16741 • Published • 71

- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 186
- RAFT: Adapting Language Model to Domain Specific RAG
  Paper • 2403.10131 • Published • 70
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
  Paper • 2403.13372 • Published • 70
- InternLM2 Technical Report
  Paper • 2403.17297 • Published • 31

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 55
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 186
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
  Paper • 2402.04291 • Published • 49
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 610