Collections including paper arxiv:2405.12250
Collection 1
- Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
  Paper • 2405.08748 • Published • 24
- Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
  Paper • 2405.10300 • Published • 29
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 131
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
  Paper • 2405.11143 • Published • 38
Collection 2
- Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
  Paper • 2404.18796 • Published • 70
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 111
- The Road Less Scheduled
  Paper • 2405.15682 • Published • 27
- Your Transformer is Secretly Linear
  Paper • 2405.12250 • Published • 157
Collection 3
- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 92
- VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
  Paper • 2404.10667 • Published • 19
- Instruction-tuned Language Models are Better Knowledge Learners
  Paper • 2402.12847 • Published • 26
- DoRA: Weight-Decomposed Low-Rank Adaptation
  Paper • 2402.09353 • Published • 28
Collection 4
- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 92
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
  Paper • 2404.05961 • Published • 65
- Compression Represents Intelligence Linearly
  Paper • 2404.09937 • Published • 27
- Multi-Head Mixture-of-Experts
  Paper • 2404.15045 • Published • 60
Collection 5
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 127
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 53
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 14
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 68
Collection 6
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 81
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
  Paper • 2401.04658 • Published • 27
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 111
- Your Transformer is Secretly Linear
  Paper • 2405.12250 • Published • 157
Collection 7
- Hydragen: High-Throughput LLM Inference with Shared Prefixes
  Paper • 2402.05099 • Published • 20
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
  Paper • 2402.13720 • Published • 7
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 32
- Your Transformer is Secretly Linear
  Paper • 2405.12250 • Published • 157
Collection 8
- SMOTE: Synthetic Minority Over-sampling Technique
  Paper • 1106.1813 • Published • 1
- Scikit-learn: Machine Learning in Python
  Paper • 1201.0490 • Published • 1
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
  Paper • 1406.1078 • Published
- Distributed Representations of Sentences and Documents
  Paper • 1405.4053 • Published
Collection 9
- PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
  Paper • 2402.08714 • Published • 14
- Data Engineering for Scaling Language Models to 128K Context
  Paper • 2402.10171 • Published • 25
- RLVF: Learning from Verbal Feedback without Overgeneralization
  Paper • 2402.10893 • Published • 12
- Coercing LLMs to do and reveal (almost) anything
  Paper • 2402.14020 • Published • 13