Collections

Discover the best community collections!

Collections including paper arxiv:2504.11651

- Efficient Agents: Building Effective Agents While Reducing Cost
  Paper • 2508.02694 • Published • 85
- 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
  Paper • 2504.11651 • Published • 30
- leliuga/Phi-3-mini-4k-instruct-bnb-4bit
  Text Generation • 2B • Updated • 21 • 5
- solidrust/Phi-3-mini-4k-instruct-AWQ
  Text Generation • 0.7B • Updated • 47

- deepseek-ai/DeepSeek-V3-0324
  Text Generation • 685B • Updated • 345k • 3.05k
- OuteAI/Llama-OuteTTS-1.0-1B
  Text-to-Speech • 1B • Updated • 60.3k • 204
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 49
- 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
  Paper • 2504.11651 • Published • 30

- Low-Rank Adapters Meet Neural Architecture Search for LLM Compression
  Paper • 2501.16372 • Published • 11
- TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
  Paper • 2501.16937 • Published • 7
- Matryoshka Quantization
  Paper • 2502.06786 • Published • 30
- Identifying Sensitive Weights via Post-quantization Integral
  Paper • 2503.01901 • Published • 8

- M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
  Paper • 2411.04952 • Published • 31
- Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models
  Paper • 2411.05005 • Published • 13
- M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
  Paper • 2411.04075 • Published • 17
- Self-Consistency Preference Optimization
  Paper • 2411.04109 • Published • 19

- LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
  Paper • 2507.04404 • Published • 21
- 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
  Paper • 2504.11651 • Published • 30
- A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone
  Paper • 2505.12781 • Published • 2
- A Survey of Context Engineering for Large Language Models
  Paper • 2507.13334 • Published • 248

- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
  Paper • 2411.11504 • Published • 24
- Top-nσ: Not All Logits Are You Need
  Paper • 2411.07641 • Published • 23
- Adaptive Decoding via Latent Preference Optimization
  Paper • 2411.09661 • Published • 10
- When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
  Paper • 2411.13476 • Published • 16

- LinFusion: 1 GPU, 1 Minute, 16K Image
  Paper • 2409.02097 • Published • 35
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
  Paper • 2409.11406 • Published • 28
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 127
- Segment Anything with Multiple Modalities
  Paper • 2408.09085 • Published • 23