Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2504.08791

about 5 hours ago

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published 20 days ago • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published 19 days ago • 17
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published 27 days ago • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published 14 days ago • 44

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 281

inference optimization

Low-Rank Adapters Meet Neural Architecture Search for LLM Compression

Paper • 2501.16372 • Published Jan 23 • 11
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models

Paper • 2501.16937 • Published Jan 28 • 7
Matryoshka Quantization

Paper • 2502.06786 • Published Feb 10 • 30
Identifying Sensitive Weights via Post-quantization Integral

Paper • 2503.01901 • Published Feb 28 • 8

LLM resource-constrained Inference

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float

Paper • 2504.11651 • Published Apr 15 • 30

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published Apr 24 • 115
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19 • 126

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization

Paper • 2504.08641 • Published Apr 11 • 7
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134
Describe Anything: Detailed Localized Image and Video Captioning

Paper • 2504.16072 • Published Apr 22 • 63
A Survey of Interactive Generative Video

Paper • 2504.21853 • Published Apr 30 • 47

Distributed Inference

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 58
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 53
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 43
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 64

about 5 hours ago

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published 20 days ago • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published 19 days ago • 17
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published 27 days ago • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published 14 days ago • 44

LLM resource-constrained Inference

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float

Paper • 2504.11651 • Published Apr 15 • 30

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published Apr 24 • 115
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19 • 126

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization

Paper • 2504.08641 • Published Apr 11 • 7
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134
Describe Anything: Detailed Localized Image and Video Captioning

Paper • 2504.16072 • Published Apr 22 • 63
A Survey of Interactive Generative Video

Paper • 2504.21853 • Published Apr 30 • 47

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 281

Distributed Inference

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

Paper • 2504.08791 • Published Apr 7 • 134

inference optimization

Low-Rank Adapters Meet Neural Architecture Search for LLM Compression

Paper • 2501.16372 • Published Jan 23 • 11
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models

Paper • 2501.16937 • Published Jan 28 • 7
Matryoshka Quantization

Paper • 2502.06786 • Published Feb 10 • 30
Identifying Sensitive Weights via Post-quantization Integral

Paper • 2503.01901 • Published Feb 28 • 8

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 58
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 53
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 43
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 64

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs