Applications and Uses
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
Paper
• 2506.09790
• Published
• 53
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance
Paper
• 2506.06444
• Published
• 73
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Paper
• 2506.11763
• Published
• 74
Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research
Paper
• 2502.04644
• Published
• 4
Deep Research Agents: A Systematic Examination And Roadmap
Paper
• 2506.18096
• Published
• 3
Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers
Paper
• 2507.02694
• Published
• 19
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky
Paper
• 2507.03336
• Published
• 7
PresentAgent: Multimodal Agent for Presentation Video Generation
Paper
• 2507.04036
• Published
• 11
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
Paper
• 2507.09477
• Published
• 88
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
Paper
• 2507.13300
• Published
• 20
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off
Paper
• 2508.04825
• Published
• 60
Complex Logical Instruction Generation
Paper
• 2508.09125
• Published
• 40
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
Paper
• 2508.18076
• Published
• 6
UQ: Assessing Language Models on Unsolved Questions
Paper
• 2508.17580
• Published
• 15
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
Paper
• 2508.21148
• Published
• 140
AutoIntent: AutoML for Text Classification
Paper
• 2509.21138
• Published
• 36
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?
Paper
• 2510.02209
• Published
• 56
DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation
Paper
• 2510.09116
• Published
• 96
Back to Basics: Let Denoising Generative Models Denoise
Paper
• 2511.13720
• Published
• 69
Rethinking Training Dynamics in Scale-wise Autoregressive Generation
Paper
• 2512.06421
• Published
• 7
MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics
Paper
• 2601.02075
• Published
• 8
Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models
Paper
• 2601.01321
• Published
• 19
Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility
Paper
• 2601.17027
• Published
• 41
OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution
Paper
• 2601.20380
• Published
• 8
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
Paper
• 2601.22060
• Published
• 156
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
Paper
• 2602.12670
• Published
• 51