Sayambhu Sen
Testerpce
AI & ML interests
None yet
Recent Activity
updated
a collection
about 12 hours ago
Reinforcement learning
updated
a collection
1 day ago
Reasoning
updated
a collection
1 day ago
Tool
Organizations
Foundation Models
Test time
Physics and operators
Vision Language Action models
-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
Paper • 2507.01925 • Published • 37 -
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Paper • 2507.16746 • Published • 34 -
MolmoAct: Action Reasoning Models that can Reason in Space
Paper • 2508.07917 • Published • 41 -
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Paper • 2508.20072 • Published • 28
World model
Compression
Process Reward Modelling
-
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
Paper • 2506.18896 • Published • 29 -
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
Paper • 2505.15277 • Published • 103 -
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
Paper • 2501.03124 • Published • 14 -
Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs
Paper • 2505.11227 • Published
SAE
-
Resa: Transparent Reasoning Models via SAEs
Paper • 2506.09967 • Published • 22 -
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Paper • 2408.05147 • Published • 41 -
Train Sparse Autoencoders Efficiently by Utilizing Features Correlation
Paper • 2505.22255 • Published • 25 -
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 121
Theory and Representation learning
Graph
-
NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes
Paper • 2504.11544 • Published • 43 -
On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models
Paper • 2307.09793 • Published • 46 -
GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks
Paper • 2504.12764 • Published • 41 -
Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks
Paper • 2505.16901 • Published • 47
Search
-
Open Deep Search: Democratizing Search with Open-source Reasoning Agents
Paper • 2503.20201 • Published • 48 -
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Paper • 2503.19470 • Published • 19 -
Spacer: Towards Engineered Scientific Inspiration
Paper • 2508.17661 • Published • 31
Diversity
Self correction
Speech
-
Slamming: Training a Speech Language Model on One GPU in a Day
Paper • 2502.15814 • Published • 70 -
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
Paper • 2503.04724 • Published • 72 -
Audio-Aware Large Language Models as Judges for Speaking Styles
Paper • 2506.05984 • Published • 15 -
Optimizing Multilingual Text-To-Speech with Accents & Emotions
Paper • 2506.16310 • Published • 24
Synthetic data
MoE
-
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
Paper • 2410.10814 • Published • 52 -
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Paper • 2502.16894 • Published • 31 -
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
Paper • 2506.14731 • Published • 9 -
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
Paper • 2506.18349 • Published • 13
Markov chain
Planning
-
Compositional Foundation Models for Hierarchical Planning
Paper • 2309.08587 • Published • 11 -
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models
Paper • 2405.09220 • Published • 29 -
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
Paper • 2504.15785 • Published • 19 -
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
Paper • 2508.20096 • Published • 35
Multilingual
-
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 85 -
Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages
Paper • 2401.05811 • Published • 8 -
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis
Paper • 2409.20059 • Published • 17 -
Are Character-level Translations Worth the Wait? Comparing Character- and Subword-level Models for Machine Translation
Paper • 2302.14220 • Published
Partial layer training LLMs
-
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)
Paper • 2309.08968 • Published • 23 -
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning
Paper • 2505.20355 • Published • 36 -
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
Paper • 2505.22618 • Published • 42
Evaluation
Math
-
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 55 -
Solving Inequality Proofs with Large Language Models
Paper • 2506.07927 • Published • 20 -
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Paper • 2507.00432 • Published • 76 -
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
Paper • 2507.06181 • Published • 42
Style transfer
Reinforcement learning
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 25 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 39 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 100 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75
Knowledge
-
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models
Paper • 2408.15915 • Published • 20 -
ReLearn: Unlearning via Learning for Large Language Models
Paper • 2502.11190 • Published • 30 -
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation
Paper • 2503.21729 • Published • 29 -
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Paper • 2504.00509 • Published • 23
Eval
LLM judge
3D
Materials and structures
Vision
Code
-
The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs
Paper • 2506.18403 • Published • 3 -
ReCode: Updating Code API Knowledge with Reinforcement Learning
Paper • 2506.20495 • Published • 8 -
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
Paper • 2507.23348 • Published • 10
Data
-
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
Paper • 2506.19290 • Published • 51 -
Data Efficacy for Language Model Training
Paper • 2506.21545 • Published • 11 -
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
Paper • 2507.04009 • Published • 41 -
RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs
Paper • 2507.03253 • Published • 18
Memory
-
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team
Paper • 2506.14234 • Published • 41 -
MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models
Paper • 2506.14435 • Published • 8 -
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Paper • 2504.19413 • Published • 18 -
MemOS: A Memory OS for AI System
Paper • 2507.03724 • Published • 153
Applications and Uses
-
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
Paper • 2506.09790 • Published • 54 -
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance
Paper • 2506.06444 • Published • 74 -
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Paper • 2506.11763 • Published • 70 -
Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research
Paper • 2502.04644 • Published • 3
Adversarial
Multimodal
-
Qwen2.5-Omni Technical Report
Paper • 2503.20215 • Published • 165 -
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
Paper • 2505.22453 • Published • 46 -
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
Paper • 2505.23380 • Published • 23 -
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
Paper • 2505.21523 • Published • 14
Interpretable
-
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 121 -
Large Language Models are Locally Linear Mappings
Paper • 2505.24293 • Published • 15 -
Thought Anchors: Which LLM Reasoning Steps Matter?
Paper • 2506.19143 • Published • 13
Diffusion
-
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Paper • 2503.09573 • Published • 73 -
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
Paper • 2505.15045 • Published • 55 -
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
Paper • 2505.16990 • Published • 21 -
D-AR: Diffusion via Autoregressive Models
Paper • 2505.23660 • Published • 34
Information_retrieval
-
Rank1: Test-Time Compute for Reranking in Information Retrieval
Paper • 2502.18418 • Published • 28 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 35 -
Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval
Paper • 2505.16967 • Published • 24 -
SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension
Paper • 2508.01959 • Published • 57
Attention
-
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 298 -
Lizard: An Efficient Linearization Framework for Large Language Models
Paper • 2507.09025 • Published • 17 -
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
Paper • 2507.23632 • Published • 6
Agent
-
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 69 -
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
Paper • 2502.06060 • Published • 38 -
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper • 2502.14499 • Published • 193 -
SurveyX: Academic Survey Automation via Large Language Models
Paper • 2502.14776 • Published • 101
RAG
-
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 49 -
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
Paper • 2412.15443 • Published • 10 -
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement
Paper • 2412.12881 • Published • 2 -
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Paper • 2506.06962 • Published • 29
Prompt papers
Sparsity
State space LLM
Reasoning
-
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 39 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 55 -
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper • 2411.04282 • Published • 38 -
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Paper • 2411.14432 • Published • 26
Fine tuning
-
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Paper • 2402.17193 • Published • 26 -
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
Paper • 2410.23743 • Published • 64 -
Direct Preference Optimization Using Sparse Feature-Level Constraints
Paper • 2411.07618 • Published • 17 -
Transformer^2: Self-adaptive LLMs
Paper • 2501.06252 • Published • 55
Dataset and Data processing
-
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Paper • 2405.20541 • Published • 24 -
RedPajama: an Open Dataset for Training Large Language Models
Paper • 2411.12372 • Published • 57 -
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Paper • 2503.22230 • Published • 46
Video understanding
-
Wolf: Captioning Everything with a World Summarization Framework
Paper • 2407.18908 • Published • 33 -
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Paper • 2407.19985 • Published • 37 -
TPDiff: Temporal Pyramid Video Diffusion Model
Paper • 2503.09566 • Published • 46 -
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Paper • 2506.07464 • Published • 14
Long context
-
Writing in the Margins: Better Inference Pattern for Long Context Retrieval
Paper • 2408.14906 • Published • 143 -
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Paper • 2410.10819 • Published • 8 -
LLMtimesMapReduce: Simplified Long-Sequence Processing using Large Language Models
Paper • 2410.09342 • Published • 40 -
PDFTriage: Question Answering over Long, Structured Documents
Paper • 2309.08872 • Published • 53
Tool
Eval
Foundation Models
LLM judge
Test time
3D
Physics and operators
Materials and structures
Vision Language Action models
-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
Paper • 2507.01925 • Published • 37 -
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Paper • 2507.16746 • Published • 34 -
MolmoAct: Action Reasoning Models that can Reason in Space
Paper • 2508.07917 • Published • 41 -
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Paper • 2508.20072 • Published • 28
Vision
World model
Code
-
The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs
Paper • 2506.18403 • Published • 3 -
ReCode: Updating Code API Knowledge with Reinforcement Learning
Paper • 2506.20495 • Published • 8 -
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
Paper • 2507.23348 • Published • 10
Compression
Data
-
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
Paper • 2506.19290 • Published • 51 -
Data Efficacy for Language Model Training
Paper • 2506.21545 • Published • 11 -
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
Paper • 2507.04009 • Published • 41 -
RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs
Paper • 2507.03253 • Published • 18
Process Reward Modelling
-
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
Paper • 2506.18896 • Published • 29 -
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
Paper • 2505.15277 • Published • 103 -
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
Paper • 2501.03124 • Published • 14 -
Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs
Paper • 2505.11227 • Published
Memory
-
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team
Paper • 2506.14234 • Published • 41 -
MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models
Paper • 2506.14435 • Published • 8 -
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Paper • 2504.19413 • Published • 18 -
MemOS: A Memory OS for AI System
Paper • 2507.03724 • Published • 153
SAE
-
Resa: Transparent Reasoning Models via SAEs
Paper • 2506.09967 • Published • 22 -
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Paper • 2408.05147 • Published • 41 -
Train Sparse Autoencoders Efficiently by Utilizing Features Correlation
Paper • 2505.22255 • Published • 25 -
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 121
Applications and Uses
-
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
Paper • 2506.09790 • Published • 54 -
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance
Paper • 2506.06444 • Published • 74 -
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Paper • 2506.11763 • Published • 70 -
Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research
Paper • 2502.04644 • Published • 3
Theory and Representation learning
Adversarial
Graph
-
NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes
Paper • 2504.11544 • Published • 43 -
On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models
Paper • 2307.09793 • Published • 46 -
GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks
Paper • 2504.12764 • Published • 41 -
Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks
Paper • 2505.16901 • Published • 47
Multimodal
-
Qwen2.5-Omni Technical Report
Paper • 2503.20215 • Published • 165 -
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
Paper • 2505.22453 • Published • 46 -
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
Paper • 2505.23380 • Published • 23 -
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
Paper • 2505.21523 • Published • 14
Search
-
Open Deep Search: Democratizing Search with Open-source Reasoning Agents
Paper • 2503.20201 • Published • 48 -
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Paper • 2503.19470 • Published • 19 -
Spacer: Towards Engineered Scientific Inspiration
Paper • 2508.17661 • Published • 31
Interpretable
-
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 121 -
Large Language Models are Locally Linear Mappings
Paper • 2505.24293 • Published • 15 -
Thought Anchors: Which LLM Reasoning Steps Matter?
Paper • 2506.19143 • Published • 13
Diversity
Diffusion
-
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Paper • 2503.09573 • Published • 73 -
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
Paper • 2505.15045 • Published • 55 -
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
Paper • 2505.16990 • Published • 21 -
D-AR: Diffusion via Autoregressive Models
Paper • 2505.23660 • Published • 34
Self correction
Information_retrieval
-
Rank1: Test-Time Compute for Reranking in Information Retrieval
Paper • 2502.18418 • Published • 28 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 35 -
Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval
Paper • 2505.16967 • Published • 24 -
SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension
Paper • 2508.01959 • Published • 57
Speech
-
Slamming: Training a Speech Language Model on One GPU in a Day
Paper • 2502.15814 • Published • 70 -
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
Paper • 2503.04724 • Published • 72 -
Audio-Aware Large Language Models as Judges for Speaking Styles
Paper • 2506.05984 • Published • 15 -
Optimizing Multilingual Text-To-Speech with Accents & Emotions
Paper • 2506.16310 • Published • 24
Attention
-
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 298 -
Lizard: An Efficient Linearization Framework for Large Language Models
Paper • 2507.09025 • Published • 17 -
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
Paper • 2507.23632 • Published • 6
Synthetic data
Agent
-
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 69 -
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
Paper • 2502.06060 • Published • 38 -
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper • 2502.14499 • Published • 193 -
SurveyX: Academic Survey Automation via Large Language Models
Paper • 2502.14776 • Published • 101
MoE
-
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
Paper • 2410.10814 • Published • 52 -
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Paper • 2502.16894 • Published • 31 -
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
Paper • 2506.14731 • Published • 9 -
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
Paper • 2506.18349 • Published • 13
RAG
-
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 49 -
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
Paper • 2412.15443 • Published • 10 -
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement
Paper • 2412.12881 • Published • 2 -
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Paper • 2506.06962 • Published • 29
Markov chain
Prompt papers
Planning
-
Compositional Foundation Models for Hierarchical Planning
Paper • 2309.08587 • Published • 11 -
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models
Paper • 2405.09220 • Published • 29 -
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
Paper • 2504.15785 • Published • 19 -
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
Paper • 2508.20096 • Published • 35
Sparsity
Multilingual
-
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 85 -
Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages
Paper • 2401.05811 • Published • 8 -
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis
Paper • 2409.20059 • Published • 17 -
Are Character-level Translations Worth the Wait? Comparing Character- and Subword-level Models for Machine Translation
Paper • 2302.14220 • Published
State space LLM
Partial layer training LLMs
-
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)
Paper • 2309.08968 • Published • 23 -
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning
Paper • 2505.20355 • Published • 36 -
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
Paper • 2505.22618 • Published • 42
Reasoning
-
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 39 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 55 -
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper • 2411.04282 • Published • 38 -
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Paper • 2411.14432 • Published • 26
Evaluation
Fine tuning
-
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Paper • 2402.17193 • Published • 26 -
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
Paper • 2410.23743 • Published • 64 -
Direct Preference Optimization Using Sparse Feature-Level Constraints
Paper • 2411.07618 • Published • 17 -
Transformer^2: Self-adaptive LLMs
Paper • 2501.06252 • Published • 55
Math
-
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 55 -
Solving Inequality Proofs with Large Language Models
Paper • 2506.07927 • Published • 20 -
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Paper • 2507.00432 • Published • 76 -
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
Paper • 2507.06181 • Published • 42
Dataset and Data processing
-
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Paper • 2405.20541 • Published • 24 -
RedPajama: an Open Dataset for Training Large Language Models
Paper • 2411.12372 • Published • 57 -
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Paper • 2503.22230 • Published • 46
Style transfer
Video understanding
-
Wolf: Captioning Everything with a World Summarization Framework
Paper • 2407.18908 • Published • 33 -
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Paper • 2407.19985 • Published • 37 -
TPDiff: Temporal Pyramid Video Diffusion Model
Paper • 2503.09566 • Published • 46 -
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Paper • 2506.07464 • Published • 14
Reinforcement learning
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 25 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 39 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 100 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75
Long context
-
Writing in the Margins: Better Inference Pattern for Long Context Retrieval
Paper • 2408.14906 • Published • 143 -
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Paper • 2410.10819 • Published • 8 -
LLMtimesMapReduce: Simplified Long-Sequence Processing using Large Language Models
Paper • 2410.09342 • Published • 40 -
PDFTriage: Question Answering over Long, Structured Documents
Paper • 2309.08872 • Published • 53
Knowledge
-
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models
Paper • 2408.15915 • Published • 20 -
ReLearn: Unlearning via Learning for Large Language Models
Paper • 2502.11190 • Published • 30 -
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation
Paper • 2503.21729 • Published • 29 -
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Paper • 2504.00509 • Published • 23