Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
Abstract
The Think in Games (TiG) framework enables large language models to develop procedural knowledge through interactive game environments, achieving competitive performance with reduced data and computational demands while providing transparent explanations.
Large language models (LLMs) excel at complex reasoning tasks such as mathematics and coding, yet they frequently struggle with simple interactive tasks that young children perform effortlessly. This discrepancy highlights a critical gap between declarative knowledge (knowing about something) and procedural knowledge (knowing how to do something). Although traditional reinforcement learning (RL) agents can acquire procedural knowledge through environmental interaction, they often operate as black boxes and require substantial training data. In contrast, LLMs possess extensive world knowledge and reasoning capabilities, but are unable to effectively convert this static knowledge into dynamic decision-making in interactive settings. To address this challenge, we propose Think in Games (TiG), a novel framework that empowers LLMs to develop procedural understanding through direct interaction with game environments, while retaining their inherent reasoning and explanatory abilities. Specifically, TiG reformulates RL-based decision-making as a language modeling task: LLMs generate language-guided policies, which are refined iteratively through online reinforcement learning based on environmental feedback. Our experimental results show that TiG successfully bridges the gap between declarative and procedural knowledge, achieving competitive performance with dramatically lower data and computational demands compared to conventional RL methods. Moreover, TiG provides step-by-step natural language explanations for its decisions, greatly improving transparency and interpretability in complex interactive tasks.
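The abstract's central idea, reformulating RL-based decision-making as a language modeling task in which the policy is expressed and refined in natural language, can be pictured with a short sketch. The snippet below is a minimal PyTorch-style illustration, not the paper's implementation: `llm`, `env`, `optimizer`, and methods such as `generate`, `log_prob`, `parse_action`, and `step` are assumed interfaces, and the actual prompt format and RL algorithm in TiG may differ.

```python
# Minimal sketch of a TiG-style training loop (illustrative only).
# `llm`, `env`, and `optimizer` are hypothetical objects with assumed methods;
# the paper's environment API, prompts, and update rule are not reproduced here.
from dataclasses import dataclass

@dataclass
class Transition:
    prompt: str    # textual game state shown to the LLM
    response: str  # language-guided policy: step-by-step reasoning plus a chosen action
    reward: float  # scalar feedback from the game environment

def rollout(llm, env, max_steps=32):
    """Collect one episode by letting the LLM act through natural language."""
    transitions, state = [], env.reset()
    for _ in range(max_steps):
        prompt = f"Game state:\n{state}\nExplain your reasoning, then choose an action."
        response = llm.generate(prompt)       # rationale + action, generated as text
        action = env.parse_action(response)   # map the text back to a game command
        state, reward, done = env.step(action)
        transitions.append(Transition(prompt, response, reward))
        if done:
            break
    return transitions

def train(llm, env, optimizer, iterations=1000):
    """Online RL: iteratively refine the language-guided policy from environment feedback."""
    for _ in range(iterations):
        episode = rollout(llm, env)
        # Policy-gradient-style update: raise the likelihood of responses that
        # earned high reward, lower it for poorly rewarded ones.
        loss = sum(-t.reward * llm.log_prob(t.prompt, t.response) for t in episode)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Because both the state description and the chosen action live in text, the same generation that selects a move also carries its explanation, which is where the claimed interpretability comes from.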
Community
Think in Games reformulates RL-based decision-making as a language modeling task: LLMs generate language-guided policies that are refined through online RL, enabling procedural reasoning with less data and interpretable explanations.
Interesting work, but in my experience you don't necessarily need a game-based RL framework to bridge declarative and procedural knowledge. I usually just teach the agent to identify recursion. Once it can recognize recursion, it can emulate it, and from there apply it functionally. That single recursive scaffold often achieves the same procedural grounding with less overhead.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Omni-Think: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards (2025)
- DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding (2025)
- CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards (2025)
- A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning (2025)
- Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning (2025)
- VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning (2025)
- Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs (2025)