Add missing metadata (license, library_name, pipeline_tag) (#1)
- Add missing metadata (license, library_name, pipeline_tag) (92fe08eba28d763874b2eed0e68addf0203e14d8)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED
@@ -1,6 +1,9 @@
 ---
-
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
+
 <h1 align="center">
 <em>AReaL</em>: Ant Reasoning Reinforcement Learning for LLMs
 </h1>
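For context, the `library_name: transformers` and `pipeline_tag: text-generation` fields added above tell the Hub which loading path and task widget to use for this checkpoint. A minimal usage sketch under that assumption (the repo id below is a placeholder, not this model's actual Hub id):

```python
# Minimal sketch for a checkpoint whose card declares
# `library_name: transformers` and `pipeline_tag: text-generation`.
# "your-org/your-areal-checkpoint" is a placeholder repo id.
from transformers import pipeline

generator = pipeline("text-generation", model="your-org/your-areal-checkpoint")
out = generator("Solve: 12 * 17 = ?", max_new_tokens=64)
print(out[0]["generated_text"])
```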
@@ -23,7 +26,7 @@ AReaL (Ant Reasoning RL) is an open-source **fully asynchronous reinforcement le
 
 **[2025/06/03] (v0.3, boba²)** We release **boba²** (double-boba) for fully asynchronous RL training, which achieves a **2.77x speedup while obtaining on-par or even better training performance** compared to synchronous systems. Moreover, asynchronous RL makes it extremely easy to set up multi-turn agentic RL training! Check out [our v0.3 overview blog](/blog/AReaL_v0_3.md) and the [research paper](https://arxiv.org/pdf/2505.24298).
 
-**[2025/03/31] (v0.2, Boba)** Here comes our next milestone release - Boba! Please call it A-ReaL-boba! This release includes much faster training with SGLang support and SOTA 7B and 32B models on math reasoning. Check our [v0.2 technical blog](/blog/AReaL_v0_2.md).
+**[2025/03/31] (v0.2, Boba)** Here comes our next milestone release - Boba! Please call it A-ReaL-boba! This release includes much faster training with SGLang support and SOTA 7B and 32B models on math reasoning. Check our [v0.2 technical blog](/blog/AReaL_v0_2.md).
 
 **[2025/02/24] (v0.1)** Our initial release includes reproducible results for 1.5B and 7B LRMs. Check our [v0.1 technical blog](/blog/AReaL_v0_1.md).
 
@@ -92,7 +95,7 @@ We highlight the [tutorials](https://inclusionai.github.io/AReaL/customization/d
 + [Streaming generation and reward computation](https://inclusionai.github.io/AReaL/developer/rollout/rollout_worker.html)
 + [Interruptible rollout](https://inclusionai.github.io/AReaL/developer/rollout/gserver.html)
 + [Data staleness control with the rollout controller](https://inclusionai.github.io/AReaL/developer/rollout/gserver.html)
-+ [The adoption of decoupled PPO loss](https://inclusionai.github.io/AReaL/customization/algorithm.html)
++ [The adoption of decoupled PPO loss](https://inclusionai.github.io/AReaL/customization/algorithm.html#grouped-advantage-normalization)
 
 ### RL Training for Multi-turn Agent
 
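For context on the last bullet above: the decoupled PPO loss (see the linked page and the paper at https://arxiv.org/pdf/2505.24298) clips the policy ratio against a recent proximal policy while importance-reweighting for the stale behavior policy that generated the asynchronous rollouts. A minimal sketch of that idea, not AReaL's actual API (the function name and tensor layout are assumptions):

```python
import torch

def decoupled_ppo_loss(logp_theta, logp_prox, logp_behav, advantages, eps=0.2):
    # All inputs are per-token tensors of shape [num_tokens].
    # Ratio of the current policy to the proximal policy; clipping is
    # centered on the proximal policy, not the rollout (behavior) policy.
    ratio = torch.exp(logp_theta - logp_prox)
    # Importance weight correcting for sampling from a stale behavior policy.
    behav_weight = torch.exp(logp_prox - logp_behav).detach()
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    # Pessimistic PPO surrogate, reweighted token-wise and averaged.
    surrogate = torch.minimum(ratio * advantages, clipped * advantages)
    return -(behav_weight * surrogate).mean()
```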
@@ -100,12 +103,8 @@ AReaL-boba² allows you to independently customize the [dataset](https://inclusi
 
 In particular, we show a simple example to develop a multi-turn math agent for RL training. Please see the learning curve below and reference the [step-by-step guide](https://inclusionai.github.io/AReaL/customization/agent.html) if you want to implement your own agentic RL project.
 
-**Multi-turn Agent Learning Curve**
-
 ## Getting Started
 
-### Quick Start
-
 Train Qwen3 1.7B locally:
 
 ```bash
@@ -214,4 +213,4 @@ We also appreciate all the pioneering works from the community, particularly the
 primaryClass={cs.LG},
 url={https://arxiv.org/abs/2505.24298},
 }
-```
+```