Add missing metadata (license, library_name, pipeline_tag) (#1)
- Add missing metadata (license, library_name, pipeline_tag) (92fe08eba28d763874b2eed0e68addf0203e14d8)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED
@@ -1,6 +1,9 @@
 ---
-
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
+
 <h1 align="center">
 <em>AReaL</em>: Ant Reasoning Reinforcement Learning for LLMs
 </h1>
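For context, the `library_name: transformers` and `pipeline_tag: text-generation` fields added above tell the Hub which loading path and task widget to use for this checkpoint. A minimal usage sketch under that assumption (the repo id below is a placeholder, not this model's actual Hub id):

```python
# Minimal sketch for a checkpoint whose card declares
# `library_name: transformers` and `pipeline_tag: text-generation`.
# "your-org/your-areal-checkpoint" is a placeholder repo id.
from transformers import pipeline

generator = pipeline("text-generation", model="your-org/your-areal-checkpoint")
out = generator("Solve: 12 * 17 = ?", max_new_tokens=64)
print(out[0]["generated_text"])
```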
@@ -23,7 +26,7 @@ AReaL (Ant Reasoning RL) is an open-source **fully asynchronous reinforcement le
 
 **[2025/06/03] (v0.3, boba²)** We release **boba²** (double-boba) for fully asynchronous RL training, which achieves a **2.77x speedup while obtaining on-par or even better training performance** compared to synchronous systems. Moreover, asynchronous RL makes it extremely easy to set up multi-turn agentic RL training! Check out [our v0.3 overview blog](/blog/AReaL_v0_3.md) and the [research paper](https://arxiv.org/pdf/2505.24298).
 
-**[2025/03/31] (v0.2, Boba)** Here comes our next milestone release - Boba! Please call it A-ReaL-boba! This release includes much faster training with SGLang support and SOTA 7B and 32B models on math reasoning. Check our [v0.2 technical blog](/blog/AReaL_v0_2.md).
+**[2025/03/31] (v0.2, Boba)** Here comes our next milestone release - Boba! Please call it A-ReaL-boba! This release includes much faster training with SGLang support and SOTA 7B and 32B models on math reasoning. Check our [v0.2 technical blog](/blog/AReaL_v0_2.md).
 
 **[2025/02/24] (v0.1)** Our initial release includes reproducible results for 1.5B and 7B LRMs. Check our [v0.1 technical blog](/blog/AReaL_v0_1.md).
 
@@ -92,7 +95,7 @@ We highlight the [tutorials](https://inclusionai.github.io/AReaL/customization/d
 + [Streaming generation and reward computation](https://inclusionai.github.io/AReaL/developer/rollout/rollout_worker.html)
 + [Interruptible rollout](https://inclusionai.github.io/AReaL/developer/rollout/gserver.html)
 + [Data staleness control with the rollout controller](https://inclusionai.github.io/AReaL/developer/rollout/gserver.html)
-+ [The adoption of decoupled PPO loss](https://inclusionai.github.io/AReaL/customization/algorithm.html)
++ [The adoption of decoupled PPO loss](https://inclusionai.github.io/AReaL/customization/algorithm.html#grouped-advantage-normalization)
 
 ### RL Training for Multi-turn Agent
 
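For context on the last bullet above: the decoupled PPO loss (see the linked page and the paper at https://arxiv.org/pdf/2505.24298) clips the policy ratio against a recent proximal policy while importance-reweighting for the stale behavior policy that generated the asynchronous rollouts. A minimal sketch of that idea, not AReaL's actual API (the function name and tensor layout are assumptions):

```python
import torch

def decoupled_ppo_loss(logp_theta, logp_prox, logp_behav, advantages, eps=0.2):
    # All inputs are per-token tensors of shape [num_tokens].
    # Ratio of the current policy to the proximal policy; clipping is
    # centered on the proximal policy, not the rollout (behavior) policy.
    ratio = torch.exp(logp_theta - logp_prox)
    # Importance weight correcting for sampling from a stale behavior policy.
    behav_weight = torch.exp(logp_prox - logp_behav).detach()
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    # Pessimistic PPO surrogate, reweighted token-wise and averaged.
    surrogate = torch.minimum(ratio * advantages, clipped * advantages)
    return -(behav_weight * surrogate).mean()
```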
@@ -100,12 +103,8 @@ AReaL-boba² allows you to independently customize the [dataset](https://inclusi
 
 In particular, we show a simple example to develop a multi-turn math agent for RL training. Please see the learning curve below and reference the [step-by-step guide](https://inclusionai.github.io/AReaL/customization/agent.html) if you want to implement your own agentic RL project.
 
-**Multi-turn Agent Learning Curve**
-
 ## Getting Started
 
-### Quick Start
-
 Train Qwen3 1.7B locally:
 
 ```bash
@@ -214,4 +213,4 @@ We also appreciate all the pioneering works from the community, particularly the
 primaryClass={cs.LG},
 url={https://arxiv.org/abs/2505.24298},
 }
-```
+```