The Open RL Leaderboard is a community-driven benchmark for reinforcement learning models.
## 🔌 How to have your agent evaluated?
The Open RL Leaderboard constantly scans the 🤗 Hub to detect new models to be evaluated. For your model to be evaluated, it must meet the following criteria:
1. The model must be public on the 🤗 Hub.
2. The model must contain an `agent.pt` file.
3. The model must be [tagged](https://huggingface.co/docs/hub/model-cards#model-cards) `reinforcement-learning`.
4. The model must be [tagged](https://huggingface.co/docs/hub/model-cards#model-cards) with the name of the environment you want to evaluate (for example `MountainCar-v0`).
Once your model meets these criteria, it will be automatically evaluated on the Open RL Leaderboard. It usually takes a few minutes for the evaluation to be completed.
That's it!
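The steps above can be sketched as follows. This is a minimal illustration, not the leaderboard's own tooling: the directory name, repo id, and placeholder `agent.pt` are assumptions, and the actual upload call is left as a comment you would adapt to your account.

```python
from pathlib import Path

# Hypothetical local model directory -- adjust to your own layout.
model_dir = Path("my-mountaincar-agent")
model_dir.mkdir(exist_ok=True)

# 1. The serialized policy. In practice you would save your trained agent
#    here, e.g. torch.save(agent, model_dir / "agent.pt").
(model_dir / "agent.pt").write_bytes(b"")  # placeholder for the real file

# 2. Model-card metadata carrying the two required tags: the
#    `reinforcement-learning` tag and the environment id.
card = """---
tags:
- reinforcement-learning
- MountainCar-v0
---
# My MountainCar agent
"""
(model_dir / "README.md").write_text(card)

# 3. Push the folder to a *public* repo on the Hub, e.g. with huggingface_hub:
#    from huggingface_hub import HfApi
#    HfApi().upload_folder(folder_path=model_dir, repo_id="<user>/my-mountaincar-agent")
```

Once the repo is public and carries both tags, the scanner should pick it up on its own.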
## 🕵 How are the models evaluated?
The evaluation is done by running the agent on the environment for 50 episodes. You can get the raw evaluation scores in the [Leaderboard dataset](https://huggingface.co/datasets/open-rl-leaderboard/results).
For further information, please refer to the [Open RL Leaderboard evaluation script](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard/blob/main/src/evaluation.py).
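The evaluation loop amounts to running the agent for a fixed number of episodes and averaging the undiscounted episode returns. A minimal sketch, assuming a Gymnasium-style `reset`/`step` API; the `DummyEnv` and random policy below are stand-ins for a real environment and a loaded `agent.pt`:

```python
import random

class DummyEnv:
    """Toy stand-in for a Gymnasium env: 10 steps, reward 1.0 per step."""

    def reset(self, seed=None):
        self._t = 0
        return 0.0, {}  # observation, info

    def step(self, action):
        self._t += 1
        terminated = self._t >= 10
        return 0.0, 1.0, terminated, False, {}  # obs, reward, terminated, truncated, info

def evaluate(env, policy, num_episodes=50):
    """Run `policy` for `num_episodes` episodes; return the mean episode return."""
    returns = []
    for episode in range(num_episodes):
        obs, _ = env.reset(seed=episode)
        done, episode_return = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            episode_return += reward
            done = terminated or truncated
        returns.append(episode_return)
    return sum(returns) / len(returns)

mean_return = evaluate(DummyEnv(), policy=lambda obs: random.randint(0, 1))
```

The leaderboard reports this mean over 50 episodes; see the evaluation script linked above for the exact implementation.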
### The particular case of Atari environments
Atari environments are evaluated on the `NoFrameskip-v4` version of the environment. For example, to evaluate an agent on the `Pong` environment, you must tag your model with `PongNoFrameskip-v4`. The environment is then wrapped to match the standard Atari preprocessing pipeline.
- No-op reset with a maximum of 30 no-ops
- Max and skip with a skip of 4
- Episodic life (although the reported score is for the full episode, not the life)
- Fire reset
- Clip reward (although the reported score is not clipped)
- Resize observation to 84x84
- Grayscale observation
- Frame stack of 4
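For orientation, the preprocessing above maps roughly onto the standard Gymnasium Atari wrappers. The dictionary below only restates the settings listed here; the wrapper calls in the comments are an assumption about how you might reproduce the pipeline yourself (wrapper names vary between Gymnasium versions, e.g. `FrameStack` vs. `FrameStackObservation`):

```python
# Settings matching the preprocessing steps listed above.
atari_preprocessing = {
    "noop_max": 30,                 # no-op reset, up to 30 no-ops
    "frame_skip": 4,                # max-and-skip over 4 frames
    "terminal_on_life_loss": True,  # episodic life (reported score is still full-episode)
    "screen_size": 84,              # resize observation to 84x84
    "grayscale_obs": True,          # grayscale observation
}
num_stacked_frames = 4              # frame stack of 4

# Roughly, with Gymnasium (exact names depend on your installed version):
#   env = gymnasium.wrappers.AtariPreprocessing(env, **atari_preprocessing)
#   env = gymnasium.wrappers.FrameStackObservation(env, num_stacked_frames)
# Fire reset and reward clipping are applied as additional wrappers; the
# reported score itself is not clipped.
```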
## 🚑 Troubleshooting
If you encounter any issues, please [open an issue](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard/discussions/new) on the Open RL Leaderboard repository.
## 🏃 Next steps
We are working on adding more environments and metrics to the Open RL Leaderboard.
If you have any suggestions, please [open a discussion](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard/discussions/new) on the Open RL Leaderboard repository.
## 📜 Citation
```bibtex
@misc{open-rl-leaderboard,
  author       = {Quentin Gallouédec and TODO},
  title        = {Open RL Leaderboard},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = "\url{https://huggingface.co/spaces/open-rl-leaderboard/leaderboard}",
}
```