The Open RL Leaderboard is a community-driven benchmark for reinforcement learning models.
## 🔌 How to have your agent evaluated?
The Open RL Leaderboard constantly scans the 🤗 Hub for new models to evaluate. For your model to be evaluated, it must meet the following criteria (a publishing sketch follows the list).
1. The model must be public on the 🤗 Hub.
2. The model must contain an `agent.pt` file.
3. The model must be [tagged](https://huggingface.co/docs/hub/model-cards#model-cards) `reinforcement-learning`.
4. The model must be [tagged](https://huggingface.co/docs/hub/model-cards#model-cards) with the name of the environment you want to evaluate (for example `MountainCar-v0`).
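Below is a minimal, hypothetical sketch of how an agent could be exported and pushed to the Hub so that it meets these criteria. The agent class, environment, and repository name are placeholders, and the sketch assumes the agent can be serialized with TorchScript; check the evaluation script linked in the next section for the exact interface the leaderboard expects.
```python
# Hypothetical sketch: export an agent as `agent.pt`, write a model card with the
# required tags, and push everything to a public 🤗 Hub repository.
import torch
from huggingface_hub import HfApi, ModelCard, ModelCardData

# Placeholder agent: any torch module mapping observations to actions.
agent = torch.nn.Linear(4, 2)
torch.jit.script(agent).save("agent.pt")  # the leaderboard looks for this exact filename

# The tags live in the model card metadata: `reinforcement-learning` plus the environment id.
card = ModelCard.from_template(ModelCardData(tags=["reinforcement-learning", "CartPole-v1"]))
card.save("README.md")

repo_id = "your-username/cartpole-agent"  # placeholder repository name
api = HfApi()
api.create_repo(repo_id, exist_ok=True)  # public by default
api.upload_file(path_or_fileobj="agent.pt", path_in_repo="agent.pt", repo_id=repo_id)
api.upload_file(path_or_fileobj="README.md", path_in_repo="README.md", repo_id=repo_id)
```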
Once your model meets these criteria, it is automatically evaluated on the Open RL Leaderboard. The evaluation usually takes a few minutes to complete.
That's it!
## 🕵 How are the models evaluated?
Each agent is evaluated by running it in its environment for 50 episodes. You can get the raw evaluation scores in the [Leaderboard dataset](https://huggingface.co/datasets/open-rl-leaderboard/results).
For further information, please refer to the [Open RL Leaderboard evaluation script](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard/blob/main/src/evaluation.py).
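In rough terms, the protocol amounts to the loop below. This is a simplified sketch, not the leaderboard's actual code: it assumes a `gymnasium` environment and a `policy` callable mapping an observation to an action, and it reports the mean episodic return over the 50 episodes.
```python
# Simplified sketch of the evaluation loop (not the leaderboard's actual code).
import gymnasium as gym
import numpy as np

def evaluate(policy, env_id: str, num_episodes: int = 50) -> float:
    env = gym.make(env_id)
    returns = []
    for _ in range(num_episodes):
        observation, _ = env.reset()
        episode_return, done = 0.0, False
        while not done:
            action = policy(observation)  # hypothetical callable: observation -> action
            observation, reward, terminated, truncated, _ = env.step(action)
            episode_return += float(reward)
            done = terminated or truncated
        returns.append(episode_return)
    env.close()
    return float(np.mean(returns))  # mean episodic return over all episodes
```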
### The particular case of Atari environments
Atari environments are evaluated on the `NoFrameskip-v4` version of the environment. For example, to evaluate an agent on the `Pong` environment, you must tag your model with `PongNoFrameskip-v4`. The environment is then wrapped to match the standard Atari preprocessing pipeline (a wrapper sketch follows the list):
- No-op reset with a maximum of 30 no-ops
- Max-and-skip with a skip of 4
- Episodic life (although the reported score is for the full episode, not a single life)
- Fire reset
- Clip reward (although the reported score is not clipped)
- Resize observation to 84x84
- Grayscale observation
- Frame stack of 4
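For reference, a rough equivalent of this wrapper stack can be built from standard `gymnasium` wrappers, as sketched below. This is an approximation, not the leaderboard's exact code, and it assumes `gymnasium` < 1.0 (where `FrameStack` has not yet been renamed `FrameStackObservation`).
```python
# Rough equivalent of the preprocessing above using standard gymnasium wrappers
# (a sketch, not the leaderboard's exact code; requires ale-py and the Atari ROMs).
import gymnasium as gym
from gymnasium.wrappers import AtariPreprocessing, FrameStack  # FrameStackObservation in gymnasium >= 1.0

env = gym.make("PongNoFrameskip-v4")
env = AtariPreprocessing(
    env,
    noop_max=30,                 # no-op reset with up to 30 no-ops
    frame_skip=4,                # max-and-skip with a skip of 4
    screen_size=84,              # resize observation to 84x84
    grayscale_obs=True,          # grayscale observation
    terminal_on_life_loss=True,  # episodic life
)
env = FrameStack(env, 4)         # frame stack of 4
# Fire reset and reward clipping are not covered by AtariPreprocessing; they would be
# added with separate wrappers (e.g. FireResetEnv / ClipRewardEnv from Stable-Baselines3).
```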
## 🚑 Troubleshooting
If you encounter any issues, please [open an issue](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard/discussions/new) on the Open RL Leaderboard repository.
## 🏃 Next steps
We are working on adding more environments and metrics to the Open RL Leaderboard.
If you have any suggestions, please [open a discussion](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard/discussions/new) on the Open RL Leaderboard repository.
## 📜 Citation
```bibtex
@misc{open-rl-leaderboard,
  author       = {Quentin Gallouédec and TODO},
  title        = {Open RL Leaderboard},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = "\url{https://huggingface.co/spaces/open-rl-leaderboard/leaderboard}",
}
```