The Open RL Leaderboard is a community-driven benchmark for reinforcement learning models.

## 🔌 How to have your agent evaluated?

The Open RL Leaderboard constantly scans the 🤗 Hub for new models to evaluate. For your model to be picked up, it must meet the following criteria:

1. The model must be public on the 🤗 Hub.
2. The model must contain an `agent.pt` file.
3. The model must be [tagged](https://huggingface.co/docs/hub/model-cards#model-cards) `reinforcement-learning`.
4. The model must be [tagged](https://huggingface.co/docs/hub/model-cards#model-cards) with the name of the environment you want it evaluated on (for example `MountainCar-v0`).

Once your model meets these criteria, it will be automatically evaluated on the Open RL Leaderboard. It usually takes a few minutes for the evaluation to be completed.
That's it!
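For illustration, here is a minimal sketch of an upload that satisfies these four criteria, using `torch` and `huggingface_hub`. The repository name and the policy architecture are placeholders, and the exact interface that `agent.pt` must expose is defined by the evaluation script linked in the next section.

```python
import torch
from huggingface_hub import HfApi, metadata_update

# Placeholder policy: any trained torch.nn.Module saved as agent.pt
# (check the evaluation script for the exact interface it must expose).
policy = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 3))
torch.save(policy, "agent.pt")

repo_id = "my-username/MountainCar-v0-agent"  # placeholder repository name
api = HfApi()
api.create_repo(repo_id, exist_ok=True)  # 1. a model repo on the 🤗 Hub (public by default)
api.upload_file(                         # 2. the agent.pt file at the root of the repo
    path_or_fileobj="agent.pt",
    path_in_repo="agent.pt",
    repo_id=repo_id,
)
# 3. and 4. the required tags in the model card metadata
metadata_update(repo_id, {"tags": ["reinforcement-learning", "MountainCar-v0"]})
```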

## 🕵 How are the models evaluated?

Each agent is evaluated by running it in its tagged environment for 50 episodes. You can find the raw evaluation scores in the [Leaderboard dataset](https://huggingface.co/datasets/open-rl-leaderboard/results).

For further information, please refer to the [Open RL Leaderboard evaluation script](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard/blob/main/src/evaluation.py).
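
As a rough illustration of what is measured, here is a minimal 50-episode rollout written with Gymnasium. The agent interface assumed here (a module mapping an observation batch to an action) is a simplification for the sketch; loading, seeding, and device handling are defined in the evaluation script above.

```python
import gymnasium as gym
import numpy as np
import torch

agent = torch.load("agent.pt", weights_only=False)  # assumed: a torch.nn.Module mapping observations to actions
env = gym.make("MountainCar-v0")  # the environment named in the model's tag

episodic_returns = []
for _ in range(50):  # 50 evaluation episodes
    obs, _ = env.reset()
    done, episodic_return = False, 0.0
    while not done:
        with torch.no_grad():
            obs_tensor = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
            action = agent(obs_tensor)  # assumed interface: observation batch in, action out
        obs, reward, terminated, truncated, _ = env.step(int(action))  # discrete action space assumed
        episodic_return += float(reward)
        done = terminated or truncated
    episodic_returns.append(episodic_return)

print(f"Mean return over 50 episodes: {np.mean(episodic_returns):.2f}")
```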

### The particular case of Atari environments

Atari environments are evaluated on the `NoFrameskip-v4` version of the environment. For example, to evaluate an agent on the `Pong` environment, you must tag your model with `PongNoFrameskip-v4`. The environment is then wrapped to match the standard Atari preprocessing pipeline:

- No-op reset with a maximum of 30 no-ops
- Max and skip with a skip of 4
- Episodic life (although the reported score is for the full episode, not the life)
- Fire reset
- Clip reward (although the reported score is not clipped)
- Resize observation to 84x84
- Grayscale observation
- Frame stack of 4
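
For reference, a roughly equivalent wrapper stack can be assembled from Stable-Baselines3's `AtariWrapper` (which covers the no-op reset, frame skipping, episodic life, fire reset, reward clipping, and the 84x84 grayscale resize) plus a frame-stacking wrapper. This is only a sketch; the exact wrappers used by the leaderboard are in the evaluation script linked above.

```python
import gymnasium as gym
from gymnasium.wrappers import FrameStack  # named FrameStackObservation in gymnasium >= 1.0
from stable_baselines3.common.atari_wrappers import AtariWrapper

# Requires the Atari extras, e.g. `pip install "gymnasium[atari,accept-rom-license]"`.
env = gym.make("PongNoFrameskip-v4")
env = AtariWrapper(
    env,
    noop_max=30,                 # no-op reset with at most 30 no-ops
    frame_skip=4,                # max-and-skip with a skip of 4
    screen_size=84,              # resize to 84x84, grayscale applied by the wrapper
    terminal_on_life_loss=True,  # episodic life
    clip_reward=True,            # clip reward (fire reset is applied automatically when needed)
)
env = FrameStack(env, 4)         # frame stack of 4
```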

## 🚑 Troubleshooting

If you encounter any issues, please [open an issue](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard/discussions/new) on the Open RL Leaderboard repository.

## 🏃 Next steps

We are working on adding more environments and metrics to the Open RL Leaderboard.
If you have any suggestions, please [open a discussion](https://huggingface.co/spaces/open-rl-leaderboard/leaderboard/discussions/new) on the Open RL Leaderboard repository.

## 📜 Citation

```bibtex
@misc{open-rl-leaderboard,
  author = {Quentin Gallouédec and TODO},
  title = {Open RL Leaderboard},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/spaces/open-rl-leaderboard/leaderboard}},
}
```