---
title: PoCLeaderboard
emoji: 🏆
colorFrom: green
colorTo: pink
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
license: mit
short_description: Example Leaderboard
---
This Space provides an interactive leaderboard for comparing language model performance across various benchmarks and custom tasks.

## Features
- Automated model evaluation using lm-evaluation-harness
- Support for standard and custom benchmarks
- Interactive visualization of results
- Daily automated evaluations
- Easy submission of new models and custom tasks

## Usage
1. Visit the Space to view the current leaderboard
2. Submit new models for evaluation
3. Create custom evaluation tasks
4. Track performance trends over time

## Custom Task Format
```json
{
  "examples": [
    {
      "input": "question or prompt",
      "ideal": "expected answer",
      "metrics": ["accuracy", "f1"]
    }
  ]
}
```
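
Before submitting a custom task, it may help to check the file against the format above. The following is a minimal sketch, not the Space's actual validation logic: the function name `validate_task` and the `ALLOWED_METRICS` tuple are hypothetical, and the assumption that only `accuracy` and `f1` are accepted comes solely from the example shown here.

```python
import json

# Assumption: only the two metric names shown in the README are supported.
ALLOWED_METRICS = ("accuracy", "f1")


def validate_task(task: dict) -> list[str]:
    """Return a list of problems found; an empty list means no problems were detected."""
    examples = task.get("examples")
    if not isinstance(examples, list) or not examples:
        return ["'examples' must be a non-empty list"]

    errors = []
    for i, ex in enumerate(examples):
        if not isinstance(ex, dict):
            errors.append(f"examples[{i}] must be an object")
            continue
        # "input" and "ideal" are expected to be non-empty strings.
        for key in ("input", "ideal"):
            value = ex.get(key)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"examples[{i}].{key} must be a non-empty string")
        # "metrics" is expected to be a non-empty list of known metric names.
        metrics = ex.get("metrics")
        if not isinstance(metrics, list) or not metrics:
            errors.append(f"examples[{i}].metrics must be a non-empty list")
        else:
            unknown = [m for m in metrics if m not in ALLOWED_METRICS]
            if unknown:
                errors.append(f"examples[{i}].metrics has unknown entries: {unknown}")
    return errors


if __name__ == "__main__":
    raw = '{"examples": [{"input": "What is 2 + 2?", "ideal": "4", "metrics": ["accuracy"]}]}'
    problems = validate_task(json.loads(raw))
    print(problems if problems else "Task file looks valid")
```

Returning a list of messages rather than raising on the first problem lets a submitter fix every issue in one pass.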