---
title: LLM-Perf Leaderboard
emoji: 🏋️
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.1.0
app_file: app.py
pinned: true
license: apache-2.0
tags: [llm perf leaderboard, llm performance leaderboard, llm, performance, leaderboard]
---
# LLM-Perf Leaderboard

## About
The 🤗 LLM-Perf Leaderboard 🏋️ is a leaderboard at the intersection of quality and performance.
Its aim is to benchmark the performance (latency, throughput, memory & energy)
of Large Language Models (LLMs) across different hardware, backends, and optimizations
using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark).
Anyone from the community can request a new base model or a new hardware/backend/optimization
configuration for automated benchmarking:

- Model evaluation requests should be made in the
  [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard);
  we scrape the [list of canonical base models](https://github.com/huggingface/optimum-benchmark/blob/main/llm_perf/utils.py) from there.
- Hardware/backend/optimization configuration requests should be made in the
  [🤗 LLM-Perf Leaderboard 🏋️](https://huggingface.co/spaces/optimum/llm-perf-leaderboard) or the
  [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) repository (where the code is hosted).
## Details
- To avoid communication-dependent results, only one GPU is used.
- Score is the average evaluation score obtained from the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
- LLMs are benchmarked with a batch size of 1 and a prompt size of 256, generating 64 tokens, for at least 10 iterations and 10 seconds.
- Energy consumption is measured in kWh using CodeCarbon, taking into account the GPU, CPU, RAM, and location of the machine.
- We measure three types of memory: Max Allocated Memory, Max Reserved Memory, and Max Used Memory. The first two are reported by PyTorch and the last one is observed using PyNVML (see the sketch below).
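To make the distinction between the three memory metrics concrete, here is a minimal sketch of how each one can be read, assuming a CUDA device and the `nvidia-ml-py` bindings; it illustrates the metrics themselves, not the leaderboard's actual instrumentation:

```python
import torch
import pynvml  # provided by the nvidia-ml-py package

# PyTorch-reported peaks (allocator-level, for the current process):
max_allocated = torch.cuda.max_memory_allocated()  # peak bytes held by live tensors
max_reserved = torch.cuda.max_memory_reserved()    # peak bytes reserved by the caching allocator

# Device-level view as observed through NVML (what PyNVML exposes).
# NVML reports *current* usage, so a benchmark would poll this value
# during the run and keep the maximum to obtain "Max Used Memory".
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
used_now = pynvml.nvmlDeviceGetMemoryInfo(handle).used  # bytes used on the whole device
pynvml.nvmlShutdown()

print(f"max allocated (PyTorch): {max_allocated / 1e9:.2f} GB")
print(f"max reserved  (PyTorch): {max_reserved / 1e9:.2f} GB")
print(f"used right now (NVML):   {used_now / 1e9:.2f} GB")
```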
All of our benchmarks are run by a single script,
[benchmark_cuda_pytorch.py](https://github.com/huggingface/optimum-benchmark/blob/llm-perf/llm-perf/benchmark_cuda_pytorch.py),
using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) to guarantee reproducibility and consistency.
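For a sense of what such a benchmark looks like, here is a minimal sketch using Optimum-Benchmark's Python API with the settings listed above; the model name is a placeholder and the exact keyword arguments are assumptions, not a copy of the leaderboard script:

```python
from optimum_benchmark import Benchmark, BenchmarkConfig, InferenceConfig, ProcessConfig, PyTorchConfig
from optimum_benchmark.logging_utils import setup_logging

setup_logging(level="INFO")

if __name__ == "__main__":
    # Launch the benchmark in an isolated process.
    launcher_config = ProcessConfig()
    # Inference scenario mirroring the leaderboard settings:
    # batch size 1, prompt size 256, 64 generated tokens,
    # with latency, memory, and energy tracking enabled.
    scenario_config = InferenceConfig(
        latency=True,
        memory=True,
        energy=True,
        duration=10,
        iterations=10,
        input_shapes={"batch_size": 1, "sequence_length": 256},
        generate_kwargs={"max_new_tokens": 64, "min_new_tokens": 64},
    )
    # PyTorch backend on a single GPU ("gpt2" is a placeholder model).
    backend_config = PyTorchConfig(model="gpt2", device="cuda", device_ids="0", no_weights=True)
    benchmark_config = BenchmarkConfig(
        name="pytorch_gpt2",
        launcher=launcher_config,
        scenario=scenario_config,
        backend=backend_config,
    )
    benchmark_report = Benchmark.launch(benchmark_config)
    print(benchmark_report)
```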
## How to run locally
To run the LLM-Perf Leaderboard locally on your machine, follow these steps:

### 1. Clone the Repository

First, clone the repository to your local machine:
```bash
git clone https://huggingface.co/spaces/optimum/llm-perf-leaderboard
cd llm-perf-leaderboard
```
### 2. Install the Required Dependencies

Install the necessary Python packages listed in the `requirements.txt` file:

`pip install -r requirements.txt`
### 3. Run the Application

You can run the Gradio application in one of the following ways:

- Option 1: Using Python

  `python app.py`

- Option 2: Using the Gradio CLI (includes hot-reloading)

  `gradio app.py`
### 4. Access the Application

Once the application is running, you can access it locally in your web browser at http://127.0.0.1:7860/.
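If you prefer to query the running app programmatically rather than through the browser, Gradio's client library can connect to it. A small sketch, assuming the default local address (the available endpoints depend on how `app.py` defines them):

```python
from gradio_client import Client  # pip install gradio_client

# Connect to the locally running app.
client = Client("http://127.0.0.1:7860/")

# Inspect the API endpoints the app exposes before calling any of them.
client.view_api()
```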