---
title: WhisperKit Benchmarks
emoji: π
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: main.py
license: mit
---
## Prerequisites

Ensure you have the following software installed:

- Python 3.10 or higher
- pip (Python package installer)
## Installation

1. **Clone the repository**:

   ```sh
   git clone https://github.com/argmaxinc/model-performance-dashboard.git
   cd model-performance-dashboard
   ```

2. **Create a virtual environment**:

   ```sh
   python -m venv venv
   source venv/bin/activate
   ```

3. **Install required packages**:

   ```sh
   pip install -r requirements.txt
   ```
## Usage

1. **Run the application**:

   ```sh
   gradio main.py
   ```

2. **Access the application**:

   After running `main.py`, a local server will start and the interface URL will appear in the terminal. Open that URL in your web browser to interact with the Argmax Benchmark dashboard.
## Data Generation

The data generation process involves three main scripts: `performance_generate.py`, `multilingual_generate.py`, and `quality_generate.py`. Each script updates a specific aspect of the benchmark data.

1. **Performance Data Update (`performance_generate.py`)**:
   - Downloads benchmark data from the [WhisperKit Evals Dataset](https://huggingface.co/datasets/argmaxinc/whisperkit-evals-dataset).
   - Processes the data to extract performance metrics for various models, devices, and operating systems.
   - Calculates metrics such as speed and tokens per second for long-form and short-form data.
   - Saves the results in `performance_data.json` and `support_data.csv`.
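The per-configuration aggregation can be sketched as follows. This is a minimal illustration, not the actual `performance_generate.py` logic; the record field names (`model`, `device`, `os`, `num_tokens`, `latency_s`) are assumptions, since the real dataset schema may differ.

```python
def tokens_per_second(records):
    """Aggregate tokens/s per (model, device, os) from raw benchmark records.

    Field names are illustrative placeholders, not the real dataset schema.
    """
    totals = {}
    for r in records:
        key = (r["model"], r["device"], r["os"])
        tokens, seconds = totals.get(key, (0, 0.0))
        totals[key] = (tokens + r["num_tokens"], seconds + r["latency_s"])
    # Divide total tokens by total wall-clock seconds per configuration.
    return {key: t / s for key, (t, s) in totals.items() if s > 0}

records = [
    {"model": "tiny", "device": "iPhone15,2", "os": "iOS 17",
     "num_tokens": 120, "latency_s": 2.0},
    {"model": "tiny", "device": "iPhone15,2", "os": "iOS 17",
     "num_tokens": 60, "latency_s": 1.0},
]
result = tokens_per_second(records)
# 180 tokens over 3.0 s → 60.0 tokens/s for this configuration
```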
2. **Multilingual Data Update (`multilingual_generate.py`)**:
   - Downloads multilingual evaluation data from the [WhisperKit Multilingual Evals Dataset](https://huggingface.co/datasets/argmaxinc/whisperkit-evals-multilingual).
   - Processes the data to generate confusion matrices for language detection.
   - Calculates metrics for both forced and unforced language detection scenarios.
   - Saves the results in `multilingual_confusion_matrices.json` and `multilingual_results.csv`.
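A language-detection confusion matrix of the kind described above can be built with a simple nested counter. This is a sketch under the assumption that the evaluation data reduces to (reference language, detected language) pairs; the real script's input format may differ.

```python
from collections import defaultdict

def confusion_matrix(pairs):
    """Count (reference_language, detected_language) occurrences.

    `pairs` is a list of (reference, detected) language codes; rows are the
    reference language, columns the detected one.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for ref, det in pairs:
        matrix[ref][det] += 1
    # Convert to plain dicts so the result serializes cleanly to JSON.
    return {ref: dict(row) for ref, row in matrix.items()}

pairs = [("en", "en"), ("en", "de"), ("de", "de")]
matrix = confusion_matrix(pairs)
# English was detected correctly once and misdetected as German once.
```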
3. **Quality Data Update (`quality_generate.py`)**:
   - Downloads quality evaluation data from [WhisperKit Evals](https://huggingface.co/datasets/argmaxinc/whisperkit-evals).
   - Processes the data to calculate Word Error Rate (WER) and Quality of Inference (QoI) metrics for each dataset.
   - Saves the results in `quality_data.json`.
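WER, the headline quality metric above, is the word-level edit distance between a reference transcript and a hypothesis, divided by the reference length. The standard computation is a sketch like the following (the actual `quality_generate.py` may rely on a library instead):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with word-level Levenshtein dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# Identical transcripts → 0.0; one wrong word out of three → 1/3
```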
## Data Update

To update the dashboard with the latest data from our HuggingFace datasets, run:

```sh
make use-huggingface-data
```

Alternatively, you can use our on-device testing code [TODO:INSERT_LINK_TO_OS_TEST_CODE] on your device to update the dashboard with your own data. After generating the Xcode data, place the resulting `.json` files in the `whisperkit-evals/xcresults/benchmark_data` directory, then run:

```sh
make use-local-data
```