---
title: WhisperKit Benchmarks
emoji: π
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: main.py
license: mit
---
## Prerequisites

Ensure you have the following software installed:

- Python 3.10 or higher
- pip (Python package installer)
## Installation

1. **Clone the repository**:

   ```sh
   git clone https://github.com/argmaxinc/model-performance-dashboard.git
   cd model-performance-dashboard
   ```

2. **Create a virtual environment**:

   ```sh
   python -m venv venv
   source venv/bin/activate
   ```

3. **Install required packages**:

   ```sh
   pip install -r requirements.txt
   ```
## Usage

1. **Run the application**:

   ```sh
   gradio main.py
   ```

2. **Access the application**:

   After running `main.py`, a local server will start and the interface URL will appear in the terminal. Open that URL in your web browser to interact with the Argmax Benchmark dashboard.
## Data Generation

The data generation process involves three main scripts: `performance_generate.py`, `multilingual_generate.py`, and `quality_generate.py`. Each script updates a specific aspect of the benchmark data.

1. **Performance Data Update (`performance_generate.py`)**:
   - Downloads benchmark data from the [WhisperKit Evals Dataset](https://huggingface.co/datasets/argmaxinc/whisperkit-evals-dataset).
   - Processes the data to extract performance metrics for various models, devices, and operating systems.
   - Calculates metrics such as speed and tokens per second for long-form and short-form data.
   - Saves the results in `performance_data.json` and `support_data.csv`.
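The per-configuration aggregation can be sketched as follows. This is a minimal illustration, not the actual `performance_generate.py` logic; the record field names (`model`, `device`, `os`, `num_tokens`, `latency_s`) are assumptions, since the real dataset schema may differ.

```python
def tokens_per_second(records):
    """Aggregate tokens/s per (model, device, os) from raw benchmark records.

    Field names are illustrative placeholders, not the real dataset schema.
    """
    totals = {}
    for r in records:
        key = (r["model"], r["device"], r["os"])
        tokens, seconds = totals.get(key, (0, 0.0))
        totals[key] = (tokens + r["num_tokens"], seconds + r["latency_s"])
    # Divide total tokens by total wall-clock seconds per configuration.
    return {key: t / s for key, (t, s) in totals.items() if s > 0}

records = [
    {"model": "tiny", "device": "iPhone15,2", "os": "iOS 17",
     "num_tokens": 120, "latency_s": 2.0},
    {"model": "tiny", "device": "iPhone15,2", "os": "iOS 17",
     "num_tokens": 60, "latency_s": 1.0},
]
result = tokens_per_second(records)
# 180 tokens over 3.0 s → 60.0 tokens/s for this configuration
```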
2. **Multilingual Data Update (`multilingual_generate.py`)**:
   - Downloads multilingual evaluation data from the [WhisperKit Multilingual Evals Dataset](https://huggingface.co/datasets/argmaxinc/whisperkit-evals-multilingual).
   - Processes the data to generate confusion matrices for language detection.
   - Calculates metrics for both forced and unforced language detection scenarios.
   - Saves the results in `multilingual_confusion_matrices.json` and `multilingual_results.csv`.
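A language-detection confusion matrix of the kind described above can be built with a simple nested counter. This is a sketch under the assumption that the evaluation data reduces to (reference language, detected language) pairs; the real script's input format may differ.

```python
from collections import defaultdict

def confusion_matrix(pairs):
    """Count (reference_language, detected_language) occurrences.

    `pairs` is a list of (reference, detected) language codes; rows are the
    reference language, columns the detected one.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for ref, det in pairs:
        matrix[ref][det] += 1
    # Convert to plain dicts so the result serializes cleanly to JSON.
    return {ref: dict(row) for ref, row in matrix.items()}

pairs = [("en", "en"), ("en", "de"), ("de", "de")]
matrix = confusion_matrix(pairs)
# English was detected correctly once and misdetected as German once.
```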
3. **Quality Data Update (`quality_generate.py`)**:
   - Downloads quality evaluation data from [WhisperKit Evals](https://huggingface.co/datasets/argmaxinc/whisperkit-evals).
   - Processes the data to calculate Word Error Rate (WER) and Quality of Inference (QoI) metrics for each dataset.
   - Saves the results in `quality_data.json`.
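WER, the headline quality metric above, is the word-level edit distance between a reference transcript and a hypothesis, divided by the reference length. The standard computation is a sketch like the following (the actual `quality_generate.py` may rely on a library instead):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with word-level Levenshtein dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# Identical transcripts → 0.0; one wrong word out of three → 1/3
```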
## Data Update

To update the dashboard with the latest data from our HuggingFace datasets, run:

```sh
make use-huggingface-data
```

Alternatively, you can use our on-device testing code [TODO:INSERT_LINK_TO_OS_TEST_CODE] on your device to update the dashboard with your own data. After generating the Xcode data, place the resulting `.json` files in the `whisperkit-evals/xcresults/benchmark_data` directory, then run:

```sh
make use-local-data
```