metadata
title: Vibe Check Translations
emoji: π
colorFrom: gray
colorTo: green
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: false
short_description: A/B test translations
Translation A/B Testing App
A Gradio app for comparing translation quality between different model configurations through A/B testing.
Features
- Language Selection: Choose from available languages in the S3 bucket
- Side-by-Side Comparison: Compare translations from "few-shots" vs "no-few-shots" configurations
- Randomized Presentation: The order of configurations is randomized to avoid bias
- Progress Tracking: Shows current progress through the dataset
- Results Summary: Displays final vote counts and percentages
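The randomized presentation described above could be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the names `assign_sides` and `record_vote` are hypothetical, and the only detail taken from this README is the pair of configuration names.

```python
import random

# Configuration names as used in this README.
CONFIGS = ("few-shots", "no-few-shots")


def assign_sides(rng=random):
    """Return (left_config, right_config) in a random order.

    `rng` can be the `random` module or a seeded `random.Random`
    instance, which makes the order reproducible in tests.
    """
    left, right = rng.sample(CONFIGS, k=2)
    return left, right


def record_vote(votes, sides, choice):
    """Credit the configuration that was shown on the chosen side.

    `sides` is the tuple returned by `assign_sides`; `choice` is
    "left" or "right". The vote is stored under the configuration
    name, so the on-screen position never leaks into the tally.
    """
    if choice not in ("left", "right"):
        raise ValueError(f"unknown choice: {choice!r}")
    left, right = sides
    winner = left if choice == "left" else right
    votes[winner] = votes.get(winner, 0) + 1
    return winner
```

Keeping the side assignment separate from the tally means the UI only ever deals with "left" and "right", while the results are always keyed by configuration.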
Setup
- Install dependencies:
    pip install -r requirements.txt
- Configure AWS credentials (for S3 access):
    export AWS_ACCESS_KEY_ID=your_key
    export AWS_SECRET_ACCESS_KEY=your_secret
    # or use AWS CLI: aws configure
- Run the app:
    python app.py
The app will be available at http://localhost:7860
Usage
- Select Language: Choose a language from the dropdown menu
- Load Data: Click "Load Data" to fetch translation pairs from S3
- Compare Translations:
  - Original text is shown at the top
  - Two translations (A and B) are shown side by side
  - Click "Choose Left" or "Choose Right" to select the better translation
- View Results: After all comparisons, see the final vote counts and percentages for each configuration
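As an illustration of the final results step, vote counts could be turned into a summary along these lines. The function name `summarize` and the one-decimal rounding are assumptions made for this sketch, not details of the app.

```python
def summarize(votes):
    """Map each configuration to (count, percentage of all votes).

    `votes` is a dict such as {"few-shots": 3, "no-few-shots": 1}.
    Percentages are rounded to one decimal place; with no votes at
    all, every percentage is reported as 0.0 rather than dividing
    by zero.
    """
    total = sum(votes.values())
    summary = {}
    for config, count in votes.items():
        pct = 100.0 * count / total if total else 0.0
        summary[config] = (count, round(pct, 1))
    return summary


if __name__ == "__main__":
    for config, (count, pct) in summarize(
        {"few-shots": 3, "no-few-shots": 1}
    ).items():
        print(f"{config}: {count} votes ({pct}%)")
```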
Data Source
The app loads translation data from s3://fineweb-multilingual-v1/experiments/translations/vibe-checks/ with the following structure:
- {language}_Latn/few-shots.jsonl: Translations with few-shot examples
- {language}_Latn/no-few-shots.jsonl: Translations without few-shot examples
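The layout above maps onto S3 object keys in a straightforward way. The sketch below shows one possible way to build a key and fetch a file with boto3; the helper names `object_key` and `fetch_jsonl` are hypothetical, and the value substituted for {language} is whatever directory names exist in the bucket, which this README does not list.

```python
# Bucket and prefix taken from the S3 URI in this README.
BUCKET = "fineweb-multilingual-v1"
PREFIX = "experiments/translations/vibe-checks/"
CONFIGS = ("few-shots", "no-few-shots")


def object_key(language, config):
    """Build the S3 key for one language and one configuration."""
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration: {config!r}")
    return f"{PREFIX}{language}_Latn/{config}.jsonl"


def fetch_jsonl(language, config):
    """Download one JSONL file and return its contents as text.

    Needs boto3 and valid AWS credentials. boto3 is imported here,
    not at module level, so `object_key` works without it installed.
    """
    import boto3

    s3 = boto3.client("s3")
    response = s3.get_object(Bucket=BUCKET, Key=object_key(language, config))
    return response["Body"].read().decode("utf-8")
```

Calling `fetch_jsonl` once per configuration for the selected language yields the two files that the comparison screen draws from.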
Each JSONL file contains documents with:
- text: Original text to translate
- id: Unique document identifier
- inference_results: Array with translation results
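Given those three fields, the two files for a language can be parsed and joined on `id` to form comparison pairs. The sketch below is an assumption about how that might be done, with hypothetical helper names; because the README does not describe the elements of `inference_results`, the arrays are passed through untouched.

```python
import json


def parse_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank lines.

    Splits on "\n" only: str.splitlines() would also break on
    characters such as U+2028, which can appear unescaped inside
    JSON strings in multilingual data.
    """
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def pair_by_id(few_shots, no_few_shots):
    """Join the two configurations on `id`.

    Documents present in only one file are dropped, since there is
    nothing to compare them against.
    """
    other = {doc["id"]: doc for doc in no_few_shots}
    pairs = []
    for doc in few_shots:
        match = other.get(doc["id"])
        if match is None:
            continue
        pairs.append(
            {
                "id": doc["id"],
                "text": doc["text"],
                "few-shots": doc["inference_results"],
                "no-few-shots": match["inference_results"],
            }
        )
    return pairs
```

Joining on `id` instead of relying on line order keeps the pairs aligned even if the two files list documents in a different order or differ in length.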