---
title: Vibe Check Translations
emoji: 📈
colorFrom: gray
colorTo: green
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: false
short_description: A/B test translations
---

# Translation A/B Testing App

A Gradio app for comparing translation quality between different model configurations through A/B testing.

## Features

- **Language Selection**: Choose from available languages in the S3 bucket
- **Side-by-Side Comparison**: Compare translations from "few-shots" vs "no-few-shots" configurations
- **Randomized Presentation**: The order of configurations is randomized to avoid bias
- **Progress Tracking**: Shows current progress through the dataset
- **Results Summary**: Displays final vote counts and percentages

## Setup

1. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

2. Configure AWS credentials (for S3 access):
   ```bash
   export AWS_ACCESS_KEY_ID=your_key
   export AWS_SECRET_ACCESS_KEY=your_secret
   # or use AWS CLI: aws configure
   ```

3. Run the app:
   ```bash
   python app.py
   ```

The app will be available at `http://localhost:7860`

## Usage

1. **Select Language**: Choose a language from the dropdown menu
2. **Load Data**: Click "Load Data" to fetch translation pairs from S3
3. **Compare Translations**:
   - Original text is shown at the top
   - Two translations (A and B) are shown side by side
   - Click "Choose Left" or "Choose Right" to select the better translation
4. **View Results**: After all comparisons, see the final vote counts

## Data Source

The app loads translation data from `s3://fineweb-multilingual-v1/experiments/translations/vibe-checks/` with the following structure:

- `{language}_Latn/few-shots.jsonl` - Translations with few-shot examples
- `{language}_Latn/no-few-shots.jsonl` - Translations without few-shot examples

Each JSONL file contains documents with:

- `text`: Original text to translate
- `id`: Unique document identifier
- `inference_results`: Array with translation results
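
As a rough illustration of how the two JSONL files could be paired for a randomized comparison, here is a minimal sketch. It is not the code in `app.py`: the function names `parse_jsonl`, `build_pairs`, and `summarize` are hypothetical, the file contents are assumed to be already downloaded as text, and because the shape of each `inference_results` entry is not specified here, the sketch simply passes the first entry through unchanged.

```python
import json
import random


def parse_jsonl(text):
    """Parse JSONL content into a dict keyed by document `id`."""
    docs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        doc = json.loads(line)
        docs[doc["id"]] = doc
    return docs


def build_pairs(few_shots, no_few_shots, rng=None):
    """Pair documents present in both configurations, with random left/right order."""
    rng = rng or random.Random()
    pairs = []
    for doc_id in sorted(set(few_shots) & set(no_few_shots), key=str):
        a = few_shots[doc_id]["inference_results"]
        b = no_few_shots[doc_id]["inference_results"]
        if not a or not b:
            continue  # skip documents with no translation result
        # Assumption: the first entry of `inference_results` is the one to show.
        candidates = [("few-shots", a[0]), ("no-few-shots", b[0])]
        rng.shuffle(candidates)  # hide which configuration is on which side
        pairs.append({
            "id": doc_id,
            "text": few_shots[doc_id]["text"],
            "left": candidates[0],
            "right": candidates[1],
        })
    return pairs


def summarize(votes):
    """Map each configuration name to (count, percentage of all votes)."""
    total = sum(votes.values())
    return {
        name: (count, 100.0 * count / total if total else 0.0)
        for name, count in votes.items()
    }
```

Keeping the configuration name alongside each candidate lets a vote for "left" or "right" be attributed back to "few-shots" or "no-few-shots" when the results are tallied.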