---
title: English Accent Detector
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: "1.28.0"
app_file: app.py
pinned: false
---

# English Accent Detection Tool

A practical AI tool that analyzes English accents from video content. Built for REM Waste's hiring automation system.

## 🚀 Live Demo

**Deployed App:** [https://accent-detector.streamlit.app](https://accent-detector.streamlit.app)

## Features

- **Video Processing**: Accepts public video URLs (MP4, Loom, etc.)
- **Audio Extraction**: Automatically extracts audio from video files
- **Speech Transcription**: Converts speech to text using Google Speech Recognition
- **Accent Analysis**: Detects English accents with confidence scoring
- **Web Interface**: Simple Streamlit UI for easy testing

## Supported Accents

- American English
- British English
- Australian English
- Canadian English
- South African English

## Quick Start

### Method 1: Use the Deployed App (Recommended)

1. Visit [https://accent-detector.streamlit.app](https://accent-detector.streamlit.app).
2. Paste a public video URL.
3. Click **Analyze Accent**.
4. View the results with confidence scores.

### Method 2: Local Installation

```bash
# Clone the repository (or download the script)
git clone <repository-url>
cd accent-detector

# Install Python dependencies
pip install -r requirements.txt

# Install ffmpeg (required for video processing)
# On macOS:
brew install ffmpeg
# On Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg
# On Windows:
# Download from https://ffmpeg.org/download.html

# Run the app
streamlit run accent_detector.py
```

## Installation

1. Clone this repository and navigate to the project folder.
2. (Recommended) Create and activate a Python virtual environment:

   ```sh
   python3 -m venv ad_venv
   source ad_venv/bin/activate
   ```

3. Install all dependencies:

   ```sh
   pip install -r requirements.txt
   ```

4. (Optional, but recommended for better performance) Install Watchdog:

   ```sh
   xcode-select --install  # macOS only, for build tools
   pip install watchdog
   ```

## Usage Examples

### Test URLs

```
# Direct MP4 link
https://sample-videos.com/zip/10/mp4/SampleVideo_1280x720_1mb.mp4

# Loom video (public)
https://www.loom.com/share/your-video-id

# Google Drive (public)
https://drive.google.com/file/d/your-file-id/view
```

### Expected Output

```json
{
  "accent": "American",
  "confidence": 78.5,
  "explanation": "High confidence in American accent with strong linguistic indicators.",
  "all_scores": {
    "American": 78.5,
    "British": 23.1,
    "Australian": 15.7,
    "Canadian": 19.2,
    "South African": 8.3
  }
}
```

## Technical Architecture

### Core Components

1. **Video Downloader**: Downloads videos from public URLs
2. **Audio Extractor**: Uses ffmpeg to extract WAV audio
3. **Speech Recognizer**: Transcribes the audio with the Google Speech Recognition API
4. **Accent Analyzer**: Matches linguistic markers in the transcript
5. **Web Interface**: Streamlit-based UI

### Accent Detection Algorithm

The system analyzes multiple linguistic features in the transcribed text:

- **Vocabulary Patterns**: Accent-specific word choices
- **Phonetic Markers**: Pronunciation characteristics
- **Spelling Patterns**: Regional spelling differences
- **Linguistic Markers**: Characteristic phrases and expressions

### Confidence Scoring

- **0-20%**: Insufficient markers detected
- **21-50%**: Moderate confidence with limited indicators
- **51-75%**: Good confidence with multiple patterns
- **76-100%**: High confidence with strong linguistic evidence

## API Integration

For programmatic access, use the core `AccentDetector` class:

```python
from accent_detector import AccentDetector

detector = AccentDetector()
result = detector.process_video("https://your-video-url.com/video.mp4")

print(f"Accent: {result['accent']}")
print(f"Confidence: {result['confidence']}%")
```

## Deployment

### Streamlit Cloud (Recommended)

1. Fork this repository.
2. Connect it to Streamlit Cloud.
3. Deploy from your GitHub repo.
4. Share the public URL.

### Docker Deployment

```dockerfile
FROM python:3.9-slim

# Install system dependencies (ffmpeg is needed for audio extraction)
RUN apt-get update \
    && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8501

CMD ["streamlit", "run", "accent_detector.py", "--server.port=8501", "--server.address=0.0.0.0"]
```

## Limitations & Considerations

### Current Limitations

- Requires clear speech audio (background noise reduces accuracy)
- Works best with 30+ seconds of speech
- The free Google Speech Recognition API has daily limits
- Accent detection is based on vocabulary and text patterns, not phonetic analysis

### Potential Improvements

- Integrate phonetic analysis libraries
- Add more accent varieties (Indian, Irish, etc.)
- Implement batch processing for multiple videos
- Add voice activity detection for better audio segmentation

## Testing

### Manual Testing

1. Test with different accent samples.
2. Verify that confidence scores are reasonable.
3. Check error handling with invalid URLs.
4. Test with various video formats.

### Automated Testing

```python
from accent_detector import AccentDetector


def test_accent_detection():
    detector = AccentDetector()

    # American vocabulary ("gonna", "cookies", "elevator") should favor American
    american_text = "I'm gonna grab some cookies from the elevator"
    scores = detector.analyze_accent_patterns(american_text)
    assert scores['American'] > scores['British']

    # British vocabulary ("brilliant", "quite lovely") should favor British
    british_text = "That's brilliant, quite lovely indeed"
    scores = detector.analyze_accent_patterns(british_text)
    assert scores['British'] > scores['American']
```

## Performance Metrics

- **Video Download**: ~10-30 seconds (depends on file size)
- **Audio Extraction**: ~5-15 seconds
- **Speech Recognition**: ~10-30 seconds
- **Accent Analysis**: <1 second
- **Total Processing**: ~30-90 seconds per video

## Troubleshooting

### Common Issues

**Error: "Could not understand the audio"**

- Solution: Ensure clear speech with minimal background noise.

**Error: "Failed to download video"**

- Solution: Verify that the URL is public and accessible.

**Error: "ffmpeg not found"**

- Solution: Install the ffmpeg system dependency.

**Low confidence scores**

- Solution: Use longer speech samples (30+ seconds).

### Support

For technical issues or feature requests:

1. Check the error messages in the Streamlit interface.
2. Verify that all dependencies are installed correctly.
3. Test with known working video URLs.

## License

MIT License - Free for commercial and personal use.

---

**Built for REM Waste Interview Challenge**
*Practical AI tools for automated hiring decisions*
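The sketch below gives a concrete picture of the marker-based approach from *Accent Detection Algorithm* and the bands from *Confidence Scoring*. It is a minimal, self-contained illustration and not the `AccentDetector` implementation. The marker lists, the function names `score_transcript`, `describe_confidence` and `detect_accent`, and the "percentage of markers found" scoring rule are all hypothetical assumptions.

```python
import re

# HYPOTHETICAL marker lists, for illustration only; they are not the
# vocabulary used by the real AccentDetector.
ACCENT_MARKERS = {
    "American": ["gonna", "elevator", "cookies", "awesome", "color"],
    "British": ["brilliant", "lovely", "quite", "lift", "colour"],
    "Australian": ["mate", "arvo", "reckon", "heaps"],
    "Canadian": ["eh", "toque", "loonie", "washroom"],
    "South African": ["braai", "lekker", "howzit", "bakkie"],
}

# (inclusive upper bound, label) pairs mirroring the Confidence Scoring bands.
CONFIDENCE_BANDS = [
    (20, "Insufficient markers detected"),
    (50, "Moderate confidence with limited indicators"),
    (75, "Good confidence with multiple patterns"),
    (100, "High confidence with strong linguistic evidence"),
]


def score_transcript(text: str) -> dict:
    """Score each accent as the percentage of its markers present in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return {
        accent: round(100 * sum(m in words for m in markers) / len(markers), 1)
        for accent, markers in ACCENT_MARKERS.items()
    }


def describe_confidence(score: float) -> str:
    """Map a 0-100 score to the first band whose upper bound covers it."""
    for upper, label in CONFIDENCE_BANDS:
        if score <= upper:
            return label
    return CONFIDENCE_BANDS[-1][1]


def detect_accent(text: str) -> dict:
    """Return a dict shaped like the Expected Output JSON."""
    scores = score_transcript(text)
    accent = max(scores, key=scores.get)  # ties resolve to the first accent listed
    confidence = scores[accent]
    return {
        "accent": accent,
        "confidence": confidence,
        "explanation": describe_confidence(confidence),
        "all_scores": scores,
    }


if __name__ == "__main__":
    print(detect_accent("I'm gonna grab some cookies from the elevator"))
```

Each accent is scored independently, so the values in `all_scores` need not sum to 100. The Expected Output example shows the same property.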
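The *Audio Extractor* component uses ffmpeg to produce WAV audio, and it might look roughly like the sketch below. The function names `build_ffmpeg_command` and `extract_audio` are hypothetical. The 16 kHz mono 16-bit PCM output format is an assumption, chosen because it is a common input format for speech recognition; this README specifies only "WAV audio".

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path: str, wav_path: str) -> list:
    """Build an ffmpeg command that extracts the audio track as a WAV file.

    ASSUMPTION: 16 kHz, mono, 16-bit PCM; the README only says "WAV audio".
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file without prompting
        "-i", video_path,        # input video
        "-vn",                   # ignore the video stream
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM (standard WAV)
        "-ar", "16000",          # resample to 16 kHz
        "-ac", "1",              # downmix to mono
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> Path:
    """Run ffmpeg and return the path of the extracted WAV file."""
    if shutil.which("ffmpeg") is None:
        # Corresponds to the "ffmpeg not found" case in Troubleshooting.
        raise RuntimeError("ffmpeg not found: install the ffmpeg system dependency")
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status.
    subprocess.run(build_ffmpeg_command(video_path, wav_path),
                   check=True, capture_output=True)
    return Path(wav_path)
```

For example, `extract_audio("video.mp4", "audio.wav")` writes `audio.wav` next to the input. The command is built separately from the call that runs it, so the command can be checked in tests without ffmpeg installed.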