metadata

title: English Accent Detector
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false

English Accent Detection Tool

A practical AI tool that estimates a speaker's English accent from video content by transcribing the speech and matching accent-specific language patterns. Built for REM Waste's hiring automation system.

🚀 Live Demo

Deployed App: https://accent-detector.streamlit.app

Features

Video Processing: Accepts public video URLs (MP4, Loom, etc.)
Audio Extraction: Automatically extracts audio from video files
Speech Transcription: Converts speech to text using Google Speech Recognition
Accent Analysis: Detects English accents with confidence scoring
Web Interface: Simple Streamlit UI for easy testing

Supported Accents

American English
British English
Australian English
Canadian English
South African English

Quick Start

Method 1: Use the Deployed App (Recommended)

Visit: https://accent-detector.streamlit.app
Paste a public video URL
Click "Analyze Accent"
View results with confidence scores

Method 2: Local Installation

# Clone or download the script
git clone <repository-url>
cd accent-detector

# Install dependencies
pip install -r requirements.txt

# Install ffmpeg (required for video processing)
# On macOS:
brew install ffmpeg

# On Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg

# On Windows:
# Download from https://ffmpeg.org/download.html

# Run the app
streamlit run accent_detector.py

Installation

Clone this repository and navigate to the project folder.
(Recommended) Create and activate a Python virtual environment:
```
python3 -m venv ad_venv
source ad_venv/bin/activate  # on Windows: ad_venv\Scripts\activate
```
Install all dependencies:
```
pip install -r requirements.txt
```

(Optional, but recommended for better performance) Install Watchdog:

xcode-select --install  # macOS only, for build tools
pip install watchdog

Usage Examples

Test URLs

# Direct MP4 link
https://sample-videos.com/zip/10/mp4/SampleVideo_1280x720_1mb.mp4

# Loom video (public)
https://www.loom.com/share/your-video-id

# Google Drive (public)
https://drive.google.com/file/d/your-file-id/view
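
Before a URL is submitted, it can help to check which of the three kinds above it is. The following is a minimal sketch using only the standard library; `classify_video_url` and its labels are hypothetical and not part of the tool, and the host checks are assumptions based on the example URLs.

```python
from urllib.parse import urlparse


def classify_video_url(url: str) -> str:
    """Return a rough source label for a video URL (hypothetical helper)."""
    parsed = urlparse(url)
    # Reject anything that is not an absolute http(s) URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return "invalid"
    host = parsed.netloc.lower()
    if host == "loom.com" or host.endswith(".loom.com"):
        return "loom"
    if host == "drive.google.com":
        return "google_drive"
    if parsed.path.lower().endswith(".mp4"):
        return "direct_mp4"
    return "unknown"


if __name__ == "__main__":
    for candidate in (
        "https://sample-videos.com/zip/10/mp4/SampleVideo_1280x720_1mb.mp4",
        "https://www.loom.com/share/your-video-id",
        "https://drive.google.com/file/d/your-file-id/view",
    ):
        print(candidate, "->", classify_video_url(candidate))
```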

Expected Output

{
  "accent": "American",
  "confidence": 78.5,
  "explanation": "High confidence in American accent with strong linguistic indicators.",
  "all_scores": {
    "American": 78.5,
    "British": 23.1,
    "Australian": 15.7,
    "Canadian": 19.2,
    "South African": 8.3
  }
}
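
The all_scores map can be ranked to see how far the top accent is ahead of the runner-up, which is often more informative than the top score alone. This is a minimal sketch that assumes a result dictionary shaped like the example above; `top_two_margin` is a hypothetical helper, not part of the tool.

```python
from typing import Dict, Tuple


def top_two_margin(result: Dict) -> Tuple[str, str, float]:
    """Return (best accent, runner-up, score gap) from a result dictionary."""
    ranked = sorted(
        result["all_scores"].items(), key=lambda item: item[1], reverse=True
    )
    (best, best_score), (runner_up, runner_score) = ranked[0], ranked[1]
    # Round to one decimal place to match the precision of the scores.
    return best, runner_up, round(best_score - runner_score, 1)
```

A small gap suggests the transcript contained few distinguishing markers, even when the top score looks high.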

Technical Architecture

Core Components

Video Downloader: Downloads videos from public URLs
Audio Extractor: Uses ffmpeg to extract WAV audio
Speech Recognizer: Google Speech Recognition API
Accent Analyzer: Pattern matching for linguistic markers
Web Interface: Streamlit-based UI
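
The audio extraction step can be sketched as a call to the ffmpeg command-line tool. This is an illustration, not the tool's actual code: the function names are hypothetical, and the 16 kHz mono 16-bit PCM output is an assumption chosen because it is a common input format for speech recognition.

```python
import subprocess
from typing import List


def build_ffmpeg_command(
    video_path: str, wav_path: str, sample_rate: int = 16000
) -> List[str]:
    """Build an ffmpeg command that writes the audio track as mono WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM, i.e. plain WAV audio
        "-ar", str(sample_rate), # sample rate in Hz (assumed value)
        "-ac", "1",              # mono
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> str:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(
        build_ffmpeg_command(video_path, wav_path),
        check=True,
        capture_output=True,
    )
    return wav_path
```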

Accent Detection Algorithm

The system analyzes multiple linguistic features:

Vocabulary Patterns: Accent-specific word choices
Phonetic Markers: Pronunciation characteristics that surface in the transcript (for example, informal contractions such as "gonna")
Spelling Patterns: Regional spelling differences
Linguistic Markers: Characteristic phrases and expressions
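
Vocabulary-based matching of this kind can be sketched as counting accent-specific words in the transcript. The marker lists below are hypothetical and deliberately tiny, and `count_marker_hits` is an illustrative name; the tool's own lists and weighting are not shown in this README.

```python
import re
from typing import Dict, Set

# Hypothetical marker vocabularies, for illustration only.
MARKERS: Dict[str, Set[str]] = {
    "American": {"gonna", "cookies", "elevator", "apartment"},
    "British": {"brilliant", "lovely", "quite", "lift", "biscuits"},
}


def count_marker_hits(
    text: str, markers: Dict[str, Set[str]] = MARKERS
) -> Dict[str, int]:
    """Count how many transcript words match each accent's marker set."""
    # Lowercase and split into words, keeping apostrophes (e.g. "i'm").
    words = re.findall(r"[a-z']+", text.lower())
    return {
        accent: sum(1 for word in words if word in vocabulary)
        for accent, vocabulary in markers.items()
    }


if __name__ == "__main__":
    print(count_marker_hits("I'm gonna grab some cookies from the elevator"))
```

Raw counts like these would still need to be normalized before they could be reported as percentages.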

Confidence Scoring

0-20%: Insufficient markers detected
21-50%: Moderate confidence with limited indicators
51-75%: Good confidence with multiple patterns
76-100%: High confidence with strong linguistic evidence
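
The bands above can be expressed as a small lookup. Because scores such as 78.5 are fractional, a value like 20.5 falls between the listed ranges; this sketch assumes each upper bound is inclusive and anything above it moves to the next band. `confidence_band` is a hypothetical helper.

```python
def confidence_band(confidence: float) -> str:
    """Map a 0-100 confidence score to its descriptive band."""
    if not 0 <= confidence <= 100:
        raise ValueError(f"confidence must be within 0-100, got {confidence}")
    if confidence <= 20:
        return "Insufficient markers detected"
    if confidence <= 50:
        return "Moderate confidence with limited indicators"
    if confidence <= 75:
        return "Good confidence with multiple patterns"
    return "High confidence with strong linguistic evidence"
```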

API Integration

For programmatic access, use the core AccentDetector class:

from accent_detector import AccentDetector

detector = AccentDetector()
result = detector.process_video("https://your-video-url.com/video.mp4")

print(f"Accent: {result['accent']}")
print(f"Confidence: {result['confidence']}%")
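
When several videos are analyzed in one run, a thin wrapper can keep one failed URL from stopping the rest. This sketch assumes only that the detector object exposes `process_video(url)` as shown above and that failures surface as exceptions, which this README does not confirm; `analyze_many` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List


def analyze_many(detector: Any, urls: Iterable[str]) -> List[Dict[str, Any]]:
    """Run detector.process_video on each URL, recording failures per URL."""
    results: List[Dict[str, Any]] = []
    for url in urls:
        try:
            result = detector.process_video(url)
            results.append({"url": url, "ok": True, "result": result})
        except Exception as exc:  # assumption: errors are raised, not returned
            results.append({"url": url, "ok": False, "error": str(exc)})
    return results
```

Usage would look like `analyze_many(AccentDetector(), list_of_urls)`, with each entry reporting either a result or an error message.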

Deployment

Streamlit Cloud (Recommended)

Fork this repository
Connect to Streamlit Cloud
Deploy from your GitHub repo
Share the public URL

Docker Deployment

FROM python:3.9-slim

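# Install system dependencies (ffmpeg is required for audio extraction)
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt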

COPY . .
EXPOSE 8501

CMD ["streamlit", "run", "accent_detector.py", "--server.port=8501", "--server.address=0.0.0.0"]

Limitations & Considerations

Current Limitations

Requires clear speech audio (background noise reduces transcription accuracy)
Works best with 30+ seconds of speech
The free Google Speech Recognition service has daily usage limits
Accent detection is based on vocabulary and phrasing patterns in the transcript, not on acoustic or phonetic analysis of the audio
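
The 30-second guideline can be checked on the extracted audio before any recognition request is made. This is a minimal sketch using the standard-library `wave` module, assuming the extracted audio is an uncompressed PCM WAV file; the function names are hypothetical.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / float(wav_file.getframerate())


def has_enough_speech(path: str, minimum_seconds: float = 30.0) -> bool:
    """Check total audio length only; silence is not detected or excluded."""
    return wav_duration_seconds(path) >= minimum_seconds
```

This measures file length rather than actual speech, so a long clip that is mostly silent would still pass.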

Potential Improvements

Integrate phonetic analysis libraries
Add more accent varieties (Indian, Irish, etc.)
Implement batch processing for multiple videos
Add voice activity detection for better audio segmentation

Testing

Manual Testing

Test with different accent samples
Verify confidence scores are reasonable
Check error handling with invalid URLs
Test with various video formats

Automated Testing

from accent_detector import AccentDetector


def test_accent_detection():
    detector = AccentDetector()

    # American markers should outscore British ones
    american_text = "I'm gonna grab some cookies from the elevator"
    scores = detector.analyze_accent_patterns(american_text)
    assert scores['American'] > scores['British']

    # British markers should outscore American ones
    british_text = "That's brilliant, quite lovely indeed"
    scores = detector.analyze_accent_patterns(british_text)
    assert scores['British'] > scores['American']

Performance Metrics

Video Download: ~10-30 seconds (depends on file size)
Audio Extraction: ~5-15 seconds
Speech Recognition: ~10-30 seconds
Accent Analysis: <1 second
Total Processing: ~30-90 seconds per video
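
Per-stage timings like these can be collected with a small context manager. This is a generic sketch, not the tool's instrumentation; `timed` and the stage names in the usage example are hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(stage: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock seconds spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Store the elapsed time even if the stage raised an exception.
        timings[stage] = time.perf_counter() - start


if __name__ == "__main__":
    timings: Dict[str, float] = {}
    with timed("accent_analysis", timings):
        sum(range(100_000))  # stand-in for real work
    for stage, seconds in timings.items():
        print(f"{stage}: {seconds:.3f}s")
```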

Troubleshooting

Common Issues

Error: "Could not understand the audio"

Solution: Ensure clear speech, minimal background noise

Error: "Failed to download video"

Solution: Verify URL is public and accessible

Error: "ffmpeg not found"

Solution: Install ffmpeg (see Local Installation) and make sure it is on your PATH

Low confidence scores

Solution: Ensure longer speech samples (30+ seconds)
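
For the "ffmpeg not found" case, a quick check from Python shows whether the executable is visible on the PATH. This is a minimal sketch using the standard library; `tool_available` is a hypothetical helper.

```python
import shutil


def tool_available(name: str = "ffmpeg") -> bool:
    """Return True if an executable with this name is found on the PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if tool_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found: install it and restart your shell")
```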

Support

For technical issues or feature requests:

Check the error messages in the Streamlit interface
Verify all dependencies are installed correctly
Test with known working video URLs

License

MIT License - Free for commercial and personal use.

Built for REM Waste Interview Challenge
Practical AI tools for automated hiring decisions