metadata
title: English Accent Detector
emoji: π€
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
English Accent Detection Tool
A practical AI tool that analyzes English accents from video content. Built for REM Waste's hiring automation system.
Live Demo
Deployed App: https://accent-detector.streamlit.app
Features
- Video Processing: Accepts public video URLs (MP4, Loom, etc.)
- Audio Extraction: Automatically extracts audio from video files
- Speech Transcription: Converts speech to text using Google Speech Recognition
- Accent Analysis: Detects English accents with confidence scoring
- Web Interface: Simple Streamlit UI for easy testing
Supported Accents
- American English
- British English
- Australian English
- Canadian English
- South African English
Quick Start
Method 1: Use the Deployed App (Recommended)
- Visit: https://accent-detector.streamlit.app
- Paste a public video URL
- Click "Analyze Accent"
- View results with confidence scores
Method 2: Local Installation
# Clone or download the script
git clone <repository-url>
cd accent-detector
# Install dependencies
pip install -r requirements.txt
# Install ffmpeg (required for video processing)
# On macOS:
brew install ffmpeg
# On Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg
# On Windows:
# Download from https://ffmpeg.org/download.html
# Run the app
streamlit run accent_detector.py
Installation
- Clone this repository and navigate to the project folder.
- (Recommended) Create and activate a Python virtual environment:
python3 -m venv ad_venv
source ad_venv/bin/activate
- Install all dependencies:
pip install -r requirements.txt
- (Optional, but recommended for better performance) Install Watchdog:
xcode-select --install  # macOS only, for build tools
pip install watchdog
Usage Examples
Test URLs
# Direct MP4 link
https://sample-videos.com/zip/10/mp4/SampleVideo_1280x720_1mb.mp4
# Loom video (public)
https://www.loom.com/share/your-video-id
# Google Drive (public)
https://drive.google.com/file/d/your-file-id/view
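Before a URL is handed to the downloader, it can be useful to check that it is well formed and to see which kind of source it points to. The following is a minimal sketch using only the standard library; `classify_url` and its return labels are hypothetical and are not part of this project.

```python
from urllib.parse import urlparse


def classify_url(url):
    """Return a coarse label for a video URL (hypothetical helper)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Reject anything that is not an absolute http(s) URL.
    if parsed.scheme not in ("http", "https") or not host:
        return "invalid"
    if host == "loom.com" or host.endswith(".loom.com"):
        return "loom"
    if host == "drive.google.com":
        return "google_drive"
    if parsed.path.lower().endswith(".mp4"):
        return "direct_mp4"
    return "other"


print(classify_url("https://www.loom.com/share/your-video-id"))
```

A check like this only inspects the URL's shape; it cannot confirm that the video is actually public.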
Expected Output
{
"accent": "American",
"confidence": 78.5,
"explanation": "High confidence in American accent with strong linguistic indicators.",
"all_scores": {
"American": 78.5,
"British": 23.1,
"Australian": 15.7,
"Canadian": 19.2,
"South African": 8.3
}
}
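A result shaped like the sample above can be consumed as a plain dictionary. The sketch below assumes the field names shown in the sample output and ranks the accents by score; `rank_accents` is a hypothetical helper, not part of the tool.

```python
import json

# The sample output from this README, used as test data.
SAMPLE = """
{
  "accent": "American",
  "confidence": 78.5,
  "explanation": "High confidence in American accent with strong linguistic indicators.",
  "all_scores": {
    "American": 78.5,
    "British": 23.1,
    "Australian": 15.7,
    "Canadian": 19.2,
    "South African": 8.3
  }
}
"""


def rank_accents(result):
    """Return (accent, score) pairs sorted from highest to lowest score."""
    return sorted(result["all_scores"].items(), key=lambda kv: kv[1], reverse=True)


result = json.loads(SAMPLE)
ranking = rank_accents(result)
# Gap between the best and second-best accent, rounded to one decimal place.
margin = round(ranking[0][1] - ranking[1][1], 1)
print(f"Top: {ranking[0][0]}, runner-up: {ranking[1][0]}, margin: {margin}")
```

Note that the sample scores add up to 144.8 rather than 100, so they read as independent per-accent scores, not as a probability distribution.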
Technical Architecture
Core Components
- Video Downloader: Downloads videos from public URLs
- Audio Extractor: Uses ffmpeg to extract WAV audio
- Speech Recognizer: Google Speech Recognition API
- Accent Analyzer: Pattern matching for linguistic markers
- Web Interface: Streamlit-based UI
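The audio extraction step could be sketched as follows, assuming ffmpeg is available on the PATH. The 16 kHz mono 16-bit PCM settings are an assumption (a common choice for speech recognition input), not values taken from this project, and `build_ffmpeg_command` and `extract_audio` are hypothetical names.

```python
import subprocess


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg argument list that extracts mono WAV audio."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM, i.e. plain WAV
        "-ar", str(sample_rate), # output sample rate in Hz
        "-ac", "1",              # downmix to a single channel
        wav_path,
    ]


def extract_audio(video_path, wav_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(
        build_ffmpeg_command(video_path, wav_path),
        check=True,
        capture_output=True,
    )
    return wav_path


print(" ".join(build_ffmpeg_command("input.mp4", "audio.wav")))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.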
Accent Detection Algorithm
The system analyzes multiple linguistic features:
- Vocabulary Patterns: Accent-specific word choices
- Phonetic Markers: Pronunciation characteristics
- Spelling Patterns: Regional spelling differences
- Linguistic Markers: Characteristic phrases and expressions
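As an illustration of the vocabulary-pattern idea, the sketch below counts how many words of a transcript appear in a per-accent marker list. The marker words and the `count_markers` function are hypothetical examples chosen for this sketch; the tool's actual marker lists and the way it converts matches into percentages are not documented here.

```python
import re

# Hypothetical marker vocabularies, for illustration only.
MARKERS = {
    "American": {"gonna", "cookies", "elevator", "sidewalk", "apartment"},
    "British": {"brilliant", "lovely", "lift", "flat", "queue"},
    "Australian": {"mate", "arvo", "reckon"},
}


def count_markers(text, markers=MARKERS):
    """Count transcript words that match each accent's marker vocabulary."""
    # Lowercase and split into word tokens, keeping apostrophes ("i'm").
    words = re.findall(r"[a-z']+", text.lower())
    return {
        accent: sum(1 for word in words if word in vocab)
        for accent, vocab in markers.items()
    }


print(count_markers("I'm gonna grab some cookies from the elevator"))
```

Raw counts like these favor longer transcripts, which is one reason short samples tend to produce weak evidence.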
Confidence Scoring
- 0-20%: Insufficient markers detected
- 21-50%: Moderate confidence with limited indicators
- 51-75%: Good confidence with multiple patterns
- 76-100%: High confidence with strong linguistic evidence
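The bands above can be expressed as a small lookup. The listed ranges are written with whole numbers, so this sketch assumes each upper bound is inclusive and assigns a fractional score such as 20.5 to the next band up; `confidence_band` is a hypothetical helper name.

```python
def confidence_band(score):
    """Map a 0-100 confidence score to the band description in this README."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score <= 20:
        return "Insufficient markers detected"
    if score <= 50:
        return "Moderate confidence with limited indicators"
    if score <= 75:
        return "Good confidence with multiple patterns"
    return "High confidence with strong linguistic evidence"


print(confidence_band(78.5))
```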
API Integration
For programmatic access, use the core AccentDetector class:
from accent_detector import AccentDetector
detector = AccentDetector()
result = detector.process_video("https://your-video-url.com/video.mp4")
print(f"Accent: {result['accent']}")
print(f"Confidence: {result['confidence']}%")
Deployment
Streamlit Cloud (Recommended)
- Fork this repository
- Connect to Streamlit Cloud
- Deploy from your GitHub repo
- Share the public URL
Docker Deployment
FROM python:3.9-slim
# Install system dependencies
RUN apt-get update && apt-get install -y ffmpeg
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "accent_detector.py", "--server.port=8501", "--server.address=0.0.0.0"]
Limitations & Considerations
Current Limitations
- Requires clear speech audio (background noise affects accuracy)
- Works best with 30+ seconds of speech
- Free Google Speech Recognition has daily limits
- Accent detection is based on vocabulary and phrase patterns in the transcript, not on phonetic analysis of the audio
Potential Improvements
- Integrate phonetic analysis libraries
- Add more accent varieties (Indian, Irish, etc.)
- Implement batch processing for multiple videos
- Add voice activity detection for better audio segmentation
Testing
Manual Testing
- Test with different accent samples
- Verify confidence scores are reasonable
- Check error handling with invalid URLs
- Test with various video formats
Automated Testing
from accent_detector import AccentDetector


def test_accent_detection():
    detector = AccentDetector()

    # American markers should outscore British ones
    american_text = "I'm gonna grab some cookies from the elevator"
    scores = detector.analyze_accent_patterns(american_text)
    assert scores['American'] > scores['British']

    # British markers should outscore American ones
    british_text = "That's brilliant, quite lovely indeed"
    scores = detector.analyze_accent_patterns(british_text)
    assert scores['British'] > scores['American']
Performance Metrics
- Video Download: ~10-30 seconds (depends on file size)
- Audio Extraction: ~5-15 seconds
- Speech Recognition: ~10-30 seconds
- Accent Analysis: <1 second
- Total Processing: ~30-90 seconds per video
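To compare these figures with a particular deployment, each stage can be timed separately. The sketch below uses a small context manager built on `time.perf_counter`; the `timed` helper and the stage name are hypothetical, and the loop inside the `with` block is only a stand-in for a real pipeline stage.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(stage, store=timings):
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the stage raises, so failures are still measured.
        store[stage] = time.perf_counter() - start


with timed("accent_analysis"):
    sum(range(100_000))  # stand-in workload, not the real analysis

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.3f}s")
```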
Troubleshooting
Common Issues
Error: "Could not understand the audio"
- Solution: Ensure clear speech, minimal background noise
Error: "Failed to download video"
- Solution: Verify URL is public and accessible
Error: "ffmpeg not found"
- Solution: Install ffmpeg system dependency
Low confidence scores
- Solution: Ensure longer speech samples (30+ seconds)
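For the "ffmpeg not found" case, a quick way to confirm whether ffmpeg is visible to Python is to look it up on the PATH with the standard library. This is a minimal diagnostic sketch; `find_ffmpeg` is a hypothetical helper name.

```python
import shutil


def find_ffmpeg(name="ffmpeg"):
    """Return the full path to the executable, or None if it is not on PATH."""
    return shutil.which(name)


path = find_ffmpeg()
if path is None:
    print("ffmpeg not found on PATH; install it and restart the app")
else:
    print(f"ffmpeg found at {path}")
```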
Support
For technical issues or feature requests:
- Check the error messages in the Streamlit interface
- Verify all dependencies are installed correctly
- Test with known working video URLs
License
MIT License - Free for commercial and personal use.
Built for REM Waste Interview Challenge
Practical AI tools for automated hiring decisions