Aziz3 committed on
Commit
9154e2d
·
1 Parent(s): 21615fc

adding config

Files changed (2)
  1. Info.md +249 -0
  2. README.md +11 -0
Info.md ADDED
@@ -0,0 +1,249 @@
+ # English Accent Detection Tool
+
+ A practical AI tool that analyzes English accents from video content. Built for REM Waste's hiring automation system.
+
+ ## 🚀 Live Demo
+
+ **Deployed App:** [https://accent-detector.streamlit.app](https://accent-detector.streamlit.app)
+
+ ## Features
+
+ - **Video Processing**: Accepts public video URLs (MP4, Loom, etc.)
+ - **Audio Extraction**: Automatically extracts audio from video files
+ - **Speech Transcription**: Converts speech to text using Google Speech Recognition
+ - **Accent Analysis**: Detects English accents with confidence scoring
+ - **Web Interface**: Simple Streamlit UI for easy testing
+
+ ## Supported Accents
+
+ - American English
+ - British English
+ - Australian English
+ - Canadian English
+ - South African English
+
+ ## Quick Start
+
+ ### Method 1: Use the Deployed App (Recommended)
+
+ 1. Visit: [https://accent-detector.streamlit.app](https://accent-detector.streamlit.app)
+ 2. Paste a public video URL
+ 3. Click "Analyze Accent"
+ 4. View results with confidence scores
+
+ ### Method 2: Local Installation
+
+ ```bash
+ # Clone or download the script
+ git clone <repository-url>
+ cd accent-detector
+
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Install ffmpeg (required for video processing)
+ # On macOS:
+ brew install ffmpeg
+
+ # On Ubuntu/Debian:
+ sudo apt update && sudo apt install ffmpeg
+
+ # On Windows:
+ # Download from https://ffmpeg.org/download.html
+
+ # Run the app
+ streamlit run accent_detector.py
+ ```
+
+ ## Installation
+
+ 1. Clone this repository and navigate to the project folder.
+ 2. (Recommended) Create and activate a Python virtual environment:
+    ```sh
+    python3 -m venv ad_venv
+    source ad_venv/bin/activate
+    ```
+ 3. Install all dependencies:
+    ```sh
+    pip install -r requirements.txt
+    ```
+ 4. (Optional, but recommended for better performance) Install Watchdog:
+    ```sh
+    xcode-select --install # macOS only, for build tools
+    pip install watchdog
+    ```
+
+ ## Usage Examples
+
+ ### Test URLs
+ ```
+ # Direct MP4 link
+ https://sample-videos.com/zip/10/mp4/SampleVideo_1280x720_1mb.mp4
+
+ # Loom video (public)
+ https://www.loom.com/share/your-video-id
+
+ # Google Drive (public)
+ https://drive.google.com/file/d/your-file-id/view
+ ```
+
+ ### Expected Output
+ ```json
+ {
+   "accent": "American",
+   "confidence": 78.5,
+   "explanation": "High confidence in American accent with strong linguistic indicators.",
+   "all_scores": {
+     "American": 78.5,
+     "British": 23.1,
+     "Australian": 15.7,
+     "Canadian": 19.2,
+     "South African": 8.3
+   }
+ }
+ ```
+
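The result structure above can be consumed programmatically. The following is a minimal sketch, assuming the result is delivered as JSON with exactly the keys shown; `rank_accents` is a hypothetical helper and not part of the project.

```python
import json

# Sample payload copied from the "Expected Output" block above.
SAMPLE = """
{
  "accent": "American",
  "confidence": 78.5,
  "explanation": "High confidence in American accent with strong linguistic indicators.",
  "all_scores": {
    "American": 78.5,
    "British": 23.1,
    "Australian": 15.7,
    "Canadian": 19.2,
    "South African": 8.3
  }
}
"""


def rank_accents(result: dict) -> list:
    """Return (accent, score) pairs sorted from highest to lowest score.

    Hypothetical helper: assumes `result` has an "all_scores" mapping.
    """
    return sorted(result["all_scores"].items(), key=lambda kv: kv[1], reverse=True)


result = json.loads(SAMPLE)
ranking = rank_accents(result)
for accent, score in ranking:
    print(f"{accent}: {score}")
```

In this sample the scores do not sum to 100, so they appear to be scored independently per accent rather than as a probability distribution.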
+ ## Technical Architecture
+
+ ### Core Components
+
+ 1. **Video Downloader**: Downloads videos from public URLs
+ 2. **Audio Extractor**: Uses ffmpeg to extract WAV audio
+ 3. **Speech Recognizer**: Google Speech Recognition API
+ 4. **Accent Analyzer**: Pattern matching for linguistic markers
+ 5. **Web Interface**: Streamlit-based UI
+
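The audio extraction step listed above could look roughly like the sketch below. The function names, the 16 kHz mono format, and the exact flag set are assumptions for illustration; the commit does not show which ffmpeg options the project actually passes.

```python
import subprocess


def build_ffmpeg_command(video_path: str, wav_path: str, sample_rate: int = 16000) -> list:
    """Build an ffmpeg command that writes the audio track as mono PCM WAV.

    Hypothetical helper: the sample rate and channel count are assumptions.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM, the usual WAV encoding
        "-ar", str(sample_rate), # audio sample rate in Hz
        "-ac", "1",              # downmix to one channel
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run ffmpeg and raise CalledProcessError if extraction fails."""
    subprocess.run(
        build_ffmpeg_command(video_path, wav_path),
        check=True,
        capture_output=True,
    )
```

Keeping command construction separate from execution makes the arguments easy to inspect or test without ffmpeg installed.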
+ ### Accent Detection Algorithm
+
+ The system analyzes multiple linguistic features:
+
+ - **Vocabulary Patterns**: Accent-specific word choices
+ - **Phonetic Markers**: Pronunciation characteristics, as far as they surface in the transcript
+ - **Spelling Patterns**: Regional spelling differences
+ - **Linguistic Markers**: Characteristic phrases and expressions
+
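To illustrate the vocabulary-pattern idea from the list above, here is a minimal sketch that counts marker words in a transcript. The marker sets and the `score_markers` function are hypothetical; the project's real marker lists and weighting are not shown in this commit.

```python
import re

# Hypothetical marker vocabularies, chosen only for illustration.
MARKERS = {
    "American": {"gonna", "cookies", "elevator", "apartment", "sidewalk"},
    "British": {"brilliant", "lovely", "quite", "lift", "colour"},
}


def score_markers(text: str) -> dict:
    """Count how many words of the transcript appear in each marker set."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        accent: sum(1 for word in words if word in vocabulary)
        for accent, vocabulary in MARKERS.items()
    }


american = score_markers("I'm gonna grab some cookies from the elevator")
british = score_markers("That's brilliant, quite lovely indeed")
print(american, british)
```

Raw counts like these would still need to be normalized into the 0-100 confidence range described in this document; that step is omitted here.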
+ ### Confidence Scoring
+
+ - **0-20%**: Insufficient markers detected
+ - **21-50%**: Moderate confidence with limited indicators
+ - **51-75%**: Good confidence with multiple patterns
+ - **76-100%**: High confidence with strong linguistic evidence
+
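The bands above translate directly into a lookup. This is a sketch only: `confidence_band` is a hypothetical name, and assigning fractional values between bands (for example 20.5) to the higher band is an assumption, since the list only gives whole-number boundaries.

```python
def confidence_band(confidence: float) -> str:
    """Map a 0-100 confidence score to the band descriptions listed above.

    Assumption: values strictly between two bands fall into the higher one.
    """
    if not 0 <= confidence <= 100:
        raise ValueError(f"confidence must be within 0-100, got {confidence}")
    if confidence <= 20:
        return "Insufficient markers detected"
    if confidence <= 50:
        return "Moderate confidence with limited indicators"
    if confidence <= 75:
        return "Good confidence with multiple patterns"
    return "High confidence with strong linguistic evidence"


print(confidence_band(78.5))
```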
+ ## API Integration
+
+ For programmatic access, use the core `AccentDetector` class:
+
+ ```python
+ from accent_detector import AccentDetector
+
+ detector = AccentDetector()
+ result = detector.process_video("https://your-video-url.com/video.mp4")
+
+ print(f"Accent: {result['accent']}")
+ print(f"Confidence: {result['confidence']}%")
+ ```
+
+ ## Deployment
+
+ ### Streamlit Cloud (Recommended)
+
+ 1. Fork this repository
+ 2. Connect to Streamlit Cloud
+ 3. Deploy from your GitHub repo
+ 4. Share the public URL
+
+ ### Docker Deployment
+
+ ```dockerfile
+ FROM python:3.9-slim
+
+ # Install system dependencies and drop the apt cache to keep the image small
+ RUN apt-get update \
+     && apt-get install -y --no-install-recommends ffmpeg \
+     && rm -rf /var/lib/apt/lists/*
+
+ WORKDIR /app
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+ EXPOSE 8501
+
+ CMD ["streamlit", "run", "accent_detector.py", "--server.port=8501", "--server.address=0.0.0.0"]
+ ```
+
+ ## Limitations & Considerations
+
+ ### Current Limitations
+ - Requires clear speech audio (background noise affects accuracy)
+ - Works best with 30+ seconds of speech
+ - Free Google Speech Recognition has daily limits
+ - Accent detection is based on vocabulary and text patterns, not phonetic analysis
+
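The 30-second guideline above can be checked before transcription. The sketch below uses only the standard-library `wave` module and assumes the extracted audio is an uncompressed WAV file; `wav_duration_seconds` and `long_enough` are hypothetical helpers, not part of the project.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def long_enough(path: str, minimum_seconds: float = 30.0) -> bool:
    """Check a recording against the suggested minimum speech length.

    Note: this measures total audio length, not the amount of actual speech.
    """
    return wav_duration_seconds(path) >= minimum_seconds
```

A caller might warn the user or skip accent analysis when `long_enough` returns `False`, since short samples tend to yield low confidence scores.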
+ ### Potential Improvements
+ - Integrate phonetic analysis libraries
+ - Add more accent varieties (Indian, Irish, etc.)
+ - Implement batch processing for multiple videos
+ - Add voice activity detection for better audio segmentation
+
+ ## Testing
+
+ ### Manual Testing
+ 1. Test with different accent samples
+ 2. Verify confidence scores are reasonable
+ 3. Check error handling with invalid URLs
+ 4. Test with various video formats
+
+ ### Automated Testing
+ ```python
+ from accent_detector import AccentDetector
+
+
+ def test_accent_detection():
+     detector = AccentDetector()
+
+     # Test American accent
+     american_text = "I'm gonna grab some cookies from the elevator"
+     scores = detector.analyze_accent_patterns(american_text)
+     assert scores['American'] > scores['British']
+
+     # Test British accent
+     british_text = "That's brilliant, quite lovely indeed"
+     scores = detector.analyze_accent_patterns(british_text)
+     assert scores['British'] > scores['American']
+ ```
+
+ ## Performance Metrics
+
+ - **Video Download**: ~10-30 seconds (depends on file size)
+ - **Audio Extraction**: ~5-15 seconds
+ - **Speech Recognition**: ~10-30 seconds
+ - **Accent Analysis**: <1 second
+ - **Total Processing**: ~30-90 seconds per video
+
+ ## Troubleshooting
+
+ ### Common Issues
+
+ **Error: "Could not understand the audio"**
+ - Solution: Ensure clear speech, minimal background noise
+
+ **Error: "Failed to download video"**
+ - Solution: Verify URL is public and accessible
+
+ **Error: "ffmpeg not found"**
+ - Solution: Install ffmpeg system dependency
+
+ **Low confidence scores**
+ - Solution: Ensure longer speech samples (30+ seconds)
+
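For the "ffmpeg not found" case above, a quick way to confirm whether the binary is reachable is to look it up on the `PATH`. This is a minimal sketch using the standard-library `shutil.which`; `tool_available` is a hypothetical helper name.

```python
import shutil


def tool_available(name: str = "ffmpeg") -> bool:
    """Return True if an executable with this name is found on the PATH."""
    return shutil.which(name) is not None


if not tool_available():
    print("ffmpeg was not found on PATH; install it before processing videos.")
```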
+ ### Support
+
+ For technical issues or feature requests:
+ 1. Check the error messages in the Streamlit interface
+ 2. Verify all dependencies are installed correctly
+ 3. Test with known working video URLs
+
+ ## License
+
+ MIT License - Free for commercial and personal use.
+
+ ---
+
+ **Built for REM Waste Interview Challenge**
+ *Practical AI tools for automated hiring decisions*
README.md CHANGED
@@ -1,3 +1,14 @@
+ ---
+ title: English Accent Detector
+ emoji: 🎤
+ colorFrom: blue
+ colorTo: purple
+ sdk: streamlit
+ sdk_version: "1.28.0"
+ app_file: app.py
+ pinned: false
+ ---
+
  # English Accent Detection Tool

  A practical AI tool that analyzes English accents from video content. Built for REM Waste's hiring automation system.