singarajusaiteja committed
Commit 6ad46b7 · verified · 1 parent: 53e2efa

Delete intern_project

Files changed (39)
  1. intern_project/HUGGINGFACE_READY_FILES.md +0 -178
  2. intern_project/corpus_collection_engine/.gitignore +0 -0
  3. intern_project/corpus_collection_engine/.streamlit/config.toml +0 -14
  4. intern_project/corpus_collection_engine/LICENSE +0 -59
  5. intern_project/corpus_collection_engine/README.md +0 -220
  6. intern_project/corpus_collection_engine/REPORT.md +0 -105
  7. intern_project/corpus_collection_engine/activities/__init__.py +0 -1
  8. intern_project/corpus_collection_engine/activities/activity_router.py +0 -427
  9. intern_project/corpus_collection_engine/activities/base_activity.py +0 -225
  10. intern_project/corpus_collection_engine/activities/folklore_collector.py +0 -553
  11. intern_project/corpus_collection_engine/activities/landmark_identifier.py +0 -535
  12. intern_project/corpus_collection_engine/activities/meme_creator.py +0 -331
  13. intern_project/corpus_collection_engine/activities/recipe_exchange.py +0 -505
  14. intern_project/corpus_collection_engine/app.py +0 -17
  15. intern_project/corpus_collection_engine/config.py +0 -71
  16. intern_project/corpus_collection_engine/data/corpus_collection.db +0 -0
  17. intern_project/corpus_collection_engine/main.py +0 -212
  18. intern_project/corpus_collection_engine/models/__init__.py +0 -1
  19. intern_project/corpus_collection_engine/models/data_models.py +0 -149
  20. intern_project/corpus_collection_engine/models/validation.py +0 -223
  21. intern_project/corpus_collection_engine/pwa/offline.html +0 -256
  22. intern_project/corpus_collection_engine/pwa/pwa_manager.py +0 -541
  23. intern_project/corpus_collection_engine/pwa/service_worker.js +0 -335
  24. intern_project/corpus_collection_engine/requirements.txt +0 -6
  25. intern_project/corpus_collection_engine/services/__init__.py +0 -1
  26. intern_project/corpus_collection_engine/services/ai_service.py +0 -417
  27. intern_project/corpus_collection_engine/services/analytics_service.py +0 -766
  28. intern_project/corpus_collection_engine/services/engagement_service.py +0 -665
  29. intern_project/corpus_collection_engine/services/language_service.py +0 -295
  30. intern_project/corpus_collection_engine/services/privacy_service.py +0 -1069
  31. intern_project/corpus_collection_engine/services/storage_service.py +0 -509
  32. intern_project/corpus_collection_engine/services/validation_service.py +0 -618
  33. intern_project/corpus_collection_engine/utils/__init__.py +0 -1
  34. intern_project/corpus_collection_engine/utils/error_handler.py +0 -557
  35. intern_project/corpus_collection_engine/utils/performance_dashboard.py +0 -468
  36. intern_project/corpus_collection_engine/utils/performance_optimizer.py +0 -716
  37. intern_project/corpus_collection_engine/utils/session_manager.py +0 -482
  38. intern_project/data/corpus_collection.db +0 -0
  39. intern_project/main.py +0 -6
intern_project/HUGGINGFACE_READY_FILES.md DELETED
@@ -1,178 +0,0 @@
- # 🚀 Hugging Face Spaces Ready Files
-
- ## ✅ **CLEANED AND READY FOR DEPLOYMENT**
-
- Your project has been cleaned and optimized for Hugging Face Spaces deployment. Here are the files that remain:
-
- ---
-
- ## 📁 **Essential Files Structure**
-
- ```
- corpus_collection_engine/
- ├── app.py                     # ✅ Entry point for Hugging Face
- ├── requirements.txt           # ✅ Dependencies
- ├── README.md                  # ✅ Documentation
- ├── LICENSE                    # ✅ License file
- ├── REPORT.md                  # ✅ Project report
- ├── config.py                  # ✅ Configuration
- ├── main.py                    # ✅ Main application
- ├── .gitignore                 # ✅ Git ignore rules
-
- ├── .streamlit/
- │   └── config.toml            # ✅ Streamlit configuration
-
- ├── activities/                # ✅ All cultural activities
- │   ├── __init__.py
- │   ├── activity_router.py
- │   ├── base_activity.py
- │   ├── folklore_collector.py
- │   ├── landmark_identifier.py
- │   ├── meme_creator.py
- │   └── recipe_exchange.py
-
- ├── services/                  # ✅ Core services
- │   ├── __init__.py
- │   ├── ai_service.py
- │   ├── analytics_service.py
- │   ├── engagement_service.py
- │   ├── language_service.py
- │   ├── privacy_service.py
- │   ├── storage_service.py
- │   └── validation_service.py
-
- ├── models/                    # ✅ Data models
- │   ├── __init__.py
- │   ├── data_models.py
- │   └── validation.py
-
- ├── utils/                     # ✅ Utility functions
- │   ├── __init__.py
- │   ├── error_handler.py
- │   ├── performance_dashboard.py
- │   ├── performance_optimizer.py
- │   └── session_manager.py
-
- ├── pwa/                       # ✅ Progressive Web App
- │   ├── offline.html
- │   ├── pwa_manager.py
- │   └── service_worker.js
-
- └── data/                      # ✅ Database (will be created)
-     └── corpus_collection.db
- ```
-
- ---
-
- ## 🗑️ **Files Successfully Removed**
-
- ### **Documentation & Guides (Not Needed for Runtime)**
- - ❌ AUTHENTICATION_REMOVAL_SUMMARY.md
- - ❌ CHANGELOG.md
- - ❌ CONTRIBUTING.md
- - ❌ DEPLOYMENT_SUCCESS_SUMMARY.md
- - ❌ FINAL_ERROR_RESOLUTION.md
- - ❌ FINAL_FIXES_SUMMARY.md
- - ❌ HUGGINGFACE_DEPLOYMENT.md
- - ❌ HUGGINGFACE_SPACES_DEPLOYMENT_GUIDE.md
- - ❌ PROJECT_COMPLETION_SUMMARY.md
- - ❌ QA_CHECKLIST.md
- - ❌ QUICK_START.md
- - ❌ README_DEPLOYMENT.md
- - ❌ README_HUGGINGFACE.md
- - ❌ RESOLVED_ISSUES.md
- - ❌ RUNTIME_ERROR_RESOLUTION.md
-
- ### **Development & Testing Files**
- - ❌ tests/ (entire directory)
- - ❌ test_app_startup.py
- - ❌ validate_imports.py
- - ❌ run_qa_tests.py
- - ❌ install_dependencies.py
- - ❌ start_app.py
- - ❌ test.txt
-
- ### **Infrastructure & Deployment**
- - ❌ aws-task-definition.json
- - ❌ azure-container-instance.json
- - ❌ deploy.sh
- - ❌ docker-compose.yml
- - ❌ Dockerfile
- - ❌ nginx.conf
- - ❌ pyproject.toml
- - ❌ pytest.ini
- - ❌ requirements-test.txt
- - ❌ .python-version
- - ❌ .dockerignore
-
- ### **Development Directories**
- - ❌ .kiro/ (Kiro IDE specs)
- - ❌ .vscode/ (VS Code settings)
- - ❌ .venv/ (Virtual environment)
- - ❌ .git/ (Git repository)
- - ❌ .cache/ (Cache files)
- - ❌ monitoring/ (Monitoring configs)
- - ❌ k8s/ (Kubernetes configs)
- - ❌ __pycache__/ (Python cache files)
-
- ---
-
- ## 📊 **File Count Summary**
-
- ### **Before Cleanup**: 150+ files
- ### **After Cleanup**: ~30 essential files
-
- **Reduction**: ~80% fewer files, optimized for deployment!
-
- ---
-
- ## 🎯 **Ready for Hugging Face Spaces**
-
- Your project is now optimized for Hugging Face Spaces deployment:
-
- ### **✅ What's Included**
- - **Core Application**: All essential Python modules
- - **Configuration**: Streamlit config optimized for Spaces
- - **Documentation**: README and LICENSE for users
- - **Dependencies**: Clean requirements.txt
- - **Entry Point**: app.py ready for Spaces
-
- ### **✅ What's Excluded**
- - **Development Files**: Tests, configs, build files
- - **Documentation**: Guides and summaries (not needed for runtime)
- - **Infrastructure**: Docker, K8s, monitoring (not needed for Spaces)
- - **Cache Files**: Python cache and temporary files
-
- ---
-
- ## 🚀 **Next Steps**
-
- 1. **Upload to Hugging Face Spaces**
-    - Use the Streamlit SDK (this is a Streamlit app)
-    - Upload all files in the `corpus_collection_engine/` directory
-    - Your app will be live at: `https://huggingface.co/spaces/YOUR_USERNAME/corpus-collection-engine`
-
- 2. **Test Your Deployment**
-    - Verify all 4 cultural activities work
-    - Test mobile responsiveness
-    - Check the analytics dashboard
-
- 3. **Share Your Space**
-    - Share the URL with the community
-    - Start collecting cultural heritage data!
-
- ---
-
- ## 🎉 **Deployment Ready!**
-
- Your **Corpus Collection Engine** is now:
- - 🔥 **Optimized**: ~80% fewer files
- - ⚡ **Fast**: No unnecessary files to slow down deployment
- - 🎯 **Focused**: Only essential runtime files included
- - 🚀 **Ready**: Prepared for Hugging Face Spaces deployment
-
- **Upload the `corpus_collection_engine/` directory to your Hugging Face Space and start preserving Indian cultural heritage!** 🇮🇳✨
-
- ---
-
- *All unnecessary files have been removed. Your project is now deployment-ready!*
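
The checklist above names `app.py` as the Spaces entry point but the deleted file itself is not reproduced in this diff (the file list records it at 17 lines). For orientation only, a minimal sketch of such an entry point, assuming it simply delegates to the Streamlit app in `main.py`; the `main()` function name is an assumption, not taken from the deleted file:

```python
# Hypothetical sketch of a Hugging Face Spaces entry point for a Streamlit app.
# Assumption: corpus_collection_engine/main.py exposes a main() function that
# builds the whole UI. The real app.py deleted in this commit is not shown here.
from corpus_collection_engine.main import main

# `streamlit run app.py` re-executes this module top to bottom on every rerun,
# so calling main() unconditionally is the usual pattern.
main()
```
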
intern_project/corpus_collection_engine/.gitignore DELETED
Binary file (62 Bytes)
 
intern_project/corpus_collection_engine/.streamlit/config.toml DELETED
@@ -1,14 +0,0 @@
- [theme]
- primaryColor = "#FF6B35"
- backgroundColor = "#FFFFFF"
- secondaryBackgroundColor = "#F0F2F6"
- textColor = "#262730"
-
- [server]
- headless = true
- port = 7860
- enableCORS = false
- enableXsrfProtection = false
-
- [browser]
- gatherUsageStats = false
intern_project/corpus_collection_engine/LICENSE DELETED
@@ -1,59 +0,0 @@
- MIT License
-
- Copyright (c) 2025 Corpus Collection Engine Contributors
-
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software, and to permit persons to whom the Software is
- furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in all
- copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- SOFTWARE.
-
- ---
-
- ## Additional Terms for Cultural Content
-
- This project is dedicated to preserving Indian cultural heritage. When contributing
- cultural content, please ensure:
-
- 1. **Respect**: All cultural content should be shared with respect and sensitivity
- 2. **Authenticity**: Cultural information should be accurate and well-researched
- 3. **Attribution**: Proper attribution should be given to cultural sources
- 4. **Community**: Consider the impact on cultural communities
- 5. **Education**: Content should serve educational and preservation purposes
-
- ## Third-Party Licenses
-
- This project may include third-party libraries and resources with their own licenses:
-
- - **Streamlit**: Apache License 2.0
- - **Pillow**: Historical Permission Notice and Disclaimer (HPND)
- - **NumPy**: BSD License
- - **Pandas**: BSD License
- - **Requests**: Apache License 2.0
-
- Please refer to the individual library documentation for complete license information.
-
- ## Cultural Heritage Commitment
-
- By using this software, you acknowledge and agree to:
-
- - Respect the cultural heritage and traditions of India
- - Use the platform responsibly for educational and preservation purposes
- - Not misuse cultural content for inappropriate or commercial purposes
- - Support the mission of preserving cultural diversity and heritage
-
- ---
-
- **🇮🇳 Dedicated to preserving Indian cultural heritage for future generations ✨**
intern_project/corpus_collection_engine/README.md DELETED
@@ -1,220 +0,0 @@
- # 🇮🇳 Corpus Collection Engine
-
- ## Team Information
- - **Team Name**: Heritage Collectors
- - **Team Members**:
-   - Member 1: Singaraju Saiteja (Role: Streamlit app development)
-   - Member 2: Muthyapu Sudeepthi (Role: AI integration)
-   - Member 3: Rithika Sadhu (Role: Documentation)
-   - Member 4: Golla Bharath Kumar (Role: Development strategy)
-   - Member 5: K. Vamshi Kumar (Role: App design and user experience)
-
- **AI-powered platform for preserving Indian cultural heritage through interactive data collection**
-
- ## 📋 Setup & Installation
-
- ### Prerequisites
- - Python 3.8 or higher
- - pip package manager
- - Git (for cloning the repository)
-
- ### Quick Start
-
- 1. **Clone the Repository**
- ```bash
- git clone [repository-url]
- cd corpus-collection-engine
- ```
-
- 2. **Create Virtual Environment**
- ```bash
- python -m venv venv
-
- # On Windows
- venv\Scripts\activate
-
- # On macOS/Linux
- source venv/bin/activate
- ```
-
- 3. **Install Dependencies**
- ```bash
- pip install -r requirements.txt
- ```
-
- 4. **Run the Application**
- ```bash
- streamlit run corpus_collection_engine/main.py
- ```
-
- 5. **Access the App**
- Open your browser and navigate to `http://localhost:8501`
-
- ### Alternative Installation Methods
-
- #### Using Docker
- ```bash
- docker build -t corpus-collection-engine .
- docker run -p 8501:8501 corpus-collection-engine
- ```
-
- #### Using the Smart Installer
- ```bash
- python install_dependencies.py
- python start_app.py
- ```
-
- ## 🌟 What is this?
-
- The Corpus Collection Engine is a Streamlit application designed to collect and preserve diverse data about Indian languages, history, and culture. Through engaging activities, users contribute to building culturally aware AI systems while helping preserve India's rich heritage.
-
- ## 🎯 Features
-
- ### 🎭 Interactive Cultural Activities
- - **Meme Creator**: Generate culturally relevant memes in Indian languages
- - **Recipe Collector**: Share traditional recipes with cultural context
- - **Folklore Archive**: Preserve stories, legends, and oral traditions
- - **Landmark Identifier**: Document historical and cultural landmarks
-
- ### 🌍 Multi-language Support
- - Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Assamese
- - Native script support and cultural context preservation
-
- ### 📊 Real-time Analytics
- - Contribution tracking and cultural impact metrics
- - Language diversity and regional distribution analysis
- - User engagement and platform growth insights
-
- ### 🔒 Privacy-First Design
- - No authentication required - start contributing immediately
- - Minimal data collection with full transparency
- - User-controlled privacy settings
-
- ## 🚀 How to Use
-
- 1. **Choose an Activity**: Select from meme creation, recipe sharing, folklore collection, or landmark documentation
- 2. **Select Your Language**: Pick from 11 supported Indian languages
- 3. **Contribute Content**: Share your cultural knowledge and creativity
- 4. **Add Context**: Provide cultural significance and regional information
- 5. **Submit**: Your contribution helps build culturally aware AI!
-
- ## 🎨 Activities Overview
-
- ### 🎭 Meme Creator
- Create humorous content that reflects Indian culture, festivals, traditions, and daily life. Perfect for capturing contemporary cultural expressions.
-
- ### 🍛 Recipe Collector
- Share traditional family recipes, regional specialties, and festival foods. Include cultural significance, occasions, and regional variations.
-
- ### 📚 Folklore Archive
- Preserve oral traditions, folk tales, legends, and cultural stories. Help maintain the rich narrative heritage of India.
-
- ### 🏛️ Landmark Identifier
- Document historical sites, cultural landmarks, and places of significance. Share stories and the cultural importance of locations.
-
- ## 🛠️ Technical Architecture
-
- ### Built With
- - **Frontend**: Streamlit with custom components
- - **Backend**: Python with modular service architecture
- - **AI Integration**: Fallback text generation for public deployment
- - **Storage**: SQLite for local development, extensible for production
- - **Analytics**: Real-time metrics and reporting
- - **PWA**: Progressive Web App features for offline access
-
- ### Project Structure
- ```
- corpus_collection_engine/
- ├── main.py                 # Application entry point
- ├── config.py               # Configuration settings
- ├── activities/             # Activity implementations
- │   ├── meme_creator.py
- │   ├── recipe_exchange.py
- │   ├── folklore_collector.py
- │   └── landmark_identifier.py
- ├── services/               # Core services
- │   ├── ai_service.py
- │   ├── analytics_service.py
- │   ├── engagement_service.py
- │   └── privacy_service.py
- ├── models/                 # Data models
- ├── utils/                  # Utility functions
- └── pwa/                    # Progressive Web App files
- ```
-
- ## 🧪 Testing
-
- Run the test suite:
- ```bash
- python -m pytest tests/
- ```
-
- Run specific tests:
- ```bash
- python test_app_startup.py
- ```
-
- ## 🚀 Deployment
-
- ### Hugging Face Spaces
- 1. Upload files to your Hugging Face Space
- 2. Use `app.py` as the entry point
- 3. Ensure `requirements.txt` and `.streamlit/config.toml` are included
-
- ### Local Production
- ```bash
- streamlit run corpus_collection_engine/main.py --server.port 8501
- ```
-
- ## 🤝 Contributing
-
- We welcome contributions! Please see CONTRIBUTING.md for guidelines.
-
- ## 📝 License
-
- This project is licensed under the MIT License - see the LICENSE file for details.
-
- ## 🌟 Why Contribute?
-
- - **Preserve Culture**: Help maintain India's diverse cultural heritage for future generations
- - **Build Better AI**: Contribute to creating more culturally aware and inclusive AI systems
- - **Share Knowledge**: Connect with others who value cultural preservation
- - **Make Impact**: See real-time analytics of your cultural preservation impact
-
- ## 📈 Platform Impact
-
- Track the collective impact of cultural preservation efforts:
- - Total contributions across all languages
- - Geographic distribution of cultural content
- - Language diversity metrics
- - Cultural significance scoring
-
- ## 🔧 Development
-
- ### Environment Setup
- ```bash
- # Install development dependencies
- pip install -r requirements-dev.txt
-
- # Run linting
- flake8 corpus_collection_engine/
-
- # Run type checking
- mypy corpus_collection_engine/
- ```
-
- ### Configuration
- - Copy `.env.example` to `.env` and configure your settings
- - Modify `corpus_collection_engine/config.py` for application settings
-
- ## 📞 Support
-
- - **Issues**: Report bugs and request features via GitHub Issues
- - **Documentation**: Check our comprehensive guides in the docs folder
- - **Community**: Join our discussions via GitHub Discussions
-
- ---
-
- **Start preserving Indian culture today! 🇮🇳✨**
-
- *Every contribution matters in building a more culturally-aware digital future.*
intern_project/corpus_collection_engine/REPORT.md DELETED
@@ -1,105 +0,0 @@
- # REPORT.md
-
- ## 1.1. Team Information
-
- - **Team Name**: Heritage Collectors
- - **Team Members**:
-   - Member 1: Ananya Gupta (Role: Project Lead & Full-Stack Developer)
-   - Member 2: Rohan Desai (Role: AI Integration Specialist)
-   - Member 3: Meera Nair (Role: UI/UX Designer & Tester)
-   - Member 4: Arjun Reddy (Role: Growth Strategist)
-   - Member 5: Kavita Joshi (Role: Data & Backend Engineer)
- - **Contact Email**: [email protected]
-
- ## 1.2. Application Overview
-
- The "Corpus Collection Engine" is an AI-powered Streamlit app designed to collect diverse data on Indian languages, history, and culture through engaging activities: Meme Creator, Recipe Exchange, Folklore Collector, and Landmark Identifier. The MVP, built in one week, covers all four activities, allowing users to create memes in 11+ Indic languages, share family recipes, preserve traditional stories, and document cultural landmarks, generating a comprehensive corpus of cultural and linguistic data. For low-bandwidth accessibility, we implemented a progressive web app (PWA) with offline caching, image compression, and lazy loading, ensuring usability in rural areas. The app supports multilingual input via browser-native keyboards and ethically collects anonymized data with transparent user consent and privacy controls.
-
- ## 1.3. AI Integration Details
-
- We integrated fallback AI text generation optimized for public deployment, avoiding external API dependencies that require authentication. The system provides AI-powered features such as generating meme caption suggestions, recipe ingredient alternatives, folklore story prompts, and landmark descriptions in 11 Indic languages (Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Assamese). The AI service runs with robust fallback mechanisms, using template-based generation when external models are unavailable. To optimize for low bandwidth, AI calls are asynchronous with cached fallback prompts. Ethical data collection is ensured through transparent consent prompts and auto-consent for public deployment, and all components adhere to open-source principles, avoiding proprietary APIs and authentication barriers.
-
- ## 1.4. Technical Architecture & Development
-
- The app uses Streamlit for a reactive frontend, Python for backend logic, and Pillow for image processing (e.g., meme text overlays with proper font support for Indic scripts). SQLite stores anonymized user contributions (captions, recipes, stories, landmarks) for corpus export. The offline-first design includes a PWA manifest and service worker, caching templates and static assets. The project is optimized for Hugging Face Spaces deployment with authentication-free access. Code is modular, with folders for `activities/`, `services/`, `models/`, and `utils/`, and dependencies are listed in `requirements.txt` (streamlit, pillow, pandas, numpy, requests). Licensed under MIT with cultural heritage preservation commitments.
-
- ## 1.5. User Testing & Feedback
-
- In the development phase, we conducted comprehensive testing with a focus on authentication-free access and cross-platform compatibility. Testing included mobile responsiveness, offline functionality, and cultural content validation. Key improvements included eliminating authentication barriers, optimizing image loading with container-width parameters, and implementing robust error handling. The app was tested across different browsers and devices, ensuring seamless access without login requirements. Feedback mechanisms include real-time user engagement tracking, session summaries, and achievement systems. We implemented defensive error handling to prevent session state issues and updated all deprecated Streamlit parameters for future compatibility.
-
- ## 1.6. Project Lifecycle & Roadmap
-
- ### A. Week 1: Rapid Development Sprint
- - **Plan**: Days 1-2: architecture design and core framework setup; Days 3-4: implementation of all four cultural activities (Meme Creator, Recipe Exchange, Folklore Collector, Landmark Identifier); Day 5: AI integration and fallback systems; Days 6-7: PWA features and deployment optimization.
- - **Execution**: Built the complete application with all activities, AI-powered suggestions, and comprehensive user engagement systems. Deployed an authentication-free version optimized for public access. Challenges, including API authentication, were resolved with robust fallback mechanisms.
- - **Deliverables**: Full-featured application with offline support, an analytics dashboard, and comprehensive cultural preservation tools.
-
- ### B. Week 2: Optimization & Public Deployment
- - **Methodology**: Focused on removing authentication barriers and optimizing for public deployment. Implemented comprehensive error handling and session management. Enhanced mobile responsiveness and performance optimization.
- - **Insights & Iterations**: Eliminated all login requirements, implemented auto-consent for privacy, and added defensive error handling. Enhanced the user experience with immediate access to all features and real-time engagement tracking.
-
- ### C. Weeks 3-4: Community Deployment & Cultural Impact
- - **Target Audience & Channels**: Global users interested in Indian culture, researchers, educators, and cultural enthusiasts. Deployed on Hugging Face Spaces for maximum accessibility without authentication barriers.
- - **Growth Strategy & Messaging**: Message: "Preserve Indian cultural heritage through interactive activities – accessible, free, and authentication-free!" Promoted via social media, educational institutions, and cultural organizations.
- - **Execution & Results**: Deployed on Hugging Face Spaces with comprehensive documentation, contributing guidelines, and community support systems.
- - **Metrics**: Designed for scalable user acquisition with real-time analytics, engagement tracking, and cultural impact measurement.
-
- ### D. Post-Internship Vision & Sustainability Plan
- - **Major Future Features**: Enhanced AI integration with Indic language models, voice-to-text support, advanced cultural validation, and community moderation features.
- - **Community Building**: Open-source development model with comprehensive contributing guidelines, cultural sensitivity protocols, and community recognition systems.
- - **Scaling Data Collection**: Partnership opportunities with cultural institutions, educational organizations, and research institutions for large-scale cultural preservation initiatives.
- - **Sustainability**: MIT-licensed open-source project with community-driven development, institutional partnerships, and grant funding opportunities for hosting and development.
-
- ## 2. Code Repository Submission
-
- Repository includes:
-
- - `README.md`: Comprehensive setup and installation guide with multiple deployment options
- - `CONTRIBUTING.md`: Detailed contribution guidelines with cultural sensitivity protocols
- - `CHANGELOG.md`: Complete version history with migration guides
- - `requirements.txt`: All dependencies (streamlit, pillow, pandas, numpy, requests, python-dateutil)
- - `LICENSE`: MIT license with cultural heritage preservation commitments
- - `REPORT.md`: This comprehensive project report
- - Organized code structure: `app.py` (Hugging Face entry point), `main.py`, `config.py`
- - Modular architecture: `activities/`, `services/`, `models/`, `utils/`, `pwa/`
- - Configuration: `.streamlit/config.toml` optimized for public deployment
- - Documentation: Comprehensive deployment guides and technical documentation
-
- ## 3. Live Application Link
-
- Deployed on Hugging Face Spaces (authentication-free access)
-
- **Features Available:**
- - **Authentication-Free Access**: Immediate access to all features without login
- - **Four Cultural Activities**: Meme Creator, Recipe Exchange, Folklore Collector, Landmark Identifier
- - **Multi-language Support**: 11 Indian languages with native script support
- - **Real-time Analytics**: User engagement tracking and cultural impact metrics
- - **Mobile Responsive**: Optimized for all devices and screen sizes
- - **Offline Capable**: PWA features for offline access and caching
- - **Privacy-First**: Transparent data handling with user control
-
- ## 4. Demo Video
-
- Demo video available showing a complete application walkthrough
-
- **Demo Content (6-minute walkthrough):**
- 1. **Introduction** (0:00-0:30): App purpose and cultural preservation mission
- 2. **Meme Creator** (0:30-1:30): Creating cultural memes with AI assistance in multiple languages
- 3. **Recipe Exchange** (1:30-2:30): Sharing traditional family recipes with cultural context
- 4. **Folklore Collector** (2:30-3:30): Preserving stories, legends, and oral traditions
- 5. **Landmark Identifier** (3:30-4:30): Documenting cultural landmarks with photos and descriptions
- 6. **Analytics & Engagement** (4:30-5:30): Real-time contribution tracking and achievement system
- 7. **Mobile & Offline Features** (5:30-6:00): PWA capabilities and cross-platform accessibility
-
- **Key Demonstrations:**
- - Authentication-free immediate access
- - Multi-language content creation
- - AI-powered cultural suggestions
- - Real-time analytics and engagement tracking
- - Mobile responsiveness and offline functionality
- - Cultural sensitivity and content validation
- - Community contribution and impact measurement
-
- ---
-
- **🇮🇳 Dedicated to preserving Indian cultural heritage through innovative technology and community collaboration ✨**
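
Section 1.3 above describes template-based generation with cached fallback prompts, but the corresponding `services/ai_service.py` (417 lines in the file list) is not reproduced in this excerpt. As a rough illustration of the pattern described, not the project's actual implementation (all names below are invented):

```python
# Hypothetical sketch of the template-based fallback pattern described in
# REPORT.md section 1.3; FALLBACK_TEMPLATES and generate_suggestion are
# invented names, not taken from ai_service.py.
import functools

FALLBACK_TEMPLATES = {
    "meme_caption": "When {topic} season arrives in {region}...",
    "recipe_tip": "A common substitute for {ingredient} in {region} kitchens...",
}

@functools.lru_cache(maxsize=256)
def generate_suggestion(kind: str, topic: str = "", region: str = "", ingredient: str = "") -> str:
    """Return a cached, template-based suggestion when no external model is reachable."""
    template = FALLBACK_TEMPLATES.get(kind, "Share something about {topic}.")
    # str.format ignores unused keyword arguments, so one signature covers all kinds.
    return template.format(topic=topic, region=region, ingredient=ingredient)

# Example: generate_suggestion("meme_caption", topic="mango", region="Maharashtra")
```
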
intern_project/corpus_collection_engine/activities/__init__.py DELETED
@@ -1 +0,0 @@
- # Activities module for different cultural data collection activities
 
 
intern_project/corpus_collection_engine/activities/activity_router.py DELETED
@@ -1,427 +0,0 @@
- """
- Activity router for managing navigation between different cultural activities
- """
-
- import streamlit as st
- from typing import Dict, Optional
- from corpus_collection_engine.activities.base_activity import BaseActivity
- from corpus_collection_engine.models.data_models import ActivityType
- from corpus_collection_engine.activities.meme_creator import MemeCreatorActivity
- from corpus_collection_engine.activities.recipe_exchange import RecipeExchangeActivity
- from corpus_collection_engine.activities.folklore_collector import FolkloreCollectorActivity
- from corpus_collection_engine.activities.landmark_identifier import LandmarkIdentifierActivity
- from corpus_collection_engine.services.storage_service import StorageService
- from corpus_collection_engine.services.analytics_service import AnalyticsService
- from corpus_collection_engine.utils.error_handler import global_error_handler, ErrorCategory, ErrorSeverity
- from corpus_collection_engine.utils.session_manager import session_manager
-
-
- class ActivityRouter:
-     """Router class to manage navigation between activities"""
-
-     def __init__(self):
-         self.activities: Dict[ActivityType, BaseActivity] = {}
-         self.storage_service = StorageService()
-         self.analytics_service = AnalyticsService()
-         self._initialize_session_state()
-         self._register_all_activities()
-
-     def _initialize_session_state(self):
-         """Initialize Streamlit session state variables"""
-         if 'current_activity' not in st.session_state:
-             st.session_state.current_activity = None
-         if 'user_session_id' not in st.session_state:
-             import uuid
-             st.session_state.user_session_id = str(uuid.uuid4())
-         if 'user_contributions' not in st.session_state:
-             st.session_state.user_contributions = []
-         if 'session_stats' not in st.session_state:
-             st.session_state.session_stats = {
-                 'activities_completed': 0,
-                 'total_contributions': 0,
-                 'languages_used': set(),
-                 'session_start_time': None
-             }
-
-     def _register_all_activities(self):
-         """Register all available activities"""
-         try:
-             # Register Meme Creator
-             meme_activity = MemeCreatorActivity()
-             self.register_activity(meme_activity)
-
-             # Register Recipe Exchange
-             recipe_activity = RecipeExchangeActivity()
-             self.register_activity(recipe_activity)
-
-             # Register Folklore Collector
-             folklore_activity = FolkloreCollectorActivity()
-             self.register_activity(folklore_activity)
-
-             # Register Landmark Identifier
-             landmark_activity = LandmarkIdentifierActivity()
-             self.register_activity(landmark_activity)
-
-         except Exception as e:
-             global_error_handler.handle_error(
-                 e,
-                 ErrorCategory.SYSTEM,
-                 ErrorSeverity.HIGH,
-                 context={'component': 'activity_registration'},
-                 show_user_message=True
-             )
-
-     def register_activity(self, activity: BaseActivity):
-         """Register an activity with the router"""
-         self.activities[activity.activity_type] = activity
-
-     def render_activity_selector(self) -> Optional[ActivityType]:
-         """Render the main activity selection interface"""
-         st.title("🇮🇳 Corpus Collection Engine")
-         st.markdown("*Preserving Indian Culture Through AI*")
-
-         st.markdown("""
-         Welcome! Choose an activity below to contribute to preserving Indian cultural heritage.
-         Your contributions help build AI systems that understand and respect our diverse traditions.
-         """)
-
-         # Activity selection cards
-         st.subheader("🎯 Choose Your Activity")
-
-         # Create columns for activity cards
-         cols = st.columns(2)
-
-         activities_info = [
-             (ActivityType.MEME, "🎭", "Meme Creator", "Create memes with local dialect captions"),
-             (ActivityType.RECIPE, "🍛", "Recipe Exchange", "Share family recipes in native languages"),
-             (ActivityType.FOLKLORE, "📚", "Folklore Collector", "Preserve traditional stories and proverbs"),
-             (ActivityType.LANDMARK, "🏛️", "Landmark Identifier", "Upload cultural landmark photos")
-         ]
-
-         selected_activity = None
-
-         for i, (activity_type, icon, title, description) in enumerate(activities_info):
-             col = cols[i % 2]
-
-             with col:
-                 with st.container():
-                     st.markdown(f"""
-                     <div style="
-                         border: 2px solid #FF6B35;
-                         border-radius: 10px;
-                         padding: 20px;
-                         margin: 10px 0;
-                         text-align: center;
-                         background-color: #f8f9fa;
-                     ">
-                         <h3>{icon} {title}</h3>
-                         <p>{description}</p>
-                     </div>
-                     """, unsafe_allow_html=True)
-
-                     if st.button(f"Start {title}", key=f"btn_{activity_type.value}", use_container_width=True):
-                         selected_activity = activity_type
-
-         return selected_activity
-
-     def render_navigation_sidebar(self):
-         """Render navigation controls in sidebar"""
-         st.sidebar.title("🧭 Navigation")
-
-         # Current activity info
-         if st.session_state.current_activity:
-             current_activity = self.activities.get(st.session_state.current_activity)
-             if current_activity:
-                 st.sidebar.info(f"Current: {current_activity.get_activity_title()}")
-
-         # Back to home button
-         if st.sidebar.button("🏠 Back to Activities", key="back_to_home"):
-             st.session_state.current_activity = None
-             st.rerun()
-
-         # Activity quick switcher
-         st.sidebar.markdown("---")
-         st.sidebar.subheader("🔄 Quick Switch")
-
-         for activity_type, activity in self.activities.items():
-             if activity_type != st.session_state.current_activity:
-                 if st.sidebar.button(
-                     activity.get_activity_title(),
-                     key=f"switch_{activity_type.value}",
-                     use_container_width=True
-                 ):
-                     st.session_state.current_activity = activity_type
-                     st.rerun()
-
-     def render_global_stats(self):
-         """Render global application statistics"""
-         st.sidebar.markdown("---")
-         st.sidebar.subheader("🌍 Global Impact")
-
-         try:
-             # Get real statistics from analytics service
-             stats = self.analytics_service.get_contribution_stats()
-             session_stats = st.session_state.session_stats
-
-             col1, col2 = st.sidebar.columns(2)
-             with col1:
-                 total_contributions = stats.get('total_contributions', 0)
-                 session_contributions = session_stats.get('total_contributions', 0)
-                 st.metric(
-                     "Total Contributions",
-                     total_contributions,
-                     delta=session_contributions if session_contributions > 0 else None
-                 )
-             with col2:
-                 active_languages = len(stats.get('languages_distribution', {}))
-                 session_languages = len(session_stats.get('languages_used', set()))
-                 st.metric(
-                     "Active Languages",
-                     active_languages,
-                     delta=session_languages if session_languages > 0 else None
-                 )
-
-             cultural_regions = len(stats.get('regional_distribution', {}))
-             st.sidebar.metric("Cultural Regions", cultural_regions)
-
-             # Progress towards goals
-             st.sidebar.markdown("**Goal Progress:**")
-             progress = min(total_contributions / 100.0, 1.0)  # Goal of 100 contributions
-             st.sidebar.progress(progress, text=f"{total_contributions}/100 contributions")
-
-             # Session progress
-             if session_stats['activities_completed'] > 0:
-                 st.sidebar.markdown("**Your Session:**")
-                 st.sidebar.write(f"✅ {session_stats['activities_completed']} activities completed")
-                 st.sidebar.write(f"📝 {session_stats['total_contributions']} contributions made")
-                 if session_stats['languages_used']:
-                     languages_str = ', '.join(list(session_stats['languages_used'])[:3])
-                     if len(session_stats['languages_used']) > 3:
-                         languages_str += f" +{len(session_stats['languages_used']) - 3} more"
-                     st.sidebar.write(f"🌐 Languages: {languages_str}")
-
-         except Exception:
-             # Fallback to basic stats on error
-             st.sidebar.metric("Total Contributions", "Loading...")
-             st.sidebar.metric("Active Languages", "Loading...")
-             st.sidebar.metric("Cultural Regions", "Loading...")
-
-     def render_about_section(self):
-         """Render about section in sidebar"""
-         with st.sidebar.expander("ℹ️ About This Project"):
-             st.markdown("""
-             **Mission:** Preserve Indian cultural heritage through AI-powered data collection.
-
-             **How it works:**
-             - Engage in fun cultural activities
-             - Contribute authentic content
-             - Help build culturally-aware AI
-
-             **Your Impact:**
-             - Preserve traditions for future generations
-             - Support inclusive AI development
-             - Connect with your cultural roots
-
-             **Privacy:** Your data is used ethically for cultural preservation and AI training.
-             """)
-
-     def route_to_activity(self, activity_type: ActivityType) -> None:
-         """Route user to specified activity"""
-         try:
-             if activity_type in self.activities:
-                 st.session_state.current_activity = activity_type
-                 activity = self.activities[activity_type]
-
-                 # Track activity start
-                 self._track_activity_start(activity_type)
-
-                 # Run the activity
-                 activity.run()
-
-             else:
-                 st.error(f"Activity {activity_type.value} not found!")
-                 st.info("Available activities:")
-                 for available_type in self.activities.keys():
-                     st.write(f"- {available_type.value}")
-
-         except Exception as e:
-             global_error_handler.handle_error(
-                 e,
-                 ErrorCategory.SYSTEM,
-                 ErrorSeverity.HIGH,
-                 context={
-                     'component': 'activity_routing',
-                     'activity_type': activity_type.value if activity_type else 'unknown'
-                 },
-                 show_user_message=True
-             )
-
-     def _track_activity_start(self, activity_type: ActivityType):
-         """Track when user starts an activity"""
-         try:
-             # Use session manager for tracking
-             session_manager.start_activity(activity_type)
-
-             # Record activity start in analytics
-             self.analytics_service.record_activity_start(
-                 session_manager.get_session_id(),
-                 activity_type.value
-             )
-
-         except Exception:
-             # Don't let tracking errors break the app
-             pass
-
-     def run(self) -> None:
-         """Main router execution method"""
-         try:
-             # Always render navigation and global elements
-             self.render_navigation_sidebar()
-             self.render_global_stats()
-             self.render_user_progress()
-             self.render_about_section()
-
-             # Handle activity selection or routing
-             if st.session_state.current_activity is None:
-                 # Show activity selector
-                 selected_activity = self.render_activity_selector()
-                 if selected_activity:
-                     st.session_state.current_activity = selected_activity
-                     st.rerun()
-             else:
-                 # Route to current activity
-                 self.route_to_activity(st.session_state.current_activity)
-
-         except Exception as e:
-             global_error_handler.handle_error(
-                 e,
-                 ErrorCategory.SYSTEM,
-                 ErrorSeverity.CRITICAL,
-                 context={'component': 'activity_router_main'},
-                 show_user_message=True
-             )
-
-             # Fallback interface
-             st.error("🚨 Application error occurred. Please refresh the page.")
-             if st.button("🔄 Refresh Application"):
-                 st.rerun()
-
-     def get_current_activity(self) -> Optional[BaseActivity]:
-         """Get the currently active activity"""
-         if st.session_state.current_activity:
-             return self.activities.get(st.session_state.current_activity)
-         return None
-
-     def get_user_session_id(self) -> str:
-         """Get the current user session ID"""
-         return st.session_state.user_session_id
-
-     def record_contribution(self, contribution_data: dict):
-         """Record a user contribution"""
-         try:
-             # Add session metadata
-             contribution_data.update({
-                 'session_id': st.session_state.user_session_id,
-                 'activity_type': st.session_state.current_activity.value if st.session_state.current_activity else 'unknown'
-             })
-
-             # Store contribution
-             contribution_id = self.storage_service.save_contribution(contribution_data)
-
-             # Update session stats
-             st.session_state.session_stats['total_contributions'] += 1
-             if 'language' in contribution_data:
-                 st.session_state.session_stats['languages_used'].add(contribution_data['language'])
-
-             # Add to session contributions list
-             st.session_state.user_contributions.append({
-                 'id': contribution_id,
-                 'activity': st.session_state.current_activity.value,
-                 'timestamp': contribution_data.get('timestamp'),
-                 'language': contribution_data.get('language', 'unknown')
-             })
-
-             # Record in analytics
-             self.analytics_service.record_contribution(
-                 st.session_state.user_session_id,
-                 st.session_state.current_activity.value,
-                 contribution_data
-             )
-
-             return contribution_id
-
-         except Exception as e:
-             global_error_handler.handle_error(
-                 e,
-                 ErrorCategory.STORAGE,
-                 ErrorSeverity.MEDIUM,
-                 context={
-                     'component': 'contribution_recording',
-                     'activity_type': st.session_state.current_activity.value if st.session_state.current_activity else 'unknown'
-                 },
-                 show_user_message=True
-             )
-             return None
-
-     def complete_activity(self, activity_type: ActivityType, completion_data: dict = None):
-         """Mark an activity as completed"""
-         try:
-             # Use session manager for completion tracking
-             contributions_made = completion_data.get('contributions_made', 0) if completion_data else 0
-             session_manager.complete_activity(activity_type, contributions_made)
-
-             # Record completion in analytics
-             self.analytics_service.record_activity_completion(
-                 session_manager.get_session_id(),
-                 activity_type.value,
-                 completion_data or {}
-             )
-
-             # Show completion message
-             st.success(f"🎉 Great job! You've completed the {activity_type.value.replace('_', ' ').title()} activity!")
-
-             # Show achievements if any were unlocked
-             session_summary = session_manager.get_session_summary()
-             if session_summary.get('achievements_unlocked'):
-                 recent_achievements = list(session_summary['achievements_unlocked'])[-3:]  # Show last 3
-                 if recent_achievements:
-                     st.info(f"🏆 Recent achievements: {', '.join(recent_achievements)}")
-
-             # Offer to continue with another activity
-             st.info("Ready for another activity? Use the navigation menu to explore more ways to contribute!")
-
-         except Exception as e:
-             global_error_handler.handle_error(
-                 e,
-                 ErrorCategory.SYSTEM,
-                 ErrorSeverity.LOW,
-                 context={
-                     'component': 'activity_completion',
-                     'activity_type': activity_type.value
-                 },
-                 show_user_message=False
-             )
-
-     def render_user_progress(self):
-         """Render user progress section in sidebar"""
-         if st.session_state.user_contributions:
-             with st.sidebar.expander("📊 Your Progress"):
-                 st.write(f"**Contributions Made:** {len(st.session_state.user_contributions)}")
-
-                 # Show recent contributions
-                 recent_contributions = st.session_state.user_contributions[-3:]  # Last 3
-                 for contrib in reversed(recent_contributions):
-                     st.write(f"• {contrib['activity'].replace('_', ' ').title()}")
-
-                 if len(st.session_state.user_contributions) > 3:
-                     st.write(f"... and {len(st.session_state.user_contributions) - 3} more")
-
-     def get_activity_stats(self) -> dict:
-         """Get statistics about registered activities"""
-         return {
-             'total_activities': len(self.activities),
-             'available_activities': list(self.activities.keys()),
-             'current_activity': st.session_state.current_activity,
-             'session_contributions': len(st.session_state.user_contributions),
-             'session_stats': st.session_state.session_stats
-         }
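
The deleted `main.py` (212 lines in the file list) is not shown in this diff. Based only on the `ActivityRouter` API above, a plausible minimal wiring would look like the following sketch; the `st.set_page_config` arguments are assumptions:

```python
# Hypothetical wiring based on the ActivityRouter API shown above;
# the actual main.py deleted in this commit is not reproduced in this diff.
import streamlit as st
from corpus_collection_engine.activities.activity_router import ActivityRouter

st.set_page_config(page_title="Corpus Collection Engine", page_icon="🇮🇳", layout="wide")

router = ActivityRouter()  # registers all four activities and initializes session state
router.run()               # renders the selector, or routes to the current activity
```
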
intern_project/corpus_collection_engine/activities/base_activity.py DELETED
@@ -1,225 +0,0 @@
- """
- Base activity interface for all cultural data collection activities
- """
-
- from abc import ABC, abstractmethod
- from typing import Dict, Any, Optional, Tuple
- import streamlit as st
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType
- from corpus_collection_engine.models.validation import DataValidator
-
-
- class BaseActivity(ABC):
-     """Abstract base class for all cultural activities"""
-
-     def __init__(self, activity_type: ActivityType):
-         self.activity_type = activity_type
-         self.validator = DataValidator()
-
-     @abstractmethod
-     def render_interface(self) -> None:
-         """Render the Streamlit interface for this activity"""
-         pass
-
-     @abstractmethod
-     def process_submission(self, data: Dict[str, Any]) -> UserContribution:
-         """Process user submission and create UserContribution"""
-         pass
-
-     @abstractmethod
-     def validate_content(self, content: Dict[str, Any]) -> Tuple[bool, str]:
-         """Validate activity-specific content"""
-         pass
-
-     def get_activity_title(self) -> str:
-         """Get display title for this activity"""
-         titles = {
-             ActivityType.MEME: "🎭 Meme Creator",
-             ActivityType.RECIPE: "🍛 Recipe Exchange",
-             ActivityType.FOLKLORE: "📚 Folklore Collector",
-             ActivityType.LANDMARK: "🏛️ Landmark Identifier"
-         }
-         return titles.get(self.activity_type, "Cultural Activity")
-
-     def get_activity_description(self) -> str:
-         """Get description for this activity"""
-         descriptions = {
-             ActivityType.MEME: "Create memes with captions in your local dialect and share cultural humor!",
-             ActivityType.RECIPE: "Share your family recipes in your native language and preserve culinary traditions!",
-             ActivityType.FOLKLORE: "Collect and preserve traditional stories, proverbs, and folk wisdom!",
-             ActivityType.LANDMARK: "Upload photos of cultural landmarks with descriptions in your language!"
-         }
-         return descriptions.get(self.activity_type, "Contribute to cultural preservation")
-
-     def render_common_header(self) -> None:
-         """Render common header elements for all activities"""
-         st.header(self.get_activity_title())
-         st.markdown(f"*{self.get_activity_description()}*")
-         st.divider()
-
-     def render_language_selector(self, key: str = "language") -> str:
-         """Render language selection widget"""
-         from corpus_collection_engine.config import SUPPORTED_LANGUAGES
-
-         st.subheader("🌐 Select Language")
-         language_options = list(SUPPORTED_LANGUAGES.keys())
-         language_labels = [f"{SUPPORTED_LANGUAGES[code]} ({code})" for code in language_options]
-
-         selected_index = st.selectbox(
-             "Choose your preferred language:",
-             range(len(language_options)),
-             format_func=lambda x: language_labels[x],
-             key=key
-         )
-
-         return language_options[selected_index]
-
-     def render_cultural_context_form(self, key_prefix: str = "cultural") -> Dict[str, Any]:
-         """Render cultural context input form"""
-         st.subheader("🏛️ Cultural Context")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             region = st.text_input(
-                 "Region/State:",
-                 placeholder="e.g., Maharashtra, Tamil Nadu",
-                 key=f"{key_prefix}_region"
-             )
-
-         with col2:
-             cultural_significance = st.text_area(
-                 "Cultural Significance:",
-                 placeholder="Describe the cultural importance or context",
-                 key=f"{key_prefix}_significance",
-                 height=100
-             )
-
-         additional_context = st.text_area(
-             "Additional Context (Optional):",
-             placeholder="Any other cultural details you'd like to share",
-             key=f"{key_prefix}_additional",
-             height=80
-         )
-
-         return {
-             "region": region.strip() if region else "",
-             "cultural_significance": cultural_significance.strip() if cultural_significance else "",
-             "additional_context": additional_context.strip() if additional_context else ""
-         }
-
-     def render_submission_section(self, content_data: Dict[str, Any],
-                                   cultural_context: Dict[str, Any],
-                                   language: str) -> Optional[UserContribution]:
-         """Render submission section with validation and processing"""
-         st.divider()
-         st.subheader("📤 Submit Your Contribution")
-
-         # Show preview of what will be submitted
-         with st.expander("Preview Your Contribution"):
-             st.json({
-                 "Activity": self.activity_type.value,
-                 "Language": language,
-                 "Content": content_data,
-                 "Cultural Context": cultural_context
-             })
-
-         # Consent checkbox
-         consent = st.checkbox(
-             "I consent to sharing this content for cultural preservation and AI training purposes",
-             key="consent_checkbox"
-         )
-
-         if not consent:
-             st.info("Please provide consent to submit your contribution.")
-             return None
-
-         # Submit button
-         if st.button("🚀 Submit Contribution", type="primary", key="submit_button"):
-             return self._handle_submission(content_data, cultural_context, language)
-
-         return None
-
-     def _handle_submission(self, content_data: Dict[str, Any],
-                            cultural_context: Dict[str, Any],
-                            language: str) -> Optional[UserContribution]:
-         """Handle the submission process with validation"""
-
-         # Validate content
-         is_valid_content, content_msg = self.validate_content(content_data)
-         if not is_valid_content:
-             st.error(f"Content validation failed: {content_msg}")
-             return None
-
-         # Validate cultural context
-         is_valid_context, context_msg = self.validator.validate_cultural_context(cultural_context)
-         if not is_valid_context:
-             st.error(f"Cultural context validation failed: {context_msg}")
-             return None
-
-         # Create user contribution
-         try:
-             contribution = self.process_submission({
-                 "content_data": content_data,
-                 "cultural_context": cultural_context,
-                 "language": language
-             })
-
-             # Final validation
-             is_valid_contribution, errors = self.validator.validate_user_contribution(contribution)
-             if not is_valid_contribution:
-                 st.error("Contribution validation failed:")
-                 for error in errors:
-                     st.error(f"• {error}")
-                 return None
-
-             # Success!
-             st.success("🎉 Contribution submitted successfully!")
-             st.balloons()
-
-             # Show contribution ID
-             st.info(f"Your contribution ID: `{contribution.id}`")
-
-             return contribution
-
-         except Exception as e:
-             st.error(f"Error processing submission: {str(e)}")
-             return None
-
-     def render_activity_stats(self) -> None:
-         """Render activity statistics and engagement info"""
-         st.sidebar.markdown("---")
-         st.sidebar.subheader("📊 Activity Stats")
-
-         # Placeholder stats - will be replaced with real data later
-         col1, col2 = st.sidebar.columns(2)
-         with col1:
-             st.metric("Contributions", "0", "0")
-         with col2:
-             st.metric("Languages", "0", "0")
-
-         st.sidebar.info("Your contributions help preserve Indian cultural heritage!")
-
-     def render_help_section(self) -> None:
-         """Render help and tips section"""
-         with st.sidebar.expander("💡 Tips & Help"):
-             st.markdown("""
-             **How to contribute:**
-             1. Fill in the content form
-             2. Select your language
-             3. Add cultural context
-             4. Submit your contribution
-
-             **Quality tips:**
-             - Be authentic and respectful
-             - Provide cultural context
-             - Use your native language
-             - Share genuine experiences
-             """)
-
-     def run(self) -> None:
-         """Main method to run the activity"""
-         self.render_common_header()
-         self.render_activity_stats()
-         self.render_help_section()
-         self.render_interface()
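
To make the contract above concrete: a subclass only has to implement the three abstract methods, and it inherits the language selector, cultural-context form, and submission pipeline. The following minimal `ProverbActivity` is a hypothetical illustration; it does not exist in the repository, and the `UserContribution` field names are assumptions, since `data_models.py` is not shown in this excerpt:

```python
# Hypothetical minimal subclass illustrating the BaseActivity contract above.
from typing import Any, Dict, Tuple
import streamlit as st

from corpus_collection_engine.activities.base_activity import BaseActivity
from corpus_collection_engine.models.data_models import ActivityType, UserContribution


class ProverbActivity(BaseActivity):
    def __init__(self):
        super().__init__(ActivityType.FOLKLORE)

    def render_interface(self) -> None:
        # Inherited helpers handle language choice, context, and submission.
        language = self.render_language_selector("proverb_language")
        text = st.text_area("Proverb:", key="proverb_text")
        context = self.render_cultural_context_form("proverb")
        self.render_submission_section({"text": text}, context, language)

    def validate_content(self, content: Dict[str, Any]) -> Tuple[bool, str]:
        if not content.get("text", "").strip():
            return False, "Proverb text is required"
        return True, ""

    def process_submission(self, data: Dict[str, Any]) -> UserContribution:
        # Field names below are assumptions; data_models.py is not shown here.
        return UserContribution(
            activity_type=self.activity_type,
            language=data["language"],
            content=data["content_data"],
            cultural_context=data["cultural_context"],
        )
```
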
intern_project/corpus_collection_engine/activities/folklore_collector.py DELETED
@@ -1,553 +0,0 @@
- """
- Folklore Collector Activity - Preserve traditional stories, proverbs, and folk wisdom
- """
-
- import streamlit as st
- from typing import Dict, Any, Tuple, List, Optional
- from datetime import datetime
- import re
-
- from corpus_collection_engine.activities.base_activity import BaseActivity
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType
- from corpus_collection_engine.services.storage_service import StorageService
- from corpus_collection_engine.services.ai_service import AIService
-
-
- class FolkloreCollectorActivity(BaseActivity):
-     """Activity for collecting and preserving traditional folklore"""
-
-     def __init__(self):
-         super().__init__(ActivityType.FOLKLORE)
-         self.storage_service = StorageService()
-         self.ai_service = AIService()
-
-         # Folklore types
-         self.folklore_types = {
-             "folktale": "Folk Tale / लोक कथा",
-             "proverb": "Proverb / कहावत",
-             "riddle": "Riddle / पहेली",
-             "song": "Folk Song / लोक गीत",
-             "legend": "Legend / किंवदंती",
-             "myth": "Myth / पुराण कथा",
-             "moral_story": "Moral Story / नैतिक कहानी",
-             "historical_tale": "Historical Tale / ऐतिहासिक कथा",
-             "wisdom_saying": "Wisdom Saying / ज्ञान की बात",
-             "children_story": "Children's Story / बच्चों की कहानी"
-         }
-
-         # Story themes
-         self.story_themes = [
-             "Wisdom / बुद्धिमत्ता",
-             "Courage / साहस",
-             "Love / प्रेम",
-             "Justice / न्याय",
-             "Family / परिवार",
-             "Nature / प्रकृति",
-             "Animals / जानवर",
-             "Gods & Goddesses / देवी-देवता",
-             "Kings & Queens / राजा-रानी",
-             "Farmers / किसान",
-             "Merchants / व्यापारी",
-             "Teachers / गुरु",
-             "Festivals / त्योहार",
-             "Seasons / ऋतुएं",
-             "Good vs Evil / अच्छाई बनाम बुराई"
-         ]
-
-         # Age groups
-         self.age_groups = [
-             "Children (0-12) / बच्चे",
-             "Teenagers (13-18) / किशोर",
-             "Adults (19-60) / वयस्क",
-             "Elders (60+) / बुजुर्ग",
-             "All Ages / सभी उम्र"
-         ]
-
-         # Moral lessons
-         self.moral_categories = [
-             "Honesty / ईमानदारी",
-             "Kindness / दयालुता",
-             "Hard Work / मेहनत",
-             "Respect / सम्मान",
-             "Patience / धैर्य",
-             "Humility / विनम्रता",
-             "Generosity / उदारता",
-             "Forgiveness / क्षमा",
-             "Perseverance / दृढ़ता",
-             "Unity / एकता"
-         ]
-
-     def render_interface(self) -> None:
-         """Render the folklore collector interface"""
-
-         # Step 1: Folklore Type Selection
-         st.subheader("📚 Step 1: What Type of Folklore?")
-
-         folklore_type = st.selectbox(
-             "Select the type of folklore you want to share:",
-             list(self.folklore_types.keys()),
-             format_func=lambda x: self.folklore_types[x],
-             key="folklore_type"
-         )
-
-         # Show description based on type
-         type_descriptions = {
-             "folktale": "Traditional stories passed down through generations",
-             "proverb": "Short sayings that express wisdom or truth",
-             "riddle": "Puzzling questions or statements requiring clever answers",
-             "song": "Traditional songs with cultural significance",
-             "legend": "Stories about historical figures or events",
-             "myth": "Traditional stories explaining natural phenomena",
-             "moral_story": "Stories that teach important life lessons",
-             "historical_tale": "Stories based on historical events",
-             "wisdom_saying": "Wise sayings from elders and ancestors",
-             "children_story": "Stories specifically told to children"
-         }
-
-         st.info(f"📖 {type_descriptions.get(folklore_type, 'Traditional cultural content')}")
-
-         # Step 2: Language Selection
-         language = self.render_language_selector("folklore_language")
-
-         # Step 3: Title and Content
-         st.subheader("✍️ Step 2: Share Your Story")
-
-         title = st.text_input(
-             "Title / शीर्षक:",
-             placeholder=f"Enter the title of your {folklore_type}...",
-             key="folklore_title"
-         )
-
-         # Content input based on type
-         if folklore_type in ["proverb", "wisdom_saying"]:
-             content_placeholder = f"Enter the {folklore_type} in {language}...\n\nExample: 'Early to bed, early to rise, makes a man healthy, wealthy and wise.'"
-             content_height = 100
-         elif folklore_type == "riddle":
-             content_placeholder = f"Enter the riddle and its answer in {language}...\n\nRiddle: What has keys but no locks?\nAnswer: A piano"
-             content_height = 150
-         else:
-             content_placeholder = f"Tell your story in {language}...\n\nOnce upon a time..."
-             content_height = 300
-
-         story_content = st.text_area(
-             f"{folklore_type.replace('_', ' ').title()} Content:",
-             placeholder=content_placeholder,
-             height=content_height,
-             key="folklore_content"
-         )
-
-         # Step 4: Story Details
-         st.subheader("🎭 Step 3: Story Details")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             themes = st.multiselect(
-                 "Themes / विषय:",
-                 self.story_themes,
-                 key="folklore_themes"
-             )
-
-             target_age = st.selectbox(
-                 "Target Age Group / लक्षित आयु समूह:",
-                 self.age_groups,
-                 key="target_age"
-             )
-
-         with col2:
-             moral_lessons = st.multiselect(
-                 "Moral Lessons / नैतिक शिक्षा:",
-                 self.moral_categories,
-                 key="moral_lessons"
-             )
-
-             story_length = st.select_slider(
-                 "Story Length / कहानी की लंबाई:",
-                 options=["Very Short / बहुत छोटी", "Short / छोटी", "Medium / मध्यम", "Long / लंबी"],
-                 value="Medium / मध्यम",
-                 key="story_length"
-             )
-
-         # Step 5: Cultural Context and Source
-         st.subheader("🏛️ Step 4: Cultural Context & Source")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             storyteller = st.text_input(
-                 "Who told you this story? / यह कहानी आपको किसने सुनाई?",
-                 placeholder="e.g., My grandmother, Village elder, Family friend",
-                 key="storyteller"
-             )
-
-             when_heard = st.text_input(
-                 "When did you first hear it? / पहली बार कब सुनी?",
-                 placeholder="e.g., Childhood, During festivals, Family gatherings",
-                 key="when_heard"
-             )
-
-         with col2:
-             occasion = st.text_input(
-                 "Special Occasion / विशेष अवसर:",
-                 placeholder="e.g., Bedtime stories, Festival celebrations, Moral teaching",
-                 key="folklore_occasion"
-             )
-
-             variations = st.text_area(
-                 "Other Versions / अन्य रूप:",
-                 placeholder="Are there other versions of this story you know?",
-                 height=80,
-                 key="story_variations"
-             )
-
-         # Cultural context form
-         cultural_context = self.render_cultural_context_form("folklore_cultural")
-
-         # Add folklore-specific context
-         cultural_context.update({
-             "storyteller": storyteller.strip() if storyteller else "",
-             "when_heard": when_heard.strip() if when_heard else "",
-             "occasion": occasion.strip() if occasion else "",
-             "variations": variations.strip() if variations else ""
-         })
-
-         # Step 6: Meaning and Interpretation
-         st.subheader("💡 Step 5: Meaning & Interpretation")
-
-         meaning = st.text_area(
-             "What does this story mean to you? / इस कहानी का आपके लिए क्या अर्थ है?",
-             placeholder="Explain the deeper meaning, lessons, or significance of this folklore...",
-             height=120,
-             key="story_meaning"
-         )
-
-         modern_relevance = st.text_area(
-             "How is it relevant today? / आज के समय में यह कैसे प्रासंगिक है?",
-             placeholder="How does this story apply to modern life?",
-             height=100,
-             key="modern_relevance"
-         )
-
-         # Step 7: AI Analysis (Optional)
-         if st.checkbox("🤖 Get AI Analysis", key="ai_analysis"):
-             self._render_ai_analysis(story_content, language, folklore_type)
-
-         # Step 8: Preview and Submit
-         st.subheader("👀 Step 6: Preview & Submit")
-
-         if title and story_content:
-             # Show preview
-             with st.expander("📖 Folklore Preview"):
-                 self._render_folklore_preview(
-                     title, folklore_type, story_content, themes, moral_lessons,
-                     target_age, storyteller, meaning, language
-                 )
-
-             # Prepare content data
-             content_data = {
-                 "title": title,
-                 "folklore_type": folklore_type,
-                 "story": story_content,
-                 "themes": themes,
-                 "moral_lessons": moral_lessons,
-                 "target_age": target_age,
-                 "story_length": story_length,
-                 "meaning": meaning,
-                 "modern_relevance": modern_relevance
-             }
-
-             # Submit section
-             contribution = self.render_submission_section(
-                 content_data, cultural_context, language
-             )
-
-             if contribution:
-                 # Save to storage
-                 success = self.storage_service.save_contribution(contribution)
-                 if success:
-                     st.success("🎉 Your folklore has been preserved in the cultural corpus!")
-
-                     # Show impact message
-                     with st.expander("🌟 Why This Matters"):
-                         st.markdown(f"""
-                         Your {folklore_type} in **{language}** helps preserve:
-                         - Traditional wisdom and moral teachings
-                         - Cultural stories and their meanings
-                         - Regional storytelling traditions
-                         - Intergenerational knowledge transfer
-
-                         Thank you for keeping our cultural heritage alive! 📚
-                         """)
-
-                     # Suggest related activities
-                     st.markdown("### 🔗 Keep Contributing!")
-                     col1, col2 = st.columns(2)
-                     with col1:
-                         if st.button("📝 Share Another Story"):
-                             # Clear form
-                             for key in list(st.session_state.keys()):
-                                 if key.startswith('folklore_'):
-                                     del st.session_state[key]
-                             st.rerun()
-
-                     with col2:
-                         if st.button("🍛 Share a Recipe"):
-                             st.session_state.current_activity = ActivityType.RECIPE
-                             st.rerun()
-                 else:
-                     st.error("Failed to save your folklore. Please try again.")
-         else:
-             st.warning("Please provide both a title and the story content!")
-
-     def _render_ai_analysis(self, content: str, language: str, folklore_type: str):
-         """Render AI analysis of the folklore"""
-         if content:
-             st.markdown("**🤖 AI Analysis:**")
-
-             col1, col2 = st.columns(2)
-
-             with col1:
-                 if st.button("🎭 Analyze Themes", key="ai_themes"):
-                     with st.spinner("Analyzing themes..."):
-                         tags = self.ai_service.suggest_cultural_tags(content, language)
-                         if tags:
-                             st.info(f"🎭 **Detected themes:** {', '.join(tags[:6])}")
-
-                 if st.button("💭 Extract Wisdom", key="ai_wisdom"):
-                     with st.spinner("Extracting wisdom..."):
-                         keywords = self.ai_service.extract_keywords(content, language, max_keywords=8)
-                         if keywords:
-                             st.info(f"💭 **Key concepts:** {', '.join(keywords)}")
-
-             with col2:
-                 if st.button("😊 Analyze Sentiment", key="ai_sentiment"):
-                     sentiment = self.ai_service.analyze_sentiment(content, language)
-                     if sentiment:
-                         dominant_sentiment = max(sentiment, key=sentiment.get)
-                         st.info(f"😊 **Overall tone:** {dominant_sentiment.title()} ({sentiment[dominant_sentiment]:.1%})")
-
-                 if st.button("🎯 Suggest Moral", key="ai_moral"):
-                     with st.spinner("Analyzing moral lessons..."):
-                         moral_prompt = f"moral lesson from this {folklore_type}: {content[:200]}"
-                         moral, confidence = self.ai_service.generate_text(moral_prompt, language, max_length=100)
-                         if moral:
-                             st.info(f"🎯 **Suggested moral:** {moral}")
-
-     def _render_folklore_preview(self, title: str, folklore_type: str, content: str,
-                                  themes: List[str], moral_lessons: List[str],
-                                  target_age: str, storyteller: str, meaning: str, language: str):
-         """Render folklore preview"""
-         st.markdown(f"# {title}")
-         st.markdown(f"**Type:** {self.folklore_types[folklore_type]}")
-         st.markdown(f"**Language:** {language}")
-
-         if target_age:
-             st.markdown(f"**Target Age:** {target_age}")
-
-         if storyteller:
-             st.markdown(f"**Storyteller:** {storyteller}")
-
-         st.markdown("## Story Content")
-         st.markdown(content)
-
-         if themes:
-             st.markdown(f"**Themes:** {', '.join(themes)}")
-
-         if moral_lessons:
-             st.markdown(f"**Moral Lessons:** {', '.join(moral_lessons)}")
-
-         if meaning:
-             st.markdown("## Meaning & Significance")
-             st.markdown(meaning)
-
-     def validate_content(self, content: Dict[str, Any]) -> Tuple[bool, str]:
-         """Validate folklore content"""
-         # Check required fields
-         if not content.get("title"):
-             return False, "Folklore must have a title"
-
-         if not content.get("story"):
-             return False, "Folklore must include the story content"
-
-         # Validate title
-         title = content["title"].strip()
-         if len(title) < 3:
-             return False, "Title must be at least 3 characters long"
-         if len(title) > 150:
-             return False, "Title must be less than 150 characters"
-
-         # Validate story content
-         story = content["story"].strip()
-         if len(story) < 50:
-             return False, "Story content must be at least 50 characters long"
-         if len(story) > 10000:
-             return False, "Story content must be less than 10,000 characters"
-
-         # Check for folklore type
-         if not content.get("folklore_type"):
-             return False, "Folklore type must be specified"
-
-         if content["folklore_type"] not in self.folklore_types:
-             return False, "Invalid folklore type"
-
-         # Validate content based on type
-         folklore_type = content["folklore_type"]
-
-         if folklore_type == "proverb" and len(story) > 500:
-             return False, "Proverbs should be concise (less than 500 characters)"
-
-         if folklore_type == "riddle":
-             # Check if riddle has both question and answer
-             if "?" not in story:
-                 return False, "Riddles should contain a question"
-
-         return True, "Valid folklore content"
-
-     def process_submission(self, data: Dict[str, Any]) -> UserContribution:
-         """Process folklore submission and create UserContribution"""
-         # Get session ID from router if available
-         session_id = st.session_state.get('user_session_id', 'anonymous')
-
-         # Calculate content statistics
-         story_content = data["content_data"].get("story", "")
-         word_count = len(story_content.split())
-         char_count = len(story_content)
-
-         return UserContribution(
-             user_session=session_id,
-             activity_type=self.activity_type,
-             content_data=data["content_data"],
-             language=data["language"],
-             cultural_context=data["cultural_context"],
-             metadata={
-                 "folklore_type": data["content_data"].get("folklore_type"),
-                 "word_count": word_count,
-                 "character_count": char_count,
-                 "themes_count": len(data["content_data"].get("themes", [])),
-                 "moral_lessons_count": len(data["content_data"].get("moral_lessons", [])),
-                 "target_age": data["content_data"].get("target_age"),
-                 "story_length": data["content_data"].get("story_length"),
-                 "has_meaning": bool(data["content_data"].get("meaning", "").strip()),
-                 "has_modern_relevance": bool(data["content_data"].get("modern_relevance", "").strip()),
-                 "submission_timestamp": datetime.now().isoformat(),
-                 "activity_version": "1.0"
-             }
-         )
-
-     def render_folklore_gallery(self):
-         """Render gallery of recent folklore"""
-         st.subheader("📚 Community Folklore Collection")
-
-         # Get recent folklore from storage
-         recent_contributions = self.storage_service.get_contributions_by_language(
-             st.session_state.get('selected_language', 'hi'), limit=12
-         )
-
-         folklore_contributions = [
-             contrib for contrib in recent_contributions
-             if contrib.activity_type == ActivityType.FOLKLORE
-         ]
-
-         if folklore_contributions:
-             # Group by folklore type
-             folklore_by_type = {}
-             for contrib in folklore_contributions:
-                 folklore_type = contrib.content_data.get('folklore_type', 'unknown')
-                 if folklore_type not in folklore_by_type:
-                     folklore_by_type[folklore_type] = []
-                 folklore_by_type[folklore_type].append(contrib)
-
-             # Display by type
-             for folklore_type, contributions in folklore_by_type.items():
-                 if folklore_type in self.folklore_types:
-                     st.markdown(f"### {self.folklore_types[folklore_type]}")
-
-                     cols = st.columns(min(3, len(contributions)))
-                     for i, contrib in enumerate(contributions[:3]):
-                         col = cols[i % 3]
-                         with col:
-                             with st.container():
-                                 st.markdown(f"**{contrib.content_data.get('title', 'Untitled')}**")
-
-                                 # Story preview
-                                 story = contrib.content_data.get('story', '')
-                                 if story:
-                                     preview = story[:100] + "..." if len(story) > 100 else story
-                                     st.markdown(f"*{preview}*")
-
-                                 # Metadata
-                                 st.markdown(f"🌐 {contrib.language}")
-                                 if contrib.cultural_context.get("region"):
-                                     st.markdown(f"📍 {contrib.cultural_context['region']}")
-
-                                 # Themes
-                                 themes = contrib.content_data.get('themes', [])
-                                 if themes:
-                                     st.markdown(f"🎭 {', '.join(themes[:2])}")
-
-                                 # Storyteller
-                                 storyteller = contrib.cultural_context.get('storyteller', '')
-                                 if storyteller:
-                                     st.markdown(f"👤 Told by: {storyteller}")
-
-                                 st.markdown("---")
-         else:
-             st.info("No folklore yet. Be the first to share a traditional story! 📚")
-
-     def render_folklore_statistics(self):
-         """Render folklore collection statistics"""
-         st.subheader("📊 Folklore Statistics")
-
-         # Get all folklore contributions
-         all_contributions = []
-         for lang in ["hi", "bn", "ta", "te", "ml", "kn", "gu", "mr", "pa", "or", "en"]:
-             contributions = self.storage_service.get_contributions_by_language(lang, limit=1000)
-             folklore_contribs = [c for c in contributions if c.activity_type == ActivityType.FOLKLORE]
-             all_contributions.extend(folklore_contribs)
-
-         if all_contributions:
-             # Statistics by type
-             type_counts = {}
-             for contrib in all_contributions:
-                 folklore_type = contrib.content_data.get('folklore_type', 'unknown')
-                 type_counts[folklore_type] = type_counts.get(folklore_type, 0) + 1
-
-             # Display statistics
-             col1, col2 = st.columns(2)
-
-             with col1:
-                 st.markdown("**By Type:**")
-                 for folklore_type, count in sorted(type_counts.items(), key=lambda x: x[1], reverse=True):
-                     if folklore_type in self.folklore_types:
-                         type_name = self.folklore_types[folklore_type].split(' / ')[0]
-                         st.markdown(f"- {type_name}: {count}")
-
-             with col2:
-                 # Language distribution
-                 lang_counts = {}
-                 for contrib in all_contributions:
-                     lang = contrib.language
-                     lang_counts[lang] = lang_counts.get(lang, 0) + 1
-
-                 st.markdown("**By Language:**")
-                 for lang, count in sorted(lang_counts.items(), key=lambda x: x[1], reverse=True)[:5]:
-                     from corpus_collection_engine.config import SUPPORTED_LANGUAGES
-                     lang_name = SUPPORTED_LANGUAGES.get(lang, lang)
-                     st.markdown(f"- {lang_name}: {count}")
-         else:
-             st.info("No folklore statistics available yet.")
-
-     def run(self):
-         """Override run method to add gallery and statistics"""
-         super().run()
-
-         # Add gallery and statistics sections
-         st.markdown("---")
-
-         tab1, tab2 = st.tabs(["📚 Community Gallery", "📊 Statistics"])
-
-         with tab1:
-             self.render_folklore_gallery()
-
-         with tab2:
-             self.render_folklore_statistics()
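`render_folklore_statistics` tallies counts with hand-rolled dicts; the same aggregation can be done in one pass with `collections.Counter`. A small sketch, assuming only the contribution attributes used above:

```python
from collections import Counter

def folklore_tallies(contributions):
    """One-pass equivalent of the dict counting in render_folklore_statistics."""
    type_counts = Counter(
        c.content_data.get("folklore_type", "unknown") for c in contributions
    )
    lang_counts = Counter(c.language for c in contributions)
    # most_common() already returns (key, count) pairs sorted descending
    return type_counts.most_common(), lang_counts.most_common(5)
```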
intern_project/corpus_collection_engine/activities/landmark_identifier.py DELETED
@@ -1,535 +0,0 @@
- """
- Landmark Identifier Activity - Upload photos of cultural landmarks with descriptions
- """
-
- import streamlit as st
- from typing import Dict, Any, Tuple, List, Optional
- from datetime import datetime
- import base64
- from PIL import Image
- import io
-
- from corpus_collection_engine.activities.base_activity import BaseActivity
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType
- from corpus_collection_engine.services.storage_service import StorageService
- from corpus_collection_engine.services.ai_service import AIService
-
-
- class LandmarkIdentifierActivity(BaseActivity):
-     """Activity for documenting cultural landmarks with photos and descriptions"""
-
-     def __init__(self):
-         super().__init__(ActivityType.LANDMARK)
-         self.storage_service = StorageService()
-         self.ai_service = AIService()
-
-         # Landmark categories
-         self.landmark_categories = {
-             "temple": "Temple / मंदिर",
-             "mosque": "Mosque / मस्जिद",
-             "church": "Church / गिरजाघर",
-             "gurudwara": "Gurudwara / गुरुद्वारा",
-             "monument": "Monument / स्मारक",
-             "palace": "Palace / महल",
-             "fort": "Fort / किला",
-             "museum": "Museum / संग्रहालय",
-             "heritage_site": "Heritage Site / विरासत स्थल",
-             "market": "Traditional Market / पारंपरिक बाजार",
-             "garden": "Garden / बगीचा",
-             "lake": "Lake / झील",
-             "mountain": "Mountain / पहाड़",
-             "river": "River / नदी",
-             "village": "Village / गांव",
-             "architecture": "Architecture / वास्तुकला",
-             "cultural_center": "Cultural Center / सांस्कृतिक केंद्र",
-             "festival_ground": "Festival Ground / त्योहार स्थल",
-             "other": "Other / अन्य"
-         }
-
-         # Historical periods
-         self.historical_periods = [
-             "Ancient (Before 500 CE) / प्राचीन",
-             "Classical (500-1200 CE) / शास्त्रीय",
-             "Medieval (1200-1700 CE) / मध्यकालीन",
-             "Colonial (1700-1947 CE) / औपनिवेशिक",
-             "Modern (1947-Present) / आधुनिक",
-             "Unknown / अज्ञात"
-         ]
-
-         # Architectural styles
-         self.architectural_styles = [
-             "Dravidian / द्रविड़",
-             "Nagara / नागर",
-             "Indo-Islamic / इंडो-इस्लामिक",
-             "Mughal / मुगल",
-             "Rajput / राजपूत",
-             "Colonial / औपनिवेशिक",
-             "Modern / आधुनिक",
-             "Vernacular / स्थानीय",
-             "Mixed / मिश्रित",
-             "Other / अन्य"
-         ]
-
-         # Significance types
-         self.significance_types = [
-             "Religious / धार्मिक",
-             "Historical / ऐतिहासिक",
-             "Cultural / सांस्कृतिक",
-             "Architectural / वास्तुकला",
-             "Natural / प्राकृतिक",
-             "Educational / शैक्षणिक",
-             "Commercial / व्यावसायिक",
-             "Social / सामाजिक",
-             "Political / राजनीतिक",
-             "Artistic / कलात्मक"
-         ]
-
-     def render_interface(self) -> None:
-         """Render the landmark identifier interface"""
-
-         # Step 1: Photo Upload
-         st.subheader("📸 Step 1: Upload Landmark Photo")
-
-         uploaded_image = st.file_uploader(
-             "Choose a photo of the landmark:",
-             type=['png', 'jpg', 'jpeg', 'webp'],
-             key="landmark_image_upload",
-             help="Upload a clear photo of the cultural landmark you want to document"
-         )
-
-         if uploaded_image:
-             # Display uploaded image
-             image = Image.open(uploaded_image)
-
-             # Resize for display if too large
-             display_image = image.copy()
-             display_image.thumbnail((800, 600), Image.Resampling.LANCZOS)
-
-             st.image(display_image, caption="Uploaded Landmark Photo", use_container_width=True)
-
-             # Image metadata
-             col1, col2, col3 = st.columns(3)
-             with col1:
-                 st.metric("Width", f"{image.width}px")
-             with col2:
-                 st.metric("Height", f"{image.height}px")
-             with col3:
-                 file_size = len(uploaded_image.getvalue()) / 1024
-                 st.metric("Size", f"{file_size:.1f} KB")
-
-         # Step 2: Basic Information
-         st.subheader("ℹ️ Step 2: Basic Information")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             landmark_name = st.text_input(
-                 "Landmark Name / स्थल का नाम:",
-                 placeholder="e.g., Red Fort, Meenakshi Temple",
-                 key="landmark_name"
-             )
-
-             category = st.selectbox(
-                 "Category / श्रेणी:",
-                 list(self.landmark_categories.keys()),
-                 format_func=lambda x: self.landmark_categories[x],
-                 key="landmark_category"
-             )
-
-         with col2:
-             location = st.text_input(
-                 "Location / स्थान:",
-                 placeholder="e.g., Delhi, Madurai, Tamil Nadu",
-                 key="landmark_location"
-             )
-
-             historical_period = st.selectbox(
-                 "Historical Period / ऐतिहासिक काल:",
-                 self.historical_periods,
-                 key="historical_period"
-             )
-
-         # Step 3: Language Selection
-         language = self.render_language_selector("landmark_language")
-
-         # Step 4: Description
-         st.subheader("📝 Step 3: Description")
-
-         description = st.text_area(
-             f"Describe the landmark in {language}:",
-             placeholder=f"Write a detailed description of the landmark in {language}...\n\nInclude:\n- What you see in the photo\n- Historical significance\n- Architectural features\n- Cultural importance\n- Personal experience or memories",
-             height=200,
-             key="landmark_description"
-         )
-
-         # Step 5: Detailed Information
-         st.subheader("🏛️ Step 4: Detailed Information")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             architectural_style = st.multiselect(
-                 "Architectural Style / वास्तुकला शैली:",
-                 self.architectural_styles,
-                 key="architectural_style"
-             )
-
-             significance = st.multiselect(
-                 "Significance / महत्व:",
-                 self.significance_types,
-                 key="landmark_significance"
-             )
-
-         with col2:
-             built_by = st.text_input(
-                 "Built by / निर्माता:",
-                 placeholder="e.g., Shah Jahan, Chola Dynasty",
-                 key="built_by"
-             )
-
-             year_built = st.text_input(
-                 "Year Built / निर्माण वर्ष:",
-                 placeholder="e.g., 1648, 12th Century",
-                 key="year_built"
-             )
-
-         # Step 6: Cultural Context
-         st.subheader("🎭 Step 5: Cultural Context")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             festivals_events = st.text_area(
-                 "Festivals/Events / त्योहार/कार्यक्रम:",
-                 placeholder="What festivals or events happen here?",
-                 height=100,
-                 key="festivals_events"
-             )
-
-             local_legends = st.text_area(
-                 "Local Legends/Stories / स्थानीय किंवदंतियां:",
-                 placeholder="Any interesting stories or legends about this place?",
-                 height=100,
-                 key="local_legends"
-             )
-
-         with col2:
-             visiting_tips = st.text_area(
-                 "Visiting Tips / यात्रा सुझाव:",
-                 placeholder="Best time to visit, entry fees, special rules, etc.",
-                 height=100,
-                 key="visiting_tips"
-             )
-
-             personal_experience = st.text_area(
-                 "Personal Experience / व्यक्तिगत अनुभव:",
-                 placeholder="Your personal experience or connection to this place",
-                 height=100,
-                 key="personal_experience"
-             )
-
-         # Cultural context form
-         cultural_context = self.render_cultural_context_form("landmark_cultural")
-
-         # Add landmark-specific context
-         cultural_context.update({
-             "festivals_events": festivals_events.strip() if festivals_events else "",
-             "local_legends": local_legends.strip() if local_legends else "",
-             "visiting_tips": visiting_tips.strip() if visiting_tips else "",
-             "personal_experience": personal_experience.strip() if personal_experience else "",
-             "built_by": built_by.strip() if built_by else "",
-             "year_built": year_built.strip() if year_built else ""
-         })
-
-         # Step 7: AI Analysis (Optional)
-         if uploaded_image and st.checkbox("🤖 Get AI Analysis", key="ai_analysis"):
-             self._render_ai_analysis(uploaded_image, description, language)
-
-         # Step 8: Preview and Submit
-         st.subheader("👀 Step 6: Preview & Submit")
-
-         if uploaded_image and landmark_name and description:
-             # Show preview
-             with st.expander("🏛️ Landmark Preview"):
-                 self._render_landmark_preview(
-                     landmark_name, category, location, description,
-                     architectural_style, significance, historical_period,
-                     built_by, year_built, language, display_image
-                 )
-
-             # Prepare content data
-             content_data = {
-                 "name": landmark_name,
-                 "category": category,
-                 "location": location,
-                 "description": description,
-                 "historical_period": historical_period,
-                 "architectural_style": architectural_style,
-                 "significance": significance,
-                 "image_data": self._image_to_base64(image)
-             }
-
-             # Submit section
-             contribution = self.render_submission_section(
-                 content_data, cultural_context, language
-             )
-
-             if contribution:
-                 # Save to storage
-                 success = self.storage_service.save_contribution(contribution)
-                 if success:
-                     st.success("🎉 Your landmark has been documented in the cultural corpus!")
-
-                     # Show impact message
-                     with st.expander("🌟 Why This Matters"):
-                         st.markdown(f"""
-                         Your landmark documentation in **{language}** helps preserve:
-                         - Visual and textual records of cultural heritage
-                         - Architectural and historical knowledge
-                         - Local stories and cultural significance
-                         - Tourism and educational resources
-
-                         Thank you for documenting India's rich cultural landscape! 🏛️
-                         """)
-
-                     # Clear form
-                     if st.button("📸 Document Another Landmark"):
-                         # Clear session state
-                         for key in list(st.session_state.keys()):
-                             if key.startswith('landmark_'):
-                                 del st.session_state[key]
-                         st.rerun()
-                 else:
-                     st.error("Failed to save your landmark documentation. Please try again.")
-         else:
-             missing_items = []
-             if not uploaded_image:
-                 missing_items.append("photo")
-             if not landmark_name:
-                 missing_items.append("landmark name")
-             if not description:
-                 missing_items.append("description")
-
-             st.warning(f"Please provide: {', '.join(missing_items)}")
-
-     def _render_ai_analysis(self, uploaded_image: Any, description: str, language: str):
-         """Render AI analysis of the landmark"""
-         st.markdown("**🤖 AI Analysis:**")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             if st.button("🏛️ Analyze Architecture", key="ai_architecture"):
-                 with st.spinner("Analyzing architectural features..."):
-                     # Generate architectural analysis
-                     arch_prompt = f"architectural features of this landmark: {description[:200]}"
-                     analysis, confidence = self.ai_service.generate_text(arch_prompt, language, max_length=150)
-                     if analysis:
-                         st.info(f"🏛️ **Architecture:** {analysis}")
-
-             if st.button("📍 Extract Location Info", key="ai_location"):
-                 with st.spinner("Extracting location information..."):
-                     keywords = self.ai_service.extract_keywords(description, language, max_keywords=6)
-                     if keywords:
-                         st.info(f"📍 **Key features:** {', '.join(keywords)}")
-
-         with col2:
-             if st.button("🎭 Analyze Cultural Significance", key="ai_culture"):
-                 with st.spinner("Analyzing cultural significance..."):
-                     tags = self.ai_service.suggest_cultural_tags(description, language)
-                     if tags:
-                         st.info(f"🎭 **Cultural tags:** {', '.join(tags[:5])}")
-
-             if st.button("📚 Generate Caption", key="ai_caption"):
-                 with st.spinner("Generating photo caption..."):
-                     caption, confidence = self.ai_service.generate_caption(description[:100], language)
-                     if caption:
-                         st.info(f"📚 **Suggested caption:** {caption}")
-
-     def _render_landmark_preview(self, name: str, category: str, location: str,
-                                  description: str, architectural_style: List[str],
-                                  significance: List[str], historical_period: str,
-                                  built_by: str, year_built: str, language: str, image: Image.Image):
-         """Render landmark preview"""
-         st.image(image, caption=name, use_container_width=True)
-
-         st.markdown(f"# {name}")
-         st.markdown(f"**Category:** {self.landmark_categories[category]}")
-         st.markdown(f"**Location:** {location}")
-         st.markdown(f"**Language:** {language}")
-
-         if historical_period:
-             st.markdown(f"**Historical Period:** {historical_period}")
-
-         if built_by:
-             st.markdown(f"**Built by:** {built_by}")
-
-         if year_built:
-             st.markdown(f"**Year Built:** {year_built}")
-
-         if architectural_style:
-             st.markdown(f"**Architectural Style:** {', '.join(architectural_style)}")
-
-         if significance:
-             st.markdown(f"**Significance:** {', '.join(significance)}")
-
-         st.markdown("## Description")
-         st.markdown(description)
-
-     def _image_to_base64(self, image: Image.Image) -> str:
-         """Convert PIL Image to base64 string"""
-         buffer = io.BytesIO()
-         # Convert to RGB if necessary
-         if image.mode in ('RGBA', 'LA', 'P'):
-             image = image.convert('RGB')
-         image.save(buffer, format="JPEG", quality=85)
-         img_str = base64.b64encode(buffer.getvalue()).decode()
-         return img_str
-
-     def validate_content(self, content: Dict[str, Any]) -> Tuple[bool, str]:
-         """Validate landmark content"""
-         # Check required fields
-         required_fields = ["name", "description"]
-         for field in required_fields:
-             if not content.get(field):
-                 return False, f"Landmark must include {field}"
-
-         # Validate name
-         name = content["name"].strip()
-         if len(name) < 3:
-             return False, "Landmark name must be at least 3 characters long"
-         if len(name) > 100:
-             return False, "Landmark name must be less than 100 characters"
-
-         # Validate description
-         description = content["description"].strip()
-         if len(description) < 20:
-             return False, "Description must be at least 20 characters long"
-         if len(description) > 3000:
-             return False, "Description must be less than 3000 characters"
-
-         # Check category
-         if not content.get("category"):
-             return False, "Landmark category must be specified"
-
-         if content["category"] not in self.landmark_categories:
-             return False, "Invalid landmark category"
-
-         # Validate image data
-         if not content.get("image_data"):
-             return False, "Landmark photo is required"
-
-         return True, "Valid landmark content"
-
-     def process_submission(self, data: Dict[str, Any]) -> UserContribution:
-         """Process landmark submission and create UserContribution"""
-         # Get session ID from router if available
-         session_id = st.session_state.get('user_session_id', 'anonymous')
-
-         # Calculate content statistics
-         description = data["content_data"].get("description", "")
-         word_count = len(description.split())
-         char_count = len(description)
-
-         return UserContribution(
-             user_session=session_id,
-             activity_type=self.activity_type,
-             content_data=data["content_data"],
-             language=data["language"],
-             cultural_context=data["cultural_context"],
-             metadata={
-                 "landmark_category": data["content_data"].get("category"),
-                 "location": data["content_data"].get("location"),
-                 "historical_period": data["content_data"].get("historical_period"),
-                 "architectural_styles": data["content_data"].get("architectural_style", []),
-                 "significance_types": data["content_data"].get("significance", []),
-                 "word_count": word_count,
-                 "character_count": char_count,
-                 "has_image": bool(data["content_data"].get("image_data")),
-                 "has_location": bool(data["content_data"].get("location", "").strip()),
-                 "has_historical_info": bool(data["cultural_context"].get("built_by", "").strip() or data["cultural_context"].get("year_built", "").strip()),
-                 "submission_timestamp": datetime.now().isoformat(),
-                 "activity_version": "1.0"
-             }
-         )
-
-     def render_landmark_gallery(self):
-         """Render gallery of documented landmarks"""
-         st.subheader("🏛️ Community Landmark Collection")
-
-         # Get recent landmarks from storage
-         recent_contributions = self.storage_service.get_contributions_by_language(
-             st.session_state.get('selected_language', 'hi'), limit=12
-         )
-
-         landmark_contributions = [
-             contrib for contrib in recent_contributions
-             if contrib.activity_type == ActivityType.LANDMARK
-         ]
-
-         if landmark_contributions:
-             # Display landmarks in grid
-             cols = st.columns(3)
-             for i, contrib in enumerate(landmark_contributions[:12]):
-                 col = cols[i % 3]
-                 with col:
-                     with st.container():
-                         # Display image if available
-                         image_data = contrib.content_data.get('image_data')
-                         if image_data:
-                             try:
-                                 image_bytes = base64.b64decode(image_data)
-                                 image = Image.open(io.BytesIO(image_bytes))
-                                 st.image(image, use_container_width=True)
-                             except Exception:
-                                 st.info("📸 Image not available")
-
-                         st.markdown(f"**{contrib.content_data.get('name', 'Unnamed Landmark')}**")
-
-                         # Category and location
-                         category = contrib.content_data.get('category', 'unknown')
-                         if category in self.landmark_categories:
-                             st.markdown(f"*{self.landmark_categories[category]}*")
-
-                         location = contrib.content_data.get('location', '')
-                         if location:
-                             st.markdown(f"📍 {location}")
-
-                         # Description preview
-                         description = contrib.content_data.get('description', '')
-                         if description:
-                             preview = description[:80] + "..." if len(description) > 80 else description
-                             st.markdown(f"📝 {preview}")
-
-                         # Language and region
-                         st.markdown(f"🌐 {contrib.language}")
-                         if contrib.cultural_context.get("region"):
-                             st.markdown(f"🏛️ {contrib.cultural_context['region']}")
-
-                         st.markdown("---")
-         else:
-             st.info("No landmarks documented yet. Be the first to share a cultural landmark! 📸")
-
-     def render_landmark_map(self):
-         """Render interactive map of landmarks (placeholder)"""
-         st.subheader("🗺️ Landmark Map")
-         st.info("Interactive map feature coming soon! This will show all documented landmarks on a map of India.")
-
-         # Placeholder for map functionality
-         # In a full implementation, you would integrate with mapping libraries
-         # like Folium, Plotly, or Google Maps to show landmark locations
-
-     def run(self):
-         """Override run method to add gallery and map options"""
-         super().run()
-
-         # Add gallery and map sections
-         st.markdown("---")
-
-         tab1, tab2 = st.tabs(["🏛️ Community Gallery", "🗺️ Landmark Map"])
-
-         with tab1:
-             self.render_landmark_gallery()
-
-         with tab2:
-             self.render_landmark_map()
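This file stores photos inline as base64-encoded JPEG (`_image_to_base64`) and decodes them again in the gallery. The round-trip in isolation, as a self-contained sketch using only Pillow and the standard library:

```python
import base64
import io

from PIL import Image

def to_b64(img: Image.Image) -> str:
    if img.mode in ("RGBA", "LA", "P"):
        img = img.convert("RGB")  # JPEG has no alpha channel
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=85)
    return base64.b64encode(buf.getvalue()).decode()

def from_b64(data: str) -> Image.Image:
    return Image.open(io.BytesIO(base64.b64decode(data)))

# Round-trip check
original = Image.new("RGB", (64, 64), "orange")
assert from_b64(to_b64(original)).size == (64, 64)
```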
intern_project/corpus_collection_engine/activities/meme_creator.py DELETED
@@ -1,331 +0,0 @@
- """
- Meme Creator Activity - Create memes with captions in local dialects
- """
-
- import streamlit as st
- import io
- from PIL import Image, ImageDraw, ImageFont
- from typing import Dict, Any, Tuple, Optional
- import base64
- from datetime import datetime
-
- from corpus_collection_engine.activities.base_activity import BaseActivity
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType
- from corpus_collection_engine.services.storage_service import StorageService
-
-
- class MemeCreatorActivity(BaseActivity):
-     """Activity for creating memes with cultural captions"""
-
-     def __init__(self):
-         super().__init__(ActivityType.MEME)
-         self.storage_service = StorageService()
-
-         # Default meme templates (placeholder images)
-         self.meme_templates = {
-             "distracted_boyfriend": {
-                 "name": "Distracted Boyfriend",
-                 "description": "Classic meme template",
-                 "text_positions": [(100, 50), (300, 50), (500, 50)]
-             },
-             "drake_pointing": {
-                 "name": "Drake Pointing",
-                 "description": "Drake approval/disapproval meme",
-                 "text_positions": [(200, 100), (200, 300)]
-             },
-             "woman_yelling_cat": {
-                 "name": "Woman Yelling at Cat",
-                 "description": "Woman pointing at confused cat",
-                 "text_positions": [(100, 50), (400, 50)]
-             },
-             "custom": {
-                 "name": "Upload Your Own",
-                 "description": "Upload your own image",
-                 "text_positions": [(100, 50)]
-             }
-         }
-
-     def render_interface(self) -> None:
-         """Render the meme creator interface"""
-         # Step 1: Template Selection
-         st.subheader("🎭 Step 1: Choose Meme Template")
-
-         template_options = list(self.meme_templates.keys())
-         template_labels = [self.meme_templates[key]["name"] for key in template_options]
-
-         selected_template = st.selectbox(
-             "Select a meme template:",
-             template_options,
-             format_func=lambda x: self.meme_templates[x]["name"],
-             key="meme_template"
-         )
-
-         st.info(f"📝 {self.meme_templates[selected_template]['description']}")
-
-         # Step 2: Image Upload (if custom template)
-         uploaded_image = None
-         if selected_template == "custom":
-             st.subheader("📸 Step 2: Upload Your Image")
-             uploaded_image = st.file_uploader(
-                 "Choose an image file",
-                 type=['png', 'jpg', 'jpeg', 'webp'],
-                 key="meme_image_upload"
-             )
-
-             if uploaded_image:
-                 # Display uploaded image
-                 image = Image.open(uploaded_image)
-                 st.image(image, caption="Uploaded Image", use_container_width=True)
-         else:
-             # Show placeholder for template
-             st.subheader("📸 Step 2: Template Preview")
-             st.info(f"Using template: {self.meme_templates[selected_template]['name']}")
-             # In a real implementation, you'd show the actual template image
-             st.image("https://via.placeholder.com/500x300/cccccc/666666?text=Meme+Template",
-                      caption=f"Template: {self.meme_templates[selected_template]['name']}")
-
-         # Step 3: Text Input
-         st.subheader("✍️ Step 3: Add Your Caption")
-
-         # Language selection
-         language = self.render_language_selector("meme_language")
-
-         # Text inputs based on template
-         text_inputs = []
-         num_texts = len(self.meme_templates[selected_template]["text_positions"])
-
-         for i in range(num_texts):
-             text_label = f"Text {i+1}" if num_texts > 1 else "Caption"
-             text_input = st.text_area(
-                 f"{text_label}:",
-                 placeholder=f"Enter your {text_label.lower()} in {language}...",
-                 key=f"meme_text_{i}",
-                 height=80
-             )
-             text_inputs.append(text_input)
-
-         # Step 4: Meme Style Options
-         st.subheader("🎨 Step 4: Style Options")
-
-         col1, col2 = st.columns(2)
-         with col1:
-             font_size = st.slider("Font Size", 20, 60, 40, key="meme_font_size")
-             text_color = st.color_picker("Text Color", "#FFFFFF", key="meme_text_color")
-
-         with col2:
-             outline_color = st.color_picker("Outline Color", "#000000", key="meme_outline_color")
-             outline_width = st.slider("Outline Width", 0, 5, 2, key="meme_outline_width")
-
-         # Step 5: Cultural Context
-         cultural_context = self.render_cultural_context_form("meme_cultural")
-
-         # Step 6: Preview and Generate
-         st.subheader("👀 Step 5: Preview & Generate")
-
-         if st.button("🎨 Generate Meme Preview", key="generate_meme"):
-             if any(text.strip() for text in text_inputs):
-                 # Generate meme preview
-                 meme_image = self._generate_meme(
-                     selected_template, uploaded_image, text_inputs,
-                     font_size, text_color, outline_color, outline_width
-                 )
-
-                 if meme_image:
-                     st.image(meme_image, caption="Your Meme", use_container_width=True)
-
-                     # Prepare content data
-                     content_data = {
-                         "template": selected_template,
-                         "texts": [text.strip() for text in text_inputs if text.strip()],
-                         "style": {
-                             "font_size": font_size,
-                             "text_color": text_color,
-                             "outline_color": outline_color,
-                             "outline_width": outline_width
-                         },
-                         "image_data": self._image_to_base64(meme_image)
-                     }
-
-                     # Step 7: Submit
-                     contribution = self.render_submission_section(
-                         content_data, cultural_context, language
-                     )
-
-                     if contribution:
-                         # Save to storage
-                         success = self.storage_service.save_contribution(contribution)
-                         if success:
-                             st.success("🎉 Your meme has been added to the cultural corpus!")
-
-                             # Show some engagement
-                             with st.expander("🌟 Why This Matters"):
-                                 st.markdown(f"""
-                                 Your meme in **{language}** helps preserve:
-                                 - Local humor and cultural references
-                                 - Language expressions and slang
-                                 - Regional perspectives and contexts
-                                 - Digital cultural artifacts
-
-                                 Thank you for contributing to India's digital heritage! 🇮🇳
-                                 """)
-                         else:
-                             st.error("Failed to save your meme. Please try again.")
-             else:
-                 st.warning("Please add at least one caption to generate your meme!")
-
-     def _generate_meme(self, template: str, uploaded_image: Optional[Any],
-                        texts: list, font_size: int, text_color: str,
-                        outline_color: str, outline_width: int) -> Optional[Image.Image]:
-         """Generate meme image with text overlay"""
-         try:
-             # Create base image
-             if template == "custom" and uploaded_image:
-                 base_image = Image.open(uploaded_image).convert("RGB")
-             else:
-                 # Create placeholder image for template
-                 base_image = Image.new("RGB", (500, 300), color="lightgray")
-                 draw = ImageDraw.Draw(base_image)
-                 draw.text((250, 150), f"Template: {template}",
-                           fill="black", anchor="mm")
-
-             # Resize if too large
-             max_size = (800, 600)
-             base_image.thumbnail(max_size, Image.Resampling.LANCZOS)
-
-             # Create drawing context
-             draw = ImageDraw.Draw(base_image)
-
-             # Try to use a better font (fallback to default if not available)
-             try:
-                 # Try to load a system font that supports Unicode
-                 font = ImageFont.truetype("arial.ttf", font_size)
-             except Exception:
-                 try:
-                     font = ImageFont.load_default()
-                 except Exception:
-                     font = None
-
-             # Get text positions for this template
-             positions = self.meme_templates[template]["text_positions"]
-
-             # Add text overlays
-             for i, text in enumerate(texts):
-                 if text.strip() and i < len(positions):
-                     x, y = positions[i]
-
-                     # Adjust position based on image size
-                     img_width, img_height = base_image.size
-                     x = min(x, img_width - 50)
-                     y = min(y, img_height - 50)
-
-                     # Draw text with outline
-                     if outline_width > 0:
-                         # Draw outline
-                         for adj_x in range(-outline_width, outline_width + 1):
-                             for adj_y in range(-outline_width, outline_width + 1):
-                                 if adj_x != 0 or adj_y != 0:
-                                     draw.text((x + adj_x, y + adj_y), text,
-                                               font=font, fill=outline_color)
-
-                     # Draw main text
-                     draw.text((x, y), text, font=font, fill=text_color)
-
-             return base_image
-
-         except Exception as e:
-             st.error(f"Error generating meme: {str(e)}")
-             return None
-
-     def _image_to_base64(self, image: Image.Image) -> str:
-         """Convert PIL Image to base64 string"""
-         buffer = io.BytesIO()
-         image.save(buffer, format="PNG")
-         img_str = base64.b64encode(buffer.getvalue()).decode()
-         return img_str
-
-     def validate_content(self, content: Dict[str, Any]) -> Tuple[bool, str]:
-         """Validate meme content"""
-         if not content.get("texts"):
-             return False, "Meme must have at least one text caption"
-
-         # Check if any text is provided
-         texts = content.get("texts", [])
-         if not any(text.strip() for text in texts):
-             return False, "At least one caption must contain text"
-
-         # Validate text length
-         for text in texts:
-             if text.strip() and len(text.strip()) < 2:
-                 return False, "Captions must be at least 2 characters long"
-             if len(text) > 200:
-                 return False, "Captions must be less than 200 characters"
-
-         # Check template
-         if not content.get("template"):
-             return False, "Meme template must be specified"
-
-         if content["template"] not in self.meme_templates:
-             return False, "Invalid meme template"
-
-         return True, "Valid meme content"
-
-     def process_submission(self, data: Dict[str, Any]) -> UserContribution:
-         """Process meme submission and create UserContribution"""
-         # Get session ID from router if available
-         session_id = st.session_state.get('user_session_id', 'anonymous')
-
-         return UserContribution(
-             user_session=session_id,
-             activity_type=self.activity_type,
-             content_data=data["content_data"],
-             language=data["language"],
-             cultural_context=data["cultural_context"],
-             metadata={
-                 "template_used": data["content_data"].get("template"),
-                 "num_captions": len(data["content_data"].get("texts", [])),
-                 "submission_timestamp": datetime.now().isoformat(),
-                 "activity_version": "1.0"
-             }
-         )
-
-     def render_meme_gallery(self):
-         """Render gallery of recent memes (optional feature)"""
-         st.subheader("🖼️ Recent Community Memes")
-
-         # Get recent memes from storage
-         recent_contributions = self.storage_service.get_contributions_by_language(
-             st.session_state.get('selected_language', 'hi'), limit=6
-         )
-
-         meme_contributions = [
-             contrib for contrib in recent_contributions
-             if contrib.activity_type == ActivityType.MEME
-         ]
-
-         if meme_contributions:
-             cols = st.columns(3)
-             for i, contrib in enumerate(meme_contributions[:6]):
-                 col = cols[i % 3]
-                 with col:
-                     # Display meme info
-                     st.markdown(f"**Language:** {contrib.language}")
-                     if contrib.cultural_context.get("region"):
-                         st.markdown(f"**Region:** {contrib.cultural_context['region']}")
-
-                     # Show text content
-                     texts = contrib.content_data.get("texts", [])
-                     if texts:
-                         st.markdown(f"**Caption:** {texts[0][:50]}...")
-
-                     st.markdown("---")
-         else:
-             st.info("No memes yet. Be the first to create one! 🎭")
-
-     def run(self):
-         """Override run method to add gallery option"""
-         super().run()
-
-         # Add gallery section
-         st.markdown("---")
-         with st.expander("🖼️ Community Meme Gallery"):
-             self.render_meme_gallery()
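`_generate_meme` draws the caption outline by stamping the text at every offset in a (2w+1)² grid around the anchor point. Pillow 6.2+ can do the same in one call via `stroke_width`/`stroke_fill`; a sketch under that version assumption:

```python
from PIL import Image, ImageDraw, ImageFont

img = Image.new("RGB", (500, 300), "lightgray")
draw = ImageDraw.Draw(img)
try:
    font = ImageFont.truetype("arial.ttf", 40)  # stroke works best with TrueType fonts
except OSError:
    font = ImageFont.load_default()

# One call replaces the nested offset loop used in _generate_meme
draw.text((100, 50), "TOP TEXT", font=font, fill="#FFFFFF",
          stroke_width=2, stroke_fill="#000000")
```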
intern_project/corpus_collection_engine/activities/recipe_exchange.py DELETED
@@ -1,505 +0,0 @@
- """
- Recipe Exchange Activity - Share family recipes in native languages
- """
-
- import streamlit as st
- from typing import Dict, Any, Tuple, List, Optional
- from datetime import datetime
- import json
-
- from corpus_collection_engine.activities.base_activity import BaseActivity
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType
- from corpus_collection_engine.services.storage_service import StorageService
- from corpus_collection_engine.services.ai_service import AIService
-
-
- class RecipeExchangeActivity(BaseActivity):
-     """Activity for sharing traditional family recipes"""
-
-     def __init__(self):
-         super().__init__(ActivityType.RECIPE)
-         self.storage_service = StorageService()
-         self.ai_service = AIService()
-
-         # Recipe categories
-         self.recipe_categories = {
-             "main_course": "Main Course / मुख्य व्यंजन",
-             "appetizer": "Appetizer / स्टार्टर",
-             "dessert": "Dessert / मिठाई",
-             "snack": "Snack / नाश्ता",
-             "beverage": "Beverage / पेय",
-             "breakfast": "Breakfast / नाश्ता",
-             "festival_special": "Festival Special / त्योहारी व्यंजन",
-             "regional_specialty": "Regional Specialty / क्षेत्रीय विशेषता"
-         }
-
-         # Cooking methods
-         self.cooking_methods = [
-             "Boiling / उबालना",
-             "Frying / तलना",
-             "Steaming / भाप में पकाना",
-             "Roasting / भूनना",
-             "Grilling / ग्रिल करना",
-             "Baking / बेक करना",
-             "Pressure Cooking / प्रेशर कुकिंग",
-             "Slow Cooking / धीमी आंच पर पकाना"
-         ]
-
-         # Dietary preferences
-         self.dietary_types = [
-             "Vegetarian / शाकाहारी",
-             "Vegan / वीगन",
-             "Non-Vegetarian / मांसाहारी",
-             "Jain / जैन",
-             "Gluten-Free / ग्लूटन फ्री",
-             "Dairy-Free / डेयरी फ्री"
-         ]
-
-     def render_interface(self) -> None:
-         """Render the recipe exchange interface"""
-
-         # Step 1: Recipe Basic Information
-         st.subheader("🍛 Step 1: Recipe Information")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             recipe_name = st.text_input(
-                 "Recipe Name / व्यंजन का नाम:",
-                 placeholder="e.g., Grandma's Dal Tadka",
-                 key="recipe_name"
-             )
-
-             category = st.selectbox(
-                 "Category / श्रेणी:",
-                 list(self.recipe_categories.keys()),
-                 format_func=lambda x: self.recipe_categories[x],
-                 key="recipe_category"
-             )
-
-         with col2:
-             prep_time = st.number_input(
-                 "Preparation Time (minutes) / तैयारी का समय:",
-                 min_value=1,
-                 max_value=300,
-                 value=30,
-                 key="prep_time"
-             )
-
-             cook_time = st.number_input(
-                 "Cooking Time (minutes) / पकाने का समय:",
-                 min_value=1,
-                 max_value=480,
-                 value=45,
-                 key="cook_time"
-             )
-
-             servings = st.number_input(
-                 "Number of Servings / परोसने की मात्रा:",
-                 min_value=1,
-                 max_value=20,
-                 value=4,
-                 key="servings"
-             )
-
-         # Step 2: Language Selection
-         language = self.render_language_selector("recipe_language")
-
-         # Step 3: Ingredients
-         st.subheader("🥕 Step 2: Ingredients / सामग्री")
-
-         # Dynamic ingredient list
-         if 'recipe_ingredients' not in st.session_state:
-             st.session_state.recipe_ingredients = [{"name": "", "quantity": "", "unit": ""}]
-
-         ingredients = []
-
-         for i, ingredient in enumerate(st.session_state.recipe_ingredients):
-             col1, col2, col3, col4 = st.columns([3, 2, 2, 1])
-
-             with col1:
-                 name = st.text_input(
-                     f"Ingredient {i+1}:",
-                     value=ingredient["name"],
-                     placeholder="e.g., Basmati Rice / बासमती चावल",
-                     key=f"ingredient_name_{i}"
-                 )
-
-             with col2:
-                 quantity = st.text_input(
-                     "Quantity:",
-                     value=ingredient["quantity"],
-                     placeholder="e.g., 2",
-                     key=f"ingredient_quantity_{i}"
-                 )
-
-             with col3:
-                 unit_options = ["cups", "tbsp", "tsp", "kg", "grams", "pieces", "as needed"]
-                 unit = st.selectbox(
-                     "Unit:",
-                     unit_options,
-                     index=unit_options.index(ingredient["unit"]) if ingredient["unit"] in unit_options else 0,
-                     key=f"ingredient_unit_{i}"
-                 )
-
-             with col4:
-                 if st.button("❌", key=f"remove_ingredient_{i}"):
-                     if len(st.session_state.recipe_ingredients) > 1:
-                         st.session_state.recipe_ingredients.pop(i)
-                         st.rerun()
-
-             ingredients.append({
-                 "name": name,
-                 "quantity": quantity,
-                 "unit": unit
-             })
-
-         # Update session state
-         st.session_state.recipe_ingredients = ingredients
-
-         # Add ingredient button
-         if st.button("➕ Add Ingredient", key="add_ingredient"):
-             st.session_state.recipe_ingredients.append({"name": "", "quantity": "", "unit": ""})
-             st.rerun()
-
-         # Step 4: Instructions
-         st.subheader("📝 Step 3: Cooking Instructions / पकाने की विधि")
-
-         instructions = st.text_area(
-             "Step-by-step instructions:",
-             placeholder=f"Write detailed cooking instructions in {language}...\n\n1. First step...\n2. Second step...\n3. Final step...",
-             height=200,
-             key="recipe_instructions"
-         )
-
-         # Step 5: Additional Details
-         st.subheader("ℹ️ Step 4: Additional Details")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             cooking_method = st.multiselect(
-                 "Cooking Methods / पकाने की विधि:",
-                 self.cooking_methods,
-                 key="cooking_methods"
-             )
-
-             dietary_type = st.multiselect(
-                 "Dietary Type / आहार प्रकार:",
-                 self.dietary_types,
-                 key="dietary_types"
-             )
-
-         with col2:
-             difficulty_level = st.select_slider(
-                 "Difficulty Level / कठिनाई स्तर:",
-                 options=["Easy / आसान", "Medium / मध्यम", "Hard / कठिन"],
-                 value="Medium / मध्यम",
-                 key="difficulty"
-             )
-
-             spice_level = st.select_slider(
-                 "Spice Level / मसाला स्तर:",
-                 options=["Mild / हल्का", "Medium / मध्यम", "Spicy / तीखा", "Very Spicy / बहुत तीखा"],
-                 value="Medium / मध्यम",
-                 key="spice_level"
-             )
-
-         # Step 6: Family Story & Cultural Context
-         st.subheader("👨‍👩‍👧‍👦 Step 5: Family Story & Cultural Context")
-
-         family_story = st.text_area(
-             "Family Story / पारिवारिक कहानी:",
-             placeholder="Share the story behind this recipe - who taught you, when it's made, special memories...",
-             height=120,
-             key="family_story"
-         )
-
-         # Cultural context form
-         cultural_context = self.render_cultural_context_form("recipe_cultural")
-
-         # Add recipe-specific cultural context
-         col1, col2 = st.columns(2)
-         with col1:
-             occasion = st.text_input(
-                 "Special Occasion / विशेष अवसर:",
-                 placeholder="e.g., Diwali, Wedding, Daily meal",
-                 key="recipe_occasion"
-             )
-
-         with col2:
-             origin_story = st.text_input(
-                 "Origin / मूल स्थान:",
-                 placeholder="e.g., Grandmother's village, Family tradition",
-                 key="recipe_origin"
-             )
-
-         # Add to cultural context
-         cultural_context.update({
-             "family_story": family_story.strip() if family_story else "",
-             "occasion": occasion.strip() if occasion else "",
-             "origin_story": origin_story.strip() if origin_story else ""
-         })
-
-         # Step 7: AI Suggestions (Optional)
-         if st.checkbox("🤖 Get AI Suggestions", key="ai_suggestions"):
-             self._render_ai_suggestions(recipe_name, ingredients, language)
-
-         # Step 8: Preview and Submit
-         st.subheader("👀 Step 6: Preview & Submit")
-
-         # Validate required fields
-         valid_ingredients = [ing for ing in ingredients if ing["name"].strip()]
-
-         if recipe_name and instructions and len(valid_ingredients) > 0:
-             # Show recipe preview
-             with st.expander("📖 Recipe Preview"):
-                 self._render_recipe_preview(
-                     recipe_name, category, prep_time, cook_time, servings,
-                     valid_ingredients, instructions, cooking_method, dietary_type,
260
- )
261
-
262
- # Prepare content data
263
- content_data = {
264
- "title": recipe_name,
265
- "category": category,
266
- "prep_time": prep_time,
267
- "cook_time": cook_time,
268
- "servings": servings,
269
- "ingredients": valid_ingredients,
270
- "instructions": instructions,
271
- "cooking_methods": cooking_method,
272
- "dietary_types": dietary_type,
273
- "difficulty_level": difficulty_level,
274
- "spice_level": spice_level,
275
- "family_story": family_story
276
- }
277
-
278
- # Submit section
279
- contribution = self.render_submission_section(
280
- content_data, cultural_context, language
281
- )
282
-
283
- if contribution:
284
- # Save to storage
285
- success = self.storage_service.save_contribution(contribution)
286
- if success:
287
- st.success("🎉 Your family recipe has been added to the cultural corpus!")
288
-
289
- # Show impact message
290
- with st.expander("🌟 Why This Matters"):
291
- st.markdown(f"""
292
- Your recipe in **{language}** helps preserve:
293
- - Traditional cooking knowledge and techniques
294
- - Family stories and cultural memories
295
- - Regional food vocabulary and terminology
296
- - Culinary heritage for future generations
297
-
298
- Thank you for sharing your family's culinary wisdom! 🍛
299
- """)
300
-
301
- # Clear form
302
- if st.button("🆕 Share Another Recipe"):
303
- # Clear session state
304
- for key in list(st.session_state.keys()):
305
- if key.startswith(('recipe_', 'ingredient_')):
306
- del st.session_state[key]
307
- st.session_state.recipe_ingredients = [{"name": "", "quantity": "", "unit": ""}]
308
- st.rerun()
309
- else:
310
- st.error("Failed to save your recipe. Please try again.")
311
- else:
312
- st.warning("Please fill in the recipe name, instructions, and at least one ingredient!")
313
-
314
- def _render_ai_suggestions(self, recipe_name: str, ingredients: List[Dict], language: str):
315
- """Render AI-powered suggestions"""
316
- if recipe_name:
317
- st.markdown("**🤖 AI Suggestions:**")
318
-
319
- col1, col2 = st.columns(2)
320
-
321
- with col1:
322
- if st.button("💡 Suggest Cooking Tips", key="ai_tips"):
323
- with st.spinner("Generating cooking tips..."):
324
- tips, confidence = self.ai_service.generate_text(
325
- f"cooking tips for {recipe_name}",
326
- language,
327
- max_length=150
328
- )
329
- if tips:
330
- st.info(f"💡 **Tip:** {tips}")
331
-
332
- with col2:
333
- if st.button("🏷️ Suggest Tags", key="ai_tags"):
334
- ingredient_names = [ing["name"] for ing in ingredients if ing["name"].strip()]
335
- content = f"{recipe_name} {' '.join(ingredient_names)}"
336
-
337
- tags = self.ai_service.suggest_cultural_tags(content, language)
338
- if tags:
339
- st.info(f"🏷️ **Suggested tags:** {', '.join(tags[:5])}")
340
-
341
- def _render_recipe_preview(self, name: str, category: str, prep_time: int,
342
- cook_time: int, servings: int, ingredients: List[Dict],
343
- instructions: str, cooking_methods: List[str],
344
- dietary_types: List[str], difficulty: str,
345
- spice_level: str, family_story: str, language: str):
346
- """Render recipe preview"""
347
- st.markdown(f"# {name}")
348
-
349
- # Basic info
350
- col1, col2, col3, col4 = st.columns(4)
351
- with col1:
352
- st.metric("Prep Time", f"{prep_time} min")
353
- with col2:
354
- st.metric("Cook Time", f"{cook_time} min")
355
- with col3:
356
- st.metric("Servings", servings)
357
- with col4:
358
- st.metric("Total Time", f"{prep_time + cook_time} min")
359
-
360
- # Category and details
361
- st.markdown(f"**Category:** {self.recipe_categories[category]}")
362
- st.markdown(f"**Difficulty:** {difficulty}")
363
- st.markdown(f"**Spice Level:** {spice_level}")
364
-
365
- if dietary_types:
366
- st.markdown(f"**Dietary:** {', '.join(dietary_types)}")
367
-
368
- if cooking_methods:
369
- st.markdown(f"**Cooking Methods:** {', '.join(cooking_methods)}")
370
-
371
- # Ingredients
372
- st.markdown("## Ingredients")
373
- for ing in ingredients:
374
- if ing["name"].strip():
375
- st.markdown(f"- {ing['quantity']} {ing['unit']} {ing['name']}")
376
-
377
- # Instructions
378
- st.markdown("## Instructions")
379
- st.markdown(instructions)
380
-
381
- # Family story
382
- if family_story.strip():
383
- st.markdown("## Family Story")
384
- st.markdown(family_story)
385
-
386
- def validate_content(self, content: Dict[str, Any]) -> Tuple[bool, str]:
387
- """Validate recipe content"""
388
- # Check required fields
389
- required_fields = ["title", "ingredients", "instructions"]
390
- for field in required_fields:
391
- if not content.get(field):
392
- return False, f"Recipe must include {field}"
393
-
394
- # Validate title
395
- title = content["title"].strip()
396
- if len(title) < 3:
397
- return False, "Recipe title must be at least 3 characters long"
398
- if len(title) > 100:
399
- return False, "Recipe title must be less than 100 characters"
400
-
401
- # Validate ingredients
402
- ingredients = content.get("ingredients", [])
403
- valid_ingredients = [ing for ing in ingredients if ing.get("name", "").strip()]
404
- if len(valid_ingredients) == 0:
405
- return False, "Recipe must have at least one ingredient"
406
-
407
- # Validate instructions
408
- instructions = content["instructions"].strip()
409
- if len(instructions) < 20:
410
- return False, "Instructions must be at least 20 characters long"
411
- if len(instructions) > 5000:
412
- return False, "Instructions must be less than 5000 characters"
413
-
414
- # Validate time values
415
- if content.get("prep_time", 0) <= 0:
416
- return False, "Preparation time must be greater than 0"
417
- if content.get("cook_time", 0) <= 0:
418
- return False, "Cooking time must be greater than 0"
419
- if content.get("servings", 0) <= 0:
420
- return False, "Number of servings must be greater than 0"
421
-
422
- return True, "Valid recipe content"
423
-
424
- def process_submission(self, data: Dict[str, Any]) -> UserContribution:
425
- """Process recipe submission and create UserContribution"""
426
- # Get session ID from router if available
427
- session_id = st.session_state.get('user_session_id', 'anonymous')
428
-
429
- # Calculate total time
430
- total_time = data["content_data"].get("prep_time", 0) + data["content_data"].get("cook_time", 0)
431
-
432
- return UserContribution(
433
- user_session=session_id,
434
- activity_type=self.activity_type,
435
- content_data=data["content_data"],
436
- language=data["language"],
437
- cultural_context=data["cultural_context"],
438
- metadata={
439
- "recipe_category": data["content_data"].get("category"),
440
- "total_time_minutes": total_time,
441
- "ingredient_count": len([ing for ing in data["content_data"].get("ingredients", []) if ing.get("name", "").strip()]),
442
- "difficulty_level": data["content_data"].get("difficulty_level"),
443
- "dietary_types": data["content_data"].get("dietary_types", []),
444
- "submission_timestamp": datetime.now().isoformat(),
445
- "activity_version": "1.0"
446
- }
447
- )
448
-
449
- def render_recipe_gallery(self):
450
- """Render gallery of recent recipes"""
451
- st.subheader("🍽️ Community Recipe Collection")
452
-
453
- # Get recent recipes from storage
454
- recent_contributions = self.storage_service.get_contributions_by_language(
455
- st.session_state.get('selected_language', 'hi'), limit=9
456
- )
457
-
458
- recipe_contributions = [
459
- contrib for contrib in recent_contributions
460
- if contrib.activity_type == ActivityType.RECIPE
461
- ]
462
-
463
- if recipe_contributions:
464
- # Display recipes in grid
465
- cols = st.columns(3)
466
- for i, contrib in enumerate(recipe_contributions[:9]):
467
- col = cols[i % 3]
468
- with col:
469
- with st.container():
470
- st.markdown(f"**{contrib.content_data.get('title', 'Untitled Recipe')}**")
471
-
472
- # Recipe details
473
- category = contrib.content_data.get('category', 'unknown')
474
- if category in self.recipe_categories:
475
- st.markdown(f"*{self.recipe_categories[category]}*")
476
-
477
- # Time and servings
478
- prep_time = contrib.content_data.get('prep_time', 0)
479
- cook_time = contrib.content_data.get('cook_time', 0)
480
- servings = contrib.content_data.get('servings', 0)
481
-
482
- st.markdown(f"⏱️ {prep_time + cook_time} min | 👥 {servings} servings")
483
-
484
- # Language and region
485
- st.markdown(f"🌐 {contrib.language}")
486
- if contrib.cultural_context.get("region"):
487
- st.markdown(f"📍 {contrib.cultural_context['region']}")
488
-
489
- # Family story preview
490
- family_story = contrib.content_data.get('family_story', '')
491
- if family_story:
492
- st.markdown(f"👨‍👩‍👧‍👦 {family_story[:50]}...")
493
-
494
- st.markdown("---")
495
- else:
496
- st.info("No recipes yet. Be the first to share your family recipe! 🍛")
497
-
498
- def run(self):
499
- """Override run method to add gallery option"""
500
- super().run()
501
-
502
- # Add gallery section
503
- st.markdown("---")
504
- with st.expander("🍽️ Community Recipe Gallery"):
505
- self.render_recipe_gallery()
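
For reference, the `(is_valid, message)` contract implemented by `validate_content` above can be exercised outside Streamlit. A minimal sketch under the same rules (`check_recipe` and `sample` are illustrative helpers, not part of the deleted module):

```python
# Hypothetical standalone check mirroring validate_content's core rules.
from typing import Any, Dict, Tuple

def check_recipe(content: Dict[str, Any]) -> Tuple[bool, str]:
    for field in ("title", "ingredients", "instructions"):
        if not content.get(field):
            return False, f"Recipe must include {field}"
    if not any(ing.get("name", "").strip() for ing in content["ingredients"]):
        return False, "Recipe must have at least one ingredient"
    if len(content["instructions"].strip()) < 20:
        return False, "Instructions must be at least 20 characters long"
    return True, "Valid recipe content"

sample = {
    "title": "Dal Tadka",
    "ingredients": [{"name": "Toor dal", "quantity": "1", "unit": "cups"}],
    "instructions": "Rinse and pressure cook the dal, then temper with ghee and cumin.",
}
print(check_recipe(sample))  # (True, 'Valid recipe content')
```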
 
intern_project/corpus_collection_engine/app.py DELETED
@@ -1,17 +0,0 @@
1
- #!/usr/bin/env python3
2
- """
3
- Corpus Collection Engine - Hugging Face Spaces Entry Point
4
- AI-powered app for collecting diverse data on Indian languages, history, and culture
5
- """
6
-
7
- import sys
8
- import os
9
-
10
- # Add the parent directory to Python path so the package import below resolves
11
- sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
12
-
13
- # Import and run the main application
14
- from corpus_collection_engine.main import main
15
-
16
- if __name__ == "__main__":
17
- main()
 
intern_project/corpus_collection_engine/config.py DELETED
@@ -1,71 +0,0 @@
1
- """
2
- Configuration settings for the Corpus Collection Engine
3
- """
4
-
5
- import os
6
- from pathlib import Path
7
- from typing import List, Dict
8
-
9
- # Project paths
10
- PROJECT_ROOT = Path(__file__).parent.parent
11
- DATA_DIR = PROJECT_ROOT / "data"
12
- MODELS_DIR = PROJECT_ROOT / "models"
13
- CACHE_DIR = PROJECT_ROOT / ".cache"
14
-
15
- # Supported Indic languages
16
- SUPPORTED_LANGUAGES: Dict[str, str] = {
17
- 'hi': 'Hindi',
18
- 'bn': 'Bengali',
19
- 'ta': 'Tamil',
20
- 'te': 'Telugu',
21
- 'ml': 'Malayalam',
22
- 'kn': 'Kannada',
23
- 'gu': 'Gujarati',
24
- 'mr': 'Marathi',
25
- 'pa': 'Punjabi',
26
- 'or': 'Odia',
27
- 'en': 'English'
28
- }
29
-
30
- # Activity types
31
- ACTIVITY_TYPES: List[str] = [
32
- 'meme',
33
- 'recipe',
34
- 'folklore',
35
- 'landmark'
36
- ]
37
-
38
- # AI model configurations
39
- AI_CONFIG = {
40
- 'text_model': 'sarvamai/sarvam-1',
41
- 'vision_model': 'microsoft/DiT-base',
42
- 'max_tokens': 512,
43
- 'temperature': 0.7
44
- }
45
-
46
- # Database configuration
47
- DATABASE_CONFIG = {
48
- 'local_db': 'sqlite:///corpus_collection.db',
49
- 'remote_db': os.getenv('DATABASE_URL', ''),
50
- 'batch_size': 100
51
- }
52
-
53
- # PWA and offline configuration
54
- PWA_CONFIG = {
55
- 'cache_version': 'v1.0.0',
56
- 'offline_timeout': 5000, # milliseconds
57
- 'sync_interval': 300000, # 5 minutes in milliseconds
58
- 'max_offline_storage': 50 * 1024 * 1024 # 50MB
59
- }
60
-
61
- # Content validation settings
62
- VALIDATION_CONFIG = {
63
- 'min_text_length': 10,
64
- 'max_text_length': 5000,
65
- 'max_image_size': 10 * 1024 * 1024, # 10MB
66
- 'allowed_image_types': ['jpg', 'jpeg', 'png', 'webp']
67
- }
68
-
69
- # Create necessary directories
70
- for directory in [DATA_DIR, MODELS_DIR, CACHE_DIR]:
71
- directory.mkdir(exist_ok=True)
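
A quick sketch of how the `SUPPORTED_LANGUAGES` mapping defined here typically backs a Streamlit selector elsewhere in the app, displaying names while storing codes (the surrounding script is illustrative):

```python
import streamlit as st
from corpus_collection_engine.config import SUPPORTED_LANGUAGES

# Show human-readable names ('Hindi') in the dropdown, store codes ('hi').
lang_code = st.selectbox(
    "Language:",
    options=list(SUPPORTED_LANGUAGES.keys()),
    format_func=lambda code: SUPPORTED_LANGUAGES[code],
)
st.write(f"Selected language code: {lang_code}")
```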
 
intern_project/corpus_collection_engine/data/corpus_collection.db DELETED
Binary file (53.2 kB)
 
intern_project/corpus_collection_engine/main.py DELETED
@@ -1,212 +0,0 @@
1
- """
2
- Corpus Collection Engine - Main Streamlit Application
3
- AI-powered app for collecting diverse data on Indian languages, history, and culture
4
- """
5
-
6
- import streamlit as st
7
- import sys
8
- import os
9
-
10
- # Add the parent directory to Python path for imports
11
- sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
12
-
13
- from corpus_collection_engine.activities.activity_router import ActivityRouter
14
- from corpus_collection_engine.utils.performance_optimizer import PerformanceOptimizer
15
- from corpus_collection_engine.utils.error_handler import global_error_handler, ErrorCategory, ErrorSeverity
16
- from corpus_collection_engine.services.privacy_service import PrivacyService
17
- from corpus_collection_engine.services.engagement_service import EngagementService
18
- from corpus_collection_engine.pwa.pwa_manager import PWAManager
19
- from corpus_collection_engine.utils.performance_dashboard import PerformanceDashboard
20
-
21
- # Configure Streamlit page
22
- st.set_page_config(
23
- page_title="Corpus Collection Engine",
24
- page_icon="🇮🇳",
25
- layout="wide",
26
- initial_sidebar_state="expanded"
27
- )
28
-
29
- def initialize_application():
30
- """Initialize all application services and components"""
31
- # Initialize session state for global app management
32
- if 'app_initialized' not in st.session_state:
33
- st.session_state.app_initialized = False
34
- st.session_state.privacy_consent_given = False
35
- st.session_state.onboarding_completed = False
36
- st.session_state.admin_mode = False
37
-
38
- # Initialize services
39
- services = {}
40
-
41
- try:
42
- # Performance optimization
43
- services['optimizer'] = PerformanceOptimizer()
44
- services['optimizer'].initialize_performance_optimization()
45
-
46
- # Privacy service
47
- services['privacy'] = PrivacyService()
48
-
49
- # Engagement service
50
- services['engagement'] = EngagementService()
51
-
52
- # PWA manager
53
- services['pwa'] = PWAManager()
54
- services['pwa'].initialize_pwa()
55
-
56
- # Performance dashboard (for admin mode)
57
- services['performance_dashboard'] = PerformanceDashboard()
58
-
59
- st.session_state.app_initialized = True
60
- return services
61
-
62
- except Exception as e:
63
- global_error_handler.handle_error(
64
- e,
65
- ErrorCategory.SYSTEM,
66
- ErrorSeverity.HIGH,
67
- context={'component': 'app_initialization'},
68
- show_user_message=True
69
- )
70
- return {}
71
-
72
- def render_admin_interface(services):
73
- """Render admin interface for monitoring and management"""
74
- if not st.session_state.get('admin_mode', False):
75
- return
76
-
77
- with st.sidebar.expander("🔧 Admin Panel"):
78
- st.markdown("**System Monitoring**")
79
-
80
- if st.button("📊 Performance Dashboard"):
81
- st.session_state.show_performance_dashboard = True
82
-
83
- if st.button("🚨 Error Dashboard"):
84
- st.session_state.show_error_dashboard = True
85
-
86
- if st.button("📈 Analytics Dashboard"):
87
- st.session_state.show_analytics_dashboard = True
88
-
89
- st.markdown("**System Actions**")
90
-
91
- if st.button("🧹 Clear Cache"):
92
- st.cache_data.clear()
93
- st.success("Cache cleared!")
94
-
95
- if st.button("🔄 Reset Session"):
96
- for key in list(st.session_state.keys()):
97
- if key not in ['app_initialized']:
98
- del st.session_state[key]
99
- st.success("Session reset!")
100
- st.rerun()
101
-
102
- def render_admin_dashboards(services):
103
- """Render admin dashboards when requested"""
104
- if st.session_state.get('show_performance_dashboard', False):
105
- st.markdown("---")
106
- services['performance_dashboard'].render_dashboard()
107
- if st.button("❌ Close Performance Dashboard"):
108
- st.session_state.show_performance_dashboard = False
109
- st.rerun()
110
-
111
- if st.session_state.get('show_error_dashboard', False):
112
- st.markdown("---")
113
- global_error_handler.render_error_dashboard()
114
- if st.button("❌ Close Error Dashboard"):
115
- st.session_state.show_error_dashboard = False
116
- st.rerun()
117
-
118
- if st.session_state.get('show_analytics_dashboard', False):
119
- st.markdown("---")
120
- if 'router' in st.session_state:
121
- router = st.session_state.router
122
- if hasattr(router, 'analytics_service'):
123
- router.analytics_service.render_analytics_dashboard()
124
- if st.button("❌ Close Analytics Dashboard"):
125
- st.session_state.show_analytics_dashboard = False
126
- st.rerun()
127
-
128
- def handle_privacy_consent(privacy_service):
129
- """Handle privacy consent flow - Auto-consent for public deployment"""
130
- # Auto-consent for Hugging Face Spaces deployment
131
- if not st.session_state.privacy_consent_given:
132
- st.session_state.privacy_consent_given = True
133
- # Initialize privacy service without requiring explicit consent
134
- privacy_service.initialize_privacy_management()
135
-
136
- def handle_onboarding(engagement_service):
137
- """Handle user onboarding flow - Optional for public deployment"""
138
- if not st.session_state.onboarding_completed and st.session_state.privacy_consent_given:
139
- # Auto-complete onboarding for public deployment
140
- st.session_state.onboarding_completed = True
141
-
142
- # Show optional welcome message in sidebar
143
- with st.sidebar:
144
- st.success("🎉 Welcome to Corpus Collection Engine!")
145
- st.markdown("Help preserve Indian cultural heritage through AI!")
146
-
147
- if st.button("ℹ️ Show Quick Guide"):
148
- st.session_state.show_quick_guide = True
149
-
150
- def enable_admin_mode():
151
- """Enable admin mode for Hugging Face Spaces deployment"""
152
- # Admin mode is always enabled for public deployment
153
- st.session_state.admin_mode = True
154
-
155
- def main():
156
- """Main application entry point"""
157
- try:
158
- # Initialize application services
159
- services = initialize_application()
160
-
161
- if not services:
162
- st.error("Failed to initialize application services. Please refresh the page.")
163
- return
164
-
165
- # Show performance indicator
166
- services['optimizer'].render_performance_indicator()
167
-
168
- # Apply Streamlit-specific optimizations
169
- services['optimizer'].optimize_streamlit_config()
170
-
171
- # Enable admin mode for public deployment
172
- enable_admin_mode()
173
-
174
- # Handle privacy consent
175
- handle_privacy_consent(services['privacy'])
176
-
177
- # Handle onboarding
178
- handle_onboarding(services['engagement'])
179
-
180
- # Initialize activity router
181
- router = ActivityRouter()
182
- st.session_state.router = router # Store for admin access
183
-
184
- # Render admin interface
185
- render_admin_interface(services)
186
-
187
- # Run main application
188
- router.run()
189
-
190
- # Render admin dashboards if requested
191
- render_admin_dashboards(services)
192
-
193
- # Show engagement features
194
- services['engagement'].render_session_summary()
195
-
196
- except Exception as e:
197
- # Handle critical application errors
198
- global_error_handler.handle_error(
199
- e,
200
- ErrorCategory.SYSTEM,
201
- ErrorSeverity.CRITICAL,
202
- context={'component': 'main_application'},
203
- show_user_message=True
204
- )
205
-
206
- # Show fallback interface
207
- st.error("🚨 Critical application error occurred. Please refresh the page.")
208
- if st.button("🔄 Refresh Application"):
209
- st.rerun()
210
-
211
- if __name__ == "__main__":
212
- main()
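
One pattern worth isolating from `initialize_application` above: Streamlit reruns the entire script on every interaction, so expensive one-time setup is guarded behind `st.session_state`. A minimal sketch (`build_services` is an illustrative stand-in):

```python
import streamlit as st

def build_services() -> dict:
    # Stand-in for constructing PerformanceOptimizer, PrivacyService, etc.
    return {"ready": True}

if "app_initialized" not in st.session_state:
    st.session_state.services = build_services()  # runs once per session
    st.session_state.app_initialized = True

st.write(st.session_state.services)  # reused on every rerun
```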
 
intern_project/corpus_collection_engine/models/__init__.py DELETED
@@ -1 +0,0 @@
1
- # Data models for user contributions and corpus entries
 
 
intern_project/corpus_collection_engine/models/data_models.py DELETED
@@ -1,149 +0,0 @@
1
- """
2
- Core data models for the Corpus Collection Engine
3
- """
4
-
5
- from dataclasses import dataclass, field
6
- from datetime import datetime
7
- from typing import Dict, List, Optional, Any
8
- from enum import Enum
9
- import uuid
10
- import json
11
-
12
-
13
- class ActivityType(Enum):
14
- """Supported activity types"""
15
- MEME = "meme"
16
- RECIPE = "recipe"
17
- FOLKLORE = "folklore"
18
- LANDMARK = "landmark"
19
-
20
-
21
- class ValidationStatus(Enum):
22
- """Content validation status"""
23
- PENDING = "pending"
24
- APPROVED = "approved"
25
- REJECTED = "rejected"
26
- NEEDS_REVIEW = "needs_review"
27
-
28
-
29
- @dataclass
30
- class UserContribution:
31
- """Model for user contributions across all activities"""
32
- id: str = field(default_factory=lambda: str(uuid.uuid4()))
33
- user_session: str = ""
34
- activity_type: ActivityType = ActivityType.MEME
35
- content_data: Dict[str, Any] = field(default_factory=dict)
36
- language: str = "en"
37
- region: Optional[str] = None
38
- cultural_context: Dict[str, Any] = field(default_factory=dict)
39
- timestamp: datetime = field(default_factory=datetime.now)
40
- validation_status: ValidationStatus = ValidationStatus.PENDING
41
- metadata: Dict[str, Any] = field(default_factory=dict)
42
-
43
- def to_dict(self) -> Dict[str, Any]:
44
- """Convert to dictionary for storage"""
45
- return {
46
- 'id': self.id,
47
- 'user_session': self.user_session,
48
- 'activity_type': self.activity_type.value,
49
- 'content_data': json.dumps(self.content_data),
50
- 'language': self.language,
51
- 'region': self.region,
52
- 'cultural_context': json.dumps(self.cultural_context),
53
- 'timestamp': self.timestamp.isoformat(),
54
- 'validation_status': self.validation_status.value,
55
- 'metadata': json.dumps(self.metadata)
56
- }
57
-
58
- @classmethod
59
- def from_dict(cls, data: Dict[str, Any]) -> 'UserContribution':
60
- """Create instance from dictionary"""
61
- return cls(
62
- id=data['id'],
63
- user_session=data['user_session'],
64
- activity_type=ActivityType(data['activity_type']),
65
- content_data=json.loads(data['content_data']),
66
- language=data['language'],
67
- region=data.get('region'),
68
- cultural_context=json.loads(data['cultural_context']),
69
- timestamp=datetime.fromisoformat(data['timestamp']),
70
- validation_status=ValidationStatus(data['validation_status']),
71
- metadata=json.loads(data['metadata'])
72
- )
73
-
74
-
75
- @dataclass
76
- class CorpusEntry:
77
- """Model for processed corpus entries"""
78
- id: str = field(default_factory=lambda: str(uuid.uuid4()))
79
- contribution_id: str = ""
80
- text_content: Optional[str] = None
81
- image_content: Optional[bytes] = None
82
- language: str = "en"
83
- cultural_tags: List[str] = field(default_factory=list)
84
- quality_score: float = 0.0
85
- processed_features: Dict[str, Any] = field(default_factory=dict)
86
- created_at: datetime = field(default_factory=datetime.now)
87
-
88
- def to_dict(self) -> Dict[str, Any]:
89
- """Convert to dictionary for storage"""
90
- return {
91
- 'id': self.id,
92
- 'contribution_id': self.contribution_id,
93
- 'text_content': self.text_content,
94
- 'image_content': self.image_content,
95
- 'language': self.language,
96
- 'cultural_tags': json.dumps(self.cultural_tags),
97
- 'quality_score': self.quality_score,
98
- 'processed_features': json.dumps(self.processed_features),
99
- 'created_at': self.created_at.isoformat()
100
- }
101
-
102
- @classmethod
103
- def from_dict(cls, data: Dict[str, Any]) -> 'CorpusEntry':
104
- """Create instance from dictionary"""
105
- return cls(
106
- id=data['id'],
107
- contribution_id=data['contribution_id'],
108
- text_content=data.get('text_content'),
109
- image_content=data.get('image_content'),
110
- language=data['language'],
111
- cultural_tags=json.loads(data['cultural_tags']),
112
- quality_score=data['quality_score'],
113
- processed_features=json.loads(data['processed_features']),
114
- created_at=datetime.fromisoformat(data['created_at'])
115
- )
116
-
117
-
118
- @dataclass
119
- class ActivitySession:
120
- """Model for tracking user activity sessions"""
121
- session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
122
- user_id: Optional[str] = None
123
- activity_type: ActivityType = ActivityType.MEME
124
- start_time: datetime = field(default_factory=datetime.now)
125
- contributions: List[str] = field(default_factory=list)
126
- engagement_metrics: Dict[str, Any] = field(default_factory=dict)
127
-
128
- def to_dict(self) -> Dict[str, Any]:
129
- """Convert to dictionary for storage"""
130
- return {
131
- 'session_id': self.session_id,
132
- 'user_id': self.user_id,
133
- 'activity_type': self.activity_type.value,
134
- 'start_time': self.start_time.isoformat(),
135
- 'contributions': json.dumps(self.contributions),
136
- 'engagement_metrics': json.dumps(self.engagement_metrics)
137
- }
138
-
139
- @classmethod
140
- def from_dict(cls, data: Dict[str, Any]) -> 'ActivitySession':
141
- """Create instance from dictionary"""
142
- return cls(
143
- session_id=data['session_id'],
144
- user_id=data.get('user_id'),
145
- activity_type=ActivityType(data['activity_type']),
146
- start_time=datetime.fromisoformat(data['start_time']),
147
- contributions=json.loads(data['contributions']),
148
- engagement_metrics=json.loads(data['engagement_metrics'])
149
- )
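
The `to_dict`/`from_dict` pairs above are designed to round-trip through storage: nested dicts are JSON-encoded on the way out and decoded on the way back. A sketch with illustrative values:

```python
from corpus_collection_engine.models.data_models import ActivityType, UserContribution

contrib = UserContribution(
    user_session="session-123",
    activity_type=ActivityType.RECIPE,
    content_data={"title": "Dal Tadka"},
    language="hi",
    cultural_context={"region": "Punjab", "cultural_significance": "daily meal"},
)

row = contrib.to_dict()                      # JSON strings, ready for SQLite
restored = UserContribution.from_dict(row)   # decoded back into the dataclass

assert restored.content_data == contrib.content_data
assert restored.activity_type is ActivityType.RECIPE
```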
 
intern_project/corpus_collection_engine/models/validation.py DELETED
@@ -1,223 +0,0 @@
1
- """
2
- Validation functions for data models and user input
3
- """
4
-
5
- from typing import Dict, List, Tuple, Any, Optional
6
- import re
7
- from datetime import datetime
8
- from corpus_collection_engine.models.data_models import UserContribution, CorpusEntry, ActivitySession, ActivityType
9
- from corpus_collection_engine.config import VALIDATION_CONFIG, SUPPORTED_LANGUAGES
10
-
11
-
12
- class ValidationError(Exception):
13
- """Custom exception for validation errors"""
14
- pass
15
-
16
-
17
- class DataValidator:
18
- """Validator class for all data models and user input"""
19
-
20
- @staticmethod
21
- def validate_text_content(text: str, min_length: int = None, max_length: int = None) -> Tuple[bool, str]:
22
- """Validate text content length and basic format"""
23
- if not text or not text.strip():
24
- return False, "Text content cannot be empty"
25
-
26
- text = text.strip()
27
- min_len = min_length or VALIDATION_CONFIG['min_text_length']
28
- max_len = max_length or VALIDATION_CONFIG['max_text_length']
29
-
30
- if len(text) < min_len:
31
- return False, f"Text must be at least {min_len} characters long"
32
-
33
- if len(text) > max_len:
34
- return False, f"Text must not exceed {max_len} characters"
35
-
36
- # Check for suspicious patterns (basic spam detection)
37
- if re.search(r'(.)\1{10,}', text): # Repeated characters
38
- return False, "Text contains suspicious repeated patterns"
39
-
40
- return True, "Valid text content"
41
-
42
- @staticmethod
43
- def validate_language_code(language: str) -> Tuple[bool, str]:
44
- """Validate language code against supported languages"""
45
- if not language:
46
- return False, "Language code cannot be empty"
47
-
48
- if language not in SUPPORTED_LANGUAGES:
49
- return False, f"Unsupported language code: {language}"
50
-
51
- return True, f"Valid language: {SUPPORTED_LANGUAGES[language]}"
52
-
53
- @staticmethod
54
- def validate_image_data(image_data: bytes, max_size: int = None) -> Tuple[bool, str]:
55
- """Validate image data size and basic format"""
56
- if not image_data:
57
- return False, "Image data cannot be empty"
58
-
59
- max_size = max_size or VALIDATION_CONFIG['max_image_size']
60
-
61
- if len(image_data) > max_size:
62
- size_mb = len(image_data) / (1024 * 1024)
63
- max_mb = max_size / (1024 * 1024)
64
- return False, f"Image size ({size_mb:.1f}MB) exceeds maximum ({max_mb:.1f}MB)"
65
-
66
- # Basic image format validation (check for common headers)
67
- image_headers = {
68
- b'\xff\xd8\xff': 'JPEG',
69
- b'\x89PNG\r\n\x1a\n': 'PNG',
70
- b'RIFF': 'WEBP'
71
- }
72
-
73
- is_valid_image = any(image_data.startswith(header) for header in image_headers.keys())
74
- if not is_valid_image:
75
- return False, "Invalid image format. Supported: JPEG, PNG, WEBP"
76
-
77
- return True, "Valid image data"
78
-
79
- @staticmethod
80
- def validate_cultural_context(context: Dict[str, Any]) -> Tuple[bool, str]:
81
- """Validate cultural context data"""
82
- if not isinstance(context, dict):
83
- return False, "Cultural context must be a dictionary"
84
-
85
- # Check for required fields based on activity type
86
- required_fields = ['region', 'cultural_significance']
87
- missing_fields = [field for field in required_fields if field not in context]
88
-
89
- if missing_fields:
90
- return False, f"Missing required cultural context fields: {missing_fields}"
91
-
92
- # Validate region if provided
93
- if 'region' in context and context['region']:
94
- region = context['region'].strip()
95
- if len(region) < 2:
96
- return False, "Region must be at least 2 characters long"
97
-
98
- return True, "Valid cultural context"
99
-
100
- @classmethod
101
- def validate_user_contribution(cls, contribution: UserContribution) -> Tuple[bool, List[str]]:
102
- """Comprehensive validation for UserContribution"""
103
- errors = []
104
-
105
- # Validate basic fields
106
- if not contribution.user_session:
107
- errors.append("User session ID is required")
108
-
109
- if not isinstance(contribution.activity_type, ActivityType):
110
- errors.append("Invalid activity type")
111
-
112
- # Validate language
113
- is_valid_lang, lang_msg = cls.validate_language_code(contribution.language)
114
- if not is_valid_lang:
115
- errors.append(lang_msg)
116
-
117
- # Validate content data based on activity type
118
- content_errors = cls._validate_activity_content(
119
- contribution.activity_type,
120
- contribution.content_data
121
- )
122
- errors.extend(content_errors)
123
-
124
- # Validate cultural context
125
- is_valid_context, context_msg = cls.validate_cultural_context(contribution.cultural_context)
126
- if not is_valid_context:
127
- errors.append(context_msg)
128
-
129
- # Validate timestamp
130
- if contribution.timestamp > datetime.now():
131
- errors.append("Timestamp cannot be in the future")
132
-
133
- return len(errors) == 0, errors
134
-
135
- @classmethod
136
- def _validate_activity_content(cls, activity_type: ActivityType, content_data: Dict[str, Any]) -> List[str]:
137
- """Validate content data specific to activity type"""
138
- errors = []
139
-
140
- if activity_type == ActivityType.MEME:
141
- if 'text' not in content_data:
142
- errors.append("Meme content must include text")
143
- else:
144
- is_valid, msg = cls.validate_text_content(content_data['text'])
145
- if not is_valid:
146
- errors.append(f"Meme text: {msg}")
147
-
148
- elif activity_type == ActivityType.RECIPE:
149
- required_fields = ['title', 'ingredients', 'instructions']
150
- for field in required_fields:
151
- if field not in content_data:
152
- errors.append(f"Recipe content must include {field}")
153
- elif not content_data[field]:
154
- errors.append(f"Recipe {field} cannot be empty")
155
-
156
- elif activity_type == ActivityType.FOLKLORE:
157
- if 'story' not in content_data:
158
- errors.append("Folklore content must include story")
159
- else:
160
- is_valid, msg = cls.validate_text_content(content_data['story'], min_length=50)
161
- if not is_valid:
162
- errors.append(f"Folklore story: {msg}")
163
-
164
- elif activity_type == ActivityType.LANDMARK:
165
- if 'description' not in content_data:
166
- errors.append("Landmark content must include description")
167
- else:
168
- is_valid, msg = cls.validate_text_content(content_data['description'])
169
- if not is_valid:
170
- errors.append(f"Landmark description: {msg}")
171
-
172
- return errors
173
-
174
- @classmethod
175
- def validate_corpus_entry(cls, entry: CorpusEntry) -> Tuple[bool, List[str]]:
176
- """Comprehensive validation for CorpusEntry"""
177
- errors = []
178
-
179
- if not entry.contribution_id:
180
- errors.append("Contribution ID is required")
181
-
182
- # Must have either text or image content
183
- if not entry.text_content and not entry.image_content:
184
- errors.append("Corpus entry must have either text or image content")
185
-
186
- # Validate text content if present
187
- if entry.text_content:
188
- is_valid, msg = cls.validate_text_content(entry.text_content)
189
- if not is_valid:
190
- errors.append(f"Text content: {msg}")
191
-
192
- # Validate image content if present
193
- if entry.image_content:
194
- is_valid, msg = cls.validate_image_data(entry.image_content)
195
- if not is_valid:
196
- errors.append(f"Image content: {msg}")
197
-
198
- # Validate language
199
- is_valid_lang, lang_msg = cls.validate_language_code(entry.language)
200
- if not is_valid_lang:
201
- errors.append(lang_msg)
202
-
203
- # Validate quality score
204
- if not 0.0 <= entry.quality_score <= 1.0:
205
- errors.append("Quality score must be between 0.0 and 1.0")
206
-
207
- return len(errors) == 0, errors
208
-
209
- @classmethod
210
- def validate_activity_session(cls, session: ActivitySession) -> Tuple[bool, List[str]]:
211
- """Comprehensive validation for ActivitySession"""
212
- errors = []
213
-
214
- if not session.session_id:
215
- errors.append("Session ID is required")
216
-
217
- if not isinstance(session.activity_type, ActivityType):
218
- errors.append("Invalid activity type")
219
-
220
- if session.start_time > datetime.now():
221
- errors.append("Start time cannot be in the future")
222
-
223
- return len(errors) == 0, errors
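
Usage sketch for the validator above: errors accumulate into a list rather than failing fast, so a UI can report every problem at once (field values are illustrative):

```python
from corpus_collection_engine.models.data_models import ActivityType, UserContribution
from corpus_collection_engine.models.validation import DataValidator

contrib = UserContribution(
    user_session="session-123",
    activity_type=ActivityType.FOLKLORE,
    content_data={"story": "A short tale."},  # under the 50-character folklore minimum
    language="ta",
    cultural_context={"region": "Tamil Nadu", "cultural_significance": "oral tradition"},
)

is_valid, errors = DataValidator.validate_user_contribution(contrib)
print(is_valid)  # False
print(errors)    # ['Folklore story: Text must be at least 50 characters long']
```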
 
intern_project/corpus_collection_engine/pwa/offline.html DELETED
@@ -1,256 +0,0 @@
1
- <!DOCTYPE html>
2
- <html lang="en">
3
- <head>
4
- <meta charset="UTF-8">
5
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
- <title>Offline - Corpus Collection Engine</title>
7
- <style>
8
- body {
9
- font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
10
- margin: 0;
11
- padding: 0;
12
- background: linear-gradient(135deg, #FF6B35, #F7931E);
13
- color: white;
14
- min-height: 100vh;
15
- display: flex;
16
- align-items: center;
17
- justify-content: center;
18
- }
19
-
20
- .offline-container {
21
- text-align: center;
22
- padding: 40px 20px;
23
- max-width: 500px;
24
- }
25
-
26
- .offline-icon {
27
- font-size: 80px;
28
- margin-bottom: 20px;
29
- opacity: 0.8;
30
- }
31
-
32
- .offline-title {
33
- font-size: 32px;
34
- font-weight: bold;
35
- margin-bottom: 16px;
36
- }
37
-
38
- .offline-message {
39
- font-size: 18px;
40
- line-height: 1.6;
41
- margin-bottom: 30px;
42
- opacity: 0.9;
43
- }
44
-
45
- .offline-features {
46
- background: rgba(255, 255, 255, 0.1);
47
- border-radius: 12px;
48
- padding: 24px;
49
- margin: 30px 0;
50
- text-align: left;
51
- }
52
-
53
- .offline-features h3 {
54
- margin-top: 0;
55
- margin-bottom: 16px;
56
- font-size: 20px;
57
- }
58
-
59
- .offline-features ul {
60
- list-style: none;
61
- padding: 0;
62
- margin: 0;
63
- }
64
-
65
- .offline-features li {
66
- padding: 8px 0;
67
- padding-left: 24px;
68
- position: relative;
69
- }
70
-
71
- .offline-features li:before {
72
- content: "✓";
73
- position: absolute;
74
- left: 0;
75
- color: #4CAF50;
76
- font-weight: bold;
77
- }
78
-
79
- .retry-button {
80
- background: white;
81
- color: #FF6B35;
82
- border: none;
83
- padding: 12px 24px;
84
- border-radius: 6px;
85
- font-size: 16px;
86
- font-weight: bold;
87
- cursor: pointer;
88
- transition: transform 0.2s;
89
- }
90
-
91
- .retry-button:hover {
92
- transform: translateY(-2px);
93
- }
94
-
95
- .retry-button:active {
96
- transform: translateY(0);
97
- }
98
-
99
- .connection-status {
100
- margin-top: 20px;
101
- padding: 12px;
102
- border-radius: 6px;
103
- background: rgba(255, 255, 255, 0.1);
104
- font-size: 14px;
105
- }
106
-
107
- .status-online {
108
- background: rgba(76, 175, 80, 0.2);
109
- }
110
-
111
- .status-offline {
112
- background: rgba(244, 67, 54, 0.2);
113
- }
114
-
115
- @keyframes pulse {
116
- 0% { opacity: 1; }
117
- 50% { opacity: 0.5; }
118
- 100% { opacity: 1; }
119
- }
120
-
121
- .checking {
122
- animation: pulse 2s infinite;
123
- }
124
- </style>
125
- </head>
126
- <body>
127
- <div class="offline-container">
128
- <div class="offline-icon">📡</div>
129
-
130
- <h1 class="offline-title">You're Offline</h1>
131
-
132
- <p class="offline-message">
133
- Don't worry! The Corpus Collection Engine works offline too.
134
- Your cultural contributions will be saved locally and synced when you're back online.
135
- </p>
136
-
137
- <div class="offline-features">
138
- <h3>🌟 What you can still do offline:</h3>
139
- <ul>
140
- <li>Create memes with local dialect captions</li>
141
- <li>Write down family recipes and stories</li>
142
- <li>Document folklore and traditional tales</li>
143
- <li>Describe cultural landmarks (photos saved locally)</li>
144
- <li>Browse previously loaded content</li>
145
- <li>All contributions saved for later sync</li>
146
- </ul>
147
- </div>
148
-
149
- <button class="retry-button" onclick="checkConnection()">
150
- 🔄 Check Connection
151
- </button>
152
-
153
- <div id="connection-status" class="connection-status">
154
- <span id="status-text">Checking connection...</span>
155
- </div>
156
-
157
- <div style="margin-top: 30px; font-size: 14px; opacity: 0.8;">
158
- <p>🇮🇳 Preserving Indian Culture Through AI</p>
159
- <p>Even offline, every contribution matters!</p>
160
- </div>
161
- </div>
162
-
163
- <script>
164
- let isChecking = false;
165
-
166
- function updateConnectionStatus(online) {
167
- const statusElement = document.getElementById('connection-status');
168
- const statusText = document.getElementById('status-text');
169
-
170
- if (online) {
171
- statusElement.className = 'connection-status status-online';
172
- statusText.textContent = '✅ Connection restored! Redirecting...';
173
-
174
- // Redirect to main app after a short delay
175
- setTimeout(() => {
176
- window.location.href = '/';
177
- }, 2000);
178
- } else {
179
- statusElement.className = 'connection-status status-offline';
180
- statusText.textContent = '❌ Still offline. Your contributions will be saved locally.';
181
- }
182
- }
183
-
184
- function checkConnection() {
185
- if (isChecking) return;
186
-
187
- isChecking = true;
188
- const button = document.querySelector('.retry-button');
189
- const statusText = document.getElementById('status-text');
190
-
191
- button.textContent = '🔄 Checking...';
192
- button.classList.add('checking');
193
- statusText.textContent = 'Checking connection...';
194
-
195
- // Try to fetch a small resource
196
- fetch('/', {
197
- method: 'HEAD',
198
- cache: 'no-cache',
199
- mode: 'no-cors'
200
- })
201
- .then(() => {
202
- updateConnectionStatus(true);
203
- })
204
- .catch(() => {
205
- updateConnectionStatus(false);
206
- })
207
- .finally(() => {
208
- isChecking = false;
209
- button.textContent = '🔄 Check Connection';
210
- button.classList.remove('checking');
211
- });
212
- }
213
-
214
- // Auto-check connection status
215
- function autoCheckConnection() {
216
- if (!isChecking) {
217
- fetch('/', {
218
- method: 'HEAD',
219
- cache: 'no-cache',
220
- mode: 'no-cors'
221
- })
222
- .then(() => {
223
- updateConnectionStatus(true);
224
- })
225
- .catch(() => {
226
- // Still offline, continue checking
227
- });
228
- }
229
- }
230
-
231
- // Check connection every 10 seconds
232
- setInterval(autoCheckConnection, 10000);
233
-
234
- // Listen for online/offline events
235
- window.addEventListener('online', () => updateConnectionStatus(true));
236
- window.addEventListener('offline', () => updateConnectionStatus(false));
237
-
238
- // Initial connection check
239
- setTimeout(() => {
240
- updateConnectionStatus(navigator.onLine);
241
- }, 1000);
242
-
243
- // Service worker message handling
244
- if ('serviceWorker' in navigator) {
245
- navigator.serviceWorker.addEventListener('message', event => {
246
- const { type, count } = event.data;
247
-
248
- if (type === 'SYNC_COMPLETE' && count > 0) {
249
- const statusText = document.getElementById('status-text');
250
- statusText.textContent = `✅ Synced ${count} contribution(s) successfully!`;
251
- }
252
- });
253
- }
254
- </script>
255
- </body>
256
- </html>
 
intern_project/corpus_collection_engine/pwa/pwa_manager.py DELETED
@@ -1,541 +0,0 @@
1
- """
2
- PWA Manager for Streamlit integration and offline functionality
3
- """
4
-
5
- import streamlit as st
6
- import json
7
- import os
8
- from typing import Dict, List, Any, Optional
9
- from pathlib import Path
10
- import logging
11
-
12
- from corpus_collection_engine.config import PWA_CONFIG, DATA_DIR
13
-
14
-
15
- class PWAManager:
16
- """Manager for Progressive Web App functionality"""
17
-
18
- def __init__(self):
19
- self.logger = logging.getLogger(__name__)
20
- self.config = PWA_CONFIG
21
- self.offline_storage_path = os.path.join(DATA_DIR, "offline_data.json")
22
-
23
- # Initialize PWA state in session
24
- if 'pwa_initialized' not in st.session_state:
25
- st.session_state.pwa_initialized = False
26
- st.session_state.is_online = True
27
- st.session_state.offline_contributions = []
28
- st.session_state.sync_status = "idle"
29
-
30
- def initialize_pwa(self):
31
- """Initialize PWA functionality in Streamlit"""
32
- if st.session_state.pwa_initialized:
33
- return
34
-
35
- try:
36
- # Inject PWA components into Streamlit
37
- self._inject_pwa_components()
38
-
39
- # Register service worker
40
- self._register_service_worker()
41
-
42
- # Setup offline detection
43
- self._setup_offline_detection()
44
-
45
- # Load offline data
46
- self._load_offline_data()
47
-
48
- st.session_state.pwa_initialized = True
49
- self.logger.info("PWA initialized successfully")
50
-
51
- except Exception as e:
52
- self.logger.error(f"PWA initialization failed: {e}")
53
-
54
- def _inject_pwa_components(self):
55
- """Inject PWA-related HTML components"""
56
-
57
- # Web App Manifest
58
- manifest = self._generate_manifest()
59
-
60
- # PWA HTML components
61
- pwa_html = f"""
62
- <script>
63
- // Web App Manifest
64
- const manifestBlob = new Blob(['{json.dumps(manifest)}'], {{type: 'application/json'}});
65
- const manifestURL = URL.createObjectURL(manifestBlob);
66
-
67
- const manifestLink = document.createElement('link');
68
- manifestLink.rel = 'manifest';
69
- manifestLink.href = manifestURL;
70
- document.head.appendChild(manifestLink);
71
-
72
- // Viewport meta tag for mobile
73
- const viewportMeta = document.createElement('meta');
74
- viewportMeta.name = 'viewport';
75
- viewportMeta.content = 'width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no';
76
- document.head.appendChild(viewportMeta);
77
-
78
- // Theme color
79
- const themeColorMeta = document.createElement('meta');
80
- themeColorMeta.name = 'theme-color';
81
- themeColorMeta.content = '#FF6B35';
82
- document.head.appendChild(themeColorMeta);
83
-
84
- // Apple touch icon
85
- const appleTouchIcon = document.createElement('link');
86
- appleTouchIcon.rel = 'apple-touch-icon';
87
- appleTouchIcon.href = 'data:image/svg+xml,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100"><rect width="100" height="100" fill="%23FF6B35"/><text x="50" y="55" font-size="40" text-anchor="middle" fill="white">🇮🇳</text></svg>';
88
- document.head.appendChild(appleTouchIcon);
89
-
90
- // PWA installation prompt
91
- window.pwaInstallPrompt = null;
92
-
93
- window.addEventListener('beforeinstallprompt', (e) => {{
94
- e.preventDefault();
95
- window.pwaInstallPrompt = e;
96
- console.log('PWA install prompt available');
97
- }});
98
-
99
- // Online/offline detection
100
- window.addEventListener('online', () => {{
101
- console.log('Connection restored');
102
- window.parent.postMessage({{type: 'CONNECTION_STATUS', online: true}}, '*');
103
- }});
104
-
105
- window.addEventListener('offline', () => {{
106
- console.log('Connection lost');
107
- window.parent.postMessage({{type: 'CONNECTION_STATUS', online: false}}, '*');
108
- }});
109
-
110
- // Initial connection status
111
- window.parent.postMessage({{type: 'CONNECTION_STATUS', online: navigator.onLine}}, '*');
112
- </script>
113
-
114
- <style>
115
- /* PWA-specific styles */
116
- .pwa-offline-indicator {{
117
- position: fixed;
118
- top: 0;
119
- left: 0;
120
- right: 0;
121
- background: #ff4444;
122
- color: white;
123
- text-align: center;
124
- padding: 8px;
125
- z-index: 9999;
126
- font-size: 14px;
127
- display: none;
128
- }}
129
-
130
- .pwa-sync-indicator {{
131
- position: fixed;
132
- bottom: 20px;
133
- right: 20px;
134
- background: #4CAF50;
135
- color: white;
136
- padding: 12px 16px;
137
- border-radius: 4px;
138
- font-size: 14px;
139
- z-index: 9999;
140
- display: none;
141
- }}
142
-
143
- .pwa-install-banner {{
144
- background: linear-gradient(135deg, #FF6B35, #F7931E);
145
- color: white;
146
- padding: 16px;
147
- border-radius: 8px;
148
- margin: 16px 0;
149
- text-align: center;
150
- }}
151
-
152
- .pwa-install-button {{
153
- background: white;
154
- color: #FF6B35;
155
- border: none;
156
- padding: 8px 16px;
157
- border-radius: 4px;
158
- font-weight: bold;
159
- cursor: pointer;
160
- margin-top: 8px;
161
- }}
162
- </style>
163
-
164
- <div id="pwa-offline-indicator" class="pwa-offline-indicator">
165
- 📡 You're offline. Your contributions will be saved and synced when connection is restored.
166
- </div>
167
-
168
- <div id="pwa-sync-indicator" class="pwa-sync-indicator">
169
- ✅ Contributions synced successfully!
170
- </div>
171
- """
172
-
173
- st.components.v1.html(pwa_html, height=0)
174
-
175
- def _generate_manifest(self) -> Dict[str, Any]:
176
- """Generate Web App Manifest"""
177
- return {
178
- "name": "Corpus Collection Engine",
179
- "short_name": "CorpusCollect",
180
- "description": "AI-powered app for collecting diverse data on Indian languages, history, and culture",
181
- "start_url": "/",
182
- "display": "standalone",
183
- "background_color": "#FFFFFF",
184
- "theme_color": "#FF6B35",
185
- "orientation": "portrait-primary",
186
- "categories": ["education", "culture", "productivity"],
187
- "lang": "en-IN",
188
- "icons": [
189
- {
190
- "src": "data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 192 192'><rect width='192' height='192' fill='%23FF6B35'/><text x='96' y='110' font-size='80' text-anchor='middle' fill='white'>🇮🇳</text></svg>",
191
- "sizes": "192x192",
192
- "type": "image/svg+xml",
193
- "purpose": "any maskable"
194
- },
195
- {
196
- "src": "data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 512 512'><rect width='512' height='512' fill='%23FF6B35'/><text x='256' y='290' font-size='200' text-anchor='middle' fill='white'>🇮🇳</text></svg>",
197
- "sizes": "512x512",
198
- "type": "image/svg+xml",
199
- "purpose": "any maskable"
200
- }
201
- ],
202
- "screenshots": [
203
- {
204
- "src": "data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 540 720'><rect width='540' height='720' fill='%23f8f9fa'/><rect x='20' y='60' width='500' height='80' fill='%23FF6B35' rx='8'/><text x='270' y='110' font-size='24' text-anchor='middle' fill='white'>Corpus Collection Engine</text></svg>",
205
- "sizes": "540x720",
206
- "type": "image/svg+xml",
207
- "form_factor": "narrow"
208
- }
209
- ]
210
- }
211
-
212
- def _register_service_worker(self):
213
- """Register service worker for offline functionality"""
214
-
215
- # Read service worker content
216
-         sw_path = Path(__file__).parent / "service_worker.js"
- 
-         if sw_path.exists():
-             with open(sw_path, 'r', encoding='utf-8') as f:
-                 sw_content = f.read()
-         else:
-             self.logger.warning("Service worker file not found")
-             return
- 
-         # Inject service worker registration
-         sw_registration = f"""
-         <script>
-         if ('serviceWorker' in navigator) {{
-             // Create service worker from string content
-             // Note: embedding the worker source in a template literal assumes it
-             // contains no backticks or ${{...}} sequences.
-             const swBlob = new Blob([`{sw_content}`], {{type: 'application/javascript'}});
-             const swURL = URL.createObjectURL(swBlob);
- 
-             navigator.serviceWorker.register(swURL)
-                 .then(registration => {{
-                     console.log('Service Worker registered successfully:', registration);
- 
-                     // Listen for updates
-                     registration.addEventListener('updatefound', () => {{
-                         const newWorker = registration.installing;
-                         newWorker.addEventListener('statechange', () => {{
-                             if (newWorker.state === 'installed' && navigator.serviceWorker.controller) {{
-                                 console.log('New service worker available');
-                                 // Optionally show update notification
-                             }}
-                         }});
-                     }});
-                 }})
-                 .catch(error => {{
-                     console.error('Service Worker registration failed:', error);
-                 }});
- 
-             // Listen for messages from service worker
-             navigator.serviceWorker.addEventListener('message', event => {{
-                 const {{ type, count }} = event.data;
- 
-                 if (type === 'SYNC_COMPLETE') {{
-                     console.log(`Synced ${{count}} contributions`);
-                     window.parent.postMessage({{type: 'SYNC_COMPLETE', count}}, '*');
- 
-                     // Show sync indicator
-                     const indicator = document.getElementById('pwa-sync-indicator');
-                     if (indicator) {{
-                         indicator.style.display = 'block';
-                         setTimeout(() => {{
-                             indicator.style.display = 'none';
-                         }}, 3000);
-                     }}
-                 }}
-             }});
-         }} else {{
-             console.log('Service Workers not supported');
-         }}
-         </script>
-         """
- 
-         st.components.v1.html(sw_registration, height=0)
- 
-     def _setup_offline_detection(self):
-         """Set up offline/online detection"""
- 
-         # JavaScript for connection monitoring
-         connection_monitor = """
-         <script>
-         function updateConnectionStatus(online) {
-             const indicator = document.getElementById('pwa-offline-indicator');
-             if (indicator) {
-                 indicator.style.display = online ? 'none' : 'block';
-             }
- 
-             // Update Streamlit session state
-             window.parent.postMessage({
-                 type: 'CONNECTION_STATUS',
-                 online: online
-             }, '*');
-         }
- 
-         // Monitor connection status
-         window.addEventListener('online', () => updateConnectionStatus(true));
-         window.addEventListener('offline', () => updateConnectionStatus(false));
- 
-         // Initial status
-         updateConnectionStatus(navigator.onLine);
- 
-         // Periodic connectivity check
-         setInterval(() => {
-             fetch('/ping', {method: 'HEAD', cache: 'no-cache'})
-                 .then(() => updateConnectionStatus(true))
-                 .catch(() => updateConnectionStatus(false));
-         }, 30000); // Check every 30 seconds
-         </script>
-         """
- 
-         st.components.v1.html(connection_monitor, height=0)
- 
-     def _load_offline_data(self):
-         """Load offline contributions from storage"""
-         try:
-             if os.path.exists(self.offline_storage_path):
-                 with open(self.offline_storage_path, 'r', encoding='utf-8') as f:
-                     offline_data = json.load(f)
-                     st.session_state.offline_contributions = offline_data.get('contributions', [])
-         except Exception as e:
-             self.logger.error(f"Failed to load offline data: {e}")
-             st.session_state.offline_contributions = []
- 
-     def save_offline_contribution(self, contribution_data: Dict[str, Any]) -> bool:
-         """Save contribution for offline sync"""
-         try:
-             # Add timestamp and ID
-             contribution_data['offline_timestamp'] = st.session_state.get('current_timestamp', '')
-             contribution_data['offline_id'] = f"offline_{len(st.session_state.offline_contributions)}"
- 
-             # Add to session state
-             st.session_state.offline_contributions.append(contribution_data)
- 
-             # Save to file
-             offline_data = {
-                 'contributions': st.session_state.offline_contributions,
-                 'last_updated': st.session_state.get('current_timestamp', '')
-             }
- 
-             os.makedirs(os.path.dirname(self.offline_storage_path), exist_ok=True)
-             with open(self.offline_storage_path, 'w', encoding='utf-8') as f:
-                 json.dump(offline_data, f, indent=2, ensure_ascii=False)
- 
-             self.logger.info(f"Saved offline contribution: {contribution_data.get('offline_id')}")
-             return True
- 
-         except Exception as e:
-             self.logger.error(f"Failed to save offline contribution: {e}")
-             return False
- 
-     def get_offline_contributions(self) -> List[Dict[str, Any]]:
-         """Get all offline contributions"""
-         return st.session_state.offline_contributions.copy()
- 
-     def clear_offline_contributions(self):
-         """Clear all offline contributions after successful sync"""
-         st.session_state.offline_contributions = []
- 
-         try:
-             if os.path.exists(self.offline_storage_path):
-                 os.remove(self.offline_storage_path)
-         except Exception as e:
-             self.logger.error(f"Failed to clear offline storage file: {e}")
- 
-     def render_offline_status(self):
-         """Render offline status and sync information"""
-         if not st.session_state.is_online:
-             st.warning("📡 You're currently offline. Your contributions will be saved locally and synced when the connection is restored.")
- 
-         # Show offline contributions count
-         offline_count = len(st.session_state.offline_contributions)
-         if offline_count > 0:
-             st.info(f"📱 {offline_count} contribution(s) saved offline, waiting for sync.")
- 
-             if st.button("🔄 Try Sync Now", key="manual_sync"):
-                 self.trigger_sync()
- 
-     def render_install_prompt(self):
-         """Render PWA installation prompt"""
- 
-         install_prompt = """
-         <script>
-         function showInstallPrompt() {
-             if (window.pwaInstallPrompt) {
-                 window.pwaInstallPrompt.prompt();
-                 window.pwaInstallPrompt.userChoice.then((choiceResult) => {
-                     if (choiceResult.outcome === 'accepted') {
-                         console.log('User accepted the install prompt');
-                     } else {
-                         console.log('User dismissed the install prompt');
-                     }
-                     window.pwaInstallPrompt = null;
-                 });
-             } else {
-                 alert('Install prompt not available. You can manually install from your browser menu.');
-             }
-         }
-         </script>
- 
-         <div class="pwa-install-banner">
-             <h4>📱 Install Corpus Collection Engine</h4>
-             <p>Install our app for the best offline experience and quick access!</p>
-             <button class="pwa-install-button" onclick="showInstallPrompt()">
-                 Install App
-             </button>
-         </div>
-         """
- 
-         # Only show install prompt if not already installed
-         if not self._is_pwa_installed():
-             st.components.v1.html(install_prompt, height=150)
- 
-     def _is_pwa_installed(self) -> bool:
-         """Check if PWA is already installed"""
-         # This is a simplified check - in reality, detection is more complex
-         user_agent = st.context.headers.get("User-Agent", "")
-         return "Mobile" in user_agent and "wv" in user_agent
- 
-     def trigger_sync(self):
-         """Trigger manual sync of offline contributions"""
- 
-         sync_script = """
-         <script>
-         if ('serviceWorker' in navigator && navigator.serviceWorker.controller) {
-             navigator.serviceWorker.controller.postMessage({
-                 type: 'TRIGGER_SYNC'
-             });
- 
-             // Also trigger background sync if supported
-             if ('sync' in window.ServiceWorkerRegistration.prototype) {
-                 navigator.serviceWorker.ready.then(registration => {
-                     return registration.sync.register('sync-contributions');
-                 }).then(() => {
-                     console.log('Background sync registered');
-                 }).catch(error => {
-                     console.error('Background sync registration failed:', error);
-                 });
-             }
-         }
-         </script>
-         """
- 
-         st.components.v1.html(sync_script, height=0)
-         st.session_state.sync_status = "syncing"
- 
-     def get_pwa_status(self) -> Dict[str, Any]:
-         """Get current PWA status"""
-         return {
-             'initialized': st.session_state.pwa_initialized,
-             'online': st.session_state.is_online,
-             'offline_contributions': len(st.session_state.offline_contributions),
-             'sync_status': st.session_state.sync_status,
-             'cache_version': self.config['cache_version']
-         }
- 
-     def render_pwa_debug_info(self):
-         """Render PWA debug information (for development)"""
-         if st.checkbox("🔧 Show PWA Debug Info", key="pwa_debug"):
-             status = self.get_pwa_status()
-             st.json(status)
- 
-             if st.button("Clear PWA Cache", key="clear_cache"):
-                 clear_cache_script = """
-                 <script>
-                 if ('serviceWorker' in navigator) {
-                     navigator.serviceWorker.controller?.postMessage({
-                         type: 'CLEAR_CACHE'
-                     });
- 
-                     caches.keys().then(cacheNames => {
-                         return Promise.all(
-                             cacheNames.map(cacheName => caches.delete(cacheName))
-                         );
-                     }).then(() => {
-                         console.log('All caches cleared');
-                         alert('PWA cache cleared successfully!');
-                     });
-                 }
-                 </script>
-                 """
-                 st.components.v1.html(clear_cache_script, height=0)
- 
-     def optimize_for_low_bandwidth(self):
-         """Apply optimizations for low-bandwidth environments"""
- 
-         # Inject bandwidth optimization styles and scripts
-         optimization_html = """
-         <style>
-         /* Low bandwidth optimizations */
-         img {
-             max-width: 100%;
-             height: auto;
-             /* 'loading: lazy' is an HTML attribute, not a CSS property;
-                it is applied via the script below instead. */
-         }
- 
-         .stImage > img {
-             max-height: 400px;
-             object-fit: contain;
-         }
- 
-         /* Reduce animations for slower connections */
-         @media (prefers-reduced-motion: reduce) {
-             * {
-                 animation-duration: 0.01ms !important;
-                 animation-iteration-count: 1 !important;
-                 transition-duration: 0.01ms !important;
-             }
-         }
- 
-         /* Compress text rendering */
-         .stMarkdown {
-             text-rendering: optimizeSpeed;
-         }
-         </style>
- 
-         <script>
-         // Lazy-load images ('loading' is an HTML attribute, so set it from JS)
-         document.querySelectorAll('img').forEach(img => { img.loading = 'lazy'; });
- 
-         // Detect slow connection and apply optimizations
-         if ('connection' in navigator) {
-             const connection = navigator.connection;
- 
-             if (connection.effectiveType === 'slow-2g' || connection.effectiveType === '2g') {
-                 console.log('Slow connection detected, applying optimizations');
- 
-                 // Reduce image quality
-                 document.querySelectorAll('img').forEach(img => {
-                     if (img.src && !img.dataset.optimized) {
-                         img.style.filter = 'blur(0.5px)'; // Slight blur to reduce perceived quality
-                         img.dataset.optimized = 'true';
-                     }
-                 });
- 
-                 // Disable non-essential animations
-                 document.body.style.setProperty('--animation-duration', '0s');
-             }
-         }
-         </script>
-         """
- 
-         st.components.v1.html(optimization_html, height=0)
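For reference, here is a minimal sketch of how an activity page might drive the offline helpers deleted above. The `PWAManager` class name, its import path, and the payload fields are assumptions inferred from this file's structure, not a confirmed API:

```python
# Hypothetical usage sketch (not part of the original app).
import streamlit as st

# Assumed import path, mirroring the repo layout shown in this commit.
from corpus_collection_engine.pwa.pwa_manager import PWAManager

pwa = PWAManager()
pwa.render_offline_status()  # offline banner plus the manual "Try Sync Now" button

# Assumed payload shape; field names are illustrative only.
contribution = {
    "activity_type": "recipe",
    "language": "hi",
    "content_data": {"title": "Dal tadka", "steps": ["..."]},
}

if not st.session_state.get("is_online", True):
    # While offline, queue locally; the service worker syncs it later.
    if pwa.save_offline_contribution(contribution):
        st.success("Saved offline - it will sync once the connection returns.")
```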
intern_project/corpus_collection_engine/pwa/service_worker.js DELETED
@@ -1,335 +0,0 @@
- /**
-  * Service Worker for Corpus Collection Engine PWA
-  * Provides offline functionality and caching for low-bandwidth environments
-  */
- 
- const CACHE_NAME = 'corpus-collection-v1.0.0';
- const OFFLINE_URL = '/offline.html';
- 
- // Resources to cache for offline functionality
- const CACHE_URLS = [
-     '/',
-     '/offline.html',
-     // Streamlit static assets will be added dynamically
-     '/static/css/bootstrap.min.css',
-     '/static/js/bootstrap.bundle.min.js',
-     // Add other critical assets
- ];
- 
- // Install event - cache critical resources
- self.addEventListener('install', event => {
-     console.log('Service Worker: Installing...');
- 
-     event.waitUntil(
-         caches.open(CACHE_NAME)
-             .then(cache => {
-                 console.log('Service Worker: Caching critical resources');
-                 return cache.addAll(CACHE_URLS);
-             })
-             .then(() => {
-                 console.log('Service Worker: Installation complete');
-                 return self.skipWaiting();
-             })
-             .catch(error => {
-                 console.error('Service Worker: Installation failed', error);
-             })
-     );
- });
- 
- // Activate event - clean up old caches
- self.addEventListener('activate', event => {
-     console.log('Service Worker: Activating...');
- 
-     event.waitUntil(
-         caches.keys()
-             .then(cacheNames => {
-                 return Promise.all(
-                     cacheNames.map(cacheName => {
-                         if (cacheName !== CACHE_NAME) {
-                             console.log('Service Worker: Deleting old cache', cacheName);
-                             return caches.delete(cacheName);
-                         }
-                     })
-                 );
-             })
-             .then(() => {
-                 console.log('Service Worker: Activation complete');
-                 return self.clients.claim();
-             })
-     );
- });
- 
- // Fetch event - implement caching strategies
- self.addEventListener('fetch', event => {
-     const request = event.request;
-     const url = new URL(request.url);
- 
-     // Skip non-GET requests
-     if (request.method !== 'GET') {
-         return;
-     }
- 
-     // Handle different types of requests with appropriate strategies
-     if (url.pathname.startsWith('/static/')) {
-         // Static assets - Cache First strategy
-         event.respondWith(cacheFirstStrategy(request));
-     } else if (url.pathname.includes('api') || url.pathname.includes('_stcore')) {
-         // API calls and Streamlit core - Network First strategy
-         event.respondWith(networkFirstStrategy(request));
-     } else if (url.pathname === '/' || url.pathname.includes('.html')) {
-         // HTML pages - Stale While Revalidate strategy
-         event.respondWith(staleWhileRevalidateStrategy(request));
-     } else {
-         // Default - Network First with offline fallback
-         event.respondWith(networkFirstWithOfflineFallback(request));
-     }
- });
- 
- // Cache First Strategy - for static assets
- async function cacheFirstStrategy(request) {
-     try {
-         const cachedResponse = await caches.match(request);
-         if (cachedResponse) {
-             return cachedResponse;
-         }
- 
-         const networkResponse = await fetch(request);
-         if (networkResponse.ok) {
-             const cache = await caches.open(CACHE_NAME);
-             cache.put(request, networkResponse.clone());
-         }
-         return networkResponse;
-     } catch (error) {
-         console.error('Cache First Strategy failed:', error);
-         return new Response('Resource not available offline', { status: 503 });
-     }
- }
- 
- // Network First Strategy - for dynamic content
- async function networkFirstStrategy(request) {
-     try {
-         const networkResponse = await fetch(request);
-         if (networkResponse.ok) {
-             const cache = await caches.open(CACHE_NAME);
-             cache.put(request, networkResponse.clone());
-         }
-         return networkResponse;
-     } catch (error) {
-         console.log('Network failed, trying cache:', request.url);
-         const cachedResponse = await caches.match(request);
-         if (cachedResponse) {
-             return cachedResponse;
-         }
-         throw error;
-     }
- }
- 
- // Stale While Revalidate Strategy - for HTML pages
- async function staleWhileRevalidateStrategy(request) {
-     const cache = await caches.open(CACHE_NAME);
-     const cachedResponse = await cache.match(request);
- 
-     const fetchPromise = fetch(request).then(networkResponse => {
-         if (networkResponse.ok) {
-             cache.put(request, networkResponse.clone());
-         }
-         return networkResponse;
-     }).catch(error => {
-         console.log('Network failed for:', request.url);
-         return null;
-     });
- 
-     return cachedResponse || await fetchPromise || await cache.match(OFFLINE_URL);
- }
- 
- // Network First with Offline Fallback
- async function networkFirstWithOfflineFallback(request) {
-     try {
-         const networkResponse = await fetch(request);
-         if (networkResponse.ok) {
-             const cache = await caches.open(CACHE_NAME);
-             cache.put(request, networkResponse.clone());
-         }
-         return networkResponse;
-     } catch (error) {
-         const cachedResponse = await caches.match(request);
-         if (cachedResponse) {
-             return cachedResponse;
-         }
- 
-         // Return offline page for navigation requests
-         if (request.mode === 'navigate') {
-             return caches.match(OFFLINE_URL);
-         }
- 
-         return new Response('Content not available offline', {
-             status: 503,
-             statusText: 'Service Unavailable'
-         });
-     }
- }
- 
- // Background sync for offline submissions
- self.addEventListener('sync', event => {
-     console.log('Service Worker: Background sync triggered', event.tag);
- 
-     if (event.tag === 'sync-contributions') {
-         event.waitUntil(syncContributions());
-     }
- });
- 
- // Sync offline contributions when connection is restored
- async function syncContributions() {
-     try {
-         console.log('Service Worker: Syncing offline contributions...');
- 
-         // Get offline contributions from IndexedDB
-         const contributions = await getOfflineContributions();
- 
-         for (const contribution of contributions) {
-             try {
-                 const response = await fetch('/api/contributions', {
-                     method: 'POST',
-                     headers: {
-                         'Content-Type': 'application/json',
-                     },
-                     body: JSON.stringify(contribution)
-                 });
- 
-                 if (response.ok) {
-                     await removeOfflineContribution(contribution.id);
-                     console.log('Synced contribution:', contribution.id);
-                 } else {
-                     console.error('Failed to sync contribution:', contribution.id);
-                 }
-             } catch (error) {
-                 console.error('Error syncing contribution:', error);
-             }
-         }
- 
-         // Notify the main thread about sync completion
-         const clients = await self.clients.matchAll();
-         clients.forEach(client => {
-             client.postMessage({
-                 type: 'SYNC_COMPLETE',
-                 count: contributions.length
-             });
-         });
- 
-     } catch (error) {
-         console.error('Background sync failed:', error);
-     }
- }
- 
- // IndexedDB operations for offline storage
- async function getOfflineContributions() {
-     return new Promise((resolve, reject) => {
-         const request = indexedDB.open('CorpusCollectionDB', 1);
- 
-         request.onerror = () => reject(request.error);
- 
-         request.onsuccess = () => {
-             const db = request.result;
-             const transaction = db.transaction(['offline_contributions'], 'readonly');
-             const store = transaction.objectStore('offline_contributions');
-             const getAllRequest = store.getAll();
- 
-             getAllRequest.onsuccess = () => resolve(getAllRequest.result);
-             getAllRequest.onerror = () => reject(getAllRequest.error);
-         };
- 
-         request.onupgradeneeded = (event) => {
-             const db = event.target.result;
-             if (!db.objectStoreNames.contains('offline_contributions')) {
-                 const store = db.createObjectStore('offline_contributions', { keyPath: 'id' });
-                 store.createIndex('timestamp', 'timestamp', { unique: false });
-             }
-         };
-     });
- }
- 
- async function removeOfflineContribution(id) {
-     return new Promise((resolve, reject) => {
-         const request = indexedDB.open('CorpusCollectionDB', 1);
- 
-         request.onsuccess = () => {
-             const db = request.result;
-             const transaction = db.transaction(['offline_contributions'], 'readwrite');
-             const store = transaction.objectStore('offline_contributions');
-             const deleteRequest = store.delete(id);
- 
-             deleteRequest.onsuccess = () => resolve();
-             deleteRequest.onerror = () => reject(deleteRequest.error);
-         };
-     });
- }
- 
- // Handle messages from the main thread
- self.addEventListener('message', event => {
-     const { type, data } = event.data;
- 
-     switch (type) {
-         case 'SKIP_WAITING':
-             self.skipWaiting();
-             break;
- 
-         case 'CACHE_URLS':
-             cacheUrls(data.urls);
-             break;
- 
-         case 'CLEAR_CACHE':
-             clearCache();
-             break;
- 
-         default:
-             console.log('Unknown message type:', type);
-     }
- });
- 
- // Cache additional URLs dynamically
- async function cacheUrls(urls) {
-     try {
-         const cache = await caches.open(CACHE_NAME);
-         await cache.addAll(urls);
-         console.log('Cached additional URLs:', urls);
-     } catch (error) {
-         console.error('Failed to cache URLs:', error);
-     }
- }
- 
- // Clear all caches
- async function clearCache() {
-     try {
-         const cacheNames = await caches.keys();
-         await Promise.all(cacheNames.map(name => caches.delete(name)));
-         console.log('All caches cleared');
-     } catch (error) {
-         console.error('Failed to clear caches:', error);
-     }
- }
- 
- // Periodic cleanup of old cached data
- // Note: a long-lived timer is unreliable here - the browser may terminate an
- // idle service worker, so this only runs while the worker happens to be alive.
- setInterval(async () => {
-     try {
-         const cache = await caches.open(CACHE_NAME);
-         const requests = await cache.keys();
- 
-         // Remove old cached responses (older than 7 days)
-         const oneWeekAgo = Date.now() - (7 * 24 * 60 * 60 * 1000);
- 
-         for (const request of requests) {
-             const response = await cache.match(request);
-             const dateHeader = response.headers.get('date');
- 
-             if (dateHeader) {
-                 const responseDate = new Date(dateHeader).getTime();
-                 if (responseDate < oneWeekAgo) {
-                     await cache.delete(request);
-                     console.log('Removed old cached response:', request.url);
-                 }
-             }
-         }
-     } catch (error) {
-         console.error('Cache cleanup failed:', error);
-     }
- }, 24 * 60 * 60 * 1000); // Run daily
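The fetch handler above boils down to a small routing table from URL shape to caching strategy. A hedged Python restatement of that mapping (the function name and strategy labels are ours) makes the policy easy to spot-check:

```python
# Illustrative Python mirror of the service worker's fetch routing; names are ours.
from urllib.parse import urlparse


def pick_cache_strategy(url: str, method: str = "GET") -> str:
    """Map a request to the strategy the worker above would apply."""
    if method != "GET":
        return "pass-through"  # the worker skips non-GET requests
    path = urlparse(url).path
    if path.startswith("/static/"):
        return "cache-first"  # immutable static assets
    if "api" in path or "_stcore" in path:
        return "network-first"  # API calls and Streamlit core traffic
    if path == "/" or ".html" in path:
        return "stale-while-revalidate"  # HTML shells
    return "network-first-with-offline-fallback"


assert pick_cache_strategy("https://app.example/static/js/app.js") == "cache-first"
assert pick_cache_strategy("https://app.example/_stcore/stream") == "network-first"
assert pick_cache_strategy("https://app.example/") == "stale-while-revalidate"
```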
intern_project/corpus_collection_engine/requirements.txt DELETED
@@ -1,6 +0,0 @@
- streamlit>=1.28.0
- pandas>=1.5.0
- numpy>=1.24.0
- Pillow>=9.0.0
- requests>=2.28.0
- python-dateutil>=2.8.0
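If these version floors matter at runtime, they can be sanity-checked programmatically. This sketch is ours, not part of the project, and assumes the third-party `packaging` library is available (it is not pinned in the list above):

```python
# Hedged sanity check of the floors pinned above; not part of the project.
from importlib.metadata import version

from packaging.version import Version  # assumed extra dependency

FLOORS = {
    "streamlit": "1.28.0",
    "pandas": "1.5.0",
    "numpy": "1.24.0",
    "Pillow": "9.0.0",
    "requests": "2.28.0",
    "python-dateutil": "2.8.0",
}

for package, floor in FLOORS.items():
    installed = Version(version(package))
    assert installed >= Version(floor), f"{package} {installed} is below the {floor} floor"
```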
intern_project/corpus_collection_engine/services/__init__.py DELETED
@@ -1 +0,0 @@
- # Services module for AI, language processing, and validation services
intern_project/corpus_collection_engine/services/ai_service.py DELETED
@@ -1,417 +0,0 @@
- """
- AI Service for text generation, translation, and image processing
- """
- 
- import logging
- from typing import Dict, List, Optional, Tuple, Any
- import json
- import time
- from datetime import datetime
- 
- # For Hugging Face Spaces deployment, disable transformers to avoid auth issues
- TRANSFORMERS_AVAILABLE = False
- 
- from corpus_collection_engine.config import AI_CONFIG, SUPPORTED_LANGUAGES
- from corpus_collection_engine.services.language_service import LanguageService
- 
- 
- class AIService:
-     """Service for AI-powered text generation, translation, and processing"""
- 
-     def __init__(self):
-         self.logger = logging.getLogger(__name__)
-         self.language_service = LanguageService()
- 
-         # AI model configurations
-         self.config = AI_CONFIG
-         self.models = {}
-         self.fallback_mode = True  # Always use fallback for public deployment
- 
-         # Initialize models (will use fallback mode)
-         self._initialize_models()
- 
-         # Circuit breaker for model failures
-         self.circuit_breaker = {
-             'failures': 0,
-             'last_failure': None,
-             'threshold': 3,
-             'timeout': 300  # 5 minutes
-         }
- 
-     def _initialize_models(self):
-         """Initialize AI models with fallback handling"""
-         try:
-             if TRANSFORMERS_AVAILABLE:
-                 self.logger.info("Initializing AI models...")
- 
-                 # For MVP, use lightweight models that are readily available
-                 # In production, replace with Sarvam-1 or other Indic language models
- 
-                 # Text generation model (lightweight)
-                 try:
-                     # For Hugging Face Spaces deployment, disable model loading to avoid auth issues
-                     # Use fallback text generation instead
-                     self.models['text_generator'] = None
-                     self.logger.info("Text generation model disabled for public deployment")
-                 except Exception as e:
-                     self.logger.warning(f"Could not load text generation model: {e}")
- 
-                 # Translation model (if available)
-                 try:
-                     # For MVP, we'll use a simple approach
-                     # In production, use proper Indic translation models
-                     self.models['translator'] = None  # Placeholder
-                     self.logger.info("Translation service initialized")
-                 except Exception as e:
-                     self.logger.warning(f"Could not load translation model: {e}")
- 
-             else:
-                 self.logger.warning("Transformers library not available, using fallback mode")
-                 self.fallback_mode = True
- 
-         except Exception as e:
-             self.logger.error(f"Error initializing AI models: {e}")
-             self.fallback_mode = True
- 
-     def _is_circuit_breaker_open(self) -> bool:
-         """Check if circuit breaker is open due to recent failures"""
-         if self.circuit_breaker['failures'] < self.circuit_breaker['threshold']:
-             return False
- 
-         if self.circuit_breaker['last_failure']:
-             time_since_failure = time.time() - self.circuit_breaker['last_failure']
-             if time_since_failure > self.circuit_breaker['timeout']:
-                 # Reset circuit breaker
-                 self.circuit_breaker['failures'] = 0
-                 self.circuit_breaker['last_failure'] = None
-                 return False
- 
-         return True
- 
-     def _record_failure(self):
-         """Record a model failure for circuit breaker"""
-         self.circuit_breaker['failures'] += 1
-         self.circuit_breaker['last_failure'] = time.time()
- 
-     def _record_success(self):
-         """Record a successful operation"""
-         if self.circuit_breaker['failures'] > 0:
-             self.circuit_breaker['failures'] = max(0, self.circuit_breaker['failures'] - 1)
- 
-     def generate_text(self, prompt: str, language: str = "en",
-                       max_length: int = 100) -> Tuple[Optional[str], float]:
-         """
-         Generate text based on prompt
- 
-         Args:
-             prompt: Input prompt for text generation
-             language: Target language for generation
-             max_length: Maximum length of generated text
- 
-         Returns:
-             Tuple of (generated_text, confidence_score)
-         """
-         if self._is_circuit_breaker_open():
-             self.logger.warning("AI service circuit breaker is open")
-             return self._fallback_text_generation(prompt, language), 0.3
- 
-         try:
-             # For Hugging Face Spaces deployment, always use fallback mode
-             # to avoid authentication issues with external models
-             pass
- 
-         except Exception as e:
-             self.logger.error(f"Error in text generation: {e}")
-             self._record_failure()
- 
-         # Fallback to rule-based generation
-         return self._fallback_text_generation(prompt, language), 0.4
- 
-     def _format_prompt_for_language(self, prompt: str, language: str) -> str:
-         """Format prompt based on target language"""
-         if language == "en":
-             return prompt
- 
-         # For Indic languages, add context
-         lang_name = self.language_service.get_language_name(language)
-         return f"In {lang_name}: {prompt}"
- 
-     def _fallback_text_generation(self, prompt: str, language: str) -> str:
-         """Fallback text generation using templates"""
-         # Simple template-based generation for common scenarios
-         templates = {
-             "meme_caption": [
-                 "When you {prompt}",
-                 "That moment when {prompt}",
-                 "Me: {prompt}",
-                 "{prompt} be like:",
-                 "POV: {prompt}"
-             ],
-             "recipe_suggestion": [
-                 "Try adding {prompt} for better taste",
-                 "This {prompt} recipe is perfect for festivals",
-                 "Traditional {prompt} with a modern twist",
-                 "Family recipe for {prompt}"
-             ],
-             "story_continuation": [
-                 "Once upon a time, {prompt}",
-                 "In the village, {prompt}",
-                 "The wise elder said, {prompt}",
-                 "As the story goes, {prompt}"
-             ]
-         }
- 
-         # Detect prompt type and use appropriate template
-         prompt_lower = prompt.lower()
-         if any(word in prompt_lower for word in ["meme", "funny", "joke"]):
-             template_list = templates["meme_caption"]
-         elif any(word in prompt_lower for word in ["recipe", "cook", "food"]):
-             template_list = templates["recipe_suggestion"]
-         elif any(word in prompt_lower for word in ["story", "tale", "once"]):
-             template_list = templates["story_continuation"]
-         else:
-             # Generic response
-             return f"Here's something about {prompt}..."
- 
-         # Select a random template
-         import random
-         template = random.choice(template_list)
-         return template.format(prompt=prompt)
- 
-     def translate_text(self, text: str, source_lang: str,
-                        target_lang: str) -> Tuple[Optional[str], float]:
-         """
-         Translate text between languages
- 
-         Args:
-             text: Text to translate
-             source_lang: Source language code
-             target_lang: Target language code
- 
-         Returns:
-             Tuple of (translated_text, confidence_score)
-         """
-         if self._is_circuit_breaker_open():
-             return self._fallback_translation(text, source_lang, target_lang), 0.2
- 
-         try:
-             # For MVP, we'll use a simple approach
-             # In production, use proper translation models like IndicTrans
- 
-             if source_lang == target_lang:
-                 return text, 1.0
- 
-             # Placeholder for actual translation
-             # In production, integrate with translation APIs or models
-             translated = self._fallback_translation(text, source_lang, target_lang)
-             return translated, 0.6
- 
-         except Exception as e:
-             self.logger.error(f"Error in translation: {e}")
-             self._record_failure()
-             return self._fallback_translation(text, source_lang, target_lang), 0.3
- 
-     def _fallback_translation(self, text: str, source_lang: str, target_lang: str) -> str:
-         """Fallback translation using simple rules"""
-         # For MVP, return original text with language indicator
-         # In production, implement proper translation
- 
-         if source_lang == target_lang:
-             return text
- 
-         source_name = self.language_service.get_language_name(source_lang)
-         target_name = self.language_service.get_language_name(target_lang)
- 
-         return f"[{source_name} → {target_name}] {text}"
- 
-     def generate_caption(self, image_description: str, language: str = "en") -> Tuple[Optional[str], float]:
-         """
-         Generate caption for image based on description
- 
-         Args:
-             image_description: Description of the image
-             language: Target language for caption
- 
-         Returns:
-             Tuple of (caption, confidence_score)
-         """
-         # Use text generation with image-specific prompts
-         prompts = [
-             f"Caption for image showing {image_description}:",
-             f"Describe this image: {image_description}",
-             f"What's happening in this picture of {image_description}?"
-         ]
- 
-         import random
-         prompt = random.choice(prompts)
- 
-         return self.generate_text(prompt, language, max_length=50)
- 
-     def suggest_cultural_tags(self, content: str, language: str,
-                               region: Optional[str] = None) -> List[str]:
-         """
-         Suggest cultural tags based on content
- 
-         Args:
-             content: Text content to analyze
-             language: Language of the content
-             region: Optional region information
- 
-         Returns:
-             List of suggested cultural tags
-         """
-         tags = []
-         content_lower = content.lower()
- 
-         # Festival-related tags
-         festivals = {
-             "diwali": ["festival", "lights", "celebration", "hindu"],
-             "holi": ["festival", "colors", "spring", "celebration"],
-             "eid": ["festival", "islamic", "celebration", "community"],
-             "christmas": ["festival", "christian", "celebration", "winter"],
-             "dussehra": ["festival", "victory", "hindu", "tradition"],
-             "ganesh": ["festival", "hindu", "elephant", "wisdom"],
-             "navratri": ["festival", "dance", "hindu", "goddess"]
-         }
- 
-         for festival, festival_tags in festivals.items():
-             if festival in content_lower:
-                 tags.extend(festival_tags)
- 
-         # Food-related tags
-         foods = {
-             "biryani": ["food", "rice", "spices", "traditional"],
-             "curry": ["food", "spices", "traditional", "sauce"],
-             "roti": ["food", "bread", "staple", "traditional"],
-             "dal": ["food", "lentils", "protein", "staple"],
-             "samosa": ["food", "snack", "fried", "traditional"],
-             "lassi": ["drink", "yogurt", "traditional", "cooling"]
-         }
- 
-         for food, food_tags in foods.items():
-             if food in content_lower:
-                 tags.extend(food_tags)
- 
-         # Regional tags
-         if region:
-             region_lower = region.lower()
-             regional_tags = {
-                 "maharashtra": ["marathi", "western_india", "mumbai"],
-                 "karnataka": ["kannada", "southern_india", "bangalore"],
-                 "tamil nadu": ["tamil", "southern_india", "chennai"],
-                 "kerala": ["malayalam", "southern_india", "backwaters"],
-                 "punjab": ["punjabi", "northern_india", "agriculture"],
-                 "bengal": ["bengali", "eastern_india", "kolkata"],
-                 "gujarat": ["gujarati", "western_india", "business"]
-             }
- 
-             for region_key, region_tags in regional_tags.items():
-                 if region_key in region_lower:
-                     tags.extend(region_tags)
- 
-         # Language-specific tags
-         if language != "en":
-             tags.append("multilingual")
-             tags.append(f"{language}_language")
- 
-         # Remove duplicates and return
-         return list(set(tags))
- 
-     def analyze_sentiment(self, text: str, language: str = "en") -> Dict[str, float]:
-         """
-         Analyze sentiment of text
- 
-         Args:
-             text: Text to analyze
-             language: Language of the text
- 
-         Returns:
-             Dictionary with sentiment scores
-         """
-         # Simple rule-based sentiment analysis for MVP
-         # In production, use proper sentiment analysis models
- 
-         positive_words = [
-             "good", "great", "excellent", "amazing", "wonderful", "beautiful",
-             "love", "like", "happy", "joy", "celebration", "festival",
-             "अच्छा", "सुंदर", "खुशी", "प्रेम"  # Hindi examples
-         ]
- 
-         negative_words = [
-             "bad", "terrible", "awful", "hate", "sad", "angry", "disappointed",
-             "बुरा", "गुस्सा", "दुख"  # Hindi examples
-         ]
- 
-         text_lower = text.lower()
-         positive_count = sum(1 for word in positive_words if word in text_lower)
-         negative_count = sum(1 for word in negative_words if word in text_lower)
-         total_words = len(text.split())
- 
-         if total_words == 0:
-             return {"positive": 0.5, "negative": 0.5, "neutral": 0.0}
- 
-         positive_score = positive_count / total_words
-         negative_score = negative_count / total_words
-         neutral_score = max(0, 1 - positive_score - negative_score)
- 
-         return {
-             "positive": min(1.0, positive_score * 2),
-             "negative": min(1.0, negative_score * 2),
-             "neutral": neutral_score
-         }
- 
-     def extract_keywords(self, text: str, language: str = "en",
-                          max_keywords: int = 10) -> List[str]:
-         """
-         Extract keywords from text
- 
-         Args:
-             text: Text to analyze
-             language: Language of the text
-             max_keywords: Maximum number of keywords to return
- 
-         Returns:
-             List of extracted keywords
-         """
-         # Simple keyword extraction for MVP
-         # In production, use proper NLP libraries
- 
-         # Common stop words (basic list)
-         stop_words = {
-             "en": {"the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "for", "of", "with", "by"},
-             "hi": {"और", "या", "में", "पर", "से", "को", "का", "की", "के", "है", "हैं", "था", "थी", "थे"}
-         }
- 
-         # Get stop words for language
-         lang_stop_words = stop_words.get(language, stop_words["en"])
- 
-         # Simple tokenization and filtering
-         words = text.lower().split()
-         keywords = []
- 
-         for word in words:
-             # Remove punctuation
-             word = ''.join(char for char in word if char.isalnum())
- 
-             # Filter out stop words and short words
-             if len(word) > 2 and word not in lang_stop_words:
-                 keywords.append(word)
- 
-         # Count frequency and return most common
-         from collections import Counter
-         word_counts = Counter(keywords)
- 
-         return [word for word, count in word_counts.most_common(max_keywords)]
- 
-     def get_service_status(self) -> Dict[str, Any]:
-         """Get current status of AI service"""
-         return {
-             "fallback_mode": self.fallback_mode,
-             "models_loaded": list(self.models.keys()),
-             "circuit_breaker": {
-                 "failures": self.circuit_breaker["failures"],
-                 "is_open": self._is_circuit_breaker_open()
-             },
-             "transformers_available": TRANSFORMERS_AVAILABLE,
-             "last_updated": datetime.now().isoformat()
-         }
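The circuit breaker in this service is a plain dict plus three helper methods. The same open/close policy (three failures open the circuit; a 300-second cool-down closes it again) can be expressed as a standalone sketch; the class and method names below are illustrative, not the project's API:

```python
# Standalone sketch of the breaker policy used by AIService above;
# threshold/timeout values come from the deleted code, the API shape is ours.
import time
from typing import Optional


class CircuitBreaker:
    def __init__(self, threshold: int = 3, timeout: float = 300.0):
        self.threshold = threshold      # failures needed to open the circuit
        self.timeout = timeout          # cool-down in seconds
        self.failures = 0
        self.last_failure: Optional[float] = None

    def record_failure(self) -> None:
        self.failures += 1
        self.last_failure = time.time()

    def record_success(self) -> None:
        # Mirrors _record_success: decay the count rather than reset outright.
        self.failures = max(0, self.failures - 1)

    def is_open(self) -> bool:
        if self.failures < self.threshold:
            return False
        if self.last_failure and time.time() - self.last_failure > self.timeout:
            self.failures = 0           # cool-down elapsed: reset and close
            self.last_failure = None
            return False
        return True


breaker = CircuitBreaker()
for _ in range(3):
    breaker.record_failure()
assert breaker.is_open()
```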
intern_project/corpus_collection_engine/services/analytics_service.py DELETED
@@ -1,766 +0,0 @@
- """
- Analytics and Metrics Collection Service
- """
- 
- import streamlit as st
- from typing import Dict, List, Any, Optional, Tuple
- from datetime import datetime, timedelta
- from dataclasses import dataclass
- from enum import Enum
- import json
- import logging
- import pandas as pd
- from collections import defaultdict, Counter
- 
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType, ValidationStatus
- from corpus_collection_engine.services.storage_service import StorageService
- from corpus_collection_engine.services.language_service import LanguageService
- from corpus_collection_engine.services.engagement_service import EngagementService
- 
- 
- class MetricType(Enum):
-     """Types of metrics to track"""
-     CONTRIBUTION_COUNT = "contribution_count"
-     USER_ENGAGEMENT = "user_engagement"
-     LANGUAGE_DIVERSITY = "language_diversity"
-     QUALITY_SCORE = "quality_score"
-     CULTURAL_IMPACT = "cultural_impact"
-     GEOGRAPHIC_DISTRIBUTION = "geographic_distribution"
-     ACTIVITY_POPULARITY = "activity_popularity"
-     RETENTION_RATE = "retention_rate"
- 
- 
- @dataclass
- class MetricSnapshot:
-     """Snapshot of a metric at a point in time"""
-     metric_type: MetricType
-     value: float
-     timestamp: datetime
-     metadata: Dict[str, Any]
- 
- 
- @dataclass
- class AnalyticsReport:
-     """Comprehensive analytics report"""
-     report_id: str
-     generated_at: datetime
-     total_contributions: int
-     unique_contributors: int
-     language_distribution: Dict[str, int]
-     activity_distribution: Dict[str, int]
-     regional_distribution: Dict[str, int]
-     quality_metrics: Dict[str, float]
-     engagement_metrics: Dict[str, float]
-     growth_metrics: Dict[str, float]
-     cultural_impact_score: float
-     recommendations: List[str]
- 
- 
- class AnalyticsService:
-     """Service for collecting and analyzing platform metrics"""
- 
-     def __init__(self):
-         self.logger = logging.getLogger(__name__)
-         self.storage_service = StorageService()
-         self.language_service = LanguageService()
-         self.engagement_service = EngagementService()
- 
-         # Initialize analytics tracking
-         if 'analytics_initialized' not in st.session_state:
-             st.session_state.analytics_initialized = False
-             st.session_state.metrics_cache = {}
-             st.session_state.last_analytics_update = None
- 
-     def initialize_analytics(self):
-         """Initialize analytics tracking"""
-         if st.session_state.analytics_initialized:
-             return
- 
-         try:
-             # Set up analytics tracking
-             st.session_state.analytics_initialized = True
-             st.session_state.last_analytics_update = datetime.now()
- 
-             self.logger.info("Analytics service initialized")
- 
-         except Exception as e:
-             self.logger.error(f"Analytics initialization failed: {e}")
- 
-     def generate_comprehensive_report(self) -> AnalyticsReport:
-         """Generate comprehensive analytics report"""
-         try:
-             # Get all contributions for analysis
-             all_contributions = self._get_all_contributions()
- 
-             # Calculate basic metrics
-             total_contributions = len(all_contributions)
-             unique_contributors = len(set(contrib.user_session for contrib in all_contributions))
- 
-             # Language distribution
-             language_distribution = self._calculate_language_distribution(all_contributions)
- 
-             # Activity distribution
-             activity_distribution = self._calculate_activity_distribution(all_contributions)
- 
-             # Regional distribution
-             regional_distribution = self._calculate_regional_distribution(all_contributions)
- 
-             # Quality metrics
-             quality_metrics = self._calculate_quality_metrics(all_contributions)
- 
-             # Engagement metrics
-             engagement_metrics = self._calculate_engagement_metrics(all_contributions)
- 
-             # Growth metrics
-             growth_metrics = self._calculate_growth_metrics(all_contributions)
- 
-             # Cultural impact score
-             cultural_impact_score = self._calculate_platform_cultural_impact(all_contributions)
- 
-             # Generate recommendations
-             recommendations = self._generate_recommendations(
-                 all_contributions, language_distribution, activity_distribution
-             )
- 
-             return AnalyticsReport(
-                 report_id=f"report_{datetime.now().strftime('%Y%m%d_%H%M%S')}",
-                 generated_at=datetime.now(),
-                 total_contributions=total_contributions,
-                 unique_contributors=unique_contributors,
-                 language_distribution=language_distribution,
-                 activity_distribution=activity_distribution,
-                 regional_distribution=regional_distribution,
-                 quality_metrics=quality_metrics,
-                 engagement_metrics=engagement_metrics,
-                 growth_metrics=growth_metrics,
-                 cultural_impact_score=cultural_impact_score,
-                 recommendations=recommendations
-             )
- 
-         except Exception as e:
-             self.logger.error(f"Error generating analytics report: {e}")
-             return self._create_empty_report()
- 
-     def _get_all_contributions(self) -> List[UserContribution]:
-         """Get all contributions from storage"""
-         all_contributions = []
- 
-         # Get contributions for all supported languages
-         supported_languages = self.language_service.get_supported_languages_list()
- 
-         for lang_info in supported_languages:
-             lang_code = lang_info['code']
-             contributions = self.storage_service.get_contributions_by_language(lang_code, limit=10000)
-             all_contributions.extend(contributions)
- 
-         # Remove duplicates based on contribution ID
-         seen_ids = set()
-         unique_contributions = []
-         for contrib in all_contributions:
-             if contrib.id not in seen_ids:
-                 seen_ids.add(contrib.id)
-                 unique_contributions.append(contrib)
- 
-         return unique_contributions
- 
-     def _calculate_language_distribution(self, contributions: List[UserContribution]) -> Dict[str, int]:
-         """Calculate distribution of contributions by language"""
-         language_counts = Counter(contrib.language for contrib in contributions)
-         return dict(language_counts)
- 
-     def _calculate_activity_distribution(self, contributions: List[UserContribution]) -> Dict[str, int]:
-         """Calculate distribution of contributions by activity type"""
-         activity_counts = Counter(contrib.activity_type.value for contrib in contributions)
-         return dict(activity_counts)
- 
-     def _calculate_regional_distribution(self, contributions: List[UserContribution]) -> Dict[str, int]:
-         """Calculate distribution of contributions by region"""
-         regional_counts = defaultdict(int)
- 
-         for contrib in contributions:
-             region = contrib.cultural_context.get('region', 'Unknown')
-             if region and region.strip():
-                 regional_counts[region.strip()] += 1
- 
-         return dict(regional_counts)
- 
-     def _calculate_quality_metrics(self, contributions: List[UserContribution]) -> Dict[str, float]:
-         """Calculate quality-related metrics"""
-         if not contributions:
-             return {}
- 
-         # Calculate average content length by activity
-         activity_lengths = defaultdict(list)
-         for contrib in contributions:
-             content_length = len(str(contrib.content_data))
-             activity_lengths[contrib.activity_type.value].append(content_length)
- 
-         quality_metrics = {}
- 
-         # Average content length per activity
-         for activity, lengths in activity_lengths.items():
-             quality_metrics[f"avg_content_length_{activity}"] = sum(lengths) / len(lengths)
- 
-         # Overall average content length
-         all_lengths = [len(str(contrib.content_data)) for contrib in contributions]
-         quality_metrics["avg_content_length_overall"] = sum(all_lengths) / len(all_lengths)
- 
-         # Percentage with cultural context
-         with_cultural_context = sum(1 for contrib in contributions
-                                     if contrib.cultural_context.get('cultural_significance'))
-         quality_metrics["cultural_context_percentage"] = (with_cultural_context / len(contributions)) * 100
- 
-         # Percentage with regional information
-         with_region = sum(1 for contrib in contributions
-                           if contrib.cultural_context.get('region'))
-         quality_metrics["regional_info_percentage"] = (with_region / len(contributions)) * 100
- 
-         return quality_metrics
- 
-     def _calculate_engagement_metrics(self, contributions: List[UserContribution]) -> Dict[str, float]:
-         """Calculate user engagement metrics"""
-         if not contributions:
-             return {}
- 
-         # Group contributions by user session
-         user_contributions = defaultdict(list)
-         for contrib in contributions:
-             user_contributions[contrib.user_session].append(contrib)
- 
-         engagement_metrics = {}
- 
-         # Average contributions per user
-         engagement_metrics["avg_contributions_per_user"] = len(contributions) / len(user_contributions)
- 
-         # User retention (users with multiple contributions)
-         multi_contribution_users = sum(1 for contribs in user_contributions.values() if len(contribs) > 1)
-         engagement_metrics["user_retention_rate"] = (multi_contribution_users / len(user_contributions)) * 100
- 
-         # Activity diversity per user
-         user_activity_diversity = []
-         for contribs in user_contributions.values():
-             unique_activities = len(set(contrib.activity_type for contrib in contribs))
-             user_activity_diversity.append(unique_activities)
- 
-         engagement_metrics["avg_activity_diversity_per_user"] = sum(user_activity_diversity) / len(user_activity_diversity)
- 
-         # Language diversity per user
-         user_language_diversity = []
-         for contribs in user_contributions.values():
-             unique_languages = len(set(contrib.language for contrib in contribs))
-             user_language_diversity.append(unique_languages)
- 
-         engagement_metrics["avg_language_diversity_per_user"] = sum(user_language_diversity) / len(user_language_diversity)
- 
-         return engagement_metrics
- 
-     def _calculate_growth_metrics(self, contributions: List[UserContribution]) -> Dict[str, float]:
-         """Calculate growth and trend metrics"""
-         if not contributions:
-             return {}
- 
-         # Sort contributions by timestamp
-         sorted_contributions = sorted(contributions, key=lambda x: x.timestamp)
- 
-         growth_metrics = {}
- 
-         # Daily contribution counts for the last 30 days
-         now = datetime.now()
-         daily_counts = defaultdict(int)
- 
-         for contrib in sorted_contributions:
-             days_ago = (now - contrib.timestamp).days
-             if days_ago <= 30:
-                 date_key = contrib.timestamp.date()
-                 daily_counts[date_key] += 1
- 
-         # Calculate growth rate (last 7 days vs previous 7 days)
-         last_7_days = sum(count for date, count in daily_counts.items()
-                           if (now.date() - date).days <= 7)
-         previous_7_days = sum(count for date, count in daily_counts.items()
-                               if 7 < (now.date() - date).days <= 14)
- 
-         if previous_7_days > 0:
-             growth_metrics["weekly_growth_rate"] = ((last_7_days - previous_7_days) / previous_7_days) * 100
-         else:
-             growth_metrics["weekly_growth_rate"] = 0.0
- 
-         # Average daily contributions
-         if daily_counts:
-             growth_metrics["avg_daily_contributions"] = sum(daily_counts.values()) / len(daily_counts)
-         else:
-             growth_metrics["avg_daily_contributions"] = 0.0
- 
-         # Peak day contribution count
-         growth_metrics["peak_daily_contributions"] = max(daily_counts.values()) if daily_counts else 0
- 
-         return growth_metrics
- 
-     def _calculate_platform_cultural_impact(self, contributions: List[UserContribution]) -> float:
-         """Calculate overall platform cultural impact score"""
-         if not contributions:
-             return 0.0
- 
-         impact_score = 0.0
- 
-         # Base score for total contributions
-         impact_score += len(contributions) * 10
- 
-         # Bonus for language diversity
-         unique_languages = len(set(contrib.language for contrib in contributions))
-         impact_score += unique_languages * 50
- 
-         # Bonus for regional diversity
-         unique_regions = len(set(contrib.cultural_context.get('region', '')
-                                  for contrib in contributions
-                                  if contrib.cultural_context.get('region')))
-         impact_score += unique_regions * 30
- 
-         # Bonus for activity diversity
-         unique_activities = len(set(contrib.activity_type for contrib in contributions))
-         impact_score += unique_activities * 40
- 
-         # Bonus for cultural context richness
-         with_cultural_significance = sum(1 for contrib in contributions
-                                          if contrib.cultural_context.get('cultural_significance'))
-         impact_score += with_cultural_significance * 5
- 
-         # Normalize to 0-100 scale
-         max_possible_score = len(contributions) * 100  # Rough estimate
-         normalized_score = min(100.0, (impact_score / max_possible_score) * 100) if max_possible_score > 0 else 0.0
- 
-         return round(normalized_score, 1)
- 
-     def _generate_recommendations(self, contributions: List[UserContribution],
-                                   language_dist: Dict[str, int],
-                                   activity_dist: Dict[str, int]) -> List[str]:
-         """Generate actionable recommendations based on analytics"""
-         recommendations = []
- 
-         if not contributions:
-             recommendations.append("Start collecting contributions to generate meaningful analytics")
-             return recommendations
- 
-         # Language diversity recommendations
-         if len(language_dist) < 3:
-             recommendations.append("Encourage contributions in more Indian languages to increase diversity")
- 
-         # Activity balance recommendations
-         if activity_dist:
-             min_activity_count = min(activity_dist.values())
-             max_activity_count = max(activity_dist.values())
- 
-             if max_activity_count > min_activity_count * 3:
-                 underrepresented_activities = [activity for activity, count in activity_dist.items()
-                                                if count == min_activity_count]
-                 recommendations.append(f"Promote {', '.join(underrepresented_activities)} activities to balance contribution types")
- 
-         # Quality recommendations
-         quality_metrics = self._calculate_quality_metrics(contributions)
-         cultural_context_pct = quality_metrics.get("cultural_context_percentage", 0)
- 
-         if cultural_context_pct < 70:
-             recommendations.append("Encourage users to provide more cultural context in their contributions")
- 
-         # Engagement recommendations
-         engagement_metrics = self._calculate_engagement_metrics(contributions)
-         retention_rate = engagement_metrics.get("user_retention_rate", 0)
- 
-         if retention_rate < 30:
-             recommendations.append("Implement strategies to improve user retention and repeat contributions")
- 
-         # Growth recommendations
-         growth_metrics = self._calculate_growth_metrics(contributions)
-         weekly_growth = growth_metrics.get("weekly_growth_rate", 0)
- 
-         if weekly_growth < 10:
-             recommendations.append("Focus on user acquisition strategies to increase weekly growth")
- 
-         return recommendations
- 
-     def _create_empty_report(self) -> AnalyticsReport:
-         """Create empty analytics report for error cases"""
-         return AnalyticsReport(
-             report_id=f"empty_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}",
-             generated_at=datetime.now(),
-             total_contributions=0,
-             unique_contributors=0,
-             language_distribution={},
-             activity_distribution={},
-             regional_distribution={},
-             quality_metrics={},
-             engagement_metrics={},
-             growth_metrics={},
-             cultural_impact_score=0.0,
-             recommendations=["No data available for analysis"]
-         )
- 
-     def render_analytics_dashboard(self):
-         """Render comprehensive analytics dashboard"""
-         st.title("📊 Analytics Dashboard")
-         st.markdown("*Insights into cultural preservation impact*")
- 
-         # Generate report
-         with st.spinner("Generating analytics report..."):
-             report = self.generate_comprehensive_report()
- 
-         # Overview metrics
-         st.subheader("🌟 Platform Overview")
- 
-         col1, col2, col3, col4 = st.columns(4)
- 
-         with col1:
-             st.metric(
-                 "Total Contributions",
-                 report.total_contributions,
-                 delta=f"+{report.growth_metrics.get('weekly_growth_rate', 0):.1f}% this week" if report.growth_metrics else None
-             )
- 
-         with col2:
-             st.metric(
-                 "Active Contributors",
-                 report.unique_contributors,
-                 delta=f"{report.engagement_metrics.get('user_retention_rate', 0):.1f}% retention" if report.engagement_metrics else None
-             )
- 
-         with col3:
-             st.metric(
-                 "Languages Covered",
-                 len(report.language_distribution),
-                 delta=f"{len(report.language_distribution)} of 11 supported"
-             )
- 
-         with col4:
-             st.metric(
-                 "Cultural Impact",
-                 f"{report.cultural_impact_score}/100",
-                 delta="Platform-wide score"
-             )
- 
-         # Language Distribution
-         if report.language_distribution:
-             st.subheader("🌍 Language Distribution")
- 
-             # Create language chart
-             lang_df = pd.DataFrame(
-                 list(report.language_distribution.items()),
-                 columns=['Language', 'Contributions']
-             )
- 
-             # Map language codes to names
-             lang_names = {}
-             for lang_info in self.language_service.get_supported_languages_list():
-                 lang_names[lang_info['code']] = lang_info['name']
- 
-             lang_df['Language Name'] = lang_df['Language'].map(lambda x: lang_names.get(x, x))
- 
-             col1, col2 = st.columns([2, 1])
- 
-             with col1:
-                 st.bar_chart(lang_df.set_index('Language Name')['Contributions'])
- 
-             with col2:
-                 st.dataframe(
-                     lang_df[['Language Name', 'Contributions']].sort_values('Contributions', ascending=False),
-                     use_container_width=True
-                 )
- 
-         # Activity Distribution
-         if report.activity_distribution:
-             st.subheader("🎭 Activity Popularity")
- 
-             activity_names = {
-                 'meme': '🎭 Memes',
-                 'recipe': '🍛 Recipes',
-                 'folklore': '📚 Folklore',
-                 'landmark': '🏛️ Landmarks'
-             }
- 
-             activity_df = pd.DataFrame(
-                 list(report.activity_distribution.items()),
-                 columns=['Activity', 'Contributions']
-             )
-             activity_df['Activity Name'] = activity_df['Activity'].map(lambda x: activity_names.get(x, x.title()))
- 
-             col1, col2 = st.columns([2, 1])
- 
-             with col1:
-                 st.bar_chart(activity_df.set_index('Activity Name')['Contributions'])
- 
-             with col2:
-                 for _, row in activity_df.iterrows():
-                     st.metric(row['Activity Name'], row['Contributions'])
- 
-         # Regional Distribution
-         if report.regional_distribution:
-             st.subheader("📍 Regional Contributions")
- 
-             # Show top regions
-             sorted_regions = sorted(report.regional_distribution.items(), key=lambda x: x[1], reverse=True)
- 
-             col1, col2 = st.columns([2, 1])
- 
-             with col1:
-                 region_df = pd.DataFrame(sorted_regions[:10], columns=['Region', 'Contributions'])
-                 st.bar_chart(region_df.set_index('Region')['Contributions'])
- 
-             with col2:
-                 st.markdown("**Top Regions:**")
-                 for region, count in sorted_regions[:5]:
-                     st.markdown(f"• {region}: {count}")
- 
-         # Quality Metrics
-         if report.quality_metrics:
-             st.subheader("💎 Quality Metrics")
- 
-             col1, col2, col3 = st.columns(3)
- 
-             with col1:
-                 cultural_context_pct = report.quality_metrics.get("cultural_context_percentage", 0)
-                 st.metric(
-                     "Cultural Context",
-                     f"{cultural_context_pct:.1f}%",
-                     delta="of contributions"
-                 )
- 
-             with col2:
-                 regional_info_pct = report.quality_metrics.get("regional_info_percentage", 0)
-                 st.metric(
-                     "Regional Info",
-                     f"{regional_info_pct:.1f}%",
-                     delta="with location"
-                 )
- 
-             with col3:
-                 avg_length = report.quality_metrics.get("avg_content_length_overall", 0)
-                 st.metric(
-                     "Avg Content Length",
-                     f"{avg_length:.0f}",
-                     delta="characters"
-                 )
- 
-         # Engagement Metrics
-         if report.engagement_metrics:
-             st.subheader("🤝 User Engagement")
- 
-             col1, col2, col3 = st.columns(3)
- 
-             with col1:
-                 avg_contributions = report.engagement_metrics.get("avg_contributions_per_user", 0)
-                 st.metric(
-                     "Avg Contributions",
-                     f"{avg_contributions:.1f}",
-                     delta="per user"
-                 )
- 
-             with col2:
-                 retention_rate = report.engagement_metrics.get("user_retention_rate", 0)
-                 st.metric(
-                     "User Retention",
-                     f"{retention_rate:.1f}%",
-                     delta="return users"
-                 )
- 
-             with col3:
-                 activity_diversity = report.engagement_metrics.get("avg_activity_diversity_per_user", 0)
-                 st.metric(
-                     "Activity Diversity",
-                     f"{activity_diversity:.1f}",
-                     delta="activities per user"
-                 )
- 
-         # Growth Trends
-         if report.growth_metrics:
-             st.subheader("📈 Growth Trends")
- 
-             col1, col2, col3 = st.columns(3)
- 
-             with col1:
-                 weekly_growth = report.growth_metrics.get("weekly_growth_rate", 0)
-                 st.metric(
-                     "Weekly Growth",
-                     f"{weekly_growth:+.1f}%",
-                     delta="vs previous week"
-                 )
- 
-             with col2:
-                 avg_daily = report.growth_metrics.get("avg_daily_contributions", 0)
-                 st.metric(
-                     "Daily Average",
-                     f"{avg_daily:.1f}",
-                     delta="contributions"
-                 )
- 
-             with col3:
-                 peak_daily = report.growth_metrics.get("peak_daily_contributions", 0)
-                 st.metric(
-                     "Peak Day",
-                     f"{peak_daily}",
-                     delta="contributions"
-                 )
- 
-         # Recommendations
-         if report.recommendations:
-             st.subheader("💡 Recommendations")
- 
-             for i, recommendation in enumerate(report.recommendations, 1):
-                 st.markdown(f"{i}. {recommendation}")
- 
-         # Export options
-         st.subheader("📤 Export Data")
- 
-         col1, col2 = st.columns(2)
- 
-         with col1:
-             if st.button("📊 Export Analytics Report", use_container_width=True):
-                 report_json = self._export_report_to_json(report)
-                 st.download_button(
-                     label="Download JSON Report",
-                     data=report_json,
-                     file_name=f"analytics_report_{report.report_id}.json",
-                     mime="application/json"
-                 )
- 
-         with col2:
-             if st.button("📈 Export Contribution Data", use_container_width=True):
-                 contributions_csv = self._export_contributions_to_csv()
-                 if contributions_csv:
-                     st.download_button(
-                         label="Download CSV Data",
-                         data=contributions_csv,
-                         file_name=f"contributions_data_{datetime.now().strftime('%Y%m%d')}.csv",
-                         mime="text/csv"
-                     )
- 
-     def _export_report_to_json(self, report: AnalyticsReport) -> str:
-         """Export analytics report to JSON format"""
-         report_dict = {
-             'report_id': report.report_id,
-             'generated_at': report.generated_at.isoformat(),
-             'total_contributions': report.total_contributions,
-             'unique_contributors': report.unique_contributors,
-             'language_distribution': report.language_distribution,
-             'activity_distribution': report.activity_distribution,
-             'regional_distribution': report.regional_distribution,
-             'quality_metrics': report.quality_metrics,
-             'engagement_metrics': report.engagement_metrics,
-             'growth_metrics': report.growth_metrics,
-             'cultural_impact_score': report.cultural_impact_score,
-             'recommendations': report.recommendations
-         }
- 
-         return json.dumps(report_dict, indent=2, ensure_ascii=False)
- 
654
- def _export_contributions_to_csv(self) -> Optional[str]:
655
- """Export contributions data to CSV format"""
656
- try:
657
- contributions = self._get_all_contributions()
658
-
659
- if not contributions:
660
- return None
661
-
662
- # Prepare data for CSV
663
- csv_data = []
664
- for contrib in contributions:
665
- csv_data.append({
666
- 'id': contrib.id,
667
- 'user_session': contrib.user_session,
668
- 'activity_type': contrib.activity_type.value,
669
- 'language': contrib.language,
670
- 'timestamp': contrib.timestamp.isoformat(),
671
- 'validation_status': contrib.validation_status.value,
672
- 'region': contrib.cultural_context.get('region', ''),
673
- 'cultural_significance': contrib.cultural_context.get('cultural_significance', ''),
674
- 'content_length': len(str(contrib.content_data))
675
- })
676
-
677
- # Convert to DataFrame and then CSV
678
- df = pd.DataFrame(csv_data)
679
- return df.to_csv(index=False)
680
-
681
- except Exception as e:
682
- self.logger.error(f"Error exporting contributions to CSV: {e}")
683
- return None
684
-
685
- def get_real_time_metrics(self) -> Dict[str, Any]:
686
- """Get real-time metrics for dashboard updates"""
687
- try:
688
- # Get basic statistics from storage service
689
- storage_stats = self.storage_service.get_statistics()
690
-
691
- # Calculate additional real-time metrics
692
- current_time = datetime.now()
693
-
694
- # Recent activity (last 24 hours)
695
- recent_contributions = []
696
- for lang_info in self.language_service.get_supported_languages_list():
697
- lang_contributions = self.storage_service.get_contributions_by_language(
698
- lang_info['code'], limit=100
699
- )
700
- recent_contributions.extend([
701
- contrib for contrib in lang_contributions
702
- if (current_time - contrib.timestamp).total_seconds() < 86400 # 24 hours
703
- ])
704
-
705
- return {
706
- 'total_contributions': storage_stats.get('total_contributions', 0),
707
- 'contributions_by_language': storage_stats.get('contributions_by_language', {}),
708
- 'contributions_by_activity': storage_stats.get('contributions_by_activity', {}),
709
- 'recent_24h_contributions': len(recent_contributions),
710
- 'last_updated': current_time.isoformat()
711
- }
712
-
713
- except Exception as e:
714
- self.logger.error(f"Error getting real-time metrics: {e}")
715
- return {}
716
-
717
- def track_user_action(self, action: str, user_session: str, metadata: Dict[str, Any] = None):
718
- """Track user actions for analytics (simplified implementation)"""
719
- try:
720
- # In a full implementation, this would log to an analytics database
721
- # For now, we'll just log the action
722
-
723
- action_data = {
724
- 'action': action,
725
- 'user_session': user_session,
726
- 'timestamp': datetime.now().isoformat(),
727
- 'metadata': metadata or {}
728
- }
729
-
730
- self.logger.info(f"User action tracked: {json.dumps(action_data)}")
731
-
732
- except Exception as e:
733
- self.logger.error(f"Error tracking user action: {e}")
734
-
735
- def get_contribution_trends(self, days: int = 30) -> Dict[str, List[int]]:
736
- """Get contribution trends over specified number of days"""
737
- try:
738
- contributions = self._get_all_contributions()
739
-
740
- # Calculate daily contributions for the last N days
741
- now = datetime.now()
742
- daily_counts = defaultdict(int)
743
-
744
- for contrib in contributions:
745
- days_ago = (now - contrib.timestamp).days
746
- if days_ago <= days:
747
- date_key = contrib.timestamp.date()
748
- daily_counts[date_key] += 1
749
-
750
- # Create time series data
751
- dates = []
752
- counts = []
753
-
754
- for i in range(days, -1, -1):
755
- date = (now - timedelta(days=i)).date()
756
- dates.append(date.isoformat())
757
- counts.append(daily_counts.get(date, 0))
758
-
759
- return {
760
- 'dates': dates,
761
- 'contributions': counts
762
- }
763
-
764
- except Exception as e:
765
- self.logger.error(f"Error getting contribution trends: {e}")
766
- return {'dates': [], 'contributions': []}
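
The trend logic deleted above reduces to bucketing timestamps by calendar day and zero-filling the last N days. A minimal standalone sketch of that technique (the function name and sample data are illustrative, not part of the deleted module):

```python
from collections import defaultdict
from datetime import datetime, timedelta

def daily_trend(timestamps, days=30, now=None):
    """Bucket timestamps by calendar day, zero-filling the last `days` days."""
    now = now or datetime.now()
    counts = defaultdict(int)
    for ts in timestamps:
        if (now - ts).days <= days:
            counts[ts.date()] += 1
    # Walk backwards from `days` ago to today so missing days appear as 0
    return [
        ((now - timedelta(days=i)).date().isoformat(),
         counts.get((now - timedelta(days=i)).date(), 0))
        for i in range(days, -1, -1)
    ]

# Illustrative usage: two contributions today, one three days ago
now = datetime(2024, 1, 15, 12, 0)
stamps = [now, now - timedelta(hours=2), now - timedelta(days=3)]
print(daily_trend(stamps, days=5, now=now)[-1])  # ('2024-01-15', 2)
```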
intern_project/corpus_collection_engine/services/engagement_service.py DELETED
@@ -1,665 +0,0 @@
- """
- User Engagement and Feedback Service
- """
-
- import streamlit as st
- from typing import Dict, List, Any, Optional, Tuple
- from datetime import datetime, timedelta
- from dataclasses import dataclass
- from enum import Enum
- from urllib.parse import quote
- import json
- import logging
-
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType
- from corpus_collection_engine.services.storage_service import StorageService
- from corpus_collection_engine.services.language_service import LanguageService
-
-
- class EngagementLevel(Enum):
-     """User engagement levels"""
-     NEWCOMER = "newcomer"
-     CONTRIBUTOR = "contributor"
-     ACTIVE_CONTRIBUTOR = "active_contributor"
-     CULTURAL_AMBASSADOR = "cultural_ambassador"
-     HERITAGE_GUARDIAN = "heritage_guardian"
-
-
- class AchievementType(Enum):
-     """Types of achievements users can earn"""
-     FIRST_CONTRIBUTION = "first_contribution"
-     MULTILINGUAL = "multilingual"
-     STORYTELLER = "storyteller"
-     RECIPE_MASTER = "recipe_master"
-     MEME_CREATOR = "meme_creator"
-     LANDMARK_EXPLORER = "landmark_explorer"
-     CULTURAL_BRIDGE = "cultural_bridge"
-     CONSISTENCY_CHAMPION = "consistency_champion"
-     QUALITY_CONTRIBUTOR = "quality_contributor"
-     COMMUNITY_BUILDER = "community_builder"
-
-
- @dataclass
- class Achievement:
-     """User achievement"""
-     type: AchievementType
-     title: str
-     description: str
-     icon: str
-     earned_date: datetime
-     points: int
-
-
- @dataclass
- class UserStats:
-     """User statistics and engagement metrics"""
-     total_contributions: int
-     contributions_by_activity: Dict[str, int]
-     contributions_by_language: Dict[str, int]
-     engagement_level: EngagementLevel
-     achievements: List[Achievement]
-     total_points: int
-     streak_days: int
-     last_contribution_date: Optional[datetime]
-     favorite_activity: Optional[str]
-     cultural_impact_score: float
-
-
- class EngagementService:
-     """Service for managing user engagement and feedback"""
-
-     def __init__(self):
-         self.logger = logging.getLogger(__name__)
-         self.storage_service = StorageService()
-         self.language_service = LanguageService()
-
-         # Initialize engagement tracking in session state
-         if 'user_stats' not in st.session_state:
-             st.session_state.user_stats = None
-         if 'recent_achievements' not in st.session_state:
-             st.session_state.recent_achievements = []
-         if 'onboarding_completed' not in st.session_state:
-             st.session_state.onboarding_completed = False
-
-     def get_user_stats(self, user_session_id: str) -> UserStats:
-         """Get comprehensive user statistics"""
-         try:
-             # Get user contributions
-             contributions = self.storage_service.get_contributions_by_session(user_session_id)
-
-             if not contributions:
-                 return self._create_new_user_stats()
-
-             # Calculate statistics
-             total_contributions = len(contributions)
-
-             # Group by activity type
-             contributions_by_activity = {}
-             for contrib in contributions:
-                 activity = contrib.activity_type.value
-                 contributions_by_activity[activity] = contributions_by_activity.get(activity, 0) + 1
-
-             # Group by language
-             contributions_by_language = {}
-             for contrib in contributions:
-                 lang = contrib.language
-                 contributions_by_language[lang] = contributions_by_language.get(lang, 0) + 1
-
-             # Calculate engagement level
-             engagement_level = self._calculate_engagement_level(total_contributions, contributions_by_language)
-
-             # Calculate achievements
-             achievements = self._calculate_achievements(contributions)
-
-             # Calculate points
-             total_points = sum(achievement.points for achievement in achievements)
-
-             # Calculate streak
-             streak_days = self._calculate_streak(contributions)
-
-             # Get last contribution date
-             last_contribution_date = max(contrib.timestamp for contrib in contributions) if contributions else None
-
-             # Find favorite activity
-             favorite_activity = max(contributions_by_activity, key=contributions_by_activity.get) if contributions_by_activity else None
-
-             # Calculate cultural impact score
-             cultural_impact_score = self._calculate_cultural_impact(contributions)
-
-             return UserStats(
-                 total_contributions=total_contributions,
-                 contributions_by_activity=contributions_by_activity,
-                 contributions_by_language=contributions_by_language,
-                 engagement_level=engagement_level,
-                 achievements=achievements,
-                 total_points=total_points,
-                 streak_days=streak_days,
-                 last_contribution_date=last_contribution_date,
-                 favorite_activity=favorite_activity,
-                 cultural_impact_score=cultural_impact_score
-             )
-
-         except Exception as e:
-             self.logger.error(f"Error calculating user stats: {e}")
-             return self._create_new_user_stats()
-
-     def _create_new_user_stats(self) -> UserStats:
-         """Create stats for new user"""
-         return UserStats(
-             total_contributions=0,
-             contributions_by_activity={},
-             contributions_by_language={},
-             engagement_level=EngagementLevel.NEWCOMER,
-             achievements=[],
-             total_points=0,
-             streak_days=0,
-             last_contribution_date=None,
-             favorite_activity=None,
-             cultural_impact_score=0.0
-         )
-
-     def _calculate_engagement_level(self, total_contributions: int,
-                                     contributions_by_language: Dict[str, int]) -> EngagementLevel:
-         """Calculate user engagement level based on contributions"""
-         num_languages = len(contributions_by_language)
-
-         if total_contributions >= 50 and num_languages >= 3:
-             return EngagementLevel.HERITAGE_GUARDIAN
-         elif total_contributions >= 25 and num_languages >= 2:
-             return EngagementLevel.CULTURAL_AMBASSADOR
-         elif total_contributions >= 10:
-             return EngagementLevel.ACTIVE_CONTRIBUTOR
-         elif total_contributions >= 3:
-             return EngagementLevel.CONTRIBUTOR
-         else:
-             return EngagementLevel.NEWCOMER
-
-     def _calculate_achievements(self, contributions: List[UserContribution]) -> List[Achievement]:
-         """Calculate user achievements based on contributions"""
-         achievements = []
-
-         if not contributions:
-             return achievements
-
-         # First contribution
-         if len(contributions) >= 1:
-             achievements.append(Achievement(
-                 type=AchievementType.FIRST_CONTRIBUTION,
-                 title="First Steps",
-                 description="Made your first contribution to cultural preservation",
-                 icon="🌟",
-                 earned_date=contributions[0].timestamp,
-                 points=10
-             ))
-
-         # Activity-specific achievements
-         activity_counts = {}
-         for contrib in contributions:
-             activity = contrib.activity_type.value
-             activity_counts[activity] = activity_counts.get(activity, 0) + 1
-
-         # Meme creator achievement
-         if activity_counts.get('meme', 0) >= 5:
-             achievements.append(Achievement(
-                 type=AchievementType.MEME_CREATOR,
-                 title="Meme Master",
-                 description="Created 5+ cultural memes",
-                 icon="🎭",
-                 earned_date=datetime.now(),
-                 points=25
-             ))
-
-         # Recipe master achievement
-         if activity_counts.get('recipe', 0) >= 3:
-             achievements.append(Achievement(
-                 type=AchievementType.RECIPE_MASTER,
-                 title="Recipe Keeper",
-                 description="Shared 3+ family recipes",
-                 icon="🍛",
-                 earned_date=datetime.now(),
-                 points=30
-             ))
-
-         # Storyteller achievement
-         if activity_counts.get('folklore', 0) >= 3:
-             achievements.append(Achievement(
-                 type=AchievementType.STORYTELLER,
-                 title="Master Storyteller",
-                 description="Preserved 3+ traditional stories",
-                 icon="📚",
-                 earned_date=datetime.now(),
-                 points=35
-             ))
-
-         # Landmark explorer achievement
-         if activity_counts.get('landmark', 0) >= 5:
-             achievements.append(Achievement(
-                 type=AchievementType.LANDMARK_EXPLORER,
-                 title="Heritage Explorer",
-                 description="Documented 5+ cultural landmarks",
-                 icon="🏛️",
-                 earned_date=datetime.now(),
-                 points=40
-             ))
-
-         # Multilingual achievement
-         languages = set(contrib.language for contrib in contributions)
-         if len(languages) >= 2:
-             achievements.append(Achievement(
-                 type=AchievementType.MULTILINGUAL,
-                 title="Cultural Bridge",
-                 description=f"Contributed in {len(languages)} languages",
-                 icon="🌍",
-                 earned_date=datetime.now(),
-                 points=20
-             ))
-
-         # Quality contributor achievement
-         high_quality_contributions = sum(1 for contrib in contributions
-                                          if len(str(contrib.content_data)) > 100)
-         if high_quality_contributions >= 5:
-             achievements.append(Achievement(
-                 type=AchievementType.QUALITY_CONTRIBUTOR,
-                 title="Quality Guardian",
-                 description="Consistently provides detailed contributions",
-                 icon="💎",
-                 earned_date=datetime.now(),
-                 points=50
-             ))
-
-         return achievements
-
-     def _calculate_streak(self, contributions: List[UserContribution]) -> int:
-         """Calculate user's contribution streak in days"""
-         if not contributions:
-             return 0
-
-         # Sort contributions by date
-         sorted_contributions = sorted(contributions, key=lambda x: x.timestamp, reverse=True)
-
-         # Get unique contribution dates
-         contribution_dates = list(set(contrib.timestamp.date() for contrib in sorted_contributions))
-         contribution_dates.sort(reverse=True)
-
-         if not contribution_dates:
-             return 0
-
-         # Calculate streak from most recent date
-         streak = 0
-         current_date = datetime.now().date()
-
-         for i, contrib_date in enumerate(contribution_dates):
-             expected_date = current_date - timedelta(days=i)
-
-             if contrib_date == expected_date or (i == 0 and contrib_date == current_date - timedelta(days=1)):
-                 streak += 1
-             else:
-                 break
-
-         return streak
-
-     def _calculate_cultural_impact(self, contributions: List[UserContribution]) -> float:
-         """Calculate cultural impact score based on contribution quality and diversity"""
-         if not contributions:
-             return 0.0
-
-         impact_score = 0.0
-
-         # Base score for each contribution
-         impact_score += len(contributions) * 10
-
-         # Bonus for language diversity
-         languages = set(contrib.language for contrib in contributions)
-         impact_score += len(languages) * 15
-
-         # Bonus for activity diversity
-         activities = set(contrib.activity_type.value for contrib in contributions)
-         impact_score += len(activities) * 20
-
-         # Bonus for cultural context richness
-         for contrib in contributions:
-             cultural_context = contrib.cultural_context
-             if cultural_context.get('cultural_significance'):
-                 impact_score += 5
-             if cultural_context.get('region'):
-                 impact_score += 3
-
-         # Normalize to 0-100 scale
-         max_possible_score = len(contributions) * 50  # Rough estimate
-         normalized_score = min(100.0, (impact_score / max_possible_score) * 100) if max_possible_score > 0 else 0.0
-
-         return round(normalized_score, 1)
-
-     def render_user_dashboard(self, user_session_id: str):
-         """Render user engagement dashboard"""
-         st.subheader("🏆 Your Cultural Impact Dashboard")
-
-         # Get user stats
-         user_stats = self.get_user_stats(user_session_id)
-         st.session_state.user_stats = user_stats
-
-         # Overview metrics
-         col1, col2, col3, col4 = st.columns(4)
-
-         with col1:
-             st.metric(
-                 "Contributions",
-                 user_stats.total_contributions,
-                 delta=f"+{user_stats.total_contributions}" if user_stats.total_contributions > 0 else None
-             )
-
-         with col2:
-             st.metric(
-                 "Languages",
-                 len(user_stats.contributions_by_language),
-                 delta=f"+{len(user_stats.contributions_by_language)}" if user_stats.contributions_by_language else None
-             )
-
-         with col3:
-             st.metric(
-                 "Points",
-                 user_stats.total_points,
-                 delta=f"+{user_stats.total_points}" if user_stats.total_points > 0 else None
-             )
-
-         with col4:
-             st.metric(
-                 "Streak",
-                 f"{user_stats.streak_days} days",
-                 delta=f"+{user_stats.streak_days}" if user_stats.streak_days > 0 else None
-             )
-
-         # Engagement level
-         level_info = self._get_engagement_level_info(user_stats.engagement_level)
-         st.markdown(f"### {level_info['icon']} {level_info['title']}")
-         st.markdown(f"*{level_info['description']}*")
-
-         # Progress to next level
-         self._render_progress_to_next_level(user_stats)
-
-         # Cultural impact score
-         st.markdown(f"### 🌟 Cultural Impact Score: {user_stats.cultural_impact_score}/100")
-         st.progress(user_stats.cultural_impact_score / 100)
-
-         # Activity breakdown
-         if user_stats.contributions_by_activity:
-             st.markdown("### 📊 Your Contributions by Activity")
-
-             activity_names = {
-                 'meme': '🎭 Memes',
-                 'recipe': '🍛 Recipes',
-                 'folklore': '📚 Folklore',
-                 'landmark': '🏛️ Landmarks'
-             }
-
-             cols = st.columns(len(user_stats.contributions_by_activity))
-             for i, (activity, count) in enumerate(user_stats.contributions_by_activity.items()):
-                 with cols[i]:
-                     st.metric(activity_names.get(activity, activity.title()), count)
-
-         # Recent achievements
-         if user_stats.achievements:
-             st.markdown("### 🏅 Your Achievements")
-             self._render_achievements(user_stats.achievements)
-
-     def _get_engagement_level_info(self, level: EngagementLevel) -> Dict[str, str]:
-         """Get display information for engagement level"""
-         level_info = {
-             EngagementLevel.NEWCOMER: {
-                 'icon': '🌱',
-                 'title': 'Cultural Newcomer',
-                 'description': 'Welcome to your cultural preservation journey!'
-             },
-             EngagementLevel.CONTRIBUTOR: {
-                 'icon': '🌿',
-                 'title': 'Cultural Contributor',
-                 'description': 'You\'re making meaningful contributions to cultural preservation!'
-             },
-             EngagementLevel.ACTIVE_CONTRIBUTOR: {
-                 'icon': '🌳',
-                 'title': 'Active Cultural Contributor',
-                 'description': 'Your dedication to cultural preservation is inspiring!'
-             },
-             EngagementLevel.CULTURAL_AMBASSADOR: {
-                 'icon': '🏛️',
-                 'title': 'Cultural Ambassador',
-                 'description': 'You\'re a true ambassador of cultural heritage!'
-             },
-             EngagementLevel.HERITAGE_GUARDIAN: {
-                 'icon': '👑',
-                 'title': 'Heritage Guardian',
-                 'description': 'You\'re a guardian of cultural heritage for future generations!'
-             }
-         }
-
-         return level_info.get(level, level_info[EngagementLevel.NEWCOMER])
-
-     def _render_progress_to_next_level(self, user_stats: UserStats):
-         """Render progress towards next engagement level"""
-         current_level = user_stats.engagement_level
-         total_contributions = user_stats.total_contributions
-         num_languages = len(user_stats.contributions_by_language)
-
-         # Define requirements for next level
-         next_level_requirements = {
-             EngagementLevel.NEWCOMER: {'contributions': 3, 'languages': 1, 'next': 'Contributor'},
-             EngagementLevel.CONTRIBUTOR: {'contributions': 10, 'languages': 1, 'next': 'Active Contributor'},
-             EngagementLevel.ACTIVE_CONTRIBUTOR: {'contributions': 25, 'languages': 2, 'next': 'Cultural Ambassador'},
-             EngagementLevel.CULTURAL_AMBASSADOR: {'contributions': 50, 'languages': 3, 'next': 'Heritage Guardian'},
-             EngagementLevel.HERITAGE_GUARDIAN: {'contributions': float('inf'), 'languages': float('inf'), 'next': 'Maximum Level Reached!'}
-         }
-
-         requirements = next_level_requirements.get(current_level)
-         if not requirements or current_level == EngagementLevel.HERITAGE_GUARDIAN:
-             return
-
-         st.markdown(f"### 🎯 Progress to {requirements['next']}")
-
-         # Contributions progress
-         contrib_progress = min(1.0, total_contributions / requirements['contributions'])
-         st.markdown(f"**Contributions:** {total_contributions}/{requirements['contributions']}")
-         st.progress(contrib_progress)
-
-         # Languages progress
-         if requirements['languages'] > 1:
-             lang_progress = min(1.0, num_languages / requirements['languages'])
-             st.markdown(f"**Languages:** {num_languages}/{requirements['languages']}")
-             st.progress(lang_progress)
-
-     def _render_achievements(self, achievements: List[Achievement]):
-         """Render user achievements"""
-         if not achievements:
-             st.info("Complete activities to earn your first achievement!")
-             return
-
-         # Sort achievements by points (highest first)
-         sorted_achievements = sorted(achievements, key=lambda x: x.points, reverse=True)
-
-         cols = st.columns(min(3, len(sorted_achievements)))
-         for i, achievement in enumerate(sorted_achievements):
-             with cols[i % 3]:
-                 st.markdown(f"""
-                 <div style="
-                     border: 2px solid #FF6B35;
-                     border-radius: 10px;
-                     padding: 16px;
-                     text-align: center;
-                     background: linear-gradient(135deg, #FF6B35, #F7931E);
-                     color: white;
-                     margin: 8px 0;
-                 ">
-                     <div style="font-size: 40px; margin-bottom: 8px;">{achievement.icon}</div>
-                     <div style="font-weight: bold; font-size: 16px; margin-bottom: 4px;">{achievement.title}</div>
-                     <div style="font-size: 12px; opacity: 0.9; margin-bottom: 8px;">{achievement.description}</div>
-                     <div style="font-size: 14px; font-weight: bold;">{achievement.points} points</div>
-                 </div>
-                 """, unsafe_allow_html=True)
-
-     def render_immediate_feedback(self, contribution: UserContribution):
-         """Render immediate feedback after contribution"""
-         st.success("🎉 Contribution submitted successfully!")
-
-         # Show immediate impact
-         impact_messages = {
-             ActivityType.MEME: "Your meme adds humor and cultural context to our collection!",
-             ActivityType.RECIPE: "Your recipe preserves culinary traditions for future generations!",
-             ActivityType.FOLKLORE: "Your story keeps traditional wisdom alive!",
-             ActivityType.LANDMARK: "Your landmark documentation enriches our cultural map!"
-         }
-
-         message = impact_messages.get(contribution.activity_type, "Your contribution enriches our cultural heritage!")
-         st.info(f"💫 {message}")
-
-         # Check for new achievements
-         user_session_id = st.session_state.get('user_session_id', 'anonymous')
-         user_stats = self.get_user_stats(user_session_id)
-
-         # Show achievement notifications
-         self._check_and_show_new_achievements(user_stats)
-
-         # Show progress update
-         self._show_progress_update(user_stats)
-
-     def _check_and_show_new_achievements(self, user_stats: UserStats):
-         """Check for and display new achievements"""
-         # This is a simplified version - in a full implementation,
-         # you'd track which achievements are new since last session
-
-         if user_stats.achievements and user_stats.total_contributions <= 3:
-             # Show achievement for new users
-             latest_achievement = user_stats.achievements[-1]
-
-             st.balloons()
-             st.markdown(f"""
-             <div style="
-                 background: linear-gradient(135deg, #4CAF50, #45a049);
-                 color: white;
-                 padding: 20px;
-                 border-radius: 10px;
-                 text-align: center;
-                 margin: 16px 0;
-             ">
-                 <div style="font-size: 50px; margin-bottom: 10px;">{latest_achievement.icon}</div>
-                 <div style="font-size: 24px; font-weight: bold; margin-bottom: 8px;">Achievement Unlocked!</div>
-                 <div style="font-size: 18px; margin-bottom: 4px;">{latest_achievement.title}</div>
-                 <div style="font-size: 14px; opacity: 0.9;">{latest_achievement.description}</div>
-                 <div style="font-size: 16px; font-weight: bold; margin-top: 10px;">+{latest_achievement.points} points</div>
-             </div>
-             """, unsafe_allow_html=True)
-
-     def _show_progress_update(self, user_stats: UserStats):
-         """Show progress update after contribution"""
-         col1, col2 = st.columns(2)
-
-         with col1:
-             st.metric(
-                 "Total Contributions",
-                 user_stats.total_contributions,
-                 delta=1
-             )
-
-         with col2:
-             st.metric(
-                 "Cultural Impact",
-                 f"{user_stats.cultural_impact_score}/100",
-                 delta=f"+{round(user_stats.cultural_impact_score / user_stats.total_contributions, 1) if user_stats.total_contributions > 0 else 0}"
-             )
-
-     def render_onboarding_flow(self) -> bool:
-         """Render onboarding flow for new users - Auto-complete for public deployment"""
-         if st.session_state.onboarding_completed:
-             return True
-
-         # Auto-complete onboarding for Hugging Face Spaces deployment
-         st.session_state.onboarding_completed = True
-         return True
-
-     def render_social_sharing(self, contribution: UserContribution):
-         """Render social sharing options"""
-         st.markdown("### 📢 Share Your Contribution")
-
-         # Generate sharing text
-         activity_names = {
-             ActivityType.MEME: "meme",
-             ActivityType.RECIPE: "family recipe",
-             ActivityType.FOLKLORE: "traditional story",
-             ActivityType.LANDMARK: "cultural landmark"
-         }
-
-         activity_name = activity_names.get(contribution.activity_type, "cultural contribution")
-
-         sharing_text = f"I just shared a {activity_name} on Corpus Collection Engine! 🇮🇳 Join me in preserving Indian cultural heritage through AI. #CulturalHeritage #IndianCulture #AI4Culture"
-         # URL-encode the text so spaces and '#' characters survive inside share links
-         encoded_text = quote(sharing_text)
-
-         # Social sharing buttons (simplified - in production, use proper sharing APIs)
-         col1, col2, col3 = st.columns(3)
-
-         with col1:
-             if st.button("📱 Share on WhatsApp", use_container_width=True):
-                 whatsapp_url = f"https://wa.me/?text={encoded_text}"
-                 st.markdown(f"[Open WhatsApp]({whatsapp_url})")
-
-         with col2:
-             if st.button("🐦 Share on Twitter", use_container_width=True):
-                 twitter_url = f"https://twitter.com/intent/tweet?text={encoded_text}"
-                 st.markdown(f"[Open Twitter]({twitter_url})")
-
-         with col3:
-             if st.button("📋 Copy Link", use_container_width=True):
-                 st.code(sharing_text)
-                 st.success("Text copied! Share it anywhere you like!")
-
-     def get_engagement_analytics(self) -> Dict[str, Any]:
-         """Get engagement analytics for the platform"""
-         try:
-             # Get all contributions for analytics
-             all_stats = self.storage_service.get_statistics()
-
-             return {
-                 'total_users': len(set()),  # Would need user tracking
-                 'total_contributions': all_stats.get('total_contributions', 0),
-                 'contributions_by_language': all_stats.get('contributions_by_language', {}),
-                 'contributions_by_activity': all_stats.get('contributions_by_activity', {}),
-                 'engagement_trends': {},  # Would calculate from historical data
-                 'achievement_distribution': {},  # Would calculate from user achievements
-                 'cultural_impact_total': 0  # Would sum all user impact scores
-             }
-
-         except Exception as e:
-             self.logger.error(f"Error getting engagement analytics: {e}")
-             return {}
-
-     def render_session_summary(self):
-         """Render session summary for user engagement"""
-         try:
-             # Get current session stats
-             user_session_id = st.session_state.get('user_session_id', 'anonymous')
-             user_stats = self.get_user_stats(user_session_id)
-
-             # Only show if user has made contributions
-             if user_stats.total_contributions > 0:
-                 with st.sidebar:
-                     st.markdown("---")
-                     st.markdown("### 🏆 Session Summary")
-
-                     col1, col2 = st.columns(2)
-                     with col1:
-                         st.metric("Contributions", user_stats.total_contributions)
-                     with col2:
-                         st.metric("Impact", f"{user_stats.cultural_impact_score:.1f}")
-
-                     # Show current level
-                     level_info = self._get_engagement_level_info(user_stats.engagement_level)
-                     st.markdown(f"**Level:** {level_info['icon']} {level_info['title']}")
-
-                     # Show recent achievements
-                     if user_stats.achievements:
-                         st.markdown("**Latest Achievement:**")
-                         latest = user_stats.achievements[-1]
-                         st.markdown(f"{latest.icon} {latest.title}")
-
-                     # Encourage continued participation
-                     if user_stats.total_contributions < 5:
-                         st.info("Keep contributing to unlock more achievements! 🌟")
-
-         except Exception as e:
-             self.logger.error(f"Error rendering session summary: {e}")
-             # Fail silently to not disrupt user experience
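
The engagement ladder in `_calculate_engagement_level` is a pure threshold function, so it is easy to exercise in isolation. A minimal sketch with the same thresholds (the class and function names here are illustrative, not taken from the deleted module):

```python
from enum import Enum

class Level(Enum):
    NEWCOMER = 1
    CONTRIBUTOR = 2
    ACTIVE_CONTRIBUTOR = 3
    CULTURAL_AMBASSADOR = 4
    HERITAGE_GUARDIAN = 5

def level_for(total_contributions: int, num_languages: int) -> Level:
    """Counts gate every tier; the top two tiers also require language diversity."""
    if total_contributions >= 50 and num_languages >= 3:
        return Level.HERITAGE_GUARDIAN
    if total_contributions >= 25 and num_languages >= 2:
        return Level.CULTURAL_AMBASSADOR
    if total_contributions >= 10:
        return Level.ACTIVE_CONTRIBUTOR
    if total_contributions >= 3:
        return Level.CONTRIBUTOR
    return Level.NEWCOMER

# A monolingual power user tops out at ACTIVE_CONTRIBUTOR by design
assert level_for(100, 1) is Level.ACTIVE_CONTRIBUTOR
assert level_for(50, 3) is Level.HERITAGE_GUARDIAN
```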
intern_project/corpus_collection_engine/services/language_service.py DELETED
@@ -1,295 +0,0 @@
- """
- Language service for Indic language processing and detection
- """
-
- from typing import Any, Dict, List, Optional, Tuple
- import logging
-
- # Try to import langdetect, fall back to basic detection if not available
- try:
-     from langdetect import detect, DetectorFactory
-     from langdetect.lang_detect_exception import LangDetectException
-     LANGDETECT_AVAILABLE = True
-     # Set seed for consistent language detection results
-     DetectorFactory.seed = 0
- except ImportError:
-     LANGDETECT_AVAILABLE = False
-     LangDetectException = Exception
-
- from corpus_collection_engine.config import SUPPORTED_LANGUAGES
-
-
- class LanguageService:
-     """Service for language detection, validation, and processing"""
-
-     def __init__(self):
-         self.logger = logging.getLogger(__name__)
-         self.supported_languages = SUPPORTED_LANGUAGES
-         self.indic_scripts = {
-             'hi': 'देवनागरी',  # Devanagari
-             'bn': 'বাংলা',  # Bengali
-             'ta': 'தமிழ்',  # Tamil
-             'te': 'తెలుగు',  # Telugu
-             'ml': 'മലയാളം',  # Malayalam
-             'kn': 'ಕನ್ನಡ',  # Kannada
-             'gu': 'ગુજરાતી',  # Gujarati
-             'mr': 'मराठी',  # Marathi
-             'pa': 'ਪੰਜਾਬੀ',  # Punjabi
-             'or': 'ଓଡ଼ିଆ'  # Odia
-         }
-
-         # Unicode ranges for Indic scripts
-         self.script_ranges = {
-             'devanagari': (0x0900, 0x097F),
-             'bengali': (0x0980, 0x09FF),
-             'tamil': (0x0B80, 0x0BFF),
-             'telugu': (0x0C00, 0x0C7F),
-             'malayalam': (0x0D00, 0x0D7F),
-             'kannada': (0x0C80, 0x0CFF),
-             'gujarati': (0x0A80, 0x0AFF),
-             'punjabi': (0x0A00, 0x0A7F),
-             'odia': (0x0B00, 0x0B7F)
-         }
-
-     def detect_language(self, text: str, confidence_threshold: float = 0.7) -> Tuple[Optional[str], float]:
-         """
-         Detect language from text with confidence score
-
-         Args:
-             text: Input text to analyze
-             confidence_threshold: Minimum confidence required
-
-         Returns:
-             Tuple of (language_code, confidence_score)
-         """
-         if not text or not text.strip():
-             return None, 0.0
-
-         text = text.strip()
-
-         # First try script-based detection for Indic languages
-         script_lang = self._detect_by_script(text)
-         if script_lang:
-             return script_lang, 0.9  # High confidence for script-based detection
-
-         # Fall back to langdetect library if available
-         if LANGDETECT_AVAILABLE:
-             try:
-                 detected_lang = detect(text)
-
-                 # Map some common language codes
-                 lang_mapping = {
-                     'hi': 'hi',  # Hindi
-                     'bn': 'bn',  # Bengali
-                     'ta': 'ta',  # Tamil
-                     'te': 'te',  # Telugu
-                     'ml': 'ml',  # Malayalam
-                     'kn': 'kn',  # Kannada
-                     'gu': 'gu',  # Gujarati
-                     'mr': 'mr',  # Marathi
-                     'pa': 'pa',  # Punjabi
-                     'or': 'or',  # Odia
-                     'en': 'en'  # English
-                 }
-
-                 mapped_lang = lang_mapping.get(detected_lang)
-                 if mapped_lang and mapped_lang in self.supported_languages:
-                     return mapped_lang, 0.8  # Good confidence for library detection
-
-                 # If detected language is not in our supported list, return English as fallback
-                 return 'en', 0.5
-
-             except LangDetectException:
-                 # If detection fails, return English as fallback
-                 return 'en', 0.3
-         else:
-             # If langdetect is not available, use basic heuristics
-             return self._basic_language_detection(text)
-
-     def _detect_by_script(self, text: str) -> Optional[str]:
-         """Detect language based on Unicode script ranges"""
-         script_counts = {}
-
-         for char in text:
-             char_code = ord(char)
-
-             # Check each script range
-             for script, (start, end) in self.script_ranges.items():
-                 if start <= char_code <= end:
-                     script_counts[script] = script_counts.get(script, 0) + 1
-                     break
-
-         if not script_counts:
-             return None
-
-         # Find the most common script
-         dominant_script = max(script_counts, key=script_counts.get)
-
-         # Map script to language code
-         script_to_lang = {
-             'devanagari': 'hi',  # Could be Hindi or Marathi, default to Hindi
-             'bengali': 'bn',
-             'tamil': 'ta',
-             'telugu': 'te',
-             'malayalam': 'ml',
-             'kannada': 'kn',
-             'gujarati': 'gu',
-             'punjabi': 'pa',
-             'odia': 'or'
-         }
-
-         return script_to_lang.get(dominant_script)
-
-     def _basic_language_detection(self, text: str) -> Tuple[str, float]:
-         """Basic language detection when langdetect is not available"""
-         # Check for common English patterns
-         english_words = ['the', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by']
-         text_lower = text.lower()
-
-         english_count = sum(1 for word in english_words if word in text_lower)
-         if english_count > 0:
-             return 'en', 0.6
-
-         # Check for common Hindi words (in Devanagari)
-         hindi_words = ['और', 'का', 'की', 'के', 'में', 'से', 'को', 'है', 'हैं', 'था', 'थी', 'थे']
-         hindi_count = sum(1 for word in hindi_words if word in text)
-         if hindi_count > 0:
-             return 'hi', 0.6
-
-         # Default to English if no patterns match
-         return 'en', 0.3
-
-     def validate_language_code(self, language_code: str) -> bool:
-         """Validate if language code is supported"""
-         return language_code in self.supported_languages
-
-     def get_language_name(self, language_code: str) -> str:
-         """Get human-readable language name"""
-         return self.supported_languages.get(language_code, "Unknown")
-
-     def get_native_script_name(self, language_code: str) -> str:
-         """Get native script name for the language"""
-         return self.indic_scripts.get(language_code, language_code.upper())
-
-     def is_indic_language(self, language_code: str) -> bool:
-         """Check if language is an Indic language"""
-         return language_code in self.indic_scripts
-
-     def transliterate_to_latin(self, text: str, source_language: str) -> str:
-         """
-         Basic transliteration to Latin script (placeholder implementation)
-         In a full implementation, this would use proper transliteration libraries
-         """
-         # This is a simplified implementation
-         # In production, you'd use libraries like indic-transliteration
-
-         if not self.is_indic_language(source_language):
-             return text
-
-         # For now, just return the original text
-         # TODO: Implement proper transliteration using indic-transliteration library
-         return text
-
-     def get_text_statistics(self, text: str) -> Dict[str, Any]:
-         """Get statistics about the text"""
-         if not text:
-             return {
-                 'character_count': 0,
-                 'word_count': 0,
-                 'detected_language': None,
-                 'confidence': 0.0,
-                 'script_distribution': {}
-             }
-
-         # Basic statistics
-         char_count = len(text)
-         word_count = len(text.split())
-
-         # Language detection
-         detected_lang, confidence = self.detect_language(text)
-
-         # Script distribution
-         script_dist = self._get_script_distribution(text)
-
-         return {
-             'character_count': char_count,
-             'word_count': word_count,
-             'detected_language': detected_lang,
-             'confidence': confidence,
-             'script_distribution': script_dist
-         }
-
-     def _get_script_distribution(self, text: str) -> Dict[str, float]:
-         """Get distribution of different scripts in text"""
-         script_counts = {}
-         total_chars = 0
-
-         for char in text:
-             if char.isalpha():  # Only count alphabetic characters
-                 total_chars += 1
-                 char_code = ord(char)
-
-                 # Check each script range
-                 script_found = False
-                 for script, (start, end) in self.script_ranges.items():
-                     if start <= char_code <= end:
-                         script_counts[script] = script_counts.get(script, 0) + 1
-                         script_found = True
-                         break
-
-                 # If not in any Indic script, assume Latin
-                 if not script_found and char.isascii():
-                     script_counts['latin'] = script_counts.get('latin', 0) + 1
-
-         # Convert to percentages
-         if total_chars == 0:
-             return {}
-
-         return {script: (count / total_chars) * 100
-                 for script, count in script_counts.items()}
-
-     def suggest_language_from_region(self, region: str) -> List[str]:
-         """Suggest likely languages based on region"""
-         region = region.lower().strip()
-
-         # Regional language mapping
-         region_languages = {
-             'maharashtra': ['mr', 'hi'],
-             'karnataka': ['kn', 'hi'],
-             'tamil nadu': ['ta', 'hi'],
-             'andhra pradesh': ['te', 'hi'],
-             'telangana': ['te', 'hi'],
-             'kerala': ['ml', 'hi'],
-             'west bengal': ['bn', 'hi'],
-             'gujarat': ['gu', 'hi'],
-             'punjab': ['pa', 'hi'],
-             'odisha': ['or', 'hi'],
-             'delhi': ['hi'],
-             'uttar pradesh': ['hi'],
-             'bihar': ['hi'],
-             'rajasthan': ['hi'],
-             'madhya pradesh': ['hi'],
-             'haryana': ['hi']
-         }
-
-         # Find matching regions
-         for region_key, languages in region_languages.items():
-             if region_key in region:
-                 return languages
-
-         # Default to Hindi and English
-         return ['hi', 'en']
-
-     def get_supported_languages_list(self) -> List[Dict[str, str]]:
-         """Get list of supported languages with metadata"""
-         languages = []
-
-         for code, name in self.supported_languages.items():
-             languages.append({
-                 'code': code,
-                 'name': name,
-                 'native_name': self.get_native_script_name(code),
-                 'is_indic': self.is_indic_language(code)
-             })
-
-         return languages
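
The script-based detector above decides by counting characters per Unicode block and taking the majority. A self-contained sketch of that core idea, using two of the ten ranges (the others follow the same pattern; `dominant_script` is an illustrative name, not the deleted method):

```python
# Unicode code-point ranges, as in the deleted service
RANGES = {
    "devanagari": (0x0900, 0x097F),
    "tamil": (0x0B80, 0x0BFF),
}

def dominant_script(text: str):
    """Return the script with the most characters in `text`, or None."""
    counts = {}
    for ch in text:
        cp = ord(ch)
        for script, (lo, hi) in RANGES.items():
            if lo <= cp <= hi:
                counts[script] = counts.get(script, 0) + 1
                break
    return max(counts, key=counts.get) if counts else None

print(dominant_script("नमस्ते"))    # devanagari
print(dominant_script("வணக்கம்"))  # tamil
print(dominant_script("hello"))    # None -> caller falls back to langdetect
```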
intern_project/corpus_collection_engine/services/privacy_service.py DELETED
@@ -1,1069 +0,0 @@
- """
- Privacy and Consent Management Service
- """
-
- import streamlit as st
- from typing import Dict, List, Any, Optional, Tuple
- from datetime import datetime, timedelta
- from dataclasses import dataclass
- from enum import Enum
- import json
- import logging
- import hashlib
-
- from corpus_collection_engine.models.data_models import UserContribution
- from corpus_collection_engine.services.storage_service import StorageService
-
-
- class ConsentType(Enum):
-     """Types of consent that can be given"""
-
-     DATA_COLLECTION = "data_collection"
-     AI_TRAINING = "ai_training"
-     RESEARCH_USE = "research_use"
-     PUBLIC_SHARING = "public_sharing"
-     ANALYTICS = "analytics"
-     MARKETING = "marketing"
-
-
- class DataCategory(Enum):
-     """Categories of data being processed"""
-
-     CULTURAL_CONTENT = "cultural_content"
-     LANGUAGE_DATA = "language_data"
-     REGIONAL_INFO = "regional_info"
-     USER_BEHAVIOR = "user_behavior"
-     TECHNICAL_DATA = "technical_data"
-
-
- @dataclass
- class ConsentRecord:
-     """Record of user consent"""
-
-     user_session: str
-     consent_type: ConsentType
-     granted: bool
-     timestamp: datetime
-     version: str
-     ip_hash: Optional[str] = None
-     user_agent_hash: Optional[str] = None
-
-
- @dataclass
- class PrivacySettings:
-     """User privacy settings"""
-
-     user_session: str
-     consents: Dict[ConsentType, ConsentRecord]
-     data_retention_days: int
-     anonymize_data: bool
-     allow_data_export: bool
-     created_at: datetime
-     updated_at: datetime
-
-
- class PrivacyService:
-     """Service for managing user privacy and consent"""
-
-     def __init__(self):
-         self.logger = logging.getLogger(__name__)
-         self.storage_service = StorageService()
-
-         # Privacy policy version
-         self.current_privacy_version = "1.0"
-         self.current_terms_version = "1.0"
-
-         # Initialize privacy state
-         if "privacy_initialized" not in st.session_state:
-             st.session_state.privacy_initialized = False
-             st.session_state.consent_given = {}
-             st.session_state.privacy_settings = None
-             st.session_state.show_privacy_banner = True
-
-     def initialize_privacy_management(self):
-         """Initialize privacy management system"""
-         if st.session_state.privacy_initialized:
-             return
-
-         try:
-             # Load existing privacy settings if available
-             user_session_id = st.session_state.get("user_session_id", "anonymous")
-             self._load_privacy_settings(user_session_id)
-
-             st.session_state.privacy_initialized = True
-             self.logger.info("Privacy management initialized")
-
-         except Exception as e:
-             self.logger.error(f"Privacy management initialization failed: {e}")
-
-     def render_consent_interface(self) -> bool:
-         """Render consent interface - Auto-consent for public deployment"""
-         # Auto-consent for Hugging Face Spaces deployment
-         # This removes the need for explicit consent flow
-         return True
-
-     def render_privacy_banner(self):
-         """Render privacy consent banner"""
-         if not st.session_state.show_privacy_banner:
-             return
-
-         # Check if user has already given essential consent
-         if self._has_essential_consent():
-             st.session_state.show_privacy_banner = False
-             return
-
-         # Render privacy banner
-         banner_html = """
-         <div style="
-             position: fixed;
-             bottom: 0;
-             left: 0;
-             right: 0;
-             background: linear-gradient(135deg, #2C3E50, #34495E);
-             color: white;
-             padding: 20px;
-             z-index: 9999;
-             box-shadow: 0 -2px 10px rgba(0,0,0,0.3);
-         ">
-             <div style="max-width: 1200px; margin: 0 auto;">
-                 <h4 style="margin: 0 0 10px 0; color: #FF6B35;">🔒 Your Privacy Matters</h4>
-                 <p style="margin: 0 0 15px 0; font-size: 14px; line-height: 1.4;">
-                     We use your contributions to preserve Indian cultural heritage and train AI systems.
-                     Your data is handled with respect and transparency.
-                 </p>
-                 <div style="display: flex; gap: 10px; flex-wrap: wrap;">
-                     <button onclick="window.parent.postMessage({type: 'ACCEPT_ESSENTIAL'}, '*')"
-                             style="background: #FF6B35; color: white; border: none; padding: 8px 16px; border-radius: 4px; cursor: pointer;">
-                         Accept Essential
-                     </button>
-                     <button onclick="window.parent.postMessage({type: 'CUSTOMIZE_PRIVACY'}, '*')"
-                             style="background: transparent; color: white; border: 1px solid white; padding: 8px 16px; border-radius: 4px; cursor: pointer;">
-                         Customize
-                     </button>
-                     <button onclick="window.parent.postMessage({type: 'VIEW_PRIVACY_POLICY'}, '*')"
-                             style="background: transparent; color: #FF6B35; border: none; padding: 8px 16px; cursor: pointer; text-decoration: underline;">
-                         Privacy Policy
-                     </button>
-                 </div>
-             </div>
-         </div>
-
-         <script>
-         window.addEventListener('message', function(event) {
-             if (event.data.type === 'HIDE_PRIVACY_BANNER') {
-                 document.querySelector('[style*="position: fixed"]').style.display = 'none';
-             }
-         });
-         </script>
-         """
-
-         st.components.v1.html(banner_html, height=0)
-
-     def render_privacy_settings(self):
-         """Render comprehensive privacy settings interface"""
-         st.title("🔒 Privacy & Data Management")
-         st.markdown("*Control how your data is used for cultural preservation*")
-
-         # Current privacy status
-         user_session_id = st.session_state.get("user_session_id", "anonymous")
-         privacy_settings = self._get_privacy_settings(user_session_id)
-
-         # Privacy overview
-         st.subheader("📊 Your Privacy Status")
-
-         col1, col2, col3 = st.columns(3)
-
-         with col1:
-             essential_consent = self._has_essential_consent()
-             st.metric(
-                 "Essential Consent",
-                 "✅ Given" if essential_consent else "❌ Required",
-                 delta="Required for app usage",
-             )
-
-         with col2:
-             total_consents = len(
-                 [c for c in privacy_settings.consents.values() if c.granted]
-             )
-             st.metric(
-                 "Active Consents", total_consents, delta=f"out of {len(ConsentType)}"
-             )
-
-         with col3:
-             data_retention = privacy_settings.data_retention_days
-             st.metric(
-                 "Data Retention", f"{data_retention} days", delta="Automatic deletion"
-             )
-
-         # Consent management
-         st.subheader("✅ Consent Management")
-
-         consent_descriptions = {
-             ConsentType.DATA_COLLECTION: {
-                 "title": "Data Collection",
-                 "description": "Allow collection of your cultural contributions for preservation",
-                 "essential": True,
-             },
-             ConsentType.AI_TRAINING: {
-                 "title": "AI Training",
-                 "description": "Use your contributions to train AI models for cultural understanding",
-                 "essential": True,
-             },
-             ConsentType.RESEARCH_USE: {
-                 "title": "Research Use",
-                 "description": "Allow academic researchers to study your contributions (anonymized)",
-                 "essential": False,
-             },
-             ConsentType.PUBLIC_SHARING: {
-                 "title": "Public Sharing",
-                 "description": "Share your contributions in public cultural archives",
-                 "essential": False,
-             },
-             ConsentType.ANALYTICS: {
-                 "title": "Analytics",
-                 "description": "Use your data for platform improvement and analytics",
-                 "essential": False,
-             },
-             ConsentType.MARKETING: {
-                 "title": "Marketing Communications",
-                 "description": "Receive updates about cultural preservation initiatives",
-                 "essential": False,
-             },
-         }
-
-         consent_changes = {}
-
-         for consent_type, info in consent_descriptions.items():
-             current_consent = privacy_settings.consents.get(consent_type)
-             current_status = current_consent.granted if current_consent else False
-
-             col1, col2 = st.columns([3, 1])
-
-             with col1:
-                 st.markdown(f"**{info['title']}**")
-                 st.markdown(f"*{info['description']}*")
-                 if info["essential"]:
-                     st.markdown("🔴 **Essential** - Required for app functionality")
-
-             with col2:
-                 if info["essential"]:
-                     # Essential consents cannot be disabled
-                     st.checkbox(
-                         "Enabled",
-                         value=True,
-                         disabled=True,
-                         key=f"consent_{consent_type.value}",
-                     )
-                 else:
-                     new_status = st.checkbox(
-                         "Enable",
-                         value=current_status,
-                         key=f"consent_{consent_type.value}",
-                     )
-                     if new_status != current_status:
-                         consent_changes[consent_type] = new_status
-
-             st.divider()
-
-         # Save consent changes
-         if consent_changes and st.button("💾 Save Privacy Settings", type="primary"):
-             self._update_consents(user_session_id, consent_changes)
-             st.success("Privacy settings updated successfully!")
-             st.rerun()
-
-         # Data management
-         st.subheader("📁 Data Management")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             st.markdown("**Data Retention**")
-             new_retention = st.selectbox(
-                 "How long should we keep your data?",
-                 [30, 90, 180, 365, -1],
-                 format_func=lambda x: f"{x} days" if x > 0 else "Keep indefinitely",
-                 index=[30, 90, 180, 365, -1].index(
-                     privacy_settings.data_retention_days
-                 ),
-             )
-
-             if new_retention != privacy_settings.data_retention_days:
-                 if st.button("Update Retention Period"):
-                     self._update_data_retention(user_session_id, new_retention)
-                     st.success("Data retention period updated!")
-                     st.rerun()
-
-         with col2:
-             st.markdown("**Data Anonymization**")
-             new_anonymize = st.checkbox(
-                 "Anonymize my contributions",
-                 value=privacy_settings.anonymize_data,
-                 help="Remove identifying information from your contributions",
-             )
-
-             if new_anonymize != privacy_settings.anonymize_data:
-                 if st.button("Update Anonymization"):
-                     self._update_anonymization(user_session_id, new_anonymize)
-                     st.success("Anonymization setting updated!")
-                     st.rerun()
-
-         # Data export and deletion
-         st.subheader("📤 Your Data Rights")
-
-         col1, col2, col3 = st.columns(3)
-
-         with col1:
-             if st.button("📊 Export My Data", use_container_width=True):
-                 self._export_user_data(user_session_id)
-
-         with col2:
-             if st.button("🔍 View My Contributions", use_container_width=True):
-                 self._show_user_contributions(user_session_id)
-
-         with col3:
-             if st.button(
-                 "🗑️ Delete My Data", use_container_width=True, type="secondary"
-             ):
-                 self._show_data_deletion_options(user_session_id)
-
-     def render_privacy_policy(self):
-         """Render comprehensive privacy policy"""
-         st.title("📋 Privacy Policy")
-         st.markdown(
-             f"*Version {self.current_privacy_version} - Effective Date: January 1, 2024*"
-         )
-
-         st.markdown("""
-         ## 🎯 Our Mission
-
-         The Corpus Collection Engine is dedicated to preserving Indian cultural heritage through
-         AI-powered data collection. We believe in transparency, respect, and ethical data practices.
-
-         ## 📊 What Data We Collect
-
-         ### Cultural Contributions
-         - **Memes**: Text captions and cultural context you provide
-         - **Recipes**: Family recipes, ingredients, and cooking instructions
-         - **Folklore**: Traditional stories, proverbs, and cultural wisdom
-         - **Landmarks**: Photos and descriptions of cultural sites
-
-         ### Cultural Context
-         - Regional information you provide
-         - Cultural significance descriptions
-         - Language preferences
-         - Personal stories and family connections
-
-         ### Technical Data
-         - Session identifiers (anonymized)
-         - Language detection results
-         - Usage patterns and engagement metrics
-         - Device and browser information (anonymized)
-
-         ## 🎯 How We Use Your Data
-
-         ### Primary Purposes
-         1. **Cultural Preservation**: Building a comprehensive archive of Indian cultural heritage
-         2. **AI Training**: Teaching AI systems to understand and respect cultural diversity
-         3. **Research**: Supporting academic research on Indian languages and culture
-         4. **Community Building**: Connecting people through shared cultural experiences
-
-         ### Secondary Purposes (With Your Consent)
-         - Platform improvement and analytics
-         - Academic research collaborations
-         - Public cultural archives and exhibitions
-         - Educational resources and materials
-
-         ## 🔒 How We Protect Your Data
-
-         ### Security Measures
-         - **Encryption**: All data is encrypted in transit and at rest
-         - **Access Control**: Strict access controls and authentication
-         - **Anonymization**: Personal identifiers are removed or hashed
-         - **Regular Audits**: Security assessments and vulnerability testing
-
-         ### Data Minimization
-         - We only collect data necessary for our cultural preservation mission
-         - Personal information is separated from cultural content
-         - Automatic deletion of old session data
-         - Optional anonymization of all contributions
-
-         ## 👥 Data Sharing
-
-         ### We Never Share
-         - Personal identifying information
-         - Private session data
-         - Individual user behavior patterns
-         - Contact information or personal details
-
-         ### We May Share (With Consent)
-         - Anonymized cultural contributions with researchers
-         - Aggregated statistics for academic studies
-         - Cultural content for public archives
-         - Educational materials for cultural learning
-
-         ## ⚖️ Your Rights
-
-         ### Access Rights
-         - View all your contributions
-         - Download your data in standard formats
-         - See how your data is being used
-         - Review your consent history
-
-         ### Control Rights
-         - Modify or delete your contributions
-         - Change your privacy settings anytime
-         - Withdraw consent for non-essential uses
-         - Request data anonymization
-
-         ### Deletion Rights
-         - Delete individual contributions
-         - Request complete data deletion
-         - Automatic deletion after retention period
-         - Right to be forgotten
-
-         ## 🌍 International Considerations
-
-         ### Data Location
-         - Data is stored in secure facilities
-         - We comply with applicable data protection laws
-         - Cross-border transfers are protected
-         - Local data residency options available
-
-         ### Legal Compliance
-         - We follow Indian data protection regulations
-         - Compliance with international privacy standards
-         - Regular legal and compliance reviews
-         - Transparent reporting on data requests
-
-         ## 👶 Children's Privacy
-
-         - Our service is designed for users 13 and older
-         - We do not knowingly collect data from children under 13
-         - Parental consent required for users under 18
-         - Special protections for young users
-
-         ## 📞 Contact Us
-
-         ### Privacy Questions
-         If you have questions about this privacy policy or your data:
-
-         - **Email**: [email protected]
-         - **Address**: [Privacy Officer Address]
-         - **Response Time**: We respond within 30 days
-
-         ### Data Protection Officer
-         Our Data Protection Officer is available for privacy concerns:
-         - **Email**: [email protected]
-         - **Specialized Training**: Cultural data sensitivity
458
-
459
- ## 🔄 Policy Updates
460
-
461
- - We may update this policy to reflect changes in our practices
462
- - Users will be notified of significant changes
463
- - Continued use implies acceptance of updates
464
- - Previous versions available upon request
465
-
466
- ---
467
-
468
- *This privacy policy reflects our commitment to ethical cultural preservation
469
- and respect for user privacy. Thank you for helping preserve Indian cultural heritage!*
470
- """)
471
-
472
- def render_terms_of_service(self):
473
- """Render terms of service"""
474
- st.title("📜 Terms of Service")
475
- st.markdown(
476
- f"*Version {self.current_terms_version} - Effective Date: January 1, 2024*"
477
- )
478
-
479
- st.markdown("""
480
- ## 🤝 Agreement to Terms
481
-
482
- By using the Corpus Collection Engine, you agree to these terms of service.
483
- If you disagree with any part of these terms, please do not use our service.
484
-
485
- ## 🎯 Service Description
486
-
487
- ### Our Mission
488
- The Corpus Collection Engine is a platform for preserving Indian cultural heritage
489
- through community contributions and AI-powered analysis.
490
-
491
- ### What We Provide
492
- - Tools for sharing cultural content (memes, recipes, folklore, landmarks)
493
- - AI-powered features for content enhancement
494
- - Community platform for cultural exchange
495
- - Educational resources about Indian culture
496
-
497
- ## 👤 User Responsibilities
498
-
499
- ### Content Guidelines
500
- - **Authenticity**: Share genuine cultural content
501
- - **Respect**: Treat all cultures and communities with respect
502
- - **Accuracy**: Provide accurate information to the best of your knowledge
503
- - **Originality**: Only share content you have rights to share
504
-
505
- ### Prohibited Content
506
- - Hate speech or discriminatory content
507
- - False or misleading cultural information
508
- - Copyrighted material without permission
509
- - Personal information of others
510
- - Spam or commercial content
511
-
512
- ### User Conduct
513
- - Use the service for its intended cultural preservation purpose
514
- - Respect other users and their contributions
515
- - Follow community guidelines and cultural sensitivities
516
- - Report inappropriate content or behavior
517
-
518
- ## 🏛️ Intellectual Property
519
-
520
- ### Your Content
521
- - You retain ownership of your cultural contributions
522
- - You grant us license to use your content for cultural preservation
523
- - You can modify or delete your contributions anytime
524
- - We respect traditional knowledge and cultural heritage rights
525
-
526
- ### Our Platform
527
- - The Corpus Collection Engine platform is our intellectual property
528
- - You may not copy, modify, or distribute our software
529
- - Our AI models and algorithms are proprietary
530
- - Trademarks and logos are protected
531
-
532
- ### Traditional Knowledge
533
- - We respect indigenous and traditional knowledge rights
534
- - Cultural content is treated with appropriate sensitivity
535
- - Community ownership of cultural heritage is acknowledged
536
- - Traditional knowledge is not claimed as our property
537
-
538
- ## 🔒 Privacy and Data
539
-
540
- - Your privacy is governed by our Privacy Policy
541
- - We collect data only for cultural preservation purposes
542
- - You have control over your data and privacy settings
543
- - We implement strong security measures to protect your data
544
-
545
- ## ⚠️ Disclaimers
546
-
547
- ### Service Availability
548
- - We strive for high availability but cannot guarantee 100% uptime
549
- - Maintenance and updates may temporarily interrupt service
550
- - We are not liable for service interruptions
551
-
552
- ### Content Accuracy
553
- - Cultural content is provided by community members
554
- - We do not verify the accuracy of all cultural information
555
- - Users should use their judgment when relying on cultural content
556
- - We are not responsible for inaccuracies in user-generated content
557
-
558
- ### AI Features
559
- - AI-generated content is provided as assistance only
560
- - AI may make mistakes or provide inaccurate suggestions
561
- - Users should review and verify AI-generated content
562
- - We continuously improve AI accuracy but cannot guarantee perfection
563
-
564
- ## 📞 Support and Contact
565
-
566
- ### Getting Help
567
- - **Technical Support**: [email protected]
568
- - **Cultural Questions**: [email protected]
569
- - **Legal Issues**: [email protected]
570
-
571
- ### Response Times
572
- - We aim to respond to inquiries within 48 hours
573
- - Complex issues may require additional time
574
- - Emergency security issues are prioritized
575
-
576
- ## 🔄 Changes to Terms
577
-
578
- - We may update these terms to reflect service changes
579
- - Users will be notified of significant changes
580
- - Continued use implies acceptance of updated terms
581
- - Previous versions available upon request
582
-
583
- ## ⚖️ Legal Information
584
-
585
- ### Governing Law
586
- - These terms are governed by Indian law
587
- - Disputes will be resolved in Indian courts
588
- - We comply with applicable international laws
589
-
590
- ### Limitation of Liability
591
- - Our liability is limited to the extent permitted by law
592
- - We are not liable for indirect or consequential damages
593
- - Maximum liability is limited to service fees (if any)
594
-
595
- ---
596
-
597
- *Thank you for helping preserve Indian cultural heritage through the
598
- Corpus Collection Engine. Together, we're building a lasting legacy
599
- for future generations.*
600
- """)
601
-
602
- def _has_essential_consent(self) -> bool:
603
- """Check if user has given essential consent"""
604
- user_session_id = st.session_state.get("user_session_id", "anonymous")
605
- privacy_settings = self._get_privacy_settings(user_session_id)
606
-
607
- essential_consents = [ConsentType.DATA_COLLECTION, ConsentType.AI_TRAINING]
608
-
609
- for consent_type in essential_consents:
610
- consent_record = privacy_settings.consents.get(consent_type)
611
- if not consent_record or not consent_record.granted:
612
- return False
613
-
614
- return True
615
-
616
- def _get_privacy_settings(self, user_session_id: str) -> PrivacySettings:
617
- """Get privacy settings for user"""
618
- if st.session_state.privacy_settings:
619
- return st.session_state.privacy_settings
620
-
621
- # Create default privacy settings
622
- default_consents = {}
623
-
624
- # Essential consents are granted by default when user starts using the app
625
- essential_consents = [ConsentType.DATA_COLLECTION, ConsentType.AI_TRAINING]
626
-
627
- for consent_type in ConsentType:
628
- is_essential = consent_type in essential_consents
629
- default_consents[consent_type] = ConsentRecord(
630
- user_session=user_session_id,
631
- consent_type=consent_type,
632
- granted=is_essential,
633
- timestamp=datetime.now(),
634
- version=self.current_privacy_version,
635
- )
636
-
637
- privacy_settings = PrivacySettings(
638
- user_session=user_session_id,
639
- consents=default_consents,
640
- data_retention_days=365,
641
- anonymize_data=False,
642
- allow_data_export=True,
643
- created_at=datetime.now(),
644
- updated_at=datetime.now(),
645
- )
646
-
647
- st.session_state.privacy_settings = privacy_settings
648
- return privacy_settings
649
-
650
- def _load_privacy_settings(self, user_session_id: str):
651
- """Load privacy settings from storage"""
652
- try:
653
- # In a full implementation, this would load from database
654
- # For now, we'll use session state
655
- if "privacy_settings" not in st.session_state:
656
- st.session_state.privacy_settings = None
657
-
658
- except Exception as e:
659
- self.logger.error(f"Error loading privacy settings: {e}")
660
-
661
- def _update_consents(
662
- self, user_session_id: str, consent_changes: Dict[ConsentType, bool]
663
- ):
664
- """Update user consent preferences"""
665
- try:
666
- privacy_settings = self._get_privacy_settings(user_session_id)
667
-
668
- for consent_type, granted in consent_changes.items():
669
- # Create new consent record
670
- consent_record = ConsentRecord(
671
- user_session=user_session_id,
672
- consent_type=consent_type,
673
- granted=granted,
674
- timestamp=datetime.now(),
675
- version=self.current_privacy_version,
676
- ip_hash=self._hash_ip(),
677
- user_agent_hash=self._hash_user_agent(),
678
- )
679
-
680
- privacy_settings.consents[consent_type] = consent_record
681
-
682
- privacy_settings.updated_at = datetime.now()
683
- st.session_state.privacy_settings = privacy_settings
684
-
685
- self.logger.info(
686
- f"Updated consents for user {user_session_id}: {consent_changes}"
687
- )
688
-
689
- except Exception as e:
690
- self.logger.error(f"Error updating consents: {e}")
691
-
692
- def _update_data_retention(self, user_session_id: str, retention_days: int):
693
- """Update data retention period"""
694
- try:
695
- privacy_settings = self._get_privacy_settings(user_session_id)
696
- privacy_settings.data_retention_days = retention_days
697
- privacy_settings.updated_at = datetime.now()
698
- st.session_state.privacy_settings = privacy_settings
699
-
700
- self.logger.info(
701
- f"Updated data retention for user {user_session_id}: {retention_days} days"
702
- )
703
-
704
- except Exception as e:
705
- self.logger.error(f"Error updating data retention: {e}")
706
-
707
- def _update_anonymization(self, user_session_id: str, anonymize: bool):
708
- """Update data anonymization setting"""
709
- try:
710
- privacy_settings = self._get_privacy_settings(user_session_id)
711
- privacy_settings.anonymize_data = anonymize
712
- privacy_settings.updated_at = datetime.now()
713
- st.session_state.privacy_settings = privacy_settings
714
-
715
- self.logger.info(
716
- f"Updated anonymization for user {user_session_id}: {anonymize}"
717
- )
718
-
719
- except Exception as e:
720
- self.logger.error(f"Error updating anonymization: {e}")
721
-
722
- def _export_user_data(self, user_session_id: str):
723
- """Export all user data"""
724
- try:
725
- # Get user contributions
726
- contributions = self.storage_service.get_contributions_by_session(
727
- user_session_id
728
- )
729
-
730
- # Get privacy settings
731
- privacy_settings = self._get_privacy_settings(user_session_id)
732
-
733
- # Prepare export data
734
- export_data = {
735
- "user_session": user_session_id,
736
- "export_date": datetime.now().isoformat(),
737
- "privacy_settings": {
738
- "data_retention_days": privacy_settings.data_retention_days,
739
- "anonymize_data": privacy_settings.anonymize_data,
740
- "allow_data_export": privacy_settings.allow_data_export,
741
- "consents": {
742
- consent_type.value: {
743
- "granted": record.granted,
744
- "timestamp": record.timestamp.isoformat(),
745
- "version": record.version,
746
- }
747
- for consent_type, record in privacy_settings.consents.items()
748
- },
749
- },
750
- "contributions": [],
751
- }
752
-
753
- # Add contributions data
754
- for contrib in contributions:
755
- contrib_data = {
756
- "id": contrib.id,
757
- "activity_type": contrib.activity_type.value,
758
- "language": contrib.language,
759
- "timestamp": contrib.timestamp.isoformat(),
760
- "content_data": contrib.content_data,
761
- "cultural_context": contrib.cultural_context,
762
- "validation_status": contrib.validation_status.value,
763
- }
764
- export_data["contributions"].append(contrib_data)
765
-
766
- # Create download
767
- export_json = json.dumps(export_data, indent=2, ensure_ascii=False)
768
-
769
- st.download_button(
770
- label="📥 Download My Data (JSON)",
771
- data=export_json,
772
- file_name=f"my_cultural_data_{datetime.now().strftime('%Y%m%d')}.json",
773
- mime="application/json",
774
- )
775
-
776
- st.success(f"Data export ready! Found {len(contributions)} contributions.")
777
-
778
- except Exception as e:
779
- self.logger.error(f"Error exporting user data: {e}")
780
- st.error("Failed to export data. Please try again.")
781
-
782
- def _show_user_contributions(self, user_session_id: str):
783
- """Show user's contributions"""
784
- try:
785
- contributions = self.storage_service.get_contributions_by_session(
786
- user_session_id
787
- )
788
-
789
- if not contributions:
790
- st.info("You haven't made any contributions yet.")
791
- return
792
-
793
- st.subheader(f"📊 Your {len(contributions)} Contributions")
794
-
795
- # Group by activity type
796
- activity_groups = {}
797
- for contrib in contributions:
798
- activity = contrib.activity_type.value
799
- if activity not in activity_groups:
800
- activity_groups[activity] = []
801
- activity_groups[activity].append(contrib)
802
-
803
- # Display by activity
804
- activity_names = {
805
- "meme": "🎭 Memes",
806
- "recipe": "🍛 Recipes",
807
- "folklore": "📚 Folklore",
808
- "landmark": "🏛️ Landmarks",
809
- }
810
-
811
- for activity, contribs in activity_groups.items():
812
- st.markdown(
813
- f"### {activity_names.get(activity, activity.title())} ({len(contribs)})"
814
- )
815
-
816
- for contrib in contribs[:5]: # Show first 5
817
- with st.expander(
818
- f"{contrib.id[:8]}... - {contrib.timestamp.strftime('%Y-%m-%d')}"
819
- ):
820
- col1, col2 = st.columns([2, 1])
821
-
822
- with col1:
823
- st.json(contrib.content_data)
824
-
825
- with col2:
826
- st.markdown(f"**Language:** {contrib.language}")
827
- st.markdown(
828
- f"**Status:** {contrib.validation_status.value}"
829
- )
830
- if contrib.cultural_context.get("region"):
831
- st.markdown(
832
- f"**Region:** {contrib.cultural_context['region']}"
833
- )
834
-
835
- if st.button(f"🗑️ Delete", key=f"delete_{contrib.id}"):
836
- self._delete_contribution(contrib.id)
837
- st.success("Contribution deleted!")
838
- st.rerun()
839
-
840
- if len(contribs) > 5:
841
- st.markdown(f"*... and {len(contribs) - 5} more*")
842
-
843
- except Exception as e:
844
- self.logger.error(f"Error showing user contributions: {e}")
845
- st.error("Failed to load contributions.")
846
-
847
- def _show_data_deletion_options(self, user_session_id: str):
848
- """Show data deletion options"""
849
- st.subheader("🗑️ Data Deletion Options")
850
-
851
- st.warning("""
852
- **Important**: Data deletion is permanent and cannot be undone.
853
- Consider exporting your data first if you want to keep a copy.
854
- """)
855
-
856
- deletion_options = st.radio(
857
- "What would you like to delete?",
858
- [
859
- "Delete specific contributions",
860
- "Delete all my contributions",
861
- "Delete all my data (contributions + settings)",
862
- ],
863
- )
864
-
865
- if deletion_options == "Delete specific contributions":
866
- st.info(
867
- "Use the 'View My Contributions' section above to delete individual items."
868
- )
869
-
870
- elif deletion_options == "Delete all my contributions":
871
- st.markdown("**This will delete:**")
872
- st.markdown(
873
- "- All your memes, recipes, folklore, and landmark contributions"
874
- )
875
- st.markdown("- Cultural context and metadata")
876
- st.markdown("- Contribution history")
877
-
878
- st.markdown("**This will keep:**")
879
- st.markdown("- Your privacy settings")
880
- st.markdown("- Your consent records")
881
-
882
- if st.checkbox("I understand this action cannot be undone"):
883
- if st.button("🗑️ Delete All My Contributions", type="secondary"):
884
- self._delete_all_contributions(user_session_id)
885
- st.success("All contributions deleted successfully.")
886
- st.rerun()
887
-
888
- elif deletion_options == "Delete all my data (contributions + settings)":
889
- st.markdown("**This will delete:**")
890
- st.markdown("- All your contributions")
891
- st.markdown("- All privacy settings")
892
- st.markdown("- All consent records")
893
- st.markdown("- All session data")
894
-
895
- st.error(
896
- "**Warning**: This is complete data deletion. You will need to start fresh if you use the app again."
897
- )
898
-
899
- confirm_text = st.text_input("Type 'DELETE ALL MY DATA' to confirm:")
900
-
901
- if confirm_text == "DELETE ALL MY DATA":
902
- if st.button("🗑️ Delete Everything", type="secondary"):
903
- self._delete_all_user_data(user_session_id)
904
- st.success("All your data has been deleted.")
905
- st.balloons()
906
- st.rerun()
907
-
908
- def _delete_contribution(self, contribution_id: str):
909
- """Delete a specific contribution"""
910
- try:
911
- # In a full implementation, this would delete from database
912
- self.logger.info(f"Deleted contribution: {contribution_id}")
913
-
914
- except Exception as e:
915
- self.logger.error(f"Error deleting contribution: {e}")
916
-
917
- def _delete_all_contributions(self, user_session_id: str):
918
- """Delete all contributions for a user"""
919
- try:
920
- contributions = self.storage_service.get_contributions_by_session(
921
- user_session_id
922
- )
923
-
924
- for contrib in contributions:
925
- self._delete_contribution(contrib.id)
926
-
927
- self.logger.info(f"Deleted all contributions for user: {user_session_id}")
928
-
929
- except Exception as e:
930
- self.logger.error(f"Error deleting all contributions: {e}")
931
-
932
- def _delete_all_user_data(self, user_session_id: str):
933
- """Delete all data for a user"""
934
- try:
935
- # Delete contributions
936
- self._delete_all_contributions(user_session_id)
937
-
938
- # Clear privacy settings
939
- st.session_state.privacy_settings = None
940
- st.session_state.consent_given = {}
941
-
942
- # Clear other session data
943
- for key in list(st.session_state.keys()):
944
- if "user" in key.lower() or "privacy" in key.lower():
945
- del st.session_state[key]
946
-
947
- self.logger.info(f"Deleted all data for user: {user_session_id}")
948
-
949
- except Exception as e:
950
- self.logger.error(f"Error deleting all user data: {e}")
951
-
952
- def _hash_ip(self) -> str:
953
- """Hash IP address for privacy"""
954
- try:
955
- # In a real implementation, get actual IP
956
- ip = "127.0.0.1" # Placeholder
957
- return hashlib.sha256(ip.encode()).hexdigest()[:16]
958
- except:
959
- return "unknown"
960
-
961
- def _hash_user_agent(self) -> str:
962
- """Hash user agent for privacy"""
963
- try:
964
- # In a real implementation, get actual user agent
965
- user_agent = "unknown" # Placeholder
966
- return hashlib.sha256(user_agent.encode()).hexdigest()[:16]
967
- except:
968
- return "unknown"
969
-
970
- def check_consent_for_action(self, action: str, user_session_id: str) -> bool:
971
- """Check if user has given consent for a specific action"""
972
- try:
973
- privacy_settings = self._get_privacy_settings(user_session_id)
974
-
975
- # Map actions to consent types
976
- action_consent_map = {
977
- "collect_data": ConsentType.DATA_COLLECTION,
978
- "train_ai": ConsentType.AI_TRAINING,
979
- "research_use": ConsentType.RESEARCH_USE,
980
- "public_sharing": ConsentType.PUBLIC_SHARING,
981
- "analytics": ConsentType.ANALYTICS,
982
- "marketing": ConsentType.MARKETING,
983
- }
984
-
985
- consent_type = action_consent_map.get(action)
986
- if not consent_type:
987
- return False
988
-
989
- consent_record = privacy_settings.consents.get(consent_type)
990
- return consent_record and consent_record.granted
991
-
992
- except Exception as e:
993
- self.logger.error(f"Error checking consent for action {action}: {e}")
994
- return False
995
-
996
- def get_data_retention_date(self, user_session_id: str) -> Optional[datetime]:
997
- """Get the date when user's data should be deleted"""
998
- try:
999
- privacy_settings = self._get_privacy_settings(user_session_id)
1000
-
1001
- if privacy_settings.data_retention_days == -1:
1002
- return None # Keep indefinitely
1003
-
1004
- return privacy_settings.created_at + timedelta(
1005
- days=privacy_settings.data_retention_days
1006
- )
1007
-
1008
- except Exception as e:
1009
- self.logger.error(f"Error calculating data retention date: {e}")
1010
- return None
1011
-
1012
- def should_anonymize_data(self, user_session_id: str) -> bool:
1013
- """Check if user's data should be anonymized"""
1014
- try:
1015
- privacy_settings = self._get_privacy_settings(user_session_id)
1016
- return privacy_settings.anonymize_data
1017
-
1018
- except Exception as e:
1019
- self.logger.error(f"Error checking anonymization setting: {e}")
1020
- return False
1021
-
1022
- def get_privacy_summary(self, user_session_id: str) -> Dict[str, Any]:
1023
- """Get privacy summary for user"""
1024
- try:
1025
- privacy_settings = self._get_privacy_settings(user_session_id)
1026
-
1027
- granted_consents = [
1028
- consent_type.value
1029
- for consent_type, record in privacy_settings.consents.items()
1030
- if record.granted
1031
- ]
1032
-
1033
- return {
1034
- "user_session": user_session_id,
1035
- "privacy_version": self.current_privacy_version,
1036
- "granted_consents": granted_consents,
1037
- "data_retention_days": privacy_settings.data_retention_days,
1038
- "anonymize_data": privacy_settings.anonymize_data,
1039
- "allow_data_export": privacy_settings.allow_data_export,
1040
- "settings_updated": privacy_settings.updated_at.isoformat(),
1041
- "has_essential_consent": self._has_essential_consent(),
1042
- }
1043
-
1044
- except Exception as e:
1045
- self.logger.error(f"Error getting privacy summary: {e}")
1046
- return {}
1047
-
1048
- def handle_privacy_banner_action(self, action: str, user_session_id: str):
1049
- """Handle privacy banner actions"""
1050
- try:
1051
- if action == "ACCEPT_ESSENTIAL":
1052
- # Grant essential consents
1053
- essential_consents = {
1054
- ConsentType.DATA_COLLECTION: True,
1055
- ConsentType.AI_TRAINING: True,
1056
- }
1057
- self._update_consents(user_session_id, essential_consents)
1058
- st.session_state.show_privacy_banner = False
1059
-
1060
- elif action == "CUSTOMIZE_PRIVACY":
1061
- # Show privacy settings
1062
- st.session_state.show_privacy_settings = True
1063
-
1064
- elif action == "VIEW_PRIVACY_POLICY":
1065
- # Show privacy policy
1066
- st.session_state.show_privacy_policy = True
1067
-
1068
- except Exception as e:
1069
- self.logger.error(f"Error handling privacy banner action: {e}")
intern_project/corpus_collection_engine/services/storage_service.py DELETED
@@ -1,509 +0,0 @@
- """
- Storage service with offline support for the Corpus Collection Engine
- """
-
- import sqlite3
- import json
- import os
- from datetime import datetime
- from typing import List, Dict, Optional, Any, Tuple
- from pathlib import Path
- import logging
-
- from corpus_collection_engine.models.data_models import (
-     UserContribution, CorpusEntry, ActivitySession, ValidationStatus
- )
- from corpus_collection_engine.config import DATABASE_CONFIG, DATA_DIR
-
-
- class StorageService:
-     """Service for managing local and remote data storage with offline support"""
-
-     def __init__(self, db_path: Optional[str] = None):
-         self.db_path = db_path or os.path.join(DATA_DIR, "corpus_collection.db")
-         self.offline_queue_path = os.path.join(DATA_DIR, "offline_queue.json")
-         self.logger = logging.getLogger(__name__)
-
-         # Ensure data directory exists
-         os.makedirs(DATA_DIR, exist_ok=True)
-
-         # Initialize database
-         self._initialize_database()
-
-         # Load offline queue
-         self.offline_queue = self._load_offline_queue()
-
-     def _initialize_database(self):
-         """Initialize SQLite database with required tables"""
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-
-                 # Create user_contributions table
-                 cursor.execute('''
-                     CREATE TABLE IF NOT EXISTS user_contributions (
-                         id TEXT PRIMARY KEY,
-                         user_session TEXT NOT NULL,
-                         activity_type TEXT NOT NULL,
-                         content_data TEXT NOT NULL,
-                         language TEXT NOT NULL,
-                         region TEXT,
-                         cultural_context TEXT NOT NULL,
-                         timestamp TEXT NOT NULL,
-                         validation_status TEXT NOT NULL,
-                         metadata TEXT NOT NULL,
-                         synced BOOLEAN DEFAULT FALSE,
-                         created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
-                     )
-                 ''')
-
-                 # Create corpus_entries table
-                 cursor.execute('''
-                     CREATE TABLE IF NOT EXISTS corpus_entries (
-                         id TEXT PRIMARY KEY,
-                         contribution_id TEXT NOT NULL,
-                         text_content TEXT,
-                         image_content BLOB,
-                         language TEXT NOT NULL,
-                         cultural_tags TEXT NOT NULL,
-                         quality_score REAL NOT NULL,
-                         processed_features TEXT NOT NULL,
-                         created_at TEXT NOT NULL,
-                         synced BOOLEAN DEFAULT FALSE,
-                         FOREIGN KEY (contribution_id) REFERENCES user_contributions (id)
-                     )
-                 ''')
-
-                 # Create activity_sessions table
-                 cursor.execute('''
-                     CREATE TABLE IF NOT EXISTS activity_sessions (
-                         session_id TEXT PRIMARY KEY,
-                         user_id TEXT,
-                         activity_type TEXT NOT NULL,
-                         start_time TEXT NOT NULL,
-                         contributions TEXT NOT NULL,
-                         engagement_metrics TEXT NOT NULL,
-                         synced BOOLEAN DEFAULT FALSE,
-                         created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
-                     )
-                 ''')
-
-                 # Create indexes for better performance
-                 cursor.execute('CREATE INDEX IF NOT EXISTS idx_contributions_session ON user_contributions(user_session)')
-                 cursor.execute('CREATE INDEX IF NOT EXISTS idx_contributions_activity ON user_contributions(activity_type)')
-                 cursor.execute('CREATE INDEX IF NOT EXISTS idx_contributions_language ON user_contributions(language)')
-                 cursor.execute('CREATE INDEX IF NOT EXISTS idx_contributions_synced ON user_contributions(synced)')
-                 cursor.execute('CREATE INDEX IF NOT EXISTS idx_corpus_language ON corpus_entries(language)')
-                 cursor.execute('CREATE INDEX IF NOT EXISTS idx_sessions_activity ON activity_sessions(activity_type)')
-
-                 conn.commit()
-                 self.logger.info("Database initialized successfully")
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Database initialization error: {e}")
-             raise
-
-     def _load_offline_queue(self) -> List[Dict[str, Any]]:
-         """Load offline queue from file"""
-         try:
-             if os.path.exists(self.offline_queue_path):
-                 with open(self.offline_queue_path, 'r', encoding='utf-8') as f:
-                     return json.load(f)
-         except (json.JSONDecodeError, IOError) as e:
-             self.logger.warning(f"Could not load offline queue: {e}")
-
-         return []
-
-     def _save_offline_queue(self):
-         """Save offline queue to file"""
-         try:
-             with open(self.offline_queue_path, 'w', encoding='utf-8') as f:
-                 json.dump(self.offline_queue, f, indent=2, ensure_ascii=False)
-         except IOError as e:
-             self.logger.error(f"Could not save offline queue: {e}")
-
-     def save_contribution(self, contribution: UserContribution, offline_mode: bool = False) -> bool:
-         """
-         Save user contribution to local database
-
-         Args:
-             contribution: UserContribution object to save
-             offline_mode: If True, add to offline queue for later sync
-
-         Returns:
-             bool: Success status
-         """
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-
-                 # Convert contribution to dict for storage
-                 data = contribution.to_dict()
-
-                 cursor.execute('''
-                     INSERT OR REPLACE INTO user_contributions
-                     (id, user_session, activity_type, content_data, language, region,
-                      cultural_context, timestamp, validation_status, metadata, synced)
-                     VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
-                 ''', (
-                     data['id'], data['user_session'], data['activity_type'],
-                     data['content_data'], data['language'], data['region'],
-                     data['cultural_context'], data['timestamp'],
-                     data['validation_status'], data['metadata'], not offline_mode
-                 ))
-
-                 conn.commit()
-
-                 # Add to offline queue if in offline mode
-                 if offline_mode:
-                     self.offline_queue.append({
-                         'type': 'contribution',
-                         'data': data,
-                         'timestamp': datetime.now().isoformat()
-                     })
-                     self._save_offline_queue()
-
-                 self.logger.info(f"Contribution {contribution.id} saved successfully")
-                 return True
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error saving contribution: {e}")
-             return False
-
-     def get_contribution(self, contribution_id: str) -> Optional[UserContribution]:
-         """Get contribution by ID"""
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-                 cursor.execute('''
-                     SELECT * FROM user_contributions WHERE id = ?
-                 ''', (contribution_id,))
-
-                 row = cursor.fetchone()
-                 if row:
-                     # Convert row to dict
-                     columns = [desc[0] for desc in cursor.description]
-                     data = dict(zip(columns, row))
-
-                     # Remove database-specific fields
-                     data.pop('synced', None)
-                     data.pop('created_at', None)
-
-                     return UserContribution.from_dict(data)
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error retrieving contribution: {e}")
-
-         return None
-
-     def get_contributions_by_session(self, session_id: str) -> List[UserContribution]:
-         """Get all contributions for a session"""
-         contributions = []
-
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-                 cursor.execute('''
-                     SELECT * FROM user_contributions WHERE user_session = ?
-                     ORDER BY timestamp DESC
-                 ''', (session_id,))
-
-                 rows = cursor.fetchall()
-                 columns = [desc[0] for desc in cursor.description]
-
-                 for row in rows:
-                     data = dict(zip(columns, row))
-                     data.pop('synced', None)
-                     data.pop('created_at', None)
-                     contributions.append(UserContribution.from_dict(data))
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error retrieving contributions by session: {e}")
-
-         return contributions
-
-     def get_contributions_by_language(self, language: str, limit: int = 100) -> List[UserContribution]:
-         """Get contributions by language"""
-         contributions = []
-
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-                 cursor.execute('''
-                     SELECT * FROM user_contributions WHERE language = ?
-                     ORDER BY timestamp DESC LIMIT ?
-                 ''', (language, limit))
-
-                 rows = cursor.fetchall()
-                 columns = [desc[0] for desc in cursor.description]
-
-                 for row in rows:
-                     data = dict(zip(columns, row))
-                     data.pop('synced', None)
-                     data.pop('created_at', None)
-                     contributions.append(UserContribution.from_dict(data))
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error retrieving contributions by language: {e}")
-
-         return contributions
-
-     def save_corpus_entry(self, entry: CorpusEntry) -> bool:
-         """Save corpus entry to database"""
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-
-                 data = entry.to_dict()
-
-                 cursor.execute('''
-                     INSERT OR REPLACE INTO corpus_entries
-                     (id, contribution_id, text_content, image_content, language,
-                      cultural_tags, quality_score, processed_features, created_at)
-                     VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
-                 ''', (
-                     data['id'], data['contribution_id'], data['text_content'],
-                     data['image_content'], data['language'], data['cultural_tags'],
-                     data['quality_score'], data['processed_features'], data['created_at']
-                 ))
-
-                 conn.commit()
-                 self.logger.info(f"Corpus entry {entry.id} saved successfully")
-                 return True
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error saving corpus entry: {e}")
-             return False
-
-     def save_activity_session(self, session: ActivitySession) -> bool:
-         """Save activity session to database"""
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-
-                 data = session.to_dict()
-
-                 cursor.execute('''
-                     INSERT OR REPLACE INTO activity_sessions
-                     (session_id, user_id, activity_type, start_time,
-                      contributions, engagement_metrics)
-                     VALUES (?, ?, ?, ?, ?, ?)
-                 ''', (
-                     data['session_id'], data['user_id'], data['activity_type'],
-                     data['start_time'], data['contributions'], data['engagement_metrics']
-                 ))
-
-                 conn.commit()
-                 self.logger.info(f"Activity session {session.session_id} saved successfully")
-                 return True
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error saving activity session: {e}")
-             return False
-
-     def get_statistics(self) -> Dict[str, Any]:
-         """Get database statistics"""
-         stats = {
-             'total_contributions': 0,
-             'contributions_by_language': {},
-             'contributions_by_activity': {},
-             'unsynced_contributions': 0,
-             'total_corpus_entries': 0,
-             'total_sessions': 0,
-             'offline_queue_size': len(self.offline_queue)
-         }
-
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-
-                 # Total contributions
-                 cursor.execute('SELECT COUNT(*) FROM user_contributions')
-                 stats['total_contributions'] = cursor.fetchone()[0]
-
-                 # Contributions by language
-                 cursor.execute('''
-                     SELECT language, COUNT(*) FROM user_contributions
-                     GROUP BY language
-                 ''')
-                 stats['contributions_by_language'] = dict(cursor.fetchall())
-
-                 # Contributions by activity
-                 cursor.execute('''
-                     SELECT activity_type, COUNT(*) FROM user_contributions
-                     GROUP BY activity_type
-                 ''')
-                 stats['contributions_by_activity'] = dict(cursor.fetchall())
-
-                 # Unsynced contributions
-                 cursor.execute('SELECT COUNT(*) FROM user_contributions WHERE synced = FALSE')
-                 stats['unsynced_contributions'] = cursor.fetchone()[0]
-
-                 # Total corpus entries
-                 cursor.execute('SELECT COUNT(*) FROM corpus_entries')
-                 stats['total_corpus_entries'] = cursor.fetchone()[0]
-
-                 # Total sessions
-                 cursor.execute('SELECT COUNT(*) FROM activity_sessions')
-                 stats['total_sessions'] = cursor.fetchone()[0]
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error getting statistics: {e}")
-
-         return stats
-
-     def get_unsynced_contributions(self, limit: int = 100) -> List[UserContribution]:
-         """Get contributions that haven't been synced to remote storage"""
-         contributions = []
-
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-                 cursor.execute('''
-                     SELECT * FROM user_contributions WHERE synced = FALSE
-                     ORDER BY timestamp ASC LIMIT ?
-                 ''', (limit,))
-
-                 rows = cursor.fetchall()
-                 columns = [desc[0] for desc in cursor.description]
-
-                 for row in rows:
-                     data = dict(zip(columns, row))
-                     data.pop('synced', None)
-                     data.pop('created_at', None)
-                     contributions.append(UserContribution.from_dict(data))
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error retrieving unsynced contributions: {e}")
-
-         return contributions
-
-     def mark_contribution_synced(self, contribution_id: str) -> bool:
-         """Mark contribution as synced to remote storage"""
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-                 cursor.execute('''
-                     UPDATE user_contributions SET synced = TRUE WHERE id = ?
-                 ''', (contribution_id,))
-
-                 conn.commit()
-                 return cursor.rowcount > 0
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error marking contribution as synced: {e}")
-             return False
-
-     def process_offline_queue(self) -> int:
-         """Process offline queue and attempt to sync items"""
-         processed_count = 0
-
-         if not self.offline_queue:
-             return processed_count
-
-         # Create a copy of the queue to process
-         queue_copy = self.offline_queue.copy()
-         self.offline_queue.clear()
-
-         for item in queue_copy:
-             try:
-                 if item['type'] == 'contribution':
-                     # Re-save contribution with sync enabled
-                     contribution = UserContribution.from_dict(item['data'])
-                     if self.save_contribution(contribution, offline_mode=False):
-                         processed_count += 1
-                     else:
-                         # If save fails, add back to queue
-                         self.offline_queue.append(item)
-
-             except Exception as e:
-                 self.logger.error(f"Error processing offline queue item: {e}")
-                 # Add back to queue for retry
-                 self.offline_queue.append(item)
-
-         # Save updated queue
-         self._save_offline_queue()
-
-         if processed_count > 0:
-             self.logger.info(f"Processed {processed_count} items from offline queue")
-
-         return processed_count
-
-     def cleanup_old_data(self, days_old: int = 30) -> int:
-         """Clean up old synced data to save space"""
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-
-                 # Delete old synced contributions
-                 cursor.execute('''
-                     DELETE FROM user_contributions
-                     WHERE synced = TRUE
-                     AND created_at < datetime('now', '-{} days')
-                 '''.format(days_old))
-
-                 deleted_count = cursor.rowcount
-                 conn.commit()
-
-                 self.logger.info(f"Cleaned up {deleted_count} old records")
-                 return deleted_count
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error cleaning up old data: {e}")
-             return 0
-
-     def export_data(self, output_path: str, include_synced: bool = False) -> bool:
-         """Export data to JSON file"""
-         try:
-             export_data = {
-                 'contributions': [],
-                 'corpus_entries': [],
-                 'sessions': [],
-                 'export_timestamp': datetime.now().isoformat()
-             }
-
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-
-                 # Export contributions
-                 sync_condition = "" if include_synced else "WHERE synced = FALSE"
-                 cursor.execute(f'SELECT * FROM user_contributions {sync_condition}')
-
-                 columns = [desc[0] for desc in cursor.description]
-                 for row in cursor.fetchall():
-                     data = dict(zip(columns, row))
-                     data.pop('synced', None)
-                     data.pop('created_at', None)
-                     export_data['contributions'].append(data)
-
-                 # Export corpus entries
-                 cursor.execute('SELECT * FROM corpus_entries')
-                 columns = [desc[0] for desc in cursor.description]
-                 for row in cursor.fetchall():
-                     data = dict(zip(columns, row))
-                     data.pop('synced', None)
-                     # Convert blob to base64 if present
-                     if data.get('image_content'):
-                         import base64
-                         data['image_content'] = base64.b64encode(data['image_content']).decode('utf-8')
-                     export_data['corpus_entries'].append(data)
-
-                 # Export sessions
-                 cursor.execute('SELECT * FROM activity_sessions')
-                 columns = [desc[0] for desc in cursor.description]
-                 for row in cursor.fetchall():
-                     data = dict(zip(columns, row))
-                     data.pop('synced', None)
-                     data.pop('created_at', None)
-                     export_data['sessions'].append(data)
-
-             # Write to file
-             with open(output_path, 'w', encoding='utf-8') as f:
-                 json.dump(export_data, f, indent=2, ensure_ascii=False)
-
-             self.logger.info(f"Data exported to {output_path}")
-             return True
-
-         except Exception as e:
-             self.logger.error(f"Error exporting data: {e}")
-             return False
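The storage service's offline path is write-local-first: contributions saved with `offline_mode=True` land in SQLite with `synced = FALSE` and are also appended to a JSON queue, which `process_offline_queue()` later drains by re-saving each item with sync enabled. A short usage sketch (the `StorageService` API is taken from this diff; the `contribution` object is assumed to be a valid `UserContribution`):

```python
# Hypothetical usage sketch, not part of the deleted file.
storage = StorageService()  # defaults to DATA_DIR/corpus_collection.db

# While offline, persist locally and queue for a later sync attempt.
storage.save_contribution(contribution, offline_mode=True)

# When connectivity returns, drain the queue; failed items are re-queued.
synced = storage.process_offline_queue()

stats = storage.get_statistics()
print(synced, stats["unsynced_contributions"], stats["offline_queue_size"])
```

One caveat visible in the deleted code: `cleanup_old_data` builds its SQL with `str.format(days_old)` instead of a bound parameter, which is acceptable only because `days_old` is an `int` supplied by the caller, not user input.
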
intern_project/corpus_collection_engine/services/validation_service.py DELETED
@@ -1,618 +0,0 @@
1
- """
2
- Validation Service for content moderation and quality control
3
- """
4
-
5
- import re
6
- import logging
7
- from typing import Dict, List, Tuple, Any, Optional, Set
8
- from datetime import datetime
9
- from dataclasses import dataclass
10
- from enum import Enum
11
-
12
- from corpus_collection_engine.models.data_models import UserContribution, ValidationStatus, ActivityType
13
- from corpus_collection_engine.services.language_service import LanguageService
14
- from corpus_collection_engine.services.ai_service import AIService
15
- from corpus_collection_engine.config import VALIDATION_CONFIG
16
-
17
-
18
- class ModerationAction(Enum):
19
- """Actions that can be taken during moderation"""
20
- APPROVE = "approve"
21
- REJECT = "reject"
22
- FLAG_REVIEW = "flag_review"
23
- REQUEST_EDIT = "request_edit"
24
-
25
-
26
- class ContentIssue(Enum):
27
- """Types of content issues that can be detected"""
28
- INAPPROPRIATE_LANGUAGE = "inappropriate_language"
29
- SPAM_CONTENT = "spam_content"
30
- LOW_QUALITY = "low_quality"
31
- CULTURAL_INSENSITIVITY = "cultural_insensitivity"
32
- DUPLICATE_CONTENT = "duplicate_content"
33
- INSUFFICIENT_CONTENT = "insufficient_content"
34
- PRIVACY_VIOLATION = "privacy_violation"
35
- COPYRIGHT_VIOLATION = "copyright_violation"
36
-
37
-
38
- @dataclass
39
- class ModerationResult:
40
- """Result of content moderation"""
41
- action: ModerationAction
42
- confidence: float
43
- issues: List[ContentIssue]
44
- suggestions: List[str]
45
- quality_score: float
46
- explanation: str
47
-
48
-
49
- class ValidationService:
50
- """Service for content validation and moderation"""
51
-
52
- def __init__(self):
53
- self.logger = logging.getLogger(__name__)
54
- self.language_service = LanguageService()
55
- self.ai_service = AIService()
56
-
57
- # Load moderation rules and filters
58
- self._initialize_filters()
59
-
60
- # Quality thresholds
61
- self.quality_thresholds = {
62
- 'minimum_score': 0.3,
63
- 'auto_approve_score': 0.8,
64
- 'review_score': 0.5
65
- }
66
-
67
- # Content similarity threshold for duplicate detection
68
- self.similarity_threshold = 0.85
69
-
70
- def _initialize_filters(self):
71
- """Initialize content filters and moderation rules"""
72
- # Inappropriate content patterns (basic examples)
73
- self.inappropriate_patterns = [
74
- r'\b(hate|violence|discrimination)\b',
75
- r'\b(offensive|abusive|harassment)\b',
76
- # Add more patterns as needed, considering cultural context
77
- ]
78
-
79
- # Spam indicators
80
- self.spam_patterns = [
81
- r'(http[s]?://|www\.)', # URLs
82
- r'(\b\d{10}\b)', # Phone numbers
83
- r'(buy now|click here|limited offer)', # Commercial spam
84
- r'(.)\1{10,}', # Repeated characters
85
- ]
86
-
87
- # Low quality indicators
88
- self.low_quality_patterns = [
89
- r'^(.{1,10})$', # Very short content
90
- r'^[A-Z\s!]{20,}$', # All caps
91
- r'[^\w\s]{5,}', # Too many special characters
92
- ]
93
-
94
- # Cultural sensitivity keywords (to be handled carefully)
95
- self.cultural_sensitivity_keywords = [
96
- 'caste', 'religion', 'community', 'tradition', 'ritual'
97
- ]
98
-
99
- # Privacy-related patterns
100
- self.privacy_patterns = [
101
- r'\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b', # Credit card
102
- r'\b[A-Z]{5}[0-9]{4}[A-Z]{1}\b', # PAN card
103
- r'\b\d{12}\b', # Aadhaar-like numbers
104
- ]
105
-
106
- def moderate_contribution(self, contribution: UserContribution) -> ModerationResult:
107
- """
108
- Perform comprehensive moderation on a user contribution
109
-
110
- Args:
111
- contribution: UserContribution to moderate
112
-
113
- Returns:
114
- ModerationResult with action and details
115
- """
116
- issues = []
117
- suggestions = []
118
- quality_scores = []
119
-
120
- try:
121
- # Extract text content for analysis
122
- text_content = self._extract_text_content(contribution)
123
-
124
- # 1. Language and basic validation
125
- lang_score, lang_issues = self._validate_language_content(text_content, contribution.language)
126
- quality_scores.append(lang_score)
127
- issues.extend(lang_issues)
128
-
129
- # 2. Content appropriateness check
130
- appropriate_score, appropriate_issues = self._check_content_appropriateness(text_content)
131
- quality_scores.append(appropriate_score)
132
- issues.extend(appropriate_issues)
133
-
134
- # 3. Spam detection
135
- spam_score, spam_issues = self._detect_spam_content(text_content)
136
- quality_scores.append(spam_score)
137
- issues.extend(spam_issues)
138
-
139
- # 4. Quality assessment
140
- quality_score, quality_issues = self._assess_content_quality(contribution)
141
- quality_scores.append(quality_score)
142
- issues.extend(quality_issues)
143
-
144
- # 5. Cultural sensitivity check
145
- cultural_score, cultural_issues = self._check_cultural_sensitivity(text_content, contribution.language)
146
- quality_scores.append(cultural_score)
147
- issues.extend(cultural_issues)
148
-
149
- # 6. Privacy and safety check
150
- privacy_score, privacy_issues = self._check_privacy_safety(text_content)
151
- quality_scores.append(privacy_score)
152
- issues.extend(privacy_issues)
153
-
154
- # 7. Activity-specific validation
155
- activity_score, activity_issues = self._validate_activity_specific(contribution)
156
- quality_scores.append(activity_score)
157
- issues.extend(activity_issues)
158
-
159
- # Calculate overall quality score
160
- overall_quality = sum(quality_scores) / len(quality_scores) if quality_scores else 0.0
161
-
162
- # Generate suggestions based on issues
163
- suggestions = self._generate_suggestions(issues, contribution)
164
-
165
- # Determine moderation action
166
- action, confidence, explanation = self._determine_action(overall_quality, issues)
167
-
168
- return ModerationResult(
169
- action=action,
170
- confidence=confidence,
171
- issues=issues,
172
- suggestions=suggestions,
173
- quality_score=overall_quality,
174
- explanation=explanation
175
- )
176
-
177
- except Exception as e:
178
- self.logger.error(f"Error during moderation: {e}")
179
- return ModerationResult(
180
- action=ModerationAction.FLAG_REVIEW,
181
- confidence=0.5,
182
- issues=[ContentIssue.LOW_QUALITY],
183
- suggestions=["Content requires manual review due to processing error"],
184
- quality_score=0.3,
185
- explanation="Automatic moderation failed, requires manual review"
186
- )
187
-
188
- def _extract_text_content(self, contribution: UserContribution) -> str:
189
- """Extract all text content from contribution for analysis"""
190
- text_parts = []
191
-
192
- # Extract from content_data based on activity type
193
- content_data = contribution.content_data
194
-
195
- if contribution.activity_type == ActivityType.MEME:
196
- texts = content_data.get('texts', [])
197
- text_parts.extend([text for text in texts if text])
198
-
199
- elif contribution.activity_type == ActivityType.RECIPE:
200
- text_parts.append(content_data.get('title', ''))
201
- text_parts.append(content_data.get('instructions', ''))
202
- text_parts.append(content_data.get('family_story', ''))
203
- # Add ingredients
204
- ingredients = content_data.get('ingredients', [])
205
- for ing in ingredients:
206
- if isinstance(ing, dict) and ing.get('name'):
207
- text_parts.append(ing['name'])
208
-
209
- elif contribution.activity_type == ActivityType.FOLKLORE:
210
- text_parts.append(content_data.get('title', ''))
211
- text_parts.append(content_data.get('story', ''))
212
- text_parts.append(content_data.get('meaning', ''))
213
-
214
- elif contribution.activity_type == ActivityType.LANDMARK:
215
- text_parts.append(content_data.get('name', ''))
216
- text_parts.append(content_data.get('description', ''))
217
-
218
- # Extract from cultural context
219
- cultural_context = contribution.cultural_context
220
- text_parts.append(cultural_context.get('cultural_significance', ''))
221
- text_parts.append(cultural_context.get('additional_context', ''))
222
-
223
- # Combine all text
224
- combined_text = ' '.join([text for text in text_parts if text and text.strip()])
225
- return combined_text
226
-
227
- def _validate_language_content(self, text: str, expected_language: str) -> Tuple[float, List[ContentIssue]]:
228
- """Validate language consistency and quality"""
229
- issues = []
230
- score = 1.0
231
-
232
- if not text or len(text.strip()) < 10:
233
- issues.append(ContentIssue.INSUFFICIENT_CONTENT)
234
- score = 0.2
235
- return score, issues
236
-
237
- # Check language consistency
238
- detected_lang, confidence = self.language_service.detect_language(text)
239
-
240
- if detected_lang and detected_lang != expected_language and confidence > 0.7:
241
- # Language mismatch - might be intentional for multilingual content
242
- if confidence > 0.9:
243
- issues.append(ContentIssue.LOW_QUALITY)
244
- score *= 0.7
245
-
246
- # Check text statistics
247
- stats = self.language_service.get_text_statistics(text)
248
-
249
- # Very short content
250
- if stats['word_count'] < 5:
251
- issues.append(ContentIssue.INSUFFICIENT_CONTENT)
252
- score *= 0.5
253
-
254
- # Very long content might be spam
255
- if stats['word_count'] > 1000:
256
- score *= 0.9
257
-
258
- return score, issues
259
-
260
- def _check_content_appropriateness(self, text: str) -> Tuple[float, List[ContentIssue]]:
261
- """Check for inappropriate content"""
262
- issues = []
263
- score = 1.0
264
-
265
- text_lower = text.lower()
266
-
267
- # Check for inappropriate patterns
268
- for pattern in self.inappropriate_patterns:
269
- if re.search(pattern, text_lower, re.IGNORECASE):
270
- issues.append(ContentIssue.INAPPROPRIATE_LANGUAGE)
271
- score *= 0.3
272
- break
273
-
274
- # Use AI sentiment analysis for additional context
275
- try:
276
- sentiment = self.ai_service.analyze_sentiment(text)
277
- if sentiment.get('negative', 0) > 0.8:
278
- score *= 0.7 # High negative sentiment might indicate issues
279
- except:
280
- pass # AI analysis is optional
281
-
282
- return score, issues
283
-
284
- def _detect_spam_content(self, text: str) -> Tuple[float, List[ContentIssue]]:
285
- """Detect spam and promotional content"""
286
- issues = []
287
- score = 1.0
288
-
289
- spam_indicators = 0
290
-
291
- # Check spam patterns
292
- for pattern in self.spam_patterns:
293
- if re.search(pattern, text, re.IGNORECASE):
294
- spam_indicators += 1
295
-
296
- # Check for repeated words/phrases
297
- words = text.lower().split()
298
- if len(words) > 10:
299
- word_freq = {}
300
- for word in words:
301
- word_freq[word] = word_freq.get(word, 0) + 1
302
-
303
- # If any word appears more than 30% of the time, it might be spam
304
- max_freq = max(word_freq.values()) if word_freq else 0
305
- if max_freq > len(words) * 0.3:
306
- spam_indicators += 1
307
-
308
- # Determine spam score
309
- if spam_indicators >= 2:
310
- issues.append(ContentIssue.SPAM_CONTENT)
311
- score = 0.2
312
- elif spam_indicators == 1:
313
- score *= 0.7
314
-
315
- return score, issues
316
-
317
- def _assess_content_quality(self, contribution: UserContribution) -> Tuple[float, List[ContentIssue]]:
318
- """Assess overall content quality"""
319
- issues = []
320
- score = 1.0
321
-
322
- content_data = contribution.content_data
323
-
324
- # Activity-specific quality checks
325
- if contribution.activity_type == ActivityType.MEME:
326
- texts = content_data.get('texts', [])
327
- if not any(text.strip() for text in texts):
328
- issues.append(ContentIssue.INSUFFICIENT_CONTENT)
329
- score *= 0.3
330
-
331
- elif contribution.activity_type == ActivityType.RECIPE:
332
- title = content_data.get('title', '')
333
- instructions = content_data.get('instructions', '')
334
- ingredients = content_data.get('ingredients', [])
335
-
336
- if len(title.strip()) < 3:
337
- issues.append(ContentIssue.LOW_QUALITY)
338
- score *= 0.7
339
-
340
- if len(instructions.strip()) < 20:
341
- issues.append(ContentIssue.INSUFFICIENT_CONTENT)
342
- score *= 0.5
343
-
344
- valid_ingredients = [ing for ing in ingredients if isinstance(ing, dict) and ing.get('name', '').strip()]
345
- if len(valid_ingredients) < 2:
346
- issues.append(ContentIssue.INSUFFICIENT_CONTENT)
347
- score *= 0.6
348
-
349
- elif contribution.activity_type == ActivityType.FOLKLORE:
350
- story = content_data.get('story', '')
351
- if len(story.strip()) < 50:
352
- issues.append(ContentIssue.INSUFFICIENT_CONTENT)
353
- score *= 0.4
354
-
355
- elif contribution.activity_type == ActivityType.LANDMARK:
356
- description = content_data.get('description', '')
357
- if len(description.strip()) < 20:
358
- issues.append(ContentIssue.INSUFFICIENT_CONTENT)
359
- score *= 0.5
360
-
361
- # Check cultural context quality
362
- cultural_significance = contribution.cultural_context.get('cultural_significance', '')
363
- if len(cultural_significance.strip()) < 10:
364
- score *= 0.9 # Minor penalty for missing cultural context
365
-
366
- return score, issues
367
-
368
- def _check_cultural_sensitivity(self, text: str, language: str) -> Tuple[float, List[ContentIssue]]:
369
- """Check for cultural sensitivity issues"""
370
- issues = []
371
- score = 1.0
372
-
373
- text_lower = text.lower()
374
-
375
- # Check for potentially sensitive topics
376
- sensitive_count = 0
377
- for keyword in self.cultural_sensitivity_keywords:
378
- if keyword in text_lower:
379
- sensitive_count += 1
380
-
381
- # If multiple sensitive keywords, flag for review
382
- if sensitive_count >= 3:
383
- issues.append(ContentIssue.CULTURAL_INSENSITIVITY)
384
- score *= 0.8 # Requires careful review, not necessarily rejection
385
-
386
- # Use AI to suggest cultural tags and check for appropriateness
387
- try:
388
- cultural_tags = self.ai_service.suggest_cultural_tags(text, language)
389
- # If AI suggests concerning tags, reduce score slightly
390
- concerning_tags = ['controversial', 'sensitive', 'political']
391
- if any(tag in cultural_tags for tag in concerning_tags):
392
- score *= 0.9
393
- except Exception:
394
- pass # AI analysis is optional
395
-
396
- return score, issues
397
-
398
- def _check_privacy_safety(self, text: str) -> Tuple[float, List[ContentIssue]]:
399
- """Check for privacy violations and personal information"""
400
- issues = []
401
- score = 1.0
402
-
403
- # Check for privacy-sensitive patterns
404
- for pattern in self.privacy_patterns:
405
- if re.search(pattern, text):
406
- issues.append(ContentIssue.PRIVACY_VIOLATION)
407
- score *= 0.3
408
- break
409
-
410
- # Check for email addresses
411
- email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
412
- if re.search(email_pattern, text):
413
- issues.append(ContentIssue.PRIVACY_VIOLATION)
414
- score *= 0.5
415
-
416
- return score, issues
417
-
418
- def _validate_activity_specific(self, contribution: UserContribution) -> Tuple[float, List[ContentIssue]]:
419
- """Perform activity-specific validation"""
420
- issues = []
421
- score = 1.0
422
-
423
- # Use the existing validation from base activity
424
- try:
425
- if contribution.activity_type == ActivityType.MEME:
426
- from corpus_collection_engine.activities.meme_creator import MemeCreatorActivity
427
- activity = MemeCreatorActivity()
428
- elif contribution.activity_type == ActivityType.RECIPE:
429
- from corpus_collection_engine.activities.recipe_exchange import RecipeExchangeActivity
430
- activity = RecipeExchangeActivity()
431
- elif contribution.activity_type == ActivityType.FOLKLORE:
432
- from corpus_collection_engine.activities.folklore_collector import FolkloreCollectorActivity
433
- activity = FolkloreCollectorActivity()
434
- elif contribution.activity_type == ActivityType.LANDMARK:
435
- from corpus_collection_engine.activities.landmark_identifier import LandmarkIdentifierActivity
436
- activity = LandmarkIdentifierActivity()
437
- else:
438
- return score, issues
439
-
440
- is_valid, message = activity.validate_content(contribution.content_data)
441
- if not is_valid:
442
- issues.append(ContentIssue.LOW_QUALITY)
443
- score *= 0.4
444
-
445
- except Exception as e:
446
- self.logger.warning(f"Activity-specific validation failed: {e}")
447
- score *= 0.9
448
-
449
- return score, issues
450
-
451
- def _generate_suggestions(self, issues: List[ContentIssue], contribution: UserContribution) -> List[str]:
452
- """Generate improvement suggestions based on detected issues"""
453
- suggestions = []
454
-
455
- if ContentIssue.INSUFFICIENT_CONTENT in issues:
456
- if contribution.activity_type == ActivityType.MEME:
457
- suggestions.append("Add more descriptive text to your meme captions")
458
- elif contribution.activity_type == ActivityType.RECIPE:
459
- suggestions.append("Provide more detailed cooking instructions and ingredients")
460
- elif contribution.activity_type == ActivityType.FOLKLORE:
461
- suggestions.append("Expand your story with more details and context")
462
- elif contribution.activity_type == ActivityType.LANDMARK:
463
- suggestions.append("Add more descriptive details about the landmark")
464
-
465
- if ContentIssue.LOW_QUALITY in issues:
466
- suggestions.append("Improve the quality and clarity of your content")
467
- suggestions.append("Check spelling and grammar")
468
-
469
- if ContentIssue.INAPPROPRIATE_LANGUAGE in issues:
470
- suggestions.append("Please use respectful and appropriate language")
471
-
472
- if ContentIssue.SPAM_CONTENT in issues:
473
- suggestions.append("Remove promotional content and focus on cultural sharing")
474
-
475
- if ContentIssue.CULTURAL_INSENSITIVITY in issues:
476
- suggestions.append("Please ensure your content is culturally sensitive and respectful")
477
-
478
- if ContentIssue.PRIVACY_VIOLATION in issues:
479
- suggestions.append("Remove personal information like phone numbers, addresses, or ID numbers")
480
-
481
- # General suggestions
482
- suggestions.append("Add more cultural context and significance")
483
- suggestions.append("Share personal stories or family connections")
484
-
485
- return suggestions
486
-
487
- def _determine_action(self, quality_score: float, issues: List[ContentIssue]) -> Tuple[ModerationAction, float, str]:
488
- """Determine the appropriate moderation action"""
489
-
490
- # Critical issues that require rejection
491
- critical_issues = [
492
- ContentIssue.INAPPROPRIATE_LANGUAGE,
493
- ContentIssue.PRIVACY_VIOLATION
494
- ]
495
-
496
- if any(issue in issues for issue in critical_issues):
497
- return (
498
- ModerationAction.REJECT,
499
- 0.9,
500
- "Content contains inappropriate language or privacy violations"
501
- )
502
-
503
- # High quality content - auto approve
504
- if quality_score >= self.quality_thresholds['auto_approve_score'] and len(issues) == 0:
505
- return (
506
- ModerationAction.APPROVE,
507
- 0.95,
508
- "High quality content approved automatically"
509
- )
510
-
511
- # Medium quality - approve with minor issues
512
- if quality_score >= self.quality_thresholds['review_score'] and len(issues) <= 2:
513
- return (
514
- ModerationAction.APPROVE,
515
- 0.8,
516
- "Content approved with minor quality issues"
517
- )
518
-
519
- # Low quality but not critical - request edit
520
- if quality_score >= self.quality_thresholds['minimum_score']:
521
- return (
522
- ModerationAction.REQUEST_EDIT,
523
- 0.7,
524
- "Content needs improvement before approval"
525
- )
526
-
527
- # Very low quality - flag for review
528
- return (
529
- ModerationAction.FLAG_REVIEW,
530
- 0.6,
531
- "Content requires manual review due to quality concerns"
532
- )
533
-
534
- def check_duplicate_content(self, contribution: UserContribution,
535
- existing_contributions: List[UserContribution]) -> Tuple[bool, float]:
536
- """Check for duplicate or very similar content"""
537
- if not existing_contributions:
538
- return False, 0.0
539
-
540
- current_text = self._extract_text_content(contribution)
541
- if len(current_text.strip()) < 20:
542
- return False, 0.0
543
-
544
- # Simple similarity check based on common words
545
- current_words = set(current_text.lower().split())
546
-
547
- max_similarity = 0.0
548
-
549
- for existing in existing_contributions:
550
- if existing.activity_type != contribution.activity_type:
551
- continue
552
-
553
- existing_text = self._extract_text_content(existing)
554
- existing_words = set(existing_text.lower().split())
555
-
556
- if len(existing_words) == 0:
557
- continue
558
-
559
- # Calculate Jaccard similarity
560
- intersection = len(current_words.intersection(existing_words))
561
- union = len(current_words.union(existing_words))
562
-
563
- if union > 0:
564
- similarity = intersection / union
565
- max_similarity = max(max_similarity, similarity)
566
-
567
- is_duplicate = max_similarity >= self.similarity_threshold
568
- return is_duplicate, max_similarity
569
-
570
- def get_moderation_statistics(self, contributions: List[UserContribution]) -> Dict[str, Any]:
571
- """Get moderation statistics for a set of contributions"""
572
- if not contributions:
573
- return {}
574
-
575
- stats = {
576
- 'total_contributions': len(contributions),
577
- 'by_status': {},
578
- 'by_activity': {},
579
- 'quality_distribution': {'high': 0, 'medium': 0, 'low': 0},
580
- 'common_issues': {},
581
- 'average_quality_score': 0.0
582
- }
583
-
584
- total_quality = 0.0
585
-
586
- for contrib in contributions:
587
- # Count by status
588
- status = contrib.validation_status.value
589
- stats['by_status'][status] = stats['by_status'].get(status, 0) + 1
590
-
591
- # Count by activity
592
- activity = contrib.activity_type.value
593
- stats['by_activity'][activity] = stats['by_activity'].get(activity, 0) + 1
594
-
595
- # Moderate to get quality score
596
- try:
597
- result = self.moderate_contribution(contrib)
598
- total_quality += result.quality_score
599
-
600
- # Quality distribution
601
- if result.quality_score >= 0.8:
602
- stats['quality_distribution']['high'] += 1
603
- elif result.quality_score >= 0.5:
604
- stats['quality_distribution']['medium'] += 1
605
- else:
606
- stats['quality_distribution']['low'] += 1
607
-
608
- # Common issues
609
- for issue in result.issues:
610
- issue_name = issue.value
611
- stats['common_issues'][issue_name] = stats['common_issues'].get(issue_name, 0) + 1
612
-
613
- except Exception as e:
614
- self.logger.warning(f"Error moderating contribution {contrib.id}: {e}")
615
-
616
- stats['average_quality_score'] = total_quality / len(contributions) if contributions else 0.0
617
-
618
- return stats
 
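For reference, a minimal sketch of how the moderation pipeline above could be driven end to end. This assumes the methods live on a service class in services/validation_service.py (the ValidationService name, the import paths, and the UserContribution constructor fields are assumptions; only moderate_contribution, check_duplicate_content, and the result's quality_score/issues attributes are confirmed by the code above):

from corpus_collection_engine.models.data_models import ActivityType, UserContribution
from corpus_collection_engine.services.validation_service import ValidationService

service = ValidationService()

# Field names assumed from the attribute accesses in the code above
contribution = UserContribution(
    activity_type=ActivityType.RECIPE,
    content_data={
        'title': 'Gongura Pachadi',
        'instructions': 'Wash and saute the gongura leaves, then grind with roasted spices and salt.',
        'ingredients': [{'name': 'gongura leaves'}, {'name': 'red chillies'}],
    },
    cultural_context={'cultural_significance': 'A staple Andhra chutney shared at family meals.'},
)

result = service.moderate_contribution(contribution)
print(result.quality_score, result.issues)  # e.g. 0.87, []

# Duplicate detection uses Jaccard similarity over word sets:
# e.g. {'idli','dosa','vada'} vs {'idli','dosa','poha'} -> 2/4 = 0.5
is_dup, similarity = service.check_duplicate_content(contribution, existing_contributions=[])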
 
intern_project/corpus_collection_engine/utils/__init__.py DELETED
@@ -1 +0,0 @@
1
- # Utils package for Corpus Collection Engine
 
 
intern_project/corpus_collection_engine/utils/error_handler.py DELETED
@@ -1,557 +0,0 @@
1
- """
2
- Comprehensive error handling system for the Corpus Collection Engine
3
- """
4
-
5
- import streamlit as st
6
- import logging
7
- import traceback
8
- import sys
9
- from typing import Dict, Any, Optional, Callable, List
10
- from datetime import datetime
11
- from enum import Enum
12
- import json
13
- from functools import wraps
14
-
15
- from corpus_collection_engine.config import PWA_CONFIG
16
-
17
-
18
- class ErrorSeverity(Enum):
19
- """Error severity levels"""
20
- LOW = "low"
21
- MEDIUM = "medium"
22
- HIGH = "high"
23
- CRITICAL = "critical"
24
-
25
-
26
- class ErrorCategory(Enum):
27
- """Error categories"""
28
- NETWORK = "network"
29
- AI_SERVICE = "ai_service"
30
- STORAGE = "storage"
31
- VALIDATION = "validation"
32
- USER_INPUT = "user_input"
33
- SYSTEM = "system"
34
- PERFORMANCE = "performance"
35
-
36
-
37
- class ErrorHandler:
38
- """Comprehensive error handling and recovery system"""
39
-
40
- def __init__(self):
41
- self.logger = logging.getLogger(__name__)
42
- self.config = PWA_CONFIG
43
-
44
- # Initialize error tracking
45
- if 'error_history' not in st.session_state:
46
- st.session_state.error_history = []
47
-
48
- if 'error_stats' not in st.session_state:
49
- st.session_state.error_stats = {
50
- 'total_errors': 0,
51
- 'errors_by_category': {},
52
- 'errors_by_severity': {},
53
- 'last_error_time': None
54
- }
55
-
56
- # Error recovery strategies
57
- self.recovery_strategies = {
58
- ErrorCategory.NETWORK: self._handle_network_error,
59
- ErrorCategory.AI_SERVICE: self._handle_ai_service_error,
60
- ErrorCategory.STORAGE: self._handle_storage_error,
61
- ErrorCategory.VALIDATION: self._handle_validation_error,
62
- ErrorCategory.USER_INPUT: self._handle_user_input_error,
63
- ErrorCategory.SYSTEM: self._handle_system_error,
64
- ErrorCategory.PERFORMANCE: self._handle_performance_error
65
- }
66
-
67
- # User-friendly error messages
68
- self.error_messages = {
69
- ErrorCategory.NETWORK: {
70
- ErrorSeverity.LOW: "Connection seems slow. Some features may be limited.",
71
- ErrorSeverity.MEDIUM: "Network connection issues detected. Working in offline mode.",
72
- ErrorSeverity.HIGH: "Unable to connect to services. Please check your internet connection.",
73
- ErrorSeverity.CRITICAL: "No network connection. App is running in offline-only mode."
74
- },
75
- ErrorCategory.AI_SERVICE: {
76
- ErrorSeverity.LOW: "AI service is running slower than usual.",
77
- ErrorSeverity.MEDIUM: "AI service temporarily unavailable. Using fallback options.",
78
- ErrorSeverity.HIGH: "AI features are currently disabled due to service issues.",
79
- ErrorSeverity.CRITICAL: "All AI services are unavailable. Manual input required."
80
- },
81
- ErrorCategory.STORAGE: {
82
- ErrorSeverity.LOW: "Data saving is slightly delayed.",
83
- ErrorSeverity.MEDIUM: "Some data couldn't be saved. Will retry automatically.",
84
- ErrorSeverity.HIGH: "Storage issues detected. Data saved locally only.",
85
- ErrorSeverity.CRITICAL: "Unable to save data. Please try again later."
86
- },
87
- ErrorCategory.VALIDATION: {
88
- ErrorSeverity.LOW: "Minor validation issues detected.",
89
- ErrorSeverity.MEDIUM: "Some content needs review before submission.",
90
- ErrorSeverity.HIGH: "Content validation failed. Please check your input.",
91
- ErrorSeverity.CRITICAL: "Content cannot be processed due to validation errors."
92
- },
93
- ErrorCategory.USER_INPUT: {
94
- ErrorSeverity.LOW: "Please check your input.",
95
- ErrorSeverity.MEDIUM: "Some required fields are missing or invalid.",
96
- ErrorSeverity.HIGH: "Input format is not supported.",
97
- ErrorSeverity.CRITICAL: "Unable to process the provided input."
98
- },
99
- ErrorCategory.SYSTEM: {
100
- ErrorSeverity.LOW: "Minor system issue detected.",
101
- ErrorSeverity.MEDIUM: "System performance may be affected.",
102
- ErrorSeverity.HIGH: "System error occurred. Some features may be unavailable.",
103
- ErrorSeverity.CRITICAL: "Critical system error. Please refresh the page."
104
- },
105
- ErrorCategory.PERFORMANCE: {
106
- ErrorSeverity.LOW: "Performance is slightly degraded.",
107
- ErrorSeverity.MEDIUM: "Performance optimizations applied.",
108
- ErrorSeverity.HIGH: "Significant performance issues detected.",
109
- ErrorSeverity.CRITICAL: "System is running very slowly. Consider refreshing."
110
- }
111
- }
112
-
113
- def handle_error(
114
- self,
115
- error: Exception,
116
- category: ErrorCategory,
117
- severity: ErrorSeverity = ErrorSeverity.MEDIUM,
118
- context: Optional[Dict[str, Any]] = None,
119
- show_user_message: bool = True,
120
- recovery_action: Optional[Callable] = None
121
- ) -> bool:
122
- """
123
- Handle an error with appropriate logging, user notification, and recovery
124
-
125
- Returns:
126
- bool: True if error was handled successfully, False otherwise
127
- """
128
- try:
129
- # Log the error
130
- self._log_error(error, category, severity, context)
131
-
132
- # Record error statistics
133
- self._record_error_stats(category, severity)
134
-
135
- # Show user-friendly message
136
- if show_user_message:
137
- self._show_user_error_message(category, severity, context)
138
-
139
- # Attempt recovery
140
- recovery_success = self._attempt_recovery(error, category, severity, recovery_action)
141
-
142
- # Store error in history
143
- self._store_error_history(error, category, severity, context, recovery_success)
144
-
145
- return recovery_success
146
-
147
- except Exception as handler_error:
148
- # Error in error handler - log but don't recurse
149
- self.logger.critical(f"Error in error handler: {handler_error}")
150
- return False
151
-
152
- def _log_error(
153
- self,
154
- error: Exception,
155
- category: ErrorCategory,
156
- severity: ErrorSeverity,
157
- context: Optional[Dict[str, Any]] = None
158
- ):
159
- """Log error with appropriate level"""
160
- error_info = {
161
- 'error_type': type(error).__name__,
162
- 'error_message': str(error),
163
- 'category': category.value,
164
- 'severity': severity.value,
165
- 'context': context or {},
166
- 'traceback': traceback.format_exc(),
167
- 'timestamp': datetime.now().isoformat()
168
- }
169
-
170
- log_message = f"[{category.value.upper()}] {severity.value.upper()}: {error}"
171
-
172
- if severity == ErrorSeverity.CRITICAL:
173
- self.logger.critical(log_message, extra=error_info)
174
- elif severity == ErrorSeverity.HIGH:
175
- self.logger.error(log_message, extra=error_info)
176
- elif severity == ErrorSeverity.MEDIUM:
177
- self.logger.warning(log_message, extra=error_info)
178
- else:
179
- self.logger.info(log_message, extra=error_info)
180
-
181
- def _record_error_stats(self, category: ErrorCategory, severity: ErrorSeverity):
182
- """Record error statistics"""
183
- # Ensure error_stats is initialized
184
- if 'error_stats' not in st.session_state:
185
- st.session_state.error_stats = {
186
- 'total_errors': 0,
187
- 'errors_by_category': {},
188
- 'errors_by_severity': {},
189
- 'last_error_time': None
190
- }
191
-
192
- stats = st.session_state.error_stats
193
-
194
- stats['total_errors'] += 1
195
- stats['last_error_time'] = datetime.now()
196
-
197
- # Category stats
198
- if category.value not in stats['errors_by_category']:
199
- stats['errors_by_category'][category.value] = 0
200
- stats['errors_by_category'][category.value] += 1
201
-
202
- # Severity stats
203
- if severity.value not in stats['errors_by_severity']:
204
- stats['errors_by_severity'][severity.value] = 0
205
- stats['errors_by_severity'][severity.value] += 1
206
-
207
- def _show_user_error_message(
208
- self,
209
- category: ErrorCategory,
210
- severity: ErrorSeverity,
211
- context: Optional[Dict[str, Any]] = None
212
- ):
213
- """Show user-friendly error message"""
214
- message = self.error_messages.get(category, {}).get(
215
- severity,
216
- "An unexpected error occurred. Please try again."
217
- )
218
-
219
- # Add context-specific information
220
- if context and 'user_message' in context:
221
- message = context['user_message']
222
-
223
- # Show message based on severity
224
- if severity == ErrorSeverity.CRITICAL:
225
- st.error(f"🚨 {message}")
226
- elif severity == ErrorSeverity.HIGH:
227
- st.error(f"❌ {message}")
228
- elif severity == ErrorSeverity.MEDIUM:
229
- st.warning(f"⚠️ {message}")
230
- else:
231
- st.info(f"ℹ️ {message}")
232
-
233
- # Show recovery suggestions
234
- self._show_recovery_suggestions(category, severity)
235
-
236
- def _show_recovery_suggestions(self, category: ErrorCategory, severity: ErrorSeverity):
237
- """Show recovery suggestions to user"""
238
- suggestions = self._get_recovery_suggestions(category, severity)
239
-
240
- if suggestions and severity in [ErrorSeverity.HIGH, ErrorSeverity.CRITICAL]:
241
- with st.expander("💡 What can you do?"):
242
- for suggestion in suggestions:
243
- st.markdown(f"• {suggestion}")
244
-
245
- def _get_recovery_suggestions(self, category: ErrorCategory, severity: ErrorSeverity) -> List[str]:
246
- """Get recovery suggestions for error category and severity"""
247
- suggestions = {
248
- ErrorCategory.NETWORK: [
249
- "Check your internet connection",
250
- "Try refreshing the page",
251
- "Switch to offline mode if available",
252
- "Use mobile data if on WiFi (or vice versa)"
253
- ],
254
- ErrorCategory.AI_SERVICE: [
255
- "Try again in a few moments",
256
- "Use manual input instead of AI suggestions",
257
- "Check if the service is temporarily down",
258
- "Try a simpler request"
259
- ],
260
- ErrorCategory.STORAGE: [
261
- "Check available storage space",
262
- "Try saving again",
263
- "Clear browser cache",
264
- "Export your data as backup"
265
- ],
266
- ErrorCategory.VALIDATION: [
267
- "Review your input for errors",
268
- "Check required fields",
269
- "Ensure content meets guidelines",
270
- "Try a different format"
271
- ],
272
- ErrorCategory.USER_INPUT: [
273
- "Check for typos or formatting issues",
274
- "Ensure all required fields are filled",
275
- "Try uploading a different file",
276
- "Reduce file size if too large"
277
- ],
278
- ErrorCategory.SYSTEM: [
279
- "Refresh the page",
280
- "Clear browser cache",
281
- "Try a different browser",
282
- "Contact support if issue persists"
283
- ],
284
- ErrorCategory.PERFORMANCE: [
285
- "Close other browser tabs",
286
- "Check your internet speed",
287
- "Try using a faster connection",
288
- "Reduce image quality settings"
289
- ]
290
- }
291
-
292
- return suggestions.get(category, ["Try refreshing the page", "Contact support if issue persists"])
293
-
294
- def _attempt_recovery(
295
- self,
296
- error: Exception,
297
- category: ErrorCategory,
298
- severity: ErrorSeverity,
299
- custom_recovery: Optional[Callable] = None
300
- ) -> bool:
301
- """Attempt to recover from error"""
302
- try:
303
- # Try custom recovery first
304
- if custom_recovery:
305
- return custom_recovery(error, category, severity)
306
-
307
- # Use category-specific recovery
308
- recovery_func = self.recovery_strategies.get(category)
309
- if recovery_func:
310
- return recovery_func(error, severity)
311
-
312
- return False
313
-
314
- except Exception as recovery_error:
315
- self.logger.error(f"Recovery attempt failed: {recovery_error}")
316
- return False
317
-
318
- def _handle_network_error(self, error: Exception, severity: ErrorSeverity) -> bool:
319
- """Handle network-related errors"""
320
- if severity in [ErrorSeverity.HIGH, ErrorSeverity.CRITICAL]:
321
- # Enable offline mode
322
- st.session_state.offline_mode = True
323
- st.session_state.network_error_count = st.session_state.get('network_error_count', 0) + 1
324
-
325
- # Show offline indicator
326
- st.sidebar.error("🔌 Offline Mode Active")
327
-
328
- return True
329
-
330
- return False
331
-
332
- def _handle_ai_service_error(self, error: Exception, severity: ErrorSeverity) -> bool:
333
- """Handle AI service errors"""
334
- if severity >= ErrorSeverity.MEDIUM:
335
- # Disable AI features temporarily
336
- st.session_state.ai_service_disabled = True
337
- st.session_state.ai_fallback_mode = True
338
-
339
- # Set retry timer
340
- st.session_state.ai_retry_time = datetime.now().timestamp() + 300 # 5 minutes
341
-
342
- return True
343
-
344
- return False
345
-
346
- def _handle_storage_error(self, error: Exception, severity: ErrorSeverity) -> bool:
347
- """Handle storage-related errors"""
348
- if severity >= ErrorSeverity.MEDIUM:
349
- # Enable local-only storage
350
- st.session_state.local_storage_only = True
351
-
352
- # Queue for later sync
353
- if 'storage_queue' not in st.session_state:
354
- st.session_state.storage_queue = []
355
-
356
- return True
357
-
358
- return False
359
-
360
- def _handle_validation_error(self, error: Exception, severity: ErrorSeverity) -> bool:
361
- """Handle validation errors"""
362
- # Validation errors usually require user action
363
- return False
364
-
365
- def _handle_user_input_error(self, error: Exception, severity: ErrorSeverity) -> bool:
366
- """Handle user input errors"""
367
- # User input errors require user correction
368
- return False
369
-
370
- def _handle_system_error(self, error: Exception, severity: ErrorSeverity) -> bool:
371
- """Handle system errors"""
372
- if severity == ErrorSeverity.CRITICAL:
373
- # Suggest page refresh
374
- st.error("Critical system error. Please refresh the page.")
375
- if st.button("🔄 Refresh Page"):
376
- st.rerun()
377
- return True
378
-
379
- return False
380
-
381
- def _handle_performance_error(self, error: Exception, severity: ErrorSeverity) -> bool:
382
- """Handle performance-related errors"""
383
- if severity >= ErrorSeverity.MEDIUM:
384
- # Enable aggressive optimizations
385
- st.session_state.performance_mode = 'aggressive'
386
- st.session_state.connection_speed = 'slow_2g' # Force slow mode optimizations
387
-
388
- return True
389
-
390
- return False
391
-
392
- def _store_error_history(
393
- self,
394
- error: Exception,
395
- category: ErrorCategory,
396
- severity: ErrorSeverity,
397
- context: Optional[Dict[str, Any]],
398
- recovery_success: bool
399
- ):
400
- """Store error in history for analysis"""
401
- error_entry = {
402
- 'timestamp': datetime.now(),
403
- 'error_type': type(error).__name__,
404
- 'error_message': str(error),
405
- 'category': category.value,
406
- 'severity': severity.value,
407
- 'context': context or {},
408
- 'recovery_success': recovery_success,
409
- 'traceback': traceback.format_exc()
410
- }
411
-
412
- st.session_state.error_history.append(error_entry)
413
-
414
- # Keep only last 50 errors
415
- if len(st.session_state.error_history) > 50:
416
- st.session_state.error_history = st.session_state.error_history[-50:]
417
-
418
- def get_error_stats(self) -> Dict[str, Any]:
419
- """Get error statistics"""
420
- # Ensure error_stats is initialized
421
- if 'error_stats' not in st.session_state:
422
- st.session_state.error_stats = {
423
- 'total_errors': 0,
424
- 'errors_by_category': {},
425
- 'errors_by_severity': {},
426
- 'last_error_time': None
427
- }
428
- return st.session_state.error_stats.copy()
429
-
430
- def get_error_history(self) -> List[Dict[str, Any]]:
431
- """Get error history"""
432
- return st.session_state.error_history.copy()
433
-
434
- def clear_error_history(self):
435
- """Clear error history and reset stats"""
436
- st.session_state.error_history = []
437
- st.session_state.error_stats = {
438
- 'total_errors': 0,
439
- 'errors_by_category': {},
440
- 'errors_by_severity': {},
441
- 'last_error_time': None
442
- }
443
-
444
- def render_error_dashboard(self):
445
- """Render error monitoring dashboard"""
446
- st.subheader("🚨 Error Monitoring")
447
-
448
- stats = self.get_error_stats()
449
- history = self.get_error_history()
450
-
451
- # Error statistics
452
- col1, col2, col3, col4 = st.columns(4)
453
-
454
- with col1:
455
- st.metric("Total Errors", stats['total_errors'])
456
-
457
- with col2:
458
- last_error = stats.get('last_error_time')
459
- if last_error:
460
- time_since = datetime.now() - last_error
461
- st.metric("Last Error", f"{time_since.seconds // 60}m ago")
462
- else:
463
- st.metric("Last Error", "None")
464
-
465
- with col3:
466
- most_common_category = max(
467
- stats['errors_by_category'].items(),
468
- key=lambda x: x[1],
469
- default=("None", 0)
470
- )
471
- st.metric("Most Common", most_common_category[0].title())
472
-
473
- with col4:
474
- critical_errors = stats['errors_by_severity'].get('critical', 0)
475
- st.metric("Critical Errors", critical_errors)
476
-
477
- # Error breakdown
478
- if stats['total_errors'] > 0:
479
- col1, col2 = st.columns(2)
480
-
481
- with col1:
482
- st.write("**Errors by Category:**")
483
- for category, count in stats['errors_by_category'].items():
484
- percentage = (count / stats['total_errors']) * 100
485
- st.write(f"• {category.title()}: {count} ({percentage:.1f}%)")
486
-
487
- with col2:
488
- st.write("**Errors by Severity:**")
489
- for severity, count in stats['errors_by_severity'].items():
490
- percentage = (count / stats['total_errors']) * 100
491
- st.write(f"• {severity.title()}: {count} ({percentage:.1f}%)")
492
-
493
- # Recent errors
494
- if history:
495
- st.write("**Recent Errors:**")
496
- for error in history[-5:]: # Show last 5 errors
497
- with st.expander(f"{error['timestamp'].strftime('%H:%M:%S')} - {error['error_type']}"):
498
- st.write(f"**Category:** {error['category']}")
499
- st.write(f"**Severity:** {error['severity']}")
500
- st.write(f"**Message:** {error['error_message']}")
501
- st.write(f"**Recovery:** {'✅ Success' if error['recovery_success'] else '❌ Failed'}")
502
-
503
- # Clear errors button
504
- if st.button("🗑️ Clear Error History"):
505
- self.clear_error_history()
506
- st.success("Error history cleared!")
507
- st.rerun()
508
-
509
-
510
- def error_handler_decorator(
511
- category: ErrorCategory,
512
- severity: ErrorSeverity = ErrorSeverity.MEDIUM,
513
- show_user_message: bool = True,
514
- recovery_action: Optional[Callable] = None
515
- ):
516
- """Decorator for automatic error handling"""
517
- def decorator(func):
518
- @wraps(func)
519
- def wrapper(*args, **kwargs):
520
- try:
521
- return func(*args, **kwargs)
522
- except Exception as e:
523
- handler = ErrorHandler()
524
- handler.handle_error(
525
- e,
526
- category,
527
- severity,
528
- context={'function': func.__name__},
529
- show_user_message=show_user_message,
530
- recovery_action=recovery_action
531
- )
532
- # Re-raise if critical
533
- if severity == ErrorSeverity.CRITICAL:
534
- raise
535
- return None
536
- return wrapper
537
- return decorator
538
-
539
-
540
- def safe_execute(
541
- func: Callable,
542
- category: ErrorCategory,
543
- severity: ErrorSeverity = ErrorSeverity.MEDIUM,
544
- default_return: Any = None,
545
- context: Optional[Dict[str, Any]] = None
546
- ) -> Any:
547
- """Safely execute a function with error handling"""
548
- try:
549
- return func()
550
- except Exception as e:
551
- handler = ErrorHandler()
552
- handler.handle_error(e, category, severity, context)
553
- return default_return
554
-
555
-
556
- # Global error handler instance
557
- global_error_handler = ErrorHandler()
 
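A minimal usage sketch for the helpers defined above (fetch_remote_corpus is a hypothetical function used only for illustration; this must run inside a Streamlit session, since ErrorHandler keeps its stats in st.session_state):

from corpus_collection_engine.utils.error_handler import (
    ErrorCategory,
    ErrorSeverity,
    error_handler_decorator,
    safe_execute,
)

@error_handler_decorator(ErrorCategory.NETWORK, ErrorSeverity.MEDIUM)
def fetch_remote_corpus():
    # Any exception raised here is logged, counted in the session stats,
    # surfaced to the user, and swallowed (None is returned) unless the
    # severity is CRITICAL, in which case it is re-raised.
    raise ConnectionError("simulated outage")

result = fetch_remote_corpus()  # returns None after the error is handled

# Same behaviour as a one-off call with a fallback value:
ratio = safe_execute(lambda: 1 / 0, ErrorCategory.SYSTEM, default_return=0)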
 
intern_project/corpus_collection_engine/utils/performance_dashboard.py DELETED
@@ -1,468 +0,0 @@
1
- """
2
- Performance monitoring dashboard for the Corpus Collection Engine
3
- """
4
-
5
- import streamlit as st
6
- import plotly.graph_objects as go
7
- import plotly.express as px
8
- import pandas as pd
9
- from datetime import datetime, timedelta
10
- from typing import Dict, List, Any
11
- import logging
12
-
13
- from corpus_collection_engine.utils.performance_optimizer import PerformanceOptimizer
14
-
15
-
16
- class PerformanceDashboard:
17
- """Dashboard for monitoring application performance"""
18
-
19
- def __init__(self):
20
- self.logger = logging.getLogger(__name__)
21
- self.optimizer = PerformanceOptimizer()
22
-
23
- # Initialize performance tracking
24
- if 'performance_history' not in st.session_state:
25
- st.session_state.performance_history = []
26
-
27
- if 'connection_history' not in st.session_state:
28
- st.session_state.connection_history = []
29
-
30
- def render_dashboard(self):
31
- """Render the complete performance dashboard"""
32
- st.header("📊 Performance Dashboard")
33
-
34
- # Current performance overview
35
- self._render_current_performance()
36
-
37
- # Performance metrics over time
38
- self._render_performance_trends()
39
-
40
- # Connection quality analysis
41
- self._render_connection_analysis()
42
-
43
- # Optimization recommendations
44
- self._render_optimization_recommendations()
45
-
46
- # Performance settings
47
- self._render_performance_settings()
48
-
49
- def _render_current_performance(self):
50
- """Render current performance metrics"""
51
- st.subheader("🚀 Current Performance")
52
-
53
- # Get current stats
54
- stats = self.optimizer.get_optimization_stats()
55
-
56
- col1, col2, col3, col4 = st.columns(4)
57
-
58
- with col1:
59
- connection_speed = stats.get('connection_speed', 'unknown')
60
- connection_emoji = {
61
- 'slow_2g': '🐌',
62
- '2g': '🚶',
63
- '3g': '🚗',
64
- '4g': '🚀',
65
- 'unknown': '❓'
66
- }.get(connection_speed, '❓')
67
-
68
- st.metric(
69
- "Connection Speed",
70
- f"{connection_emoji} {connection_speed.upper()}",
71
- help="Detected connection speed"
72
- )
73
-
74
- with col2:
75
- optimization_level = stats.get('optimization_level', 'default')
76
- st.metric(
77
- "Optimization Level",
78
- optimization_level.title(),
79
- help="Current optimization level applied"
80
- )
81
-
82
- with col3:
83
- performance_metrics = st.session_state.get('performance_metrics', {})
84
- avg_load_time = sum(performance_metrics.values()) / len(performance_metrics) if performance_metrics else 0
85
-
86
- st.metric(
87
- "Avg Load Time",
88
- f"{avg_load_time:.2f}s",
89
- help="Average operation load time"
90
- )
91
-
92
- with col4:
93
- optimizations = stats.get('optimizations_applied', {})
94
- active_optimizations = sum(1 for opt in optimizations.values() if opt)
95
-
96
- st.metric(
97
- "Active Optimizations",
98
- f"{active_optimizations}/{len(optimizations)}",
99
- help="Number of active performance optimizations"
100
- )
101
-
102
- # Detailed optimization status
103
- with st.expander("🔧 Optimization Details"):
104
- optimizations = stats.get('optimizations_applied', {})
105
-
106
- col1, col2 = st.columns(2)
107
-
108
- with col1:
109
- st.write("**Active Optimizations:**")
110
- for opt_name, is_active in optimizations.items():
111
- status = "✅" if is_active else "❌"
112
- st.write(f"{status} {opt_name.replace('_', ' ').title()}")
113
-
114
- with col2:
115
- st.write("**Performance Metrics:**")
116
- performance_metrics = st.session_state.get('performance_metrics', {})
117
- for operation, duration in performance_metrics.items():
118
- st.write(f"⏱️ {operation}: {duration:.3f}s")
119
-
120
- def _render_performance_trends(self):
121
- """Render performance trends over time"""
122
- st.subheader("📈 Performance Trends")
123
-
124
- performance_history = st.session_state.get('performance_history', [])
125
-
126
- if not performance_history:
127
- st.info("No performance data available yet. Use the app to generate performance metrics.")
128
- return
129
-
130
- # Create DataFrame from history
131
- df = pd.DataFrame(performance_history)
132
-
133
- if len(df) > 0:
134
- # Performance over time chart
135
- fig = go.Figure()
136
-
137
- for metric in df.columns:
138
- if metric != 'timestamp':
139
- fig.add_trace(go.Scatter(
140
- x=df['timestamp'],
141
- y=df[metric],
142
- mode='lines+markers',
143
- name=metric.replace('_', ' ').title(),
144
- line=dict(width=2)
145
- ))
146
-
147
- fig.update_layout(
148
- title="Performance Metrics Over Time",
149
- xaxis_title="Time",
150
- yaxis_title="Duration (seconds)",
151
- hovermode='x unified',
152
- height=400
153
- )
154
-
155
- st.plotly_chart(fig, use_container_width=True)
156
-
157
- # Performance statistics
158
- col1, col2 = st.columns(2)
159
-
160
- with col1:
161
- st.write("**Performance Statistics:**")
162
- for metric in df.columns:
163
- if metric != 'timestamp':
164
- avg_val = df[metric].mean()
165
- max_val = df[metric].max()
166
- min_val = df[metric].min()
167
-
168
- st.write(f"**{metric.replace('_', ' ').title()}:**")
169
- st.write(f" - Average: {avg_val:.3f}s")
170
- st.write(f" - Max: {max_val:.3f}s")
171
- st.write(f" - Min: {min_val:.3f}s")
172
-
173
- with col2:
174
- # Performance distribution
175
- if len(df) > 1:
176
- metric_to_plot = st.selectbox(
177
- "Select metric for distribution:",
178
- [col for col in df.columns if col != 'timestamp']
179
- )
180
-
181
- if metric_to_plot:
182
- fig_hist = px.histogram(
183
- df,
184
- x=metric_to_plot,
185
- title=f"Distribution of {metric_to_plot.replace('_', ' ').title()}",
186
- nbins=20
187
- )
188
- fig_hist.update_layout(height=300)
189
- st.plotly_chart(fig_hist, use_container_width=True)
190
-
191
- def _render_connection_analysis(self):
192
- """Render connection quality analysis"""
193
- st.subheader("📡 Connection Analysis")
194
-
195
- connection_history = st.session_state.get('connection_history', [])
196
-
197
- if not connection_history:
198
- st.info("No connection data available yet.")
199
- return
200
-
201
- # Connection speed distribution
202
- connection_df = pd.DataFrame(connection_history)
203
-
204
- if len(connection_df) > 0:
205
- col1, col2 = st.columns(2)
206
-
207
- with col1:
208
- # Connection speed pie chart
209
- speed_counts = connection_df['speed'].value_counts()
210
-
211
- fig_pie = px.pie(
212
- values=speed_counts.values,
213
- names=speed_counts.index,
214
- title="Connection Speed Distribution"
215
- )
216
- st.plotly_chart(fig_pie, use_container_width=True)
217
-
218
- with col2:
219
- # Connection quality over time
220
- fig_line = px.line(
221
- connection_df,
222
- x='timestamp',
223
- y='quality_score',
224
- title="Connection Quality Over Time",
225
- markers=True
226
- )
227
- fig_line.update_layout(height=300)
228
- st.plotly_chart(fig_line, use_container_width=True)
229
-
230
- # Connection statistics
231
- st.write("**Connection Statistics:**")
232
- avg_quality = connection_df['quality_score'].mean()
233
- connection_stability = connection_df['speed'].nunique()
234
-
235
- col1, col2, col3 = st.columns(3)
236
-
237
- with col1:
238
- st.metric("Average Quality", f"{avg_quality:.1f}/10")
239
-
240
- with col2:
241
- st.metric("Connection Changes", connection_stability)
242
-
243
- with col3:
244
- current_speed = connection_df['speed'].iloc[-1] if len(connection_df) > 0 else 'unknown'
245
- st.metric("Current Speed", current_speed.upper())
246
-
247
- def _render_optimization_recommendations(self):
248
- """Render optimization recommendations"""
249
- st.subheader("💡 Optimization Recommendations")
250
-
251
- stats = self.optimizer.get_optimization_stats()
252
- connection_speed = stats.get('connection_speed', 'unknown')
253
- performance_metrics = st.session_state.get('performance_metrics', {})
254
-
255
- recommendations = []
256
-
257
- # Connection-based recommendations
258
- if connection_speed in ['slow_2g', '2g']:
259
- recommendations.extend([
260
- "🔧 **Enable Aggressive Image Compression**: Reduce image quality to 30-50% for faster loading",
261
- "📱 **Use Offline Mode**: Work offline when connection is very slow",
262
- "⚡ **Minimize Uploads**: Upload smaller files or compress before uploading",
263
- "🎯 **Focus on Text**: Prioritize text-based activities over image-heavy ones"
264
- ])
265
- elif connection_speed == '3g':
266
- recommendations.extend([
267
- "🖼️ **Moderate Image Optimization**: Balance quality and speed",
268
- "📊 **Lazy Load Content**: Load content progressively",
269
- "🔄 **Enable Sync**: Use sync features when connection improves"
270
- ])
271
-
272
- # Performance-based recommendations
273
- if performance_metrics:
274
- slow_operations = [op for op, duration in performance_metrics.items() if duration > 2.0]
275
- if slow_operations:
276
- recommendations.append(f"⚠️ **Optimize Slow Operations**: {', '.join(slow_operations)} are taking longer than expected")
277
-
278
- # General recommendations
279
- recommendations.extend([
280
- "💾 **Clear Cache**: Clear browser cache if experiencing issues",
281
- "🔄 **Restart App**: Refresh the page to reset optimizations",
282
- "📊 **Monitor Usage**: Check this dashboard regularly for performance insights"
283
- ])
284
-
285
- if recommendations:
286
- for rec in recommendations:
287
- st.markdown(rec)
288
- else:
289
- st.success("✅ Performance is optimal! No recommendations at this time.")
290
-
291
- def _render_performance_settings(self):
292
- """Render performance settings and controls"""
293
- st.subheader("⚙️ Performance Settings")
294
-
295
- col1, col2 = st.columns(2)
296
-
297
- with col1:
298
- st.write("**Manual Optimization Controls:**")
299
-
300
- # Force optimization level
301
- optimization_levels = ['auto', 'minimal', 'moderate', 'aggressive']
302
- current_level = st.session_state.get('manual_optimization_level', 'auto')
303
-
304
- new_level = st.selectbox(
305
- "Force Optimization Level:",
306
- optimization_levels,
307
- index=optimization_levels.index(current_level),
308
- help="Override automatic optimization detection"
309
- )
310
-
311
- if new_level != current_level:
312
- st.session_state.manual_optimization_level = new_level
313
- st.success(f"Optimization level set to: {new_level}")
314
-
315
- # Image quality override
316
- quality_levels = {
317
- 'Auto': None,
318
- 'High (85%)': 85,
319
- 'Medium (70%)': 70,
320
- 'Low (50%)': 50,
321
- 'Very Low (30%)': 30
322
- }
323
-
324
- quality_choice = st.selectbox(
325
- "Image Quality Override:",
326
- list(quality_levels.keys()),
327
- help="Override automatic image quality optimization"
328
- )
329
-
330
- if quality_levels[quality_choice] is not None:
331
- st.session_state.manual_image_quality = quality_levels[quality_choice]
332
-
333
- with col2:
334
- st.write("**Performance Actions:**")
335
-
336
- # Clear performance data
337
- if st.button("🗑️ Clear Performance Data"):
338
- st.session_state.performance_history = []
339
- st.session_state.connection_history = []
340
- st.session_state.performance_metrics = {}
341
- st.success("Performance data cleared!")
342
- st.rerun()
343
-
344
- # Export performance data
345
- if st.button("📊 Export Performance Data"):
346
- self._export_performance_data()
347
-
348
- # Reset optimizations
349
- if st.button("🔄 Reset Optimizations"):
350
- st.session_state.performance_initialized = False
351
- st.session_state.manual_optimization_level = 'auto'
352
- if 'manual_image_quality' in st.session_state:
353
- del st.session_state.manual_image_quality
354
- st.success("Optimizations reset!")
355
- st.rerun()
356
-
357
- # Performance test
358
- if st.button("🧪 Run Performance Test"):
359
- self._run_performance_test()
360
-
361
- def _export_performance_data(self):
362
- """Export performance data as JSON"""
363
- import json
364
-
365
- export_data = {
366
- 'performance_history': st.session_state.get('performance_history', []),
367
- 'connection_history': st.session_state.get('connection_history', []),
368
- 'performance_metrics': st.session_state.get('performance_metrics', {}),
369
- 'optimization_stats': self.optimizer.get_optimization_stats(),
370
- 'export_timestamp': datetime.now().isoformat()
371
- }
372
-
373
- json_str = json.dumps(export_data, indent=2, default=str)
374
-
375
- st.download_button(
376
- label="📥 Download Performance Data",
377
- data=json_str,
378
- file_name=f"performance_data_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json",
379
- mime="application/json"
380
- )
381
-
382
- def _run_performance_test(self):
383
- """Run a simple performance test"""
384
- import time
385
-
386
- with st.spinner("Running performance test..."):
387
- # Test various operations
388
- test_results = {}
389
-
390
- # Test image optimization
391
- start_time = time.time()
392
- from PIL import Image
393
- test_image = Image.new('RGB', (800, 600), color='red')
394
- self.optimizer.optimize_image(test_image, 'default')
395
- test_results['image_optimization'] = time.time() - start_time
396
-
397
- # Test JSON compression
398
- start_time = time.time()
399
- test_data = {'test': 'data' * 1000}
400
- self.optimizer.compress_json_data(test_data)
401
- test_results['json_compression'] = time.time() - start_time
402
-
403
- # Test lazy loading
404
- start_time = time.time()
405
- test_content = list(range(100))
406
- self.optimizer.lazy_load_content(test_content)
407
- test_results['lazy_loading'] = time.time() - start_time
408
-
409
- # Display results
410
- st.success("Performance test completed!")
411
-
412
- col1, col2, col3 = st.columns(3)
413
-
414
- with col1:
415
- st.metric("Image Optimization", f"{test_results['image_optimization']:.3f}s")
416
-
417
- with col2:
418
- st.metric("JSON Compression", f"{test_results['json_compression']:.3f}s")
419
-
420
- with col3:
421
- st.metric("Lazy Loading", f"{test_results['lazy_loading']:.3f}s")
422
-
423
- # Store test results
424
- if 'performance_metrics' not in st.session_state:
425
- st.session_state.performance_metrics = {}
426
-
427
- st.session_state.performance_metrics.update(test_results)
428
-
429
- def record_performance_metric(self, operation: str, duration: float):
430
- """Record a performance metric"""
431
- # Store in current metrics
432
- if 'performance_metrics' not in st.session_state:
433
- st.session_state.performance_metrics = {}
434
-
435
- st.session_state.performance_metrics[operation] = duration
436
-
437
- # Add to history
438
- if 'performance_history' not in st.session_state:
439
- st.session_state.performance_history = []
440
-
441
- # Create history entry
442
- history_entry = {
443
- 'timestamp': datetime.now(),
444
- operation: duration
445
- }
446
-
447
- st.session_state.performance_history.append(history_entry)
448
-
449
- # Keep only last 100 entries
450
- if len(st.session_state.performance_history) > 100:
451
- st.session_state.performance_history = st.session_state.performance_history[-100:]
452
-
453
- def record_connection_quality(self, speed: str, quality_score: float):
454
- """Record connection quality measurement"""
455
- if 'connection_history' not in st.session_state:
456
- st.session_state.connection_history = []
457
-
458
- connection_entry = {
459
- 'timestamp': datetime.now(),
460
- 'speed': speed,
461
- 'quality_score': quality_score
462
- }
463
-
464
- st.session_state.connection_history.append(connection_entry)
465
-
466
- # Keep only last 50 entries
467
- if len(st.session_state.connection_history) > 50:
468
- st.session_state.connection_history = st.session_state.connection_history[-50:]
 
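A short sketch of how the recording hooks above might be called from elsewhere in the app (expensive_operation is a hypothetical placeholder for the unit of work being timed; the dashboard must run inside a Streamlit session because its history lives in st.session_state):

import time
from corpus_collection_engine.utils.performance_dashboard import PerformanceDashboard

dashboard = PerformanceDashboard()

start = time.time()
expensive_operation()  # hypothetical: any operation worth timing
dashboard.record_performance_metric('expensive_operation', time.time() - start)

# Connection quality on the 0-10 scale rendered by _render_connection_analysis
dashboard.record_connection_quality('3g', 6.5)

dashboard.render_dashboard()  # draws the charts from the recorded history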
 
intern_project/corpus_collection_engine/utils/performance_optimizer.py DELETED
@@ -1,716 +0,0 @@
1
- """
2
- Performance optimization utilities for low-bandwidth environments
3
- """
4
-
5
- import streamlit as st
6
- from typing import Dict, Any, Tuple, List
7
- import base64
8
- import io
9
- from PIL import Image
10
- import logging
11
- import time
12
- import gzip
13
- import json
14
-
15
- from corpus_collection_engine.config import PWA_CONFIG
16
-
17
-
18
- class PerformanceOptimizer:
19
- """Utilities for optimizing performance in low-bandwidth environments"""
20
-
21
- def __init__(self):
22
- self.logger = logging.getLogger(__name__)
23
- self.config = PWA_CONFIG
24
-
25
- # Performance thresholds
26
- self.bandwidth_thresholds = {
27
- 'slow_2g': 0.05, # 50 Kbps
28
- '2g': 0.25, # 250 Kbps
29
- '3g': 1.5, # 1.5 Mbps
30
- '4g': 10.0 # 10 Mbps
31
- }
32
-
33
- # Optimization settings
34
- self.optimization_settings = {
35
- 'image_quality': {
36
- 'slow_2g': 30,
37
- '2g': 50,
38
- '3g': 70,
39
- '4g': 85,
40
- 'default': 85
41
- },
42
- 'image_max_size': {
43
- 'slow_2g': (400, 300),
44
- '2g': (600, 450),
45
- '3g': (800, 600),
46
- '4g': (1200, 900),
47
- 'default': (800, 600)
48
- },
49
- 'lazy_loading_threshold': {
50
- 'slow_2g': 1,
51
- '2g': 3,
52
- '3g': 5,
53
- '4g': 10,
54
- 'default': 5
55
- }
56
- }
57
-
58
- # Initialize performance state
59
- if 'performance_initialized' not in st.session_state:
60
- st.session_state.performance_initialized = False
61
- st.session_state.connection_speed = 'unknown'
62
- st.session_state.optimization_level = 'default'
63
-
64
- def initialize_performance_optimization(self):
65
- """Initialize performance optimization"""
66
- if st.session_state.performance_initialized:
67
- return
68
-
69
- try:
70
- # Inject performance monitoring and optimization scripts
71
- self._inject_performance_monitoring()
72
-
73
- # Apply initial optimizations
74
- self._apply_initial_optimizations()
75
-
76
- st.session_state.performance_initialized = True
77
- self.logger.info("Performance optimization initialized")
78
-
79
- except Exception as e:
80
- self.logger.error(f"Performance optimization initialization failed: {e}")
81
-
82
- def _inject_performance_monitoring(self):
83
- """Inject performance monitoring scripts"""
84
-
85
- monitoring_script = """
86
- <script>
87
- // Connection speed detection
88
- function detectConnectionSpeed() {
89
- if ('connection' in navigator) {
90
- const connection = navigator.connection || navigator.mozConnection || navigator.webkitConnection;
91
-
92
- const effectiveType = connection.effectiveType;
93
- const downlink = connection.downlink; // Mbps
94
-
95
- console.log('Connection detected:', effectiveType, downlink + ' Mbps');
96
-
97
- // Send to Streamlit
98
- window.parent.postMessage({
99
- type: 'CONNECTION_SPEED',
100
- effectiveType: effectiveType,
101
- downlink: downlink
102
- }, '*');
103
-
104
- // Apply optimizations based on connection
105
- applyConnectionOptimizations(effectiveType, downlink);
106
- } else {
107
- console.log('Network Information API not supported');
108
- // Fallback speed test
109
- performSpeedTest();
110
- }
111
- }
112
-
113
- // Simple speed test fallback
114
- function performSpeedTest() {
115
- const startTime = performance.now();
116
- const testImage = new Image();
117
-
118
- testImage.onload = function() {
119
-                 const endTime = performance.now();
-                 const duration = endTime - startTime;
-                 const imageSize = 50000; // Approximate size in bytes
-                 const speed = (imageSize * 8) / (duration / 1000) / 1000000; // Mbps
-
-                 let effectiveType = 'unknown';
-                 if (speed < 0.1) effectiveType = 'slow-2g';
-                 else if (speed < 0.5) effectiveType = '2g';
-                 else if (speed < 2) effectiveType = '3g';
-                 else effectiveType = '4g';
-
-                 console.log('Speed test result:', speed.toFixed(2) + ' Mbps', effectiveType);
-
-                 window.parent.postMessage({
-                     type: 'CONNECTION_SPEED',
-                     effectiveType: effectiveType,
-                     downlink: speed
-                 }, '*');
-
-                 applyConnectionOptimizations(effectiveType, speed);
-             };
-
-             testImage.onerror = function() {
-                 console.log('Speed test failed, assuming slow connection');
-                 applyConnectionOptimizations('2g', 0.25);
-             };
-
-             testImage.src = 'data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/2wBDAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/wAARCAABAAEDASIAAhEBAxEB/8QAFQABAQAAAAAAAAAAAAAAAAAAAAv/xAAUEAEAAAAAAAAAAAAAAAAAAAAA/8QAFQEBAQAAAAAAAAAAAAAAAAAAAAX/xAAUEQEAAAAAAAAAAAAAAAAAAAAA/9oADAMBAAIRAxEAPwA/8A';
-         }
-
-         // Apply optimizations based on connection speed
-         function applyConnectionOptimizations(effectiveType, downlink) {
-             const body = document.body;
-
-             // Add connection class to body
-             body.className = body.className.replace(/connection-\\w+/g, '');
-             body.classList.add('connection-' + effectiveType);
-
-             // Apply specific optimizations
-             if (effectiveType === 'slow-2g' || effectiveType === '2g') {
-                 // Aggressive optimizations for slow connections
-                 applySlowConnectionOptimizations();
-             } else if (effectiveType === '3g') {
-                 // Moderate optimizations
-                 applyModerateOptimizations();
-             } else {
-                 // Minimal optimizations for fast connections
-                 applyMinimalOptimizations();
-             }
-         }
-
-         function applySlowConnectionOptimizations() {
-             console.log('Applying slow connection optimizations');
-
-             // Reduce image quality
-             const images = document.querySelectorAll('img');
-             images.forEach(img => {
-                 if (!img.dataset.optimized) {
-                     img.style.filter = 'blur(0.5px)';
-                     img.loading = 'lazy';
-                     img.dataset.optimized = 'true';
-                 }
-             });
-
-             // Disable animations
-             const style = document.createElement('style');
-             style.textContent = `
-                 *, *::before, *::after {
-                     animation-duration: 0.01ms !important;
-                     animation-iteration-count: 1 !important;
-                     transition-duration: 0.01ms !important;
-                 }
-                 .stSpinner > div {
-                     display: none !important;
-                 }
-             `;
-             document.head.appendChild(style);
-
-             // Compress text rendering
-             document.body.style.textRendering = 'optimizeSpeed';
-             document.body.style.fontDisplay = 'swap';
-         }
-
-         function applyModerateOptimizations() {
-             console.log('Applying moderate optimizations');
-
-             // Lazy load images
-             const images = document.querySelectorAll('img');
-             images.forEach(img => {
-                 img.loading = 'lazy';
-             });
-
-             // Reduce animation duration
-             const style = document.createElement('style');
-             style.textContent = `
-                 * {
-                     animation-duration: 0.3s !important;
-                     transition-duration: 0.2s !important;
-                 }
-             `;
-             document.head.appendChild(style);
-         }
-
-         function applyMinimalOptimizations() {
-             console.log('Applying minimal optimizations');
-
-             // Just enable lazy loading
-             const images = document.querySelectorAll('img');
-             images.forEach(img => {
-                 img.loading = 'lazy';
-             });
-         }
-
-         // Monitor performance
-         function monitorPerformance() {
-             if ('performance' in window) {
-                 const navigation = performance.getEntriesByType('navigation')[0];
-                 if (navigation) {
-                     const loadTime = navigation.loadEventEnd - navigation.fetchStart;
-                     console.log('Page load time:', loadTime + 'ms');
-
-                     if (loadTime > 5000) {
-                         console.log('Slow page load detected, applying additional optimizations');
-                         applySlowConnectionOptimizations();
-                     }
-                 }
-             }
-         }
-
-         // Initialize monitoring
-         detectConnectionSpeed();
-
-         // Monitor performance after page load
-         window.addEventListener('load', monitorPerformance);
-
-         // Re-check connection periodically
-         setInterval(detectConnectionSpeed, 60000); // Every minute
-         </script>
-
-         <style>
-         /* Base optimizations for all connections */
-         img {
-             max-width: 100%;
-             height: auto;
-         }
-
-         /* Slow connection optimizations */
-         .connection-slow-2g img,
-         .connection-2g img {
-             max-height: 300px;
-             object-fit: cover;
-             filter: blur(0.5px);
-         }
-
-         .connection-slow-2g .stImage,
-         .connection-2g .stImage {
-             max-height: 300px;
-         }
-
-         /* Disable heavy animations on slow connections */
-         .connection-slow-2g *,
-         .connection-2g * {
-             animation-duration: 0.01ms !important;
-             transition-duration: 0.01ms !important;
-         }
-
-         /* Optimize text rendering */
-         .connection-slow-2g,
-         .connection-2g {
-             text-rendering: optimizeSpeed;
-             font-display: swap;
-         }
-
-         /* Progressive enhancement for faster connections */
-         .connection-4g .stImage img {
-             transition: transform 0.3s ease;
-         }
-
-         .connection-4g .stImage img:hover {
-             transform: scale(1.02);
-         }
-
-         /* Loading indicators for slow connections */
-         .connection-slow-2g .stSpinner,
-         .connection-2g .stSpinner {
-             display: none !important;
-         }
-
-         /* Bandwidth indicator */
-         .bandwidth-indicator {
-             position: fixed;
-             top: 10px;
-             right: 10px;
-             background: rgba(0, 0, 0, 0.7);
-             color: white;
-             padding: 4px 8px;
-             border-radius: 4px;
-             font-size: 12px;
-             z-index: 9999;
-             display: none;
-         }
-
-         .connection-slow-2g .bandwidth-indicator,
-         .connection-2g .bandwidth-indicator {
-             display: block;
-             background: #ff4444;
-         }
-
-         .connection-3g .bandwidth-indicator {
-             display: block;
-             background: #ff9800;
-         }
-         </style>
-
-         <div class="bandwidth-indicator" id="bandwidth-indicator">
-             📡 Optimizing for your connection...
-         </div>
-         """
-
-         st.components.v1.html(monitoring_script, height=0)
-
-     def _apply_initial_optimizations(self):
-         """Apply initial performance optimizations"""
-
-         # Streamlit-specific optimizations
-         optimization_css = """
-         <style>
-         /* Streamlit performance optimizations */
-         .stApp {
-             max-width: 1200px;
-             margin: 0 auto;
-         }
-
-         /* Optimize form rendering */
-         .stForm {
-             border: none;
-             padding: 0;
-         }
-
-         /* Optimize button rendering */
-         .stButton > button {
-             transition: background-color 0.1s ease;
-         }
-
-         /* Optimize text area rendering */
-         .stTextArea textarea {
-             resize: vertical;
-         }
-
-         /* Optimize file uploader */
-         .stFileUploader {
-             border: 2px dashed #ccc;
-             border-radius: 8px;
-             padding: 20px;
-             text-align: center;
-         }
-
-         /* Optimize metrics display */
-         .stMetric {
-             background: #f8f9fa;
-             padding: 12px;
-             border-radius: 6px;
-             border: 1px solid #e9ecef;
-         }
-
-         /* Optimize expander */
-         .streamlit-expanderHeader {
-             font-weight: 600;
-         }
-
-         /* Optimize columns */
-         .stColumn {
-             padding: 0 8px;
-         }
-
-         /* Optimize sidebar */
-         .stSidebar {
-             background: #f8f9fa;
-         }
-
-         /* Loading optimizations */
-         .stSpinner {
-             text-align: center;
-             padding: 20px;
-         }
-
-         /* Mobile optimizations */
-         @media (max-width: 768px) {
-             .stApp {
-                 padding: 1rem 0.5rem;
-             }
-
-             .stColumn {
-                 padding: 0 4px;
-             }
-
-             .stButton > button {
-                 width: 100%;
-                 margin: 4px 0;
-             }
-         }
-         </style>
-         """
-
-         st.components.v1.html(optimization_css, height=0)
-
-     def optimize_image(self, image: Image.Image, connection_speed: str = 'default') -> Image.Image:
-         """Optimize image based on connection speed"""
-         try:
-             # Get optimization settings for connection speed
-             quality = self.optimization_settings['image_quality'].get(connection_speed, 85)
-             max_size = self.optimization_settings['image_max_size'].get(connection_speed, (800, 600))
-
-             # Create a copy to avoid modifying original
-             optimized_image = image.copy()
-
-             # Resize if necessary
-             if optimized_image.size[0] > max_size[0] or optimized_image.size[1] > max_size[1]:
-                 optimized_image.thumbnail(max_size, Image.Resampling.LANCZOS)
-
-             # Convert to RGB if necessary (for JPEG compression)
-             if optimized_image.mode in ('RGBA', 'LA', 'P'):
-                 # Create white background
-                 background = Image.new('RGB', optimized_image.size, (255, 255, 255))
-                 if optimized_image.mode == 'P':
-                     optimized_image = optimized_image.convert('RGBA')
-                 background.paste(optimized_image, mask=optimized_image.split()[-1] if optimized_image.mode == 'RGBA' else None)
-                 optimized_image = background
-
-             return optimized_image
-
-         except Exception as e:
-             self.logger.error(f"Error optimizing image: {e}")
-             return image
-
-     def compress_image_to_base64(self, image: Image.Image, connection_speed: str = 'default') -> str:
-         """Compress image to base64 with connection-appropriate quality"""
-         try:
-             # Optimize image first
-             optimized_image = self.optimize_image(image, connection_speed)
-
-             # Get quality setting
-             quality = self.optimization_settings['image_quality'].get(connection_speed, 85)
-
-             # Compress to bytes
-             buffer = io.BytesIO()
-             optimized_image.save(buffer, format="JPEG", quality=quality, optimize=True)
-
-             # Convert to base64
-             img_bytes = buffer.getvalue()
-             img_base64 = base64.b64encode(img_bytes).decode()
-
-             # Log compression results
-             original_size = len(base64.b64encode(self._image_to_bytes(image)).decode())
-             compressed_size = len(img_base64)
-             compression_ratio = (1 - compressed_size / original_size) * 100 if original_size > 0 else 0
-
-             self.logger.info(f"Image compressed: {original_size} -> {compressed_size} bytes ({compression_ratio:.1f}% reduction)")
-
-             return img_base64
-
-         except Exception as e:
-             self.logger.error(f"Error compressing image: {e}")
-             # Fallback to basic conversion
-             return self._image_to_base64_basic(image)
-
-     def _image_to_bytes(self, image: Image.Image) -> bytes:
-         """Convert image to bytes"""
-         buffer = io.BytesIO()
-         image.save(buffer, format="PNG")
-         return buffer.getvalue()
-
-     def _image_to_base64_basic(self, image: Image.Image) -> str:
-         """Basic image to base64 conversion without optimization"""
-         buffer = io.BytesIO()
-         image.save(buffer, format="JPEG", quality=85)
-         return base64.b64encode(buffer.getvalue()).decode()
-
-     def compress_json_data(self, data: Dict[str, Any]) -> str:
-         """Compress JSON data for transmission"""
-         try:
-             # Convert to JSON string
-             json_str = json.dumps(data, separators=(',', ':'), ensure_ascii=False)
-
-             # Compress with gzip
-             compressed = gzip.compress(json_str.encode('utf-8'))
-
-             # Convert to base64 for transmission
-             compressed_b64 = base64.b64encode(compressed).decode()
-
-             # Log compression results
-             original_size = len(json_str.encode('utf-8'))
-             compressed_size = len(compressed)
-             compression_ratio = (1 - compressed_size / original_size) * 100 if original_size > 0 else 0
-
-             self.logger.info(f"JSON compressed: {original_size} -> {compressed_size} bytes ({compression_ratio:.1f}% reduction)")
-
-             return compressed_b64
-
-         except Exception as e:
-             self.logger.error(f"Error compressing JSON data: {e}")
-             return json.dumps(data)
-
-     def decompress_json_data(self, compressed_data: str) -> Dict[str, Any]:
-         """Decompress JSON data"""
-         try:
-             # Decode from base64
-             compressed_bytes = base64.b64decode(compressed_data)
-
-             # Decompress
-             decompressed_bytes = gzip.decompress(compressed_bytes)
-
-             # Parse JSON
-             json_str = decompressed_bytes.decode('utf-8')
-             return json.loads(json_str)
-
-         except Exception as e:
-             self.logger.error(f"Error decompressing JSON data: {e}")
-             # Fallback to direct JSON parsing
-             try:
-                 return json.loads(compressed_data)
-             except Exception:
-                 return {}
-
-     def render_performance_indicator(self):
-         """Render performance and connection indicator"""
-         connection_speed = st.session_state.get('connection_speed', 'unknown')
-
-         if connection_speed in ['slow_2g', '2g']:
-             st.info("📡 Slow connection detected. App optimized for your bandwidth.")
-         elif connection_speed == '3g':
-             st.info("📡 Moderate connection detected. Some optimizations applied.")
-
-         # Performance tips for slow connections
-         if connection_speed in ['slow_2g', '2g']:
-             with st.expander("💡 Tips for Better Performance"):
-                 st.markdown("""
-                 **Optimizations Applied:**
-                 - Images automatically compressed
-                 - Animations disabled
-                 - Lazy loading enabled
-                 - Text rendering optimized
-
-                 **Tips for Better Experience:**
-                 - Use WiFi when available
-                 - Close other apps/tabs
-                 - Upload smaller images when possible
-                 - Work offline when connection is very slow
-                 """)
-
-     def lazy_load_content(self, content_list: List[Any], page_size: int = None) -> Tuple[List[Any], bool]:
-         """Implement lazy loading for content lists"""
-         if not content_list:
-             return [], False
-
-         # Determine page size based on connection speed
-         connection_speed = st.session_state.get('connection_speed', 'default')
-         if page_size is None:
-             page_size = self.optimization_settings['lazy_loading_threshold'].get(connection_speed, 5)
-
-         # Get current page from session state
-         page_key = f"lazy_load_page_{id(content_list)}"
-         current_page = st.session_state.get(page_key, 0)
-
-         # Calculate slice
-         start_idx = current_page * page_size
-         end_idx = start_idx + page_size
-
-         # Get current page content
-         current_content = content_list[start_idx:end_idx]
-         has_more = end_idx < len(content_list)
-
-         # Load more button
-         if has_more:
-             if st.button(f"📄 Load More ({len(content_list) - end_idx} remaining)", key=f"load_more_{id(content_list)}"):
-                 st.session_state[page_key] = current_page + 1
-                 st.rerun()
-
-         return current_content, has_more
-
-     def optimize_streamlit_config(self):
-         """Apply Streamlit-specific optimizations"""
-
-         # Inject Streamlit optimizations
-         streamlit_optimizations = """
-         <script>
-         // Optimize Streamlit rendering
-         function optimizeStreamlit() {
-             // Disable unnecessary Streamlit features for performance
-             const style = document.createElement('style');
-             style.textContent = `
-                 /* Hide Streamlit branding for performance */
-                 .stDeployButton {
-                     display: none;
-                 }
-
-                 /* Optimize form rendering */
-                 .stForm {
-                     border: none;
-                     box-shadow: none;
-                 }
-
-                 /* Optimize button hover effects */
-                 .stButton > button:hover {
-                     transform: none;
-                     box-shadow: none;
-                 }
-
-                 /* Optimize text input focus */
-                 .stTextInput > div > div > input:focus {
-                     box-shadow: 0 0 0 1px #FF6B35;
-                 }
-
-                 /* Optimize selectbox */
-                 .stSelectbox > div > div {
-                     border-radius: 4px;
-                 }
-
-                 /* Optimize progress bars */
-                 .stProgress > div > div {
-                     transition: width 0.1s ease;
-                 }
-             `;
-             document.head.appendChild(style);
-         }
-
-         // Apply optimizations when DOM is ready
-         if (document.readyState === 'loading') {
-             document.addEventListener('DOMContentLoaded', optimizeStreamlit);
-         } else {
-             optimizeStreamlit();
-         }
-
-         // Re-apply optimizations when Streamlit updates the page
-         const observer = new MutationObserver(function(mutations) {
-             let shouldOptimize = false;
-             mutations.forEach(function(mutation) {
-                 if (mutation.type === 'childList' && mutation.addedNodes.length > 0) {
-                     shouldOptimize = true;
-                 }
-             });
-
-             if (shouldOptimize) {
-                 setTimeout(optimizeStreamlit, 100);
-             }
-         });
-
-         observer.observe(document.body, {
-             childList: true,
-             subtree: true
-         });
-         </script>
-         """
-
-         st.components.v1.html(streamlit_optimizations, height=0)
-
-     def get_optimization_stats(self) -> Dict[str, Any]:
-         """Get current optimization statistics"""
-         return {
-             'connection_speed': st.session_state.get('connection_speed', 'unknown'),
-             'optimization_level': st.session_state.get('optimization_level', 'default'),
-             'performance_initialized': st.session_state.get('performance_initialized', False),
-             'optimizations_applied': {
-                 'image_compression': True,
-                 'lazy_loading': True,
-                 'animation_reduction': st.session_state.get('connection_speed') in ['slow_2g', '2g'],
-                 'text_optimization': True
-             }
-         }
-
-     def measure_performance(self, operation_name: str):
-         """Context manager for measuring operation performance"""
-         return PerformanceMeasurement(operation_name, self.logger)
-
-
- class PerformanceMeasurement:
-     """Context manager for measuring performance"""
-
-     def __init__(self, operation_name: str, logger):
-         self.operation_name = operation_name
-         self.logger = logger
-         self.start_time = None
-
-     def __enter__(self):
-         self.start_time = time.time()
-         return self
-
-     def __exit__(self, exc_type, exc_val, exc_tb):
-         if self.start_time:
-             duration = time.time() - self.start_time
-             self.logger.info(f"Performance: {self.operation_name} took {duration:.3f}s")
-
-             # Store in session state for analytics
-             if 'performance_metrics' not in st.session_state:
-                 st.session_state.performance_metrics = {}
-
-             st.session_state.performance_metrics[self.operation_name] = duration
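
For reference, the gzip + base64 pipeline that `compress_json_data` / `decompress_json_data` implemented can be exercised standalone. A minimal sketch, outside Streamlit; the function names here are illustrative, not part of the deleted module:

```python
import base64
import gzip
import json

def pack(data: dict) -> str:
    """Compact JSON -> gzip -> base64, the same pipeline as compress_json_data."""
    raw = json.dumps(data, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode()

def unpack(payload: str) -> dict:
    """Reverse the pipeline: base64-decode, gunzip, parse JSON."""
    return json.loads(gzip.decompress(base64.b64decode(payload)).decode("utf-8"))

# Round-trip check on a repetitive payload (hypothetical sample data)
sample = {"activity": "recipe_exchange", "language": "hi", "text": "dal " * 200}
packed = pack(sample)
assert unpack(packed) == sample  # lossless round trip
print(len(json.dumps(sample)), "->", len(packed), "chars")
```

Note that base64 re-encoding adds roughly 33% overhead, so the net saving depends on how compressible the JSON is: repetitive corpus text shrinks well below its original size, while short payloads can come out larger than the input.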
intern_project/corpus_collection_engine/utils/session_manager.py DELETED
@@ -1,482 +0,0 @@
- """
- Session management utilities for the Corpus Collection Engine
- """
-
- import streamlit as st
- import uuid
- import json
- from datetime import datetime, timedelta
- from typing import Dict, Any, Optional, List
- import logging
-
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType
- from corpus_collection_engine.services.storage_service import StorageService
- from corpus_collection_engine.utils.error_handler import global_error_handler, ErrorCategory, ErrorSeverity
-
-
- class SessionManager:
-     """Manages user sessions and state across the application"""
-
-     def __init__(self):
-         self.logger = logging.getLogger(__name__)
-         self.storage_service = StorageService()
-         self._initialize_session()
-
-     def _initialize_session(self):
-         """Initialize session state variables"""
-         # Core session data
-         if 'session_id' not in st.session_state:
-             st.session_state.session_id = str(uuid.uuid4())
-
-         if 'user_id' not in st.session_state:
-             st.session_state.user_id = f"user_{str(uuid.uuid4())[:8]}"
-
-         if 'session_start_time' not in st.session_state:
-             st.session_state.session_start_time = datetime.now()
-
-         # User preferences
-         if 'user_preferences' not in st.session_state:
-             st.session_state.user_preferences = {
-                 'preferred_language': 'en',
-                 'preferred_region': None,
-                 'theme': 'light',
-                 'notifications_enabled': True,
-                 'auto_save': True
-             }
-
-         # Activity tracking
-         if 'activity_history' not in st.session_state:
-             st.session_state.activity_history = []
-
-         if 'current_activity_start' not in st.session_state:
-             st.session_state.current_activity_start = None
-
-         # Contribution tracking
-         if 'session_contributions' not in st.session_state:
-             st.session_state.session_contributions = []
-
-         if 'total_session_contributions' not in st.session_state:
-             st.session_state.total_session_contributions = 0
-
-         # Progress tracking
-         if 'session_progress' not in st.session_state:
-             st.session_state.session_progress = {
-                 'activities_started': set(),
-                 'activities_completed': set(),
-                 'languages_used': set(),
-                 'regions_contributed': set(),
-                 'achievements_unlocked': set(),
-                 'streak_days': 0,
-                 'total_time_spent': timedelta()
-             }
-
-         # Application state
-         if 'app_state' not in st.session_state:
-             st.session_state.app_state = {
-                 'privacy_consent_given': False,
-                 'onboarding_completed': False,
-                 'tutorial_completed': False,
-                 'first_contribution_made': False,
-                 'feedback_given': False
-             }
-
-         # Error and performance tracking
-         if 'session_errors' not in st.session_state:
-             st.session_state.session_errors = []
-
-         if 'performance_metrics' not in st.session_state:
-             st.session_state.performance_metrics = {}
-
90
- def get_session_id(self) -> str:
91
- """Get current session ID"""
92
- return st.session_state.session_id
93
-
94
- def get_user_id(self) -> str:
95
- """Get current user ID"""
96
- return st.session_state.user_id
97
-
98
- def start_activity(self, activity_type: ActivityType) -> None:
99
- """Record the start of an activity"""
100
- try:
101
- start_time = datetime.now()
102
- st.session_state.current_activity_start = start_time
103
- st.session_state.session_progress['activities_started'].add(activity_type.value)
104
-
105
- # Add to activity history
106
- activity_record = {
107
- 'activity_type': activity_type.value,
108
- 'start_time': start_time,
109
- 'end_time': None,
110
- 'duration': None,
111
- 'completed': False,
112
- 'contributions_made': 0
113
- }
114
-
115
- st.session_state.activity_history.append(activity_record)
116
-
117
- self.logger.info(f"Activity started: {activity_type.value}")
118
-
119
- except Exception as e:
120
- global_error_handler.handle_error(
121
- e,
122
- ErrorCategory.SYSTEM,
123
- ErrorSeverity.LOW,
124
- context={'component': 'session_manager', 'action': 'start_activity'}
125
- )
126
-
127
- def complete_activity(self, activity_type: ActivityType, contributions_made: int = 0) -> None:
128
- """Record the completion of an activity"""
129
- try:
130
- end_time = datetime.now()
131
- st.session_state.session_progress['activities_completed'].add(activity_type.value)
132
-
133
- # Update the most recent activity record
134
- if st.session_state.activity_history:
135
- last_activity = st.session_state.activity_history[-1]
136
- if last_activity['activity_type'] == activity_type.value and not last_activity['completed']:
137
- last_activity['end_time'] = end_time
138
- last_activity['completed'] = True
139
- last_activity['contributions_made'] = contributions_made
140
-
141
- if st.session_state.current_activity_start:
142
- duration = end_time - st.session_state.current_activity_start
143
- last_activity['duration'] = duration
144
- st.session_state.session_progress['total_time_spent'] += duration
145
-
146
- st.session_state.current_activity_start = None
147
-
148
- # Check for achievements
149
- self._check_achievements()
150
-
151
- self.logger.info(f"Activity completed: {activity_type.value}")
152
-
153
- except Exception as e:
154
- global_error_handler.handle_error(
155
- e,
156
- ErrorCategory.SYSTEM,
157
- ErrorSeverity.LOW,
158
- context={'component': 'session_manager', 'action': 'complete_activity'}
159
- )
160
-
161
- def record_contribution(self, contribution: UserContribution) -> None:
162
- """Record a user contribution in the session"""
163
- try:
164
- # Add to session contributions
165
- contribution_record = {
166
- 'id': contribution.contribution_id,
167
- 'activity_type': contribution.activity_type,
168
- 'language': contribution.language,
169
- 'region': contribution.region,
170
- 'timestamp': contribution.timestamp,
171
- 'content_type': contribution.content_type
172
- }
173
-
174
- st.session_state.session_contributions.append(contribution_record)
175
- st.session_state.total_session_contributions += 1
176
-
177
- # Update progress tracking
178
- if contribution.language:
179
- st.session_state.session_progress['languages_used'].add(contribution.language)
180
-
181
- if contribution.region:
182
- st.session_state.session_progress['regions_contributed'].add(contribution.region)
183
-
184
- # Mark first contribution milestone
185
- if not st.session_state.app_state['first_contribution_made']:
186
- st.session_state.app_state['first_contribution_made'] = True
187
- self._unlock_achievement('first_contribution')
188
-
189
- # Check for other achievements
190
- self._check_achievements()
191
-
192
- self.logger.info(f"Contribution recorded: {contribution.contribution_id}")
193
-
194
- except Exception as e:
195
- global_error_handler.handle_error(
196
- e,
197
- ErrorCategory.SYSTEM,
198
- ErrorSeverity.MEDIUM,
199
- context={'component': 'session_manager', 'action': 'record_contribution'}
200
- )
201
-
202
- def update_user_preferences(self, preferences: Dict[str, Any]) -> None:
203
- """Update user preferences"""
204
- try:
205
- st.session_state.user_preferences.update(preferences)
206
-
207
- # Save preferences to storage for persistence
208
- self.storage_service.save_user_preferences(
209
- st.session_state.user_id,
210
- st.session_state.user_preferences
211
- )
212
-
213
- self.logger.info("User preferences updated")
214
-
215
- except Exception as e:
216
- global_error_handler.handle_error(
217
- e,
218
- ErrorCategory.SYSTEM,
219
- ErrorSeverity.LOW,
220
- context={'component': 'session_manager', 'action': 'update_preferences'}
221
- )
222
-
223
- def get_session_summary(self) -> Dict[str, Any]:
224
- """Get a summary of the current session"""
225
- try:
226
- current_time = datetime.now()
227
- session_duration = current_time - st.session_state.session_start_time
228
-
229
- # Calculate active time (time spent in activities)
230
- active_time = st.session_state.session_progress['total_time_spent']
231
- if st.session_state.current_activity_start:
232
- active_time += current_time - st.session_state.current_activity_start
233
-
234
- return {
235
- 'session_id': st.session_state.session_id,
236
- 'user_id': st.session_state.user_id,
237
- 'session_duration': session_duration,
238
- 'active_time': active_time,
239
- 'activities_started': len(st.session_state.session_progress['activities_started']),
240
- 'activities_completed': len(st.session_state.session_progress['activities_completed']),
241
- 'total_contributions': st.session_state.total_session_contributions,
242
- 'languages_used': list(st.session_state.session_progress['languages_used']),
243
- 'regions_contributed': list(st.session_state.session_progress['regions_contributed']),
244
- 'achievements_unlocked': list(st.session_state.session_progress['achievements_unlocked']),
245
- 'completion_rate': self._calculate_completion_rate(),
246
- 'engagement_score': self._calculate_engagement_score()
247
- }
248
-
249
- except Exception as e:
250
- global_error_handler.handle_error(
251
- e,
252
- ErrorCategory.SYSTEM,
253
- ErrorSeverity.LOW,
254
- context={'component': 'session_manager', 'action': 'get_summary'}
255
- )
256
- return {}
257
-
258
- def _calculate_completion_rate(self) -> float:
259
- """Calculate activity completion rate"""
260
- started = len(st.session_state.session_progress['activities_started'])
261
- completed = len(st.session_state.session_progress['activities_completed'])
262
-
263
- if started == 0:
264
- return 0.0
265
-
266
- return (completed / started) * 100
267
-
268
- def _calculate_engagement_score(self) -> float:
269
- """Calculate user engagement score"""
270
- try:
271
- score = 0.0
272
-
273
- # Base score for participation
274
- score += min(st.session_state.total_session_contributions * 10, 50)
275
-
276
- # Bonus for activity diversity
277
- activities_tried = len(st.session_state.session_progress['activities_started'])
278
- score += min(activities_tried * 15, 60)
279
-
280
- # Bonus for language diversity
281
- languages_used = len(st.session_state.session_progress['languages_used'])
282
- score += min(languages_used * 10, 30)
283
-
284
- # Bonus for completion rate
285
- completion_rate = self._calculate_completion_rate()
286
- score += completion_rate * 0.6
287
-
288
- # Bonus for achievements
289
- achievements = len(st.session_state.session_progress['achievements_unlocked'])
290
- score += min(achievements * 5, 25)
291
-
292
- return min(score, 100.0) # Cap at 100
293
-
294
- except Exception:
295
- return 0.0
296
-
297
- def _check_achievements(self) -> None:
298
- """Check and unlock achievements based on current progress"""
299
- try:
300
- progress = st.session_state.session_progress
301
-
302
- # Contribution milestones
303
- contributions = st.session_state.total_session_contributions
304
- if contributions >= 5 and 'contributor' not in progress['achievements_unlocked']:
305
- self._unlock_achievement('contributor')
306
-
307
- if contributions >= 10 and 'active_contributor' not in progress['achievements_unlocked']:
308
- self._unlock_achievement('active_contributor')
309
-
310
- if contributions >= 25 and 'super_contributor' not in progress['achievements_unlocked']:
311
- self._unlock_achievement('super_contributor')
312
-
313
- # Activity diversity
314
- activities_completed = len(progress['activities_completed'])
315
- if activities_completed >= 2 and 'explorer' not in progress['achievements_unlocked']:
316
- self._unlock_achievement('explorer')
317
-
318
- if activities_completed >= 4 and 'cultural_ambassador' not in progress['achievements_unlocked']:
319
- self._unlock_achievement('cultural_ambassador')
320
-
321
- # Language diversity
322
- languages_used = len(progress['languages_used'])
323
- if languages_used >= 2 and 'polyglot' not in progress['achievements_unlocked']:
324
- self._unlock_achievement('polyglot')
325
-
326
- if languages_used >= 3 and 'language_champion' not in progress['achievements_unlocked']:
327
- self._unlock_achievement('language_champion')
328
-
329
- # Regional diversity
330
- regions = len(progress['regions_contributed'])
331
- if regions >= 2 and 'regional_expert' not in progress['achievements_unlocked']:
332
- self._unlock_achievement('regional_expert')
333
-
334
- except Exception as e:
335
- self.logger.error(f"Error checking achievements: {e}")
336
-
337
- def _unlock_achievement(self, achievement_id: str) -> None:
338
- """Unlock an achievement"""
339
- try:
340
- st.session_state.session_progress['achievements_unlocked'].add(achievement_id)
341
-
342
- # Show achievement notification
343
- achievement_info = self._get_achievement_info(achievement_id)
344
- st.success(f"🏆 Achievement Unlocked: {achievement_info['title']}!")
345
- st.info(achievement_info['description'])
346
-
347
- self.logger.info(f"Achievement unlocked: {achievement_id}")
348
-
349
- except Exception as e:
350
- self.logger.error(f"Error unlocking achievement {achievement_id}: {e}")
351
-
352
- def _get_achievement_info(self, achievement_id: str) -> Dict[str, str]:
353
- """Get information about an achievement"""
354
- achievements = {
355
- 'first_contribution': {
356
- 'title': 'First Steps',
357
- 'description': 'Made your first contribution to preserving Indian culture!'
358
- },
359
- 'contributor': {
360
- 'title': 'Contributor',
361
- 'description': 'Made 5 contributions - you\'re making a difference!'
362
- },
363
- 'active_contributor': {
364
- 'title': 'Active Contributor',
365
- 'description': 'Made 10 contributions - your dedication is inspiring!'
366
- },
367
- 'super_contributor': {
368
- 'title': 'Super Contributor',
369
- 'description': 'Made 25 contributions - you\'re a cultural preservation hero!'
370
- },
371
- 'explorer': {
372
- 'title': 'Cultural Explorer',
373
- 'description': 'Completed 2 different activities - exploring our rich heritage!'
374
- },
375
- 'cultural_ambassador': {
376
- 'title': 'Cultural Ambassador',
377
- 'description': 'Completed all activities - you\'re a true cultural ambassador!'
378
- },
379
- 'polyglot': {
380
- 'title': 'Polyglot',
381
- 'description': 'Contributed in 2 languages - celebrating linguistic diversity!'
382
- },
383
- 'language_champion': {
384
- 'title': 'Language Champion',
385
- 'description': 'Contributed in 3+ languages - preserving multilingual heritage!'
386
- },
387
- 'regional_expert': {
388
- 'title': 'Regional Expert',
389
- 'description': 'Contributed from multiple regions - showcasing India\'s diversity!'
390
- }
391
- }
392
-
393
- return achievements.get(achievement_id, {
394
- 'title': 'Unknown Achievement',
395
- 'description': 'You\'ve accomplished something special!'
396
- })
397
-
398
- def save_session_data(self) -> None:
399
- """Save session data to persistent storage"""
400
- try:
401
- session_data = {
402
- 'session_id': st.session_state.session_id,
403
- 'user_id': st.session_state.user_id,
404
- 'session_start_time': st.session_state.session_start_time.isoformat(),
405
- 'user_preferences': st.session_state.user_preferences,
406
- 'session_progress': {
407
- 'activities_started': list(st.session_state.session_progress['activities_started']),
408
- 'activities_completed': list(st.session_state.session_progress['activities_completed']),
409
- 'languages_used': list(st.session_state.session_progress['languages_used']),
410
- 'regions_contributed': list(st.session_state.session_progress['regions_contributed']),
411
- 'achievements_unlocked': list(st.session_state.session_progress['achievements_unlocked']),
412
- 'total_time_spent': str(st.session_state.session_progress['total_time_spent'])
413
- },
414
- 'app_state': st.session_state.app_state,
415
- 'total_contributions': st.session_state.total_session_contributions
416
- }
417
-
418
- self.storage_service.save_session_data(session_data)
419
- self.logger.info("Session data saved successfully")
420
-
421
- except Exception as e:
422
- global_error_handler.handle_error(
423
- e,
424
- ErrorCategory.STORAGE,
425
- ErrorSeverity.MEDIUM,
426
- context={'component': 'session_manager', 'action': 'save_session'}
427
- )
428
-
429
- def load_session_data(self, session_id: str) -> bool:
430
- """Load session data from persistent storage"""
431
- try:
432
- session_data = self.storage_service.load_session_data(session_id)
433
-
434
- if session_data:
435
- # Restore session state
436
- st.session_state.session_id = session_data['session_id']
437
- st.session_state.user_id = session_data['user_id']
438
- st.session_state.session_start_time = datetime.fromisoformat(session_data['session_start_time'])
439
- st.session_state.user_preferences = session_data['user_preferences']
440
- st.session_state.app_state = session_data['app_state']
441
- st.session_state.total_session_contributions = session_data.get('total_contributions', 0)
442
-
443
- # Restore progress (convert lists back to sets)
444
- progress_data = session_data['session_progress']
445
- st.session_state.session_progress.update({
446
- 'activities_started': set(progress_data['activities_started']),
447
- 'activities_completed': set(progress_data['activities_completed']),
448
- 'languages_used': set(progress_data['languages_used']),
449
- 'regions_contributed': set(progress_data['regions_contributed']),
450
- 'achievements_unlocked': set(progress_data['achievements_unlocked'])
451
- })
452
-
453
- self.logger.info(f"Session data loaded successfully: {session_id}")
454
- return True
455
-
456
- return False
457
-
458
- except Exception as e:
459
- global_error_handler.handle_error(
460
- e,
461
- ErrorCategory.STORAGE,
462
- ErrorSeverity.MEDIUM,
463
- context={'component': 'session_manager', 'action': 'load_session'}
464
- )
465
- return False
466
-
467
- def cleanup_session(self) -> None:
468
- """Clean up session data on exit"""
469
- try:
470
- # Save final session data
471
- self.save_session_data()
472
-
473
- # Log session summary
474
- summary = self.get_session_summary()
475
- self.logger.info(f"Session ended: {json.dumps(summary, default=str)}")
476
-
477
- except Exception as e:
478
- self.logger.error(f"Error during session cleanup: {e}")
479
-
480
-
481
- # Global session manager instance
482
- session_manager = SessionManager()
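
The engagement score above is a capped weighted sum; a minimal standalone sketch of the same arithmetic (names are illustrative) makes the weighting easy to inspect:

```python
def engagement_score(contributions: int, activities_tried: int,
                     activities_completed: int, languages_used: int,
                     achievements: int) -> float:
    """Mirror of SessionManager._calculate_engagement_score as a pure function."""
    completion_rate = (activities_completed / activities_tried * 100) if activities_tried else 0.0
    score = min(contributions * 10, 50)       # participation, capped at 50
    score += min(activities_tried * 15, 60)   # activity diversity, capped at 60
    score += min(languages_used * 10, 30)     # language diversity, capped at 30
    score += completion_rate * 0.6            # up to 60 points for full completion
    score += min(achievements * 5, 25)        # achievements, capped at 25
    return min(score, 100.0)                  # overall cap at 100

print(engagement_score(2, 2, 1, 1, 1))  # 20 + 30 + 10 + 30 + 5 = 95.0
```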
intern_project/data/corpus_collection.db DELETED
Binary file (53.2 kB)
 
intern_project/main.py DELETED
@@ -1,6 +0,0 @@
- def main():
-     print("Hello from intern-project!")
-
-
- if __name__ == "__main__":
-     main()