singarajusaiteja committed
Commit 6ad46b7 · verified · 1 parent: 53e2efa

Delete intern_project

Files changed (39)
  1. intern_project/HUGGINGFACE_READY_FILES.md +0 -178
  2. intern_project/corpus_collection_engine/.gitignore +0 -0
  3. intern_project/corpus_collection_engine/.streamlit/config.toml +0 -14
  4. intern_project/corpus_collection_engine/LICENSE +0 -59
  5. intern_project/corpus_collection_engine/README.md +0 -220
  6. intern_project/corpus_collection_engine/REPORT.md +0 -105
  7. intern_project/corpus_collection_engine/activities/__init__.py +0 -1
  8. intern_project/corpus_collection_engine/activities/activity_router.py +0 -427
  9. intern_project/corpus_collection_engine/activities/base_activity.py +0 -225
  10. intern_project/corpus_collection_engine/activities/folklore_collector.py +0 -553
  11. intern_project/corpus_collection_engine/activities/landmark_identifier.py +0 -535
  12. intern_project/corpus_collection_engine/activities/meme_creator.py +0 -331
  13. intern_project/corpus_collection_engine/activities/recipe_exchange.py +0 -505
  14. intern_project/corpus_collection_engine/app.py +0 -17
  15. intern_project/corpus_collection_engine/config.py +0 -71
  16. intern_project/corpus_collection_engine/data/corpus_collection.db +0 -0
  17. intern_project/corpus_collection_engine/main.py +0 -212
  18. intern_project/corpus_collection_engine/models/__init__.py +0 -1
  19. intern_project/corpus_collection_engine/models/data_models.py +0 -149
  20. intern_project/corpus_collection_engine/models/validation.py +0 -223
  21. intern_project/corpus_collection_engine/pwa/offline.html +0 -256
  22. intern_project/corpus_collection_engine/pwa/pwa_manager.py +0 -541
  23. intern_project/corpus_collection_engine/pwa/service_worker.js +0 -335
  24. intern_project/corpus_collection_engine/requirements.txt +0 -6
  25. intern_project/corpus_collection_engine/services/__init__.py +0 -1
  26. intern_project/corpus_collection_engine/services/ai_service.py +0 -417
  27. intern_project/corpus_collection_engine/services/analytics_service.py +0 -766
  28. intern_project/corpus_collection_engine/services/engagement_service.py +0 -665
  29. intern_project/corpus_collection_engine/services/language_service.py +0 -295
  30. intern_project/corpus_collection_engine/services/privacy_service.py +0 -1069
  31. intern_project/corpus_collection_engine/services/storage_service.py +0 -509
  32. intern_project/corpus_collection_engine/services/validation_service.py +0 -618
  33. intern_project/corpus_collection_engine/utils/__init__.py +0 -1
  34. intern_project/corpus_collection_engine/utils/error_handler.py +0 -557
  35. intern_project/corpus_collection_engine/utils/performance_dashboard.py +0 -468
  36. intern_project/corpus_collection_engine/utils/performance_optimizer.py +0 -716
  37. intern_project/corpus_collection_engine/utils/session_manager.py +0 -482
  38. intern_project/data/corpus_collection.db +0 -0
  39. intern_project/main.py +0 -6
intern_project/HUGGINGFACE_READY_FILES.md DELETED
@@ -1,178 +0,0 @@
- # 🚀 Hugging Face Spaces Ready Files
-
- ## ✅ **CLEANED AND READY FOR DEPLOYMENT**
-
- Your project has been cleaned and optimized for Hugging Face Spaces deployment. Here are the files that remain:
-
- ---
-
- ## 📁 **Essential Files Structure**
-
- ```
- corpus_collection_engine/
- ├── app.py                     # ✅ Entry point for Hugging Face
- ├── requirements.txt           # ✅ Dependencies
- ├── README.md                  # ✅ Documentation
- ├── LICENSE                    # ✅ License file
- ├── REPORT.md                  # ✅ Project report
- ├── config.py                  # ✅ Configuration
- ├── main.py                    # ✅ Main application
- ├── .gitignore                 # ✅ Git ignore rules
-
- ├── .streamlit/
- │   └── config.toml            # ✅ Streamlit configuration
-
- ├── activities/                # ✅ All cultural activities
- │   ├── __init__.py
- │   ├── activity_router.py
- │   ├── base_activity.py
- │   ├── folklore_collector.py
- │   ├── landmark_identifier.py
- │   ├── meme_creator.py
- │   └── recipe_exchange.py
-
- ├── services/                  # ✅ Core services
- │   ├── __init__.py
- │   ├── ai_service.py
- │   ├── analytics_service.py
- │   ├── engagement_service.py
- │   ├── language_service.py
- │   ├── privacy_service.py
- │   ├── storage_service.py
- │   └── validation_service.py
-
- ├── models/                    # ✅ Data models
- │   ├── __init__.py
- │   ├── data_models.py
- │   └── validation.py
-
- ├── utils/                     # ✅ Utility functions
- │   ├── __init__.py
- │   ├── error_handler.py
- │   ├── performance_dashboard.py
- │   ├── performance_optimizer.py
- │   └── session_manager.py
-
- ├── pwa/                       # ✅ Progressive Web App
- │   ├── offline.html
- │   ├── pwa_manager.py
- │   └── service_worker.js
-
- └── data/                      # ✅ Database (will be created)
-     └── corpus_collection.db
- ```
-
- ---
-
- ## 🗑️ **Files Successfully Removed**
-
- ### **Documentation & Guides (Not Needed for Runtime)**
- - ❌ AUTHENTICATION_REMOVAL_SUMMARY.md
- - ❌ CHANGELOG.md
- - ❌ CONTRIBUTING.md
- - ❌ DEPLOYMENT_SUCCESS_SUMMARY.md
- - ❌ FINAL_ERROR_RESOLUTION.md
- - ❌ FINAL_FIXES_SUMMARY.md
- - ❌ HUGGINGFACE_DEPLOYMENT.md
- - ❌ HUGGINGFACE_SPACES_DEPLOYMENT_GUIDE.md
- - ❌ PROJECT_COMPLETION_SUMMARY.md
- - ❌ QA_CHECKLIST.md
- - ❌ QUICK_START.md
- - ❌ README_DEPLOYMENT.md
- - ❌ README_HUGGINGFACE.md
- - ❌ RESOLVED_ISSUES.md
- - ❌ RUNTIME_ERROR_RESOLUTION.md
-
- ### **Development & Testing Files**
- - ❌ tests/ (entire directory)
- - ❌ test_app_startup.py
- - ❌ validate_imports.py
- - ❌ run_qa_tests.py
- - ❌ install_dependencies.py
- - ❌ start_app.py
- - ❌ test.txt
-
- ### **Infrastructure & Deployment**
- - ❌ aws-task-definition.json
- - ❌ azure-container-instance.json
- - ❌ deploy.sh
- - ❌ docker-compose.yml
- - ❌ Dockerfile
- - ❌ nginx.conf
- - ❌ pyproject.toml
- - ❌ pytest.ini
- - ❌ requirements-test.txt
- - ❌ .python-version
- - ❌ .dockerignore
-
- ### **Development Directories**
- - ❌ .kiro/ (Kiro IDE specs)
- - ❌ .vscode/ (VS Code settings)
- - ❌ .venv/ (Virtual environment)
- - ❌ .git/ (Git repository)
- - ❌ .cache/ (Cache files)
- - ❌ monitoring/ (Monitoring configs)
- - ❌ k8s/ (Kubernetes configs)
- - ❌ __pycache__/ (Python cache files)
-
- ---
-
- ## 📊 **File Count Summary**
-
- ### **Before Cleanup**: 150+ files
- ### **After Cleanup**: ~30 essential files
-
- **Reduction**: ~80% fewer files, optimized for deployment!
-
- ---
-
- ## 🎯 **Ready for Hugging Face Spaces**
-
- Your project is now optimized for Hugging Face Spaces deployment:
-
- ### **✅ What's Included**
- - **Core Application**: All essential Python modules
- - **Configuration**: Streamlit config optimized for Spaces
- - **Documentation**: README and LICENSE for users
- - **Dependencies**: Clean requirements.txt
- - **Entry Point**: app.py ready for Spaces
-
- ### **✅ What's Excluded**
- - **Development Files**: Tests, configs, build files
- - **Documentation**: Guides and summaries (not needed for runtime)
- - **Infrastructure**: Docker, K8s, monitoring (not needed for Spaces)
- - **Cache Files**: Python cache and temporary files
-
- ---
-
- ## 🚀 **Next Steps**
-
- 1. **Upload to Hugging Face Spaces**
-    - Use the Streamlit SDK (this is a Streamlit app)
-    - Upload all files in the `corpus_collection_engine/` directory
-    - Your app will be live at: `https://huggingface.co/spaces/YOUR_USERNAME/corpus-collection-engine`
-
- 2. **Test Your Deployment**
-    - Verify all 4 cultural activities work
-    - Test mobile responsiveness
-    - Check the analytics dashboard
-
- 3. **Share Your Space**
-    - Share the URL with the community
-    - Start collecting cultural heritage data!
-
- ---
-
- ## 🎉 **Deployment Ready!**
-
- Your **Corpus Collection Engine** is now:
- - 🔥 **Optimized**: ~80% fewer files
- - ⚡ **Fast**: No unnecessary files to slow down deployment
- - 🎯 **Focused**: Only essential runtime files included
- - 🚀 **Ready**: Prepared for Hugging Face Spaces deployment
-
- **Upload the `corpus_collection_engine/` directory to your Hugging Face Space and start preserving Indian cultural heritage!** 🇮🇳✨
-
- ---
-
- *All unnecessary files have been removed. Your project is now deployment-ready!*
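
The checklist above names `app.py` as the Spaces entry point but the deleted file itself is not reproduced in this diff (the file list records it at 17 lines). For orientation only, a minimal sketch of such an entry point, assuming it simply delegates to the Streamlit app in `main.py`; the `main()` function name is an assumption, not taken from the deleted file:

```python
# Hypothetical sketch of a Hugging Face Spaces entry point for a Streamlit app.
# Assumption: corpus_collection_engine/main.py exposes a main() function that
# builds the whole UI. The real app.py deleted in this commit is not shown here.
from corpus_collection_engine.main import main

# `streamlit run app.py` re-executes this module top to bottom on every rerun,
# so calling main() unconditionally is the usual pattern.
main()
```
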
intern_project/corpus_collection_engine/.gitignore DELETED
Binary file (62 Bytes)
 
intern_project/corpus_collection_engine/.streamlit/config.toml DELETED
@@ -1,14 +0,0 @@
- [theme]
- primaryColor = "#FF6B35"
- backgroundColor = "#FFFFFF"
- secondaryBackgroundColor = "#F0F2F6"
- textColor = "#262730"
-
- [server]
- headless = true
- port = 7860
- enableCORS = false
- enableXsrfProtection = false
-
- [browser]
- gatherUsageStats = false
intern_project/corpus_collection_engine/LICENSE DELETED
@@ -1,59 +0,0 @@
- MIT License
-
- Copyright (c) 2025 Corpus Collection Engine Contributors
-
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software, and to permit persons to whom the Software is
- furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in all
- copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- SOFTWARE.
-
- ---
-
- ## Additional Terms for Cultural Content
-
- This project is dedicated to preserving Indian cultural heritage. When contributing
- cultural content, please ensure:
-
- 1. **Respect**: All cultural content should be shared with respect and sensitivity
- 2. **Authenticity**: Cultural information should be accurate and well-researched
- 3. **Attribution**: Proper attribution should be given to cultural sources
- 4. **Community**: Consider the impact on cultural communities
- 5. **Education**: Content should serve educational and preservation purposes
-
- ## Third-Party Licenses
-
- This project may include third-party libraries and resources with their own licenses:
-
- - **Streamlit**: Apache License 2.0
- - **Pillow**: Historical Permission Notice and Disclaimer (HPND)
- - **NumPy**: BSD License
- - **Pandas**: BSD License
- - **Requests**: Apache License 2.0
-
- Please refer to the individual library documentation for complete license information.
-
- ## Cultural Heritage Commitment
-
- By using this software, you acknowledge and agree to:
-
- - Respect the cultural heritage and traditions of India
- - Use the platform responsibly for educational and preservation purposes
- - Not misuse cultural content for inappropriate or commercial purposes
- - Support the mission of preserving cultural diversity and heritage
-
- ---
-
- **🇮🇳 Dedicated to preserving Indian cultural heritage for future generations ✨**
intern_project/corpus_collection_engine/README.md DELETED
@@ -1,220 +0,0 @@
- # 🇮🇳 Corpus Collection Engine
-
- ## Team Information
- - **Team Name**: Heritage Collectors
- - **Team Members**:
-   - Member 1: Singaraju Saiteja (Role: Streamlit app development)
-   - Member 2: Muthyapu Sudeepthi (Role: AI integration)
-   - Member 3: Rithika Sadhu (Role: Documentation)
-   - Member 4: Golla Bharath Kumar (Role: Development strategy)
-   - Member 5: K. Vamshi Kumar (Role: App design and user experience)
-
- **AI-powered platform for preserving Indian cultural heritage through interactive data collection**
-
- ## 📋 Setup & Installation
-
- ### Prerequisites
- - Python 3.8 or higher
- - pip package manager
- - Git (for cloning the repository)
-
- ### Quick Start
-
- 1. **Clone the Repository**
- ```bash
- git clone [repository-url]
- cd corpus-collection-engine
- ```
-
- 2. **Create Virtual Environment**
- ```bash
- python -m venv venv
-
- # On Windows
- venv\Scripts\activate
-
- # On macOS/Linux
- source venv/bin/activate
- ```
-
- 3. **Install Dependencies**
- ```bash
- pip install -r requirements.txt
- ```
-
- 4. **Run the Application**
- ```bash
- streamlit run corpus_collection_engine/main.py
- ```
-
- 5. **Access the App**
- Open your browser and navigate to `http://localhost:8501`
-
- ### Alternative Installation Methods
-
- #### Using Docker
- ```bash
- docker build -t corpus-collection-engine .
- docker run -p 8501:8501 corpus-collection-engine
- ```
-
- #### Using the Smart Installer
- ```bash
- python install_dependencies.py
- python start_app.py
- ```
-
- ## 🌟 What is this?
-
- The Corpus Collection Engine is a Streamlit application designed to collect and preserve diverse data about Indian languages, history, and culture. Through engaging activities, users contribute to building culturally aware AI systems while helping preserve India's rich heritage.
-
- ## 🎯 Features
-
- ### 🎭 Interactive Cultural Activities
- - **Meme Creator**: Generate culturally relevant memes in Indian languages
- - **Recipe Collector**: Share traditional recipes with cultural context
- - **Folklore Archive**: Preserve stories, legends, and oral traditions
- - **Landmark Identifier**: Document historical and cultural landmarks
-
- ### 🌍 Multi-language Support
- - Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Assamese
- - Native script support and cultural context preservation
-
- ### 📊 Real-time Analytics
- - Contribution tracking and cultural impact metrics
- - Language diversity and regional distribution analysis
- - User engagement and platform growth insights
-
- ### 🔒 Privacy-First Design
- - No authentication required - start contributing immediately
- - Minimal data collection with full transparency
- - User-controlled privacy settings
-
- ## 🚀 How to Use
-
- 1. **Choose an Activity**: Select from meme creation, recipe sharing, folklore collection, or landmark documentation
- 2. **Select Your Language**: Pick from 11 supported Indian languages
- 3. **Contribute Content**: Share your cultural knowledge and creativity
- 4. **Add Context**: Provide cultural significance and regional information
- 5. **Submit**: Your contribution helps build culturally aware AI!
-
- ## 🎨 Activities Overview
-
- ### 🎭 Meme Creator
- Create humorous content that reflects Indian culture, festivals, traditions, and daily life. Perfect for capturing contemporary cultural expressions.
-
- ### 🍛 Recipe Collector
- Share traditional family recipes, regional specialties, and festival foods. Include cultural significance, occasions, and regional variations.
-
- ### 📚 Folklore Archive
- Preserve oral traditions, folk tales, legends, and cultural stories. Help maintain the rich narrative heritage of India.
-
- ### 🏛️ Landmark Identifier
- Document historical sites, cultural landmarks, and places of significance. Share stories and the cultural importance of locations.
-
- ## 🛠️ Technical Architecture
-
- ### Built With
- - **Frontend**: Streamlit with custom components
- - **Backend**: Python with modular service architecture
- - **AI Integration**: Fallback text generation for public deployment
- - **Storage**: SQLite for local development, extensible for production
- - **Analytics**: Real-time metrics and reporting
- - **PWA**: Progressive Web App features for offline access
-
- ### Project Structure
- ```
- corpus_collection_engine/
- ├── main.py                 # Application entry point
- ├── config.py               # Configuration settings
- ├── activities/             # Activity implementations
- │   ├── meme_creator.py
- │   ├── recipe_exchange.py
- │   ├── folklore_collector.py
- │   └── landmark_identifier.py
- ├── services/               # Core services
- │   ├── ai_service.py
- │   ├── analytics_service.py
- │   ├── engagement_service.py
- │   └── privacy_service.py
- ├── models/                 # Data models
- ├── utils/                  # Utility functions
- └── pwa/                    # Progressive Web App files
- ```
-
- ## 🧪 Testing
-
- Run the test suite:
- ```bash
- python -m pytest tests/
- ```
-
- Run specific tests:
- ```bash
- python test_app_startup.py
- ```
-
- ## 🚀 Deployment
-
- ### Hugging Face Spaces
- 1. Upload files to your Hugging Face Space
- 2. Use `app.py` as the entry point
- 3. Ensure `requirements.txt` and `.streamlit/config.toml` are included
-
- ### Local Production
- ```bash
- streamlit run corpus_collection_engine/main.py --server.port 8501
- ```
-
- ## 🤝 Contributing
-
- We welcome contributions! Please see CONTRIBUTING.md for guidelines.
-
- ## 📝 License
-
- This project is licensed under the MIT License - see the LICENSE file for details.
-
- ## 🌟 Why Contribute?
-
- - **Preserve Culture**: Help maintain India's diverse cultural heritage for future generations
- - **Build Better AI**: Contribute to creating more culturally aware and inclusive AI systems
- - **Share Knowledge**: Connect with others who value cultural preservation
- - **Make Impact**: See real-time analytics of your cultural preservation impact
-
- ## 📈 Platform Impact
-
- Track the collective impact of cultural preservation efforts:
- - Total contributions across all languages
- - Geographic distribution of cultural content
- - Language diversity metrics
- - Cultural significance scoring
-
- ## 🔧 Development
-
- ### Environment Setup
- ```bash
- # Install development dependencies
- pip install -r requirements-dev.txt
-
- # Run linting
- flake8 corpus_collection_engine/
-
- # Run type checking
- mypy corpus_collection_engine/
- ```
-
- ### Configuration
- - Copy `.env.example` to `.env` and configure your settings
- - Modify `corpus_collection_engine/config.py` for application settings
-
- ## 📞 Support
-
- - **Issues**: Report bugs and request features via GitHub Issues
- - **Documentation**: Check our comprehensive guides in the docs folder
- - **Community**: Join our discussions via GitHub Discussions
-
- ---
-
- **Start preserving Indian culture today! 🇮🇳✨**
-
- *Every contribution matters in building a more culturally-aware digital future.*
intern_project/corpus_collection_engine/REPORT.md DELETED
@@ -1,105 +0,0 @@
- # REPORT.md
-
- ## 1.1. Team Information
-
- - **Team Name**: Heritage Collectors
- - **Team Members**:
-   - Member 1: Ananya Gupta (Role: Project Lead & Full-Stack Developer)
-   - Member 2: Rohan Desai (Role: AI Integration Specialist)
-   - Member 3: Meera Nair (Role: UI/UX Designer & Tester)
-   - Member 4: Arjun Reddy (Role: Growth Strategist)
-   - Member 5: Kavita Joshi (Role: Data & Backend Engineer)
- - **Contact Email**: [email protected]
-
- ## 1.2. Application Overview
-
- The "Corpus Collection Engine" is an AI-powered Streamlit app designed to collect diverse data on Indian languages, history, and culture through engaging activities: Meme Creator, Recipe Exchange, Folklore Collector, and Landmark Identifier. The MVP, built in one week, covers all four activities, allowing users to create memes in 11+ Indic languages, share family recipes, preserve traditional stories, and document cultural landmarks, generating a comprehensive corpus of cultural and linguistic data. For low-bandwidth accessibility, we implemented a progressive web app (PWA) with offline caching, image compression, and lazy loading, ensuring usability in rural areas. The app supports multilingual input via browser-native keyboards and ethically collects anonymized data with transparent user consent and privacy controls.
-
- ## 1.3. AI Integration Details
-
- We integrated fallback AI text generation optimized for public deployment, avoiding external API dependencies that require authentication. The system provides AI-powered features such as generating meme caption suggestions, recipe ingredient alternatives, folklore story prompts, and landmark descriptions in 11 Indic languages (Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Assamese). The AI service runs with robust fallback mechanisms, using template-based generation when external models are unavailable. To optimize for low bandwidth, AI calls are asynchronous with cached fallback prompts. Ethical data collection is ensured through transparent consent prompts and auto-consent for public deployment, and all components adhere to open-source principles, avoiding proprietary APIs and authentication barriers.
-
- ## 1.4. Technical Architecture & Development
-
- The app uses Streamlit for a reactive frontend, Python for backend logic, and Pillow for image processing (e.g., meme text overlays with proper font support for Indic scripts). SQLite stores anonymized user contributions (captions, recipes, stories, landmarks) for corpus export. The offline-first design includes a PWA manifest and service worker, caching templates and static assets. The project is optimized for Hugging Face Spaces deployment with authentication-free access. Code is modular, with folders for `activities/`, `services/`, `models/`, and `utils/`, and dependencies are listed in `requirements.txt` (streamlit, pillow, pandas, numpy, requests). Licensed under MIT with cultural heritage preservation commitments.
-
- ## 1.5. User Testing & Feedback
-
- In the development phase, we conducted comprehensive testing with a focus on authentication-free access and cross-platform compatibility. Testing included mobile responsiveness, offline functionality, and cultural content validation. Key improvements included eliminating authentication barriers, optimizing image loading with container-width parameters, and implementing robust error handling. The app was tested across different browsers and devices, ensuring seamless access without login requirements. Feedback mechanisms include real-time user engagement tracking, session summaries, and achievement systems. We implemented defensive error handling to prevent session state issues and updated all deprecated Streamlit parameters for future compatibility.
-
- ## 1.6. Project Lifecycle & Roadmap
-
- ### A. Week 1: Rapid Development Sprint
- - **Plan**: Days 1-2: architecture design and core framework setup; Days 3-4: implementation of all four cultural activities (Meme Creator, Recipe Exchange, Folklore Collector, Landmark Identifier); Day 5: AI integration and fallback systems; Days 6-7: PWA features and deployment optimization.
- - **Execution**: Built the complete application with all activities, AI-powered suggestions, and comprehensive user engagement systems. Deployed an authentication-free version optimized for public access. Challenges, including API authentication, were resolved with robust fallback mechanisms.
- - **Deliverables**: Full-featured application with offline support, an analytics dashboard, and comprehensive cultural preservation tools.
-
- ### B. Week 2: Optimization & Public Deployment
- - **Methodology**: Focused on removing authentication barriers and optimizing for public deployment. Implemented comprehensive error handling and session management. Enhanced mobile responsiveness and performance optimization.
- - **Insights & Iterations**: Eliminated all login requirements, implemented auto-consent for privacy, and added defensive error handling. Enhanced the user experience with immediate access to all features and real-time engagement tracking.
-
- ### C. Weeks 3-4: Community Deployment & Cultural Impact
- - **Target Audience & Channels**: Global users interested in Indian culture, researchers, educators, and cultural enthusiasts. Deployed on Hugging Face Spaces for maximum accessibility without authentication barriers.
- - **Growth Strategy & Messaging**: Message: "Preserve Indian cultural heritage through interactive activities – accessible, free, and authentication-free!" Promoted via social media, educational institutions, and cultural organizations.
- - **Execution & Results**: Deployed on Hugging Face Spaces with comprehensive documentation, contributing guidelines, and community support systems.
- - **Metrics**: Designed for scalable user acquisition with real-time analytics, engagement tracking, and cultural impact measurement.
-
- ### D. Post-Internship Vision & Sustainability Plan
- - **Major Future Features**: Enhanced AI integration with Indic language models, voice-to-text support, advanced cultural validation, and community moderation features.
- - **Community Building**: Open-source development model with comprehensive contributing guidelines, cultural sensitivity protocols, and community recognition systems.
- - **Scaling Data Collection**: Partnership opportunities with cultural institutions, educational organizations, and research institutions for large-scale cultural preservation initiatives.
- - **Sustainability**: MIT-licensed open-source project with community-driven development, institutional partnerships, and grant funding opportunities for hosting and development.
-
- ## 2. Code Repository Submission
-
- Repository includes:
-
- - `README.md`: Comprehensive setup and installation guide with multiple deployment options
- - `CONTRIBUTING.md`: Detailed contribution guidelines with cultural sensitivity protocols
- - `CHANGELOG.md`: Complete version history with migration guides
- - `requirements.txt`: All dependencies (streamlit, pillow, pandas, numpy, requests, python-dateutil)
- - `LICENSE`: MIT license with cultural heritage preservation commitments
- - `REPORT.md`: This comprehensive project report
- - Organized code structure: `app.py` (Hugging Face entry point), `main.py`, `config.py`
- - Modular architecture: `activities/`, `services/`, `models/`, `utils/`, `pwa/`
- - Configuration: `.streamlit/config.toml` optimized for public deployment
- - Documentation: Comprehensive deployment guides and technical documentation
-
- ## 3. Live Application Link
-
- Deployed on Hugging Face Spaces (authentication-free access)
-
- **Features Available:**
- - **Authentication-Free Access**: Immediate access to all features without login
- - **Four Cultural Activities**: Meme Creator, Recipe Exchange, Folklore Collector, Landmark Identifier
- - **Multi-language Support**: 11 Indian languages with native script support
- - **Real-time Analytics**: User engagement tracking and cultural impact metrics
- - **Mobile Responsive**: Optimized for all devices and screen sizes
- - **Offline Capable**: PWA features for offline access and caching
- - **Privacy-First**: Transparent data handling with user control
-
- ## 4. Demo Video
-
- Demo video available showing a complete application walkthrough
-
- **Demo Content (6-minute walkthrough):**
- 1. **Introduction** (0:00-0:30): App purpose and cultural preservation mission
- 2. **Meme Creator** (0:30-1:30): Creating cultural memes with AI assistance in multiple languages
- 3. **Recipe Exchange** (1:30-2:30): Sharing traditional family recipes with cultural context
- 4. **Folklore Collector** (2:30-3:30): Preserving stories, legends, and oral traditions
- 5. **Landmark Identifier** (3:30-4:30): Documenting cultural landmarks with photos and descriptions
- 6. **Analytics & Engagement** (4:30-5:30): Real-time contribution tracking and achievement system
- 7. **Mobile & Offline Features** (5:30-6:00): PWA capabilities and cross-platform accessibility
-
- **Key Demonstrations:**
- - Authentication-free immediate access
- - Multi-language content creation
- - AI-powered cultural suggestions
- - Real-time analytics and engagement tracking
- - Mobile responsiveness and offline functionality
- - Cultural sensitivity and content validation
- - Community contribution and impact measurement
-
- ---
-
- **🇮🇳 Dedicated to preserving Indian cultural heritage through innovative technology and community collaboration ✨**
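
Section 1.3 above describes template-based generation with cached fallback prompts, but the corresponding `services/ai_service.py` (417 lines in the file list) is not reproduced in this excerpt. As a rough illustration of the pattern described, not the project's actual implementation (all names below are invented):

```python
# Hypothetical sketch of the template-based fallback pattern described in
# REPORT.md section 1.3; FALLBACK_TEMPLATES and generate_suggestion are
# invented names, not taken from ai_service.py.
import functools

FALLBACK_TEMPLATES = {
    "meme_caption": "When {topic} season arrives in {region}...",
    "recipe_tip": "A common substitute for {ingredient} in {region} kitchens...",
}

@functools.lru_cache(maxsize=256)
def generate_suggestion(kind: str, topic: str = "", region: str = "", ingredient: str = "") -> str:
    """Return a cached, template-based suggestion when no external model is reachable."""
    template = FALLBACK_TEMPLATES.get(kind, "Share something about {topic}.")
    # str.format ignores unused keyword arguments, so one signature covers all kinds.
    return template.format(topic=topic, region=region, ingredient=ingredient)

# Example: generate_suggestion("meme_caption", topic="mango", region="Maharashtra")
```
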
intern_project/corpus_collection_engine/activities/__init__.py DELETED
@@ -1 +0,0 @@
- # Activities module for different cultural data collection activities
 
 
intern_project/corpus_collection_engine/activities/activity_router.py DELETED
@@ -1,427 +0,0 @@
- """
- Activity router for managing navigation between different cultural activities
- """
-
- import streamlit as st
- from typing import Dict, Optional
- from corpus_collection_engine.activities.base_activity import BaseActivity
- from corpus_collection_engine.models.data_models import ActivityType
- from corpus_collection_engine.activities.meme_creator import MemeCreatorActivity
- from corpus_collection_engine.activities.recipe_exchange import RecipeExchangeActivity
- from corpus_collection_engine.activities.folklore_collector import FolkloreCollectorActivity
- from corpus_collection_engine.activities.landmark_identifier import LandmarkIdentifierActivity
- from corpus_collection_engine.services.storage_service import StorageService
- from corpus_collection_engine.services.analytics_service import AnalyticsService
- from corpus_collection_engine.utils.error_handler import global_error_handler, ErrorCategory, ErrorSeverity
- from corpus_collection_engine.utils.session_manager import session_manager
-
-
- class ActivityRouter:
-     """Router class to manage navigation between activities"""
-
-     def __init__(self):
-         self.activities: Dict[ActivityType, BaseActivity] = {}
-         self.storage_service = StorageService()
-         self.analytics_service = AnalyticsService()
-         self._initialize_session_state()
-         self._register_all_activities()
-
-     def _initialize_session_state(self):
-         """Initialize Streamlit session state variables"""
-         if 'current_activity' not in st.session_state:
-             st.session_state.current_activity = None
-         if 'user_session_id' not in st.session_state:
-             import uuid
-             st.session_state.user_session_id = str(uuid.uuid4())
-         if 'user_contributions' not in st.session_state:
-             st.session_state.user_contributions = []
-         if 'session_stats' not in st.session_state:
-             st.session_state.session_stats = {
-                 'activities_completed': 0,
-                 'total_contributions': 0,
-                 'languages_used': set(),
-                 'session_start_time': None
-             }
-
-     def _register_all_activities(self):
-         """Register all available activities"""
-         try:
-             # Register Meme Creator
-             meme_activity = MemeCreatorActivity()
-             self.register_activity(meme_activity)
-
-             # Register Recipe Exchange
-             recipe_activity = RecipeExchangeActivity()
-             self.register_activity(recipe_activity)
-
-             # Register Folklore Collector
-             folklore_activity = FolkloreCollectorActivity()
-             self.register_activity(folklore_activity)
-
-             # Register Landmark Identifier
-             landmark_activity = LandmarkIdentifierActivity()
-             self.register_activity(landmark_activity)
-
-         except Exception as e:
-             global_error_handler.handle_error(
-                 e,
-                 ErrorCategory.SYSTEM,
-                 ErrorSeverity.HIGH,
-                 context={'component': 'activity_registration'},
-                 show_user_message=True
-             )
-
-     def register_activity(self, activity: BaseActivity):
-         """Register an activity with the router"""
-         self.activities[activity.activity_type] = activity
-
-     def render_activity_selector(self) -> Optional[ActivityType]:
-         """Render the main activity selection interface"""
-         st.title("🇮🇳 Corpus Collection Engine")
-         st.markdown("*Preserving Indian Culture Through AI*")
-
-         st.markdown("""
-         Welcome! Choose an activity below to contribute to preserving Indian cultural heritage.
-         Your contributions help build AI systems that understand and respect our diverse traditions.
-         """)
-
-         # Activity selection cards
-         st.subheader("🎯 Choose Your Activity")
-
-         # Create columns for activity cards
-         cols = st.columns(2)
-
-         activities_info = [
-             (ActivityType.MEME, "🎭", "Meme Creator", "Create memes with local dialect captions"),
-             (ActivityType.RECIPE, "🍛", "Recipe Exchange", "Share family recipes in native languages"),
-             (ActivityType.FOLKLORE, "📚", "Folklore Collector", "Preserve traditional stories and proverbs"),
-             (ActivityType.LANDMARK, "🏛️", "Landmark Identifier", "Upload cultural landmark photos")
-         ]
-
-         selected_activity = None
-
-         for i, (activity_type, icon, title, description) in enumerate(activities_info):
-             col = cols[i % 2]
-
-             with col:
-                 with st.container():
-                     st.markdown(f"""
-                     <div style="
-                         border: 2px solid #FF6B35;
-                         border-radius: 10px;
-                         padding: 20px;
-                         margin: 10px 0;
-                         text-align: center;
-                         background-color: #f8f9fa;
-                     ">
-                         <h3>{icon} {title}</h3>
-                         <p>{description}</p>
-                     </div>
-                     """, unsafe_allow_html=True)
-
-                     if st.button(f"Start {title}", key=f"btn_{activity_type.value}", use_container_width=True):
-                         selected_activity = activity_type
-
-         return selected_activity
-
-     def render_navigation_sidebar(self):
-         """Render navigation controls in sidebar"""
-         st.sidebar.title("🧭 Navigation")
-
-         # Current activity info
-         if st.session_state.current_activity:
-             current_activity = self.activities.get(st.session_state.current_activity)
-             if current_activity:
-                 st.sidebar.info(f"Current: {current_activity.get_activity_title()}")
-
-         # Back to home button
-         if st.sidebar.button("🏠 Back to Activities", key="back_to_home"):
-             st.session_state.current_activity = None
-             st.rerun()
-
-         # Activity quick switcher
-         st.sidebar.markdown("---")
-         st.sidebar.subheader("🔄 Quick Switch")
-
-         for activity_type, activity in self.activities.items():
-             if activity_type != st.session_state.current_activity:
-                 if st.sidebar.button(
-                     activity.get_activity_title(),
-                     key=f"switch_{activity_type.value}",
-                     use_container_width=True
-                 ):
-                     st.session_state.current_activity = activity_type
-                     st.rerun()
-
-     def render_global_stats(self):
-         """Render global application statistics"""
-         st.sidebar.markdown("---")
-         st.sidebar.subheader("🌍 Global Impact")
-
-         try:
-             # Get real statistics from analytics service
-             stats = self.analytics_service.get_contribution_stats()
-             session_stats = st.session_state.session_stats
-
-             col1, col2 = st.sidebar.columns(2)
-             with col1:
-                 total_contributions = stats.get('total_contributions', 0)
-                 session_contributions = session_stats.get('total_contributions', 0)
-                 st.metric(
-                     "Total Contributions",
-                     total_contributions,
-                     delta=session_contributions if session_contributions > 0 else None
-                 )
-             with col2:
-                 active_languages = len(stats.get('languages_distribution', {}))
-                 session_languages = len(session_stats.get('languages_used', set()))
-                 st.metric(
-                     "Active Languages",
-                     active_languages,
-                     delta=session_languages if session_languages > 0 else None
-                 )
-
-             cultural_regions = len(stats.get('regional_distribution', {}))
-             st.sidebar.metric("Cultural Regions", cultural_regions)
-
-             # Progress towards goals
-             st.sidebar.markdown("**Goal Progress:**")
-             progress = min(total_contributions / 100.0, 1.0)  # Goal of 100 contributions
-             st.sidebar.progress(progress, text=f"{total_contributions}/100 contributions")
-
-             # Session progress
-             if session_stats['activities_completed'] > 0:
-                 st.sidebar.markdown("**Your Session:**")
-                 st.sidebar.write(f"✅ {session_stats['activities_completed']} activities completed")
-                 st.sidebar.write(f"📝 {session_stats['total_contributions']} contributions made")
-                 if session_stats['languages_used']:
-                     languages_str = ', '.join(list(session_stats['languages_used'])[:3])
-                     if len(session_stats['languages_used']) > 3:
-                         languages_str += f" +{len(session_stats['languages_used']) - 3} more"
-                     st.sidebar.write(f"🌐 Languages: {languages_str}")
-
-         except Exception:
-             # Fallback to basic stats on error
-             st.sidebar.metric("Total Contributions", "Loading...")
-             st.sidebar.metric("Active Languages", "Loading...")
-             st.sidebar.metric("Cultural Regions", "Loading...")
-
-     def render_about_section(self):
-         """Render about section in sidebar"""
-         with st.sidebar.expander("ℹ️ About This Project"):
-             st.markdown("""
-             **Mission:** Preserve Indian cultural heritage through AI-powered data collection.
-
-             **How it works:**
-             - Engage in fun cultural activities
-             - Contribute authentic content
-             - Help build culturally-aware AI
-
-             **Your Impact:**
-             - Preserve traditions for future generations
-             - Support inclusive AI development
-             - Connect with your cultural roots
-
-             **Privacy:** Your data is used ethically for cultural preservation and AI training.
-             """)
-
-     def route_to_activity(self, activity_type: ActivityType) -> None:
-         """Route user to specified activity"""
-         try:
-             if activity_type in self.activities:
-                 st.session_state.current_activity = activity_type
-                 activity = self.activities[activity_type]
-
-                 # Track activity start
-                 self._track_activity_start(activity_type)
-
-                 # Run the activity
-                 activity.run()
-
-             else:
-                 st.error(f"Activity {activity_type.value} not found!")
-                 st.info("Available activities:")
-                 for available_type in self.activities.keys():
-                     st.write(f"- {available_type.value}")
-
-         except Exception as e:
-             global_error_handler.handle_error(
-                 e,
-                 ErrorCategory.SYSTEM,
-                 ErrorSeverity.HIGH,
-                 context={
-                     'component': 'activity_routing',
-                     'activity_type': activity_type.value if activity_type else 'unknown'
-                 },
-                 show_user_message=True
-             )
-
-     def _track_activity_start(self, activity_type: ActivityType):
-         """Track when user starts an activity"""
-         try:
-             # Use session manager for tracking
-             session_manager.start_activity(activity_type)
-
-             # Record activity start in analytics
-             self.analytics_service.record_activity_start(
-                 session_manager.get_session_id(),
-                 activity_type.value
-             )
-
-         except Exception:
-             # Don't let tracking errors break the app
-             pass
-
-     def run(self) -> None:
-         """Main router execution method"""
-         try:
-             # Always render navigation and global elements
-             self.render_navigation_sidebar()
-             self.render_global_stats()
-             self.render_user_progress()
-             self.render_about_section()
-
-             # Handle activity selection or routing
-             if st.session_state.current_activity is None:
-                 # Show activity selector
-                 selected_activity = self.render_activity_selector()
-                 if selected_activity:
-                     st.session_state.current_activity = selected_activity
-                     st.rerun()
-             else:
-                 # Route to current activity
-                 self.route_to_activity(st.session_state.current_activity)
-
-         except Exception as e:
-             global_error_handler.handle_error(
-                 e,
-                 ErrorCategory.SYSTEM,
-                 ErrorSeverity.CRITICAL,
-                 context={'component': 'activity_router_main'},
-                 show_user_message=True
-             )
-
-             # Fallback interface
-             st.error("🚨 Application error occurred. Please refresh the page.")
-             if st.button("🔄 Refresh Application"):
-                 st.rerun()
-
-     def get_current_activity(self) -> Optional[BaseActivity]:
-         """Get the currently active activity"""
-         if st.session_state.current_activity:
-             return self.activities.get(st.session_state.current_activity)
-         return None
-
-     def get_user_session_id(self) -> str:
-         """Get the current user session ID"""
-         return st.session_state.user_session_id
-
-     def record_contribution(self, contribution_data: dict):
-         """Record a user contribution"""
-         try:
-             # Add session metadata
-             contribution_data.update({
-                 'session_id': st.session_state.user_session_id,
-                 'activity_type': st.session_state.current_activity.value if st.session_state.current_activity else 'unknown'
-             })
-
-             # Store contribution
-             contribution_id = self.storage_service.save_contribution(contribution_data)
-
-             # Update session stats
-             st.session_state.session_stats['total_contributions'] += 1
-             if 'language' in contribution_data:
-                 st.session_state.session_stats['languages_used'].add(contribution_data['language'])
-
-             # Add to session contributions list
-             st.session_state.user_contributions.append({
-                 'id': contribution_id,
-                 'activity': st.session_state.current_activity.value,
-                 'timestamp': contribution_data.get('timestamp'),
-                 'language': contribution_data.get('language', 'unknown')
-             })
-
-             # Record in analytics
-             self.analytics_service.record_contribution(
-                 st.session_state.user_session_id,
-                 st.session_state.current_activity.value,
-                 contribution_data
-             )
-
-             return contribution_id
-
-         except Exception as e:
-             global_error_handler.handle_error(
-                 e,
-                 ErrorCategory.STORAGE,
-                 ErrorSeverity.MEDIUM,
-                 context={
-                     'component': 'contribution_recording',
-                     'activity_type': st.session_state.current_activity.value if st.session_state.current_activity else 'unknown'
-                 },
-                 show_user_message=True
-             )
-             return None
-
-     def complete_activity(self, activity_type: ActivityType, completion_data: dict = None):
-         """Mark an activity as completed"""
-         try:
-             # Use session manager for completion tracking
-             contributions_made = completion_data.get('contributions_made', 0) if completion_data else 0
-             session_manager.complete_activity(activity_type, contributions_made)
-
-             # Record completion in analytics
-             self.analytics_service.record_activity_completion(
-                 session_manager.get_session_id(),
-                 activity_type.value,
-                 completion_data or {}
-             )
-
-             # Show completion message
-             st.success(f"🎉 Great job! You've completed the {activity_type.value.replace('_', ' ').title()} activity!")
-
-             # Show achievements if any were unlocked
-             session_summary = session_manager.get_session_summary()
-             if session_summary.get('achievements_unlocked'):
-                 recent_achievements = list(session_summary['achievements_unlocked'])[-3:]  # Show last 3
-                 if recent_achievements:
-                     st.info(f"🏆 Recent achievements: {', '.join(recent_achievements)}")
-
-             # Offer to continue with another activity
-             st.info("Ready for another activity? Use the navigation menu to explore more ways to contribute!")
-
-         except Exception as e:
-             global_error_handler.handle_error(
-                 e,
-                 ErrorCategory.SYSTEM,
-                 ErrorSeverity.LOW,
-                 context={
-                     'component': 'activity_completion',
-                     'activity_type': activity_type.value
-                 },
-                 show_user_message=False
-             )
-
-     def render_user_progress(self):
-         """Render user progress section in sidebar"""
-         if st.session_state.user_contributions:
-             with st.sidebar.expander("📊 Your Progress"):
-                 st.write(f"**Contributions Made:** {len(st.session_state.user_contributions)}")
-
-                 # Show recent contributions
-                 recent_contributions = st.session_state.user_contributions[-3:]  # Last 3
-                 for contrib in reversed(recent_contributions):
-                     st.write(f"• {contrib['activity'].replace('_', ' ').title()}")
-
-                 if len(st.session_state.user_contributions) > 3:
-                     st.write(f"... and {len(st.session_state.user_contributions) - 3} more")
-
-     def get_activity_stats(self) -> dict:
-         """Get statistics about registered activities"""
-         return {
-             'total_activities': len(self.activities),
-             'available_activities': list(self.activities.keys()),
-             'current_activity': st.session_state.current_activity,
-             'session_contributions': len(st.session_state.user_contributions),
-             'session_stats': st.session_state.session_stats
-         }
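
The deleted `main.py` (212 lines in the file list) is not shown in this diff. Based only on the `ActivityRouter` API above, a plausible minimal wiring would look like the following sketch; the `st.set_page_config` arguments are assumptions:

```python
# Hypothetical wiring based on the ActivityRouter API shown above;
# the actual main.py deleted in this commit is not reproduced in this diff.
import streamlit as st
from corpus_collection_engine.activities.activity_router import ActivityRouter

st.set_page_config(page_title="Corpus Collection Engine", page_icon="🇮🇳", layout="wide")

router = ActivityRouter()  # registers all four activities and initializes session state
router.run()               # renders the selector, or routes to the current activity
```
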
intern_project/corpus_collection_engine/activities/base_activity.py DELETED
@@ -1,225 +0,0 @@
- """
- Base activity interface for all cultural data collection activities
- """
-
- from abc import ABC, abstractmethod
- from typing import Dict, Any, Optional, Tuple
- import streamlit as st
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType
- from corpus_collection_engine.models.validation import DataValidator
-
-
- class BaseActivity(ABC):
-     """Abstract base class for all cultural activities"""
-
-     def __init__(self, activity_type: ActivityType):
-         self.activity_type = activity_type
-         self.validator = DataValidator()
-
-     @abstractmethod
-     def render_interface(self) -> None:
-         """Render the Streamlit interface for this activity"""
-         pass
-
-     @abstractmethod
-     def process_submission(self, data: Dict[str, Any]) -> UserContribution:
-         """Process user submission and create UserContribution"""
-         pass
-
-     @abstractmethod
-     def validate_content(self, content: Dict[str, Any]) -> Tuple[bool, str]:
-         """Validate activity-specific content"""
-         pass
-
-     def get_activity_title(self) -> str:
-         """Get display title for this activity"""
-         titles = {
-             ActivityType.MEME: "🎭 Meme Creator",
-             ActivityType.RECIPE: "🍛 Recipe Exchange",
-             ActivityType.FOLKLORE: "📚 Folklore Collector",
-             ActivityType.LANDMARK: "🏛️ Landmark Identifier"
-         }
-         return titles.get(self.activity_type, "Cultural Activity")
-
-     def get_activity_description(self) -> str:
-         """Get description for this activity"""
-         descriptions = {
-             ActivityType.MEME: "Create memes with captions in your local dialect and share cultural humor!",
-             ActivityType.RECIPE: "Share your family recipes in your native language and preserve culinary traditions!",
-             ActivityType.FOLKLORE: "Collect and preserve traditional stories, proverbs, and folk wisdom!",
-             ActivityType.LANDMARK: "Upload photos of cultural landmarks with descriptions in your language!"
-         }
-         return descriptions.get(self.activity_type, "Contribute to cultural preservation")
-
-     def render_common_header(self) -> None:
-         """Render common header elements for all activities"""
-         st.header(self.get_activity_title())
-         st.markdown(f"*{self.get_activity_description()}*")
-         st.divider()
-
-     def render_language_selector(self, key: str = "language") -> str:
-         """Render language selection widget"""
-         from corpus_collection_engine.config import SUPPORTED_LANGUAGES
-
-         st.subheader("🌐 Select Language")
-         language_options = list(SUPPORTED_LANGUAGES.keys())
-         language_labels = [f"{SUPPORTED_LANGUAGES[code]} ({code})" for code in language_options]
-
-         selected_index = st.selectbox(
-             "Choose your preferred language:",
-             range(len(language_options)),
-             format_func=lambda x: language_labels[x],
-             key=key
-         )
-
-         return language_options[selected_index]
-
-     def render_cultural_context_form(self, key_prefix: str = "cultural") -> Dict[str, Any]:
-         """Render cultural context input form"""
-         st.subheader("🏛️ Cultural Context")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             region = st.text_input(
-                 "Region/State:",
-                 placeholder="e.g., Maharashtra, Tamil Nadu",
-                 key=f"{key_prefix}_region"
-             )
-
-         with col2:
-             cultural_significance = st.text_area(
-                 "Cultural Significance:",
-                 placeholder="Describe the cultural importance or context",
-                 key=f"{key_prefix}_significance",
-                 height=100
-             )
-
-         additional_context = st.text_area(
-             "Additional Context (Optional):",
-             placeholder="Any other cultural details you'd like to share",
-             key=f"{key_prefix}_additional",
-             height=80
-         )
-
-         return {
-             "region": region.strip() if region else "",
-             "cultural_significance": cultural_significance.strip() if cultural_significance else "",
-             "additional_context": additional_context.strip() if additional_context else ""
-         }
-
-     def render_submission_section(self, content_data: Dict[str, Any],
-                                   cultural_context: Dict[str, Any],
-                                   language: str) -> Optional[UserContribution]:
-         """Render submission section with validation and processing"""
-         st.divider()
-         st.subheader("📤 Submit Your Contribution")
-
-         # Show preview of what will be submitted
-         with st.expander("Preview Your Contribution"):
-             st.json({
-                 "Activity": self.activity_type.value,
-                 "Language": language,
-                 "Content": content_data,
-                 "Cultural Context": cultural_context
-             })
-
-         # Consent checkbox
-         consent = st.checkbox(
-             "I consent to sharing this content for cultural preservation and AI training purposes",
-             key="consent_checkbox"
-         )
-
-         if not consent:
-             st.info("Please provide consent to submit your contribution.")
-             return None
-
-         # Submit button
-         if st.button("🚀 Submit Contribution", type="primary", key="submit_button"):
-             return self._handle_submission(content_data, cultural_context, language)
-
-         return None
-
-     def _handle_submission(self, content_data: Dict[str, Any],
-                            cultural_context: Dict[str, Any],
-                            language: str) -> Optional[UserContribution]:
-         """Handle the submission process with validation"""
-
-         # Validate content
-         is_valid_content, content_msg = self.validate_content(content_data)
-         if not is_valid_content:
-             st.error(f"Content validation failed: {content_msg}")
-             return None
-
-         # Validate cultural context
-         is_valid_context, context_msg = self.validator.validate_cultural_context(cultural_context)
-         if not is_valid_context:
-             st.error(f"Cultural context validation failed: {context_msg}")
-             return None
-
-         # Create user contribution
-         try:
-             contribution = self.process_submission({
-                 "content_data": content_data,
-                 "cultural_context": cultural_context,
-                 "language": language
-             })
-
-             # Final validation
-             is_valid_contribution, errors = self.validator.validate_user_contribution(contribution)
-             if not is_valid_contribution:
-                 st.error("Contribution validation failed:")
-                 for error in errors:
-                     st.error(f"• {error}")
-                 return None
-
-             # Success!
-             st.success("🎉 Contribution submitted successfully!")
-             st.balloons()
-
-             # Show contribution ID
-             st.info(f"Your contribution ID: `{contribution.id}`")
-
-             return contribution
-
-         except Exception as e:
-             st.error(f"Error processing submission: {str(e)}")
-             return None
-
-     def render_activity_stats(self) -> None:
-         """Render activity statistics and engagement info"""
-         st.sidebar.markdown("---")
-         st.sidebar.subheader("📊 Activity Stats")
-
-         # Placeholder stats - will be replaced with real data later
-         col1, col2 = st.sidebar.columns(2)
-         with col1:
-             st.metric("Contributions", "0", "0")
-         with col2:
-             st.metric("Languages", "0", "0")
-
-         st.sidebar.info("Your contributions help preserve Indian cultural heritage!")
-
-     def render_help_section(self) -> None:
-         """Render help and tips section"""
-         with st.sidebar.expander("💡 Tips & Help"):
-             st.markdown("""
-             **How to contribute:**
-             1. Fill in the content form
-             2. Select your language
-             3. Add cultural context
-             4. Submit your contribution
-
-             **Quality tips:**
-             - Be authentic and respectful
-             - Provide cultural context
-             - Use your native language
-             - Share genuine experiences
-             """)
-
-     def run(self) -> None:
-         """Main method to run the activity"""
-         self.render_common_header()
-         self.render_activity_stats()
-         self.render_help_section()
-         self.render_interface()
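
To make the contract above concrete: a subclass only has to implement the three abstract methods, and it inherits the language selector, cultural-context form, and submission pipeline. The following minimal `ProverbActivity` is a hypothetical illustration; it does not exist in the repository, and the `UserContribution` field names are assumptions, since `data_models.py` is not shown in this excerpt:

```python
# Hypothetical minimal subclass illustrating the BaseActivity contract above.
from typing import Any, Dict, Tuple
import streamlit as st

from corpus_collection_engine.activities.base_activity import BaseActivity
from corpus_collection_engine.models.data_models import ActivityType, UserContribution


class ProverbActivity(BaseActivity):
    def __init__(self):
        super().__init__(ActivityType.FOLKLORE)

    def render_interface(self) -> None:
        # Inherited helpers handle language choice, context, and submission.
        language = self.render_language_selector("proverb_language")
        text = st.text_area("Proverb:", key="proverb_text")
        context = self.render_cultural_context_form("proverb")
        self.render_submission_section({"text": text}, context, language)

    def validate_content(self, content: Dict[str, Any]) -> Tuple[bool, str]:
        if not content.get("text", "").strip():
            return False, "Proverb text is required"
        return True, ""

    def process_submission(self, data: Dict[str, Any]) -> UserContribution:
        # Field names below are assumptions; data_models.py is not shown here.
        return UserContribution(
            activity_type=self.activity_type,
            language=data["language"],
            content=data["content_data"],
            cultural_context=data["cultural_context"],
        )
```
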
intern_project/corpus_collection_engine/activities/folklore_collector.py DELETED
@@ -1,553 +0,0 @@
- """
- Folklore Collector Activity - Preserve traditional stories, proverbs, and folk wisdom
- """
-
- import streamlit as st
- from typing import Dict, Any, Tuple, List, Optional
- from datetime import datetime
- import re
-
- from corpus_collection_engine.activities.base_activity import BaseActivity
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType
- from corpus_collection_engine.services.storage_service import StorageService
- from corpus_collection_engine.services.ai_service import AIService
-
-
- class FolkloreCollectorActivity(BaseActivity):
-     """Activity for collecting and preserving traditional folklore"""
-
-     def __init__(self):
-         super().__init__(ActivityType.FOLKLORE)
-         self.storage_service = StorageService()
-         self.ai_service = AIService()
-
-         # Folklore types
-         self.folklore_types = {
-             "folktale": "Folk Tale / लोक कथा",
-             "proverb": "Proverb / कहावत",
-             "riddle": "Riddle / पहेली",
-             "song": "Folk Song / लोक गीत",
-             "legend": "Legend / किंवदंती",
-             "myth": "Myth / पुराण कथा",
-             "moral_story": "Moral Story / नैतिक कहानी",
-             "historical_tale": "Historical Tale / ऐतिहासिक कथा",
-             "wisdom_saying": "Wisdom Saying / ज्ञान की बात",
-             "children_story": "Children's Story / बच्चों की कहानी"
-         }
-
-         # Story themes
-         self.story_themes = [
-             "Wisdom / बुद्धिमत्ता",
-             "Courage / साहस",
-             "Love / प्रेम",
-             "Justice / न्याय",
-             "Family / परिवार",
-             "Nature / प्रकृति",
-             "Animals / जानवर",
-             "Gods & Goddesses / देवी-देवता",
-             "Kings & Queens / राजा-रानी",
-             "Farmers / किसान",
-             "Merchants / व्यापारी",
-             "Teachers / गुरु",
-             "Festivals / त्योहार",
-             "Seasons / ऋतुएं",
-             "Good vs Evil / अच्छाई बनाम बुराई"
-         ]
-
-         # Age groups
-         self.age_groups = [
-             "Children (0-12) / बच्चे",
-             "Teenagers (13-18) / किशोर",
-             "Adults (19-60) / वयस्क",
-             "Elders (60+) / बुजुर्ग",
-             "All Ages / सभी उम्र"
-         ]
-
-         # Moral lessons
-         self.moral_categories = [
-             "Honesty / ईमानदारी",
-             "Kindness / दयालुता",
-             "Hard Work / मेहनत",
-             "Respect / सम्मान",
-             "Patience / धैर्य",
-             "Humility / विनम्रता",
-             "Generosity / उदारता",
-             "Forgiveness / क्षमा",
-             "Perseverance / दृढ़ता",
-             "Unity / एकता"
-         ]
-
-     def render_interface(self) -> None:
-         """Render the folklore collector interface"""
-
-         # Step 1: Folklore Type Selection
-         st.subheader("📚 Step 1: What Type of Folklore?")
-
-         folklore_type = st.selectbox(
-             "Select the type of folklore you want to share:",
-             list(self.folklore_types.keys()),
-             format_func=lambda x: self.folklore_types[x],
-             key="folklore_type"
-         )
-
-         # Show description based on type
-         type_descriptions = {
-             "folktale": "Traditional stories passed down through generations",
-             "proverb": "Short sayings that express wisdom or truth",
-             "riddle": "Puzzling questions or statements requiring clever answers",
-             "song": "Traditional songs with cultural significance",
-             "legend": "Stories about historical figures or events",
-             "myth": "Traditional stories explaining natural phenomena",
-             "moral_story": "Stories that teach important life lessons",
-             "historical_tale": "Stories based on historical events",
-             "wisdom_saying": "Wise sayings from elders and ancestors",
-             "children_story": "Stories specifically told to children"
-         }
-
-         st.info(f"📖 {type_descriptions.get(folklore_type, 'Traditional cultural content')}")
-
-         # Step 2: Language Selection
-         language = self.render_language_selector("folklore_language")
-
-         # Step 3: Title and Content
-         st.subheader("✍️ Step 2: Share Your Story")
-
-         title = st.text_input(
-             "Title / शीर्षक:",
-             placeholder=f"Enter the title of your {folklore_type}...",
-             key="folklore_title"
-         )
-
-         # Content input based on type
-         if folklore_type in ["proverb", "wisdom_saying"]:
-             content_placeholder = f"Enter the {folklore_type} in {language}...\n\nExample: 'Early to bed, early to rise, makes a man healthy, wealthy and wise.'"
-             content_height = 100
-         elif folklore_type == "riddle":
-             content_placeholder = f"Enter the riddle and its answer in {language}...\n\nRiddle: What has keys but no locks?\nAnswer: A piano"
-             content_height = 150
-         else:
-             content_placeholder = f"Tell your story in {language}...\n\nOnce upon a time..."
-             content_height = 300
-
-         story_content = st.text_area(
-             f"{folklore_type.replace('_', ' ').title()} Content:",
-             placeholder=content_placeholder,
-             height=content_height,
-             key="folklore_content"
-         )
-
-         # Step 4: Story Details
-         st.subheader("🎭 Step 3: Story Details")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             themes = st.multiselect(
-                 "Themes / विषय:",
-                 self.story_themes,
-                 key="folklore_themes"
-             )
-
-             target_age = st.selectbox(
-                 "Target Age Group / लक्षित आयु समूह:",
-                 self.age_groups,
-                 key="target_age"
-             )
-
-         with col2:
-             moral_lessons = st.multiselect(
-                 "Moral Lessons / नैतिक शिक्षा:",
-                 self.moral_categories,
-                 key="moral_lessons"
-             )
-
-             story_length = st.select_slider(
-                 "Story Length / कहानी की लंबाई:",
-                 options=["Very Short / बहुत छोटी", "Short / छोटी", "Medium / मध्यम", "Long / लंबी"],
-                 value="Medium / मध्यम",
-                 key="story_length"
-             )
-
-         # Step 5: Cultural Context and Source
-         st.subheader("🏛️ Step 4: Cultural Context & Source")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             storyteller = st.text_input(
-                 "Who told you this story? / यह कहानी आपको किसने सुनाई?",
-                 placeholder="e.g., My grandmother, Village elder, Family friend",
-                 key="storyteller"
-             )
-
-             when_heard = st.text_input(
-                 "When did you first hear it? / पहली बार कब सुनी?",
-                 placeholder="e.g., Childhood, During festivals, Family gatherings",
-                 key="when_heard"
-             )
-
-         with col2:
-             occasion = st.text_input(
-                 "Special Occasion / विशेष अवसर:",
-                 placeholder="e.g., Bedtime stories, Festival celebrations, Moral teaching",
-                 key="folklore_occasion"
-             )
-
-             variations = st.text_area(
-                 "Other Versions / अन्य रूप:",
-                 placeholder="Are there other versions of this story you know?",
-                 height=80,
-                 key="story_variations"
-             )
-
-         # Cultural context form
-         cultural_context = self.render_cultural_context_form("folklore_cultural")
-
-         # Add folklore-specific context
-         cultural_context.update({
-             "storyteller": storyteller.strip() if storyteller else "",
-             "when_heard": when_heard.strip() if when_heard else "",
-             "occasion": occasion.strip() if occasion else "",
-             "variations": variations.strip() if variations else ""
-         })
-
-         # Step 6: Meaning and Interpretation
-         st.subheader("💡 Step 5: Meaning & Interpretation")
-
-         meaning = st.text_area(
-             "What does this story mean to you? / इस कहानी का आपके लिए क्या अर्थ है?",
-             placeholder="Explain the deeper meaning, lessons, or significance of this folklore...",
-             height=120,
-             key="story_meaning"
-         )
-
-         modern_relevance = st.text_area(
-             "How is it relevant today? / आज के समय में यह कैसे प्रासंगिक है?",
-             placeholder="How does this story apply to modern life?",
-             height=100,
-             key="modern_relevance"
-         )
-
-         # Step 7: AI Analysis (Optional)
-         if st.checkbox("🤖 Get AI Analysis", key="ai_analysis"):
-             self._render_ai_analysis(story_content, language, folklore_type)
-
-         # Step 8: Preview and Submit
-         st.subheader("👀 Step 6: Preview & Submit")
-
-         if title and story_content:
-             # Show preview
-             with st.expander("📖 Folklore Preview"):
-                 self._render_folklore_preview(
-                     title, folklore_type, story_content, themes, moral_lessons,
-                     target_age, storyteller, meaning, language
-                 )
-
-             # Prepare content data
-             content_data = {
-                 "title": title,
-                 "folklore_type": folklore_type,
-                 "story": story_content,
-                 "themes": themes,
-                 "moral_lessons": moral_lessons,
-                 "target_age": target_age,
-                 "story_length": story_length,
-                 "meaning": meaning,
-                 "modern_relevance": modern_relevance
-             }
-
-             # Submit section
-             contribution = self.render_submission_section(
-                 content_data, cultural_context, language
-             )
-
-             if contribution:
-                 # Save to storage
-                 success = self.storage_service.save_contribution(contribution)
-                 if success:
-                     st.success("🎉 Your folklore has been preserved in the cultural corpus!")
-
-                     # Show impact message
-                     with st.expander("🌟 Why This Matters"):
-                         st.markdown(f"""
-                         Your {folklore_type} in **{language}** helps preserve:
-                         - Traditional wisdom and moral teachings
-                         - Cultural stories and their meanings
-                         - Regional storytelling traditions
-                         - Intergenerational knowledge transfer
-
-                         Thank you for keeping our cultural heritage alive! 📚
-                         """)
-
-                     # Suggest related activities
-                     st.markdown("### 🔗 Keep Contributing!")
-                     col1, col2 = st.columns(2)
-                     with col1:
-                         if st.button("📝 Share Another Story"):
-                             # Clear form
-                             for key in list(st.session_state.keys()):
-                                 if key.startswith('folklore_'):
-                                     del st.session_state[key]
-                             st.rerun()
-
-                     with col2:
-                         if st.button("🍛 Share a Recipe"):
-                             st.session_state.current_activity = ActivityType.RECIPE
-                             st.rerun()
-                 else:
-                     st.error("Failed to save your folklore. Please try again.")
-         else:
-             st.warning("Please provide both a title and the story content!")
-
-     def _render_ai_analysis(self, content: str, language: str, folklore_type: str):
-         """Render AI analysis of the folklore"""
-         if content:
-             st.markdown("**🤖 AI Analysis:**")
-
-             col1, col2 = st.columns(2)
-
-             with col1:
-                 if st.button("🎭 Analyze Themes", key="ai_themes"):
-                     with st.spinner("Analyzing themes..."):
-                         tags = self.ai_service.suggest_cultural_tags(content, language)
-                         if tags:
-                             st.info(f"🎭 **Detected themes:** {', '.join(tags[:6])}")
-
-                 if st.button("💭 Extract Wisdom", key="ai_wisdom"):
-                     with st.spinner("Extracting wisdom..."):
-                         keywords = self.ai_service.extract_keywords(content, language, max_keywords=8)
-                         if keywords:
-                             st.info(f"💭 **Key concepts:** {', '.join(keywords)}")
-
-             with col2:
-                 if st.button("😊 Analyze Sentiment", key="ai_sentiment"):
-                     sentiment = self.ai_service.analyze_sentiment(content, language)
-                     if sentiment:
-                         dominant_sentiment = max(sentiment, key=sentiment.get)
-                         st.info(f"😊 **Overall tone:** {dominant_sentiment.title()} ({sentiment[dominant_sentiment]:.1%})")
-
-                 if st.button("🎯 Suggest Moral", key="ai_moral"):
-                     with st.spinner("Analyzing moral lessons..."):
-                         moral_prompt = f"moral lesson from this {folklore_type}: {content[:200]}"
-                         moral, confidence = self.ai_service.generate_text(moral_prompt, language, max_length=100)
-                         if moral:
-                             st.info(f"🎯 **Suggested moral:** {moral}")
-
-     def _render_folklore_preview(self, title: str, folklore_type: str, content: str,
-                                  themes: List[str], moral_lessons: List[str],
-                                  target_age: str, storyteller: str, meaning: str, language: str):
-         """Render folklore preview"""
-         st.markdown(f"# {title}")
-         st.markdown(f"**Type:** {self.folklore_types[folklore_type]}")
-         st.markdown(f"**Language:** {language}")
-
-         if target_age:
-             st.markdown(f"**Target Age:** {target_age}")
-
-         if storyteller:
-             st.markdown(f"**Storyteller:** {storyteller}")
-
-         st.markdown("## Story Content")
-         st.markdown(content)
-
-         if themes:
-             st.markdown(f"**Themes:** {', '.join(themes)}")
-
-         if moral_lessons:
-             st.markdown(f"**Moral Lessons:** {', '.join(moral_lessons)}")
-
-         if meaning:
-             st.markdown("## Meaning & Significance")
-             st.markdown(meaning)
-
-     def validate_content(self, content: Dict[str, Any]) -> Tuple[bool, str]:
-         """Validate folklore content"""
-         # Check required fields
-         if not content.get("title"):
-             return False, "Folklore must have a title"
-
-         if not content.get("story"):
-             return False, "Folklore must include the story content"
-
-         # Validate title
-         title = content["title"].strip()
-         if len(title) < 3:
-             return False, "Title must be at least 3 characters long"
-         if len(title) > 150:
-             return False, "Title must be less than 150 characters"
-
-         # Validate story content
-         story = content["story"].strip()
-         if len(story) < 50:
-             return False, "Story content must be at least 50 characters long"
-         if len(story) > 10000:
-             return False, "Story content must be less than 10,000 characters"
-
-         # Check for folklore type
-         if not content.get("folklore_type"):
-             return False, "Folklore type must be specified"
-
-         if content["folklore_type"] not in self.folklore_types:
-             return False, "Invalid folklore type"
-
-         # Validate content based on type
-         folklore_type = content["folklore_type"]
-
-         if folklore_type == "proverb" and len(story) > 500:
-             return False, "Proverbs should be concise (less than 500 characters)"
-
-         if folklore_type == "riddle":
-             # Check if riddle has both question and answer
-             if "?" not in story:
-                 return False, "Riddles should contain a question"
-
-         return True, "Valid folklore content"
-
-     def process_submission(self, data: Dict[str, Any]) -> UserContribution:
-         """Process folklore submission and create UserContribution"""
-         # Get session ID from router if available
-         session_id = st.session_state.get('user_session_id', 'anonymous')
-
-         # Calculate content statistics
-         story_content = data["content_data"].get("story", "")
-         word_count = len(story_content.split())
-         char_count = len(story_content)
-
-         return UserContribution(
-             user_session=session_id,
-             activity_type=self.activity_type,
-             content_data=data["content_data"],
-             language=data["language"],
-             cultural_context=data["cultural_context"],
-             metadata={
-                 "folklore_type": data["content_data"].get("folklore_type"),
-                 "word_count": word_count,
-                 "character_count": char_count,
-                 "themes_count": len(data["content_data"].get("themes", [])),
-                 "moral_lessons_count": len(data["content_data"].get("moral_lessons", [])),
-                 "target_age": data["content_data"].get("target_age"),
-                 "story_length": data["content_data"].get("story_length"),
-                 "has_meaning": bool(data["content_data"].get("meaning", "").strip()),
-                 "has_modern_relevance": bool(data["content_data"].get("modern_relevance", "").strip()),
-                 "submission_timestamp": datetime.now().isoformat(),
-                 "activity_version": "1.0"
-             }
-         )
-
-     def render_folklore_gallery(self):
-         """Render gallery of recent folklore"""
-         st.subheader("📚 Community Folklore Collection")
-
-         # Get recent folklore from storage
-         recent_contributions = self.storage_service.get_contributions_by_language(
-             st.session_state.get('selected_language', 'hi'), limit=12
-         )
-
-         folklore_contributions = [
-             contrib for contrib in recent_contributions
-             if contrib.activity_type == ActivityType.FOLKLORE
-         ]
-
-         if folklore_contributions:
-             # Group by folklore type
-             folklore_by_type = {}
-             for contrib in folklore_contributions:
-                 folklore_type = contrib.content_data.get('folklore_type', 'unknown')
-                 if folklore_type not in folklore_by_type:
-                     folklore_by_type[folklore_type] = []
-                 folklore_by_type[folklore_type].append(contrib)
-
-             # Display by type
-             for folklore_type, contributions in folklore_by_type.items():
-                 if folklore_type in self.folklore_types:
-                     st.markdown(f"### {self.folklore_types[folklore_type]}")
-
-                     cols = st.columns(min(3, len(contributions)))
-                     for i, contrib in enumerate(contributions[:3]):
-                         col = cols[i % 3]
-                         with col:
-                             with st.container():
-                                 st.markdown(f"**{contrib.content_data.get('title', 'Untitled')}**")
-
-                                 # Story preview
-                                 story = contrib.content_data.get('story', '')
-                                 if story:
-                                     preview = story[:100] + "..." if len(story) > 100 else story
-                                     st.markdown(f"*{preview}*")
-
-                                 # Metadata
-                                 st.markdown(f"🌐 {contrib.language}")
-                                 if contrib.cultural_context.get("region"):
-                                     st.markdown(f"📍 {contrib.cultural_context['region']}")
-
-                                 # Themes
-                                 themes = contrib.content_data.get('themes', [])
-                                 if themes:
-                                     st.markdown(f"🎭 {', '.join(themes[:2])}")
-
-                                 # Storyteller
-                                 storyteller = contrib.cultural_context.get('storyteller', '')
-                                 if storyteller:
-                                     st.markdown(f"👤 Told by: {storyteller}")
-
-                                 st.markdown("---")
-         else:
-             st.info("No folklore yet. Be the first to share a traditional story! 📚")
-
-     def render_folklore_statistics(self):
-         """Render folklore collection statistics"""
-         st.subheader("📊 Folklore Statistics")
-
-         # Get all folklore contributions
-         all_contributions = []
-         for lang in ["hi", "bn", "ta", "te", "ml", "kn", "gu", "mr", "pa", "or", "en"]:
-             contributions = self.storage_service.get_contributions_by_language(lang, limit=1000)
-             folklore_contribs = [c for c in contributions if c.activity_type == ActivityType.FOLKLORE]
-             all_contributions.extend(folklore_contribs)
-
-         if all_contributions:
-             # Statistics by type
-             type_counts = {}
-             for contrib in all_contributions:
-                 folklore_type = contrib.content_data.get('folklore_type', 'unknown')
-                 type_counts[folklore_type] = type_counts.get(folklore_type, 0) + 1
-
-             # Display statistics
-             col1, col2 = st.columns(2)
-
-             with col1:
-                 st.markdown("**By Type:**")
-                 for folklore_type, count in sorted(type_counts.items(), key=lambda x: x[1], reverse=True):
-                     if folklore_type in self.folklore_types:
-                         type_name = self.folklore_types[folklore_type].split(' / ')[0]
-                         st.markdown(f"- {type_name}: {count}")
-
-             with col2:
-                 # Language distribution
-                 lang_counts = {}
-                 for contrib in all_contributions:
-                     lang = contrib.language
-                     lang_counts[lang] = lang_counts.get(lang, 0) + 1
-
-                 st.markdown("**By Language:**")
-                 for lang, count in sorted(lang_counts.items(), key=lambda x: x[1], reverse=True)[:5]:
-                     from corpus_collection_engine.config import SUPPORTED_LANGUAGES
-                     lang_name = SUPPORTED_LANGUAGES.get(lang, lang)
-                     st.markdown(f"- {lang_name}: {count}")
-         else:
-             st.info("No folklore statistics available yet.")
-
-     def run(self):
-         """Override run method to add gallery and statistics"""
-         super().run()
-
-         # Add gallery and statistics sections
-         st.markdown("---")
-
-         tab1, tab2 = st.tabs(["📚 Community Gallery", "📊 Statistics"])
-
-         with tab1:
-             self.render_folklore_gallery()
-
-         with tab2:
-             self.render_folklore_statistics()
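`render_folklore_statistics` tallies counts with hand-rolled dicts; the same aggregation can be done in one pass with `collections.Counter`. A small sketch, assuming only the contribution attributes used above:

```python
from collections import Counter

def folklore_tallies(contributions):
    """One-pass equivalent of the dict counting in render_folklore_statistics."""
    type_counts = Counter(
        c.content_data.get("folklore_type", "unknown") for c in contributions
    )
    lang_counts = Counter(c.language for c in contributions)
    # most_common() already returns (key, count) pairs sorted descending
    return type_counts.most_common(), lang_counts.most_common(5)
```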
intern_project/corpus_collection_engine/activities/landmark_identifier.py DELETED
@@ -1,535 +0,0 @@
- """
- Landmark Identifier Activity - Upload photos of cultural landmarks with descriptions
- """
-
- import streamlit as st
- from typing import Dict, Any, Tuple, List, Optional
- from datetime import datetime
- import base64
- from PIL import Image
- import io
-
- from corpus_collection_engine.activities.base_activity import BaseActivity
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType
- from corpus_collection_engine.services.storage_service import StorageService
- from corpus_collection_engine.services.ai_service import AIService
-
-
- class LandmarkIdentifierActivity(BaseActivity):
-     """Activity for documenting cultural landmarks with photos and descriptions"""
-
-     def __init__(self):
-         super().__init__(ActivityType.LANDMARK)
-         self.storage_service = StorageService()
-         self.ai_service = AIService()
-
-         # Landmark categories
-         self.landmark_categories = {
-             "temple": "Temple / मंदिर",
-             "mosque": "Mosque / मस्जिद",
-             "church": "Church / गिरजाघर",
-             "gurudwara": "Gurudwara / गुरुद्वारा",
-             "monument": "Monument / स्मारक",
-             "palace": "Palace / महल",
-             "fort": "Fort / किला",
-             "museum": "Museum / संग्रहालय",
-             "heritage_site": "Heritage Site / विरासत स्थल",
-             "market": "Traditional Market / पारंपरिक बाजार",
-             "garden": "Garden / बगीचा",
-             "lake": "Lake / झील",
-             "mountain": "Mountain / पहाड़",
-             "river": "River / नदी",
-             "village": "Village / गांव",
-             "architecture": "Architecture / वास्तुकला",
-             "cultural_center": "Cultural Center / सांस्कृतिक केंद्र",
-             "festival_ground": "Festival Ground / त्योहार स्थल",
-             "other": "Other / अन्य"
-         }
-
-         # Historical periods
-         self.historical_periods = [
-             "Ancient (Before 500 CE) / प्राचीन",
-             "Classical (500-1200 CE) / शास्त्रीय",
-             "Medieval (1200-1700 CE) / मध्यकालीन",
-             "Colonial (1700-1947 CE) / औपनिवेशिक",
-             "Modern (1947-Present) / आधुनिक",
-             "Unknown / अज्ञात"
-         ]
-
-         # Architectural styles
-         self.architectural_styles = [
-             "Dravidian / द्रविड़",
-             "Nagara / नागर",
-             "Indo-Islamic / इंडो-इस्लामिक",
-             "Mughal / मुगल",
-             "Rajput / राजपूत",
-             "Colonial / औपनिवेशिक",
-             "Modern / आधुनिक",
-             "Vernacular / स्थानीय",
-             "Mixed / मिश्रित",
-             "Other / अन्य"
-         ]
-
-         # Significance types
-         self.significance_types = [
-             "Religious / धार्मिक",
-             "Historical / ऐतिहासिक",
-             "Cultural / सांस्कृतिक",
-             "Architectural / वास्तुकला",
-             "Natural / प्राकृतिक",
-             "Educational / शैक्षणिक",
-             "Commercial / व्यावसायिक",
-             "Social / सामाजिक",
-             "Political / राजनीतिक",
-             "Artistic / कलात्मक"
-         ]
-
-     def render_interface(self) -> None:
-         """Render the landmark identifier interface"""
-
-         # Step 1: Photo Upload
-         st.subheader("📸 Step 1: Upload Landmark Photo")
-
-         uploaded_image = st.file_uploader(
-             "Choose a photo of the landmark:",
-             type=['png', 'jpg', 'jpeg', 'webp'],
-             key="landmark_image_upload",
-             help="Upload a clear photo of the cultural landmark you want to document"
-         )
-
-         if uploaded_image:
-             # Display uploaded image
-             image = Image.open(uploaded_image)
-
-             # Resize for display if too large
-             display_image = image.copy()
-             display_image.thumbnail((800, 600), Image.Resampling.LANCZOS)
-
-             st.image(display_image, caption="Uploaded Landmark Photo", use_container_width=True)
-
-             # Image metadata
-             col1, col2, col3 = st.columns(3)
-             with col1:
-                 st.metric("Width", f"{image.width}px")
-             with col2:
-                 st.metric("Height", f"{image.height}px")
-             with col3:
-                 file_size = len(uploaded_image.getvalue()) / 1024
-                 st.metric("Size", f"{file_size:.1f} KB")
-
-         # Step 2: Basic Information
-         st.subheader("ℹ️ Step 2: Basic Information")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             landmark_name = st.text_input(
-                 "Landmark Name / स्थल का नाम:",
-                 placeholder="e.g., Red Fort, Meenakshi Temple",
-                 key="landmark_name"
-             )
-
-             category = st.selectbox(
-                 "Category / श्रेणी:",
-                 list(self.landmark_categories.keys()),
-                 format_func=lambda x: self.landmark_categories[x],
-                 key="landmark_category"
-             )
-
-         with col2:
-             location = st.text_input(
-                 "Location / स्थान:",
-                 placeholder="e.g., Delhi, Madurai, Tamil Nadu",
-                 key="landmark_location"
-             )
-
-             historical_period = st.selectbox(
-                 "Historical Period / ऐतिहासिक काल:",
-                 self.historical_periods,
-                 key="historical_period"
-             )
-
-         # Step 3: Language Selection
-         language = self.render_language_selector("landmark_language")
-
-         # Step 4: Description
-         st.subheader("📝 Step 3: Description")
-
-         description = st.text_area(
-             f"Describe the landmark in {language}:",
-             placeholder=f"Write a detailed description of the landmark in {language}...\n\nInclude:\n- What you see in the photo\n- Historical significance\n- Architectural features\n- Cultural importance\n- Personal experience or memories",
-             height=200,
-             key="landmark_description"
-         )
-
-         # Step 5: Detailed Information
-         st.subheader("🏛️ Step 4: Detailed Information")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             architectural_style = st.multiselect(
-                 "Architectural Style / वास्तुकला शैली:",
-                 self.architectural_styles,
-                 key="architectural_style"
-             )
-
-             significance = st.multiselect(
-                 "Significance / महत्व:",
-                 self.significance_types,
-                 key="landmark_significance"
-             )
-
-         with col2:
-             built_by = st.text_input(
-                 "Built by / निर्माता:",
-                 placeholder="e.g., Shah Jahan, Chola Dynasty",
-                 key="built_by"
-             )
-
-             year_built = st.text_input(
-                 "Year Built / निर्माण वर्ष:",
-                 placeholder="e.g., 1648, 12th Century",
-                 key="year_built"
-             )
-
-         # Step 6: Cultural Context
-         st.subheader("🎭 Step 5: Cultural Context")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             festivals_events = st.text_area(
-                 "Festivals/Events / त्योहार/कार्यक्रम:",
-                 placeholder="What festivals or events happen here?",
-                 height=100,
-                 key="festivals_events"
-             )
-
-             local_legends = st.text_area(
-                 "Local Legends/Stories / स्थानीय किंवदंतियां:",
-                 placeholder="Any interesting stories or legends about this place?",
-                 height=100,
-                 key="local_legends"
-             )
-
-         with col2:
-             visiting_tips = st.text_area(
-                 "Visiting Tips / यात्रा सुझाव:",
-                 placeholder="Best time to visit, entry fees, special rules, etc.",
-                 height=100,
-                 key="visiting_tips"
-             )
-
-             personal_experience = st.text_area(
-                 "Personal Experience / व्यक्तिगत अनुभव:",
-                 placeholder="Your personal experience or connection to this place",
-                 height=100,
-                 key="personal_experience"
-             )
-
-         # Cultural context form
-         cultural_context = self.render_cultural_context_form("landmark_cultural")
-
-         # Add landmark-specific context
-         cultural_context.update({
-             "festivals_events": festivals_events.strip() if festivals_events else "",
-             "local_legends": local_legends.strip() if local_legends else "",
-             "visiting_tips": visiting_tips.strip() if visiting_tips else "",
-             "personal_experience": personal_experience.strip() if personal_experience else "",
-             "built_by": built_by.strip() if built_by else "",
-             "year_built": year_built.strip() if year_built else ""
-         })
-
-         # Step 7: AI Analysis (Optional)
-         if uploaded_image and st.checkbox("🤖 Get AI Analysis", key="ai_analysis"):
-             self._render_ai_analysis(uploaded_image, description, language)
-
-         # Step 8: Preview and Submit
-         st.subheader("👀 Step 6: Preview & Submit")
-
-         if uploaded_image and landmark_name and description:
-             # Show preview
-             with st.expander("🏛️ Landmark Preview"):
-                 self._render_landmark_preview(
-                     landmark_name, category, location, description,
-                     architectural_style, significance, historical_period,
-                     built_by, year_built, language, display_image
-                 )
-
-             # Prepare content data
-             content_data = {
-                 "name": landmark_name,
-                 "category": category,
-                 "location": location,
-                 "description": description,
-                 "historical_period": historical_period,
-                 "architectural_style": architectural_style,
-                 "significance": significance,
-                 "image_data": self._image_to_base64(image)
-             }
-
-             # Submit section
-             contribution = self.render_submission_section(
-                 content_data, cultural_context, language
-             )
-
-             if contribution:
-                 # Save to storage
-                 success = self.storage_service.save_contribution(contribution)
-                 if success:
-                     st.success("🎉 Your landmark has been documented in the cultural corpus!")
-
-                     # Show impact message
-                     with st.expander("🌟 Why This Matters"):
-                         st.markdown(f"""
-                         Your landmark documentation in **{language}** helps preserve:
-                         - Visual and textual records of cultural heritage
-                         - Architectural and historical knowledge
-                         - Local stories and cultural significance
-                         - Tourism and educational resources
-
-                         Thank you for documenting India's rich cultural landscape! 🏛️
-                         """)
-
-                     # Clear form
-                     if st.button("📸 Document Another Landmark"):
-                         # Clear session state
-                         for key in list(st.session_state.keys()):
-                             if key.startswith('landmark_'):
-                                 del st.session_state[key]
-                         st.rerun()
-                 else:
-                     st.error("Failed to save your landmark documentation. Please try again.")
-         else:
-             missing_items = []
-             if not uploaded_image:
-                 missing_items.append("photo")
-             if not landmark_name:
-                 missing_items.append("landmark name")
-             if not description:
-                 missing_items.append("description")
-
-             st.warning(f"Please provide: {', '.join(missing_items)}")
-
-     def _render_ai_analysis(self, uploaded_image: Any, description: str, language: str):
-         """Render AI analysis of the landmark"""
-         st.markdown("**🤖 AI Analysis:**")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             if st.button("🏛️ Analyze Architecture", key="ai_architecture"):
-                 with st.spinner("Analyzing architectural features..."):
-                     # Generate architectural analysis
-                     arch_prompt = f"architectural features of this landmark: {description[:200]}"
-                     analysis, confidence = self.ai_service.generate_text(arch_prompt, language, max_length=150)
-                     if analysis:
-                         st.info(f"🏛️ **Architecture:** {analysis}")
-
-             if st.button("📍 Extract Location Info", key="ai_location"):
-                 with st.spinner("Extracting location information..."):
-                     keywords = self.ai_service.extract_keywords(description, language, max_keywords=6)
-                     if keywords:
-                         st.info(f"📍 **Key features:** {', '.join(keywords)}")
-
-         with col2:
-             if st.button("🎭 Analyze Cultural Significance", key="ai_culture"):
-                 with st.spinner("Analyzing cultural significance..."):
-                     tags = self.ai_service.suggest_cultural_tags(description, language)
-                     if tags:
-                         st.info(f"🎭 **Cultural tags:** {', '.join(tags[:5])}")
-
-             if st.button("📚 Generate Caption", key="ai_caption"):
-                 with st.spinner("Generating photo caption..."):
-                     caption, confidence = self.ai_service.generate_caption(description[:100], language)
-                     if caption:
-                         st.info(f"📚 **Suggested caption:** {caption}")
-
-     def _render_landmark_preview(self, name: str, category: str, location: str,
-                                  description: str, architectural_style: List[str],
-                                  significance: List[str], historical_period: str,
-                                  built_by: str, year_built: str, language: str, image: Image.Image):
-         """Render landmark preview"""
-         st.image(image, caption=name, use_container_width=True)
-
-         st.markdown(f"# {name}")
-         st.markdown(f"**Category:** {self.landmark_categories[category]}")
-         st.markdown(f"**Location:** {location}")
-         st.markdown(f"**Language:** {language}")
-
-         if historical_period:
-             st.markdown(f"**Historical Period:** {historical_period}")
-
-         if built_by:
-             st.markdown(f"**Built by:** {built_by}")
-
-         if year_built:
-             st.markdown(f"**Year Built:** {year_built}")
-
-         if architectural_style:
-             st.markdown(f"**Architectural Style:** {', '.join(architectural_style)}")
-
-         if significance:
-             st.markdown(f"**Significance:** {', '.join(significance)}")
-
-         st.markdown("## Description")
-         st.markdown(description)
-
-     def _image_to_base64(self, image: Image.Image) -> str:
-         """Convert PIL Image to base64 string"""
-         buffer = io.BytesIO()
-         # Convert to RGB if necessary
-         if image.mode in ('RGBA', 'LA', 'P'):
-             image = image.convert('RGB')
-         image.save(buffer, format="JPEG", quality=85)
-         img_str = base64.b64encode(buffer.getvalue()).decode()
-         return img_str
-
-     def validate_content(self, content: Dict[str, Any]) -> Tuple[bool, str]:
-         """Validate landmark content"""
-         # Check required fields
-         required_fields = ["name", "description"]
-         for field in required_fields:
-             if not content.get(field):
-                 return False, f"Landmark must include {field}"
-
-         # Validate name
-         name = content["name"].strip()
-         if len(name) < 3:
-             return False, "Landmark name must be at least 3 characters long"
-         if len(name) > 100:
-             return False, "Landmark name must be less than 100 characters"
-
-         # Validate description
-         description = content["description"].strip()
-         if len(description) < 20:
-             return False, "Description must be at least 20 characters long"
-         if len(description) > 3000:
-             return False, "Description must be less than 3000 characters"
-
-         # Check category
-         if not content.get("category"):
-             return False, "Landmark category must be specified"
-
-         if content["category"] not in self.landmark_categories:
-             return False, "Invalid landmark category"
-
-         # Validate image data
-         if not content.get("image_data"):
-             return False, "Landmark photo is required"
-
-         return True, "Valid landmark content"
-
-     def process_submission(self, data: Dict[str, Any]) -> UserContribution:
-         """Process landmark submission and create UserContribution"""
-         # Get session ID from router if available
-         session_id = st.session_state.get('user_session_id', 'anonymous')
-
-         # Calculate content statistics
-         description = data["content_data"].get("description", "")
-         word_count = len(description.split())
-         char_count = len(description)
-
-         return UserContribution(
-             user_session=session_id,
-             activity_type=self.activity_type,
-             content_data=data["content_data"],
-             language=data["language"],
-             cultural_context=data["cultural_context"],
-             metadata={
-                 "landmark_category": data["content_data"].get("category"),
-                 "location": data["content_data"].get("location"),
-                 "historical_period": data["content_data"].get("historical_period"),
-                 "architectural_styles": data["content_data"].get("architectural_style", []),
-                 "significance_types": data["content_data"].get("significance", []),
-                 "word_count": word_count,
-                 "character_count": char_count,
-                 "has_image": bool(data["content_data"].get("image_data")),
-                 "has_location": bool(data["content_data"].get("location", "").strip()),
-                 "has_historical_info": bool(data["cultural_context"].get("built_by", "").strip() or data["cultural_context"].get("year_built", "").strip()),
-                 "submission_timestamp": datetime.now().isoformat(),
-                 "activity_version": "1.0"
-             }
-         )
-
-     def render_landmark_gallery(self):
-         """Render gallery of documented landmarks"""
-         st.subheader("🏛️ Community Landmark Collection")
-
-         # Get recent landmarks from storage
-         recent_contributions = self.storage_service.get_contributions_by_language(
-             st.session_state.get('selected_language', 'hi'), limit=12
-         )
-
-         landmark_contributions = [
-             contrib for contrib in recent_contributions
-             if contrib.activity_type == ActivityType.LANDMARK
-         ]
-
-         if landmark_contributions:
-             # Display landmarks in grid
-             cols = st.columns(3)
-             for i, contrib in enumerate(landmark_contributions[:12]):
-                 col = cols[i % 3]
-                 with col:
-                     with st.container():
-                         # Display image if available
-                         image_data = contrib.content_data.get('image_data')
-                         if image_data:
-                             try:
-                                 image_bytes = base64.b64decode(image_data)
-                                 image = Image.open(io.BytesIO(image_bytes))
-                                 st.image(image, use_container_width=True)
-                             except Exception:
-                                 st.info("📸 Image not available")
-
-                         st.markdown(f"**{contrib.content_data.get('name', 'Unnamed Landmark')}**")
-
-                         # Category and location
-                         category = contrib.content_data.get('category', 'unknown')
-                         if category in self.landmark_categories:
-                             st.markdown(f"*{self.landmark_categories[category]}*")
-
-                         location = contrib.content_data.get('location', '')
-                         if location:
-                             st.markdown(f"📍 {location}")
-
-                         # Description preview
-                         description = contrib.content_data.get('description', '')
-                         if description:
-                             preview = description[:80] + "..." if len(description) > 80 else description
-                             st.markdown(f"📝 {preview}")
-
-                         # Language and region
-                         st.markdown(f"🌐 {contrib.language}")
-                         if contrib.cultural_context.get("region"):
-                             st.markdown(f"🏛️ {contrib.cultural_context['region']}")
-
-                         st.markdown("---")
-         else:
-             st.info("No landmarks documented yet. Be the first to share a cultural landmark! 📸")
-
-     def render_landmark_map(self):
-         """Render interactive map of landmarks (placeholder)"""
-         st.subheader("🗺️ Landmark Map")
-         st.info("Interactive map feature coming soon! This will show all documented landmarks on a map of India.")
-
-         # Placeholder for map functionality
-         # In a full implementation, you would integrate with mapping libraries
-         # like Folium, Plotly, or Google Maps to show landmark locations
-
-     def run(self):
-         """Override run method to add gallery and map options"""
-         super().run()
-
-         # Add gallery and map sections
-         st.markdown("---")
-
-         tab1, tab2 = st.tabs(["🏛️ Community Gallery", "🗺️ Landmark Map"])
-
-         with tab1:
-             self.render_landmark_gallery()
-
-         with tab2:
-             self.render_landmark_map()
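This file stores photos inline as base64-encoded JPEG (`_image_to_base64`) and decodes them again in the gallery. The round-trip in isolation, as a self-contained sketch using only Pillow and the standard library:

```python
import base64
import io

from PIL import Image

def to_b64(img: Image.Image) -> str:
    if img.mode in ("RGBA", "LA", "P"):
        img = img.convert("RGB")  # JPEG has no alpha channel
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=85)
    return base64.b64encode(buf.getvalue()).decode()

def from_b64(data: str) -> Image.Image:
    return Image.open(io.BytesIO(base64.b64decode(data)))

# Round-trip check
original = Image.new("RGB", (64, 64), "orange")
assert from_b64(to_b64(original)).size == (64, 64)
```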
intern_project/corpus_collection_engine/activities/meme_creator.py DELETED
@@ -1,331 +0,0 @@
- """
- Meme Creator Activity - Create memes with captions in local dialects
- """
-
- import streamlit as st
- import io
- from PIL import Image, ImageDraw, ImageFont
- from typing import Dict, Any, Tuple, Optional
- import base64
- from datetime import datetime
-
- from corpus_collection_engine.activities.base_activity import BaseActivity
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType
- from corpus_collection_engine.services.storage_service import StorageService
-
-
- class MemeCreatorActivity(BaseActivity):
-     """Activity for creating memes with cultural captions"""
-
-     def __init__(self):
-         super().__init__(ActivityType.MEME)
-         self.storage_service = StorageService()
-
-         # Default meme templates (placeholder images)
-         self.meme_templates = {
-             "distracted_boyfriend": {
-                 "name": "Distracted Boyfriend",
-                 "description": "Classic meme template",
-                 "text_positions": [(100, 50), (300, 50), (500, 50)]
-             },
-             "drake_pointing": {
-                 "name": "Drake Pointing",
-                 "description": "Drake approval/disapproval meme",
-                 "text_positions": [(200, 100), (200, 300)]
-             },
-             "woman_yelling_cat": {
-                 "name": "Woman Yelling at Cat",
-                 "description": "Woman pointing at confused cat",
-                 "text_positions": [(100, 50), (400, 50)]
-             },
-             "custom": {
-                 "name": "Upload Your Own",
-                 "description": "Upload your own image",
-                 "text_positions": [(100, 50)]
-             }
-         }
-
-     def render_interface(self) -> None:
-         """Render the meme creator interface"""
-         # Step 1: Template Selection
-         st.subheader("🎭 Step 1: Choose Meme Template")
-
-         template_options = list(self.meme_templates.keys())
-         template_labels = [self.meme_templates[key]["name"] for key in template_options]
-
-         selected_template = st.selectbox(
-             "Select a meme template:",
-             template_options,
-             format_func=lambda x: self.meme_templates[x]["name"],
-             key="meme_template"
-         )
-
-         st.info(f"📝 {self.meme_templates[selected_template]['description']}")
-
-         # Step 2: Image Upload (if custom template)
-         uploaded_image = None
-         if selected_template == "custom":
-             st.subheader("📸 Step 2: Upload Your Image")
-             uploaded_image = st.file_uploader(
-                 "Choose an image file",
-                 type=['png', 'jpg', 'jpeg', 'webp'],
-                 key="meme_image_upload"
-             )
-
-             if uploaded_image:
-                 # Display uploaded image
-                 image = Image.open(uploaded_image)
-                 st.image(image, caption="Uploaded Image", use_container_width=True)
-         else:
-             # Show placeholder for template
-             st.subheader("📸 Step 2: Template Preview")
-             st.info(f"Using template: {self.meme_templates[selected_template]['name']}")
-             # In a real implementation, you'd show the actual template image
-             st.image("https://via.placeholder.com/500x300/cccccc/666666?text=Meme+Template",
-                      caption=f"Template: {self.meme_templates[selected_template]['name']}")
-
-         # Step 3: Text Input
-         st.subheader("✍️ Step 3: Add Your Caption")
-
-         # Language selection
-         language = self.render_language_selector("meme_language")
-
-         # Text inputs based on template
-         text_inputs = []
-         num_texts = len(self.meme_templates[selected_template]["text_positions"])
-
-         for i in range(num_texts):
-             text_label = f"Text {i+1}" if num_texts > 1 else "Caption"
-             text_input = st.text_area(
-                 f"{text_label}:",
-                 placeholder=f"Enter your {text_label.lower()} in {language}...",
-                 key=f"meme_text_{i}",
-                 height=80
-             )
-             text_inputs.append(text_input)
-
-         # Step 4: Meme Style Options
-         st.subheader("🎨 Step 4: Style Options")
-
-         col1, col2 = st.columns(2)
-         with col1:
-             font_size = st.slider("Font Size", 20, 60, 40, key="meme_font_size")
-             text_color = st.color_picker("Text Color", "#FFFFFF", key="meme_text_color")
-
-         with col2:
-             outline_color = st.color_picker("Outline Color", "#000000", key="meme_outline_color")
-             outline_width = st.slider("Outline Width", 0, 5, 2, key="meme_outline_width")
-
-         # Step 5: Cultural Context
-         cultural_context = self.render_cultural_context_form("meme_cultural")
-
-         # Step 6: Preview and Generate
-         st.subheader("👀 Step 5: Preview & Generate")
-
-         if st.button("🎨 Generate Meme Preview", key="generate_meme"):
-             if any(text.strip() for text in text_inputs):
-                 # Generate meme preview
-                 meme_image = self._generate_meme(
-                     selected_template, uploaded_image, text_inputs,
-                     font_size, text_color, outline_color, outline_width
-                 )
-
-                 if meme_image:
-                     st.image(meme_image, caption="Your Meme", use_container_width=True)
-
-                     # Prepare content data
-                     content_data = {
-                         "template": selected_template,
-                         "texts": [text.strip() for text in text_inputs if text.strip()],
-                         "style": {
-                             "font_size": font_size,
-                             "text_color": text_color,
-                             "outline_color": outline_color,
-                             "outline_width": outline_width
-                         },
-                         "image_data": self._image_to_base64(meme_image)
-                     }
-
-                     # Step 7: Submit
-                     contribution = self.render_submission_section(
-                         content_data, cultural_context, language
-                     )
-
-                     if contribution:
-                         # Save to storage
-                         success = self.storage_service.save_contribution(contribution)
-                         if success:
-                             st.success("🎉 Your meme has been added to the cultural corpus!")
-
-                             # Show some engagement
-                             with st.expander("🌟 Why This Matters"):
-                                 st.markdown(f"""
-                                 Your meme in **{language}** helps preserve:
-                                 - Local humor and cultural references
-                                 - Language expressions and slang
-                                 - Regional perspectives and contexts
-                                 - Digital cultural artifacts
-
-                                 Thank you for contributing to India's digital heritage! 🇮🇳
-                                 """)
-                         else:
-                             st.error("Failed to save your meme. Please try again.")
-             else:
-                 st.warning("Please add at least one caption to generate your meme!")
-
-     def _generate_meme(self, template: str, uploaded_image: Optional[Any],
-                        texts: list, font_size: int, text_color: str,
-                        outline_color: str, outline_width: int) -> Optional[Image.Image]:
-         """Generate meme image with text overlay"""
-         try:
-             # Create base image
-             if template == "custom" and uploaded_image:
-                 base_image = Image.open(uploaded_image).convert("RGB")
-             else:
-                 # Create placeholder image for template
-                 base_image = Image.new("RGB", (500, 300), color="lightgray")
-                 draw = ImageDraw.Draw(base_image)
-                 draw.text((250, 150), f"Template: {template}",
-                           fill="black", anchor="mm")
-
-             # Resize if too large
-             max_size = (800, 600)
-             base_image.thumbnail(max_size, Image.Resampling.LANCZOS)
-
-             # Create drawing context
-             draw = ImageDraw.Draw(base_image)
-
-             # Try to use a better font (fallback to default if not available)
-             try:
-                 # Try to load a system font that supports Unicode
-                 font = ImageFont.truetype("arial.ttf", font_size)
-             except Exception:
-                 try:
-                     font = ImageFont.load_default()
-                 except Exception:
-                     font = None
-
-             # Get text positions for this template
-             positions = self.meme_templates[template]["text_positions"]
-
-             # Add text overlays
-             for i, text in enumerate(texts):
-                 if text.strip() and i < len(positions):
-                     x, y = positions[i]
-
-                     # Adjust position based on image size
-                     img_width, img_height = base_image.size
-                     x = min(x, img_width - 50)
-                     y = min(y, img_height - 50)
-
-                     # Draw text with outline
-                     if outline_width > 0:
-                         # Draw outline
-                         for adj_x in range(-outline_width, outline_width + 1):
-                             for adj_y in range(-outline_width, outline_width + 1):
-                                 if adj_x != 0 or adj_y != 0:
-                                     draw.text((x + adj_x, y + adj_y), text,
-                                               font=font, fill=outline_color)
-
-                     # Draw main text
-                     draw.text((x, y), text, font=font, fill=text_color)
-
-             return base_image
-
-         except Exception as e:
-             st.error(f"Error generating meme: {str(e)}")
-             return None
-
-     def _image_to_base64(self, image: Image.Image) -> str:
-         """Convert PIL Image to base64 string"""
-         buffer = io.BytesIO()
-         image.save(buffer, format="PNG")
-         img_str = base64.b64encode(buffer.getvalue()).decode()
-         return img_str
-
-     def validate_content(self, content: Dict[str, Any]) -> Tuple[bool, str]:
-         """Validate meme content"""
-         if not content.get("texts"):
-             return False, "Meme must have at least one text caption"
-
-         # Check if any text is provided
-         texts = content.get("texts", [])
-         if not any(text.strip() for text in texts):
-             return False, "At least one caption must contain text"
-
-         # Validate text length
-         for text in texts:
-             if text.strip() and len(text.strip()) < 2:
-                 return False, "Captions must be at least 2 characters long"
-             if len(text) > 200:
-                 return False, "Captions must be less than 200 characters"
-
-         # Check template
-         if not content.get("template"):
-             return False, "Meme template must be specified"
-
-         if content["template"] not in self.meme_templates:
-             return False, "Invalid meme template"
-
-         return True, "Valid meme content"
-
-     def process_submission(self, data: Dict[str, Any]) -> UserContribution:
-         """Process meme submission and create UserContribution"""
-         # Get session ID from router if available
-         session_id = st.session_state.get('user_session_id', 'anonymous')
-
-         return UserContribution(
-             user_session=session_id,
-             activity_type=self.activity_type,
-             content_data=data["content_data"],
-             language=data["language"],
-             cultural_context=data["cultural_context"],
-             metadata={
-                 "template_used": data["content_data"].get("template"),
-                 "num_captions": len(data["content_data"].get("texts", [])),
-                 "submission_timestamp": datetime.now().isoformat(),
-                 "activity_version": "1.0"
-             }
-         )
-
-     def render_meme_gallery(self):
-         """Render gallery of recent memes (optional feature)"""
-         st.subheader("🖼️ Recent Community Memes")
-
-         # Get recent memes from storage
-         recent_contributions = self.storage_service.get_contributions_by_language(
-             st.session_state.get('selected_language', 'hi'), limit=6
-         )
-
-         meme_contributions = [
-             contrib for contrib in recent_contributions
-             if contrib.activity_type == ActivityType.MEME
-         ]
-
-         if meme_contributions:
-             cols = st.columns(3)
-             for i, contrib in enumerate(meme_contributions[:6]):
-                 col = cols[i % 3]
-                 with col:
-                     # Display meme info
-                     st.markdown(f"**Language:** {contrib.language}")
-                     if contrib.cultural_context.get("region"):
-                         st.markdown(f"**Region:** {contrib.cultural_context['region']}")
-
-                     # Show text content
-                     texts = contrib.content_data.get("texts", [])
-                     if texts:
-                         st.markdown(f"**Caption:** {texts[0][:50]}...")
-
-                     st.markdown("---")
-         else:
-             st.info("No memes yet. Be the first to create one! 🎭")
-
-     def run(self):
-         """Override run method to add gallery option"""
-         super().run()
-
-         # Add gallery section
-         st.markdown("---")
-         with st.expander("🖼️ Community Meme Gallery"):
-             self.render_meme_gallery()
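`_generate_meme` draws the caption outline by stamping the text at every offset in a (2w+1)² grid around the anchor point. Pillow 6.2+ can do the same in one call via `stroke_width`/`stroke_fill`; a sketch under that version assumption:

```python
from PIL import Image, ImageDraw, ImageFont

img = Image.new("RGB", (500, 300), "lightgray")
draw = ImageDraw.Draw(img)
try:
    font = ImageFont.truetype("arial.ttf", 40)  # stroke works best with TrueType fonts
except OSError:
    font = ImageFont.load_default()

# One call replaces the nested offset loop used in _generate_meme
draw.text((100, 50), "TOP TEXT", font=font, fill="#FFFFFF",
          stroke_width=2, stroke_fill="#000000")
```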
intern_project/corpus_collection_engine/activities/recipe_exchange.py DELETED
@@ -1,505 +0,0 @@
- """
- Recipe Exchange Activity - Share family recipes in native languages
- """
-
- import streamlit as st
- from typing import Dict, Any, Tuple, List, Optional
- from datetime import datetime
- import json
-
- from corpus_collection_engine.activities.base_activity import BaseActivity
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType
- from corpus_collection_engine.services.storage_service import StorageService
- from corpus_collection_engine.services.ai_service import AIService
-
-
- class RecipeExchangeActivity(BaseActivity):
-     """Activity for sharing traditional family recipes"""
-
-     def __init__(self):
-         super().__init__(ActivityType.RECIPE)
-         self.storage_service = StorageService()
-         self.ai_service = AIService()
-
-         # Recipe categories
-         self.recipe_categories = {
-             "main_course": "Main Course / मुख्य व्यंजन",
-             "appetizer": "Appetizer / स्टार्टर",
-             "dessert": "Dessert / मिठाई",
-             "snack": "Snack / नाश्ता",
-             "beverage": "Beverage / पेय",
-             "breakfast": "Breakfast / नाश्ता",
-             "festival_special": "Festival Special / त्योहारी व्यंजन",
-             "regional_specialty": "Regional Specialty / क्षेत्रीय विशेषता"
-         }
-
-         # Cooking methods
-         self.cooking_methods = [
-             "Boiling / उबालना",
-             "Frying / तलना",
-             "Steaming / भाप में पकाना",
-             "Roasting / भूनना",
-             "Grilling / ग्रिल करना",
-             "Baking / बेक करना",
-             "Pressure Cooking / प्रेशर कुकिंग",
-             "Slow Cooking / धीमी आंच पर पकाना"
-         ]
-
-         # Dietary preferences
-         self.dietary_types = [
-             "Vegetarian / शाकाहारी",
-             "Vegan / वीगन",
-             "Non-Vegetarian / मांसाहारी",
-             "Jain / जैन",
-             "Gluten-Free / ग्लूटन फ्री",
-             "Dairy-Free / डेयरी फ्री"
-         ]
-
-     def render_interface(self) -> None:
-         """Render the recipe exchange interface"""
-
-         # Step 1: Recipe Basic Information
-         st.subheader("🍛 Step 1: Recipe Information")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             recipe_name = st.text_input(
-                 "Recipe Name / व्यंजन का नाम:",
-                 placeholder="e.g., Grandma's Dal Tadka",
-                 key="recipe_name"
-             )
-
-             category = st.selectbox(
-                 "Category / श्रेणी:",
-                 list(self.recipe_categories.keys()),
-                 format_func=lambda x: self.recipe_categories[x],
-                 key="recipe_category"
-             )
-
-         with col2:
-             prep_time = st.number_input(
-                 "Preparation Time (minutes) / तैयारी का समय:",
-                 min_value=1,
-                 max_value=300,
-                 value=30,
-                 key="prep_time"
-             )
-
-             cook_time = st.number_input(
-                 "Cooking Time (minutes) / पकाने का समय:",
-                 min_value=1,
-                 max_value=480,
-                 value=45,
-                 key="cook_time"
-             )
-
-             servings = st.number_input(
-                 "Number of Servings / परोसने की मात्रा:",
-                 min_value=1,
-                 max_value=20,
-                 value=4,
-                 key="servings"
-             )
-
-         # Step 2: Language Selection
-         language = self.render_language_selector("recipe_language")
-
-         # Step 3: Ingredients
-         st.subheader("🥕 Step 2: Ingredients / सामग्री")
-
-         # Dynamic ingredient list
-         if 'recipe_ingredients' not in st.session_state:
-             st.session_state.recipe_ingredients = [{"name": "", "quantity": "", "unit": ""}]
-
-         ingredients = []
-
-         for i, ingredient in enumerate(st.session_state.recipe_ingredients):
-             col1, col2, col3, col4 = st.columns([3, 2, 2, 1])
-
-             with col1:
-                 name = st.text_input(
-                     f"Ingredient {i+1}:",
-                     value=ingredient["name"],
-                     placeholder="e.g., Basmati Rice / बासमती चावल",
-                     key=f"ingredient_name_{i}"
-                 )
-
-             with col2:
-                 quantity = st.text_input(
-                     "Quantity:",
-                     value=ingredient["quantity"],
-                     placeholder="e.g., 2",
-                     key=f"ingredient_quantity_{i}"
-                 )
-
-             with col3:
-                 unit_options = ["cups", "tbsp", "tsp", "kg", "grams", "pieces", "as needed"]
-                 unit = st.selectbox(
-                     "Unit:",
-                     unit_options,
-                     index=unit_options.index(ingredient["unit"]) if ingredient["unit"] in unit_options else 0,
-                     key=f"ingredient_unit_{i}"
-                 )
-
-             with col4:
-                 if st.button("❌", key=f"remove_ingredient_{i}"):
-                     if len(st.session_state.recipe_ingredients) > 1:
-                         st.session_state.recipe_ingredients.pop(i)
-                         st.rerun()
-
-             ingredients.append({
-                 "name": name,
-                 "quantity": quantity,
-                 "unit": unit
-             })
-
-         # Update session state
-         st.session_state.recipe_ingredients = ingredients
-
-         # Add ingredient button
-         if st.button("➕ Add Ingredient", key="add_ingredient"):
-             st.session_state.recipe_ingredients.append({"name": "", "quantity": "", "unit": ""})
-             st.rerun()
-
-         # Step 4: Instructions
-         st.subheader("📝 Step 3: Cooking Instructions / पकाने की विधि")
-
-         instructions = st.text_area(
-             "Step-by-step instructions:",
-             placeholder=f"Write detailed cooking instructions in {language}...\n\n1. First step...\n2. Second step...\n3. Final step...",
-             height=200,
-             key="recipe_instructions"
-         )
-
-         # Step 5: Additional Details
-         st.subheader("ℹ️ Step 4: Additional Details")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             cooking_method = st.multiselect(
-                 "Cooking Methods / पकाने की विधि:",
-                 self.cooking_methods,
-                 key="cooking_methods"
-             )
-
-             dietary_type = st.multiselect(
-                 "Dietary Type / आहार प्रकार:",
-                 self.dietary_types,
-                 key="dietary_types"
-             )
-
-         with col2:
-             difficulty_level = st.select_slider(
-                 "Difficulty Level / कठिनाई स्तर:",
-                 options=["Easy / आसान", "Medium / मध्यम", "Hard / कठिन"],
-                 value="Medium / मध्यम",
-                 key="difficulty"
-             )
-
-             spice_level = st.select_slider(
-                 "Spice Level / मसाला स्तर:",
-                 options=["Mild / हल्का", "Medium / मध्यम", "Spicy / तीखा", "Very Spicy / बहुत तीखा"],
-                 value="Medium / मध्यम",
-                 key="spice_level"
-             )
-
-         # Step 6: Family Story & Cultural Context
-         st.subheader("👨‍👩‍👧‍👦 Step 5: Family Story & Cultural Context")
-
-         family_story = st.text_area(
-             "Family Story / पारिवारिक कहानी:",
-             placeholder="Share the story behind this recipe - who taught you, when it's made, special memories...",
-             height=120,
-             key="family_story"
-         )
-
-         # Cultural context form
-         cultural_context = self.render_cultural_context_form("recipe_cultural")
-
-         # Add recipe-specific cultural context
-         col1, col2 = st.columns(2)
-         with col1:
-             occasion = st.text_input(
-                 "Special Occasion / विशेष अवसर:",
-                 placeholder="e.g., Diwali, Wedding, Daily meal",
-                 key="recipe_occasion"
-             )
-
-         with col2:
-             origin_story = st.text_input(
-                 "Origin / मूल स्थान:",
-                 placeholder="e.g., Grandmother's village, Family tradition",
-                 key="recipe_origin"
-             )
-
-         # Add to cultural context
-         cultural_context.update({
-             "family_story": family_story.strip() if family_story else "",
-             "occasion": occasion.strip() if occasion else "",
-             "origin_story": origin_story.strip() if origin_story else ""
-         })
-
-         # Step 7: AI Suggestions (Optional)
-         if st.checkbox("🤖 Get AI Suggestions", key="ai_suggestions"):
-             self._render_ai_suggestions(recipe_name, ingredients, language)
-
-         # Step 8: Preview and Submit
-         st.subheader("👀 Step 6: Preview & Submit")
-
-         # Validate required fields
-         valid_ingredients = [ing for ing in ingredients if ing["name"].strip()]
-
-         if recipe_name and instructions and len(valid_ingredients) > 0:
-             # Show recipe preview
-             with st.expander("📖 Recipe Preview"):
-                 self._render_recipe_preview(
-                     recipe_name, category, prep_time, cook_time, servings,
-                     valid_ingredients, instructions, cooking_method, dietary_type,
260
- )
261
-
262
- # Prepare content data
263
- content_data = {
264
- "title": recipe_name,
265
- "category": category,
266
- "prep_time": prep_time,
267
- "cook_time": cook_time,
268
- "servings": servings,
269
- "ingredients": valid_ingredients,
270
- "instructions": instructions,
271
- "cooking_methods": cooking_method,
272
- "dietary_types": dietary_type,
273
- "difficulty_level": difficulty_level,
274
- "spice_level": spice_level,
275
- "family_story": family_story
276
- }
277
-
278
- # Submit section
279
- contribution = self.render_submission_section(
280
- content_data, cultural_context, language
281
- )
282
-
283
- if contribution:
284
- # Save to storage
285
- success = self.storage_service.save_contribution(contribution)
286
- if success:
287
- st.success("🎉 Your family recipe has been added to the cultural corpus!")
288
-
289
- # Show impact message
290
- with st.expander("🌟 Why This Matters"):
291
- st.markdown(f"""
292
- Your recipe in **{language}** helps preserve:
293
- - Traditional cooking knowledge and techniques
294
- - Family stories and cultural memories
295
- - Regional food vocabulary and terminology
296
- - Culinary heritage for future generations
297
-
298
- Thank you for sharing your family's culinary wisdom! 🍛
299
- """)
300
-
301
- # Clear form
302
- if st.button("🆕 Share Another Recipe"):
303
- # Clear session state
304
- for key in list(st.session_state.keys()):
305
- if key.startswith(('recipe_', 'ingredient_')):
306
- del st.session_state[key]
307
- st.session_state.recipe_ingredients = [{"name": "", "quantity": "", "unit": ""}]
308
- st.rerun()
309
- else:
310
- st.error("Failed to save your recipe. Please try again.")
311
- else:
312
- st.warning("Please fill in the recipe name, instructions, and at least one ingredient!")
313
-
314
- def _render_ai_suggestions(self, recipe_name: str, ingredients: List[Dict], language: str):
315
- """Render AI-powered suggestions"""
316
- if recipe_name:
317
- st.markdown("**🤖 AI Suggestions:**")
318
-
319
- col1, col2 = st.columns(2)
320
-
321
- with col1:
322
- if st.button("💡 Suggest Cooking Tips", key="ai_tips"):
323
- with st.spinner("Generating cooking tips..."):
324
- tips, confidence = self.ai_service.generate_text(
325
- f"cooking tips for {recipe_name}",
326
- language,
327
- max_length=150
328
- )
329
- if tips:
330
- st.info(f"💡 **Tip:** {tips}")
331
-
332
- with col2:
333
- if st.button("🏷️ Suggest Tags", key="ai_tags"):
334
- ingredient_names = [ing["name"] for ing in ingredients if ing["name"].strip()]
335
- content = f"{recipe_name} {' '.join(ingredient_names)}"
336
-
337
- tags = self.ai_service.suggest_cultural_tags(content, language)
338
- if tags:
339
- st.info(f"🏷️ **Suggested tags:** {', '.join(tags[:5])}")
340
-
341
- def _render_recipe_preview(self, name: str, category: str, prep_time: int,
342
- cook_time: int, servings: int, ingredients: List[Dict],
343
- instructions: str, cooking_methods: List[str],
344
- dietary_types: List[str], difficulty: str,
345
- spice_level: str, family_story: str, language: str):
346
- """Render recipe preview"""
347
- st.markdown(f"# {name}")
348
-
349
- # Basic info
350
- col1, col2, col3, col4 = st.columns(4)
351
- with col1:
352
- st.metric("Prep Time", f"{prep_time} min")
353
- with col2:
354
- st.metric("Cook Time", f"{cook_time} min")
355
- with col3:
356
- st.metric("Servings", servings)
357
- with col4:
358
- st.metric("Total Time", f"{prep_time + cook_time} min")
359
-
360
- # Category and details
361
- st.markdown(f"**Category:** {self.recipe_categories[category]}")
362
- st.markdown(f"**Difficulty:** {difficulty}")
363
- st.markdown(f"**Spice Level:** {spice_level}")
364
-
365
- if dietary_types:
366
- st.markdown(f"**Dietary:** {', '.join(dietary_types)}")
367
-
368
- if cooking_methods:
369
- st.markdown(f"**Cooking Methods:** {', '.join(cooking_methods)}")
370
-
371
- # Ingredients
372
- st.markdown("## Ingredients")
373
- for ing in ingredients:
374
- if ing["name"].strip():
375
- st.markdown(f"- {ing['quantity']} {ing['unit']} {ing['name']}")
376
-
377
- # Instructions
378
- st.markdown("## Instructions")
379
- st.markdown(instructions)
380
-
381
- # Family story
382
- if family_story.strip():
383
- st.markdown("## Family Story")
384
- st.markdown(family_story)
385
-
386
- def validate_content(self, content: Dict[str, Any]) -> Tuple[bool, str]:
387
- """Validate recipe content"""
388
- # Check required fields
389
- required_fields = ["title", "ingredients", "instructions"]
390
- for field in required_fields:
391
- if not content.get(field):
392
- return False, f"Recipe must include {field}"
393
-
394
- # Validate title
395
- title = content["title"].strip()
396
- if len(title) < 3:
397
- return False, "Recipe title must be at least 3 characters long"
398
- if len(title) > 100:
399
- return False, "Recipe title must be less than 100 characters"
400
-
401
- # Validate ingredients
402
- ingredients = content.get("ingredients", [])
403
- valid_ingredients = [ing for ing in ingredients if ing.get("name", "").strip()]
404
- if len(valid_ingredients) == 0:
405
- return False, "Recipe must have at least one ingredient"
406
-
407
- # Validate instructions
408
- instructions = content["instructions"].strip()
409
- if len(instructions) < 20:
410
- return False, "Instructions must be at least 20 characters long"
411
- if len(instructions) > 5000:
412
- return False, "Instructions must be less than 5000 characters"
413
-
414
- # Validate time values
415
- if content.get("prep_time", 0) <= 0:
416
- return False, "Preparation time must be greater than 0"
417
- if content.get("cook_time", 0) <= 0:
418
- return False, "Cooking time must be greater than 0"
419
- if content.get("servings", 0) <= 0:
420
- return False, "Number of servings must be greater than 0"
421
-
422
- return True, "Valid recipe content"
423
-
424
- def process_submission(self, data: Dict[str, Any]) -> UserContribution:
425
- """Process recipe submission and create UserContribution"""
426
- # Get session ID from router if available
427
- session_id = st.session_state.get('user_session_id', 'anonymous')
428
-
429
- # Calculate total time
430
- total_time = data["content_data"].get("prep_time", 0) + data["content_data"].get("cook_time", 0)
431
-
432
- return UserContribution(
433
- user_session=session_id,
434
- activity_type=self.activity_type,
435
- content_data=data["content_data"],
436
- language=data["language"],
437
- cultural_context=data["cultural_context"],
438
- metadata={
439
- "recipe_category": data["content_data"].get("category"),
440
- "total_time_minutes": total_time,
441
- "ingredient_count": len([ing for ing in data["content_data"].get("ingredients", []) if ing.get("name", "").strip()]),
442
- "difficulty_level": data["content_data"].get("difficulty_level"),
443
- "dietary_types": data["content_data"].get("dietary_types", []),
444
- "submission_timestamp": datetime.now().isoformat(),
445
- "activity_version": "1.0"
446
- }
447
- )
448
-
449
- def render_recipe_gallery(self):
450
- """Render gallery of recent recipes"""
451
- st.subheader("🍽️ Community Recipe Collection")
452
-
453
- # Get recent recipes from storage
454
- recent_contributions = self.storage_service.get_contributions_by_language(
455
- st.session_state.get('selected_language', 'hi'), limit=9
456
- )
457
-
458
- recipe_contributions = [
459
- contrib for contrib in recent_contributions
460
- if contrib.activity_type == ActivityType.RECIPE
461
- ]
462
-
463
- if recipe_contributions:
464
- # Display recipes in grid
465
- cols = st.columns(3)
466
- for i, contrib in enumerate(recipe_contributions[:9]):
467
- col = cols[i % 3]
468
- with col:
469
- with st.container():
470
- st.markdown(f"**{contrib.content_data.get('title', 'Untitled Recipe')}**")
471
-
472
- # Recipe details
473
- category = contrib.content_data.get('category', 'unknown')
474
- if category in self.recipe_categories:
475
- st.markdown(f"*{self.recipe_categories[category]}*")
476
-
477
- # Time and servings
478
- prep_time = contrib.content_data.get('prep_time', 0)
479
- cook_time = contrib.content_data.get('cook_time', 0)
480
- servings = contrib.content_data.get('servings', 0)
481
-
482
- st.markdown(f"⏱️ {prep_time + cook_time} min | 👥 {servings} servings")
483
-
484
- # Language and region
485
- st.markdown(f"🌐 {contrib.language}")
486
- if contrib.cultural_context.get("region"):
487
- st.markdown(f"📍 {contrib.cultural_context['region']}")
488
-
489
- # Family story preview
490
- family_story = contrib.content_data.get('family_story', '')
491
- if family_story:
492
- st.markdown(f"👨‍👩‍👧‍👦 {family_story[:50]}...")
493
-
494
- st.markdown("---")
495
- else:
496
- st.info("No recipes yet. Be the first to share your family recipe! 🍛")
497
-
498
- def run(self):
499
- """Override run method to add gallery option"""
500
- super().run()
501
-
502
- # Add gallery section
503
- st.markdown("---")
504
- with st.expander("🍽️ Community Recipe Gallery"):
505
- self.render_recipe_gallery()
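
For reference, the `(is_valid, message)` contract implemented by `validate_content` above can be exercised outside Streamlit. A minimal sketch under the same rules (`check_recipe` and `sample` are illustrative helpers, not part of the deleted module):

```python
# Hypothetical standalone check mirroring validate_content's core rules.
from typing import Any, Dict, Tuple

def check_recipe(content: Dict[str, Any]) -> Tuple[bool, str]:
    for field in ("title", "ingredients", "instructions"):
        if not content.get(field):
            return False, f"Recipe must include {field}"
    if not any(ing.get("name", "").strip() for ing in content["ingredients"]):
        return False, "Recipe must have at least one ingredient"
    if len(content["instructions"].strip()) < 20:
        return False, "Instructions must be at least 20 characters long"
    return True, "Valid recipe content"

sample = {
    "title": "Dal Tadka",
    "ingredients": [{"name": "Toor dal", "quantity": "1", "unit": "cups"}],
    "instructions": "Rinse and pressure cook the dal, then temper with ghee and cumin.",
}
print(check_recipe(sample))  # (True, 'Valid recipe content')
```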
 
intern_project/corpus_collection_engine/app.py DELETED
@@ -1,17 +0,0 @@
1
- #!/usr/bin/env python3
2
- """
3
- Corpus Collection Engine - Hugging Face Spaces Entry Point
4
- AI-powered app for collecting diverse data on Indian languages, history, and culture
5
- """
6
-
7
- import sys
8
- import os
9
-
10
- # Add the parent directory to Python path so the package import below resolves
11
- sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
12
-
13
- # Import and run the main application
14
- from corpus_collection_engine.main import main
15
-
16
- if __name__ == "__main__":
17
- main()
 
intern_project/corpus_collection_engine/config.py DELETED
@@ -1,71 +0,0 @@
1
- """
2
- Configuration settings for the Corpus Collection Engine
3
- """
4
-
5
- import os
6
- from pathlib import Path
7
- from typing import List, Dict
8
-
9
- # Project paths
10
- PROJECT_ROOT = Path(__file__).parent.parent
11
- DATA_DIR = PROJECT_ROOT / "data"
12
- MODELS_DIR = PROJECT_ROOT / "models"
13
- CACHE_DIR = PROJECT_ROOT / ".cache"
14
-
15
- # Supported Indic languages
16
- SUPPORTED_LANGUAGES: Dict[str, str] = {
17
- 'hi': 'Hindi',
18
- 'bn': 'Bengali',
19
- 'ta': 'Tamil',
20
- 'te': 'Telugu',
21
- 'ml': 'Malayalam',
22
- 'kn': 'Kannada',
23
- 'gu': 'Gujarati',
24
- 'mr': 'Marathi',
25
- 'pa': 'Punjabi',
26
- 'or': 'Odia',
27
- 'en': 'English'
28
- }
29
-
30
- # Activity types
31
- ACTIVITY_TYPES: List[str] = [
32
- 'meme',
33
- 'recipe',
34
- 'folklore',
35
- 'landmark'
36
- ]
37
-
38
- # AI model configurations
39
- AI_CONFIG = {
40
- 'text_model': 'sarvamai/sarvam-1',
41
- 'vision_model': 'microsoft/DiT-base',
42
- 'max_tokens': 512,
43
- 'temperature': 0.7
44
- }
45
-
46
- # Database configuration
47
- DATABASE_CONFIG = {
48
- 'local_db': 'sqlite:///corpus_collection.db',
49
- 'remote_db': os.getenv('DATABASE_URL', ''),
50
- 'batch_size': 100
51
- }
52
-
53
- # PWA and offline configuration
54
- PWA_CONFIG = {
55
- 'cache_version': 'v1.0.0',
56
- 'offline_timeout': 5000, # milliseconds
57
- 'sync_interval': 300000, # 5 minutes in milliseconds
58
- 'max_offline_storage': 50 * 1024 * 1024 # 50MB
59
- }
60
-
61
- # Content validation settings
62
- VALIDATION_CONFIG = {
63
- 'min_text_length': 10,
64
- 'max_text_length': 5000,
65
- 'max_image_size': 10 * 1024 * 1024, # 10MB
66
- 'allowed_image_types': ['jpg', 'jpeg', 'png', 'webp']
67
- }
68
-
69
- # Create necessary directories
70
- for directory in [DATA_DIR, MODELS_DIR, CACHE_DIR]:
71
- directory.mkdir(exist_ok=True)
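
A quick sketch of how the `SUPPORTED_LANGUAGES` mapping defined here typically backs a Streamlit selector elsewhere in the app, displaying names while storing codes (the surrounding script is illustrative):

```python
import streamlit as st
from corpus_collection_engine.config import SUPPORTED_LANGUAGES

# Show human-readable names ('Hindi') in the dropdown, store codes ('hi').
lang_code = st.selectbox(
    "Language:",
    options=list(SUPPORTED_LANGUAGES.keys()),
    format_func=lambda code: SUPPORTED_LANGUAGES[code],
)
st.write(f"Selected language code: {lang_code}")
```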
 
intern_project/corpus_collection_engine/data/corpus_collection.db DELETED
Binary file (53.2 kB)
 
intern_project/corpus_collection_engine/main.py DELETED
@@ -1,212 +0,0 @@
1
- """
2
- Corpus Collection Engine - Main Streamlit Application
3
- AI-powered app for collecting diverse data on Indian languages, history, and culture
4
- """
5
-
6
- import streamlit as st
7
- import sys
8
- import os
9
-
10
- # Add the parent directory to Python path for imports
11
- sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
12
-
13
- from corpus_collection_engine.activities.activity_router import ActivityRouter
14
- from corpus_collection_engine.utils.performance_optimizer import PerformanceOptimizer
15
- from corpus_collection_engine.utils.error_handler import global_error_handler, ErrorCategory, ErrorSeverity
16
- from corpus_collection_engine.services.privacy_service import PrivacyService
17
- from corpus_collection_engine.services.engagement_service import EngagementService
18
- from corpus_collection_engine.pwa.pwa_manager import PWAManager
19
- from corpus_collection_engine.utils.performance_dashboard import PerformanceDashboard
20
-
21
- # Configure Streamlit page
22
- st.set_page_config(
23
- page_title="Corpus Collection Engine",
24
- page_icon="🇮🇳",
25
- layout="wide",
26
- initial_sidebar_state="expanded"
27
- )
28
-
29
- def initialize_application():
30
- """Initialize all application services and components"""
31
- # Initialize session state for global app management
32
- if 'app_initialized' not in st.session_state:
33
- st.session_state.app_initialized = False
34
- st.session_state.privacy_consent_given = False
35
- st.session_state.onboarding_completed = False
36
- st.session_state.admin_mode = False
37
-
38
- # Initialize services
39
- services = {}
40
-
41
- try:
42
- # Performance optimization
43
- services['optimizer'] = PerformanceOptimizer()
44
- services['optimizer'].initialize_performance_optimization()
45
-
46
- # Privacy service
47
- services['privacy'] = PrivacyService()
48
-
49
- # Engagement service
50
- services['engagement'] = EngagementService()
51
-
52
- # PWA manager
53
- services['pwa'] = PWAManager()
54
- services['pwa'].initialize_pwa()
55
-
56
- # Performance dashboard (for admin mode)
57
- services['performance_dashboard'] = PerformanceDashboard()
58
-
59
- st.session_state.app_initialized = True
60
- return services
61
-
62
- except Exception as e:
63
- global_error_handler.handle_error(
64
- e,
65
- ErrorCategory.SYSTEM,
66
- ErrorSeverity.HIGH,
67
- context={'component': 'app_initialization'},
68
- show_user_message=True
69
- )
70
- return {}
71
-
72
- def render_admin_interface(services):
73
- """Render admin interface for monitoring and management"""
74
- if not st.session_state.get('admin_mode', False):
75
- return
76
-
77
- with st.sidebar.expander("🔧 Admin Panel"):
78
- st.markdown("**System Monitoring**")
79
-
80
- if st.button("📊 Performance Dashboard"):
81
- st.session_state.show_performance_dashboard = True
82
-
83
- if st.button("🚨 Error Dashboard"):
84
- st.session_state.show_error_dashboard = True
85
-
86
- if st.button("📈 Analytics Dashboard"):
87
- st.session_state.show_analytics_dashboard = True
88
-
89
- st.markdown("**System Actions**")
90
-
91
- if st.button("🧹 Clear Cache"):
92
- st.cache_data.clear()
93
- st.success("Cache cleared!")
94
-
95
- if st.button("🔄 Reset Session"):
96
- for key in list(st.session_state.keys()):
97
- if key not in ['app_initialized']:
98
- del st.session_state[key]
99
- st.success("Session reset!")
100
- st.rerun()
101
-
102
- def render_admin_dashboards(services):
103
- """Render admin dashboards when requested"""
104
- if st.session_state.get('show_performance_dashboard', False):
105
- st.markdown("---")
106
- services['performance_dashboard'].render_dashboard()
107
- if st.button("❌ Close Performance Dashboard"):
108
- st.session_state.show_performance_dashboard = False
109
- st.rerun()
110
-
111
- if st.session_state.get('show_error_dashboard', False):
112
- st.markdown("---")
113
- global_error_handler.render_error_dashboard()
114
- if st.button("❌ Close Error Dashboard"):
115
- st.session_state.show_error_dashboard = False
116
- st.rerun()
117
-
118
- if st.session_state.get('show_analytics_dashboard', False):
119
- st.markdown("---")
120
- if 'router' in st.session_state:
121
- router = st.session_state.router
122
- if hasattr(router, 'analytics_service'):
123
- router.analytics_service.render_analytics_dashboard()
124
- if st.button("❌ Close Analytics Dashboard"):
125
- st.session_state.show_analytics_dashboard = False
126
- st.rerun()
127
-
128
- def handle_privacy_consent(privacy_service):
129
- """Handle privacy consent flow - Auto-consent for public deployment"""
130
- # Auto-consent for Hugging Face Spaces deployment
131
- if not st.session_state.privacy_consent_given:
132
- st.session_state.privacy_consent_given = True
133
- # Initialize privacy service without requiring explicit consent
134
- privacy_service.initialize_privacy_management()
135
-
136
- def handle_onboarding(engagement_service):
137
- """Handle user onboarding flow - Optional for public deployment"""
138
- if not st.session_state.onboarding_completed and st.session_state.privacy_consent_given:
139
- # Auto-complete onboarding for public deployment
140
- st.session_state.onboarding_completed = True
141
-
142
- # Show optional welcome message in sidebar
143
- with st.sidebar:
144
- st.success("🎉 Welcome to Corpus Collection Engine!")
145
- st.markdown("Help preserve Indian cultural heritage through AI!")
146
-
147
- if st.button("ℹ️ Show Quick Guide"):
148
- st.session_state.show_quick_guide = True
149
-
150
- def enable_admin_mode():
151
- """Enable admin mode for Hugging Face Spaces deployment"""
152
- # Admin mode is always enabled for public deployment
153
- st.session_state.admin_mode = True
154
-
155
- def main():
156
- """Main application entry point"""
157
- try:
158
- # Initialize application services
159
- services = initialize_application()
160
-
161
- if not services:
162
- st.error("Failed to initialize application services. Please refresh the page.")
163
- return
164
-
165
- # Show performance indicator
166
- services['optimizer'].render_performance_indicator()
167
-
168
- # Apply Streamlit-specific optimizations
169
- services['optimizer'].optimize_streamlit_config()
170
-
171
- # Enable admin mode for public deployment
172
- enable_admin_mode()
173
-
174
- # Handle privacy consent
175
- handle_privacy_consent(services['privacy'])
176
-
177
- # Handle onboarding
178
- handle_onboarding(services['engagement'])
179
-
180
- # Initialize activity router
181
- router = ActivityRouter()
182
- st.session_state.router = router # Store for admin access
183
-
184
- # Render admin interface
185
- render_admin_interface(services)
186
-
187
- # Run main application
188
- router.run()
189
-
190
- # Render admin dashboards if requested
191
- render_admin_dashboards(services)
192
-
193
- # Show engagement features
194
- services['engagement'].render_session_summary()
195
-
196
- except Exception as e:
197
- # Handle critical application errors
198
- global_error_handler.handle_error(
199
- e,
200
- ErrorCategory.SYSTEM,
201
- ErrorSeverity.CRITICAL,
202
- context={'component': 'main_application'},
203
- show_user_message=True
204
- )
205
-
206
- # Show fallback interface
207
- st.error("🚨 Critical application error occurred. Please refresh the page.")
208
- if st.button("🔄 Refresh Application"):
209
- st.rerun()
210
-
211
- if __name__ == "__main__":
212
- main()
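
One pattern worth isolating from `initialize_application` above: Streamlit reruns the entire script on every interaction, so expensive one-time setup is guarded behind `st.session_state`. A minimal sketch (`build_services` is an illustrative stand-in):

```python
import streamlit as st

def build_services() -> dict:
    # Stand-in for constructing PerformanceOptimizer, PrivacyService, etc.
    return {"ready": True}

if "app_initialized" not in st.session_state:
    st.session_state.services = build_services()  # runs once per session
    st.session_state.app_initialized = True

st.write(st.session_state.services)  # reused on every rerun
```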
 
intern_project/corpus_collection_engine/models/__init__.py DELETED
@@ -1 +0,0 @@
1
- # Data models for user contributions and corpus entries
 
 
intern_project/corpus_collection_engine/models/data_models.py DELETED
@@ -1,149 +0,0 @@
1
- """
2
- Core data models for the Corpus Collection Engine
3
- """
4
-
5
- from dataclasses import dataclass, field
6
- from datetime import datetime
7
- from typing import Dict, List, Optional, Any
8
- from enum import Enum
9
- import uuid
10
- import json
11
-
12
-
13
- class ActivityType(Enum):
14
- """Supported activity types"""
15
- MEME = "meme"
16
- RECIPE = "recipe"
17
- FOLKLORE = "folklore"
18
- LANDMARK = "landmark"
19
-
20
-
21
- class ValidationStatus(Enum):
22
- """Content validation status"""
23
- PENDING = "pending"
24
- APPROVED = "approved"
25
- REJECTED = "rejected"
26
- NEEDS_REVIEW = "needs_review"
27
-
28
-
29
- @dataclass
30
- class UserContribution:
31
- """Model for user contributions across all activities"""
32
- id: str = field(default_factory=lambda: str(uuid.uuid4()))
33
- user_session: str = ""
34
- activity_type: ActivityType = ActivityType.MEME
35
- content_data: Dict[str, Any] = field(default_factory=dict)
36
- language: str = "en"
37
- region: Optional[str] = None
38
- cultural_context: Dict[str, Any] = field(default_factory=dict)
39
- timestamp: datetime = field(default_factory=datetime.now)
40
- validation_status: ValidationStatus = ValidationStatus.PENDING
41
- metadata: Dict[str, Any] = field(default_factory=dict)
42
-
43
- def to_dict(self) -> Dict[str, Any]:
44
- """Convert to dictionary for storage"""
45
- return {
46
- 'id': self.id,
47
- 'user_session': self.user_session,
48
- 'activity_type': self.activity_type.value,
49
- 'content_data': json.dumps(self.content_data),
50
- 'language': self.language,
51
- 'region': self.region,
52
- 'cultural_context': json.dumps(self.cultural_context),
53
- 'timestamp': self.timestamp.isoformat(),
54
- 'validation_status': self.validation_status.value,
55
- 'metadata': json.dumps(self.metadata)
56
- }
57
-
58
- @classmethod
59
- def from_dict(cls, data: Dict[str, Any]) -> 'UserContribution':
60
- """Create instance from dictionary"""
61
- return cls(
62
- id=data['id'],
63
- user_session=data['user_session'],
64
- activity_type=ActivityType(data['activity_type']),
65
- content_data=json.loads(data['content_data']),
66
- language=data['language'],
67
- region=data.get('region'),
68
- cultural_context=json.loads(data['cultural_context']),
69
- timestamp=datetime.fromisoformat(data['timestamp']),
70
- validation_status=ValidationStatus(data['validation_status']),
71
- metadata=json.loads(data['metadata'])
72
- )
73
-
74
-
75
- @dataclass
76
- class CorpusEntry:
77
- """Model for processed corpus entries"""
78
- id: str = field(default_factory=lambda: str(uuid.uuid4()))
79
- contribution_id: str = ""
80
- text_content: Optional[str] = None
81
- image_content: Optional[bytes] = None
82
- language: str = "en"
83
- cultural_tags: List[str] = field(default_factory=list)
84
- quality_score: float = 0.0
85
- processed_features: Dict[str, Any] = field(default_factory=dict)
86
- created_at: datetime = field(default_factory=datetime.now)
87
-
88
- def to_dict(self) -> Dict[str, Any]:
89
- """Convert to dictionary for storage"""
90
- return {
91
- 'id': self.id,
92
- 'contribution_id': self.contribution_id,
93
- 'text_content': self.text_content,
94
- 'image_content': self.image_content,
95
- 'language': self.language,
96
- 'cultural_tags': json.dumps(self.cultural_tags),
97
- 'quality_score': self.quality_score,
98
- 'processed_features': json.dumps(self.processed_features),
99
- 'created_at': self.created_at.isoformat()
100
- }
101
-
102
- @classmethod
103
- def from_dict(cls, data: Dict[str, Any]) -> 'CorpusEntry':
104
- """Create instance from dictionary"""
105
- return cls(
106
- id=data['id'],
107
- contribution_id=data['contribution_id'],
108
- text_content=data.get('text_content'),
109
- image_content=data.get('image_content'),
110
- language=data['language'],
111
- cultural_tags=json.loads(data['cultural_tags']),
112
- quality_score=data['quality_score'],
113
- processed_features=json.loads(data['processed_features']),
114
- created_at=datetime.fromisoformat(data['created_at'])
115
- )
116
-
117
-
118
- @dataclass
119
- class ActivitySession:
120
- """Model for tracking user activity sessions"""
121
- session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
122
- user_id: Optional[str] = None
123
- activity_type: ActivityType = ActivityType.MEME
124
- start_time: datetime = field(default_factory=datetime.now)
125
- contributions: List[str] = field(default_factory=list)
126
- engagement_metrics: Dict[str, Any] = field(default_factory=dict)
127
-
128
- def to_dict(self) -> Dict[str, Any]:
129
- """Convert to dictionary for storage"""
130
- return {
131
- 'session_id': self.session_id,
132
- 'user_id': self.user_id,
133
- 'activity_type': self.activity_type.value,
134
- 'start_time': self.start_time.isoformat(),
135
- 'contributions': json.dumps(self.contributions),
136
- 'engagement_metrics': json.dumps(self.engagement_metrics)
137
- }
138
-
139
- @classmethod
140
- def from_dict(cls, data: Dict[str, Any]) -> 'ActivitySession':
141
- """Create instance from dictionary"""
142
- return cls(
143
- session_id=data['session_id'],
144
- user_id=data.get('user_id'),
145
- activity_type=ActivityType(data['activity_type']),
146
- start_time=datetime.fromisoformat(data['start_time']),
147
- contributions=json.loads(data['contributions']),
148
- engagement_metrics=json.loads(data['engagement_metrics'])
149
- )
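
The `to_dict`/`from_dict` pairs above are designed to round-trip through storage: nested dicts are JSON-encoded on the way out and decoded on the way back. A sketch with illustrative values:

```python
from corpus_collection_engine.models.data_models import ActivityType, UserContribution

contrib = UserContribution(
    user_session="session-123",
    activity_type=ActivityType.RECIPE,
    content_data={"title": "Dal Tadka"},
    language="hi",
    cultural_context={"region": "Punjab", "cultural_significance": "daily meal"},
)

row = contrib.to_dict()                      # JSON strings, ready for SQLite
restored = UserContribution.from_dict(row)   # decoded back into the dataclass

assert restored.content_data == contrib.content_data
assert restored.activity_type is ActivityType.RECIPE
```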
 
intern_project/corpus_collection_engine/models/validation.py DELETED
@@ -1,223 +0,0 @@
1
- """
2
- Validation functions for data models and user input
3
- """
4
-
5
- from typing import Dict, List, Tuple, Any, Optional
6
- import re
7
- from datetime import datetime
8
- from corpus_collection_engine.models.data_models import UserContribution, CorpusEntry, ActivitySession, ActivityType
9
- from corpus_collection_engine.config import VALIDATION_CONFIG, SUPPORTED_LANGUAGES
10
-
11
-
12
- class ValidationError(Exception):
13
- """Custom exception for validation errors"""
14
- pass
15
-
16
-
17
- class DataValidator:
18
- """Validator class for all data models and user input"""
19
-
20
- @staticmethod
21
- def validate_text_content(text: str, min_length: int = None, max_length: int = None) -> Tuple[bool, str]:
22
- """Validate text content length and basic format"""
23
- if not text or not text.strip():
24
- return False, "Text content cannot be empty"
25
-
26
- text = text.strip()
27
- min_len = min_length or VALIDATION_CONFIG['min_text_length']
28
- max_len = max_length or VALIDATION_CONFIG['max_text_length']
29
-
30
- if len(text) < min_len:
31
- return False, f"Text must be at least {min_len} characters long"
32
-
33
- if len(text) > max_len:
34
- return False, f"Text must not exceed {max_len} characters"
35
-
36
- # Check for suspicious patterns (basic spam detection)
37
- if re.search(r'(.)\1{10,}', text): # Repeated characters
38
- return False, "Text contains suspicious repeated patterns"
39
-
40
- return True, "Valid text content"
41
-
42
- @staticmethod
43
- def validate_language_code(language: str) -> Tuple[bool, str]:
44
- """Validate language code against supported languages"""
45
- if not language:
46
- return False, "Language code cannot be empty"
47
-
48
- if language not in SUPPORTED_LANGUAGES:
49
- return False, f"Unsupported language code: {language}"
50
-
51
- return True, f"Valid language: {SUPPORTED_LANGUAGES[language]}"
52
-
53
- @staticmethod
54
- def validate_image_data(image_data: bytes, max_size: int = None) -> Tuple[bool, str]:
55
- """Validate image data size and basic format"""
56
- if not image_data:
57
- return False, "Image data cannot be empty"
58
-
59
- max_size = max_size or VALIDATION_CONFIG['max_image_size']
60
-
61
- if len(image_data) > max_size:
62
- size_mb = len(image_data) / (1024 * 1024)
63
- max_mb = max_size / (1024 * 1024)
64
- return False, f"Image size ({size_mb:.1f}MB) exceeds maximum ({max_mb:.1f}MB)"
65
-
66
- # Basic image format validation (check for common headers)
67
- image_headers = {
68
- b'\xff\xd8\xff': 'JPEG',
69
- b'\x89PNG\r\n\x1a\n': 'PNG',
70
- b'RIFF': 'WEBP'
71
- }
72
-
73
- is_valid_image = any(image_data.startswith(header) for header in image_headers.keys())
74
- if not is_valid_image:
75
- return False, "Invalid image format. Supported: JPEG, PNG, WEBP"
76
-
77
- return True, "Valid image data"
78
-
79
- @staticmethod
80
- def validate_cultural_context(context: Dict[str, Any]) -> Tuple[bool, str]:
81
- """Validate cultural context data"""
82
- if not isinstance(context, dict):
83
- return False, "Cultural context must be a dictionary"
84
-
85
- # Check for required fields based on activity type
86
- required_fields = ['region', 'cultural_significance']
87
- missing_fields = [field for field in required_fields if field not in context]
88
-
89
- if missing_fields:
90
- return False, f"Missing required cultural context fields: {missing_fields}"
91
-
92
- # Validate region if provided
93
- if 'region' in context and context['region']:
94
- region = context['region'].strip()
95
- if len(region) < 2:
96
- return False, "Region must be at least 2 characters long"
97
-
98
- return True, "Valid cultural context"
99
-
100
- @classmethod
101
- def validate_user_contribution(cls, contribution: UserContribution) -> Tuple[bool, List[str]]:
102
- """Comprehensive validation for UserContribution"""
103
- errors = []
104
-
105
- # Validate basic fields
106
- if not contribution.user_session:
107
- errors.append("User session ID is required")
108
-
109
- if not isinstance(contribution.activity_type, ActivityType):
110
- errors.append("Invalid activity type")
111
-
112
- # Validate language
113
- is_valid_lang, lang_msg = cls.validate_language_code(contribution.language)
114
- if not is_valid_lang:
115
- errors.append(lang_msg)
116
-
117
- # Validate content data based on activity type
118
- content_errors = cls._validate_activity_content(
119
- contribution.activity_type,
120
- contribution.content_data
121
- )
122
- errors.extend(content_errors)
123
-
124
- # Validate cultural context
125
- is_valid_context, context_msg = cls.validate_cultural_context(contribution.cultural_context)
126
- if not is_valid_context:
127
- errors.append(context_msg)
128
-
129
- # Validate timestamp
130
- if contribution.timestamp > datetime.now():
131
- errors.append("Timestamp cannot be in the future")
132
-
133
- return len(errors) == 0, errors
134
-
135
- @classmethod
136
- def _validate_activity_content(cls, activity_type: ActivityType, content_data: Dict[str, Any]) -> List[str]:
137
- """Validate content data specific to activity type"""
138
- errors = []
139
-
140
- if activity_type == ActivityType.MEME:
141
- if 'text' not in content_data:
142
- errors.append("Meme content must include text")
143
- else:
144
- is_valid, msg = cls.validate_text_content(content_data['text'])
145
- if not is_valid:
146
- errors.append(f"Meme text: {msg}")
147
-
148
- elif activity_type == ActivityType.RECIPE:
149
- required_fields = ['title', 'ingredients', 'instructions']
150
- for field in required_fields:
151
- if field not in content_data:
152
- errors.append(f"Recipe content must include {field}")
153
- elif not content_data[field]:
154
- errors.append(f"Recipe {field} cannot be empty")
155
-
156
- elif activity_type == ActivityType.FOLKLORE:
157
- if 'story' not in content_data:
158
- errors.append("Folklore content must include story")
159
- else:
160
- is_valid, msg = cls.validate_text_content(content_data['story'], min_length=50)
161
- if not is_valid:
162
- errors.append(f"Folklore story: {msg}")
163
-
164
- elif activity_type == ActivityType.LANDMARK:
165
- if 'description' not in content_data:
166
- errors.append("Landmark content must include description")
167
- else:
168
- is_valid, msg = cls.validate_text_content(content_data['description'])
169
- if not is_valid:
170
- errors.append(f"Landmark description: {msg}")
171
-
172
- return errors
173
-
174
- @classmethod
175
- def validate_corpus_entry(cls, entry: CorpusEntry) -> Tuple[bool, List[str]]:
176
- """Comprehensive validation for CorpusEntry"""
177
- errors = []
178
-
179
- if not entry.contribution_id:
180
- errors.append("Contribution ID is required")
181
-
182
- # Must have either text or image content
183
- if not entry.text_content and not entry.image_content:
184
- errors.append("Corpus entry must have either text or image content")
185
-
186
- # Validate text content if present
187
- if entry.text_content:
188
- is_valid, msg = cls.validate_text_content(entry.text_content)
189
- if not is_valid:
190
- errors.append(f"Text content: {msg}")
191
-
192
- # Validate image content if present
193
- if entry.image_content:
194
- is_valid, msg = cls.validate_image_data(entry.image_content)
195
- if not is_valid:
196
- errors.append(f"Image content: {msg}")
197
-
198
- # Validate language
199
- is_valid_lang, lang_msg = cls.validate_language_code(entry.language)
200
- if not is_valid_lang:
201
- errors.append(lang_msg)
202
-
203
- # Validate quality score
204
- if not 0.0 <= entry.quality_score <= 1.0:
205
- errors.append("Quality score must be between 0.0 and 1.0")
206
-
207
- return len(errors) == 0, errors
208
-
209
- @classmethod
210
- def validate_activity_session(cls, session: ActivitySession) -> Tuple[bool, List[str]]:
211
- """Comprehensive validation for ActivitySession"""
212
- errors = []
213
-
214
- if not session.session_id:
215
- errors.append("Session ID is required")
216
-
217
- if not isinstance(session.activity_type, ActivityType):
218
- errors.append("Invalid activity type")
219
-
220
- if session.start_time > datetime.now():
221
- errors.append("Start time cannot be in the future")
222
-
223
- return len(errors) == 0, errors
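
Usage sketch for the validator above: errors accumulate into a list rather than failing fast, so a UI can report every problem at once (field values are illustrative):

```python
from corpus_collection_engine.models.data_models import ActivityType, UserContribution
from corpus_collection_engine.models.validation import DataValidator

contrib = UserContribution(
    user_session="session-123",
    activity_type=ActivityType.FOLKLORE,
    content_data={"story": "A short tale."},  # under the 50-character folklore minimum
    language="ta",
    cultural_context={"region": "Tamil Nadu", "cultural_significance": "oral tradition"},
)

is_valid, errors = DataValidator.validate_user_contribution(contrib)
print(is_valid)  # False
print(errors)    # ['Folklore story: Text must be at least 50 characters long']
```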
 
intern_project/corpus_collection_engine/pwa/offline.html DELETED
@@ -1,256 +0,0 @@
1
- <!DOCTYPE html>
2
- <html lang="en">
3
- <head>
4
- <meta charset="UTF-8">
5
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
- <title>Offline - Corpus Collection Engine</title>
7
- <style>
8
- body {
9
- font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
10
- margin: 0;
11
- padding: 0;
12
- background: linear-gradient(135deg, #FF6B35, #F7931E);
13
- color: white;
14
- min-height: 100vh;
15
- display: flex;
16
- align-items: center;
17
- justify-content: center;
18
- }
19
-
20
- .offline-container {
21
- text-align: center;
22
- padding: 40px 20px;
23
- max-width: 500px;
24
- }
25
-
26
- .offline-icon {
27
- font-size: 80px;
28
- margin-bottom: 20px;
29
- opacity: 0.8;
30
- }
31
-
32
- .offline-title {
33
- font-size: 32px;
34
- font-weight: bold;
35
- margin-bottom: 16px;
36
- }
37
-
38
- .offline-message {
39
- font-size: 18px;
40
- line-height: 1.6;
41
- margin-bottom: 30px;
42
- opacity: 0.9;
43
- }
44
-
45
- .offline-features {
46
- background: rgba(255, 255, 255, 0.1);
47
- border-radius: 12px;
48
- padding: 24px;
49
- margin: 30px 0;
50
- text-align: left;
51
- }
52
-
53
- .offline-features h3 {
54
- margin-top: 0;
55
- margin-bottom: 16px;
56
- font-size: 20px;
57
- }
58
-
59
- .offline-features ul {
60
- list-style: none;
61
- padding: 0;
62
- margin: 0;
63
- }
64
-
65
- .offline-features li {
66
- padding: 8px 0;
67
- padding-left: 24px;
68
- position: relative;
69
- }
70
-
71
- .offline-features li:before {
72
- content: "✓";
73
- position: absolute;
74
- left: 0;
75
- color: #4CAF50;
76
- font-weight: bold;
77
- }
78
-
79
- .retry-button {
80
- background: white;
81
- color: #FF6B35;
82
- border: none;
83
- padding: 12px 24px;
84
- border-radius: 6px;
85
- font-size: 16px;
86
- font-weight: bold;
87
- cursor: pointer;
88
- transition: transform 0.2s;
89
- }
90
-
91
- .retry-button:hover {
92
- transform: translateY(-2px);
93
- }
94
-
95
- .retry-button:active {
96
- transform: translateY(0);
97
- }
98
-
99
- .connection-status {
100
- margin-top: 20px;
101
- padding: 12px;
102
- border-radius: 6px;
103
- background: rgba(255, 255, 255, 0.1);
104
- font-size: 14px;
105
- }
106
-
107
- .status-online {
108
- background: rgba(76, 175, 80, 0.2);
109
- }
110
-
111
- .status-offline {
112
- background: rgba(244, 67, 54, 0.2);
113
- }
114
-
115
- @keyframes pulse {
116
- 0% { opacity: 1; }
117
- 50% { opacity: 0.5; }
118
- 100% { opacity: 1; }
119
- }
120
-
121
- .checking {
122
- animation: pulse 2s infinite;
123
- }
124
- </style>
125
- </head>
126
- <body>
127
- <div class="offline-container">
128
- <div class="offline-icon">📡</div>
129
-
130
- <h1 class="offline-title">You're Offline</h1>
131
-
132
- <p class="offline-message">
133
- Don't worry! The Corpus Collection Engine works offline too.
134
- Your cultural contributions will be saved locally and synced when you're back online.
135
- </p>
136
-
137
- <div class="offline-features">
138
- <h3>🌟 What you can still do offline:</h3>
139
- <ul>
140
- <li>Create memes with local dialect captions</li>
141
- <li>Write down family recipes and stories</li>
142
- <li>Document folklore and traditional tales</li>
143
- <li>Describe cultural landmarks (photos saved locally)</li>
144
- <li>Browse previously loaded content</li>
145
- <li>All contributions saved for later sync</li>
146
- </ul>
147
- </div>
148
-
149
- <button class="retry-button" onclick="checkConnection()">
150
- 🔄 Check Connection
151
- </button>
152
-
153
- <div id="connection-status" class="connection-status">
154
- <span id="status-text">Checking connection...</span>
155
- </div>
156
-
157
- <div style="margin-top: 30px; font-size: 14px; opacity: 0.8;">
158
- <p>🇮🇳 Preserving Indian Culture Through AI</p>
159
- <p>Even offline, every contribution matters!</p>
160
- </div>
161
- </div>
162
-
163
- <script>
164
- let isChecking = false;
165
-
166
- function updateConnectionStatus(online) {
167
- const statusElement = document.getElementById('connection-status');
168
- const statusText = document.getElementById('status-text');
169
-
170
- if (online) {
171
- statusElement.className = 'connection-status status-online';
172
- statusText.textContent = '✅ Connection restored! Redirecting...';
173
-
174
- // Redirect to main app after a short delay
175
- setTimeout(() => {
176
- window.location.href = '/';
177
- }, 2000);
178
- } else {
179
- statusElement.className = 'connection-status status-offline';
180
- statusText.textContent = '❌ Still offline. Your contributions will be saved locally.';
181
- }
182
- }
183
-
184
- function checkConnection() {
185
- if (isChecking) return;
186
-
187
- isChecking = true;
188
- const button = document.querySelector('.retry-button');
189
- const statusText = document.getElementById('status-text');
190
-
191
- button.textContent = '🔄 Checking...';
192
- button.classList.add('checking');
193
- statusText.textContent = 'Checking connection...';
194
-
195
- // Try to fetch a small resource
196
- fetch('/', {
197
- method: 'HEAD',
198
- cache: 'no-cache',
199
- mode: 'no-cors'
200
- })
201
- .then(() => {
202
- updateConnectionStatus(true);
203
- })
204
- .catch(() => {
205
- updateConnectionStatus(false);
206
- })
207
- .finally(() => {
208
- isChecking = false;
209
- button.textContent = '🔄 Check Connection';
210
- button.classList.remove('checking');
211
- });
212
- }
213
-
214
- // Auto-check connection status
215
- function autoCheckConnection() {
216
- if (!isChecking) {
217
- fetch('/', {
218
- method: 'HEAD',
219
- cache: 'no-cache',
220
- mode: 'no-cors'
221
- })
222
- .then(() => {
223
- updateConnectionStatus(true);
224
- })
225
- .catch(() => {
226
- // Still offline, continue checking
227
- });
228
- }
229
- }
230
-
231
- // Check connection every 10 seconds
232
- setInterval(autoCheckConnection, 10000);
233
-
234
- // Listen for online/offline events
235
- window.addEventListener('online', () => updateConnectionStatus(true));
236
- window.addEventListener('offline', () => updateConnectionStatus(false));
237
-
238
- // Initial connection check
239
- setTimeout(() => {
240
- updateConnectionStatus(navigator.onLine);
241
- }, 1000);
242
-
243
- // Service worker message handling
244
- if ('serviceWorker' in navigator) {
245
- navigator.serviceWorker.addEventListener('message', event => {
246
- const { type, count } = event.data;
247
-
248
- if (type === 'SYNC_COMPLETE' && count > 0) {
249
- const statusText = document.getElementById('status-text');
250
- statusText.textContent = `✅ Synced ${count} contribution(s) successfully!`;
251
- }
252
- });
253
- }
254
- </script>
255
- </body>
256
- </html>
 
intern_project/corpus_collection_engine/pwa/pwa_manager.py DELETED
@@ -1,541 +0,0 @@
1
- """
2
- PWA Manager for Streamlit integration and offline functionality
3
- """
4
-
5
- import streamlit as st
6
- import json
7
- import os
8
- from typing import Dict, List, Any, Optional
9
- from pathlib import Path
10
- import logging
11
-
12
- from corpus_collection_engine.config import PWA_CONFIG, DATA_DIR
13
-
14
-
15
- class PWAManager:
16
- """Manager for Progressive Web App functionality"""
17
-
18
- def __init__(self):
19
- self.logger = logging.getLogger(__name__)
20
- self.config = PWA_CONFIG
21
- self.offline_storage_path = os.path.join(DATA_DIR, "offline_data.json")
22
-
23
- # Initialize PWA state in session
24
- if 'pwa_initialized' not in st.session_state:
25
- st.session_state.pwa_initialized = False
26
- st.session_state.is_online = True
27
- st.session_state.offline_contributions = []
28
- st.session_state.sync_status = "idle"
29
-
30
- def initialize_pwa(self):
31
- """Initialize PWA functionality in Streamlit"""
32
- if st.session_state.pwa_initialized:
33
- return
34
-
35
- try:
36
- # Inject PWA components into Streamlit
37
- self._inject_pwa_components()
38
-
39
- # Register service worker
40
- self._register_service_worker()
41
-
42
- # Setup offline detection
43
- self._setup_offline_detection()
44
-
45
- # Load offline data
46
- self._load_offline_data()
47
-
48
- st.session_state.pwa_initialized = True
49
- self.logger.info("PWA initialized successfully")
50
-
51
- except Exception as e:
52
- self.logger.error(f"PWA initialization failed: {e}")
53
-
54
- def _inject_pwa_components(self):
55
- """Inject PWA-related HTML components"""
56
-
57
- # Web App Manifest
58
- manifest = self._generate_manifest()
59
-
60
- # PWA HTML components
61
- pwa_html = f"""
62
- <script>
63
- // Web App Manifest
64
- const manifestBlob = new Blob(['{json.dumps(manifest)}'], {{type: 'application/json'}});
65
- const manifestURL = URL.createObjectURL(manifestBlob);
66
-
67
- const manifestLink = document.createElement('link');
68
- manifestLink.rel = 'manifest';
69
- manifestLink.href = manifestURL;
70
- document.head.appendChild(manifestLink);
71
-
72
- // Viewport meta tag for mobile
73
- const viewportMeta = document.createElement('meta');
74
- viewportMeta.name = 'viewport';
75
- viewportMeta.content = 'width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no';
76
- document.head.appendChild(viewportMeta);
77
-
78
- // Theme color
79
- const themeColorMeta = document.createElement('meta');
80
- themeColorMeta.name = 'theme-color';
81
- themeColorMeta.content = '#FF6B35';
82
- document.head.appendChild(themeColorMeta);
83
-
84
- // Apple touch icon
85
- const appleTouchIcon = document.createElement('link');
86
- appleTouchIcon.rel = 'apple-touch-icon';
87
- appleTouchIcon.href = 'data:image/svg+xml,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100"><rect width="100" height="100" fill="%23FF6B35"/><text x="50" y="55" font-size="40" text-anchor="middle" fill="white">🇮🇳</text></svg>';
88
- document.head.appendChild(appleTouchIcon);
89
-
90
- // PWA installation prompt
91
- window.pwaInstallPrompt = null;
92
-
93
- window.addEventListener('beforeinstallprompt', (e) => {{
94
- e.preventDefault();
95
- window.pwaInstallPrompt = e;
96
- console.log('PWA install prompt available');
97
- }});
98
-
99
- // Online/offline detection
100
- window.addEventListener('online', () => {{
101
- console.log('Connection restored');
102
- window.parent.postMessage({{type: 'CONNECTION_STATUS', online: true}}, '*');
103
- }});
104
-
105
- window.addEventListener('offline', () => {{
106
- console.log('Connection lost');
107
- window.parent.postMessage({{type: 'CONNECTION_STATUS', online: false}}, '*');
108
- }});
109
-
110
- // Initial connection status
111
- window.parent.postMessage({{type: 'CONNECTION_STATUS', online: navigator.onLine}}, '*');
112
- </script>
113
-
114
- <style>
115
- /* PWA-specific styles */
116
- .pwa-offline-indicator {{
117
- position: fixed;
118
- top: 0;
119
- left: 0;
120
- right: 0;
121
- background: #ff4444;
122
- color: white;
123
- text-align: center;
124
- padding: 8px;
125
- z-index: 9999;
126
- font-size: 14px;
127
- display: none;
128
- }}
129
-
130
- .pwa-sync-indicator {{
131
- position: fixed;
132
- bottom: 20px;
133
- right: 20px;
134
- background: #4CAF50;
135
- color: white;
136
- padding: 12px 16px;
137
- border-radius: 4px;
138
- font-size: 14px;
139
- z-index: 9999;
140
- display: none;
141
- }}
142
-
143
- .pwa-install-banner {{
144
- background: linear-gradient(135deg, #FF6B35, #F7931E);
145
- color: white;
146
- padding: 16px;
147
- border-radius: 8px;
148
- margin: 16px 0;
149
- text-align: center;
150
- }}
151
-
152
- .pwa-install-button {{
153
- background: white;
154
- color: #FF6B35;
155
- border: none;
156
- padding: 8px 16px;
157
- border-radius: 4px;
158
- font-weight: bold;
159
- cursor: pointer;
160
- margin-top: 8px;
161
- }}
162
- </style>
163
-
164
- <div id="pwa-offline-indicator" class="pwa-offline-indicator">
165
- 📡 You're offline. Your contributions will be saved and synced when connection is restored.
166
- </div>
167
-
168
- <div id="pwa-sync-indicator" class="pwa-sync-indicator">
169
- ✅ Contributions synced successfully!
170
- </div>
171
- """
172
-
173
- st.components.v1.html(pwa_html, height=0)
174
-
175
- def _generate_manifest(self) -> Dict[str, Any]:
176
- """Generate Web App Manifest"""
177
- return {
178
- "name": "Corpus Collection Engine",
179
- "short_name": "CorpusCollect",
180
- "description": "AI-powered app for collecting diverse data on Indian languages, history, and culture",
181
- "start_url": "/",
182
- "display": "standalone",
183
- "background_color": "#FFFFFF",
184
- "theme_color": "#FF6B35",
185
- "orientation": "portrait-primary",
186
- "categories": ["education", "culture", "productivity"],
187
- "lang": "en-IN",
188
- "icons": [
189
- {
190
- "src": "data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 192 192'><rect width='192' height='192' fill='%23FF6B35'/><text x='96' y='110' font-size='80' text-anchor='middle' fill='white'>🇮🇳</text></svg>",
191
- "sizes": "192x192",
192
- "type": "image/svg+xml",
193
- "purpose": "any maskable"
194
- },
195
- {
196
- "src": "data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 512 512'><rect width='512' height='512' fill='%23FF6B35'/><text x='256' y='290' font-size='200' text-anchor='middle' fill='white'>🇮🇳</text></svg>",
197
- "sizes": "512x512",
198
- "type": "image/svg+xml",
199
- "purpose": "any maskable"
200
- }
201
- ],
202
- "screenshots": [
203
- {
204
- "src": "data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 540 720'><rect width='540' height='720' fill='%23f8f9fa'/><rect x='20' y='60' width='500' height='80' fill='%23FF6B35' rx='8'/><text x='270' y='110' font-size='24' text-anchor='middle' fill='white'>Corpus Collection Engine</text></svg>",
205
- "sizes": "540x720",
206
- "type": "image/svg+xml",
207
- "form_factor": "narrow"
208
- }
209
- ]
210
- }
211
-
212
- def _register_service_worker(self):
213
- """Register service worker for offline functionality"""
214
-
215
- # Read service worker content
216
-         sw_path = Path(__file__).parent / "service_worker.js"
- 
-         if sw_path.exists():
-             with open(sw_path, 'r', encoding='utf-8') as f:
-                 sw_content = f.read()
-         else:
-             self.logger.warning("Service worker file not found")
-             return
- 
-         # Inject service worker registration
-         sw_registration = f"""
-         <script>
-         if ('serviceWorker' in navigator) {{
-             // Create service worker from string content
-             // Note: embedding the worker source in a template literal assumes it
-             // contains no backticks or ${{...}} sequences.
-             const swBlob = new Blob([`{sw_content}`], {{type: 'application/javascript'}});
-             const swURL = URL.createObjectURL(swBlob);
- 
-             navigator.serviceWorker.register(swURL)
-                 .then(registration => {{
-                     console.log('Service Worker registered successfully:', registration);
- 
-                     // Listen for updates
-                     registration.addEventListener('updatefound', () => {{
-                         const newWorker = registration.installing;
-                         newWorker.addEventListener('statechange', () => {{
-                             if (newWorker.state === 'installed' && navigator.serviceWorker.controller) {{
-                                 console.log('New service worker available');
-                                 // Optionally show update notification
-                             }}
-                         }});
-                     }});
-                 }})
-                 .catch(error => {{
-                     console.error('Service Worker registration failed:', error);
-                 }});
- 
-             // Listen for messages from service worker
-             navigator.serviceWorker.addEventListener('message', event => {{
-                 const {{ type, count }} = event.data;
- 
-                 if (type === 'SYNC_COMPLETE') {{
-                     console.log(`Synced ${{count}} contributions`);
-                     window.parent.postMessage({{type: 'SYNC_COMPLETE', count}}, '*');
- 
-                     // Show sync indicator
-                     const indicator = document.getElementById('pwa-sync-indicator');
-                     if (indicator) {{
-                         indicator.style.display = 'block';
-                         setTimeout(() => {{
-                             indicator.style.display = 'none';
-                         }}, 3000);
-                     }}
-                 }}
-             }});
-         }} else {{
-             console.log('Service Workers not supported');
-         }}
-         </script>
-         """
- 
-         st.components.v1.html(sw_registration, height=0)
- 
-     def _setup_offline_detection(self):
-         """Set up offline/online detection"""
- 
-         # JavaScript for connection monitoring
-         connection_monitor = """
-         <script>
-         function updateConnectionStatus(online) {
-             const indicator = document.getElementById('pwa-offline-indicator');
-             if (indicator) {
-                 indicator.style.display = online ? 'none' : 'block';
-             }
- 
-             // Update Streamlit session state
-             window.parent.postMessage({
-                 type: 'CONNECTION_STATUS',
-                 online: online
-             }, '*');
-         }
- 
-         // Monitor connection status
-         window.addEventListener('online', () => updateConnectionStatus(true));
-         window.addEventListener('offline', () => updateConnectionStatus(false));
- 
-         // Initial status
-         updateConnectionStatus(navigator.onLine);
- 
-         // Periodic connectivity check
-         setInterval(() => {
-             fetch('/ping', {method: 'HEAD', cache: 'no-cache'})
-                 .then(() => updateConnectionStatus(true))
-                 .catch(() => updateConnectionStatus(false));
-         }, 30000); // Check every 30 seconds
-         </script>
-         """
- 
-         st.components.v1.html(connection_monitor, height=0)
- 
-     def _load_offline_data(self):
-         """Load offline contributions from storage"""
-         try:
-             if os.path.exists(self.offline_storage_path):
-                 with open(self.offline_storage_path, 'r', encoding='utf-8') as f:
-                     offline_data = json.load(f)
-                     st.session_state.offline_contributions = offline_data.get('contributions', [])
-         except Exception as e:
-             self.logger.error(f"Failed to load offline data: {e}")
-             st.session_state.offline_contributions = []
- 
-     def save_offline_contribution(self, contribution_data: Dict[str, Any]) -> bool:
-         """Save contribution for offline sync"""
-         try:
-             # Add timestamp and ID
-             contribution_data['offline_timestamp'] = st.session_state.get('current_timestamp', '')
-             contribution_data['offline_id'] = f"offline_{len(st.session_state.offline_contributions)}"
- 
-             # Add to session state
-             st.session_state.offline_contributions.append(contribution_data)
- 
-             # Save to file
-             offline_data = {
-                 'contributions': st.session_state.offline_contributions,
-                 'last_updated': st.session_state.get('current_timestamp', '')
-             }
- 
-             os.makedirs(os.path.dirname(self.offline_storage_path), exist_ok=True)
-             with open(self.offline_storage_path, 'w', encoding='utf-8') as f:
-                 json.dump(offline_data, f, indent=2, ensure_ascii=False)
- 
-             self.logger.info(f"Saved offline contribution: {contribution_data.get('offline_id')}")
-             return True
- 
-         except Exception as e:
-             self.logger.error(f"Failed to save offline contribution: {e}")
-             return False
- 
-     def get_offline_contributions(self) -> List[Dict[str, Any]]:
-         """Get all offline contributions"""
-         return st.session_state.offline_contributions.copy()
- 
-     def clear_offline_contributions(self):
-         """Clear all offline contributions after successful sync"""
-         st.session_state.offline_contributions = []
- 
-         try:
-             if os.path.exists(self.offline_storage_path):
-                 os.remove(self.offline_storage_path)
-         except Exception as e:
-             self.logger.error(f"Failed to clear offline storage file: {e}")
- 
-     def render_offline_status(self):
-         """Render offline status and sync information"""
-         if not st.session_state.is_online:
-             st.warning("📡 You're currently offline. Your contributions will be saved locally and synced when the connection is restored.")
- 
-         # Show offline contributions count
-         offline_count = len(st.session_state.offline_contributions)
-         if offline_count > 0:
-             st.info(f"📱 {offline_count} contribution(s) saved offline, waiting for sync.")
- 
-             if st.button("🔄 Try Sync Now", key="manual_sync"):
-                 self.trigger_sync()
- 
-     def render_install_prompt(self):
-         """Render PWA installation prompt"""
- 
-         install_prompt = """
-         <script>
-         function showInstallPrompt() {
-             if (window.pwaInstallPrompt) {
-                 window.pwaInstallPrompt.prompt();
-                 window.pwaInstallPrompt.userChoice.then((choiceResult) => {
-                     if (choiceResult.outcome === 'accepted') {
-                         console.log('User accepted the install prompt');
-                     } else {
-                         console.log('User dismissed the install prompt');
-                     }
-                     window.pwaInstallPrompt = null;
-                 });
-             } else {
-                 alert('Install prompt not available. You can manually install from your browser menu.');
-             }
-         }
-         </script>
- 
-         <div class="pwa-install-banner">
-             <h4>📱 Install Corpus Collection Engine</h4>
-             <p>Install our app for the best offline experience and quick access!</p>
-             <button class="pwa-install-button" onclick="showInstallPrompt()">
-                 Install App
-             </button>
-         </div>
-         """
- 
-         # Only show install prompt if not already installed
-         if not self._is_pwa_installed():
-             st.components.v1.html(install_prompt, height=150)
- 
-     def _is_pwa_installed(self) -> bool:
-         """Check if PWA is already installed"""
-         # This is a simplified check - in reality, detection is more complex
-         user_agent = st.context.headers.get("User-Agent", "")
-         return "Mobile" in user_agent and "wv" in user_agent
- 
-     def trigger_sync(self):
-         """Trigger manual sync of offline contributions"""
- 
-         sync_script = """
-         <script>
-         if ('serviceWorker' in navigator && navigator.serviceWorker.controller) {
-             navigator.serviceWorker.controller.postMessage({
-                 type: 'TRIGGER_SYNC'
-             });
- 
-             // Also trigger background sync if supported
-             if ('sync' in window.ServiceWorkerRegistration.prototype) {
-                 navigator.serviceWorker.ready.then(registration => {
-                     return registration.sync.register('sync-contributions');
-                 }).then(() => {
-                     console.log('Background sync registered');
-                 }).catch(error => {
-                     console.error('Background sync registration failed:', error);
-                 });
-             }
-         }
-         </script>
-         """
- 
-         st.components.v1.html(sync_script, height=0)
-         st.session_state.sync_status = "syncing"
- 
-     def get_pwa_status(self) -> Dict[str, Any]:
-         """Get current PWA status"""
-         return {
-             'initialized': st.session_state.pwa_initialized,
-             'online': st.session_state.is_online,
-             'offline_contributions': len(st.session_state.offline_contributions),
-             'sync_status': st.session_state.sync_status,
-             'cache_version': self.config['cache_version']
-         }
- 
-     def render_pwa_debug_info(self):
-         """Render PWA debug information (for development)"""
-         if st.checkbox("🔧 Show PWA Debug Info", key="pwa_debug"):
-             status = self.get_pwa_status()
-             st.json(status)
- 
-             if st.button("Clear PWA Cache", key="clear_cache"):
-                 clear_cache_script = """
-                 <script>
-                 if ('serviceWorker' in navigator) {
-                     navigator.serviceWorker.controller?.postMessage({
-                         type: 'CLEAR_CACHE'
-                     });
- 
-                     caches.keys().then(cacheNames => {
-                         return Promise.all(
-                             cacheNames.map(cacheName => caches.delete(cacheName))
-                         );
-                     }).then(() => {
-                         console.log('All caches cleared');
-                         alert('PWA cache cleared successfully!');
-                     });
-                 }
-                 </script>
-                 """
-                 st.components.v1.html(clear_cache_script, height=0)
- 
-     def optimize_for_low_bandwidth(self):
-         """Apply optimizations for low-bandwidth environments"""
- 
-         # Inject bandwidth optimization styles and scripts
-         optimization_html = """
-         <style>
-         /* Low bandwidth optimizations */
-         img {
-             max-width: 100%;
-             height: auto;
-             /* 'loading: lazy' is an HTML attribute, not a CSS property;
-                it is applied via the script below instead. */
-         }
- 
-         .stImage > img {
-             max-height: 400px;
-             object-fit: contain;
-         }
- 
-         /* Reduce animations for slower connections */
-         @media (prefers-reduced-motion: reduce) {
-             * {
-                 animation-duration: 0.01ms !important;
-                 animation-iteration-count: 1 !important;
-                 transition-duration: 0.01ms !important;
-             }
-         }
- 
-         /* Compress text rendering */
-         .stMarkdown {
-             text-rendering: optimizeSpeed;
-         }
-         </style>
- 
-         <script>
-         // Lazy-load images ('loading' is an HTML attribute, so set it from JS)
-         document.querySelectorAll('img').forEach(img => { img.loading = 'lazy'; });
- 
-         // Detect slow connection and apply optimizations
-         if ('connection' in navigator) {
-             const connection = navigator.connection;
- 
-             if (connection.effectiveType === 'slow-2g' || connection.effectiveType === '2g') {
-                 console.log('Slow connection detected, applying optimizations');
- 
-                 // Reduce image quality
-                 document.querySelectorAll('img').forEach(img => {
-                     if (img.src && !img.dataset.optimized) {
-                         img.style.filter = 'blur(0.5px)'; // Slight blur to reduce perceived quality
-                         img.dataset.optimized = 'true';
-                     }
-                 });
- 
-                 // Disable non-essential animations
-                 document.body.style.setProperty('--animation-duration', '0s');
-             }
-         }
-         </script>
-         """
- 
-         st.components.v1.html(optimization_html, height=0)
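For reference, here is a minimal sketch of how an activity page might drive the offline helpers deleted above. The `PWAManager` class name, its import path, and the payload fields are assumptions inferred from this file's structure, not a confirmed API:

```python
# Hypothetical usage sketch (not part of the original app).
import streamlit as st

# Assumed import path, mirroring the repo layout shown in this commit.
from corpus_collection_engine.pwa.pwa_manager import PWAManager

pwa = PWAManager()
pwa.render_offline_status()  # offline banner plus the manual "Try Sync Now" button

# Assumed payload shape; field names are illustrative only.
contribution = {
    "activity_type": "recipe",
    "language": "hi",
    "content_data": {"title": "Dal tadka", "steps": ["..."]},
}

if not st.session_state.get("is_online", True):
    # While offline, queue locally; the service worker syncs it later.
    if pwa.save_offline_contribution(contribution):
        st.success("Saved offline - it will sync once the connection returns.")
```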
intern_project/corpus_collection_engine/pwa/service_worker.js DELETED
@@ -1,335 +0,0 @@
- /**
-  * Service Worker for Corpus Collection Engine PWA
-  * Provides offline functionality and caching for low-bandwidth environments
-  */
- 
- const CACHE_NAME = 'corpus-collection-v1.0.0';
- const OFFLINE_URL = '/offline.html';
- 
- // Resources to cache for offline functionality
- const CACHE_URLS = [
-     '/',
-     '/offline.html',
-     // Streamlit static assets will be added dynamically
-     '/static/css/bootstrap.min.css',
-     '/static/js/bootstrap.bundle.min.js',
-     // Add other critical assets
- ];
- 
- // Install event - cache critical resources
- self.addEventListener('install', event => {
-     console.log('Service Worker: Installing...');
- 
-     event.waitUntil(
-         caches.open(CACHE_NAME)
-             .then(cache => {
-                 console.log('Service Worker: Caching critical resources');
-                 return cache.addAll(CACHE_URLS);
-             })
-             .then(() => {
-                 console.log('Service Worker: Installation complete');
-                 return self.skipWaiting();
-             })
-             .catch(error => {
-                 console.error('Service Worker: Installation failed', error);
-             })
-     );
- });
- 
- // Activate event - clean up old caches
- self.addEventListener('activate', event => {
-     console.log('Service Worker: Activating...');
- 
-     event.waitUntil(
-         caches.keys()
-             .then(cacheNames => {
-                 return Promise.all(
-                     cacheNames.map(cacheName => {
-                         if (cacheName !== CACHE_NAME) {
-                             console.log('Service Worker: Deleting old cache', cacheName);
-                             return caches.delete(cacheName);
-                         }
-                     })
-                 );
-             })
-             .then(() => {
-                 console.log('Service Worker: Activation complete');
-                 return self.clients.claim();
-             })
-     );
- });
- 
- // Fetch event - implement caching strategies
- self.addEventListener('fetch', event => {
-     const request = event.request;
-     const url = new URL(request.url);
- 
-     // Skip non-GET requests
-     if (request.method !== 'GET') {
-         return;
-     }
- 
-     // Handle different types of requests with appropriate strategies
-     if (url.pathname.startsWith('/static/')) {
-         // Static assets - Cache First strategy
-         event.respondWith(cacheFirstStrategy(request));
-     } else if (url.pathname.includes('api') || url.pathname.includes('_stcore')) {
-         // API calls and Streamlit core - Network First strategy
-         event.respondWith(networkFirstStrategy(request));
-     } else if (url.pathname === '/' || url.pathname.includes('.html')) {
-         // HTML pages - Stale While Revalidate strategy
-         event.respondWith(staleWhileRevalidateStrategy(request));
-     } else {
-         // Default - Network First with offline fallback
-         event.respondWith(networkFirstWithOfflineFallback(request));
-     }
- });
- 
- // Cache First Strategy - for static assets
- async function cacheFirstStrategy(request) {
-     try {
-         const cachedResponse = await caches.match(request);
-         if (cachedResponse) {
-             return cachedResponse;
-         }
- 
-         const networkResponse = await fetch(request);
-         if (networkResponse.ok) {
-             const cache = await caches.open(CACHE_NAME);
-             cache.put(request, networkResponse.clone());
-         }
-         return networkResponse;
-     } catch (error) {
-         console.error('Cache First Strategy failed:', error);
-         return new Response('Resource not available offline', { status: 503 });
-     }
- }
- 
- // Network First Strategy - for dynamic content
- async function networkFirstStrategy(request) {
-     try {
-         const networkResponse = await fetch(request);
-         if (networkResponse.ok) {
-             const cache = await caches.open(CACHE_NAME);
-             cache.put(request, networkResponse.clone());
-         }
-         return networkResponse;
-     } catch (error) {
-         console.log('Network failed, trying cache:', request.url);
-         const cachedResponse = await caches.match(request);
-         if (cachedResponse) {
-             return cachedResponse;
-         }
-         throw error;
-     }
- }
- 
- // Stale While Revalidate Strategy - for HTML pages
- async function staleWhileRevalidateStrategy(request) {
-     const cache = await caches.open(CACHE_NAME);
-     const cachedResponse = await cache.match(request);
- 
-     const fetchPromise = fetch(request).then(networkResponse => {
-         if (networkResponse.ok) {
-             cache.put(request, networkResponse.clone());
-         }
-         return networkResponse;
-     }).catch(error => {
-         console.log('Network failed for:', request.url);
-         return null;
-     });
- 
-     return cachedResponse || await fetchPromise || await cache.match(OFFLINE_URL);
- }
- 
- // Network First with Offline Fallback
- async function networkFirstWithOfflineFallback(request) {
-     try {
-         const networkResponse = await fetch(request);
-         if (networkResponse.ok) {
-             const cache = await caches.open(CACHE_NAME);
-             cache.put(request, networkResponse.clone());
-         }
-         return networkResponse;
-     } catch (error) {
-         const cachedResponse = await caches.match(request);
-         if (cachedResponse) {
-             return cachedResponse;
-         }
- 
-         // Return offline page for navigation requests
-         if (request.mode === 'navigate') {
-             return caches.match(OFFLINE_URL);
-         }
- 
-         return new Response('Content not available offline', {
-             status: 503,
-             statusText: 'Service Unavailable'
-         });
-     }
- }
- 
- // Background sync for offline submissions
- self.addEventListener('sync', event => {
-     console.log('Service Worker: Background sync triggered', event.tag);
- 
-     if (event.tag === 'sync-contributions') {
-         event.waitUntil(syncContributions());
-     }
- });
- 
- // Sync offline contributions when connection is restored
- async function syncContributions() {
-     try {
-         console.log('Service Worker: Syncing offline contributions...');
- 
-         // Get offline contributions from IndexedDB
-         const contributions = await getOfflineContributions();
- 
-         for (const contribution of contributions) {
-             try {
-                 const response = await fetch('/api/contributions', {
-                     method: 'POST',
-                     headers: {
-                         'Content-Type': 'application/json',
-                     },
-                     body: JSON.stringify(contribution)
-                 });
- 
-                 if (response.ok) {
-                     await removeOfflineContribution(contribution.id);
-                     console.log('Synced contribution:', contribution.id);
-                 } else {
-                     console.error('Failed to sync contribution:', contribution.id);
-                 }
-             } catch (error) {
-                 console.error('Error syncing contribution:', error);
-             }
-         }
- 
-         // Notify the main thread about sync completion
-         const clients = await self.clients.matchAll();
-         clients.forEach(client => {
-             client.postMessage({
-                 type: 'SYNC_COMPLETE',
-                 count: contributions.length
-             });
-         });
- 
-     } catch (error) {
-         console.error('Background sync failed:', error);
-     }
- }
- 
- // IndexedDB operations for offline storage
- async function getOfflineContributions() {
-     return new Promise((resolve, reject) => {
-         const request = indexedDB.open('CorpusCollectionDB', 1);
- 
-         request.onerror = () => reject(request.error);
- 
-         request.onsuccess = () => {
-             const db = request.result;
-             const transaction = db.transaction(['offline_contributions'], 'readonly');
-             const store = transaction.objectStore('offline_contributions');
-             const getAllRequest = store.getAll();
- 
-             getAllRequest.onsuccess = () => resolve(getAllRequest.result);
-             getAllRequest.onerror = () => reject(getAllRequest.error);
-         };
- 
-         request.onupgradeneeded = (event) => {
-             const db = event.target.result;
-             if (!db.objectStoreNames.contains('offline_contributions')) {
-                 const store = db.createObjectStore('offline_contributions', { keyPath: 'id' });
-                 store.createIndex('timestamp', 'timestamp', { unique: false });
-             }
-         };
-     });
- }
- 
- async function removeOfflineContribution(id) {
-     return new Promise((resolve, reject) => {
-         const request = indexedDB.open('CorpusCollectionDB', 1);
- 
-         request.onsuccess = () => {
-             const db = request.result;
-             const transaction = db.transaction(['offline_contributions'], 'readwrite');
-             const store = transaction.objectStore('offline_contributions');
-             const deleteRequest = store.delete(id);
- 
-             deleteRequest.onsuccess = () => resolve();
-             deleteRequest.onerror = () => reject(deleteRequest.error);
-         };
-     });
- }
- 
- // Handle messages from the main thread
- self.addEventListener('message', event => {
-     const { type, data } = event.data;
- 
-     switch (type) {
-         case 'SKIP_WAITING':
-             self.skipWaiting();
-             break;
- 
-         case 'CACHE_URLS':
-             cacheUrls(data.urls);
-             break;
- 
-         case 'CLEAR_CACHE':
-             clearCache();
-             break;
- 
-         default:
-             console.log('Unknown message type:', type);
-     }
- });
- 
- // Cache additional URLs dynamically
- async function cacheUrls(urls) {
-     try {
-         const cache = await caches.open(CACHE_NAME);
-         await cache.addAll(urls);
-         console.log('Cached additional URLs:', urls);
-     } catch (error) {
-         console.error('Failed to cache URLs:', error);
-     }
- }
- 
- // Clear all caches
- async function clearCache() {
-     try {
-         const cacheNames = await caches.keys();
-         await Promise.all(cacheNames.map(name => caches.delete(name)));
-         console.log('All caches cleared');
-     } catch (error) {
-         console.error('Failed to clear caches:', error);
-     }
- }
- 
- // Periodic cleanup of old cached data
- // Note: a long-lived timer is unreliable here - the browser may terminate an
- // idle service worker, so this only runs while the worker happens to be alive.
- setInterval(async () => {
-     try {
-         const cache = await caches.open(CACHE_NAME);
-         const requests = await cache.keys();
- 
-         // Remove old cached responses (older than 7 days)
-         const oneWeekAgo = Date.now() - (7 * 24 * 60 * 60 * 1000);
- 
-         for (const request of requests) {
-             const response = await cache.match(request);
-             const dateHeader = response.headers.get('date');
- 
-             if (dateHeader) {
-                 const responseDate = new Date(dateHeader).getTime();
-                 if (responseDate < oneWeekAgo) {
-                     await cache.delete(request);
-                     console.log('Removed old cached response:', request.url);
-                 }
-             }
-         }
-     } catch (error) {
-         console.error('Cache cleanup failed:', error);
-     }
- }, 24 * 60 * 60 * 1000); // Run daily
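The fetch handler above boils down to a small routing table from URL shape to caching strategy. A hedged Python restatement of that mapping (the function name and strategy labels are ours) makes the policy easy to spot-check:

```python
# Illustrative Python mirror of the service worker's fetch routing; names are ours.
from urllib.parse import urlparse


def pick_cache_strategy(url: str, method: str = "GET") -> str:
    """Map a request to the strategy the worker above would apply."""
    if method != "GET":
        return "pass-through"  # the worker skips non-GET requests
    path = urlparse(url).path
    if path.startswith("/static/"):
        return "cache-first"  # immutable static assets
    if "api" in path or "_stcore" in path:
        return "network-first"  # API calls and Streamlit core traffic
    if path == "/" or ".html" in path:
        return "stale-while-revalidate"  # HTML shells
    return "network-first-with-offline-fallback"


assert pick_cache_strategy("https://app.example/static/js/app.js") == "cache-first"
assert pick_cache_strategy("https://app.example/_stcore/stream") == "network-first"
assert pick_cache_strategy("https://app.example/") == "stale-while-revalidate"
```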
intern_project/corpus_collection_engine/requirements.txt DELETED
@@ -1,6 +0,0 @@
- streamlit>=1.28.0
- pandas>=1.5.0
- numpy>=1.24.0
- Pillow>=9.0.0
- requests>=2.28.0
- python-dateutil>=2.8.0
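If these version floors matter at runtime, they can be sanity-checked programmatically. This sketch is ours, not part of the project, and assumes the third-party `packaging` library is available (it is not pinned in the list above):

```python
# Hedged sanity check of the floors pinned above; not part of the project.
from importlib.metadata import version

from packaging.version import Version  # assumed extra dependency

FLOORS = {
    "streamlit": "1.28.0",
    "pandas": "1.5.0",
    "numpy": "1.24.0",
    "Pillow": "9.0.0",
    "requests": "2.28.0",
    "python-dateutil": "2.8.0",
}

for package, floor in FLOORS.items():
    installed = Version(version(package))
    assert installed >= Version(floor), f"{package} {installed} is below the {floor} floor"
```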
intern_project/corpus_collection_engine/services/__init__.py DELETED
@@ -1 +0,0 @@
- # Services module for AI, language processing, and validation services
intern_project/corpus_collection_engine/services/ai_service.py DELETED
@@ -1,417 +0,0 @@
- """
- AI Service for text generation, translation, and image processing
- """
- 
- import logging
- from typing import Dict, List, Optional, Tuple, Any
- import json
- import time
- from datetime import datetime
- 
- # For Hugging Face Spaces deployment, disable transformers to avoid auth issues
- TRANSFORMERS_AVAILABLE = False
- 
- from corpus_collection_engine.config import AI_CONFIG, SUPPORTED_LANGUAGES
- from corpus_collection_engine.services.language_service import LanguageService
- 
- 
- class AIService:
-     """Service for AI-powered text generation, translation, and processing"""
- 
-     def __init__(self):
-         self.logger = logging.getLogger(__name__)
-         self.language_service = LanguageService()
- 
-         # AI model configurations
-         self.config = AI_CONFIG
-         self.models = {}
-         self.fallback_mode = True  # Always use fallback for public deployment
- 
-         # Initialize models (will use fallback mode)
-         self._initialize_models()
- 
-         # Circuit breaker for model failures
-         self.circuit_breaker = {
-             'failures': 0,
-             'last_failure': None,
-             'threshold': 3,
-             'timeout': 300  # 5 minutes
-         }
- 
-     def _initialize_models(self):
-         """Initialize AI models with fallback handling"""
-         try:
-             if TRANSFORMERS_AVAILABLE:
-                 self.logger.info("Initializing AI models...")
- 
-                 # For MVP, use lightweight models that are readily available
-                 # In production, replace with Sarvam-1 or other Indic language models
- 
-                 # Text generation model (lightweight)
-                 try:
-                     # For Hugging Face Spaces deployment, disable model loading to avoid auth issues
-                     # Use fallback text generation instead
-                     self.models['text_generator'] = None
-                     self.logger.info("Text generation model disabled for public deployment")
-                 except Exception as e:
-                     self.logger.warning(f"Could not load text generation model: {e}")
- 
-                 # Translation model (if available)
-                 try:
-                     # For MVP, we'll use a simple approach
-                     # In production, use proper Indic translation models
-                     self.models['translator'] = None  # Placeholder
-                     self.logger.info("Translation service initialized")
-                 except Exception as e:
-                     self.logger.warning(f"Could not load translation model: {e}")
- 
-             else:
-                 self.logger.warning("Transformers library not available, using fallback mode")
-                 self.fallback_mode = True
- 
-         except Exception as e:
-             self.logger.error(f"Error initializing AI models: {e}")
-             self.fallback_mode = True
- 
-     def _is_circuit_breaker_open(self) -> bool:
-         """Check if circuit breaker is open due to recent failures"""
-         if self.circuit_breaker['failures'] < self.circuit_breaker['threshold']:
-             return False
- 
-         if self.circuit_breaker['last_failure']:
-             time_since_failure = time.time() - self.circuit_breaker['last_failure']
-             if time_since_failure > self.circuit_breaker['timeout']:
-                 # Reset circuit breaker
-                 self.circuit_breaker['failures'] = 0
-                 self.circuit_breaker['last_failure'] = None
-                 return False
- 
-         return True
- 
-     def _record_failure(self):
-         """Record a model failure for circuit breaker"""
-         self.circuit_breaker['failures'] += 1
-         self.circuit_breaker['last_failure'] = time.time()
- 
-     def _record_success(self):
-         """Record a successful operation"""
-         if self.circuit_breaker['failures'] > 0:
-             self.circuit_breaker['failures'] = max(0, self.circuit_breaker['failures'] - 1)
- 
-     def generate_text(self, prompt: str, language: str = "en",
-                       max_length: int = 100) -> Tuple[Optional[str], float]:
-         """
-         Generate text based on prompt
- 
-         Args:
-             prompt: Input prompt for text generation
-             language: Target language for generation
-             max_length: Maximum length of generated text
- 
-         Returns:
-             Tuple of (generated_text, confidence_score)
-         """
-         if self._is_circuit_breaker_open():
-             self.logger.warning("AI service circuit breaker is open")
-             return self._fallback_text_generation(prompt, language), 0.3
- 
-         try:
-             # For Hugging Face Spaces deployment, always use fallback mode
-             # to avoid authentication issues with external models
-             pass
- 
-         except Exception as e:
-             self.logger.error(f"Error in text generation: {e}")
-             self._record_failure()
- 
-         # Fallback to rule-based generation
-         return self._fallback_text_generation(prompt, language), 0.4
- 
-     def _format_prompt_for_language(self, prompt: str, language: str) -> str:
-         """Format prompt based on target language"""
-         if language == "en":
-             return prompt
- 
-         # For Indic languages, add context
-         lang_name = self.language_service.get_language_name(language)
-         return f"In {lang_name}: {prompt}"
- 
-     def _fallback_text_generation(self, prompt: str, language: str) -> str:
-         """Fallback text generation using templates"""
-         # Simple template-based generation for common scenarios
-         templates = {
-             "meme_caption": [
-                 "When you {prompt}",
-                 "That moment when {prompt}",
-                 "Me: {prompt}",
-                 "{prompt} be like:",
-                 "POV: {prompt}"
-             ],
-             "recipe_suggestion": [
-                 "Try adding {prompt} for better taste",
-                 "This {prompt} recipe is perfect for festivals",
-                 "Traditional {prompt} with a modern twist",
-                 "Family recipe for {prompt}"
-             ],
-             "story_continuation": [
-                 "Once upon a time, {prompt}",
-                 "In the village, {prompt}",
-                 "The wise elder said, {prompt}",
-                 "As the story goes, {prompt}"
-             ]
-         }
- 
-         # Detect prompt type and use appropriate template
-         prompt_lower = prompt.lower()
-         if any(word in prompt_lower for word in ["meme", "funny", "joke"]):
-             template_list = templates["meme_caption"]
-         elif any(word in prompt_lower for word in ["recipe", "cook", "food"]):
-             template_list = templates["recipe_suggestion"]
-         elif any(word in prompt_lower for word in ["story", "tale", "once"]):
-             template_list = templates["story_continuation"]
-         else:
-             # Generic response
-             return f"Here's something about {prompt}..."
- 
-         # Select a random template
-         import random
-         template = random.choice(template_list)
-         return template.format(prompt=prompt)
- 
-     def translate_text(self, text: str, source_lang: str,
-                        target_lang: str) -> Tuple[Optional[str], float]:
-         """
-         Translate text between languages
- 
-         Args:
-             text: Text to translate
-             source_lang: Source language code
-             target_lang: Target language code
- 
-         Returns:
-             Tuple of (translated_text, confidence_score)
-         """
-         if self._is_circuit_breaker_open():
-             return self._fallback_translation(text, source_lang, target_lang), 0.2
- 
-         try:
-             # For MVP, we'll use a simple approach
-             # In production, use proper translation models like IndicTrans
- 
-             if source_lang == target_lang:
-                 return text, 1.0
- 
-             # Placeholder for actual translation
-             # In production, integrate with translation APIs or models
-             translated = self._fallback_translation(text, source_lang, target_lang)
-             return translated, 0.6
- 
-         except Exception as e:
-             self.logger.error(f"Error in translation: {e}")
-             self._record_failure()
-             return self._fallback_translation(text, source_lang, target_lang), 0.3
- 
-     def _fallback_translation(self, text: str, source_lang: str, target_lang: str) -> str:
-         """Fallback translation using simple rules"""
-         # For MVP, return original text with language indicator
-         # In production, implement proper translation
- 
-         if source_lang == target_lang:
-             return text
- 
-         source_name = self.language_service.get_language_name(source_lang)
-         target_name = self.language_service.get_language_name(target_lang)
- 
-         return f"[{source_name} → {target_name}] {text}"
- 
-     def generate_caption(self, image_description: str, language: str = "en") -> Tuple[Optional[str], float]:
-         """
-         Generate caption for image based on description
- 
-         Args:
-             image_description: Description of the image
-             language: Target language for caption
- 
-         Returns:
-             Tuple of (caption, confidence_score)
-         """
-         # Use text generation with image-specific prompts
-         prompts = [
-             f"Caption for image showing {image_description}:",
-             f"Describe this image: {image_description}",
-             f"What's happening in this picture of {image_description}?"
-         ]
- 
-         import random
-         prompt = random.choice(prompts)
- 
-         return self.generate_text(prompt, language, max_length=50)
- 
-     def suggest_cultural_tags(self, content: str, language: str,
-                               region: Optional[str] = None) -> List[str]:
-         """
-         Suggest cultural tags based on content
- 
-         Args:
-             content: Text content to analyze
-             language: Language of the content
-             region: Optional region information
- 
-         Returns:
-             List of suggested cultural tags
-         """
-         tags = []
-         content_lower = content.lower()
- 
-         # Festival-related tags
-         festivals = {
-             "diwali": ["festival", "lights", "celebration", "hindu"],
-             "holi": ["festival", "colors", "spring", "celebration"],
-             "eid": ["festival", "islamic", "celebration", "community"],
-             "christmas": ["festival", "christian", "celebration", "winter"],
-             "dussehra": ["festival", "victory", "hindu", "tradition"],
-             "ganesh": ["festival", "hindu", "elephant", "wisdom"],
-             "navratri": ["festival", "dance", "hindu", "goddess"]
-         }
- 
-         for festival, festival_tags in festivals.items():
-             if festival in content_lower:
-                 tags.extend(festival_tags)
- 
-         # Food-related tags
-         foods = {
-             "biryani": ["food", "rice", "spices", "traditional"],
-             "curry": ["food", "spices", "traditional", "sauce"],
-             "roti": ["food", "bread", "staple", "traditional"],
-             "dal": ["food", "lentils", "protein", "staple"],
-             "samosa": ["food", "snack", "fried", "traditional"],
-             "lassi": ["drink", "yogurt", "traditional", "cooling"]
-         }
- 
-         for food, food_tags in foods.items():
-             if food in content_lower:
-                 tags.extend(food_tags)
- 
-         # Regional tags
-         if region:
-             region_lower = region.lower()
-             regional_tags = {
-                 "maharashtra": ["marathi", "western_india", "mumbai"],
-                 "karnataka": ["kannada", "southern_india", "bangalore"],
-                 "tamil nadu": ["tamil", "southern_india", "chennai"],
-                 "kerala": ["malayalam", "southern_india", "backwaters"],
-                 "punjab": ["punjabi", "northern_india", "agriculture"],
-                 "bengal": ["bengali", "eastern_india", "kolkata"],
-                 "gujarat": ["gujarati", "western_india", "business"]
-             }
- 
-             for region_key, region_tags in regional_tags.items():
-                 if region_key in region_lower:
-                     tags.extend(region_tags)
- 
-         # Language-specific tags
-         if language != "en":
-             tags.append("multilingual")
-             tags.append(f"{language}_language")
- 
-         # Remove duplicates and return
-         return list(set(tags))
- 
-     def analyze_sentiment(self, text: str, language: str = "en") -> Dict[str, float]:
-         """
-         Analyze sentiment of text
- 
-         Args:
-             text: Text to analyze
-             language: Language of the text
- 
-         Returns:
-             Dictionary with sentiment scores
-         """
-         # Simple rule-based sentiment analysis for MVP
-         # In production, use proper sentiment analysis models
- 
-         positive_words = [
-             "good", "great", "excellent", "amazing", "wonderful", "beautiful",
-             "love", "like", "happy", "joy", "celebration", "festival",
-             "अच्छा", "सुंदर", "खुशी", "प्रेम"  # Hindi examples
-         ]
- 
-         negative_words = [
-             "bad", "terrible", "awful", "hate", "sad", "angry", "disappointed",
-             "बुरा", "गुस्सा", "दुख"  # Hindi examples
-         ]
- 
-         text_lower = text.lower()
-         positive_count = sum(1 for word in positive_words if word in text_lower)
-         negative_count = sum(1 for word in negative_words if word in text_lower)
-         total_words = len(text.split())
- 
-         if total_words == 0:
-             return {"positive": 0.5, "negative": 0.5, "neutral": 0.0}
- 
-         positive_score = positive_count / total_words
-         negative_score = negative_count / total_words
-         neutral_score = max(0, 1 - positive_score - negative_score)
- 
-         return {
-             "positive": min(1.0, positive_score * 2),
-             "negative": min(1.0, negative_score * 2),
-             "neutral": neutral_score
-         }
- 
-     def extract_keywords(self, text: str, language: str = "en",
-                          max_keywords: int = 10) -> List[str]:
-         """
-         Extract keywords from text
- 
-         Args:
-             text: Text to analyze
-             language: Language of the text
-             max_keywords: Maximum number of keywords to return
- 
-         Returns:
-             List of extracted keywords
-         """
-         # Simple keyword extraction for MVP
-         # In production, use proper NLP libraries
- 
-         # Common stop words (basic list)
-         stop_words = {
-             "en": {"the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "for", "of", "with", "by"},
-             "hi": {"और", "या", "में", "पर", "से", "को", "का", "की", "के", "है", "हैं", "था", "थी", "थे"}
-         }
- 
-         # Get stop words for language
-         lang_stop_words = stop_words.get(language, stop_words["en"])
- 
-         # Simple tokenization and filtering
-         words = text.lower().split()
-         keywords = []
- 
-         for word in words:
-             # Remove punctuation
-             word = ''.join(char for char in word if char.isalnum())
- 
-             # Filter out stop words and short words
-             if len(word) > 2 and word not in lang_stop_words:
-                 keywords.append(word)
- 
-         # Count frequency and return most common
-         from collections import Counter
-         word_counts = Counter(keywords)
- 
-         return [word for word, count in word_counts.most_common(max_keywords)]
- 
-     def get_service_status(self) -> Dict[str, Any]:
-         """Get current status of AI service"""
-         return {
-             "fallback_mode": self.fallback_mode,
-             "models_loaded": list(self.models.keys()),
-             "circuit_breaker": {
-                 "failures": self.circuit_breaker["failures"],
-                 "is_open": self._is_circuit_breaker_open()
-             },
-             "transformers_available": TRANSFORMERS_AVAILABLE,
-             "last_updated": datetime.now().isoformat()
-         }
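The circuit breaker in this service is a plain dict plus three helper methods. The same open/close policy (three failures open the circuit; a 300-second cool-down closes it again) can be expressed as a standalone sketch; the class and method names below are illustrative, not the project's API:

```python
# Standalone sketch of the breaker policy used by AIService above;
# threshold/timeout values come from the deleted code, the API shape is ours.
import time
from typing import Optional


class CircuitBreaker:
    def __init__(self, threshold: int = 3, timeout: float = 300.0):
        self.threshold = threshold      # failures needed to open the circuit
        self.timeout = timeout          # cool-down in seconds
        self.failures = 0
        self.last_failure: Optional[float] = None

    def record_failure(self) -> None:
        self.failures += 1
        self.last_failure = time.time()

    def record_success(self) -> None:
        # Mirrors _record_success: decay the count rather than reset outright.
        self.failures = max(0, self.failures - 1)

    def is_open(self) -> bool:
        if self.failures < self.threshold:
            return False
        if self.last_failure and time.time() - self.last_failure > self.timeout:
            self.failures = 0           # cool-down elapsed: reset and close
            self.last_failure = None
            return False
        return True


breaker = CircuitBreaker()
for _ in range(3):
    breaker.record_failure()
assert breaker.is_open()
```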
intern_project/corpus_collection_engine/services/analytics_service.py DELETED
@@ -1,766 +0,0 @@
- """
- Analytics and Metrics Collection Service
- """
- 
- import streamlit as st
- from typing import Dict, List, Any, Optional, Tuple
- from datetime import datetime, timedelta
- from dataclasses import dataclass
- from enum import Enum
- import json
- import logging
- import pandas as pd
- from collections import defaultdict, Counter
- 
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType, ValidationStatus
- from corpus_collection_engine.services.storage_service import StorageService
- from corpus_collection_engine.services.language_service import LanguageService
- from corpus_collection_engine.services.engagement_service import EngagementService
- 
- 
- class MetricType(Enum):
-     """Types of metrics to track"""
-     CONTRIBUTION_COUNT = "contribution_count"
-     USER_ENGAGEMENT = "user_engagement"
-     LANGUAGE_DIVERSITY = "language_diversity"
-     QUALITY_SCORE = "quality_score"
-     CULTURAL_IMPACT = "cultural_impact"
-     GEOGRAPHIC_DISTRIBUTION = "geographic_distribution"
-     ACTIVITY_POPULARITY = "activity_popularity"
-     RETENTION_RATE = "retention_rate"
- 
- 
- @dataclass
- class MetricSnapshot:
-     """Snapshot of a metric at a point in time"""
-     metric_type: MetricType
-     value: float
-     timestamp: datetime
-     metadata: Dict[str, Any]
- 
- 
- @dataclass
- class AnalyticsReport:
-     """Comprehensive analytics report"""
-     report_id: str
-     generated_at: datetime
-     total_contributions: int
-     unique_contributors: int
-     language_distribution: Dict[str, int]
-     activity_distribution: Dict[str, int]
-     regional_distribution: Dict[str, int]
-     quality_metrics: Dict[str, float]
-     engagement_metrics: Dict[str, float]
-     growth_metrics: Dict[str, float]
-     cultural_impact_score: float
-     recommendations: List[str]
- 
- 
- class AnalyticsService:
-     """Service for collecting and analyzing platform metrics"""
- 
-     def __init__(self):
-         self.logger = logging.getLogger(__name__)
-         self.storage_service = StorageService()
-         self.language_service = LanguageService()
-         self.engagement_service = EngagementService()
- 
-         # Initialize analytics tracking
-         if 'analytics_initialized' not in st.session_state:
-             st.session_state.analytics_initialized = False
-             st.session_state.metrics_cache = {}
-             st.session_state.last_analytics_update = None
- 
-     def initialize_analytics(self):
-         """Initialize analytics tracking"""
-         if st.session_state.analytics_initialized:
-             return
- 
-         try:
-             # Set up analytics tracking
-             st.session_state.analytics_initialized = True
-             st.session_state.last_analytics_update = datetime.now()
- 
-             self.logger.info("Analytics service initialized")
- 
-         except Exception as e:
-             self.logger.error(f"Analytics initialization failed: {e}")
- 
-     def generate_comprehensive_report(self) -> AnalyticsReport:
-         """Generate comprehensive analytics report"""
-         try:
-             # Get all contributions for analysis
-             all_contributions = self._get_all_contributions()
- 
-             # Calculate basic metrics
-             total_contributions = len(all_contributions)
-             unique_contributors = len(set(contrib.user_session for contrib in all_contributions))
- 
-             # Language distribution
-             language_distribution = self._calculate_language_distribution(all_contributions)
- 
-             # Activity distribution
-             activity_distribution = self._calculate_activity_distribution(all_contributions)
- 
-             # Regional distribution
-             regional_distribution = self._calculate_regional_distribution(all_contributions)
- 
-             # Quality metrics
-             quality_metrics = self._calculate_quality_metrics(all_contributions)
- 
-             # Engagement metrics
-             engagement_metrics = self._calculate_engagement_metrics(all_contributions)
- 
-             # Growth metrics
-             growth_metrics = self._calculate_growth_metrics(all_contributions)
- 
-             # Cultural impact score
-             cultural_impact_score = self._calculate_platform_cultural_impact(all_contributions)
- 
-             # Generate recommendations
-             recommendations = self._generate_recommendations(
-                 all_contributions, language_distribution, activity_distribution
-             )
- 
-             return AnalyticsReport(
-                 report_id=f"report_{datetime.now().strftime('%Y%m%d_%H%M%S')}",
-                 generated_at=datetime.now(),
-                 total_contributions=total_contributions,
-                 unique_contributors=unique_contributors,
-                 language_distribution=language_distribution,
-                 activity_distribution=activity_distribution,
-                 regional_distribution=regional_distribution,
-                 quality_metrics=quality_metrics,
-                 engagement_metrics=engagement_metrics,
-                 growth_metrics=growth_metrics,
-                 cultural_impact_score=cultural_impact_score,
-                 recommendations=recommendations
-             )
- 
-         except Exception as e:
-             self.logger.error(f"Error generating analytics report: {e}")
-             return self._create_empty_report()
- 
-     def _get_all_contributions(self) -> List[UserContribution]:
-         """Get all contributions from storage"""
-         all_contributions = []
- 
-         # Get contributions for all supported languages
-         supported_languages = self.language_service.get_supported_languages_list()
- 
-         for lang_info in supported_languages:
-             lang_code = lang_info['code']
-             contributions = self.storage_service.get_contributions_by_language(lang_code, limit=10000)
-             all_contributions.extend(contributions)
- 
-         # Remove duplicates based on contribution ID
-         seen_ids = set()
-         unique_contributions = []
-         for contrib in all_contributions:
-             if contrib.id not in seen_ids:
-                 seen_ids.add(contrib.id)
-                 unique_contributions.append(contrib)
- 
-         return unique_contributions
- 
-     def _calculate_language_distribution(self, contributions: List[UserContribution]) -> Dict[str, int]:
-         """Calculate distribution of contributions by language"""
-         language_counts = Counter(contrib.language for contrib in contributions)
-         return dict(language_counts)
- 
-     def _calculate_activity_distribution(self, contributions: List[UserContribution]) -> Dict[str, int]:
-         """Calculate distribution of contributions by activity type"""
-         activity_counts = Counter(contrib.activity_type.value for contrib in contributions)
-         return dict(activity_counts)
- 
-     def _calculate_regional_distribution(self, contributions: List[UserContribution]) -> Dict[str, int]:
-         """Calculate distribution of contributions by region"""
-         regional_counts = defaultdict(int)
- 
-         for contrib in contributions:
-             region = contrib.cultural_context.get('region', 'Unknown')
-             if region and region.strip():
-                 regional_counts[region.strip()] += 1
- 
-         return dict(regional_counts)
- 
-     def _calculate_quality_metrics(self, contributions: List[UserContribution]) -> Dict[str, float]:
-         """Calculate quality-related metrics"""
-         if not contributions:
-             return {}
- 
-         # Calculate average content length by activity
-         activity_lengths = defaultdict(list)
-         for contrib in contributions:
-             content_length = len(str(contrib.content_data))
-             activity_lengths[contrib.activity_type.value].append(content_length)
- 
-         quality_metrics = {}
- 
-         # Average content length per activity
-         for activity, lengths in activity_lengths.items():
-             quality_metrics[f"avg_content_length_{activity}"] = sum(lengths) / len(lengths)
- 
-         # Overall average content length
-         all_lengths = [len(str(contrib.content_data)) for contrib in contributions]
-         quality_metrics["avg_content_length_overall"] = sum(all_lengths) / len(all_lengths)
- 
-         # Percentage with cultural context
-         with_cultural_context = sum(1 for contrib in contributions
-                                     if contrib.cultural_context.get('cultural_significance'))
-         quality_metrics["cultural_context_percentage"] = (with_cultural_context / len(contributions)) * 100
- 
-         # Percentage with regional information
-         with_region = sum(1 for contrib in contributions
-                           if contrib.cultural_context.get('region'))
-         quality_metrics["regional_info_percentage"] = (with_region / len(contributions)) * 100
- 
-         return quality_metrics
- 
-     def _calculate_engagement_metrics(self, contributions: List[UserContribution]) -> Dict[str, float]:
-         """Calculate user engagement metrics"""
-         if not contributions:
-             return {}
- 
-         # Group contributions by user session
-         user_contributions = defaultdict(list)
-         for contrib in contributions:
-             user_contributions[contrib.user_session].append(contrib)
- 
-         engagement_metrics = {}
- 
-         # Average contributions per user
-         engagement_metrics["avg_contributions_per_user"] = len(contributions) / len(user_contributions)
- 
-         # User retention (users with multiple contributions)
-         multi_contribution_users = sum(1 for contribs in user_contributions.values() if len(contribs) > 1)
-         engagement_metrics["user_retention_rate"] = (multi_contribution_users / len(user_contributions)) * 100
- 
-         # Activity diversity per user
-         user_activity_diversity = []
-         for contribs in user_contributions.values():
-             unique_activities = len(set(contrib.activity_type for contrib in contribs))
-             user_activity_diversity.append(unique_activities)
- 
-         engagement_metrics["avg_activity_diversity_per_user"] = sum(user_activity_diversity) / len(user_activity_diversity)
- 
-         # Language diversity per user
-         user_language_diversity = []
-         for contribs in user_contributions.values():
-             unique_languages = len(set(contrib.language for contrib in contribs))
-             user_language_diversity.append(unique_languages)
- 
-         engagement_metrics["avg_language_diversity_per_user"] = sum(user_language_diversity) / len(user_language_diversity)
- 
-         return engagement_metrics
- 
-     def _calculate_growth_metrics(self, contributions: List[UserContribution]) -> Dict[str, float]:
-         """Calculate growth and trend metrics"""
-         if not contributions:
-             return {}
- 
-         # Sort contributions by timestamp
-         sorted_contributions = sorted(contributions, key=lambda x: x.timestamp)
- 
-         growth_metrics = {}
- 
-         # Daily contribution counts for the last 30 days
-         now = datetime.now()
-         daily_counts = defaultdict(int)
- 
-         for contrib in sorted_contributions:
-             days_ago = (now - contrib.timestamp).days
-             if days_ago <= 30:
-                 date_key = contrib.timestamp.date()
-                 daily_counts[date_key] += 1
- 
-         # Calculate growth rate (last 7 days vs previous 7 days)
-         last_7_days = sum(count for date, count in daily_counts.items()
-                           if (now.date() - date).days <= 7)
-         previous_7_days = sum(count for date, count in daily_counts.items()
-                               if 7 < (now.date() - date).days <= 14)
- 
-         if previous_7_days > 0:
-             growth_metrics["weekly_growth_rate"] = ((last_7_days - previous_7_days) / previous_7_days) * 100
-         else:
-             growth_metrics["weekly_growth_rate"] = 0.0
- 
-         # Average daily contributions
-         if daily_counts:
-             growth_metrics["avg_daily_contributions"] = sum(daily_counts.values()) / len(daily_counts)
-         else:
-             growth_metrics["avg_daily_contributions"] = 0.0
- 
-         # Peak day contribution count
-         growth_metrics["peak_daily_contributions"] = max(daily_counts.values()) if daily_counts else 0
- 
-         return growth_metrics
- 
-     def _calculate_platform_cultural_impact(self, contributions: List[UserContribution]) -> float:
-         """Calculate overall platform cultural impact score"""
-         if not contributions:
-             return 0.0
- 
-         impact_score = 0.0
- 
-         # Base score for total contributions
-         impact_score += len(contributions) * 10
- 
-         # Bonus for language diversity
-         unique_languages = len(set(contrib.language for contrib in contributions))
-         impact_score += unique_languages * 50
- 
-         # Bonus for regional diversity
-         unique_regions = len(set(contrib.cultural_context.get('region', '')
-                                  for contrib in contributions
-                                  if contrib.cultural_context.get('region')))
-         impact_score += unique_regions * 30
- 
-         # Bonus for activity diversity
-         unique_activities = len(set(contrib.activity_type for contrib in contributions))
-         impact_score += unique_activities * 40
- 
-         # Bonus for cultural context richness
-         with_cultural_significance = sum(1 for contrib in contributions
-                                          if contrib.cultural_context.get('cultural_significance'))
-         impact_score += with_cultural_significance * 5
- 
-         # Normalize to 0-100 scale
-         max_possible_score = len(contributions) * 100  # Rough estimate
-         normalized_score = min(100.0, (impact_score / max_possible_score) * 100) if max_possible_score > 0 else 0.0
- 
-         return round(normalized_score, 1)
- 
-     def _generate_recommendations(self, contributions: List[UserContribution],
-                                   language_dist: Dict[str, int],
-                                   activity_dist: Dict[str, int]) -> List[str]:
-         """Generate actionable recommendations based on analytics"""
-         recommendations = []
- 
-         if not contributions:
-             recommendations.append("Start collecting contributions to generate meaningful analytics")
-             return recommendations
- 
-         # Language diversity recommendations
-         if len(language_dist) < 3:
-             recommendations.append("Encourage contributions in more Indian languages to increase diversity")
- 
-         # Activity balance recommendations
-         if activity_dist:
-             min_activity_count = min(activity_dist.values())
-             max_activity_count = max(activity_dist.values())
- 
-             if max_activity_count > min_activity_count * 3:
-                 underrepresented_activities = [activity for activity, count in activity_dist.items()
-                                                if count == min_activity_count]
-                 recommendations.append(f"Promote {', '.join(underrepresented_activities)} activities to balance contribution types")
- 
-         # Quality recommendations
-         quality_metrics = self._calculate_quality_metrics(contributions)
-         cultural_context_pct = quality_metrics.get("cultural_context_percentage", 0)
- 
-         if cultural_context_pct < 70:
-             recommendations.append("Encourage users to provide more cultural context in their contributions")
- 
-         # Engagement recommendations
-         engagement_metrics = self._calculate_engagement_metrics(contributions)
-         retention_rate = engagement_metrics.get("user_retention_rate", 0)
- 
-         if retention_rate < 30:
-             recommendations.append("Implement strategies to improve user retention and repeat contributions")
- 
-         # Growth recommendations
-         growth_metrics = self._calculate_growth_metrics(contributions)
-         weekly_growth = growth_metrics.get("weekly_growth_rate", 0)
- 
-         if weekly_growth < 10:
-             recommendations.append("Focus on user acquisition strategies to increase weekly growth")
- 
-         return recommendations
- 
-     def _create_empty_report(self) -> AnalyticsReport:
-         """Create empty analytics report for error cases"""
-         return AnalyticsReport(
-             report_id=f"empty_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}",
-             generated_at=datetime.now(),
-             total_contributions=0,
-             unique_contributors=0,
-             language_distribution={},
-             activity_distribution={},
-             regional_distribution={},
-             quality_metrics={},
-             engagement_metrics={},
-             growth_metrics={},
-             cultural_impact_score=0.0,
-             recommendations=["No data available for analysis"]
-         )
- 
-     def render_analytics_dashboard(self):
-         """Render comprehensive analytics dashboard"""
-         st.title("📊 Analytics Dashboard")
-         st.markdown("*Insights into cultural preservation impact*")
- 
-         # Generate report
-         with st.spinner("Generating analytics report..."):
-             report = self.generate_comprehensive_report()
- 
-         # Overview metrics
-         st.subheader("🌟 Platform Overview")
- 
-         col1, col2, col3, col4 = st.columns(4)
- 
-         with col1:
-             st.metric(
-                 "Total Contributions",
-                 report.total_contributions,
-                 delta=f"+{report.growth_metrics.get('weekly_growth_rate', 0):.1f}% this week" if report.growth_metrics else None
-             )
- 
-         with col2:
-             st.metric(
-                 "Active Contributors",
-                 report.unique_contributors,
-                 delta=f"{report.engagement_metrics.get('user_retention_rate', 0):.1f}% retention" if report.engagement_metrics else None
-             )
- 
-         with col3:
-             st.metric(
-                 "Languages Covered",
-                 len(report.language_distribution),
-                 delta=f"{len(report.language_distribution)} of 11 supported"
-             )
- 
-         with col4:
-             st.metric(
-                 "Cultural Impact",
-                 f"{report.cultural_impact_score}/100",
-                 delta="Platform-wide score"
-             )
- 
-         # Language Distribution
-         if report.language_distribution:
-             st.subheader("🌍 Language Distribution")
- 
-             # Create language chart
-             lang_df = pd.DataFrame(
-                 list(report.language_distribution.items()),
-                 columns=['Language', 'Contributions']
-             )
- 
-             # Map language codes to names
-             lang_names = {}
-             for lang_info in self.language_service.get_supported_languages_list():
-                 lang_names[lang_info['code']] = lang_info['name']
- 
-             lang_df['Language Name'] = lang_df['Language'].map(lambda x: lang_names.get(x, x))
- 
-             col1, col2 = st.columns([2, 1])
- 
-             with col1:
-                 st.bar_chart(lang_df.set_index('Language Name')['Contributions'])
- 
-             with col2:
-                 st.dataframe(
-                     lang_df[['Language Name', 'Contributions']].sort_values('Contributions', ascending=False),
-                     use_container_width=True
-                 )
- 
-         # Activity Distribution
-         if report.activity_distribution:
-             st.subheader("🎭 Activity Popularity")
- 
-             activity_names = {
-                 'meme': '🎭 Memes',
-                 'recipe': '🍛 Recipes',
-                 'folklore': '📚 Folklore',
-                 'landmark': '🏛️ Landmarks'
-             }
- 
-             activity_df = pd.DataFrame(
-                 list(report.activity_distribution.items()),
-                 columns=['Activity', 'Contributions']
-             )
-             activity_df['Activity Name'] = activity_df['Activity'].map(lambda x: activity_names.get(x, x.title()))
- 
-             col1, col2 = st.columns([2, 1])
- 
-             with col1:
-                 st.bar_chart(activity_df.set_index('Activity Name')['Contributions'])
- 
-             with col2:
-                 for _, row in activity_df.iterrows():
-                     st.metric(row['Activity Name'], row['Contributions'])
- 
-         # Regional Distribution
-         if report.regional_distribution:
-             st.subheader("📍 Regional Contributions")
- 
-             # Show top regions
-             sorted_regions = sorted(report.regional_distribution.items(), key=lambda x: x[1], reverse=True)
- 
-             col1, col2 = st.columns([2, 1])
- 
-             with col1:
-                 region_df = pd.DataFrame(sorted_regions[:10], columns=['Region', 'Contributions'])
-                 st.bar_chart(region_df.set_index('Region')['Contributions'])
- 
-             with col2:
-                 st.markdown("**Top Regions:**")
-                 for region, count in sorted_regions[:5]:
-                     st.markdown(f"• {region}: {count}")
- 
-         # Quality Metrics
-         if report.quality_metrics:
-             st.subheader("💎 Quality Metrics")
- 
-             col1, col2, col3 = st.columns(3)
- 
-             with col1:
-                 cultural_context_pct = report.quality_metrics.get("cultural_context_percentage", 0)
-                 st.metric(
-                     "Cultural Context",
-                     f"{cultural_context_pct:.1f}%",
-                     delta="of contributions"
-                 )
- 
-             with col2:
-                 regional_info_pct = report.quality_metrics.get("regional_info_percentage", 0)
-                 st.metric(
-                     "Regional Info",
-                     f"{regional_info_pct:.1f}%",
-                     delta="with location"
-                 )
- 
-             with col3:
-                 avg_length = report.quality_metrics.get("avg_content_length_overall", 0)
-                 st.metric(
-                     "Avg Content Length",
-                     f"{avg_length:.0f}",
-                     delta="characters"
-                 )
- 
-         # Engagement Metrics
-         if report.engagement_metrics:
-             st.subheader("🤝 User Engagement")
- 
-             col1, col2, col3 = st.columns(3)
- 
-             with col1:
-                 avg_contributions = report.engagement_metrics.get("avg_contributions_per_user", 0)
-                 st.metric(
-                     "Avg Contributions",
-                     f"{avg_contributions:.1f}",
-                     delta="per user"
-                 )
- 
-             with col2:
-                 retention_rate = report.engagement_metrics.get("user_retention_rate", 0)
-                 st.metric(
-                     "User Retention",
-                     f"{retention_rate:.1f}%",
-                     delta="return users"
-                 )
- 
-             with col3:
-                 activity_diversity = report.engagement_metrics.get("avg_activity_diversity_per_user", 0)
-                 st.metric(
-                     "Activity Diversity",
-                     f"{activity_diversity:.1f}",
-                     delta="activities per user"
-                 )
- 
-         # Growth Trends
-         if report.growth_metrics:
-             st.subheader("📈 Growth Trends")
- 
-             col1, col2, col3 = st.columns(3)
- 
-             with col1:
-                 weekly_growth = report.growth_metrics.get("weekly_growth_rate", 0)
-                 st.metric(
-                     "Weekly Growth",
-                     f"{weekly_growth:+.1f}%",
-                     delta="vs previous week"
-                 )
- 
-             with col2:
-                 avg_daily = report.growth_metrics.get("avg_daily_contributions", 0)
-                 st.metric(
-                     "Daily Average",
-                     f"{avg_daily:.1f}",
-                     delta="contributions"
-                 )
- 
-             with col3:
-                 peak_daily = report.growth_metrics.get("peak_daily_contributions", 0)
-                 st.metric(
-                     "Peak Day",
-                     f"{peak_daily}",
-                     delta="contributions"
-                 )
- 
-         # Recommendations
-         if report.recommendations:
-             st.subheader("💡 Recommendations")
- 
-             for i, recommendation in enumerate(report.recommendations, 1):
-                 st.markdown(f"{i}. {recommendation}")
- 
-         # Export options
-         st.subheader("📤 Export Data")
- 
-         col1, col2 = st.columns(2)
- 
-         with col1:
-             if st.button("📊 Export Analytics Report", use_container_width=True):
-                 report_json = self._export_report_to_json(report)
-                 st.download_button(
-                     label="Download JSON Report",
-                     data=report_json,
-                     file_name=f"analytics_report_{report.report_id}.json",
-                     mime="application/json"
-                 )
- 
-         with col2:
-             if st.button("📈 Export Contribution Data", use_container_width=True):
-                 contributions_csv = self._export_contributions_to_csv()
-                 if contributions_csv:
-                     st.download_button(
-                         label="Download CSV Data",
-                         data=contributions_csv,
-                         file_name=f"contributions_data_{datetime.now().strftime('%Y%m%d')}.csv",
-                         mime="text/csv"
-                     )
- 
-     def _export_report_to_json(self, report: AnalyticsReport) -> str:
-         """Export analytics report to JSON format"""
-         report_dict = {
-             'report_id': report.report_id,
-             'generated_at': report.generated_at.isoformat(),
-             'total_contributions': report.total_contributions,
-             'unique_contributors': report.unique_contributors,
-             'language_distribution': report.language_distribution,
-             'activity_distribution': report.activity_distribution,
-             'regional_distribution': report.regional_distribution,
-             'quality_metrics': report.quality_metrics,
-             'engagement_metrics': report.engagement_metrics,
-             'growth_metrics': report.growth_metrics,
-             'cultural_impact_score': report.cultural_impact_score,
-             'recommendations': report.recommendations
-         }
- 
-         return json.dumps(report_dict, indent=2, ensure_ascii=False)
- 
654
- def _export_contributions_to_csv(self) -> Optional[str]:
655
- """Export contributions data to CSV format"""
656
- try:
657
- contributions = self._get_all_contributions()
658
-
659
- if not contributions:
660
- return None
661
-
662
- # Prepare data for CSV
663
- csv_data = []
664
- for contrib in contributions:
665
- csv_data.append({
666
- 'id': contrib.id,
667
- 'user_session': contrib.user_session,
668
- 'activity_type': contrib.activity_type.value,
669
- 'language': contrib.language,
670
- 'timestamp': contrib.timestamp.isoformat(),
671
- 'validation_status': contrib.validation_status.value,
672
- 'region': contrib.cultural_context.get('region', ''),
673
- 'cultural_significance': contrib.cultural_context.get('cultural_significance', ''),
674
- 'content_length': len(str(contrib.content_data))
675
- })
676
-
677
- # Convert to DataFrame and then CSV
678
- df = pd.DataFrame(csv_data)
679
- return df.to_csv(index=False)
680
-
681
- except Exception as e:
682
- self.logger.error(f"Error exporting contributions to CSV: {e}")
683
- return None
684
-
685
- def get_real_time_metrics(self) -> Dict[str, Any]:
686
- """Get real-time metrics for dashboard updates"""
687
- try:
688
- # Get basic statistics from storage service
689
- storage_stats = self.storage_service.get_statistics()
690
-
691
- # Calculate additional real-time metrics
692
- current_time = datetime.now()
693
-
694
- # Recent activity (last 24 hours)
695
- recent_contributions = []
696
- for lang_info in self.language_service.get_supported_languages_list():
697
- lang_contributions = self.storage_service.get_contributions_by_language(
698
- lang_info['code'], limit=100
699
- )
700
- recent_contributions.extend([
701
- contrib for contrib in lang_contributions
702
- if (current_time - contrib.timestamp).total_seconds() < 86400 # 24 hours
703
- ])
704
-
705
- return {
706
- 'total_contributions': storage_stats.get('total_contributions', 0),
707
- 'contributions_by_language': storage_stats.get('contributions_by_language', {}),
708
- 'contributions_by_activity': storage_stats.get('contributions_by_activity', {}),
709
- 'recent_24h_contributions': len(recent_contributions),
710
- 'last_updated': current_time.isoformat()
711
- }
712
-
713
- except Exception as e:
714
- self.logger.error(f"Error getting real-time metrics: {e}")
715
- return {}
716
-
717
- def track_user_action(self, action: str, user_session: str, metadata: Dict[str, Any] = None):
718
- """Track user actions for analytics (simplified implementation)"""
719
- try:
720
- # In a full implementation, this would log to an analytics database
721
- # For now, we'll just log the action
722
-
723
- action_data = {
724
- 'action': action,
725
- 'user_session': user_session,
726
- 'timestamp': datetime.now().isoformat(),
727
- 'metadata': metadata or {}
728
- }
729
-
730
- self.logger.info(f"User action tracked: {json.dumps(action_data)}")
731
-
732
- except Exception as e:
733
- self.logger.error(f"Error tracking user action: {e}")
734
-
735
- def get_contribution_trends(self, days: int = 30) -> Dict[str, List[int]]:
736
- """Get contribution trends over specified number of days"""
737
- try:
738
- contributions = self._get_all_contributions()
739
-
740
- # Calculate daily contributions for the last N days
741
- now = datetime.now()
742
- daily_counts = defaultdict(int)
743
-
744
- for contrib in contributions:
745
- days_ago = (now - contrib.timestamp).days
746
- if days_ago <= days:
747
- date_key = contrib.timestamp.date()
748
- daily_counts[date_key] += 1
749
-
750
- # Create time series data
751
- dates = []
752
- counts = []
753
-
754
- for i in range(days, -1, -1):
755
- date = (now - timedelta(days=i)).date()
756
- dates.append(date.isoformat())
757
- counts.append(daily_counts.get(date, 0))
758
-
759
- return {
760
- 'dates': dates,
761
- 'contributions': counts
762
- }
763
-
764
- except Exception as e:
765
- self.logger.error(f"Error getting contribution trends: {e}")
766
- return {'dates': [], 'contributions': []}
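
The trend logic deleted above reduces to bucketing timestamps by calendar day and zero-filling the last N days. A minimal standalone sketch of that technique (the function name and sample data are illustrative, not part of the deleted module):

```python
from collections import defaultdict
from datetime import datetime, timedelta

def daily_trend(timestamps, days=30, now=None):
    """Bucket timestamps by calendar day, zero-filling the last `days` days."""
    now = now or datetime.now()
    counts = defaultdict(int)
    for ts in timestamps:
        if (now - ts).days <= days:
            counts[ts.date()] += 1
    # Walk backwards from `days` ago to today so missing days appear as 0
    return [
        ((now - timedelta(days=i)).date().isoformat(),
         counts.get((now - timedelta(days=i)).date(), 0))
        for i in range(days, -1, -1)
    ]

# Illustrative usage: two contributions today, one three days ago
now = datetime(2024, 1, 15, 12, 0)
stamps = [now, now - timedelta(hours=2), now - timedelta(days=3)]
print(daily_trend(stamps, days=5, now=now)[-1])  # ('2024-01-15', 2)
```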
intern_project/corpus_collection_engine/services/engagement_service.py DELETED
@@ -1,665 +0,0 @@
- """
- User Engagement and Feedback Service
- """
-
- import streamlit as st
- from typing import Dict, List, Any, Optional, Tuple
- from datetime import datetime, timedelta
- from dataclasses import dataclass
- from enum import Enum
- from urllib.parse import quote
- import json
- import logging
-
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType
- from corpus_collection_engine.services.storage_service import StorageService
- from corpus_collection_engine.services.language_service import LanguageService
-
-
- class EngagementLevel(Enum):
-     """User engagement levels"""
-     NEWCOMER = "newcomer"
-     CONTRIBUTOR = "contributor"
-     ACTIVE_CONTRIBUTOR = "active_contributor"
-     CULTURAL_AMBASSADOR = "cultural_ambassador"
-     HERITAGE_GUARDIAN = "heritage_guardian"
-
-
- class AchievementType(Enum):
-     """Types of achievements users can earn"""
-     FIRST_CONTRIBUTION = "first_contribution"
-     MULTILINGUAL = "multilingual"
-     STORYTELLER = "storyteller"
-     RECIPE_MASTER = "recipe_master"
-     MEME_CREATOR = "meme_creator"
-     LANDMARK_EXPLORER = "landmark_explorer"
-     CULTURAL_BRIDGE = "cultural_bridge"
-     CONSISTENCY_CHAMPION = "consistency_champion"
-     QUALITY_CONTRIBUTOR = "quality_contributor"
-     COMMUNITY_BUILDER = "community_builder"
-
-
- @dataclass
- class Achievement:
-     """User achievement"""
-     type: AchievementType
-     title: str
-     description: str
-     icon: str
-     earned_date: datetime
-     points: int
-
-
- @dataclass
- class UserStats:
-     """User statistics and engagement metrics"""
-     total_contributions: int
-     contributions_by_activity: Dict[str, int]
-     contributions_by_language: Dict[str, int]
-     engagement_level: EngagementLevel
-     achievements: List[Achievement]
-     total_points: int
-     streak_days: int
-     last_contribution_date: Optional[datetime]
-     favorite_activity: Optional[str]
-     cultural_impact_score: float
-
-
- class EngagementService:
-     """Service for managing user engagement and feedback"""
-
-     def __init__(self):
-         self.logger = logging.getLogger(__name__)
-         self.storage_service = StorageService()
-         self.language_service = LanguageService()
-
-         # Initialize engagement tracking in session state
-         if 'user_stats' not in st.session_state:
-             st.session_state.user_stats = None
-         if 'recent_achievements' not in st.session_state:
-             st.session_state.recent_achievements = []
-         if 'onboarding_completed' not in st.session_state:
-             st.session_state.onboarding_completed = False
-
-     def get_user_stats(self, user_session_id: str) -> UserStats:
-         """Get comprehensive user statistics"""
-         try:
-             # Get user contributions
-             contributions = self.storage_service.get_contributions_by_session(user_session_id)
-
-             if not contributions:
-                 return self._create_new_user_stats()
-
-             # Calculate statistics
-             total_contributions = len(contributions)
-
-             # Group by activity type
-             contributions_by_activity = {}
-             for contrib in contributions:
-                 activity = contrib.activity_type.value
-                 contributions_by_activity[activity] = contributions_by_activity.get(activity, 0) + 1
-
-             # Group by language
-             contributions_by_language = {}
-             for contrib in contributions:
-                 lang = contrib.language
-                 contributions_by_language[lang] = contributions_by_language.get(lang, 0) + 1
-
-             # Calculate engagement level
-             engagement_level = self._calculate_engagement_level(total_contributions, contributions_by_language)
-
-             # Calculate achievements
-             achievements = self._calculate_achievements(contributions)
-
-             # Calculate points
-             total_points = sum(achievement.points for achievement in achievements)
-
-             # Calculate streak
-             streak_days = self._calculate_streak(contributions)
-
-             # Get last contribution date
-             last_contribution_date = max(contrib.timestamp for contrib in contributions) if contributions else None
-
-             # Find favorite activity
-             favorite_activity = max(contributions_by_activity, key=contributions_by_activity.get) if contributions_by_activity else None
-
-             # Calculate cultural impact score
-             cultural_impact_score = self._calculate_cultural_impact(contributions)
-
-             return UserStats(
-                 total_contributions=total_contributions,
-                 contributions_by_activity=contributions_by_activity,
-                 contributions_by_language=contributions_by_language,
-                 engagement_level=engagement_level,
-                 achievements=achievements,
-                 total_points=total_points,
-                 streak_days=streak_days,
-                 last_contribution_date=last_contribution_date,
-                 favorite_activity=favorite_activity,
-                 cultural_impact_score=cultural_impact_score
-             )
-
-         except Exception as e:
-             self.logger.error(f"Error calculating user stats: {e}")
-             return self._create_new_user_stats()
-
-     def _create_new_user_stats(self) -> UserStats:
-         """Create stats for new user"""
-         return UserStats(
-             total_contributions=0,
-             contributions_by_activity={},
-             contributions_by_language={},
-             engagement_level=EngagementLevel.NEWCOMER,
-             achievements=[],
-             total_points=0,
-             streak_days=0,
-             last_contribution_date=None,
-             favorite_activity=None,
-             cultural_impact_score=0.0
-         )
-
-     def _calculate_engagement_level(self, total_contributions: int,
-                                     contributions_by_language: Dict[str, int]) -> EngagementLevel:
-         """Calculate user engagement level based on contributions"""
-         num_languages = len(contributions_by_language)
-
-         if total_contributions >= 50 and num_languages >= 3:
-             return EngagementLevel.HERITAGE_GUARDIAN
-         elif total_contributions >= 25 and num_languages >= 2:
-             return EngagementLevel.CULTURAL_AMBASSADOR
-         elif total_contributions >= 10:
-             return EngagementLevel.ACTIVE_CONTRIBUTOR
-         elif total_contributions >= 3:
-             return EngagementLevel.CONTRIBUTOR
-         else:
-             return EngagementLevel.NEWCOMER
-
-     def _calculate_achievements(self, contributions: List[UserContribution]) -> List[Achievement]:
-         """Calculate user achievements based on contributions"""
-         achievements = []
-
-         if not contributions:
-             return achievements
-
-         # First contribution
-         if len(contributions) >= 1:
-             achievements.append(Achievement(
-                 type=AchievementType.FIRST_CONTRIBUTION,
-                 title="First Steps",
-                 description="Made your first contribution to cultural preservation",
-                 icon="🌟",
-                 earned_date=contributions[0].timestamp,
-                 points=10
-             ))
-
-         # Activity-specific achievements
-         activity_counts = {}
-         for contrib in contributions:
-             activity = contrib.activity_type.value
-             activity_counts[activity] = activity_counts.get(activity, 0) + 1
-
-         # Meme creator achievement
-         if activity_counts.get('meme', 0) >= 5:
-             achievements.append(Achievement(
-                 type=AchievementType.MEME_CREATOR,
-                 title="Meme Master",
-                 description="Created 5+ cultural memes",
-                 icon="🎭",
-                 earned_date=datetime.now(),
-                 points=25
-             ))
-
-         # Recipe master achievement
-         if activity_counts.get('recipe', 0) >= 3:
-             achievements.append(Achievement(
-                 type=AchievementType.RECIPE_MASTER,
-                 title="Recipe Keeper",
-                 description="Shared 3+ family recipes",
-                 icon="🍛",
-                 earned_date=datetime.now(),
-                 points=30
-             ))
-
-         # Storyteller achievement
-         if activity_counts.get('folklore', 0) >= 3:
-             achievements.append(Achievement(
-                 type=AchievementType.STORYTELLER,
-                 title="Master Storyteller",
-                 description="Preserved 3+ traditional stories",
-                 icon="📚",
-                 earned_date=datetime.now(),
-                 points=35
-             ))
-
-         # Landmark explorer achievement
-         if activity_counts.get('landmark', 0) >= 5:
-             achievements.append(Achievement(
-                 type=AchievementType.LANDMARK_EXPLORER,
-                 title="Heritage Explorer",
-                 description="Documented 5+ cultural landmarks",
-                 icon="🏛️",
-                 earned_date=datetime.now(),
-                 points=40
-             ))
-
-         # Multilingual achievement
-         languages = set(contrib.language for contrib in contributions)
-         if len(languages) >= 2:
-             achievements.append(Achievement(
-                 type=AchievementType.MULTILINGUAL,
-                 title="Cultural Bridge",
-                 description=f"Contributed in {len(languages)} languages",
-                 icon="🌍",
-                 earned_date=datetime.now(),
-                 points=20
-             ))
-
-         # Quality contributor achievement
-         high_quality_contributions = sum(1 for contrib in contributions
-                                          if len(str(contrib.content_data)) > 100)
-         if high_quality_contributions >= 5:
-             achievements.append(Achievement(
-                 type=AchievementType.QUALITY_CONTRIBUTOR,
-                 title="Quality Guardian",
-                 description="Consistently provides detailed contributions",
-                 icon="💎",
-                 earned_date=datetime.now(),
-                 points=50
-             ))
-
-         return achievements
-
-     def _calculate_streak(self, contributions: List[UserContribution]) -> int:
-         """Calculate user's contribution streak in days"""
-         if not contributions:
-             return 0
-
-         # Sort contributions by date
-         sorted_contributions = sorted(contributions, key=lambda x: x.timestamp, reverse=True)
-
-         # Get unique contribution dates
-         contribution_dates = list(set(contrib.timestamp.date() for contrib in sorted_contributions))
-         contribution_dates.sort(reverse=True)
-
-         if not contribution_dates:
-             return 0
-
-         # Calculate streak from most recent date
-         streak = 0
-         current_date = datetime.now().date()
-
-         for i, contrib_date in enumerate(contribution_dates):
-             expected_date = current_date - timedelta(days=i)
-
-             if contrib_date == expected_date or (i == 0 and contrib_date == current_date - timedelta(days=1)):
-                 streak += 1
-             else:
-                 break
-
-         return streak
-
-     def _calculate_cultural_impact(self, contributions: List[UserContribution]) -> float:
-         """Calculate cultural impact score based on contribution quality and diversity"""
-         if not contributions:
-             return 0.0
-
-         impact_score = 0.0
-
-         # Base score for each contribution
-         impact_score += len(contributions) * 10
-
-         # Bonus for language diversity
-         languages = set(contrib.language for contrib in contributions)
-         impact_score += len(languages) * 15
-
-         # Bonus for activity diversity
-         activities = set(contrib.activity_type.value for contrib in contributions)
-         impact_score += len(activities) * 20
-
-         # Bonus for cultural context richness
-         for contrib in contributions:
-             cultural_context = contrib.cultural_context
-             if cultural_context.get('cultural_significance'):
-                 impact_score += 5
-             if cultural_context.get('region'):
-                 impact_score += 3
-
-         # Normalize to 0-100 scale
-         max_possible_score = len(contributions) * 50  # Rough estimate
-         normalized_score = min(100.0, (impact_score / max_possible_score) * 100) if max_possible_score > 0 else 0.0
-
-         return round(normalized_score, 1)
-
-     def render_user_dashboard(self, user_session_id: str):
-         """Render user engagement dashboard"""
-         st.subheader("🏆 Your Cultural Impact Dashboard")
-
-         # Get user stats
-         user_stats = self.get_user_stats(user_session_id)
-         st.session_state.user_stats = user_stats
-
-         # Overview metrics
-         col1, col2, col3, col4 = st.columns(4)
-
-         with col1:
-             st.metric(
-                 "Contributions",
-                 user_stats.total_contributions,
-                 delta=f"+{user_stats.total_contributions}" if user_stats.total_contributions > 0 else None
-             )
-
-         with col2:
-             st.metric(
-                 "Languages",
-                 len(user_stats.contributions_by_language),
-                 delta=f"+{len(user_stats.contributions_by_language)}" if user_stats.contributions_by_language else None
-             )
-
-         with col3:
-             st.metric(
-                 "Points",
-                 user_stats.total_points,
-                 delta=f"+{user_stats.total_points}" if user_stats.total_points > 0 else None
-             )
-
-         with col4:
-             st.metric(
-                 "Streak",
-                 f"{user_stats.streak_days} days",
-                 delta=f"+{user_stats.streak_days}" if user_stats.streak_days > 0 else None
-             )
-
-         # Engagement level
-         level_info = self._get_engagement_level_info(user_stats.engagement_level)
-         st.markdown(f"### {level_info['icon']} {level_info['title']}")
-         st.markdown(f"*{level_info['description']}*")
-
-         # Progress to next level
-         self._render_progress_to_next_level(user_stats)
-
-         # Cultural impact score
-         st.markdown(f"### 🌟 Cultural Impact Score: {user_stats.cultural_impact_score}/100")
-         st.progress(user_stats.cultural_impact_score / 100)
-
-         # Activity breakdown
-         if user_stats.contributions_by_activity:
-             st.markdown("### 📊 Your Contributions by Activity")
-
-             activity_names = {
-                 'meme': '🎭 Memes',
-                 'recipe': '🍛 Recipes',
-                 'folklore': '📚 Folklore',
-                 'landmark': '🏛️ Landmarks'
-             }
-
-             cols = st.columns(len(user_stats.contributions_by_activity))
-             for i, (activity, count) in enumerate(user_stats.contributions_by_activity.items()):
-                 with cols[i]:
-                     st.metric(activity_names.get(activity, activity.title()), count)
-
-         # Recent achievements
-         if user_stats.achievements:
-             st.markdown("### 🏅 Your Achievements")
-             self._render_achievements(user_stats.achievements)
-
-     def _get_engagement_level_info(self, level: EngagementLevel) -> Dict[str, str]:
-         """Get display information for engagement level"""
-         level_info = {
-             EngagementLevel.NEWCOMER: {
-                 'icon': '🌱',
-                 'title': 'Cultural Newcomer',
-                 'description': 'Welcome to your cultural preservation journey!'
-             },
-             EngagementLevel.CONTRIBUTOR: {
-                 'icon': '🌿',
-                 'title': 'Cultural Contributor',
-                 'description': 'You\'re making meaningful contributions to cultural preservation!'
-             },
-             EngagementLevel.ACTIVE_CONTRIBUTOR: {
-                 'icon': '🌳',
-                 'title': 'Active Cultural Contributor',
-                 'description': 'Your dedication to cultural preservation is inspiring!'
-             },
-             EngagementLevel.CULTURAL_AMBASSADOR: {
-                 'icon': '🏛️',
-                 'title': 'Cultural Ambassador',
-                 'description': 'You\'re a true ambassador of cultural heritage!'
-             },
-             EngagementLevel.HERITAGE_GUARDIAN: {
-                 'icon': '👑',
-                 'title': 'Heritage Guardian',
-                 'description': 'You\'re a guardian of cultural heritage for future generations!'
-             }
-         }
-
-         return level_info.get(level, level_info[EngagementLevel.NEWCOMER])
-
-     def _render_progress_to_next_level(self, user_stats: UserStats):
-         """Render progress towards next engagement level"""
-         current_level = user_stats.engagement_level
-         total_contributions = user_stats.total_contributions
-         num_languages = len(user_stats.contributions_by_language)
-
-         # Define requirements for next level
-         next_level_requirements = {
-             EngagementLevel.NEWCOMER: {'contributions': 3, 'languages': 1, 'next': 'Contributor'},
-             EngagementLevel.CONTRIBUTOR: {'contributions': 10, 'languages': 1, 'next': 'Active Contributor'},
-             EngagementLevel.ACTIVE_CONTRIBUTOR: {'contributions': 25, 'languages': 2, 'next': 'Cultural Ambassador'},
-             EngagementLevel.CULTURAL_AMBASSADOR: {'contributions': 50, 'languages': 3, 'next': 'Heritage Guardian'},
-             EngagementLevel.HERITAGE_GUARDIAN: {'contributions': float('inf'), 'languages': float('inf'), 'next': 'Maximum Level Reached!'}
-         }
-
-         requirements = next_level_requirements.get(current_level)
-         if not requirements or current_level == EngagementLevel.HERITAGE_GUARDIAN:
-             return
-
-         st.markdown(f"### 🎯 Progress to {requirements['next']}")
-
-         # Contributions progress
-         contrib_progress = min(1.0, total_contributions / requirements['contributions'])
-         st.markdown(f"**Contributions:** {total_contributions}/{requirements['contributions']}")
-         st.progress(contrib_progress)
-
-         # Languages progress
-         if requirements['languages'] > 1:
-             lang_progress = min(1.0, num_languages / requirements['languages'])
-             st.markdown(f"**Languages:** {num_languages}/{requirements['languages']}")
-             st.progress(lang_progress)
-
-     def _render_achievements(self, achievements: List[Achievement]):
-         """Render user achievements"""
-         if not achievements:
-             st.info("Complete activities to earn your first achievement!")
-             return
-
-         # Sort achievements by points (highest first)
-         sorted_achievements = sorted(achievements, key=lambda x: x.points, reverse=True)
-
-         cols = st.columns(min(3, len(sorted_achievements)))
-         for i, achievement in enumerate(sorted_achievements):
-             with cols[i % 3]:
-                 st.markdown(f"""
-                 <div style="
-                     border: 2px solid #FF6B35;
-                     border-radius: 10px;
-                     padding: 16px;
-                     text-align: center;
-                     background: linear-gradient(135deg, #FF6B35, #F7931E);
-                     color: white;
-                     margin: 8px 0;
-                 ">
-                     <div style="font-size: 40px; margin-bottom: 8px;">{achievement.icon}</div>
-                     <div style="font-weight: bold; font-size: 16px; margin-bottom: 4px;">{achievement.title}</div>
-                     <div style="font-size: 12px; opacity: 0.9; margin-bottom: 8px;">{achievement.description}</div>
-                     <div style="font-size: 14px; font-weight: bold;">{achievement.points} points</div>
-                 </div>
-                 """, unsafe_allow_html=True)
-
-     def render_immediate_feedback(self, contribution: UserContribution):
-         """Render immediate feedback after contribution"""
-         st.success("🎉 Contribution submitted successfully!")
-
-         # Show immediate impact
-         impact_messages = {
-             ActivityType.MEME: "Your meme adds humor and cultural context to our collection!",
-             ActivityType.RECIPE: "Your recipe preserves culinary traditions for future generations!",
-             ActivityType.FOLKLORE: "Your story keeps traditional wisdom alive!",
-             ActivityType.LANDMARK: "Your landmark documentation enriches our cultural map!"
-         }
-
-         message = impact_messages.get(contribution.activity_type, "Your contribution enriches our cultural heritage!")
-         st.info(f"💫 {message}")
-
-         # Check for new achievements
-         user_session_id = st.session_state.get('user_session_id', 'anonymous')
-         user_stats = self.get_user_stats(user_session_id)
-
-         # Show achievement notifications
-         self._check_and_show_new_achievements(user_stats)
-
-         # Show progress update
-         self._show_progress_update(user_stats)
-
-     def _check_and_show_new_achievements(self, user_stats: UserStats):
-         """Check for and display new achievements"""
-         # This is a simplified version - in a full implementation,
-         # you'd track which achievements are new since last session
-
-         if user_stats.achievements and user_stats.total_contributions <= 3:
-             # Show achievement for new users
-             latest_achievement = user_stats.achievements[-1]
-
-             st.balloons()
-             st.markdown(f"""
-             <div style="
-                 background: linear-gradient(135deg, #4CAF50, #45a049);
-                 color: white;
-                 padding: 20px;
-                 border-radius: 10px;
-                 text-align: center;
-                 margin: 16px 0;
-             ">
-                 <div style="font-size: 50px; margin-bottom: 10px;">{latest_achievement.icon}</div>
-                 <div style="font-size: 24px; font-weight: bold; margin-bottom: 8px;">Achievement Unlocked!</div>
-                 <div style="font-size: 18px; margin-bottom: 4px;">{latest_achievement.title}</div>
-                 <div style="font-size: 14px; opacity: 0.9;">{latest_achievement.description}</div>
-                 <div style="font-size: 16px; font-weight: bold; margin-top: 10px;">+{latest_achievement.points} points</div>
-             </div>
-             """, unsafe_allow_html=True)
-
-     def _show_progress_update(self, user_stats: UserStats):
-         """Show progress update after contribution"""
-         col1, col2 = st.columns(2)
-
-         with col1:
-             st.metric(
-                 "Total Contributions",
-                 user_stats.total_contributions,
-                 delta=1
-             )
-
-         with col2:
-             st.metric(
-                 "Cultural Impact",
-                 f"{user_stats.cultural_impact_score}/100",
-                 delta=f"+{round(user_stats.cultural_impact_score / user_stats.total_contributions, 1) if user_stats.total_contributions > 0 else 0}"
-             )
-
-     def render_onboarding_flow(self) -> bool:
-         """Render onboarding flow for new users - Auto-complete for public deployment"""
-         if st.session_state.onboarding_completed:
-             return True
-
-         # Auto-complete onboarding for Hugging Face Spaces deployment
-         st.session_state.onboarding_completed = True
-         return True
-
-     def render_social_sharing(self, contribution: UserContribution):
-         """Render social sharing options"""
-         st.markdown("### 📢 Share Your Contribution")
-
-         # Generate sharing text
-         activity_names = {
-             ActivityType.MEME: "meme",
-             ActivityType.RECIPE: "family recipe",
-             ActivityType.FOLKLORE: "traditional story",
-             ActivityType.LANDMARK: "cultural landmark"
-         }
-
-         activity_name = activity_names.get(contribution.activity_type, "cultural contribution")
-
-         sharing_text = f"I just shared a {activity_name} on Corpus Collection Engine! 🇮🇳 Join me in preserving Indian cultural heritage through AI. #CulturalHeritage #IndianCulture #AI4Culture"
-         # URL-encode the text so spaces and '#' characters survive inside share links
-         encoded_text = quote(sharing_text)
-
-         # Social sharing buttons (simplified - in production, use proper sharing APIs)
-         col1, col2, col3 = st.columns(3)
-
-         with col1:
-             if st.button("📱 Share on WhatsApp", use_container_width=True):
-                 whatsapp_url = f"https://wa.me/?text={encoded_text}"
-                 st.markdown(f"[Open WhatsApp]({whatsapp_url})")
-
-         with col2:
-             if st.button("🐦 Share on Twitter", use_container_width=True):
-                 twitter_url = f"https://twitter.com/intent/tweet?text={encoded_text}"
-                 st.markdown(f"[Open Twitter]({twitter_url})")
-
-         with col3:
-             if st.button("📋 Copy Link", use_container_width=True):
-                 st.code(sharing_text)
-                 st.success("Text copied! Share it anywhere you like!")
-
-     def get_engagement_analytics(self) -> Dict[str, Any]:
-         """Get engagement analytics for the platform"""
-         try:
-             # Get all contributions for analytics
-             all_stats = self.storage_service.get_statistics()
-
-             return {
-                 'total_users': len(set()),  # Would need user tracking
-                 'total_contributions': all_stats.get('total_contributions', 0),
-                 'contributions_by_language': all_stats.get('contributions_by_language', {}),
-                 'contributions_by_activity': all_stats.get('contributions_by_activity', {}),
-                 'engagement_trends': {},  # Would calculate from historical data
-                 'achievement_distribution': {},  # Would calculate from user achievements
-                 'cultural_impact_total': 0  # Would sum all user impact scores
-             }
-
-         except Exception as e:
-             self.logger.error(f"Error getting engagement analytics: {e}")
-             return {}
-
-     def render_session_summary(self):
-         """Render session summary for user engagement"""
-         try:
-             # Get current session stats
-             user_session_id = st.session_state.get('user_session_id', 'anonymous')
-             user_stats = self.get_user_stats(user_session_id)
-
-             # Only show if user has made contributions
-             if user_stats.total_contributions > 0:
-                 with st.sidebar:
-                     st.markdown("---")
-                     st.markdown("### 🏆 Session Summary")
-
-                     col1, col2 = st.columns(2)
-                     with col1:
-                         st.metric("Contributions", user_stats.total_contributions)
-                     with col2:
-                         st.metric("Impact", f"{user_stats.cultural_impact_score:.1f}")
-
-                     # Show current level
-                     level_info = self._get_engagement_level_info(user_stats.engagement_level)
-                     st.markdown(f"**Level:** {level_info['icon']} {level_info['title']}")
-
-                     # Show recent achievements
-                     if user_stats.achievements:
-                         st.markdown("**Latest Achievement:**")
-                         latest = user_stats.achievements[-1]
-                         st.markdown(f"{latest.icon} {latest.title}")
-
-                     # Encourage continued participation
-                     if user_stats.total_contributions < 5:
-                         st.info("Keep contributing to unlock more achievements! 🌟")
-
-         except Exception as e:
-             self.logger.error(f"Error rendering session summary: {e}")
-             # Fail silently to not disrupt user experience
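
The engagement ladder in `_calculate_engagement_level` is a pure threshold function, so it is easy to exercise in isolation. A minimal sketch with the same thresholds (the class and function names here are illustrative, not taken from the deleted module):

```python
from enum import Enum

class Level(Enum):
    NEWCOMER = 1
    CONTRIBUTOR = 2
    ACTIVE_CONTRIBUTOR = 3
    CULTURAL_AMBASSADOR = 4
    HERITAGE_GUARDIAN = 5

def level_for(total_contributions: int, num_languages: int) -> Level:
    """Counts gate every tier; the top two tiers also require language diversity."""
    if total_contributions >= 50 and num_languages >= 3:
        return Level.HERITAGE_GUARDIAN
    if total_contributions >= 25 and num_languages >= 2:
        return Level.CULTURAL_AMBASSADOR
    if total_contributions >= 10:
        return Level.ACTIVE_CONTRIBUTOR
    if total_contributions >= 3:
        return Level.CONTRIBUTOR
    return Level.NEWCOMER

# A monolingual power user tops out at ACTIVE_CONTRIBUTOR by design
assert level_for(100, 1) is Level.ACTIVE_CONTRIBUTOR
assert level_for(50, 3) is Level.HERITAGE_GUARDIAN
```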
intern_project/corpus_collection_engine/services/language_service.py DELETED
@@ -1,295 +0,0 @@
- """
- Language service for Indic language processing and detection
- """
-
- from typing import Any, Dict, List, Optional, Tuple
- import logging
-
- # Try to import langdetect, fall back to basic detection if not available
- try:
-     from langdetect import detect, DetectorFactory
-     from langdetect.lang_detect_exception import LangDetectException
-     LANGDETECT_AVAILABLE = True
-     # Set seed for consistent language detection results
-     DetectorFactory.seed = 0
- except ImportError:
-     LANGDETECT_AVAILABLE = False
-     LangDetectException = Exception
-
- from corpus_collection_engine.config import SUPPORTED_LANGUAGES
-
-
- class LanguageService:
-     """Service for language detection, validation, and processing"""
-
-     def __init__(self):
-         self.logger = logging.getLogger(__name__)
-         self.supported_languages = SUPPORTED_LANGUAGES
-         self.indic_scripts = {
-             'hi': 'देवनागरी',  # Devanagari
-             'bn': 'বাংলা',  # Bengali
-             'ta': 'தமிழ்',  # Tamil
-             'te': 'తెలుగు',  # Telugu
-             'ml': 'മലയാളം',  # Malayalam
-             'kn': 'ಕನ್ನಡ',  # Kannada
-             'gu': 'ગુજરાતી',  # Gujarati
-             'mr': 'मराठी',  # Marathi
-             'pa': 'ਪੰਜਾਬੀ',  # Punjabi
-             'or': 'ଓଡ଼ିଆ'  # Odia
-         }
-
-         # Unicode ranges for Indic scripts
-         self.script_ranges = {
-             'devanagari': (0x0900, 0x097F),
-             'bengali': (0x0980, 0x09FF),
-             'tamil': (0x0B80, 0x0BFF),
-             'telugu': (0x0C00, 0x0C7F),
-             'malayalam': (0x0D00, 0x0D7F),
-             'kannada': (0x0C80, 0x0CFF),
-             'gujarati': (0x0A80, 0x0AFF),
-             'punjabi': (0x0A00, 0x0A7F),
-             'odia': (0x0B00, 0x0B7F)
-         }
-
-     def detect_language(self, text: str, confidence_threshold: float = 0.7) -> Tuple[Optional[str], float]:
-         """
-         Detect language from text with confidence score
-
-         Args:
-             text: Input text to analyze
-             confidence_threshold: Minimum confidence required
-
-         Returns:
-             Tuple of (language_code, confidence_score)
-         """
-         if not text or not text.strip():
-             return None, 0.0
-
-         text = text.strip()
-
-         # First try script-based detection for Indic languages
-         script_lang = self._detect_by_script(text)
-         if script_lang:
-             return script_lang, 0.9  # High confidence for script-based detection
-
-         # Fall back to langdetect library if available
-         if LANGDETECT_AVAILABLE:
-             try:
-                 detected_lang = detect(text)
-
-                 # Map some common language codes
-                 lang_mapping = {
-                     'hi': 'hi',  # Hindi
-                     'bn': 'bn',  # Bengali
-                     'ta': 'ta',  # Tamil
-                     'te': 'te',  # Telugu
-                     'ml': 'ml',  # Malayalam
-                     'kn': 'kn',  # Kannada
-                     'gu': 'gu',  # Gujarati
-                     'mr': 'mr',  # Marathi
-                     'pa': 'pa',  # Punjabi
-                     'or': 'or',  # Odia
-                     'en': 'en'  # English
-                 }
-
-                 mapped_lang = lang_mapping.get(detected_lang)
-                 if mapped_lang and mapped_lang in self.supported_languages:
-                     return mapped_lang, 0.8  # Good confidence for library detection
-
-                 # If detected language is not in our supported list, return English as fallback
-                 return 'en', 0.5
-
-             except LangDetectException:
-                 # If detection fails, return English as fallback
-                 return 'en', 0.3
-         else:
-             # If langdetect is not available, use basic heuristics
-             return self._basic_language_detection(text)
-
-     def _detect_by_script(self, text: str) -> Optional[str]:
-         """Detect language based on Unicode script ranges"""
-         script_counts = {}
-
-         for char in text:
-             char_code = ord(char)
-
-             # Check each script range
-             for script, (start, end) in self.script_ranges.items():
-                 if start <= char_code <= end:
-                     script_counts[script] = script_counts.get(script, 0) + 1
-                     break
-
-         if not script_counts:
-             return None
-
-         # Find the most common script
-         dominant_script = max(script_counts, key=script_counts.get)
-
-         # Map script to language code
-         script_to_lang = {
-             'devanagari': 'hi',  # Could be Hindi or Marathi, default to Hindi
-             'bengali': 'bn',
-             'tamil': 'ta',
-             'telugu': 'te',
-             'malayalam': 'ml',
-             'kannada': 'kn',
-             'gujarati': 'gu',
-             'punjabi': 'pa',
-             'odia': 'or'
-         }
-
-         return script_to_lang.get(dominant_script)
-
-     def _basic_language_detection(self, text: str) -> Tuple[str, float]:
-         """Basic language detection when langdetect is not available"""
-         # Check for common English patterns
-         english_words = ['the', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by']
-         text_lower = text.lower()
-
-         english_count = sum(1 for word in english_words if word in text_lower)
-         if english_count > 0:
-             return 'en', 0.6
-
-         # Check for common Hindi words (in Devanagari)
-         hindi_words = ['और', 'का', 'की', 'के', 'में', 'से', 'को', 'है', 'हैं', 'था', 'थी', 'थे']
-         hindi_count = sum(1 for word in hindi_words if word in text)
-         if hindi_count > 0:
-             return 'hi', 0.6
-
-         # Default to English if no patterns match
-         return 'en', 0.3
-
-     def validate_language_code(self, language_code: str) -> bool:
-         """Validate if language code is supported"""
-         return language_code in self.supported_languages
-
-     def get_language_name(self, language_code: str) -> str:
-         """Get human-readable language name"""
-         return self.supported_languages.get(language_code, "Unknown")
-
-     def get_native_script_name(self, language_code: str) -> str:
-         """Get native script name for the language"""
-         return self.indic_scripts.get(language_code, language_code.upper())
-
-     def is_indic_language(self, language_code: str) -> bool:
-         """Check if language is an Indic language"""
-         return language_code in self.indic_scripts
-
-     def transliterate_to_latin(self, text: str, source_language: str) -> str:
-         """
-         Basic transliteration to Latin script (placeholder implementation)
-         In a full implementation, this would use proper transliteration libraries
-         """
-         # This is a simplified implementation
-         # In production, you'd use libraries like indic-transliteration
-
-         if not self.is_indic_language(source_language):
-             return text
-
-         # For now, just return the original text
-         # TODO: Implement proper transliteration using indic-transliteration library
-         return text
-
-     def get_text_statistics(self, text: str) -> Dict[str, Any]:
-         """Get statistics about the text"""
-         if not text:
-             return {
-                 'character_count': 0,
-                 'word_count': 0,
-                 'detected_language': None,
-                 'confidence': 0.0,
-                 'script_distribution': {}
-             }
-
-         # Basic statistics
-         char_count = len(text)
-         word_count = len(text.split())
-
-         # Language detection
-         detected_lang, confidence = self.detect_language(text)
-
-         # Script distribution
-         script_dist = self._get_script_distribution(text)
-
-         return {
-             'character_count': char_count,
-             'word_count': word_count,
-             'detected_language': detected_lang,
-             'confidence': confidence,
-             'script_distribution': script_dist
-         }
-
-     def _get_script_distribution(self, text: str) -> Dict[str, float]:
-         """Get distribution of different scripts in text"""
-         script_counts = {}
-         total_chars = 0
-
-         for char in text:
-             if char.isalpha():  # Only count alphabetic characters
-                 total_chars += 1
-                 char_code = ord(char)
-
-                 # Check each script range
-                 script_found = False
-                 for script, (start, end) in self.script_ranges.items():
-                     if start <= char_code <= end:
-                         script_counts[script] = script_counts.get(script, 0) + 1
-                         script_found = True
-                         break
-
-                 # If not in any Indic script, assume Latin
-                 if not script_found and char.isascii():
-                     script_counts['latin'] = script_counts.get('latin', 0) + 1
-
-         # Convert to percentages
-         if total_chars == 0:
-             return {}
-
-         return {script: (count / total_chars) * 100
-                 for script, count in script_counts.items()}
-
-     def suggest_language_from_region(self, region: str) -> List[str]:
-         """Suggest likely languages based on region"""
-         region = region.lower().strip()
-
-         # Regional language mapping
-         region_languages = {
-             'maharashtra': ['mr', 'hi'],
-             'karnataka': ['kn', 'hi'],
-             'tamil nadu': ['ta', 'hi'],
-             'andhra pradesh': ['te', 'hi'],
-             'telangana': ['te', 'hi'],
-             'kerala': ['ml', 'hi'],
-             'west bengal': ['bn', 'hi'],
-             'gujarat': ['gu', 'hi'],
-             'punjab': ['pa', 'hi'],
-             'odisha': ['or', 'hi'],
-             'delhi': ['hi'],
-             'uttar pradesh': ['hi'],
-             'bihar': ['hi'],
-             'rajasthan': ['hi'],
-             'madhya pradesh': ['hi'],
-             'haryana': ['hi']
-         }
-
-         # Find matching regions
-         for region_key, languages in region_languages.items():
-             if region_key in region:
-                 return languages
-
-         # Default to Hindi and English
-         return ['hi', 'en']
-
-     def get_supported_languages_list(self) -> List[Dict[str, str]]:
-         """Get list of supported languages with metadata"""
-         languages = []
-
-         for code, name in self.supported_languages.items():
-             languages.append({
-                 'code': code,
-                 'name': name,
-                 'native_name': self.get_native_script_name(code),
-                 'is_indic': self.is_indic_language(code)
-             })
-
-         return languages
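
The script-based detector above decides by counting characters per Unicode block and taking the majority. A self-contained sketch of that core idea, using two of the ten ranges (the others follow the same pattern; `dominant_script` is an illustrative name, not the deleted method):

```python
# Unicode code-point ranges, as in the deleted service
RANGES = {
    "devanagari": (0x0900, 0x097F),
    "tamil": (0x0B80, 0x0BFF),
}

def dominant_script(text: str):
    """Return the script with the most characters in `text`, or None."""
    counts = {}
    for ch in text:
        cp = ord(ch)
        for script, (lo, hi) in RANGES.items():
            if lo <= cp <= hi:
                counts[script] = counts.get(script, 0) + 1
                break
    return max(counts, key=counts.get) if counts else None

print(dominant_script("नमस्ते"))    # devanagari
print(dominant_script("வணக்கம்"))  # tamil
print(dominant_script("hello"))    # None -> caller falls back to langdetect
```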
intern_project/corpus_collection_engine/services/privacy_service.py DELETED
@@ -1,1069 +0,0 @@
- """
- Privacy and Consent Management Service
- """
-
- import streamlit as st
- from typing import Dict, List, Any, Optional, Tuple
- from datetime import datetime, timedelta
- from dataclasses import dataclass
- from enum import Enum
- import json
- import logging
- import hashlib
-
- from corpus_collection_engine.models.data_models import UserContribution
- from corpus_collection_engine.services.storage_service import StorageService
-
-
- class ConsentType(Enum):
-     """Types of consent that can be given"""
-
-     DATA_COLLECTION = "data_collection"
-     AI_TRAINING = "ai_training"
-     RESEARCH_USE = "research_use"
-     PUBLIC_SHARING = "public_sharing"
-     ANALYTICS = "analytics"
-     MARKETING = "marketing"
-
-
- class DataCategory(Enum):
-     """Categories of data being processed"""
-
-     CULTURAL_CONTENT = "cultural_content"
-     LANGUAGE_DATA = "language_data"
-     REGIONAL_INFO = "regional_info"
-     USER_BEHAVIOR = "user_behavior"
-     TECHNICAL_DATA = "technical_data"
-
-
- @dataclass
- class ConsentRecord:
-     """Record of user consent"""
-
-     user_session: str
-     consent_type: ConsentType
-     granted: bool
-     timestamp: datetime
-     version: str
-     ip_hash: Optional[str] = None
-     user_agent_hash: Optional[str] = None
-
-
- @dataclass
- class PrivacySettings:
-     """User privacy settings"""
-
-     user_session: str
-     consents: Dict[ConsentType, ConsentRecord]
-     data_retention_days: int
-     anonymize_data: bool
-     allow_data_export: bool
-     created_at: datetime
-     updated_at: datetime
-
-
- class PrivacyService:
-     """Service for managing user privacy and consent"""
-
-     def __init__(self):
-         self.logger = logging.getLogger(__name__)
-         self.storage_service = StorageService()
-
-         # Privacy policy version
-         self.current_privacy_version = "1.0"
-         self.current_terms_version = "1.0"
-
-         # Initialize privacy state
-         if "privacy_initialized" not in st.session_state:
-             st.session_state.privacy_initialized = False
-             st.session_state.consent_given = {}
-             st.session_state.privacy_settings = None
-             st.session_state.show_privacy_banner = True
-
-     def initialize_privacy_management(self):
-         """Initialize privacy management system"""
-         if st.session_state.privacy_initialized:
-             return
-
-         try:
-             # Load existing privacy settings if available
-             user_session_id = st.session_state.get("user_session_id", "anonymous")
-             self._load_privacy_settings(user_session_id)
-
-             st.session_state.privacy_initialized = True
-             self.logger.info("Privacy management initialized")
-
-         except Exception as e:
-             self.logger.error(f"Privacy management initialization failed: {e}")
-
-     def render_consent_interface(self) -> bool:
-         """Render consent interface - Auto-consent for public deployment"""
-         # Auto-consent for Hugging Face Spaces deployment
-         # This removes the need for explicit consent flow
-         return True
-
-     def render_privacy_banner(self):
-         """Render privacy consent banner"""
-         if not st.session_state.show_privacy_banner:
-             return
-
-         # Check if user has already given essential consent
-         if self._has_essential_consent():
-             st.session_state.show_privacy_banner = False
-             return
-
-         # Render privacy banner
-         banner_html = """
-         <div style="
-             position: fixed;
-             bottom: 0;
-             left: 0;
-             right: 0;
-             background: linear-gradient(135deg, #2C3E50, #34495E);
-             color: white;
-             padding: 20px;
-             z-index: 9999;
-             box-shadow: 0 -2px 10px rgba(0,0,0,0.3);
-         ">
-             <div style="max-width: 1200px; margin: 0 auto;">
-                 <h4 style="margin: 0 0 10px 0; color: #FF6B35;">🔒 Your Privacy Matters</h4>
-                 <p style="margin: 0 0 15px 0; font-size: 14px; line-height: 1.4;">
-                     We use your contributions to preserve Indian cultural heritage and train AI systems.
-                     Your data is handled with respect and transparency.
-                 </p>
-                 <div style="display: flex; gap: 10px; flex-wrap: wrap;">
-                     <button onclick="window.parent.postMessage({type: 'ACCEPT_ESSENTIAL'}, '*')"
-                             style="background: #FF6B35; color: white; border: none; padding: 8px 16px; border-radius: 4px; cursor: pointer;">
-                         Accept Essential
-                     </button>
-                     <button onclick="window.parent.postMessage({type: 'CUSTOMIZE_PRIVACY'}, '*')"
-                             style="background: transparent; color: white; border: 1px solid white; padding: 8px 16px; border-radius: 4px; cursor: pointer;">
-                         Customize
-                     </button>
-                     <button onclick="window.parent.postMessage({type: 'VIEW_PRIVACY_POLICY'}, '*')"
-                             style="background: transparent; color: #FF6B35; border: none; padding: 8px 16px; cursor: pointer; text-decoration: underline;">
-                         Privacy Policy
-                     </button>
-                 </div>
-             </div>
-         </div>
-
-         <script>
-         window.addEventListener('message', function(event) {
-             if (event.data.type === 'HIDE_PRIVACY_BANNER') {
-                 document.querySelector('[style*="position: fixed"]').style.display = 'none';
-             }
-         });
-         </script>
-         """
-
-         st.components.v1.html(banner_html, height=0)
-
-     def render_privacy_settings(self):
-         """Render comprehensive privacy settings interface"""
-         st.title("🔒 Privacy & Data Management")
-         st.markdown("*Control how your data is used for cultural preservation*")
-
-         # Current privacy status
-         user_session_id = st.session_state.get("user_session_id", "anonymous")
-         privacy_settings = self._get_privacy_settings(user_session_id)
-
-         # Privacy overview
-         st.subheader("📊 Your Privacy Status")
-
-         col1, col2, col3 = st.columns(3)
-
-         with col1:
-             essential_consent = self._has_essential_consent()
-             st.metric(
-                 "Essential Consent",
-                 "✅ Given" if essential_consent else "❌ Required",
-                 delta="Required for app usage",
-             )
-
-         with col2:
-             total_consents = len(
-                 [c for c in privacy_settings.consents.values() if c.granted]
-             )
-             st.metric(
-                 "Active Consents", total_consents, delta=f"out of {len(ConsentType)}"
-             )
-
-         with col3:
-             data_retention = privacy_settings.data_retention_days
-             st.metric(
-                 "Data Retention", f"{data_retention} days", delta="Automatic deletion"
-             )
-
-         # Consent management
-         st.subheader("✅ Consent Management")
-
-         consent_descriptions = {
-             ConsentType.DATA_COLLECTION: {
-                 "title": "Data Collection",
-                 "description": "Allow collection of your cultural contributions for preservation",
-                 "essential": True,
-             },
-             ConsentType.AI_TRAINING: {
-                 "title": "AI Training",
-                 "description": "Use your contributions to train AI models for cultural understanding",
-                 "essential": True,
-             },
-             ConsentType.RESEARCH_USE: {
-                 "title": "Research Use",
-                 "description": "Allow academic researchers to study your contributions (anonymized)",
-                 "essential": False,
-             },
-             ConsentType.PUBLIC_SHARING: {
-                 "title": "Public Sharing",
-                 "description": "Share your contributions in public cultural archives",
-                 "essential": False,
-             },
-             ConsentType.ANALYTICS: {
-                 "title": "Analytics",
-                 "description": "Use your data for platform improvement and analytics",
-                 "essential": False,
-             },
-             ConsentType.MARKETING: {
-                 "title": "Marketing Communications",
-                 "description": "Receive updates about cultural preservation initiatives",
-                 "essential": False,
-             },
-         }
-
-         consent_changes = {}
-
-         for consent_type, info in consent_descriptions.items():
-             current_consent = privacy_settings.consents.get(consent_type)
-             current_status = current_consent.granted if current_consent else False
-
-             col1, col2 = st.columns([3, 1])
-
-             with col1:
-                 st.markdown(f"**{info['title']}**")
-                 st.markdown(f"*{info['description']}*")
-                 if info["essential"]:
-                     st.markdown("🔴 **Essential** - Required for app functionality")
-
-             with col2:
-                 if info["essential"]:
-                     # Essential consents cannot be disabled
-                     st.checkbox(
-                         "Enabled",
-                         value=True,
-                         disabled=True,
-                         key=f"consent_{consent_type.value}",
-                     )
-                 else:
-                     new_status = st.checkbox(
-                         "Enable",
-                         value=current_status,
-                         key=f"consent_{consent_type.value}",
-                     )
-                     if new_status != current_status:
-                         consent_changes[consent_type] = new_status
-
-             st.divider()
-
-         # Save consent changes
-         if consent_changes and st.button("💾 Save Privacy Settings", type="primary"):
-             self._update_consents(user_session_id, consent_changes)
-             st.success("Privacy settings updated successfully!")
-             st.rerun()
-
-         # Data management
-         st.subheader("📁 Data Management")
-
-         col1, col2 = st.columns(2)
-
-         with col1:
-             st.markdown("**Data Retention**")
-             new_retention = st.selectbox(
-                 "How long should we keep your data?",
-                 [30, 90, 180, 365, -1],
-                 format_func=lambda x: f"{x} days" if x > 0 else "Keep indefinitely",
-                 index=[30, 90, 180, 365, -1].index(
-                     privacy_settings.data_retention_days
-                 ),
-             )
-
-             if new_retention != privacy_settings.data_retention_days:
-                 if st.button("Update Retention Period"):
-                     self._update_data_retention(user_session_id, new_retention)
-                     st.success("Data retention period updated!")
-                     st.rerun()
-
-         with col2:
-             st.markdown("**Data Anonymization**")
-             new_anonymize = st.checkbox(
-                 "Anonymize my contributions",
-                 value=privacy_settings.anonymize_data,
-                 help="Remove identifying information from your contributions",
-             )
-
-             if new_anonymize != privacy_settings.anonymize_data:
-                 if st.button("Update Anonymization"):
-                     self._update_anonymization(user_session_id, new_anonymize)
-                     st.success("Anonymization setting updated!")
-                     st.rerun()
-
-         # Data export and deletion
-         st.subheader("📤 Your Data Rights")
-
-         col1, col2, col3 = st.columns(3)
-
-         with col1:
-             if st.button("📊 Export My Data", use_container_width=True):
-                 self._export_user_data(user_session_id)
-
-         with col2:
-             if st.button("🔍 View My Contributions", use_container_width=True):
-                 self._show_user_contributions(user_session_id)
-
-         with col3:
-             if st.button(
-                 "🗑️ Delete My Data", use_container_width=True, type="secondary"
-             ):
-                 self._show_data_deletion_options(user_session_id)
-
-     def render_privacy_policy(self):
-         """Render comprehensive privacy policy"""
-         st.title("📋 Privacy Policy")
-         st.markdown(
-             f"*Version {self.current_privacy_version} - Effective Date: January 1, 2024*"
-         )
-
-         st.markdown("""
-         ## 🎯 Our Mission
-
-         The Corpus Collection Engine is dedicated to preserving Indian cultural heritage through
-         AI-powered data collection. We believe in transparency, respect, and ethical data practices.
-
-         ## 📊 What Data We Collect
-
-         ### Cultural Contributions
-         - **Memes**: Text captions and cultural context you provide
-         - **Recipes**: Family recipes, ingredients, and cooking instructions
-         - **Folklore**: Traditional stories, proverbs, and cultural wisdom
-         - **Landmarks**: Photos and descriptions of cultural sites
-
-         ### Cultural Context
-         - Regional information you provide
-         - Cultural significance descriptions
-         - Language preferences
-         - Personal stories and family connections
-
-         ### Technical Data
-         - Session identifiers (anonymized)
-         - Language detection results
-         - Usage patterns and engagement metrics
-         - Device and browser information (anonymized)
-
-         ## 🎯 How We Use Your Data
-
-         ### Primary Purposes
-         1. **Cultural Preservation**: Building a comprehensive archive of Indian cultural heritage
-         2. **AI Training**: Teaching AI systems to understand and respect cultural diversity
-         3. **Research**: Supporting academic research on Indian languages and culture
-         4. **Community Building**: Connecting people through shared cultural experiences
-
-         ### Secondary Purposes (With Your Consent)
-         - Platform improvement and analytics
-         - Academic research collaborations
-         - Public cultural archives and exhibitions
-         - Educational resources and materials
-
-         ## 🔒 How We Protect Your Data
-
-         ### Security Measures
-         - **Encryption**: All data is encrypted in transit and at rest
-         - **Access Control**: Strict access controls and authentication
-         - **Anonymization**: Personal identifiers are removed or hashed
-         - **Regular Audits**: Security assessments and vulnerability testing
-
-         ### Data Minimization
-         - We only collect data necessary for our cultural preservation mission
-         - Personal information is separated from cultural content
-         - Automatic deletion of old session data
-         - Optional anonymization of all contributions
-
-         ## 👥 Data Sharing
-
-         ### We Never Share
-         - Personal identifying information
-         - Private session data
-         - Individual user behavior patterns
-         - Contact information or personal details
-
-         ### We May Share (With Consent)
-         - Anonymized cultural contributions with researchers
-         - Aggregated statistics for academic studies
-         - Cultural content for public archives
-         - Educational materials for cultural learning
-
-         ## ⚖️ Your Rights
-
-         ### Access Rights
-         - View all your contributions
-         - Download your data in standard formats
-         - See how your data is being used
-         - Review your consent history
-
-         ### Control Rights
-         - Modify or delete your contributions
-         - Change your privacy settings anytime
-         - Withdraw consent for non-essential uses
-         - Request data anonymization
-
-         ### Deletion Rights
-         - Delete individual contributions
-         - Request complete data deletion
-         - Automatic deletion after retention period
-         - Right to be forgotten
-
-         ## 🌍 International Considerations
-
-         ### Data Location
-         - Data is stored in secure facilities
-         - We comply with applicable data protection laws
-         - Cross-border transfers are protected
-         - Local data residency options available
-
-         ### Legal Compliance
-         - We follow Indian data protection regulations
-         - Compliance with international privacy standards
-         - Regular legal and compliance reviews
-         - Transparent reporting on data requests
-
-         ## 👶 Children's Privacy
-
-         - Our service is designed for users 13 and older
-         - We do not knowingly collect data from children under 13
-         - Parental consent required for users under 18
-         - Special protections for young users
-
-         ## 📞 Contact Us
-
-         ### Privacy Questions
-         If you have questions about this privacy policy or your data:
-
-         - **Email**: [email protected]
-         - **Address**: [Privacy Officer Address]
-         - **Response Time**: We respond within 30 days
-
-         ### Data Protection Officer
-         Our Data Protection Officer is available for privacy concerns:
-         - **Email**: [email protected]
-         - **Specialized Training**: Cultural data sensitivity
458
-
459
- ## 🔄 Policy Updates
460
-
461
- - We may update this policy to reflect changes in our practices
462
- - Users will be notified of significant changes
463
- - Continued use implies acceptance of updates
464
- - Previous versions available upon request
465
-
466
- ---
467
-
468
- *This privacy policy reflects our commitment to ethical cultural preservation
469
- and respect for user privacy. Thank you for helping preserve Indian cultural heritage!*
470
- """)
471
-
472
- def render_terms_of_service(self):
473
- """Render terms of service"""
474
- st.title("📜 Terms of Service")
475
- st.markdown(
476
- f"*Version {self.current_terms_version} - Effective Date: January 1, 2024*"
477
- )
478
-
479
- st.markdown("""
480
- ## 🤝 Agreement to Terms
481
-
482
- By using the Corpus Collection Engine, you agree to these terms of service.
483
- If you disagree with any part of these terms, please do not use our service.
484
-
485
- ## 🎯 Service Description
486
-
487
- ### Our Mission
488
- The Corpus Collection Engine is a platform for preserving Indian cultural heritage
489
- through community contributions and AI-powered analysis.
490
-
491
- ### What We Provide
492
- - Tools for sharing cultural content (memes, recipes, folklore, landmarks)
493
- - AI-powered features for content enhancement
494
- - Community platform for cultural exchange
495
- - Educational resources about Indian culture
496
-
497
- ## 👤 User Responsibilities
498
-
499
- ### Content Guidelines
500
- - **Authenticity**: Share genuine cultural content
501
- - **Respect**: Treat all cultures and communities with respect
502
- - **Accuracy**: Provide accurate information to the best of your knowledge
503
- - **Originality**: Only share content you have rights to share
504
-
505
- ### Prohibited Content
506
- - Hate speech or discriminatory content
507
- - False or misleading cultural information
508
- - Copyrighted material without permission
509
- - Personal information of others
510
- - Spam or commercial content
511
-
512
- ### User Conduct
513
- - Use the service for its intended cultural preservation purpose
514
- - Respect other users and their contributions
515
- - Follow community guidelines and cultural sensitivities
516
- - Report inappropriate content or behavior
517
-
518
- ## 🏛️ Intellectual Property
519
-
520
- ### Your Content
521
- - You retain ownership of your cultural contributions
522
- - You grant us license to use your content for cultural preservation
523
- - You can modify or delete your contributions anytime
524
- - We respect traditional knowledge and cultural heritage rights
525
-
526
- ### Our Platform
527
- - The Corpus Collection Engine platform is our intellectual property
528
- - You may not copy, modify, or distribute our software
529
- - Our AI models and algorithms are proprietary
530
- - Trademarks and logos are protected
531
-
532
- ### Traditional Knowledge
533
- - We respect indigenous and traditional knowledge rights
534
- - Cultural content is treated with appropriate sensitivity
535
- - Community ownership of cultural heritage is acknowledged
536
- - Traditional knowledge is not claimed as our property
537
-
538
- ## 🔒 Privacy and Data
539
-
540
- - Your privacy is governed by our Privacy Policy
541
- - We collect data only for cultural preservation purposes
542
- - You have control over your data and privacy settings
543
- - We implement strong security measures to protect your data
544
-
545
- ## ⚠️ Disclaimers
546
-
547
- ### Service Availability
548
- - We strive for high availability but cannot guarantee 100% uptime
549
- - Maintenance and updates may temporarily interrupt service
550
- - We are not liable for service interruptions
551
-
552
- ### Content Accuracy
553
- - Cultural content is provided by community members
554
- - We do not verify the accuracy of all cultural information
555
- - Users should use their judgment when relying on cultural content
556
- - We are not responsible for inaccuracies in user-generated content
557
-
558
- ### AI Features
559
- - AI-generated content is provided as assistance only
560
- - AI may make mistakes or provide inaccurate suggestions
561
- - Users should review and verify AI-generated content
562
- - We continuously improve AI accuracy but cannot guarantee perfection
563
-
564
- ## 📞 Support and Contact
565
-
566
- ### Getting Help
567
- - **Technical Support**: [email protected]
568
- - **Cultural Questions**: [email protected]
569
- - **Legal Issues**: [email protected]
570
-
571
- ### Response Times
572
- - We aim to respond to inquiries within 48 hours
573
- - Complex issues may require additional time
574
- - Emergency security issues are prioritized
575
-
576
- ## 🔄 Changes to Terms
577
-
578
- - We may update these terms to reflect service changes
579
- - Users will be notified of significant changes
580
- - Continued use implies acceptance of updated terms
581
- - Previous versions available upon request
582
-
583
- ## ⚖️ Legal Information
584
-
585
- ### Governing Law
586
- - These terms are governed by Indian law
587
- - Disputes will be resolved in Indian courts
588
- - We comply with applicable international laws
589
-
590
- ### Limitation of Liability
591
- - Our liability is limited to the extent permitted by law
592
- - We are not liable for indirect or consequential damages
593
- - Maximum liability is limited to service fees (if any)
594
-
595
- ---
596
-
597
- *Thank you for helping preserve Indian cultural heritage through the
598
- Corpus Collection Engine. Together, we're building a lasting legacy
599
- for future generations.*
600
- """)
601
-
602
- def _has_essential_consent(self) -> bool:
603
- """Check if user has given essential consent"""
604
- user_session_id = st.session_state.get("user_session_id", "anonymous")
605
- privacy_settings = self._get_privacy_settings(user_session_id)
606
-
607
- essential_consents = [ConsentType.DATA_COLLECTION, ConsentType.AI_TRAINING]
608
-
609
- for consent_type in essential_consents:
610
- consent_record = privacy_settings.consents.get(consent_type)
611
- if not consent_record or not consent_record.granted:
612
- return False
613
-
614
- return True
615
-
616
- def _get_privacy_settings(self, user_session_id: str) -> PrivacySettings:
617
- """Get privacy settings for user"""
618
- if st.session_state.privacy_settings:
619
- return st.session_state.privacy_settings
620
-
621
- # Create default privacy settings
622
- default_consents = {}
623
-
624
- # Essential consents are granted by default when user starts using the app
625
- essential_consents = [ConsentType.DATA_COLLECTION, ConsentType.AI_TRAINING]
626
-
627
- for consent_type in ConsentType:
628
- is_essential = consent_type in essential_consents
629
- default_consents[consent_type] = ConsentRecord(
630
- user_session=user_session_id,
631
- consent_type=consent_type,
632
- granted=is_essential,
633
- timestamp=datetime.now(),
634
- version=self.current_privacy_version,
635
- )
636
-
637
- privacy_settings = PrivacySettings(
638
- user_session=user_session_id,
639
- consents=default_consents,
640
- data_retention_days=365,
641
- anonymize_data=False,
642
- allow_data_export=True,
643
- created_at=datetime.now(),
644
- updated_at=datetime.now(),
645
- )
646
-
647
- st.session_state.privacy_settings = privacy_settings
648
- return privacy_settings
649
-
650
- def _load_privacy_settings(self, user_session_id: str):
651
- """Load privacy settings from storage"""
652
- try:
653
- # In a full implementation, this would load from database
654
- # For now, we'll use session state
655
- if "privacy_settings" not in st.session_state:
656
- st.session_state.privacy_settings = None
657
-
658
- except Exception as e:
659
- self.logger.error(f"Error loading privacy settings: {e}")
660
-
661
- def _update_consents(
662
- self, user_session_id: str, consent_changes: Dict[ConsentType, bool]
663
- ):
664
- """Update user consent preferences"""
665
- try:
666
- privacy_settings = self._get_privacy_settings(user_session_id)
667
-
668
- for consent_type, granted in consent_changes.items():
669
- # Create new consent record
670
- consent_record = ConsentRecord(
671
- user_session=user_session_id,
672
- consent_type=consent_type,
673
- granted=granted,
674
- timestamp=datetime.now(),
675
- version=self.current_privacy_version,
676
- ip_hash=self._hash_ip(),
677
- user_agent_hash=self._hash_user_agent(),
678
- )
679
-
680
- privacy_settings.consents[consent_type] = consent_record
681
-
682
- privacy_settings.updated_at = datetime.now()
683
- st.session_state.privacy_settings = privacy_settings
684
-
685
- self.logger.info(
686
- f"Updated consents for user {user_session_id}: {consent_changes}"
687
- )
688
-
689
- except Exception as e:
690
- self.logger.error(f"Error updating consents: {e}")
691
-
692
- def _update_data_retention(self, user_session_id: str, retention_days: int):
693
- """Update data retention period"""
694
- try:
695
- privacy_settings = self._get_privacy_settings(user_session_id)
696
- privacy_settings.data_retention_days = retention_days
697
- privacy_settings.updated_at = datetime.now()
698
- st.session_state.privacy_settings = privacy_settings
699
-
700
- self.logger.info(
701
- f"Updated data retention for user {user_session_id}: {retention_days} days"
702
- )
703
-
704
- except Exception as e:
705
- self.logger.error(f"Error updating data retention: {e}")
706
-
707
- def _update_anonymization(self, user_session_id: str, anonymize: bool):
708
- """Update data anonymization setting"""
709
- try:
710
- privacy_settings = self._get_privacy_settings(user_session_id)
711
- privacy_settings.anonymize_data = anonymize
712
- privacy_settings.updated_at = datetime.now()
713
- st.session_state.privacy_settings = privacy_settings
714
-
715
- self.logger.info(
716
- f"Updated anonymization for user {user_session_id}: {anonymize}"
717
- )
718
-
719
- except Exception as e:
720
- self.logger.error(f"Error updating anonymization: {e}")
721
-
722
- def _export_user_data(self, user_session_id: str):
723
- """Export all user data"""
724
- try:
725
- # Get user contributions
726
- contributions = self.storage_service.get_contributions_by_session(
727
- user_session_id
728
- )
729
-
730
- # Get privacy settings
731
- privacy_settings = self._get_privacy_settings(user_session_id)
732
-
733
- # Prepare export data
734
- export_data = {
735
- "user_session": user_session_id,
736
- "export_date": datetime.now().isoformat(),
737
- "privacy_settings": {
738
- "data_retention_days": privacy_settings.data_retention_days,
739
- "anonymize_data": privacy_settings.anonymize_data,
740
- "allow_data_export": privacy_settings.allow_data_export,
741
- "consents": {
742
- consent_type.value: {
743
- "granted": record.granted,
744
- "timestamp": record.timestamp.isoformat(),
745
- "version": record.version,
746
- }
747
- for consent_type, record in privacy_settings.consents.items()
748
- },
749
- },
750
- "contributions": [],
751
- }
752
-
753
- # Add contributions data
754
- for contrib in contributions:
755
- contrib_data = {
756
- "id": contrib.id,
757
- "activity_type": contrib.activity_type.value,
758
- "language": contrib.language,
759
- "timestamp": contrib.timestamp.isoformat(),
760
- "content_data": contrib.content_data,
761
- "cultural_context": contrib.cultural_context,
762
- "validation_status": contrib.validation_status.value,
763
- }
764
- export_data["contributions"].append(contrib_data)
765
-
766
- # Create download
767
- export_json = json.dumps(export_data, indent=2, ensure_ascii=False)
768
-
769
- st.download_button(
770
- label="📥 Download My Data (JSON)",
771
- data=export_json,
772
- file_name=f"my_cultural_data_{datetime.now().strftime('%Y%m%d')}.json",
773
- mime="application/json",
774
- )
775
-
776
- st.success(f"Data export ready! Found {len(contributions)} contributions.")
777
-
778
- except Exception as e:
779
- self.logger.error(f"Error exporting user data: {e}")
780
- st.error("Failed to export data. Please try again.")
781
-
782
- def _show_user_contributions(self, user_session_id: str):
783
- """Show user's contributions"""
784
- try:
785
- contributions = self.storage_service.get_contributions_by_session(
786
- user_session_id
787
- )
788
-
789
- if not contributions:
790
- st.info("You haven't made any contributions yet.")
791
- return
792
-
793
- st.subheader(f"📊 Your {len(contributions)} Contributions")
794
-
795
- # Group by activity type
796
- activity_groups = {}
797
- for contrib in contributions:
798
- activity = contrib.activity_type.value
799
- if activity not in activity_groups:
800
- activity_groups[activity] = []
801
- activity_groups[activity].append(contrib)
802
-
803
- # Display by activity
804
- activity_names = {
805
- "meme": "🎭 Memes",
806
- "recipe": "🍛 Recipes",
807
- "folklore": "📚 Folklore",
808
- "landmark": "🏛️ Landmarks",
809
- }
810
-
811
- for activity, contribs in activity_groups.items():
812
- st.markdown(
813
- f"### {activity_names.get(activity, activity.title())} ({len(contribs)})"
814
- )
815
-
816
- for contrib in contribs[:5]: # Show first 5
817
- with st.expander(
818
- f"{contrib.id[:8]}... - {contrib.timestamp.strftime('%Y-%m-%d')}"
819
- ):
820
- col1, col2 = st.columns([2, 1])
821
-
822
- with col1:
823
- st.json(contrib.content_data)
824
-
825
- with col2:
826
- st.markdown(f"**Language:** {contrib.language}")
827
- st.markdown(
828
- f"**Status:** {contrib.validation_status.value}"
829
- )
830
- if contrib.cultural_context.get("region"):
831
- st.markdown(
832
- f"**Region:** {contrib.cultural_context['region']}"
833
- )
834
-
835
- if st.button(f"🗑️ Delete", key=f"delete_{contrib.id}"):
836
- self._delete_contribution(contrib.id)
837
- st.success("Contribution deleted!")
838
- st.rerun()
839
-
840
- if len(contribs) > 5:
841
- st.markdown(f"*... and {len(contribs) - 5} more*")
842
-
843
- except Exception as e:
844
- self.logger.error(f"Error showing user contributions: {e}")
845
- st.error("Failed to load contributions.")
846
-
847
- def _show_data_deletion_options(self, user_session_id: str):
848
- """Show data deletion options"""
849
- st.subheader("🗑️ Data Deletion Options")
850
-
851
- st.warning("""
852
- **Important**: Data deletion is permanent and cannot be undone.
853
- Consider exporting your data first if you want to keep a copy.
854
- """)
855
-
856
- deletion_options = st.radio(
857
- "What would you like to delete?",
858
- [
859
- "Delete specific contributions",
860
- "Delete all my contributions",
861
- "Delete all my data (contributions + settings)",
862
- ],
863
- )
864
-
865
- if deletion_options == "Delete specific contributions":
866
- st.info(
867
- "Use the 'View My Contributions' section above to delete individual items."
868
- )
869
-
870
- elif deletion_options == "Delete all my contributions":
871
- st.markdown("**This will delete:**")
872
- st.markdown(
873
- "- All your memes, recipes, folklore, and landmark contributions"
874
- )
875
- st.markdown("- Cultural context and metadata")
876
- st.markdown("- Contribution history")
877
-
878
- st.markdown("**This will keep:**")
879
- st.markdown("- Your privacy settings")
880
- st.markdown("- Your consent records")
881
-
882
- if st.checkbox("I understand this action cannot be undone"):
883
- if st.button("🗑️ Delete All My Contributions", type="secondary"):
884
- self._delete_all_contributions(user_session_id)
885
- st.success("All contributions deleted successfully.")
886
- st.rerun()
887
-
888
- elif deletion_options == "Delete all my data (contributions + settings)":
889
- st.markdown("**This will delete:**")
890
- st.markdown("- All your contributions")
891
- st.markdown("- All privacy settings")
892
- st.markdown("- All consent records")
893
- st.markdown("- All session data")
894
-
895
- st.error(
896
- "**Warning**: This is complete data deletion. You will need to start fresh if you use the app again."
897
- )
898
-
899
- confirm_text = st.text_input("Type 'DELETE ALL MY DATA' to confirm:")
900
-
901
- if confirm_text == "DELETE ALL MY DATA":
902
- if st.button("🗑️ Delete Everything", type="secondary"):
903
- self._delete_all_user_data(user_session_id)
904
- st.success("All your data has been deleted.")
905
- st.balloons()
906
- st.rerun()
907
-
908
- def _delete_contribution(self, contribution_id: str):
909
- """Delete a specific contribution"""
910
- try:
911
- # In a full implementation, this would delete from database
912
- self.logger.info(f"Deleted contribution: {contribution_id}")
913
-
914
- except Exception as e:
915
- self.logger.error(f"Error deleting contribution: {e}")
916
-
917
- def _delete_all_contributions(self, user_session_id: str):
918
- """Delete all contributions for a user"""
919
- try:
920
- contributions = self.storage_service.get_contributions_by_session(
921
- user_session_id
922
- )
923
-
924
- for contrib in contributions:
925
- self._delete_contribution(contrib.id)
926
-
927
- self.logger.info(f"Deleted all contributions for user: {user_session_id}")
928
-
929
- except Exception as e:
930
- self.logger.error(f"Error deleting all contributions: {e}")
931
-
932
- def _delete_all_user_data(self, user_session_id: str):
933
- """Delete all data for a user"""
934
- try:
935
- # Delete contributions
936
- self._delete_all_contributions(user_session_id)
937
-
938
- # Clear privacy settings
939
- st.session_state.privacy_settings = None
940
- st.session_state.consent_given = {}
941
-
942
- # Clear other session data
943
- for key in list(st.session_state.keys()):
944
- if "user" in key.lower() or "privacy" in key.lower():
945
- del st.session_state[key]
946
-
947
- self.logger.info(f"Deleted all data for user: {user_session_id}")
948
-
949
- except Exception as e:
950
- self.logger.error(f"Error deleting all user data: {e}")
951
-
952
- def _hash_ip(self) -> str:
953
- """Hash IP address for privacy"""
954
- try:
955
- # In a real implementation, get actual IP
956
- ip = "127.0.0.1" # Placeholder
957
- return hashlib.sha256(ip.encode()).hexdigest()[:16]
958
- except:
959
- return "unknown"
960
-
961
- def _hash_user_agent(self) -> str:
962
- """Hash user agent for privacy"""
963
- try:
964
- # In a real implementation, get actual user agent
965
- user_agent = "unknown" # Placeholder
966
- return hashlib.sha256(user_agent.encode()).hexdigest()[:16]
967
- except:
968
- return "unknown"
969
-
970
- def check_consent_for_action(self, action: str, user_session_id: str) -> bool:
971
- """Check if user has given consent for a specific action"""
972
- try:
973
- privacy_settings = self._get_privacy_settings(user_session_id)
974
-
975
- # Map actions to consent types
976
- action_consent_map = {
977
- "collect_data": ConsentType.DATA_COLLECTION,
978
- "train_ai": ConsentType.AI_TRAINING,
979
- "research_use": ConsentType.RESEARCH_USE,
980
- "public_sharing": ConsentType.PUBLIC_SHARING,
981
- "analytics": ConsentType.ANALYTICS,
982
- "marketing": ConsentType.MARKETING,
983
- }
984
-
985
- consent_type = action_consent_map.get(action)
986
- if not consent_type:
987
- return False
988
-
989
- consent_record = privacy_settings.consents.get(consent_type)
990
- return consent_record and consent_record.granted
991
-
992
- except Exception as e:
993
- self.logger.error(f"Error checking consent for action {action}: {e}")
994
- return False
995
-
996
- def get_data_retention_date(self, user_session_id: str) -> Optional[datetime]:
997
- """Get the date when user's data should be deleted"""
998
- try:
999
- privacy_settings = self._get_privacy_settings(user_session_id)
1000
-
1001
- if privacy_settings.data_retention_days == -1:
1002
- return None # Keep indefinitely
1003
-
1004
- return privacy_settings.created_at + timedelta(
1005
- days=privacy_settings.data_retention_days
1006
- )
1007
-
1008
- except Exception as e:
1009
- self.logger.error(f"Error calculating data retention date: {e}")
1010
- return None
1011
-
1012
- def should_anonymize_data(self, user_session_id: str) -> bool:
1013
- """Check if user's data should be anonymized"""
1014
- try:
1015
- privacy_settings = self._get_privacy_settings(user_session_id)
1016
- return privacy_settings.anonymize_data
1017
-
1018
- except Exception as e:
1019
- self.logger.error(f"Error checking anonymization setting: {e}")
1020
- return False
1021
-
1022
- def get_privacy_summary(self, user_session_id: str) -> Dict[str, Any]:
1023
- """Get privacy summary for user"""
1024
- try:
1025
- privacy_settings = self._get_privacy_settings(user_session_id)
1026
-
1027
- granted_consents = [
1028
- consent_type.value
1029
- for consent_type, record in privacy_settings.consents.items()
1030
- if record.granted
1031
- ]
1032
-
1033
- return {
1034
- "user_session": user_session_id,
1035
- "privacy_version": self.current_privacy_version,
1036
- "granted_consents": granted_consents,
1037
- "data_retention_days": privacy_settings.data_retention_days,
1038
- "anonymize_data": privacy_settings.anonymize_data,
1039
- "allow_data_export": privacy_settings.allow_data_export,
1040
- "settings_updated": privacy_settings.updated_at.isoformat(),
1041
- "has_essential_consent": self._has_essential_consent(),
1042
- }
1043
-
1044
- except Exception as e:
1045
- self.logger.error(f"Error getting privacy summary: {e}")
1046
- return {}
1047
-
1048
- def handle_privacy_banner_action(self, action: str, user_session_id: str):
1049
- """Handle privacy banner actions"""
1050
- try:
1051
- if action == "ACCEPT_ESSENTIAL":
1052
- # Grant essential consents
1053
- essential_consents = {
1054
- ConsentType.DATA_COLLECTION: True,
1055
- ConsentType.AI_TRAINING: True,
1056
- }
1057
- self._update_consents(user_session_id, essential_consents)
1058
- st.session_state.show_privacy_banner = False
1059
-
1060
- elif action == "CUSTOMIZE_PRIVACY":
1061
- # Show privacy settings
1062
- st.session_state.show_privacy_settings = True
1063
-
1064
- elif action == "VIEW_PRIVACY_POLICY":
1065
- # Show privacy policy
1066
- st.session_state.show_privacy_policy = True
1067
-
1068
- except Exception as e:
1069
- self.logger.error(f"Error handling privacy banner action: {e}")
intern_project/corpus_collection_engine/services/storage_service.py DELETED
@@ -1,509 +0,0 @@
- """
- Storage service with offline support for the Corpus Collection Engine
- """
-
- import sqlite3
- import json
- import os
- from datetime import datetime
- from typing import List, Dict, Optional, Any, Tuple
- from pathlib import Path
- import logging
-
- from corpus_collection_engine.models.data_models import (
-     UserContribution, CorpusEntry, ActivitySession, ValidationStatus
- )
- from corpus_collection_engine.config import DATABASE_CONFIG, DATA_DIR
-
-
- class StorageService:
-     """Service for managing local and remote data storage with offline support"""
-
-     def __init__(self, db_path: Optional[str] = None):
-         self.db_path = db_path or os.path.join(DATA_DIR, "corpus_collection.db")
-         self.offline_queue_path = os.path.join(DATA_DIR, "offline_queue.json")
-         self.logger = logging.getLogger(__name__)
-
-         # Ensure data directory exists
-         os.makedirs(DATA_DIR, exist_ok=True)
-
-         # Initialize database
-         self._initialize_database()
-
-         # Load offline queue
-         self.offline_queue = self._load_offline_queue()
-
-     def _initialize_database(self):
-         """Initialize SQLite database with required tables"""
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-
-                 # Create user_contributions table
-                 cursor.execute('''
-                     CREATE TABLE IF NOT EXISTS user_contributions (
-                         id TEXT PRIMARY KEY,
-                         user_session TEXT NOT NULL,
-                         activity_type TEXT NOT NULL,
-                         content_data TEXT NOT NULL,
-                         language TEXT NOT NULL,
-                         region TEXT,
-                         cultural_context TEXT NOT NULL,
-                         timestamp TEXT NOT NULL,
-                         validation_status TEXT NOT NULL,
-                         metadata TEXT NOT NULL,
-                         synced BOOLEAN DEFAULT FALSE,
-                         created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
-                     )
-                 ''')
-
-                 # Create corpus_entries table
-                 cursor.execute('''
-                     CREATE TABLE IF NOT EXISTS corpus_entries (
-                         id TEXT PRIMARY KEY,
-                         contribution_id TEXT NOT NULL,
-                         text_content TEXT,
-                         image_content BLOB,
-                         language TEXT NOT NULL,
-                         cultural_tags TEXT NOT NULL,
-                         quality_score REAL NOT NULL,
-                         processed_features TEXT NOT NULL,
-                         created_at TEXT NOT NULL,
-                         synced BOOLEAN DEFAULT FALSE,
-                         FOREIGN KEY (contribution_id) REFERENCES user_contributions (id)
-                     )
-                 ''')
-
-                 # Create activity_sessions table
-                 cursor.execute('''
-                     CREATE TABLE IF NOT EXISTS activity_sessions (
-                         session_id TEXT PRIMARY KEY,
-                         user_id TEXT,
-                         activity_type TEXT NOT NULL,
-                         start_time TEXT NOT NULL,
-                         contributions TEXT NOT NULL,
-                         engagement_metrics TEXT NOT NULL,
-                         synced BOOLEAN DEFAULT FALSE,
-                         created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
-                     )
-                 ''')
-
-                 # Create indexes for better performance
-                 cursor.execute('CREATE INDEX IF NOT EXISTS idx_contributions_session ON user_contributions(user_session)')
-                 cursor.execute('CREATE INDEX IF NOT EXISTS idx_contributions_activity ON user_contributions(activity_type)')
-                 cursor.execute('CREATE INDEX IF NOT EXISTS idx_contributions_language ON user_contributions(language)')
-                 cursor.execute('CREATE INDEX IF NOT EXISTS idx_contributions_synced ON user_contributions(synced)')
-                 cursor.execute('CREATE INDEX IF NOT EXISTS idx_corpus_language ON corpus_entries(language)')
-                 cursor.execute('CREATE INDEX IF NOT EXISTS idx_sessions_activity ON activity_sessions(activity_type)')
-
-                 conn.commit()
-                 self.logger.info("Database initialized successfully")
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Database initialization error: {e}")
-             raise
-
-     def _load_offline_queue(self) -> List[Dict[str, Any]]:
-         """Load offline queue from file"""
-         try:
-             if os.path.exists(self.offline_queue_path):
-                 with open(self.offline_queue_path, 'r', encoding='utf-8') as f:
-                     return json.load(f)
-         except (json.JSONDecodeError, IOError) as e:
-             self.logger.warning(f"Could not load offline queue: {e}")
-
-         return []
-
-     def _save_offline_queue(self):
-         """Save offline queue to file"""
-         try:
-             with open(self.offline_queue_path, 'w', encoding='utf-8') as f:
-                 json.dump(self.offline_queue, f, indent=2, ensure_ascii=False)
-         except IOError as e:
-             self.logger.error(f"Could not save offline queue: {e}")
-
-     def save_contribution(self, contribution: UserContribution, offline_mode: bool = False) -> bool:
-         """
-         Save user contribution to local database
-
-         Args:
-             contribution: UserContribution object to save
-             offline_mode: If True, add to offline queue for later sync
-
-         Returns:
-             bool: Success status
-         """
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-
-                 # Convert contribution to dict for storage
-                 data = contribution.to_dict()
-
-                 cursor.execute('''
-                     INSERT OR REPLACE INTO user_contributions
-                     (id, user_session, activity_type, content_data, language, region,
-                      cultural_context, timestamp, validation_status, metadata, synced)
-                     VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
-                 ''', (
-                     data['id'], data['user_session'], data['activity_type'],
-                     data['content_data'], data['language'], data['region'],
-                     data['cultural_context'], data['timestamp'],
-                     data['validation_status'], data['metadata'], not offline_mode
-                 ))
-
-                 conn.commit()
-
-                 # Add to offline queue if in offline mode
-                 if offline_mode:
-                     self.offline_queue.append({
-                         'type': 'contribution',
-                         'data': data,
-                         'timestamp': datetime.now().isoformat()
-                     })
-                     self._save_offline_queue()
-
-                 self.logger.info(f"Contribution {contribution.id} saved successfully")
-                 return True
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error saving contribution: {e}")
-             return False
-
-     def get_contribution(self, contribution_id: str) -> Optional[UserContribution]:
-         """Get contribution by ID"""
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-                 cursor.execute('''
-                     SELECT * FROM user_contributions WHERE id = ?
-                 ''', (contribution_id,))
-
-                 row = cursor.fetchone()
-                 if row:
-                     # Convert row to dict
-                     columns = [desc[0] for desc in cursor.description]
-                     data = dict(zip(columns, row))
-
-                     # Remove database-specific fields
-                     data.pop('synced', None)
-                     data.pop('created_at', None)
-
-                     return UserContribution.from_dict(data)
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error retrieving contribution: {e}")
-
-         return None
-
-     def get_contributions_by_session(self, session_id: str) -> List[UserContribution]:
-         """Get all contributions for a session"""
-         contributions = []
-
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-                 cursor.execute('''
-                     SELECT * FROM user_contributions WHERE user_session = ?
-                     ORDER BY timestamp DESC
-                 ''', (session_id,))
-
-                 rows = cursor.fetchall()
-                 columns = [desc[0] for desc in cursor.description]
-
-                 for row in rows:
-                     data = dict(zip(columns, row))
-                     data.pop('synced', None)
-                     data.pop('created_at', None)
-                     contributions.append(UserContribution.from_dict(data))
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error retrieving contributions by session: {e}")
-
-         return contributions
-
-     def get_contributions_by_language(self, language: str, limit: int = 100) -> List[UserContribution]:
-         """Get contributions by language"""
-         contributions = []
-
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-                 cursor.execute('''
-                     SELECT * FROM user_contributions WHERE language = ?
-                     ORDER BY timestamp DESC LIMIT ?
-                 ''', (language, limit))
-
-                 rows = cursor.fetchall()
-                 columns = [desc[0] for desc in cursor.description]
-
-                 for row in rows:
-                     data = dict(zip(columns, row))
-                     data.pop('synced', None)
-                     data.pop('created_at', None)
-                     contributions.append(UserContribution.from_dict(data))
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error retrieving contributions by language: {e}")
-
-         return contributions
-
-     def save_corpus_entry(self, entry: CorpusEntry) -> bool:
-         """Save corpus entry to database"""
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-
-                 data = entry.to_dict()
-
-                 cursor.execute('''
-                     INSERT OR REPLACE INTO corpus_entries
-                     (id, contribution_id, text_content, image_content, language,
-                      cultural_tags, quality_score, processed_features, created_at)
-                     VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
-                 ''', (
-                     data['id'], data['contribution_id'], data['text_content'],
-                     data['image_content'], data['language'], data['cultural_tags'],
-                     data['quality_score'], data['processed_features'], data['created_at']
-                 ))
-
-                 conn.commit()
-                 self.logger.info(f"Corpus entry {entry.id} saved successfully")
-                 return True
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error saving corpus entry: {e}")
-             return False
-
-     def save_activity_session(self, session: ActivitySession) -> bool:
-         """Save activity session to database"""
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-
-                 data = session.to_dict()
-
-                 cursor.execute('''
-                     INSERT OR REPLACE INTO activity_sessions
-                     (session_id, user_id, activity_type, start_time,
-                      contributions, engagement_metrics)
-                     VALUES (?, ?, ?, ?, ?, ?)
-                 ''', (
-                     data['session_id'], data['user_id'], data['activity_type'],
-                     data['start_time'], data['contributions'], data['engagement_metrics']
-                 ))
-
-                 conn.commit()
-                 self.logger.info(f"Activity session {session.session_id} saved successfully")
-                 return True
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error saving activity session: {e}")
-             return False
-
-     def get_statistics(self) -> Dict[str, Any]:
-         """Get database statistics"""
-         stats = {
-             'total_contributions': 0,
-             'contributions_by_language': {},
-             'contributions_by_activity': {},
-             'unsynced_contributions': 0,
-             'total_corpus_entries': 0,
-             'total_sessions': 0,
-             'offline_queue_size': len(self.offline_queue)
-         }
-
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-
-                 # Total contributions
-                 cursor.execute('SELECT COUNT(*) FROM user_contributions')
-                 stats['total_contributions'] = cursor.fetchone()[0]
-
-                 # Contributions by language
-                 cursor.execute('''
-                     SELECT language, COUNT(*) FROM user_contributions
-                     GROUP BY language
-                 ''')
-                 stats['contributions_by_language'] = dict(cursor.fetchall())
-
-                 # Contributions by activity
-                 cursor.execute('''
-                     SELECT activity_type, COUNT(*) FROM user_contributions
-                     GROUP BY activity_type
-                 ''')
-                 stats['contributions_by_activity'] = dict(cursor.fetchall())
-
-                 # Unsynced contributions
-                 cursor.execute('SELECT COUNT(*) FROM user_contributions WHERE synced = FALSE')
-                 stats['unsynced_contributions'] = cursor.fetchone()[0]
-
-                 # Total corpus entries
-                 cursor.execute('SELECT COUNT(*) FROM corpus_entries')
-                 stats['total_corpus_entries'] = cursor.fetchone()[0]
-
-                 # Total sessions
-                 cursor.execute('SELECT COUNT(*) FROM activity_sessions')
-                 stats['total_sessions'] = cursor.fetchone()[0]
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error getting statistics: {e}")
-
-         return stats
-
-     def get_unsynced_contributions(self, limit: int = 100) -> List[UserContribution]:
-         """Get contributions that haven't been synced to remote storage"""
-         contributions = []
-
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-                 cursor.execute('''
-                     SELECT * FROM user_contributions WHERE synced = FALSE
-                     ORDER BY timestamp ASC LIMIT ?
-                 ''', (limit,))
-
-                 rows = cursor.fetchall()
-                 columns = [desc[0] for desc in cursor.description]
-
-                 for row in rows:
-                     data = dict(zip(columns, row))
-                     data.pop('synced', None)
-                     data.pop('created_at', None)
-                     contributions.append(UserContribution.from_dict(data))
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error retrieving unsynced contributions: {e}")
-
-         return contributions
-
-     def mark_contribution_synced(self, contribution_id: str) -> bool:
-         """Mark contribution as synced to remote storage"""
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-                 cursor.execute('''
-                     UPDATE user_contributions SET synced = TRUE WHERE id = ?
-                 ''', (contribution_id,))
-
-                 conn.commit()
-                 return cursor.rowcount > 0
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error marking contribution as synced: {e}")
-             return False
-
-     def process_offline_queue(self) -> int:
-         """Process offline queue and attempt to sync items"""
-         processed_count = 0
-
-         if not self.offline_queue:
-             return processed_count
-
-         # Create a copy of the queue to process
-         queue_copy = self.offline_queue.copy()
-         self.offline_queue.clear()
-
-         for item in queue_copy:
-             try:
-                 if item['type'] == 'contribution':
-                     # Re-save contribution with sync enabled
-                     contribution = UserContribution.from_dict(item['data'])
-                     if self.save_contribution(contribution, offline_mode=False):
-                         processed_count += 1
-                     else:
-                         # If save fails, add back to queue
-                         self.offline_queue.append(item)
-
-             except Exception as e:
-                 self.logger.error(f"Error processing offline queue item: {e}")
-                 # Add back to queue for retry
-                 self.offline_queue.append(item)
-
-         # Save updated queue
-         self._save_offline_queue()
-
-         if processed_count > 0:
-             self.logger.info(f"Processed {processed_count} items from offline queue")
-
-         return processed_count
-
-     def cleanup_old_data(self, days_old: int = 30) -> int:
-         """Clean up old synced data to save space"""
-         try:
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-
-                 # Delete old synced contributions
-                 cursor.execute('''
-                     DELETE FROM user_contributions
-                     WHERE synced = TRUE
-                     AND created_at < datetime('now', '-{} days')
-                 '''.format(days_old))
-
-                 deleted_count = cursor.rowcount
-                 conn.commit()
-
-                 self.logger.info(f"Cleaned up {deleted_count} old records")
-                 return deleted_count
-
-         except sqlite3.Error as e:
-             self.logger.error(f"Error cleaning up old data: {e}")
-             return 0
-
-     def export_data(self, output_path: str, include_synced: bool = False) -> bool:
-         """Export data to JSON file"""
-         try:
-             export_data = {
-                 'contributions': [],
-                 'corpus_entries': [],
-                 'sessions': [],
-                 'export_timestamp': datetime.now().isoformat()
-             }
-
-             with sqlite3.connect(self.db_path) as conn:
-                 cursor = conn.cursor()
-
-                 # Export contributions
-                 sync_condition = "" if include_synced else "WHERE synced = FALSE"
-                 cursor.execute(f'SELECT * FROM user_contributions {sync_condition}')
-
-                 columns = [desc[0] for desc in cursor.description]
-                 for row in cursor.fetchall():
-                     data = dict(zip(columns, row))
-                     data.pop('synced', None)
-                     data.pop('created_at', None)
-                     export_data['contributions'].append(data)
-
-                 # Export corpus entries
-                 cursor.execute('SELECT * FROM corpus_entries')
-                 columns = [desc[0] for desc in cursor.description]
-                 for row in cursor.fetchall():
-                     data = dict(zip(columns, row))
-                     data.pop('synced', None)
-                     # Convert blob to base64 if present
-                     if data.get('image_content'):
-                         import base64
-                         data['image_content'] = base64.b64encode(data['image_content']).decode('utf-8')
-                     export_data['corpus_entries'].append(data)
-
-                 # Export sessions
-                 cursor.execute('SELECT * FROM activity_sessions')
-                 columns = [desc[0] for desc in cursor.description]
-                 for row in cursor.fetchall():
-                     data = dict(zip(columns, row))
-                     data.pop('synced', None)
-                     data.pop('created_at', None)
-                     export_data['sessions'].append(data)
-
-             # Write to file
-             with open(output_path, 'w', encoding='utf-8') as f:
-                 json.dump(export_data, f, indent=2, ensure_ascii=False)
-
-             self.logger.info(f"Data exported to {output_path}")
-             return True
-
-         except Exception as e:
-             self.logger.error(f"Error exporting data: {e}")
-             return False
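The storage service's offline path is write-local-first: contributions saved with `offline_mode=True` land in SQLite with `synced = FALSE` and are also appended to a JSON queue, which `process_offline_queue()` later drains by re-saving each item with sync enabled. A short usage sketch (the `StorageService` API is taken from this diff; the `contribution` object is assumed to be a valid `UserContribution`):

```python
# Hypothetical usage sketch, not part of the deleted file.
storage = StorageService()  # defaults to DATA_DIR/corpus_collection.db

# While offline, persist locally and queue for a later sync attempt.
storage.save_contribution(contribution, offline_mode=True)

# When connectivity returns, drain the queue; failed items are re-queued.
synced = storage.process_offline_queue()

stats = storage.get_statistics()
print(synced, stats["unsynced_contributions"], stats["offline_queue_size"])
```

One caveat visible in the deleted code: `cleanup_old_data` builds its SQL with `str.format(days_old)` instead of a bound parameter, which is acceptable only because `days_old` is an `int` supplied by the caller, not user input.
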
intern_project/corpus_collection_engine/services/validation_service.py DELETED
@@ -1,618 +0,0 @@
1
- """
2
- Validation Service for content moderation and quality control
3
- """
4
-
5
- import re
6
- import logging
7
- from typing import Dict, List, Tuple, Any, Optional, Set
8
- from datetime import datetime
9
- from dataclasses import dataclass
10
- from enum import Enum
11
-
12
- from corpus_collection_engine.models.data_models import UserContribution, ValidationStatus, ActivityType
13
- from corpus_collection_engine.services.language_service import LanguageService
14
- from corpus_collection_engine.services.ai_service import AIService
15
- from corpus_collection_engine.config import VALIDATION_CONFIG
16
-
17
-
18
- class ModerationAction(Enum):
19
- """Actions that can be taken during moderation"""
20
- APPROVE = "approve"
21
- REJECT = "reject"
22
- FLAG_REVIEW = "flag_review"
23
- REQUEST_EDIT = "request_edit"
24
-
25
-
26
- class ContentIssue(Enum):
27
- """Types of content issues that can be detected"""
28
- INAPPROPRIATE_LANGUAGE = "inappropriate_language"
29
- SPAM_CONTENT = "spam_content"
30
- LOW_QUALITY = "low_quality"
31
- CULTURAL_INSENSITIVITY = "cultural_insensitivity"
32
- DUPLICATE_CONTENT = "duplicate_content"
33
- INSUFFICIENT_CONTENT = "insufficient_content"
34
- PRIVACY_VIOLATION = "privacy_violation"
35
- COPYRIGHT_VIOLATION = "copyright_violation"
36
-
37
-
38
- @dataclass
39
- class ModerationResult:
40
- """Result of content moderation"""
41
- action: ModerationAction
42
- confidence: float
43
- issues: List[ContentIssue]
44
- suggestions: List[str]
45
- quality_score: float
46
- explanation: str
47
-
48
-
49
- class ValidationService:
50
- """Service for content validation and moderation"""
51
-
52
- def __init__(self):
53
- self.logger = logging.getLogger(__name__)
54
- self.language_service = LanguageService()
55
- self.ai_service = AIService()
56
-
57
- # Load moderation rules and filters
58
- self._initialize_filters()
59
-
60
- # Quality thresholds
61
- self.quality_thresholds = {
62
- 'minimum_score': 0.3,
63
- 'auto_approve_score': 0.8,
64
- 'review_score': 0.5
65
- }
66
-
67
- # Content similarity threshold for duplicate detection
68
- self.similarity_threshold = 0.85
69
-
70
- def _initialize_filters(self):
71
- """Initialize content filters and moderation rules"""
72
- # Inappropriate content patterns (basic examples)
73
- self.inappropriate_patterns = [
74
- r'\b(hate|violence|discrimination)\b',
75
- r'\b(offensive|abusive|harassment)\b',
76
- # Add more patterns as needed, considering cultural context
77
- ]
78
-
79
- # Spam indicators
80
- self.spam_patterns = [
81
- r'(http[s]?://|www\.)', # URLs
82
- r'(\b\d{10}\b)', # Phone numbers
83
- r'(buy now|click here|limited offer)', # Commercial spam
84
- r'(.)\1{10,}', # Repeated characters
85
- ]
86
-
87
- # Low quality indicators
88
- self.low_quality_patterns = [
89
- r'^(.{1,10})$', # Very short content
90
- r'^[A-Z\s!]{20,}$', # All caps
91
- r'[^\w\s]{5,}', # Too many special characters
92
- ]
93
-
94
- # Cultural sensitivity keywords (to be handled carefully)
95
- self.cultural_sensitivity_keywords = [
96
- 'caste', 'religion', 'community', 'tradition', 'ritual'
97
- ]
98
-
99
- # Privacy-related patterns
100
- self.privacy_patterns = [
101
- r'\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b', # Credit card
102
- r'\b[A-Z]{5}[0-9]{4}[A-Z]{1}\b', # PAN card
103
- r'\b\d{12}\b', # Aadhaar-like numbers
104
- ]
105
-
106
- def moderate_contribution(self, contribution: UserContribution) -> ModerationResult:
107
- """
108
- Perform comprehensive moderation on a user contribution
109
-
110
- Args:
111
- contribution: UserContribution to moderate
112
-
113
- Returns:
114
- ModerationResult with action and details
115
- """
116
- issues = []
117
- suggestions = []
118
- quality_scores = []
119
-
120
- try:
121
- # Extract text content for analysis
122
- text_content = self._extract_text_content(contribution)
123
-
124
- # 1. Language and basic validation
125
- lang_score, lang_issues = self._validate_language_content(text_content, contribution.language)
126
- quality_scores.append(lang_score)
127
- issues.extend(lang_issues)
128
-
129
- # 2. Content appropriateness check
130
- appropriate_score, appropriate_issues = self._check_content_appropriateness(text_content)
131
- quality_scores.append(appropriate_score)
132
- issues.extend(appropriate_issues)
133
-
134
- # 3. Spam detection
135
- spam_score, spam_issues = self._detect_spam_content(text_content)
136
- quality_scores.append(spam_score)
137
- issues.extend(spam_issues)
138
-
139
- # 4. Quality assessment
140
- quality_score, quality_issues = self._assess_content_quality(contribution)
141
- quality_scores.append(quality_score)
142
- issues.extend(quality_issues)
143
-
144
- # 5. Cultural sensitivity check
145
- cultural_score, cultural_issues = self._check_cultural_sensitivity(text_content, contribution.language)
146
- quality_scores.append(cultural_score)
147
- issues.extend(cultural_issues)
148
-
149
- # 6. Privacy and safety check
150
- privacy_score, privacy_issues = self._check_privacy_safety(text_content)
151
- quality_scores.append(privacy_score)
152
- issues.extend(privacy_issues)
153
-
154
- # 7. Activity-specific validation
155
- activity_score, activity_issues = self._validate_activity_specific(contribution)
156
- quality_scores.append(activity_score)
157
- issues.extend(activity_issues)
158
-
159
- # Calculate overall quality score
160
- overall_quality = sum(quality_scores) / len(quality_scores) if quality_scores else 0.0
161
-
162
- # Generate suggestions based on issues
163
- suggestions = self._generate_suggestions(issues, contribution)
164
-
165
- # Determine moderation action
166
- action, confidence, explanation = self._determine_action(overall_quality, issues)
167
-
168
- return ModerationResult(
169
- action=action,
170
- confidence=confidence,
171
- issues=issues,
172
- suggestions=suggestions,
173
- quality_score=overall_quality,
174
- explanation=explanation
175
- )
176
-
177
- except Exception as e:
178
- self.logger.error(f"Error during moderation: {e}")
179
- return ModerationResult(
180
- action=ModerationAction.FLAG_REVIEW,
181
- confidence=0.5,
182
- issues=[ContentIssue.LOW_QUALITY],
183
- suggestions=["Content requires manual review due to processing error"],
184
- quality_score=0.3,
185
- explanation="Automatic moderation failed, requires manual review"
186
- )
187
-
188
- def _extract_text_content(self, contribution: UserContribution) -> str:
189
- """Extract all text content from contribution for analysis"""
190
- text_parts = []
191
-
192
- # Extract from content_data based on activity type
193
- content_data = contribution.content_data
194
-
195
- if contribution.activity_type == ActivityType.MEME:
196
- texts = content_data.get('texts', [])
197
- text_parts.extend([text for text in texts if text])
198
-
199
- elif contribution.activity_type == ActivityType.RECIPE:
200
- text_parts.append(content_data.get('title', ''))
201
- text_parts.append(content_data.get('instructions', ''))
202
- text_parts.append(content_data.get('family_story', ''))
203
- # Add ingredients
204
- ingredients = content_data.get('ingredients', [])
205
- for ing in ingredients:
206
- if isinstance(ing, dict) and ing.get('name'):
207
- text_parts.append(ing['name'])
208
-
209
- elif contribution.activity_type == ActivityType.FOLKLORE:
210
- text_parts.append(content_data.get('title', ''))
211
- text_parts.append(content_data.get('story', ''))
212
- text_parts.append(content_data.get('meaning', ''))
213
-
214
- elif contribution.activity_type == ActivityType.LANDMARK:
215
- text_parts.append(content_data.get('name', ''))
216
- text_parts.append(content_data.get('description', ''))
217
-
218
- # Extract from cultural context
219
- cultural_context = contribution.cultural_context
220
- text_parts.append(cultural_context.get('cultural_significance', ''))
221
- text_parts.append(cultural_context.get('additional_context', ''))
222
-
223
- # Combine all text
224
- combined_text = ' '.join([text for text in text_parts if text and text.strip()])
225
- return combined_text
226
-
227
- def _validate_language_content(self, text: str, expected_language: str) -> Tuple[float, List[ContentIssue]]:
228
- """Validate language consistency and quality"""
229
- issues = []
230
- score = 1.0
231
-
232
- if not text or len(text.strip()) < 10:
233
- issues.append(ContentIssue.INSUFFICIENT_CONTENT)
234
- score = 0.2
235
- return score, issues
236
-
237
- # Check language consistency
238
- detected_lang, confidence = self.language_service.detect_language(text)
239
-
240
- if detected_lang and detected_lang != expected_language and confidence > 0.7:
241
- # Language mismatch - might be intentional for multilingual content
242
- if confidence > 0.9:
243
- issues.append(ContentIssue.LOW_QUALITY)
244
- score *= 0.7
245
-
246
- # Check text statistics
247
- stats = self.language_service.get_text_statistics(text)
248
-
249
- # Very short content
250
- if stats['word_count'] < 5:
251
- issues.append(ContentIssue.INSUFFICIENT_CONTENT)
252
- score *= 0.5
253
-
254
- # Very long content might be spam
255
- if stats['word_count'] > 1000:
256
- score *= 0.9
257
-
258
- return score, issues
259
-
260
- def _check_content_appropriateness(self, text: str) -> Tuple[float, List[ContentIssue]]:
261
- """Check for inappropriate content"""
262
- issues = []
263
- score = 1.0
264
-
265
- text_lower = text.lower()
266
-
267
- # Check for inappropriate patterns
268
- for pattern in self.inappropriate_patterns:
269
- if re.search(pattern, text_lower, re.IGNORECASE):
270
- issues.append(ContentIssue.INAPPROPRIATE_LANGUAGE)
271
- score *= 0.3
272
- break
273
-
274
- # Use AI sentiment analysis for additional context
275
- try:
276
- sentiment = self.ai_service.analyze_sentiment(text)
277
- if sentiment.get('negative', 0) > 0.8:
278
- score *= 0.7 # High negative sentiment might indicate issues
279
- except:
280
- pass # AI analysis is optional
281
-
282
- return score, issues
283
-
284
- def _detect_spam_content(self, text: str) -> Tuple[float, List[ContentIssue]]:
285
- """Detect spam and promotional content"""
286
- issues = []
287
- score = 1.0
288
-
289
- spam_indicators = 0
290
-
291
- # Check spam patterns
292
- for pattern in self.spam_patterns:
293
- if re.search(pattern, text, re.IGNORECASE):
294
- spam_indicators += 1
295
-
296
- # Check for repeated words/phrases
297
- words = text.lower().split()
298
- if len(words) > 10:
299
- word_freq = {}
300
- for word in words:
301
- word_freq[word] = word_freq.get(word, 0) + 1
302
-
303
- # If any word appears more than 30% of the time, it might be spam
304
- max_freq = max(word_freq.values()) if word_freq else 0
305
- if max_freq > len(words) * 0.3:
306
- spam_indicators += 1
307
-
308
- # Determine spam score
309
- if spam_indicators >= 2:
310
- issues.append(ContentIssue.SPAM_CONTENT)
311
- score = 0.2
312
- elif spam_indicators == 1:
313
- score *= 0.7
314
-
315
- return score, issues
316
-
317
- def _assess_content_quality(self, contribution: UserContribution) -> Tuple[float, List[ContentIssue]]:
318
- """Assess overall content quality"""
319
- issues = []
320
- score = 1.0
321
-
322
- content_data = contribution.content_data
323
-
324
- # Activity-specific quality checks
325
- if contribution.activity_type == ActivityType.MEME:
326
- texts = content_data.get('texts', [])
327
- if not any(text.strip() for text in texts):
328
- issues.append(ContentIssue.INSUFFICIENT_CONTENT)
329
- score *= 0.3
330
-
331
- elif contribution.activity_type == ActivityType.RECIPE:
332
- title = content_data.get('title', '')
333
- instructions = content_data.get('instructions', '')
334
- ingredients = content_data.get('ingredients', [])
335
-
336
- if len(title.strip()) < 3:
337
- issues.append(ContentIssue.LOW_QUALITY)
338
- score *= 0.7
339
-
340
- if len(instructions.strip()) < 20:
341
- issues.append(ContentIssue.INSUFFICIENT_CONTENT)
342
- score *= 0.5
343
-
344
- valid_ingredients = [ing for ing in ingredients if isinstance(ing, dict) and ing.get('name', '').strip()]
345
- if len(valid_ingredients) < 2:
346
- issues.append(ContentIssue.INSUFFICIENT_CONTENT)
347
- score *= 0.6
348
-
349
- elif contribution.activity_type == ActivityType.FOLKLORE:
350
- story = content_data.get('story', '')
351
- if len(story.strip()) < 50:
352
- issues.append(ContentIssue.INSUFFICIENT_CONTENT)
353
- score *= 0.4
354
-
355
- elif contribution.activity_type == ActivityType.LANDMARK:
356
- description = content_data.get('description', '')
357
- if len(description.strip()) < 20:
358
- issues.append(ContentIssue.INSUFFICIENT_CONTENT)
359
- score *= 0.5
360
-
361
- # Check cultural context quality
362
- cultural_significance = contribution.cultural_context.get('cultural_significance', '')
363
- if len(cultural_significance.strip()) < 10:
364
- score *= 0.9 # Minor penalty for missing cultural context
365
-
366
- return score, issues
367
-
368
- def _check_cultural_sensitivity(self, text: str, language: str) -> Tuple[float, List[ContentIssue]]:
369
- """Check for cultural sensitivity issues"""
370
- issues = []
371
- score = 1.0
372
-
373
- text_lower = text.lower()
374
-
375
- # Check for potentially sensitive topics
376
- sensitive_count = 0
377
- for keyword in self.cultural_sensitivity_keywords:
378
- if keyword in text_lower:
379
- sensitive_count += 1
380
-
381
- # If multiple sensitive keywords, flag for review
382
- if sensitive_count >= 3:
383
- issues.append(ContentIssue.CULTURAL_INSENSITIVITY)
384
- score *= 0.8 # Requires careful review, not necessarily rejection
385
-
386
- # Use AI to suggest cultural tags and check for appropriateness
387
- try:
388
- cultural_tags = self.ai_service.suggest_cultural_tags(text, language)
389
- # If AI suggests concerning tags, reduce score slightly
390
- concerning_tags = ['controversial', 'sensitive', 'political']
391
- if any(tag in cultural_tags for tag in concerning_tags):
392
- score *= 0.9
393
- except Exception:
394
- pass # AI analysis is optional
395
-
396
- return score, issues
397
-
398
- def _check_privacy_safety(self, text: str) -> Tuple[float, List[ContentIssue]]:
399
- """Check for privacy violations and personal information"""
400
- issues = []
401
- score = 1.0
402
-
403
- # Check for privacy-sensitive patterns
404
- for pattern in self.privacy_patterns:
405
- if re.search(pattern, text):
406
- issues.append(ContentIssue.PRIVACY_VIOLATION)
407
- score *= 0.3
408
- break
409
-
410
- # Check for email addresses
411
- email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
412
- if re.search(email_pattern, text):
413
- issues.append(ContentIssue.PRIVACY_VIOLATION)
414
- score *= 0.5
415
-
416
- return score, issues
417
-
418
- def _validate_activity_specific(self, contribution: UserContribution) -> Tuple[float, List[ContentIssue]]:
419
- """Perform activity-specific validation"""
420
- issues = []
421
- score = 1.0
422
-
423
- # Use the existing validation from base activity
424
- try:
425
- if contribution.activity_type == ActivityType.MEME:
426
- from corpus_collection_engine.activities.meme_creator import MemeCreatorActivity
427
- activity = MemeCreatorActivity()
428
- elif contribution.activity_type == ActivityType.RECIPE:
429
- from corpus_collection_engine.activities.recipe_exchange import RecipeExchangeActivity
430
- activity = RecipeExchangeActivity()
431
- elif contribution.activity_type == ActivityType.FOLKLORE:
432
- from corpus_collection_engine.activities.folklore_collector import FolkloreCollectorActivity
433
- activity = FolkloreCollectorActivity()
434
- elif contribution.activity_type == ActivityType.LANDMARK:
435
- from corpus_collection_engine.activities.landmark_identifier import LandmarkIdentifierActivity
436
- activity = LandmarkIdentifierActivity()
437
- else:
438
- return score, issues
439
-
440
- is_valid, message = activity.validate_content(contribution.content_data)
441
- if not is_valid:
442
- issues.append(ContentIssue.LOW_QUALITY)
443
- score *= 0.4
444
-
445
- except Exception as e:
446
- self.logger.warning(f"Activity-specific validation failed: {e}")
447
- score *= 0.9
448
-
449
- return score, issues
450
-
451
- def _generate_suggestions(self, issues: List[ContentIssue], contribution: UserContribution) -> List[str]:
452
- """Generate improvement suggestions based on detected issues"""
453
- suggestions = []
454
-
455
- if ContentIssue.INSUFFICIENT_CONTENT in issues:
456
- if contribution.activity_type == ActivityType.MEME:
457
- suggestions.append("Add more descriptive text to your meme captions")
458
- elif contribution.activity_type == ActivityType.RECIPE:
459
- suggestions.append("Provide more detailed cooking instructions and ingredients")
460
- elif contribution.activity_type == ActivityType.FOLKLORE:
461
- suggestions.append("Expand your story with more details and context")
462
- elif contribution.activity_type == ActivityType.LANDMARK:
463
- suggestions.append("Add more descriptive details about the landmark")
464
-
465
- if ContentIssue.LOW_QUALITY in issues:
466
- suggestions.append("Improve the quality and clarity of your content")
467
- suggestions.append("Check spelling and grammar")
468
-
469
- if ContentIssue.INAPPROPRIATE_LANGUAGE in issues:
470
- suggestions.append("Please use respectful and appropriate language")
471
-
472
- if ContentIssue.SPAM_CONTENT in issues:
473
- suggestions.append("Remove promotional content and focus on cultural sharing")
474
-
475
- if ContentIssue.CULTURAL_INSENSITIVITY in issues:
476
- suggestions.append("Please ensure your content is culturally sensitive and respectful")
477
-
478
- if ContentIssue.PRIVACY_VIOLATION in issues:
479
- suggestions.append("Remove personal information like phone numbers, addresses, or ID numbers")
480
-
481
- # General suggestions
482
- suggestions.append("Add more cultural context and significance")
483
- suggestions.append("Share personal stories or family connections")
484
-
485
- return suggestions
486
-
487
- def _determine_action(self, quality_score: float, issues: List[ContentIssue]) -> Tuple[ModerationAction, float, str]:
488
- """Determine the appropriate moderation action"""
489
-
490
- # Critical issues that require rejection
491
- critical_issues = [
492
- ContentIssue.INAPPROPRIATE_LANGUAGE,
493
- ContentIssue.PRIVACY_VIOLATION
494
- ]
495
-
496
- if any(issue in issues for issue in critical_issues):
497
- return (
498
- ModerationAction.REJECT,
499
- 0.9,
500
- "Content contains inappropriate language or privacy violations"
501
- )
502
-
503
- # High quality content - auto approve
504
- if quality_score >= self.quality_thresholds['auto_approve_score'] and len(issues) == 0:
505
- return (
506
- ModerationAction.APPROVE,
507
- 0.95,
508
- "High quality content approved automatically"
509
- )
510
-
511
- # Medium quality - approve with minor issues
512
- if quality_score >= self.quality_thresholds['review_score'] and len(issues) <= 2:
513
- return (
514
- ModerationAction.APPROVE,
515
- 0.8,
516
- "Content approved with minor quality issues"
517
- )
518
-
519
- # Low quality but not critical - request edit
520
- if quality_score >= self.quality_thresholds['minimum_score']:
521
- return (
522
- ModerationAction.REQUEST_EDIT,
523
- 0.7,
524
- "Content needs improvement before approval"
525
- )
526
-
527
- # Very low quality - flag for review
528
- return (
529
- ModerationAction.FLAG_REVIEW,
530
- 0.6,
531
- "Content requires manual review due to quality concerns"
532
- )
533
-
534
- def check_duplicate_content(self, contribution: UserContribution,
535
- existing_contributions: List[UserContribution]) -> Tuple[bool, float]:
536
- """Check for duplicate or very similar content"""
537
- if not existing_contributions:
538
- return False, 0.0
539
-
540
- current_text = self._extract_text_content(contribution)
541
- if len(current_text.strip()) < 20:
542
- return False, 0.0
543
-
544
- # Simple similarity check based on common words
545
- current_words = set(current_text.lower().split())
546
-
547
- max_similarity = 0.0
548
-
549
- for existing in existing_contributions:
550
- if existing.activity_type != contribution.activity_type:
551
- continue
552
-
553
- existing_text = self._extract_text_content(existing)
554
- existing_words = set(existing_text.lower().split())
555
-
556
- if len(existing_words) == 0:
557
- continue
558
-
559
- # Calculate Jaccard similarity
560
- intersection = len(current_words.intersection(existing_words))
561
- union = len(current_words.union(existing_words))
562
-
563
- if union > 0:
564
- similarity = intersection / union
565
- max_similarity = max(max_similarity, similarity)
566
-
567
- is_duplicate = max_similarity >= self.similarity_threshold
568
- return is_duplicate, max_similarity
569
-
570
- def get_moderation_statistics(self, contributions: List[UserContribution]) -> Dict[str, Any]:
571
- """Get moderation statistics for a set of contributions"""
572
- if not contributions:
573
- return {}
574
-
575
- stats = {
576
- 'total_contributions': len(contributions),
577
- 'by_status': {},
578
- 'by_activity': {},
579
- 'quality_distribution': {'high': 0, 'medium': 0, 'low': 0},
580
- 'common_issues': {},
581
- 'average_quality_score': 0.0
582
- }
583
-
584
- total_quality = 0.0
585
-
586
- for contrib in contributions:
587
- # Count by status
588
- status = contrib.validation_status.value
589
- stats['by_status'][status] = stats['by_status'].get(status, 0) + 1
590
-
591
- # Count by activity
592
- activity = contrib.activity_type.value
593
- stats['by_activity'][activity] = stats['by_activity'].get(activity, 0) + 1
594
-
595
- # Moderate to get quality score
596
- try:
597
- result = self.moderate_contribution(contrib)
598
- total_quality += result.quality_score
599
-
600
- # Quality distribution
601
- if result.quality_score >= 0.8:
602
- stats['quality_distribution']['high'] += 1
603
- elif result.quality_score >= 0.5:
604
- stats['quality_distribution']['medium'] += 1
605
- else:
606
- stats['quality_distribution']['low'] += 1
607
-
608
- # Common issues
609
- for issue in result.issues:
610
- issue_name = issue.value
611
- stats['common_issues'][issue_name] = stats['common_issues'].get(issue_name, 0) + 1
612
-
613
- except Exception as e:
614
- self.logger.warning(f"Error moderating contribution {contrib.id}: {e}")
615
-
616
- stats['average_quality_score'] = total_quality / len(contributions) if contributions else 0.0
617
-
618
- return stats
 
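For reference, a minimal sketch of how the moderation pipeline above could be driven end to end. This assumes the methods live on a service class in services/validation_service.py (the ValidationService name, the import paths, and the UserContribution constructor fields are assumptions; only moderate_contribution, check_duplicate_content, and the result's quality_score/issues attributes are confirmed by the code above):

from corpus_collection_engine.models.data_models import ActivityType, UserContribution
from corpus_collection_engine.services.validation_service import ValidationService

service = ValidationService()

# Field names assumed from the attribute accesses in the code above
contribution = UserContribution(
    activity_type=ActivityType.RECIPE,
    content_data={
        'title': 'Gongura Pachadi',
        'instructions': 'Wash and saute the gongura leaves, then grind with roasted spices and salt.',
        'ingredients': [{'name': 'gongura leaves'}, {'name': 'red chillies'}],
    },
    cultural_context={'cultural_significance': 'A staple Andhra chutney shared at family meals.'},
)

result = service.moderate_contribution(contribution)
print(result.quality_score, result.issues)  # e.g. 0.87, []

# Duplicate detection uses Jaccard similarity over word sets:
# e.g. {'idli','dosa','vada'} vs {'idli','dosa','poha'} -> 2/4 = 0.5
is_dup, similarity = service.check_duplicate_content(contribution, existing_contributions=[])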
 
intern_project/corpus_collection_engine/utils/__init__.py DELETED
@@ -1 +0,0 @@
1
- # Utils package for Corpus Collection Engine
 
 
intern_project/corpus_collection_engine/utils/error_handler.py DELETED
@@ -1,557 +0,0 @@
1
- """
2
- Comprehensive error handling system for the Corpus Collection Engine
3
- """
4
-
5
- import streamlit as st
6
- import logging
7
- import traceback
8
- import sys
9
- from typing import Dict, Any, Optional, Callable, List
10
- from datetime import datetime
11
- from enum import Enum
12
- import json
13
- from functools import wraps
14
-
15
- from corpus_collection_engine.config import PWA_CONFIG
16
-
17
-
18
- class ErrorSeverity(Enum):
19
- """Error severity levels"""
20
- LOW = "low"
21
- MEDIUM = "medium"
22
- HIGH = "high"
23
- CRITICAL = "critical"
24
-
25
-
26
- class ErrorCategory(Enum):
27
- """Error categories"""
28
- NETWORK = "network"
29
- AI_SERVICE = "ai_service"
30
- STORAGE = "storage"
31
- VALIDATION = "validation"
32
- USER_INPUT = "user_input"
33
- SYSTEM = "system"
34
- PERFORMANCE = "performance"
35
-
36
-
37
- class ErrorHandler:
38
- """Comprehensive error handling and recovery system"""
39
-
40
- def __init__(self):
41
- self.logger = logging.getLogger(__name__)
42
- self.config = PWA_CONFIG
43
-
44
- # Initialize error tracking
45
- if 'error_history' not in st.session_state:
46
- st.session_state.error_history = []
47
-
48
- if 'error_stats' not in st.session_state:
49
- st.session_state.error_stats = {
50
- 'total_errors': 0,
51
- 'errors_by_category': {},
52
- 'errors_by_severity': {},
53
- 'last_error_time': None
54
- }
55
-
56
- # Error recovery strategies
57
- self.recovery_strategies = {
58
- ErrorCategory.NETWORK: self._handle_network_error,
59
- ErrorCategory.AI_SERVICE: self._handle_ai_service_error,
60
- ErrorCategory.STORAGE: self._handle_storage_error,
61
- ErrorCategory.VALIDATION: self._handle_validation_error,
62
- ErrorCategory.USER_INPUT: self._handle_user_input_error,
63
- ErrorCategory.SYSTEM: self._handle_system_error,
64
- ErrorCategory.PERFORMANCE: self._handle_performance_error
65
- }
66
-
67
- # User-friendly error messages
68
- self.error_messages = {
69
- ErrorCategory.NETWORK: {
70
- ErrorSeverity.LOW: "Connection seems slow. Some features may be limited.",
71
- ErrorSeverity.MEDIUM: "Network connection issues detected. Working in offline mode.",
72
- ErrorSeverity.HIGH: "Unable to connect to services. Please check your internet connection.",
73
- ErrorSeverity.CRITICAL: "No network connection. App is running in offline-only mode."
74
- },
75
- ErrorCategory.AI_SERVICE: {
76
- ErrorSeverity.LOW: "AI service is running slower than usual.",
77
- ErrorSeverity.MEDIUM: "AI service temporarily unavailable. Using fallback options.",
78
- ErrorSeverity.HIGH: "AI features are currently disabled due to service issues.",
79
- ErrorSeverity.CRITICAL: "All AI services are unavailable. Manual input required."
80
- },
81
- ErrorCategory.STORAGE: {
82
- ErrorSeverity.LOW: "Data saving is slightly delayed.",
83
- ErrorSeverity.MEDIUM: "Some data couldn't be saved. Will retry automatically.",
84
- ErrorSeverity.HIGH: "Storage issues detected. Data saved locally only.",
85
- ErrorSeverity.CRITICAL: "Unable to save data. Please try again later."
86
- },
87
- ErrorCategory.VALIDATION: {
88
- ErrorSeverity.LOW: "Minor validation issues detected.",
89
- ErrorSeverity.MEDIUM: "Some content needs review before submission.",
90
- ErrorSeverity.HIGH: "Content validation failed. Please check your input.",
91
- ErrorSeverity.CRITICAL: "Content cannot be processed due to validation errors."
92
- },
93
- ErrorCategory.USER_INPUT: {
94
- ErrorSeverity.LOW: "Please check your input.",
95
- ErrorSeverity.MEDIUM: "Some required fields are missing or invalid.",
96
- ErrorSeverity.HIGH: "Input format is not supported.",
97
- ErrorSeverity.CRITICAL: "Unable to process the provided input."
98
- },
99
- ErrorCategory.SYSTEM: {
100
- ErrorSeverity.LOW: "Minor system issue detected.",
101
- ErrorSeverity.MEDIUM: "System performance may be affected.",
102
- ErrorSeverity.HIGH: "System error occurred. Some features may be unavailable.",
103
- ErrorSeverity.CRITICAL: "Critical system error. Please refresh the page."
104
- },
105
- ErrorCategory.PERFORMANCE: {
106
- ErrorSeverity.LOW: "Performance is slightly degraded.",
107
- ErrorSeverity.MEDIUM: "Performance optimizations applied.",
108
- ErrorSeverity.HIGH: "Significant performance issues detected.",
109
- ErrorSeverity.CRITICAL: "System is running very slowly. Consider refreshing."
110
- }
111
- }
112
-
113
- def handle_error(
114
- self,
115
- error: Exception,
116
- category: ErrorCategory,
117
- severity: ErrorSeverity = ErrorSeverity.MEDIUM,
118
- context: Optional[Dict[str, Any]] = None,
119
- show_user_message: bool = True,
120
- recovery_action: Optional[Callable] = None
121
- ) -> bool:
122
- """
123
- Handle an error with appropriate logging, user notification, and recovery
124
-
125
- Returns:
126
- bool: True if error was handled successfully, False otherwise
127
- """
128
- try:
129
- # Log the error
130
- self._log_error(error, category, severity, context)
131
-
132
- # Record error statistics
133
- self._record_error_stats(category, severity)
134
-
135
- # Show user-friendly message
136
- if show_user_message:
137
- self._show_user_error_message(category, severity, context)
138
-
139
- # Attempt recovery
140
- recovery_success = self._attempt_recovery(error, category, severity, recovery_action)
141
-
142
- # Store error in history
143
- self._store_error_history(error, category, severity, context, recovery_success)
144
-
145
- return recovery_success
146
-
147
- except Exception as handler_error:
148
- # Error in error handler - log but don't recurse
149
- self.logger.critical(f"Error in error handler: {handler_error}")
150
- return False
151
-
152
- def _log_error(
153
- self,
154
- error: Exception,
155
- category: ErrorCategory,
156
- severity: ErrorSeverity,
157
- context: Optional[Dict[str, Any]] = None
158
- ):
159
- """Log error with appropriate level"""
160
- error_info = {
161
- 'error_type': type(error).__name__,
162
- 'error_message': str(error),
163
- 'category': category.value,
164
- 'severity': severity.value,
165
- 'context': context or {},
166
- 'traceback': traceback.format_exc(),
167
- 'timestamp': datetime.now().isoformat()
168
- }
169
-
170
- log_message = f"[{category.value.upper()}] {severity.value.upper()}: {error}"
171
-
172
- if severity == ErrorSeverity.CRITICAL:
173
- self.logger.critical(log_message, extra=error_info)
174
- elif severity == ErrorSeverity.HIGH:
175
- self.logger.error(log_message, extra=error_info)
176
- elif severity == ErrorSeverity.MEDIUM:
177
- self.logger.warning(log_message, extra=error_info)
178
- else:
179
- self.logger.info(log_message, extra=error_info)
180
-
181
- def _record_error_stats(self, category: ErrorCategory, severity: ErrorSeverity):
182
- """Record error statistics"""
183
- # Ensure error_stats is initialized
184
- if 'error_stats' not in st.session_state:
185
- st.session_state.error_stats = {
186
- 'total_errors': 0,
187
- 'errors_by_category': {},
188
- 'errors_by_severity': {},
189
- 'last_error_time': None
190
- }
191
-
192
- stats = st.session_state.error_stats
193
-
194
- stats['total_errors'] += 1
195
- stats['last_error_time'] = datetime.now()
196
-
197
- # Category stats
198
- if category.value not in stats['errors_by_category']:
199
- stats['errors_by_category'][category.value] = 0
200
- stats['errors_by_category'][category.value] += 1
201
-
202
- # Severity stats
203
- if severity.value not in stats['errors_by_severity']:
204
- stats['errors_by_severity'][severity.value] = 0
205
- stats['errors_by_severity'][severity.value] += 1
206
-
207
- def _show_user_error_message(
208
- self,
209
- category: ErrorCategory,
210
- severity: ErrorSeverity,
211
- context: Optional[Dict[str, Any]] = None
212
- ):
213
- """Show user-friendly error message"""
214
- message = self.error_messages.get(category, {}).get(
215
- severity,
216
- "An unexpected error occurred. Please try again."
217
- )
218
-
219
- # Add context-specific information
220
- if context and 'user_message' in context:
221
- message = context['user_message']
222
-
223
- # Show message based on severity
224
- if severity == ErrorSeverity.CRITICAL:
225
- st.error(f"🚨 {message}")
226
- elif severity == ErrorSeverity.HIGH:
227
- st.error(f"❌ {message}")
228
- elif severity == ErrorSeverity.MEDIUM:
229
- st.warning(f"⚠️ {message}")
230
- else:
231
- st.info(f"ℹ️ {message}")
232
-
233
- # Show recovery suggestions
234
- self._show_recovery_suggestions(category, severity)
235
-
236
- def _show_recovery_suggestions(self, category: ErrorCategory, severity: ErrorSeverity):
237
- """Show recovery suggestions to user"""
238
- suggestions = self._get_recovery_suggestions(category, severity)
239
-
240
- if suggestions and severity in [ErrorSeverity.HIGH, ErrorSeverity.CRITICAL]:
241
- with st.expander("💡 What can you do?"):
242
- for suggestion in suggestions:
243
- st.markdown(f"• {suggestion}")
244
-
245
- def _get_recovery_suggestions(self, category: ErrorCategory, severity: ErrorSeverity) -> List[str]:
246
- """Get recovery suggestions for error category and severity"""
247
- suggestions = {
248
- ErrorCategory.NETWORK: [
249
- "Check your internet connection",
250
- "Try refreshing the page",
251
- "Switch to offline mode if available",
252
- "Use mobile data if on WiFi (or vice versa)"
253
- ],
254
- ErrorCategory.AI_SERVICE: [
255
- "Try again in a few moments",
256
- "Use manual input instead of AI suggestions",
257
- "Check if the service is temporarily down",
258
- "Try a simpler request"
259
- ],
260
- ErrorCategory.STORAGE: [
261
- "Check available storage space",
262
- "Try saving again",
263
- "Clear browser cache",
264
- "Export your data as backup"
265
- ],
266
- ErrorCategory.VALIDATION: [
267
- "Review your input for errors",
268
- "Check required fields",
269
- "Ensure content meets guidelines",
270
- "Try a different format"
271
- ],
272
- ErrorCategory.USER_INPUT: [
273
- "Check for typos or formatting issues",
274
- "Ensure all required fields are filled",
275
- "Try uploading a different file",
276
- "Reduce file size if too large"
277
- ],
278
- ErrorCategory.SYSTEM: [
279
- "Refresh the page",
280
- "Clear browser cache",
281
- "Try a different browser",
282
- "Contact support if issue persists"
283
- ],
284
- ErrorCategory.PERFORMANCE: [
285
- "Close other browser tabs",
286
- "Check your internet speed",
287
- "Try using a faster connection",
288
- "Reduce image quality settings"
289
- ]
290
- }
291
-
292
- return suggestions.get(category, ["Try refreshing the page", "Contact support if issue persists"])
293
-
294
- def _attempt_recovery(
295
- self,
296
- error: Exception,
297
- category: ErrorCategory,
298
- severity: ErrorSeverity,
299
- custom_recovery: Optional[Callable] = None
300
- ) -> bool:
301
- """Attempt to recover from error"""
302
- try:
303
- # Try custom recovery first
304
- if custom_recovery:
305
- return custom_recovery(error, category, severity)
306
-
307
- # Use category-specific recovery
308
- recovery_func = self.recovery_strategies.get(category)
309
- if recovery_func:
310
- return recovery_func(error, severity)
311
-
312
- return False
313
-
314
- except Exception as recovery_error:
315
- self.logger.error(f"Recovery attempt failed: {recovery_error}")
316
- return False
317
-
318
- def _handle_network_error(self, error: Exception, severity: ErrorSeverity) -> bool:
319
- """Handle network-related errors"""
320
- if severity in [ErrorSeverity.HIGH, ErrorSeverity.CRITICAL]:
321
- # Enable offline mode
322
- st.session_state.offline_mode = True
323
- st.session_state.network_error_count = st.session_state.get('network_error_count', 0) + 1
324
-
325
- # Show offline indicator
326
- st.sidebar.error("🔌 Offline Mode Active")
327
-
328
- return True
329
-
330
- return False
331
-
332
- def _handle_ai_service_error(self, error: Exception, severity: ErrorSeverity) -> bool:
333
- """Handle AI service errors"""
334
- if severity >= ErrorSeverity.MEDIUM:
335
- # Disable AI features temporarily
336
- st.session_state.ai_service_disabled = True
337
- st.session_state.ai_fallback_mode = True
338
-
339
- # Set retry timer
340
- st.session_state.ai_retry_time = datetime.now().timestamp() + 300 # 5 minutes
341
-
342
- return True
343
-
344
- return False
345
-
346
- def _handle_storage_error(self, error: Exception, severity: ErrorSeverity) -> bool:
347
- """Handle storage-related errors"""
348
- if severity >= ErrorSeverity.MEDIUM:
349
- # Enable local-only storage
350
- st.session_state.local_storage_only = True
351
-
352
- # Queue for later sync
353
- if 'storage_queue' not in st.session_state:
354
- st.session_state.storage_queue = []
355
-
356
- return True
357
-
358
- return False
359
-
360
- def _handle_validation_error(self, error: Exception, severity: ErrorSeverity) -> bool:
361
- """Handle validation errors"""
362
- # Validation errors usually require user action
363
- return False
364
-
365
- def _handle_user_input_error(self, error: Exception, severity: ErrorSeverity) -> bool:
366
- """Handle user input errors"""
367
- # User input errors require user correction
368
- return False
369
-
370
- def _handle_system_error(self, error: Exception, severity: ErrorSeverity) -> bool:
371
- """Handle system errors"""
372
- if severity == ErrorSeverity.CRITICAL:
373
- # Suggest page refresh
374
- st.error("Critical system error. Please refresh the page.")
375
- if st.button("🔄 Refresh Page"):
376
- st.rerun()
377
- return True
378
-
379
- return False
380
-
381
- def _handle_performance_error(self, error: Exception, severity: ErrorSeverity) -> bool:
382
- """Handle performance-related errors"""
383
- if severity >= ErrorSeverity.MEDIUM:
384
- # Enable aggressive optimizations
385
- st.session_state.performance_mode = 'aggressive'
386
- st.session_state.connection_speed = 'slow_2g' # Force slow mode optimizations
387
-
388
- return True
389
-
390
- return False
391
-
392
- def _store_error_history(
393
- self,
394
- error: Exception,
395
- category: ErrorCategory,
396
- severity: ErrorSeverity,
397
- context: Optional[Dict[str, Any]],
398
- recovery_success: bool
399
- ):
400
- """Store error in history for analysis"""
401
- error_entry = {
402
- 'timestamp': datetime.now(),
403
- 'error_type': type(error).__name__,
404
- 'error_message': str(error),
405
- 'category': category.value,
406
- 'severity': severity.value,
407
- 'context': context or {},
408
- 'recovery_success': recovery_success,
409
- 'traceback': traceback.format_exc()
410
- }
411
-
412
- st.session_state.error_history.append(error_entry)
413
-
414
- # Keep only last 50 errors
415
- if len(st.session_state.error_history) > 50:
416
- st.session_state.error_history = st.session_state.error_history[-50:]
417
-
418
- def get_error_stats(self) -> Dict[str, Any]:
419
- """Get error statistics"""
420
- # Ensure error_stats is initialized
421
- if 'error_stats' not in st.session_state:
422
- st.session_state.error_stats = {
423
- 'total_errors': 0,
424
- 'errors_by_category': {},
425
- 'errors_by_severity': {},
426
- 'last_error_time': None
427
- }
428
- return st.session_state.error_stats.copy()
429
-
430
- def get_error_history(self) -> List[Dict[str, Any]]:
431
- """Get error history"""
432
- return st.session_state.error_history.copy()
433
-
434
- def clear_error_history(self):
435
- """Clear error history and reset stats"""
436
- st.session_state.error_history = []
437
- st.session_state.error_stats = {
438
- 'total_errors': 0,
439
- 'errors_by_category': {},
440
- 'errors_by_severity': {},
441
- 'last_error_time': None
442
- }
443
-
444
- def render_error_dashboard(self):
445
- """Render error monitoring dashboard"""
446
- st.subheader("🚨 Error Monitoring")
447
-
448
- stats = self.get_error_stats()
449
- history = self.get_error_history()
450
-
451
- # Error statistics
452
- col1, col2, col3, col4 = st.columns(4)
453
-
454
- with col1:
455
- st.metric("Total Errors", stats['total_errors'])
456
-
457
- with col2:
458
- last_error = stats.get('last_error_time')
459
- if last_error:
460
- time_since = datetime.now() - last_error
461
- st.metric("Last Error", f"{time_since.seconds // 60}m ago")
462
- else:
463
- st.metric("Last Error", "None")
464
-
465
- with col3:
466
- most_common_category = max(
467
- stats['errors_by_category'].items(),
468
- key=lambda x: x[1],
469
- default=("None", 0)
470
- )
471
- st.metric("Most Common", most_common_category[0].title())
472
-
473
- with col4:
474
- critical_errors = stats['errors_by_severity'].get('critical', 0)
475
- st.metric("Critical Errors", critical_errors)
476
-
477
- # Error breakdown
478
- if stats['total_errors'] > 0:
479
- col1, col2 = st.columns(2)
480
-
481
- with col1:
482
- st.write("**Errors by Category:**")
483
- for category, count in stats['errors_by_category'].items():
484
- percentage = (count / stats['total_errors']) * 100
485
- st.write(f"• {category.title()}: {count} ({percentage:.1f}%)")
486
-
487
- with col2:
488
- st.write("**Errors by Severity:**")
489
- for severity, count in stats['errors_by_severity'].items():
490
- percentage = (count / stats['total_errors']) * 100
491
- st.write(f"• {severity.title()}: {count} ({percentage:.1f}%)")
492
-
493
- # Recent errors
494
- if history:
495
- st.write("**Recent Errors:**")
496
- for error in history[-5:]: # Show last 5 errors
497
- with st.expander(f"{error['timestamp'].strftime('%H:%M:%S')} - {error['error_type']}"):
498
- st.write(f"**Category:** {error['category']}")
499
- st.write(f"**Severity:** {error['severity']}")
500
- st.write(f"**Message:** {error['error_message']}")
501
- st.write(f"**Recovery:** {'✅ Success' if error['recovery_success'] else '❌ Failed'}")
502
-
503
- # Clear errors button
504
- if st.button("🗑️ Clear Error History"):
505
- self.clear_error_history()
506
- st.success("Error history cleared!")
507
- st.rerun()
508
-
509
-
510
- def error_handler_decorator(
511
- category: ErrorCategory,
512
- severity: ErrorSeverity = ErrorSeverity.MEDIUM,
513
- show_user_message: bool = True,
514
- recovery_action: Optional[Callable] = None
515
- ):
516
- """Decorator for automatic error handling"""
517
- def decorator(func):
518
- @wraps(func)
519
- def wrapper(*args, **kwargs):
520
- try:
521
- return func(*args, **kwargs)
522
- except Exception as e:
523
- handler = ErrorHandler()
524
- handler.handle_error(
525
- e,
526
- category,
527
- severity,
528
- context={'function': func.__name__},
529
- show_user_message=show_user_message,
530
- recovery_action=recovery_action
531
- )
532
- # Re-raise if critical
533
- if severity == ErrorSeverity.CRITICAL:
534
- raise
535
- return None
536
- return wrapper
537
- return decorator
538
-
539
-
540
- def safe_execute(
541
- func: Callable,
542
- category: ErrorCategory,
543
- severity: ErrorSeverity = ErrorSeverity.MEDIUM,
544
- default_return: Any = None,
545
- context: Optional[Dict[str, Any]] = None
546
- ) -> Any:
547
- """Safely execute a function with error handling"""
548
- try:
549
- return func()
550
- except Exception as e:
551
- handler = ErrorHandler()
552
- handler.handle_error(e, category, severity, context)
553
- return default_return
554
-
555
-
556
- # Global error handler instance
557
- global_error_handler = ErrorHandler()
 
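A minimal usage sketch for the helpers defined above (fetch_remote_corpus is a hypothetical function used only for illustration; this must run inside a Streamlit session, since ErrorHandler keeps its stats in st.session_state):

from corpus_collection_engine.utils.error_handler import (
    ErrorCategory,
    ErrorSeverity,
    error_handler_decorator,
    safe_execute,
)

@error_handler_decorator(ErrorCategory.NETWORK, ErrorSeverity.MEDIUM)
def fetch_remote_corpus():
    # Any exception raised here is logged, counted in the session stats,
    # surfaced to the user, and swallowed (None is returned) unless the
    # severity is CRITICAL, in which case it is re-raised.
    raise ConnectionError("simulated outage")

result = fetch_remote_corpus()  # returns None after the error is handled

# Same behaviour as a one-off call with a fallback value:
ratio = safe_execute(lambda: 1 / 0, ErrorCategory.SYSTEM, default_return=0)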
 
intern_project/corpus_collection_engine/utils/performance_dashboard.py DELETED
@@ -1,468 +0,0 @@
1
- """
2
- Performance monitoring dashboard for the Corpus Collection Engine
3
- """
4
-
5
- import streamlit as st
6
- import plotly.graph_objects as go
7
- import plotly.express as px
8
- import pandas as pd
9
- from datetime import datetime, timedelta
10
- from typing import Dict, List, Any
11
- import logging
12
-
13
- from corpus_collection_engine.utils.performance_optimizer import PerformanceOptimizer
14
-
15
-
16
- class PerformanceDashboard:
17
- """Dashboard for monitoring application performance"""
18
-
19
- def __init__(self):
20
- self.logger = logging.getLogger(__name__)
21
- self.optimizer = PerformanceOptimizer()
22
-
23
- # Initialize performance tracking
24
- if 'performance_history' not in st.session_state:
25
- st.session_state.performance_history = []
26
-
27
- if 'connection_history' not in st.session_state:
28
- st.session_state.connection_history = []
29
-
30
- def render_dashboard(self):
31
- """Render the complete performance dashboard"""
32
- st.header("📊 Performance Dashboard")
33
-
34
- # Current performance overview
35
- self._render_current_performance()
36
-
37
- # Performance metrics over time
38
- self._render_performance_trends()
39
-
40
- # Connection quality analysis
41
- self._render_connection_analysis()
42
-
43
- # Optimization recommendations
44
- self._render_optimization_recommendations()
45
-
46
- # Performance settings
47
- self._render_performance_settings()
48
-
49
- def _render_current_performance(self):
50
- """Render current performance metrics"""
51
- st.subheader("🚀 Current Performance")
52
-
53
- # Get current stats
54
- stats = self.optimizer.get_optimization_stats()
55
-
56
- col1, col2, col3, col4 = st.columns(4)
57
-
58
- with col1:
59
- connection_speed = stats.get('connection_speed', 'unknown')
60
- connection_emoji = {
61
- 'slow_2g': '🐌',
62
- '2g': '🚶',
63
- '3g': '🚗',
64
- '4g': '🚀',
65
- 'unknown': '❓'
66
- }.get(connection_speed, '❓')
67
-
68
- st.metric(
69
- "Connection Speed",
70
- f"{connection_emoji} {connection_speed.upper()}",
71
- help="Detected connection speed"
72
- )
73
-
74
- with col2:
75
- optimization_level = stats.get('optimization_level', 'default')
76
- st.metric(
77
- "Optimization Level",
78
- optimization_level.title(),
79
- help="Current optimization level applied"
80
- )
81
-
82
- with col3:
83
- performance_metrics = st.session_state.get('performance_metrics', {})
84
- avg_load_time = sum(performance_metrics.values()) / len(performance_metrics) if performance_metrics else 0
85
-
86
- st.metric(
87
- "Avg Load Time",
88
- f"{avg_load_time:.2f}s",
89
- help="Average operation load time"
90
- )
91
-
92
- with col4:
93
- optimizations = stats.get('optimizations_applied', {})
94
- active_optimizations = sum(1 for opt in optimizations.values() if opt)
95
-
96
- st.metric(
97
- "Active Optimizations",
98
- f"{active_optimizations}/{len(optimizations)}",
99
- help="Number of active performance optimizations"
100
- )
101
-
102
- # Detailed optimization status
103
- with st.expander("🔧 Optimization Details"):
104
- optimizations = stats.get('optimizations_applied', {})
105
-
106
- col1, col2 = st.columns(2)
107
-
108
- with col1:
109
- st.write("**Active Optimizations:**")
110
- for opt_name, is_active in optimizations.items():
111
- status = "✅" if is_active else "❌"
112
- st.write(f"{status} {opt_name.replace('_', ' ').title()}")
113
-
114
- with col2:
115
- st.write("**Performance Metrics:**")
116
- performance_metrics = st.session_state.get('performance_metrics', {})
117
- for operation, duration in performance_metrics.items():
118
- st.write(f"⏱️ {operation}: {duration:.3f}s")
119
-
120
- def _render_performance_trends(self):
121
- """Render performance trends over time"""
122
- st.subheader("📈 Performance Trends")
123
-
124
- performance_history = st.session_state.get('performance_history', [])
125
-
126
- if not performance_history:
127
- st.info("No performance data available yet. Use the app to generate performance metrics.")
128
- return
129
-
130
- # Create DataFrame from history
131
- df = pd.DataFrame(performance_history)
132
-
133
- if len(df) > 0:
134
- # Performance over time chart
135
- fig = go.Figure()
136
-
137
- for metric in df.columns:
138
- if metric != 'timestamp':
139
- fig.add_trace(go.Scatter(
140
- x=df['timestamp'],
141
- y=df[metric],
142
- mode='lines+markers',
143
- name=metric.replace('_', ' ').title(),
144
- line=dict(width=2)
145
- ))
146
-
147
- fig.update_layout(
148
- title="Performance Metrics Over Time",
149
- xaxis_title="Time",
150
- yaxis_title="Duration (seconds)",
151
- hovermode='x unified',
152
- height=400
153
- )
154
-
155
- st.plotly_chart(fig, use_container_width=True)
156
-
157
- # Performance statistics
158
- col1, col2 = st.columns(2)
159
-
160
- with col1:
161
- st.write("**Performance Statistics:**")
162
- for metric in df.columns:
163
- if metric != 'timestamp':
164
- avg_val = df[metric].mean()
165
- max_val = df[metric].max()
166
- min_val = df[metric].min()
167
-
168
- st.write(f"**{metric.replace('_', ' ').title()}:**")
169
- st.write(f" - Average: {avg_val:.3f}s")
170
- st.write(f" - Max: {max_val:.3f}s")
171
- st.write(f" - Min: {min_val:.3f}s")
172
-
173
- with col2:
174
- # Performance distribution
175
- if len(df) > 1:
176
- metric_to_plot = st.selectbox(
177
- "Select metric for distribution:",
178
- [col for col in df.columns if col != 'timestamp']
179
- )
180
-
181
- if metric_to_plot:
182
- fig_hist = px.histogram(
183
- df,
184
- x=metric_to_plot,
185
- title=f"Distribution of {metric_to_plot.replace('_', ' ').title()}",
186
- nbins=20
187
- )
188
- fig_hist.update_layout(height=300)
189
- st.plotly_chart(fig_hist, use_container_width=True)
190
-
191
- def _render_connection_analysis(self):
192
- """Render connection quality analysis"""
193
- st.subheader("📡 Connection Analysis")
194
-
195
- connection_history = st.session_state.get('connection_history', [])
196
-
197
- if not connection_history:
198
- st.info("No connection data available yet.")
199
- return
200
-
201
- # Connection speed distribution
202
- connection_df = pd.DataFrame(connection_history)
203
-
204
- if len(connection_df) > 0:
205
- col1, col2 = st.columns(2)
206
-
207
- with col1:
208
- # Connection speed pie chart
209
- speed_counts = connection_df['speed'].value_counts()
210
-
211
- fig_pie = px.pie(
212
- values=speed_counts.values,
213
- names=speed_counts.index,
214
- title="Connection Speed Distribution"
215
- )
216
- st.plotly_chart(fig_pie, use_container_width=True)
217
-
218
- with col2:
219
- # Connection quality over time
220
- fig_line = px.line(
221
- connection_df,
222
- x='timestamp',
223
- y='quality_score',
224
- title="Connection Quality Over Time",
225
- markers=True
226
- )
227
- fig_line.update_layout(height=300)
228
- st.plotly_chart(fig_line, use_container_width=True)
229
-
230
- # Connection statistics
231
- st.write("**Connection Statistics:**")
232
- avg_quality = connection_df['quality_score'].mean()
233
- connection_stability = connection_df['speed'].nunique()
234
-
235
- col1, col2, col3 = st.columns(3)
236
-
237
- with col1:
238
- st.metric("Average Quality", f"{avg_quality:.1f}/10")
239
-
240
- with col2:
241
- st.metric("Connection Changes", connection_stability)
242
-
243
- with col3:
244
- current_speed = connection_df['speed'].iloc[-1] if len(connection_df) > 0 else 'unknown'
245
- st.metric("Current Speed", current_speed.upper())
246
-
247
- def _render_optimization_recommendations(self):
248
- """Render optimization recommendations"""
249
- st.subheader("💡 Optimization Recommendations")
250
-
251
- stats = self.optimizer.get_optimization_stats()
252
- connection_speed = stats.get('connection_speed', 'unknown')
253
- performance_metrics = st.session_state.get('performance_metrics', {})
254
-
255
- recommendations = []
256
-
257
- # Connection-based recommendations
258
- if connection_speed in ['slow_2g', '2g']:
259
- recommendations.extend([
260
- "🔧 **Enable Aggressive Image Compression**: Reduce image quality to 30-50% for faster loading",
261
- "📱 **Use Offline Mode**: Work offline when connection is very slow",
262
- "⚡ **Minimize Uploads**: Upload smaller files or compress before uploading",
263
- "🎯 **Focus on Text**: Prioritize text-based activities over image-heavy ones"
264
- ])
265
- elif connection_speed == '3g':
266
- recommendations.extend([
267
- "🖼️ **Moderate Image Optimization**: Balance quality and speed",
268
- "📊 **Lazy Load Content**: Load content progressively",
269
- "🔄 **Enable Sync**: Use sync features when connection improves"
270
- ])
271
-
272
- # Performance-based recommendations
273
- if performance_metrics:
274
- slow_operations = [op for op, duration in performance_metrics.items() if duration > 2.0]
275
- if slow_operations:
276
- recommendations.append(f"⚠️ **Optimize Slow Operations**: {', '.join(slow_operations)} are taking longer than expected")
277
-
278
- # General recommendations
279
- recommendations.extend([
280
- "💾 **Clear Cache**: Clear browser cache if experiencing issues",
281
- "🔄 **Restart App**: Refresh the page to reset optimizations",
282
- "📊 **Monitor Usage**: Check this dashboard regularly for performance insights"
283
- ])
284
-
285
- if recommendations:
286
- for rec in recommendations:
287
- st.markdown(rec)
288
- else:
289
- st.success("✅ Performance is optimal! No recommendations at this time.")
290
-
291
- def _render_performance_settings(self):
292
- """Render performance settings and controls"""
293
- st.subheader("⚙️ Performance Settings")
294
-
295
- col1, col2 = st.columns(2)
296
-
297
- with col1:
298
- st.write("**Manual Optimization Controls:**")
299
-
300
- # Force optimization level
301
- optimization_levels = ['auto', 'minimal', 'moderate', 'aggressive']
302
- current_level = st.session_state.get('manual_optimization_level', 'auto')
303
-
304
- new_level = st.selectbox(
305
- "Force Optimization Level:",
306
- optimization_levels,
307
- index=optimization_levels.index(current_level),
308
- help="Override automatic optimization detection"
309
- )
310
-
311
- if new_level != current_level:
312
- st.session_state.manual_optimization_level = new_level
313
- st.success(f"Optimization level set to: {new_level}")
314
-
315
- # Image quality override
316
- quality_levels = {
317
- 'Auto': None,
318
- 'High (85%)': 85,
319
- 'Medium (70%)': 70,
320
- 'Low (50%)': 50,
321
- 'Very Low (30%)': 30
322
- }
323
-
324
- quality_choice = st.selectbox(
325
- "Image Quality Override:",
326
- list(quality_levels.keys()),
327
- help="Override automatic image quality optimization"
328
- )
329
-
330
- if quality_levels[quality_choice] is not None:
331
- st.session_state.manual_image_quality = quality_levels[quality_choice]
332
-
333
- with col2:
334
- st.write("**Performance Actions:**")
335
-
336
- # Clear performance data
337
- if st.button("🗑️ Clear Performance Data"):
338
- st.session_state.performance_history = []
339
- st.session_state.connection_history = []
340
- st.session_state.performance_metrics = {}
341
- st.success("Performance data cleared!")
342
- st.rerun()
343
-
344
- # Export performance data
345
- if st.button("📊 Export Performance Data"):
346
- self._export_performance_data()
347
-
348
- # Reset optimizations
349
- if st.button("🔄 Reset Optimizations"):
350
- st.session_state.performance_initialized = False
351
- st.session_state.manual_optimization_level = 'auto'
352
- if 'manual_image_quality' in st.session_state:
353
- del st.session_state.manual_image_quality
354
- st.success("Optimizations reset!")
355
- st.rerun()
356
-
357
- # Performance test
358
- if st.button("🧪 Run Performance Test"):
359
- self._run_performance_test()
360
-
361
- def _export_performance_data(self):
362
- """Export performance data as JSON"""
363
- import json
364
-
365
- export_data = {
366
- 'performance_history': st.session_state.get('performance_history', []),
367
- 'connection_history': st.session_state.get('connection_history', []),
368
- 'performance_metrics': st.session_state.get('performance_metrics', {}),
369
- 'optimization_stats': self.optimizer.get_optimization_stats(),
370
- 'export_timestamp': datetime.now().isoformat()
371
- }
372
-
373
- json_str = json.dumps(export_data, indent=2, default=str)
374
-
375
- st.download_button(
376
- label="📥 Download Performance Data",
377
- data=json_str,
378
- file_name=f"performance_data_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json",
379
- mime="application/json"
380
- )
381
-
382
- def _run_performance_test(self):
383
- """Run a simple performance test"""
384
- import time
385
-
386
- with st.spinner("Running performance test..."):
387
- # Test various operations
388
- test_results = {}
389
-
390
- # Test image optimization
391
- start_time = time.time()
392
- from PIL import Image
393
- test_image = Image.new('RGB', (800, 600), color='red')
394
- self.optimizer.optimize_image(test_image, 'default')
395
- test_results['image_optimization'] = time.time() - start_time
396
-
397
- # Test JSON compression
398
- start_time = time.time()
399
- test_data = {'test': 'data' * 1000}
400
- self.optimizer.compress_json_data(test_data)
401
- test_results['json_compression'] = time.time() - start_time
402
-
403
- # Test lazy loading
404
- start_time = time.time()
405
- test_content = list(range(100))
406
- self.optimizer.lazy_load_content(test_content)
407
- test_results['lazy_loading'] = time.time() - start_time
408
-
409
- # Display results
410
- st.success("Performance test completed!")
411
-
412
- col1, col2, col3 = st.columns(3)
413
-
414
- with col1:
415
- st.metric("Image Optimization", f"{test_results['image_optimization']:.3f}s")
416
-
417
- with col2:
418
- st.metric("JSON Compression", f"{test_results['json_compression']:.3f}s")
419
-
420
- with col3:
421
- st.metric("Lazy Loading", f"{test_results['lazy_loading']:.3f}s")
422
-
423
- # Store test results
424
- if 'performance_metrics' not in st.session_state:
425
- st.session_state.performance_metrics = {}
426
-
427
- st.session_state.performance_metrics.update(test_results)
428
-
429
- def record_performance_metric(self, operation: str, duration: float):
430
- """Record a performance metric"""
431
- # Store in current metrics
432
- if 'performance_metrics' not in st.session_state:
433
- st.session_state.performance_metrics = {}
434
-
435
- st.session_state.performance_metrics[operation] = duration
436
-
437
- # Add to history
438
- if 'performance_history' not in st.session_state:
439
- st.session_state.performance_history = []
440
-
441
- # Create history entry
442
- history_entry = {
443
- 'timestamp': datetime.now(),
444
- operation: duration
445
- }
446
-
447
- st.session_state.performance_history.append(history_entry)
448
-
449
- # Keep only last 100 entries
450
- if len(st.session_state.performance_history) > 100:
451
- st.session_state.performance_history = st.session_state.performance_history[-100:]
452
-
453
- def record_connection_quality(self, speed: str, quality_score: float):
454
- """Record connection quality measurement"""
455
- if 'connection_history' not in st.session_state:
456
- st.session_state.connection_history = []
457
-
458
- connection_entry = {
459
- 'timestamp': datetime.now(),
460
- 'speed': speed,
461
- 'quality_score': quality_score
462
- }
463
-
464
- st.session_state.connection_history.append(connection_entry)
465
-
466
- # Keep only last 50 entries
467
- if len(st.session_state.connection_history) > 50:
468
- st.session_state.connection_history = st.session_state.connection_history[-50:]
 
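A short sketch of how the recording hooks above might be called from elsewhere in the app (expensive_operation is a hypothetical placeholder for the unit of work being timed; the dashboard must run inside a Streamlit session because its history lives in st.session_state):

import time
from corpus_collection_engine.utils.performance_dashboard import PerformanceDashboard

dashboard = PerformanceDashboard()

start = time.time()
expensive_operation()  # hypothetical: any operation worth timing
dashboard.record_performance_metric('expensive_operation', time.time() - start)

# Connection quality on the 0-10 scale rendered by _render_connection_analysis
dashboard.record_connection_quality('3g', 6.5)

dashboard.render_dashboard()  # draws the charts from the recorded history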
 
intern_project/corpus_collection_engine/utils/performance_optimizer.py DELETED
@@ -1,716 +0,0 @@
1
- """
2
- Performance optimization utilities for low-bandwidth environments
3
- """
4
-
5
- import streamlit as st
6
- from typing import Dict, Any, Tuple, List
7
- import base64
8
- import io
9
- from PIL import Image
10
- import logging
11
- import time
12
- import gzip
13
- import json
14
-
15
- from corpus_collection_engine.config import PWA_CONFIG
16
-
17
-
18
- class PerformanceOptimizer:
19
- """Utilities for optimizing performance in low-bandwidth environments"""
20
-
21
- def __init__(self):
22
- self.logger = logging.getLogger(__name__)
23
- self.config = PWA_CONFIG
24
-
25
- # Performance thresholds
26
- self.bandwidth_thresholds = {
27
- 'slow_2g': 0.05, # 50 Kbps
28
- '2g': 0.25, # 250 Kbps
29
- '3g': 1.5, # 1.5 Mbps
30
- '4g': 10.0 # 10 Mbps
31
- }
32
-
33
- # Optimization settings
34
- self.optimization_settings = {
35
- 'image_quality': {
36
- 'slow_2g': 30,
37
- '2g': 50,
38
- '3g': 70,
39
- '4g': 85,
40
- 'default': 85
41
- },
42
- 'image_max_size': {
43
- 'slow_2g': (400, 300),
44
- '2g': (600, 450),
45
- '3g': (800, 600),
46
- '4g': (1200, 900),
47
- 'default': (800, 600)
48
- },
49
- 'lazy_loading_threshold': {
50
- 'slow_2g': 1,
51
- '2g': 3,
52
- '3g': 5,
53
- '4g': 10,
54
- 'default': 5
55
- }
56
- }
57
-
58
- # Initialize performance state
59
- if 'performance_initialized' not in st.session_state:
60
- st.session_state.performance_initialized = False
61
- st.session_state.connection_speed = 'unknown'
62
- st.session_state.optimization_level = 'default'
63
-
64
- def initialize_performance_optimization(self):
65
- """Initialize performance optimization"""
66
- if st.session_state.performance_initialized:
67
- return
68
-
69
- try:
70
- # Inject performance monitoring and optimization scripts
71
- self._inject_performance_monitoring()
72
-
73
- # Apply initial optimizations
74
- self._apply_initial_optimizations()
75
-
76
- st.session_state.performance_initialized = True
77
- self.logger.info("Performance optimization initialized")
78
-
79
- except Exception as e:
80
- self.logger.error(f"Performance optimization initialization failed: {e}")
81
-
82
- def _inject_performance_monitoring(self):
83
- """Inject performance monitoring scripts"""
84
-
85
- monitoring_script = """
86
- <script>
87
- // Connection speed detection
88
- function detectConnectionSpeed() {
89
- if ('connection' in navigator) {
90
- const connection = navigator.connection || navigator.mozConnection || navigator.webkitConnection;
91
-
92
- const effectiveType = connection.effectiveType;
93
- const downlink = connection.downlink; // Mbps
94
-
95
- console.log('Connection detected:', effectiveType, downlink + ' Mbps');
96
-
97
- // Send to Streamlit
98
- window.parent.postMessage({
99
- type: 'CONNECTION_SPEED',
100
- effectiveType: effectiveType,
101
- downlink: downlink
102
- }, '*');
103
-
104
- // Apply optimizations based on connection
105
- applyConnectionOptimizations(effectiveType, downlink);
106
- } else {
107
- console.log('Network Information API not supported');
108
- // Fallback speed test
109
- performSpeedTest();
110
- }
111
- }
112
-
113
- // Simple speed test fallback
114
- function performSpeedTest() {
115
- const startTime = performance.now();
116
- const testImage = new Image();
117
-
118
- testImage.onload = function() {
119
-                 const endTime = performance.now();
-                 const duration = endTime - startTime;
-                 const imageSize = 50000; // Approximate size in bytes
-                 const speed = (imageSize * 8) / (duration / 1000) / 1000000; // Mbps
-
-                 let effectiveType = 'unknown';
-                 if (speed < 0.1) effectiveType = 'slow-2g';
-                 else if (speed < 0.5) effectiveType = '2g';
-                 else if (speed < 2) effectiveType = '3g';
-                 else effectiveType = '4g';
-
-                 console.log('Speed test result:', speed.toFixed(2) + ' Mbps', effectiveType);
-
-                 window.parent.postMessage({
-                     type: 'CONNECTION_SPEED',
-                     effectiveType: effectiveType,
-                     downlink: speed
-                 }, '*');
-
-                 applyConnectionOptimizations(effectiveType, speed);
-             };
-
-             testImage.onerror = function() {
-                 console.log('Speed test failed, assuming slow connection');
-                 applyConnectionOptimizations('2g', 0.25);
-             };
-
-             testImage.src = 'data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/2wBDAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/wAARCAABAAEDASIAAhEBAxEB/8QAFQABAQAAAAAAAAAAAAAAAAAAAAv/xAAUEAEAAAAAAAAAAAAAAAAAAAAA/8QAFQEBAQAAAAAAAAAAAAAAAAAAAAX/xAAUEQEAAAAAAAAAAAAAAAAAAAAA/9oADAMBAAIRAxEAPwA/8A';
-         }
-
-         // Apply optimizations based on connection speed
-         function applyConnectionOptimizations(effectiveType, downlink) {
-             const body = document.body;
-
-             // Add connection class to body
-             body.className = body.className.replace(/connection-\\w+/g, '');
-             body.classList.add('connection-' + effectiveType);
-
-             // Apply specific optimizations
-             if (effectiveType === 'slow-2g' || effectiveType === '2g') {
-                 // Aggressive optimizations for slow connections
-                 applySlowConnectionOptimizations();
-             } else if (effectiveType === '3g') {
-                 // Moderate optimizations
-                 applyModerateOptimizations();
-             } else {
-                 // Minimal optimizations for fast connections
-                 applyMinimalOptimizations();
-             }
-         }
-
-         function applySlowConnectionOptimizations() {
-             console.log('Applying slow connection optimizations');
-
-             // Reduce image quality
-             const images = document.querySelectorAll('img');
-             images.forEach(img => {
-                 if (!img.dataset.optimized) {
-                     img.style.filter = 'blur(0.5px)';
-                     img.loading = 'lazy';
-                     img.dataset.optimized = 'true';
-                 }
-             });
-
-             // Disable animations
-             const style = document.createElement('style');
-             style.textContent = `
-                 *, *::before, *::after {
-                     animation-duration: 0.01ms !important;
-                     animation-iteration-count: 1 !important;
-                     transition-duration: 0.01ms !important;
-                 }
-                 .stSpinner > div {
-                     display: none !important;
-                 }
-             `;
-             document.head.appendChild(style);
-
-             // Compress text rendering
-             document.body.style.textRendering = 'optimizeSpeed';
-             document.body.style.fontDisplay = 'swap';
-         }
-
-         function applyModerateOptimizations() {
-             console.log('Applying moderate optimizations');
-
-             // Lazy load images
-             const images = document.querySelectorAll('img');
-             images.forEach(img => {
-                 img.loading = 'lazy';
-             });
-
-             // Reduce animation duration
-             const style = document.createElement('style');
-             style.textContent = `
-                 * {
-                     animation-duration: 0.3s !important;
-                     transition-duration: 0.2s !important;
-                 }
-             `;
-             document.head.appendChild(style);
-         }
-
-         function applyMinimalOptimizations() {
-             console.log('Applying minimal optimizations');
-
-             // Just enable lazy loading
-             const images = document.querySelectorAll('img');
-             images.forEach(img => {
-                 img.loading = 'lazy';
-             });
-         }
-
-         // Monitor performance
-         function monitorPerformance() {
-             if ('performance' in window) {
-                 const navigation = performance.getEntriesByType('navigation')[0];
-                 if (navigation) {
-                     const loadTime = navigation.loadEventEnd - navigation.fetchStart;
-                     console.log('Page load time:', loadTime + 'ms');
-
-                     if (loadTime > 5000) {
-                         console.log('Slow page load detected, applying additional optimizations');
-                         applySlowConnectionOptimizations();
-                     }
-                 }
-             }
-         }
-
-         // Initialize monitoring
-         detectConnectionSpeed();
-
-         // Monitor performance after page load
-         window.addEventListener('load', monitorPerformance);
-
-         // Re-check connection periodically
-         setInterval(detectConnectionSpeed, 60000); // Every minute
-         </script>
-
-         <style>
-         /* Base optimizations for all connections */
-         img {
-             max-width: 100%;
-             height: auto;
-         }
-
-         /* Slow connection optimizations */
-         .connection-slow-2g img,
-         .connection-2g img {
-             max-height: 300px;
-             object-fit: cover;
-             filter: blur(0.5px);
-         }
-
-         .connection-slow-2g .stImage,
-         .connection-2g .stImage {
-             max-height: 300px;
-         }
-
-         /* Disable heavy animations on slow connections */
-         .connection-slow-2g *,
-         .connection-2g * {
-             animation-duration: 0.01ms !important;
-             transition-duration: 0.01ms !important;
-         }
-
-         /* Optimize text rendering */
-         .connection-slow-2g,
-         .connection-2g {
-             text-rendering: optimizeSpeed;
-             font-display: swap;
-         }
-
-         /* Progressive enhancement for faster connections */
-         .connection-4g .stImage img {
-             transition: transform 0.3s ease;
-         }
-
-         .connection-4g .stImage img:hover {
-             transform: scale(1.02);
-         }
-
-         /* Loading indicators for slow connections */
-         .connection-slow-2g .stSpinner,
-         .connection-2g .stSpinner {
-             display: none !important;
-         }
-
-         /* Bandwidth indicator */
-         .bandwidth-indicator {
-             position: fixed;
-             top: 10px;
-             right: 10px;
-             background: rgba(0, 0, 0, 0.7);
-             color: white;
-             padding: 4px 8px;
-             border-radius: 4px;
-             font-size: 12px;
-             z-index: 9999;
-             display: none;
-         }
-
-         .connection-slow-2g .bandwidth-indicator,
-         .connection-2g .bandwidth-indicator {
-             display: block;
-             background: #ff4444;
-         }
-
-         .connection-3g .bandwidth-indicator {
-             display: block;
-             background: #ff9800;
-         }
-         </style>
-
-         <div class="bandwidth-indicator" id="bandwidth-indicator">
-             📡 Optimizing for your connection...
-         </div>
-         """
-
-         st.components.v1.html(monitoring_script, height=0)
-
-     def _apply_initial_optimizations(self):
-         """Apply initial performance optimizations"""
-
-         # Streamlit-specific optimizations
-         optimization_css = """
-         <style>
-         /* Streamlit performance optimizations */
-         .stApp {
-             max-width: 1200px;
-             margin: 0 auto;
-         }
-
-         /* Optimize form rendering */
-         .stForm {
-             border: none;
-             padding: 0;
-         }
-
-         /* Optimize button rendering */
-         .stButton > button {
-             transition: background-color 0.1s ease;
-         }
-
-         /* Optimize text area rendering */
-         .stTextArea textarea {
-             resize: vertical;
-         }
-
-         /* Optimize file uploader */
-         .stFileUploader {
-             border: 2px dashed #ccc;
-             border-radius: 8px;
-             padding: 20px;
-             text-align: center;
-         }
-
-         /* Optimize metrics display */
-         .stMetric {
-             background: #f8f9fa;
-             padding: 12px;
-             border-radius: 6px;
-             border: 1px solid #e9ecef;
-         }
-
-         /* Optimize expander */
-         .streamlit-expanderHeader {
-             font-weight: 600;
-         }
-
-         /* Optimize columns */
-         .stColumn {
-             padding: 0 8px;
-         }
-
-         /* Optimize sidebar */
-         .stSidebar {
-             background: #f8f9fa;
-         }
-
-         /* Loading optimizations */
-         .stSpinner {
-             text-align: center;
-             padding: 20px;
-         }
-
-         /* Mobile optimizations */
-         @media (max-width: 768px) {
-             .stApp {
-                 padding: 1rem 0.5rem;
-             }
-
-             .stColumn {
-                 padding: 0 4px;
-             }
-
-             .stButton > button {
-                 width: 100%;
-                 margin: 4px 0;
-             }
-         }
-         </style>
-         """
-
-         st.components.v1.html(optimization_css, height=0)
-
-     def optimize_image(self, image: Image.Image, connection_speed: str = 'default') -> Image.Image:
-         """Optimize image based on connection speed"""
-         try:
-             # Get optimization settings for connection speed
-             quality = self.optimization_settings['image_quality'].get(connection_speed, 85)
-             max_size = self.optimization_settings['image_max_size'].get(connection_speed, (800, 600))
-
-             # Create a copy to avoid modifying original
-             optimized_image = image.copy()
-
-             # Resize if necessary
-             if optimized_image.size[0] > max_size[0] or optimized_image.size[1] > max_size[1]:
-                 optimized_image.thumbnail(max_size, Image.Resampling.LANCZOS)
-
-             # Convert to RGB if necessary (for JPEG compression)
-             if optimized_image.mode in ('RGBA', 'LA', 'P'):
-                 # Create white background
-                 background = Image.new('RGB', optimized_image.size, (255, 255, 255))
-                 if optimized_image.mode == 'P':
-                     optimized_image = optimized_image.convert('RGBA')
-                 background.paste(optimized_image, mask=optimized_image.split()[-1] if optimized_image.mode == 'RGBA' else None)
-                 optimized_image = background
-
-             return optimized_image
-
-         except Exception as e:
-             self.logger.error(f"Error optimizing image: {e}")
-             return image
-
-     def compress_image_to_base64(self, image: Image.Image, connection_speed: str = 'default') -> str:
-         """Compress image to base64 with connection-appropriate quality"""
-         try:
-             # Optimize image first
-             optimized_image = self.optimize_image(image, connection_speed)
-
-             # Get quality setting
-             quality = self.optimization_settings['image_quality'].get(connection_speed, 85)
-
-             # Compress to bytes
-             buffer = io.BytesIO()
-             optimized_image.save(buffer, format="JPEG", quality=quality, optimize=True)
-
-             # Convert to base64
-             img_bytes = buffer.getvalue()
-             img_base64 = base64.b64encode(img_bytes).decode()
-
-             # Log compression results
-             original_size = len(base64.b64encode(self._image_to_bytes(image)).decode())
-             compressed_size = len(img_base64)
-             compression_ratio = (1 - compressed_size / original_size) * 100 if original_size > 0 else 0
-
-             self.logger.info(f"Image compressed: {original_size} -> {compressed_size} bytes ({compression_ratio:.1f}% reduction)")
-
-             return img_base64
-
-         except Exception as e:
-             self.logger.error(f"Error compressing image: {e}")
-             # Fallback to basic conversion
-             return self._image_to_base64_basic(image)
-
-     def _image_to_bytes(self, image: Image.Image) -> bytes:
-         """Convert image to bytes"""
-         buffer = io.BytesIO()
-         image.save(buffer, format="PNG")
-         return buffer.getvalue()
-
-     def _image_to_base64_basic(self, image: Image.Image) -> str:
-         """Basic image to base64 conversion without optimization"""
-         buffer = io.BytesIO()
-         image.save(buffer, format="JPEG", quality=85)
-         return base64.b64encode(buffer.getvalue()).decode()
-
-     def compress_json_data(self, data: Dict[str, Any]) -> str:
-         """Compress JSON data for transmission"""
-         try:
-             # Convert to JSON string
-             json_str = json.dumps(data, separators=(',', ':'), ensure_ascii=False)
-
-             # Compress with gzip
-             compressed = gzip.compress(json_str.encode('utf-8'))
-
-             # Convert to base64 for transmission
-             compressed_b64 = base64.b64encode(compressed).decode()
-
-             # Log compression results
-             original_size = len(json_str.encode('utf-8'))
-             compressed_size = len(compressed)
-             compression_ratio = (1 - compressed_size / original_size) * 100 if original_size > 0 else 0
-
-             self.logger.info(f"JSON compressed: {original_size} -> {compressed_size} bytes ({compression_ratio:.1f}% reduction)")
-
-             return compressed_b64
-
-         except Exception as e:
-             self.logger.error(f"Error compressing JSON data: {e}")
-             return json.dumps(data)
-
-     def decompress_json_data(self, compressed_data: str) -> Dict[str, Any]:
-         """Decompress JSON data"""
-         try:
-             # Decode from base64
-             compressed_bytes = base64.b64decode(compressed_data)
-
-             # Decompress
-             decompressed_bytes = gzip.decompress(compressed_bytes)
-
-             # Parse JSON
-             json_str = decompressed_bytes.decode('utf-8')
-             return json.loads(json_str)
-
-         except Exception as e:
-             self.logger.error(f"Error decompressing JSON data: {e}")
-             # Fallback to direct JSON parsing
-             try:
-                 return json.loads(compressed_data)
-             except Exception:
-                 return {}
-
-     def render_performance_indicator(self):
-         """Render performance and connection indicator"""
-         connection_speed = st.session_state.get('connection_speed', 'unknown')
-
-         if connection_speed in ['slow_2g', '2g']:
-             st.info("📡 Slow connection detected. App optimized for your bandwidth.")
-         elif connection_speed == '3g':
-             st.info("📡 Moderate connection detected. Some optimizations applied.")
-
-         # Performance tips for slow connections
-         if connection_speed in ['slow_2g', '2g']:
-             with st.expander("💡 Tips for Better Performance"):
-                 st.markdown("""
-                 **Optimizations Applied:**
-                 - Images automatically compressed
-                 - Animations disabled
-                 - Lazy loading enabled
-                 - Text rendering optimized
-
-                 **Tips for Better Experience:**
-                 - Use WiFi when available
-                 - Close other apps/tabs
-                 - Upload smaller images when possible
-                 - Work offline when connection is very slow
-                 """)
-
-     def lazy_load_content(self, content_list: List[Any], page_size: int = None) -> Tuple[List[Any], bool]:
-         """Implement lazy loading for content lists"""
-         if not content_list:
-             return [], False
-
-         # Determine page size based on connection speed
-         connection_speed = st.session_state.get('connection_speed', 'default')
-         if page_size is None:
-             page_size = self.optimization_settings['lazy_loading_threshold'].get(connection_speed, 5)
-
-         # Get current page from session state
-         page_key = f"lazy_load_page_{id(content_list)}"
-         current_page = st.session_state.get(page_key, 0)
-
-         # Calculate slice
-         start_idx = current_page * page_size
-         end_idx = start_idx + page_size
-
-         # Get current page content
-         current_content = content_list[start_idx:end_idx]
-         has_more = end_idx < len(content_list)
-
-         # Load more button
-         if has_more:
-             if st.button(f"📄 Load More ({len(content_list) - end_idx} remaining)", key=f"load_more_{id(content_list)}"):
-                 st.session_state[page_key] = current_page + 1
-                 st.rerun()
-
-         return current_content, has_more
-
-     def optimize_streamlit_config(self):
-         """Apply Streamlit-specific optimizations"""
-
-         # Inject Streamlit optimizations
-         streamlit_optimizations = """
-         <script>
-         // Optimize Streamlit rendering
-         function optimizeStreamlit() {
-             // Disable unnecessary Streamlit features for performance
-             const style = document.createElement('style');
-             style.textContent = `
-                 /* Hide Streamlit branding for performance */
-                 .stDeployButton {
-                     display: none;
-                 }
-
-                 /* Optimize form rendering */
-                 .stForm {
-                     border: none;
-                     box-shadow: none;
-                 }
-
-                 /* Optimize button hover effects */
-                 .stButton > button:hover {
-                     transform: none;
-                     box-shadow: none;
-                 }
-
-                 /* Optimize text input focus */
-                 .stTextInput > div > div > input:focus {
-                     box-shadow: 0 0 0 1px #FF6B35;
-                 }
-
-                 /* Optimize selectbox */
-                 .stSelectbox > div > div {
-                     border-radius: 4px;
-                 }
-
-                 /* Optimize progress bars */
-                 .stProgress > div > div {
-                     transition: width 0.1s ease;
-                 }
-             `;
-             document.head.appendChild(style);
-         }
-
-         // Apply optimizations when DOM is ready
-         if (document.readyState === 'loading') {
-             document.addEventListener('DOMContentLoaded', optimizeStreamlit);
-         } else {
-             optimizeStreamlit();
-         }
-
-         // Re-apply optimizations when Streamlit updates the page
-         const observer = new MutationObserver(function(mutations) {
-             let shouldOptimize = false;
-             mutations.forEach(function(mutation) {
-                 if (mutation.type === 'childList' && mutation.addedNodes.length > 0) {
-                     shouldOptimize = true;
-                 }
-             });
-
-             if (shouldOptimize) {
-                 setTimeout(optimizeStreamlit, 100);
-             }
-         });
-
-         observer.observe(document.body, {
-             childList: true,
-             subtree: true
-         });
-         </script>
-         """
-
-         st.components.v1.html(streamlit_optimizations, height=0)
-
-     def get_optimization_stats(self) -> Dict[str, Any]:
-         """Get current optimization statistics"""
-         return {
-             'connection_speed': st.session_state.get('connection_speed', 'unknown'),
-             'optimization_level': st.session_state.get('optimization_level', 'default'),
-             'performance_initialized': st.session_state.get('performance_initialized', False),
-             'optimizations_applied': {
-                 'image_compression': True,
-                 'lazy_loading': True,
-                 'animation_reduction': st.session_state.get('connection_speed') in ['slow_2g', '2g'],
-                 'text_optimization': True
-             }
-         }
-
-     def measure_performance(self, operation_name: str):
-         """Context manager for measuring operation performance"""
-         return PerformanceMeasurement(operation_name, self.logger)
-
-
- class PerformanceMeasurement:
-     """Context manager for measuring performance"""
-
-     def __init__(self, operation_name: str, logger):
-         self.operation_name = operation_name
-         self.logger = logger
-         self.start_time = None
-
-     def __enter__(self):
-         self.start_time = time.time()
-         return self
-
-     def __exit__(self, exc_type, exc_val, exc_tb):
-         if self.start_time:
-             duration = time.time() - self.start_time
-             self.logger.info(f"Performance: {self.operation_name} took {duration:.3f}s")
-
-             # Store in session state for analytics
-             if 'performance_metrics' not in st.session_state:
-                 st.session_state.performance_metrics = {}
-
-             st.session_state.performance_metrics[self.operation_name] = duration
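
For reference, the gzip + base64 pipeline that `compress_json_data` / `decompress_json_data` implemented can be exercised standalone. A minimal sketch, outside Streamlit; the function names here are illustrative, not part of the deleted module:

```python
import base64
import gzip
import json

def pack(data: dict) -> str:
    """Compact JSON -> gzip -> base64, the same pipeline as compress_json_data."""
    raw = json.dumps(data, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode()

def unpack(payload: str) -> dict:
    """Reverse the pipeline: base64-decode, gunzip, parse JSON."""
    return json.loads(gzip.decompress(base64.b64decode(payload)).decode("utf-8"))

# Round-trip check on a repetitive payload (hypothetical sample data)
sample = {"activity": "recipe_exchange", "language": "hi", "text": "dal " * 200}
packed = pack(sample)
assert unpack(packed) == sample  # lossless round trip
print(len(json.dumps(sample)), "->", len(packed), "chars")
```

Note that base64 re-encoding adds roughly 33% overhead, so the net saving depends on how compressible the JSON is: repetitive corpus text shrinks well below its original size, while short payloads can come out larger than the input.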
intern_project/corpus_collection_engine/utils/session_manager.py DELETED
@@ -1,482 +0,0 @@
- """
- Session management utilities for the Corpus Collection Engine
- """
-
- import streamlit as st
- import uuid
- import json
- from datetime import datetime, timedelta
- from typing import Dict, Any, Optional, List
- import logging
-
- from corpus_collection_engine.models.data_models import UserContribution, ActivityType
- from corpus_collection_engine.services.storage_service import StorageService
- from corpus_collection_engine.utils.error_handler import global_error_handler, ErrorCategory, ErrorSeverity
-
-
- class SessionManager:
-     """Manages user sessions and state across the application"""
-
-     def __init__(self):
-         self.logger = logging.getLogger(__name__)
-         self.storage_service = StorageService()
-         self._initialize_session()
-
-     def _initialize_session(self):
-         """Initialize session state variables"""
-         # Core session data
-         if 'session_id' not in st.session_state:
-             st.session_state.session_id = str(uuid.uuid4())
-
-         if 'user_id' not in st.session_state:
-             st.session_state.user_id = f"user_{str(uuid.uuid4())[:8]}"
-
-         if 'session_start_time' not in st.session_state:
-             st.session_state.session_start_time = datetime.now()
-
-         # User preferences
-         if 'user_preferences' not in st.session_state:
-             st.session_state.user_preferences = {
-                 'preferred_language': 'en',
-                 'preferred_region': None,
-                 'theme': 'light',
-                 'notifications_enabled': True,
-                 'auto_save': True
-             }
-
-         # Activity tracking
-         if 'activity_history' not in st.session_state:
-             st.session_state.activity_history = []
-
-         if 'current_activity_start' not in st.session_state:
-             st.session_state.current_activity_start = None
-
-         # Contribution tracking
-         if 'session_contributions' not in st.session_state:
-             st.session_state.session_contributions = []
-
-         if 'total_session_contributions' not in st.session_state:
-             st.session_state.total_session_contributions = 0
-
-         # Progress tracking
-         if 'session_progress' not in st.session_state:
-             st.session_state.session_progress = {
-                 'activities_started': set(),
-                 'activities_completed': set(),
-                 'languages_used': set(),
-                 'regions_contributed': set(),
-                 'achievements_unlocked': set(),
-                 'streak_days': 0,
-                 'total_time_spent': timedelta()
-             }
-
-         # Application state
-         if 'app_state' not in st.session_state:
-             st.session_state.app_state = {
-                 'privacy_consent_given': False,
-                 'onboarding_completed': False,
-                 'tutorial_completed': False,
-                 'first_contribution_made': False,
-                 'feedback_given': False
-             }
-
-         # Error and performance tracking
-         if 'session_errors' not in st.session_state:
-             st.session_state.session_errors = []
-
-         if 'performance_metrics' not in st.session_state:
-             st.session_state.performance_metrics = {}
-
90
- def get_session_id(self) -> str:
91
- """Get current session ID"""
92
- return st.session_state.session_id
93
-
94
- def get_user_id(self) -> str:
95
- """Get current user ID"""
96
- return st.session_state.user_id
97
-
98
- def start_activity(self, activity_type: ActivityType) -> None:
99
- """Record the start of an activity"""
100
- try:
101
- start_time = datetime.now()
102
- st.session_state.current_activity_start = start_time
103
- st.session_state.session_progress['activities_started'].add(activity_type.value)
104
-
105
- # Add to activity history
106
- activity_record = {
107
- 'activity_type': activity_type.value,
108
- 'start_time': start_time,
109
- 'end_time': None,
110
- 'duration': None,
111
- 'completed': False,
112
- 'contributions_made': 0
113
- }
114
-
115
- st.session_state.activity_history.append(activity_record)
116
-
117
- self.logger.info(f"Activity started: {activity_type.value}")
118
-
119
- except Exception as e:
120
- global_error_handler.handle_error(
121
- e,
122
- ErrorCategory.SYSTEM,
123
- ErrorSeverity.LOW,
124
- context={'component': 'session_manager', 'action': 'start_activity'}
125
- )
126
-
127
- def complete_activity(self, activity_type: ActivityType, contributions_made: int = 0) -> None:
128
- """Record the completion of an activity"""
129
- try:
130
- end_time = datetime.now()
131
- st.session_state.session_progress['activities_completed'].add(activity_type.value)
132
-
133
- # Update the most recent activity record
134
- if st.session_state.activity_history:
135
- last_activity = st.session_state.activity_history[-1]
136
- if last_activity['activity_type'] == activity_type.value and not last_activity['completed']:
137
- last_activity['end_time'] = end_time
138
- last_activity['completed'] = True
139
- last_activity['contributions_made'] = contributions_made
140
-
141
- if st.session_state.current_activity_start:
142
- duration = end_time - st.session_state.current_activity_start
143
- last_activity['duration'] = duration
144
- st.session_state.session_progress['total_time_spent'] += duration
145
-
146
- st.session_state.current_activity_start = None
147
-
148
- # Check for achievements
149
- self._check_achievements()
150
-
151
- self.logger.info(f"Activity completed: {activity_type.value}")
152
-
153
- except Exception as e:
154
- global_error_handler.handle_error(
155
- e,
156
- ErrorCategory.SYSTEM,
157
- ErrorSeverity.LOW,
158
- context={'component': 'session_manager', 'action': 'complete_activity'}
159
- )
160
-
161
- def record_contribution(self, contribution: UserContribution) -> None:
162
- """Record a user contribution in the session"""
163
- try:
164
- # Add to session contributions
165
- contribution_record = {
166
- 'id': contribution.contribution_id,
167
- 'activity_type': contribution.activity_type,
168
- 'language': contribution.language,
169
- 'region': contribution.region,
170
- 'timestamp': contribution.timestamp,
171
- 'content_type': contribution.content_type
172
- }
173
-
174
- st.session_state.session_contributions.append(contribution_record)
175
- st.session_state.total_session_contributions += 1
176
-
177
- # Update progress tracking
178
- if contribution.language:
179
- st.session_state.session_progress['languages_used'].add(contribution.language)
180
-
181
- if contribution.region:
182
- st.session_state.session_progress['regions_contributed'].add(contribution.region)
183
-
184
- # Mark first contribution milestone
185
- if not st.session_state.app_state['first_contribution_made']:
186
- st.session_state.app_state['first_contribution_made'] = True
187
- self._unlock_achievement('first_contribution')
188
-
189
- # Check for other achievements
190
- self._check_achievements()
191
-
192
- self.logger.info(f"Contribution recorded: {contribution.contribution_id}")
193
-
194
- except Exception as e:
195
- global_error_handler.handle_error(
196
- e,
197
- ErrorCategory.SYSTEM,
198
- ErrorSeverity.MEDIUM,
199
- context={'component': 'session_manager', 'action': 'record_contribution'}
200
- )
201
-
202
- def update_user_preferences(self, preferences: Dict[str, Any]) -> None:
203
- """Update user preferences"""
204
- try:
205
- st.session_state.user_preferences.update(preferences)
206
-
207
- # Save preferences to storage for persistence
208
- self.storage_service.save_user_preferences(
209
- st.session_state.user_id,
210
- st.session_state.user_preferences
211
- )
212
-
213
- self.logger.info("User preferences updated")
214
-
215
- except Exception as e:
216
- global_error_handler.handle_error(
217
- e,
218
- ErrorCategory.SYSTEM,
219
- ErrorSeverity.LOW,
220
- context={'component': 'session_manager', 'action': 'update_preferences'}
221
- )
222
-
223
- def get_session_summary(self) -> Dict[str, Any]:
224
- """Get a summary of the current session"""
225
- try:
226
- current_time = datetime.now()
227
- session_duration = current_time - st.session_state.session_start_time
228
-
229
- # Calculate active time (time spent in activities)
230
- active_time = st.session_state.session_progress['total_time_spent']
231
- if st.session_state.current_activity_start:
232
- active_time += current_time - st.session_state.current_activity_start
233
-
234
- return {
235
- 'session_id': st.session_state.session_id,
236
- 'user_id': st.session_state.user_id,
237
- 'session_duration': session_duration,
238
- 'active_time': active_time,
239
- 'activities_started': len(st.session_state.session_progress['activities_started']),
240
- 'activities_completed': len(st.session_state.session_progress['activities_completed']),
241
- 'total_contributions': st.session_state.total_session_contributions,
242
- 'languages_used': list(st.session_state.session_progress['languages_used']),
243
- 'regions_contributed': list(st.session_state.session_progress['regions_contributed']),
244
- 'achievements_unlocked': list(st.session_state.session_progress['achievements_unlocked']),
245
- 'completion_rate': self._calculate_completion_rate(),
246
- 'engagement_score': self._calculate_engagement_score()
247
- }
248
-
249
- except Exception as e:
250
- global_error_handler.handle_error(
251
- e,
252
- ErrorCategory.SYSTEM,
253
- ErrorSeverity.LOW,
254
- context={'component': 'session_manager', 'action': 'get_summary'}
255
- )
256
- return {}
257
-
258
- def _calculate_completion_rate(self) -> float:
259
- """Calculate activity completion rate"""
260
- started = len(st.session_state.session_progress['activities_started'])
261
- completed = len(st.session_state.session_progress['activities_completed'])
262
-
263
- if started == 0:
264
- return 0.0
265
-
266
- return (completed / started) * 100
267
-
268
- def _calculate_engagement_score(self) -> float:
269
- """Calculate user engagement score"""
270
- try:
271
- score = 0.0
272
-
273
- # Base score for participation
274
- score += min(st.session_state.total_session_contributions * 10, 50)
275
-
276
- # Bonus for activity diversity
277
- activities_tried = len(st.session_state.session_progress['activities_started'])
278
- score += min(activities_tried * 15, 60)
279
-
280
- # Bonus for language diversity
281
- languages_used = len(st.session_state.session_progress['languages_used'])
282
- score += min(languages_used * 10, 30)
283
-
284
- # Bonus for completion rate
285
- completion_rate = self._calculate_completion_rate()
286
- score += completion_rate * 0.6
287
-
288
- # Bonus for achievements
289
- achievements = len(st.session_state.session_progress['achievements_unlocked'])
290
- score += min(achievements * 5, 25)
291
-
292
- return min(score, 100.0) # Cap at 100
293
-
294
- except Exception:
295
- return 0.0
296
-
297
- def _check_achievements(self) -> None:
298
- """Check and unlock achievements based on current progress"""
299
- try:
300
- progress = st.session_state.session_progress
301
-
302
- # Contribution milestones
303
- contributions = st.session_state.total_session_contributions
304
- if contributions >= 5 and 'contributor' not in progress['achievements_unlocked']:
305
- self._unlock_achievement('contributor')
306
-
307
- if contributions >= 10 and 'active_contributor' not in progress['achievements_unlocked']:
308
- self._unlock_achievement('active_contributor')
309
-
310
- if contributions >= 25 and 'super_contributor' not in progress['achievements_unlocked']:
311
- self._unlock_achievement('super_contributor')
312
-
313
- # Activity diversity
314
- activities_completed = len(progress['activities_completed'])
315
- if activities_completed >= 2 and 'explorer' not in progress['achievements_unlocked']:
316
- self._unlock_achievement('explorer')
317
-
318
- if activities_completed >= 4 and 'cultural_ambassador' not in progress['achievements_unlocked']:
319
- self._unlock_achievement('cultural_ambassador')
320
-
321
- # Language diversity
322
- languages_used = len(progress['languages_used'])
323
- if languages_used >= 2 and 'polyglot' not in progress['achievements_unlocked']:
324
- self._unlock_achievement('polyglot')
325
-
326
- if languages_used >= 3 and 'language_champion' not in progress['achievements_unlocked']:
327
- self._unlock_achievement('language_champion')
328
-
329
- # Regional diversity
330
- regions = len(progress['regions_contributed'])
331
- if regions >= 2 and 'regional_expert' not in progress['achievements_unlocked']:
332
- self._unlock_achievement('regional_expert')
333
-
334
- except Exception as e:
335
- self.logger.error(f"Error checking achievements: {e}")
336
-
337
- def _unlock_achievement(self, achievement_id: str) -> None:
338
- """Unlock an achievement"""
339
- try:
340
- st.session_state.session_progress['achievements_unlocked'].add(achievement_id)
341
-
342
- # Show achievement notification
343
- achievement_info = self._get_achievement_info(achievement_id)
344
- st.success(f"🏆 Achievement Unlocked: {achievement_info['title']}!")
345
- st.info(achievement_info['description'])
346
-
347
- self.logger.info(f"Achievement unlocked: {achievement_id}")
348
-
349
- except Exception as e:
350
- self.logger.error(f"Error unlocking achievement {achievement_id}: {e}")
351
-
352
- def _get_achievement_info(self, achievement_id: str) -> Dict[str, str]:
353
- """Get information about an achievement"""
354
- achievements = {
355
- 'first_contribution': {
356
- 'title': 'First Steps',
357
- 'description': 'Made your first contribution to preserving Indian culture!'
358
- },
359
- 'contributor': {
360
- 'title': 'Contributor',
361
- 'description': 'Made 5 contributions - you\'re making a difference!'
362
- },
363
- 'active_contributor': {
364
- 'title': 'Active Contributor',
365
- 'description': 'Made 10 contributions - your dedication is inspiring!'
366
- },
367
- 'super_contributor': {
368
- 'title': 'Super Contributor',
369
- 'description': 'Made 25 contributions - you\'re a cultural preservation hero!'
370
- },
371
- 'explorer': {
372
- 'title': 'Cultural Explorer',
373
- 'description': 'Completed 2 different activities - exploring our rich heritage!'
374
- },
375
- 'cultural_ambassador': {
376
- 'title': 'Cultural Ambassador',
377
- 'description': 'Completed all activities - you\'re a true cultural ambassador!'
378
- },
379
- 'polyglot': {
380
- 'title': 'Polyglot',
381
- 'description': 'Contributed in 2 languages - celebrating linguistic diversity!'
382
- },
383
- 'language_champion': {
384
- 'title': 'Language Champion',
385
- 'description': 'Contributed in 3+ languages - preserving multilingual heritage!'
386
- },
387
- 'regional_expert': {
388
- 'title': 'Regional Expert',
389
- 'description': 'Contributed from multiple regions - showcasing India\'s diversity!'
390
- }
391
- }
392
-
393
- return achievements.get(achievement_id, {
394
- 'title': 'Unknown Achievement',
395
- 'description': 'You\'ve accomplished something special!'
396
- })
397
-
398
- def save_session_data(self) -> None:
399
- """Save session data to persistent storage"""
400
- try:
401
- session_data = {
402
- 'session_id': st.session_state.session_id,
403
- 'user_id': st.session_state.user_id,
404
- 'session_start_time': st.session_state.session_start_time.isoformat(),
405
- 'user_preferences': st.session_state.user_preferences,
406
- 'session_progress': {
407
- 'activities_started': list(st.session_state.session_progress['activities_started']),
408
- 'activities_completed': list(st.session_state.session_progress['activities_completed']),
409
- 'languages_used': list(st.session_state.session_progress['languages_used']),
410
- 'regions_contributed': list(st.session_state.session_progress['regions_contributed']),
411
- 'achievements_unlocked': list(st.session_state.session_progress['achievements_unlocked']),
412
- 'total_time_spent': str(st.session_state.session_progress['total_time_spent'])
413
- },
414
- 'app_state': st.session_state.app_state,
415
- 'total_contributions': st.session_state.total_session_contributions
416
- }
417
-
418
- self.storage_service.save_session_data(session_data)
419
- self.logger.info("Session data saved successfully")
420
-
421
- except Exception as e:
422
- global_error_handler.handle_error(
423
- e,
424
- ErrorCategory.STORAGE,
425
- ErrorSeverity.MEDIUM,
426
- context={'component': 'session_manager', 'action': 'save_session'}
427
- )
428
-
429
- def load_session_data(self, session_id: str) -> bool:
430
- """Load session data from persistent storage"""
431
- try:
432
- session_data = self.storage_service.load_session_data(session_id)
433
-
434
- if session_data:
435
- # Restore session state
436
- st.session_state.session_id = session_data['session_id']
437
- st.session_state.user_id = session_data['user_id']
438
- st.session_state.session_start_time = datetime.fromisoformat(session_data['session_start_time'])
439
- st.session_state.user_preferences = session_data['user_preferences']
440
- st.session_state.app_state = session_data['app_state']
441
- st.session_state.total_session_contributions = session_data.get('total_contributions', 0)
442
-
443
- # Restore progress (convert lists back to sets)
444
- progress_data = session_data['session_progress']
445
- st.session_state.session_progress.update({
446
- 'activities_started': set(progress_data['activities_started']),
447
- 'activities_completed': set(progress_data['activities_completed']),
448
- 'languages_used': set(progress_data['languages_used']),
449
- 'regions_contributed': set(progress_data['regions_contributed']),
450
- 'achievements_unlocked': set(progress_data['achievements_unlocked'])
451
- })
452
-
453
- self.logger.info(f"Session data loaded successfully: {session_id}")
454
- return True
455
-
456
- return False
457
-
458
- except Exception as e:
459
- global_error_handler.handle_error(
460
- e,
461
- ErrorCategory.STORAGE,
462
- ErrorSeverity.MEDIUM,
463
- context={'component': 'session_manager', 'action': 'load_session'}
464
- )
465
- return False
466
-
467
- def cleanup_session(self) -> None:
468
- """Clean up session data on exit"""
469
- try:
470
- # Save final session data
471
- self.save_session_data()
472
-
473
- # Log session summary
474
- summary = self.get_session_summary()
475
- self.logger.info(f"Session ended: {json.dumps(summary, default=str)}")
476
-
477
- except Exception as e:
478
- self.logger.error(f"Error during session cleanup: {e}")
479
-
480
-
481
- # Global session manager instance
482
- session_manager = SessionManager()
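
The engagement score above is a capped weighted sum; a minimal standalone sketch of the same arithmetic (names are illustrative) makes the weighting easy to inspect:

```python
def engagement_score(contributions: int, activities_tried: int,
                     activities_completed: int, languages_used: int,
                     achievements: int) -> float:
    """Mirror of SessionManager._calculate_engagement_score as a pure function."""
    completion_rate = (activities_completed / activities_tried * 100) if activities_tried else 0.0
    score = min(contributions * 10, 50)       # participation, capped at 50
    score += min(activities_tried * 15, 60)   # activity diversity, capped at 60
    score += min(languages_used * 10, 30)     # language diversity, capped at 30
    score += completion_rate * 0.6            # up to 60 points for full completion
    score += min(achievements * 5, 25)        # achievements, capped at 25
    return min(score, 100.0)                  # overall cap at 100

print(engagement_score(2, 2, 1, 1, 1))  # 20 + 30 + 10 + 30 + 5 = 95.0
```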
intern_project/data/corpus_collection.db DELETED
Binary file (53.2 kB)
 
intern_project/main.py DELETED
@@ -1,6 +0,0 @@
- def main():
-     print("Hello from intern-project!")
-
-
- if __name__ == "__main__":
-     main()