---
title: Corpus Collection Engine
emoji: ๐
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
license: mit
short_description: AI-powered platform for preserving Indian cultural heritage
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🇮🇳 Corpus Collection Engine

## Team Information

- **Team Name**: Heritage Collectors
- **Team Members**:
  - Member 1: Singaraju Saiteja (Role: Streamlit app development)
  - Member 2: Muthyapu Sudeepthi (Role: AI integration)
  - Member 3: Rithika Sadhu (Role: Documentation)
  - Member 4: Golla Bharath Kumar (Role: Development strategy)
  - Member 5: K. Vamshi Kumar (Role: App design and user experience)

**AI-powered platform for preserving Indian cultural heritage through interactive data collection**

## Setup & Installation

### Prerequisites

- Python 3.8 or higher
- pip package manager
- Git (for cloning the repository)

### Quick Start

1. **Clone the Repository**

   ```bash
   git clone [repository-url]
   cd corpus-collection-engine
   ```

2. **Create Virtual Environment**

   ```bash
   python -m venv venv

   # On Windows
   venv\Scripts\activate

   # On macOS/Linux
   source venv/bin/activate
   ```

3. **Install Dependencies**

   ```bash
   pip install -r requirements.txt
   ```

4. **Run the Application**

   ```bash
   streamlit run corpus_collection_engine/main.py
   ```

5. **Access the App**

   Open your browser and navigate to http://localhost:8501

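As an optional pre-flight check for the prerequisites above, the interpreter version can be verified before creating the virtual environment. This is a minimal sketch; the helper name `meets_minimum_python` is hypothetical and not part of this repository.

```python
import sys

# Minimum version taken from the Prerequisites list above.
MINIMUM_PYTHON = (3, 8)


def meets_minimum_python(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor, ...) version satisfies the minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor so patch levels do not matter.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = sys.version.split()[0]
    if meets_minimum_python():
        print("Python version OK:", found)
    else:
        print("Python 3.8 or higher is required; found", found)
```

Running the script with the same `python` executable that will create the virtual environment reports whether that interpreter is new enough.
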
### Alternative Installation Methods

#### Using Docker

```bash
docker build -t corpus-collection-engine .
docker run -p 8501:8501 corpus-collection-engine
```

#### Using the Smart Installer

```bash
python install_dependencies.py
python start_app.py
```

## What is this?

The Corpus Collection Engine is a Streamlit application that collects and preserves diverse data about Indian languages, history, and culture. Through interactive activities, users contribute material for building culturally-aware AI systems while helping preserve India's heritage.

## 🎯 Features

### 🎭 Interactive Cultural Activities

- **Meme Creator**: Generate culturally relevant memes in Indian languages
- **Recipe Collector**: Share traditional recipes with cultural context
- **Folklore Archive**: Preserve stories, legends, and oral traditions
- **Landmark Identifier**: Document historical and cultural landmarks

### Multi-language Support

- Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Assamese
- Native script support and cultural context preservation

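One way to represent the eleven supported languages in code is a small lookup table with a validator. The sketch below is an assumption for illustration: the codes are standard ISO 639-1 identifiers, but the names `SUPPORTED_LANGUAGES` and `resolve_language` are hypothetical and may not match the project's actual configuration.

```python
# Hypothetical mapping of ISO 639-1 codes to the languages listed above.
SUPPORTED_LANGUAGES = {
    "hi": "Hindi",
    "bn": "Bengali",
    "ta": "Tamil",
    "te": "Telugu",
    "mr": "Marathi",
    "gu": "Gujarati",
    "kn": "Kannada",
    "ml": "Malayalam",
    "pa": "Punjabi",
    "or": "Odia",
    "as": "Assamese",
}


def resolve_language(code):
    """Return the language name for a code, or raise ValueError if unsupported."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return SUPPORTED_LANGUAGES[normalized]
```

Rejecting unknown codes at the boundary keeps every stored contribution tagged with one of the supported languages.
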
### Real-time Analytics

- Contribution tracking and cultural impact metrics
- Language diversity and regional distribution analysis
- User engagement and platform growth insights

### Privacy-First Design

- No authentication required - start contributing immediately
- Minimal data collection with full transparency
- User-controlled privacy settings

## How to Use

1. **Choose an Activity**: Select from meme creation, recipe sharing, folklore collection, or landmark documentation
2. **Select Your Language**: Pick from 11 supported Indian languages
3. **Contribute Content**: Share your cultural knowledge and creativity
4. **Add Context**: Provide cultural significance and regional information
5. **Submit**: Your contribution helps build culturally-aware AI!

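The five steps above map naturally onto a single record that is validated before submission. The following is a sketch under stated assumptions: the `Contribution` class, its field names, and the activity identifiers are hypothetical and are not taken from the project's `models/` package.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical identifiers for the four activities described in this README.
ACTIVITIES = {"meme", "recipe", "folklore", "landmark"}


def _utc_now():
    """Timestamp helper: current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Contribution:
    activity: str                 # step 1: chosen activity
    language: str                 # step 2: selected language
    content: str                  # step 3: the contributed text
    cultural_context: str = ""    # step 4: cultural significance
    region: str = ""              # step 4: regional information
    submitted_at: str = field(default_factory=_utc_now)

    def validate(self):
        """Return a list of problems; an empty list means ready to submit."""
        problems = []
        if self.activity not in ACTIVITIES:
            problems.append(f"unknown activity: {self.activity}")
        if not self.language.strip():
            problems.append("language is required")
        if not self.content.strip():
            problems.append("content is empty")
        return problems
```

A caller would build a `Contribution` from the form inputs and only proceed to step 5 when `validate()` returns an empty list.
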
## 🎨 Activities Overview

### 🎭 Meme Creator

Create humorous content that reflects Indian culture, festivals, traditions, and daily life. Perfect for capturing contemporary cultural expressions.

### Recipe Collector

Share traditional family recipes, regional specialties, and festival foods. Include cultural significance, occasions, and regional variations.

### Folklore Archive

Preserve oral traditions, folk tales, legends, and cultural stories. Help maintain the rich narrative heritage of India.

### 🏛️ Landmark Identifier

Document historical sites, cultural landmarks, and places of significance. Share stories and cultural importance of locations.

## 🛠️ Technical Architecture

### Built With

- **Frontend**: Streamlit with custom components
- **Backend**: Python with modular service architecture
- **AI Integration**: Fallback text generation for public deployment
- **Storage**: SQLite for local development, extensible for production
- **Analytics**: Real-time metrics and reporting
- **PWA**: Progressive Web App features for offline access

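To illustrate the SQLite storage layer mentioned in the list above, here is a minimal sketch using only the standard-library `sqlite3` module. The table layout and the function names (`open_store`, `add_contribution`, `count_by_language`) are assumptions for illustration and do not describe the project's real schema.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a database and ensure the hypothetical table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS contributions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            activity TEXT NOT NULL,
            language TEXT NOT NULL,
            content TEXT NOT NULL,
            region TEXT
        )
        """
    )
    return conn


def add_contribution(conn, activity, language, content, region=None):
    """Insert one row and return its id; parameters are bound, not interpolated."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO contributions (activity, language, content, region) "
            "VALUES (?, ?, ?, ?)",
            (activity, language, content, region),
        )
    return cursor.lastrowid


def count_by_language(conn):
    """Return a {language: number of contributions} dictionary."""
    rows = conn.execute(
        "SELECT language, COUNT(*) FROM contributions GROUP BY language"
    )
    return dict(rows.fetchall())
```

Passing a file path instead of `":memory:"` persists the data locally; moving to a production database would mean replacing these functions behind the same interface.
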
### Project Structure

```
corpus_collection_engine/
├── main.py                 # Application entry point
├── config.py               # Configuration settings
├── activities/             # Activity implementations
│   ├── meme_creator.py
│   ├── recipe_collector.py
│   ├── folklore_collector.py
│   └── landmark_identifier.py
├── services/               # Core services
│   ├── ai_service.py
│   ├── analytics_service.py
│   ├── engagement_service.py
│   └── privacy_service.py
├── models/                 # Data models
├── utils/                  # Utility functions
└── pwa/                    # Progressive Web App files
```

## 🧪 Testing

Run the test suite:

```bash
python -m pytest tests/
```

Run specific tests:

```bash
python test_app_startup.py
```

## Deployment

### Hugging Face Spaces

1. Upload files to your Hugging Face Space
2. Use `app.py` as the entry point
3. Ensure `requirements.txt` and `.streamlit/config.toml` are included

### Local Production

```bash
streamlit run corpus_collection_engine/main.py --server.port 8501
```

## 🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Why Contribute?

- **Preserve Culture**: Help maintain India's diverse cultural heritage for future generations
- **Build Better AI**: Contribute to creating more culturally-aware and inclusive AI systems
- **Share Knowledge**: Connect with others who value cultural preservation
- **Make Impact**: See real-time analytics of your cultural preservation impact

## Platform Impact

Track the collective impact of cultural preservation efforts:

- Total contributions across all languages
- Geographic distribution of cultural content
- Language diversity metrics
- Cultural significance scoring

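As a rough illustration of how the first three metrics above could be computed, the sketch below tallies contributions by language and region and uses Shannon entropy as one possible language diversity measure. The function name `summarize_impact` and the choice of entropy are assumptions; the README does not state how the platform actually computes these figures.

```python
import math
from collections import Counter


def summarize_impact(contributions):
    """Summarize an iterable of (language, region) pairs."""
    pairs = list(contributions)
    by_language = Counter(language for language, _ in pairs)
    by_region = Counter(region for _, region in pairs)
    total = len(pairs)

    # Shannon entropy in bits: 0.0 when every contribution shares one
    # language, higher when contributions are spread across languages.
    entropy = 0.0
    for count in by_language.values():
        share = count / total
        entropy -= share * math.log2(share)

    return {
        "total": total,
        "by_language": dict(by_language),
        "by_region": dict(by_region),
        "language_entropy_bits": entropy,
    }
```

An empty input yields a total of zero and an entropy of `0.0`, so a dashboard can call this before any contributions exist.
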
## 🔧 Development

### Environment Setup

```bash
# Install development dependencies
pip install -r requirements-dev.txt

# Run linting
flake8 corpus_collection_engine/

# Run type checking
mypy corpus_collection_engine/
```

### Configuration

- Copy `.env.example` to `.env` and configure your settings
- Modify `corpus_collection_engine/config.py` for application settings

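Settings placed in the environment are typically read once at startup with sensible defaults. The sketch below shows one way to do that with the standard library; the variable names (`CCE_DATABASE_PATH`, `CCE_SERVER_PORT`, `CCE_DEBUG`) and the `Settings` class are hypothetical, so check `.env.example` and `config.py` for the names the project actually uses.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_path: str
    server_port: int
    debug: bool


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    debug_raw = env.get("CCE_DEBUG", "false")
    return Settings(
        database_path=env.get("CCE_DATABASE_PATH", "corpus.db"),
        # 8501 matches the port used elsewhere in this README.
        server_port=int(env.get("CCE_SERVER_PORT", "8501")),
        debug=debug_raw.strip().lower() in {"1", "true", "yes"},
    )
```

Accepting a plain mapping makes the loader easy to test without touching the real environment.
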
## Support

- **Issues**: Report bugs and request features via GitHub Issues
- **Documentation**: Check our comprehensive guides in the docs folder
- **Community**: Join our discussions via GitHub Discussions

---

**Start preserving Indian culture today! 🇮🇳✨**

*Every contribution matters in building a more culturally-aware digital future.*