Spaces:
Running
Running
A newer version of the Streamlit SDK is available:
1.43.2
metadata
title: Mehfil-e-Sukhan
emoji: π
colorFrom: red
colorTo: gray
sdk: streamlit
sdk_version: 1.43.0
app_file: app.py
pinned: false
Mehfil-e-Sukhan: Har Lafz Ek Mehfil
An AI-powered Roman Urdu poetry generation application using BiLSTM neural networks.
Overview
Mehfil-e-Sukhan ("Poetry Gathering" in Urdu) is an interactive application that generates Roman Urdu poetry based on a starting word or phrase provided by the user. The application uses a Bidirectional LSTM neural network trained on a curated dataset of Roman Urdu poetry.
Features
- Custom Poetry Generation: Generate Roman Urdu poetry from any starting word or phrase.
- Adjustable Parameters:
- Number of Words: Control the length of generated poetry (12-48 words).
- Creativity (Temperature): Adjust the randomness in word selection (0.5-2.0).
- Focus (Top-p): Fine-tune how closely the model adheres to probable word sequences (0.5-1.0).
- Elegant Interface: Dark-themed UI designed specifically for poetry presentation.
- Automatic Formatting: Output is automatically formatted into poetic lines.
How to Use
- Enter a starting word or phrase in Roman Urdu (e.g., "ishq", "zindagi", "mohabbat").
- Adjust the generation parameters:
- Number of Words: Select how many words you want in your poem.
- Creativity: Higher values (>1.0) produce more unique but potentially less coherent poetry. Lower values (<1.0) create more predictable output.
- Focus: Higher values make the AI stick to more probable word combinations.
- Click "Generate Poetry" and wait for your custom poem to appear.
Technical Details
- Model: Bidirectional LSTM with 3 layers
- Tokenization: SentencePiece with BPE encoding
- Vocabulary Size: 12,000 tokens
- Text Generation: Nucleus (top-p) sampling for balanced creativity and coherence
Installation for Local Development
If you want to run the application locally:
# Clone the repository
git clone https://github.com/yourusername/Mehfil-e-Sukhan.git
cd Mehfil-e-Sukhan
# Create and activate a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate # On Linux/Mac
# or
venv\Scripts\activate # On Windows
# Install dependencies
pip install -r requirements.txt
# Run the application
streamlit run app.py
Requirements
- Python 3.8+
- torch==2.6.0
- sentencepiece==0.2.0
- huggingface-hub==0.29.3
- streamlit==1.43.0
Project Structure
Mehfil-e-Sukhan/
βββ app.py # Main application file
βββ requirements.txt # Python dependencies
βββ README.md # This documentation
The model weights and SentencePiece model are stored on Hugging Face Hub and are downloaded automatically when the application runs.
How It Works
- Data Processing: The model was trained on a curated dataset of Roman Urdu poetry lines.
- Tokenization: Text was tokenized using SentencePiece's BPE algorithm.
- Model Training: A Bidirectional LSTM architecture was trained to predict the next token in a sequence.
- Text Generation: At inference time, nucleus sampling is used to select the next word with a balance of creativity and coherence.
- Formatting: Generated text is automatically formatted into lines with alternating indentation for aesthetic presentation.
Model and Dataset
- Model: You can find the complete model, weights, and training notebooks on Hugging Face: Mehfil-e-Sukhan on Hugging Face
- Dataset: The model was trained on the Roman Urdu Poetry dataset available on Kaggle: Roman Urdu Poetry Dataset
Limitations
- The current model was trained on a relatively small dataset (~1300 lines), which may occasionally result in repetitive patterns.
- Roman Urdu is not standardized, so the model may struggle with unusual spellings or transliterations.
- Generation speed depends on available computational resources.
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Contact
- LinkedIn: Muhammad Huzaifa Saqib
- GitHub: zaiffishiekh01
- Email: [email protected]
Acknowledgements
- Poetry is the rhythmical creation of beauty in words - Edgar Allan Poe