metadata

title: Mehfil-e-Sukhan
emoji: 📜
colorFrom: red
colorTo: gray
sdk: streamlit
sdk_version: 1.43.0
app_file: app.py
pinned: false

Mehfil-e-Sukhan: Har Lafz Ek Mehfil

An AI-powered Roman Urdu poetry generation application using BiLSTM neural networks.

Overview

Mehfil-e-Sukhan ("Poetry Gathering" in Urdu) is an interactive application that generates Roman Urdu poetry based on a starting word or phrase provided by the user. The application uses a Bidirectional LSTM neural network trained on a curated dataset of Roman Urdu poetry.

Features

Custom Poetry Generation: Generate Roman Urdu poetry from any starting word or phrase.
Adjustable Parameters:
- Number of Words: Control the length of generated poetry (12-48 words).
- Creativity (Temperature): Adjust the randomness in word selection (0.5-2.0).
- Focus (Top-p): Fine-tune how closely the model adheres to probable word sequences (0.5-1.0).
Elegant Interface: Dark-themed UI designed specifically for poetry presentation.
Automatic Formatting: Output is automatically formatted into poetic lines.

How to Use

Enter a starting word or phrase in Roman Urdu (e.g., "ishq", "zindagi", "mohabbat").
Adjust the generation parameters:
- Number of Words: Select how many words you want in your poem.
- Creativity: Higher values (>1.0) produce more unique but potentially less coherent poetry. Lower values (<1.0) create more predictable output.
- Focus: Higher values make the AI stick to more probable word combinations.
Click "Generate Poetry" and wait for your custom poem to appear.

Technical Details

Model: Bidirectional LSTM with 3 layers
Tokenization: SentencePiece with BPE encoding
Vocabulary Size: 12,000 tokens
Text Generation: Nucleus (top-p) sampling for balanced creativity and coherence

Installation for Local Development

If you want to run the application locally:

# Clone the repository
git clone https://github.com/yourusername/Mehfil-e-Sukhan.git
cd Mehfil-e-Sukhan

# Create and activate a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Linux/Mac
# or
venv\Scripts\activate  # On Windows

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run app.py

Requirements

Python 3.8+
torch==2.6.0
sentencepiece==0.2.0
huggingface-hub==0.29.3
streamlit==1.43.0

Project Structure

Mehfil-e-Sukhan/
├── app.py              # Main application file
├── requirements.txt    # Python dependencies
└── README.md           # This documentation

The model weights and SentencePiece model are stored on Hugging Face Hub and are downloaded automatically when the application runs.

How It Works

Data Processing: The model was trained on a curated dataset of Roman Urdu poetry lines.
Tokenization: Text was tokenized using SentencePiece's BPE algorithm.
Model Training: A Bidirectional LSTM architecture was trained to predict the next token in a sequence.
Text Generation: At inference time, nucleus sampling is used to select the next word with a balance of creativity and coherence.
Formatting: Generated text is automatically formatted into lines with alternating indentation for aesthetic presentation.

Model and Dataset

Model: You can find the complete model, weights, and training notebooks on Hugging Face: Mehfil-e-Sukhan on Hugging Face
Dataset: The model was trained on the Roman Urdu Poetry dataset available on Kaggle: Roman Urdu Poetry Dataset

Limitations

The current model was trained on a relatively small dataset (~1300 lines), which may occasionally result in repetitive patterns.
Roman Urdu is not standardized, so the model may struggle with unusual spellings or transliterations.
Generation speed depends on available computational resources.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Contact

Acknowledgements

Poetry is the rhythmical creation of beauty in words - Edgar Allan Poe