Mehfil-e-Sukhan / README.md
zaiffi's picture
Add model and dataset links to README
4354715

A newer version of the Streamlit SDK is available: 1.43.2

Upgrade
metadata
title: Mehfil-e-Sukhan
emoji: πŸ“œ
colorFrom: red
colorTo: gray
sdk: streamlit
sdk_version: 1.43.0
app_file: app.py
pinned: false

Mehfil-e-Sukhan: Har Lafz Ek Mehfil

An AI-powered Roman Urdu poetry generation application using BiLSTM neural networks.

Overview

Mehfil-e-Sukhan ("Poetry Gathering" in Urdu) is an interactive application that generates Roman Urdu poetry based on a starting word or phrase provided by the user. The application uses a Bidirectional LSTM neural network trained on a curated dataset of Roman Urdu poetry.

Features

  • Custom Poetry Generation: Generate Roman Urdu poetry from any starting word or phrase.
  • Adjustable Parameters:
    • Number of Words: Control the length of generated poetry (12-48 words).
    • Creativity (Temperature): Adjust the randomness in word selection (0.5-2.0).
    • Focus (Top-p): Fine-tune how closely the model adheres to probable word sequences (0.5-1.0).
  • Elegant Interface: Dark-themed UI designed specifically for poetry presentation.
  • Automatic Formatting: Output is automatically formatted into poetic lines.

How to Use

  1. Enter a starting word or phrase in Roman Urdu (e.g., "ishq", "zindagi", "mohabbat").
  2. Adjust the generation parameters:
    • Number of Words: Select how many words you want in your poem.
    • Creativity: Higher values (>1.0) produce more unique but potentially less coherent poetry. Lower values (<1.0) create more predictable output.
    • Focus: Higher values make the AI stick to more probable word combinations.
  3. Click "Generate Poetry" and wait for your custom poem to appear.

Technical Details

  • Model: Bidirectional LSTM with 3 layers
  • Tokenization: SentencePiece with BPE encoding
  • Vocabulary Size: 12,000 tokens
  • Text Generation: Nucleus (top-p) sampling for balanced creativity and coherence

Installation for Local Development

If you want to run the application locally:

# Clone the repository
git clone https://github.com/yourusername/Mehfil-e-Sukhan.git
cd Mehfil-e-Sukhan

# Create and activate a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Linux/Mac
# or
venv\Scripts\activate  # On Windows

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run app.py

Requirements

  • Python 3.8+
  • torch==2.6.0
  • sentencepiece==0.2.0
  • huggingface-hub==0.29.3
  • streamlit==1.43.0

Project Structure

Mehfil-e-Sukhan/
β”œβ”€β”€ app.py              # Main application file
β”œβ”€β”€ requirements.txt    # Python dependencies
└── README.md           # This documentation

The model weights and SentencePiece model are stored on Hugging Face Hub and are downloaded automatically when the application runs.

How It Works

  1. Data Processing: The model was trained on a curated dataset of Roman Urdu poetry lines.
  2. Tokenization: Text was tokenized using SentencePiece's BPE algorithm.
  3. Model Training: A Bidirectional LSTM architecture was trained to predict the next token in a sequence.
  4. Text Generation: At inference time, nucleus sampling is used to select the next word with a balance of creativity and coherence.
  5. Formatting: Generated text is automatically formatted into lines with alternating indentation for aesthetic presentation.

Model and Dataset

Limitations

  • The current model was trained on a relatively small dataset (~1300 lines), which may occasionally result in repetitive patterns.
  • Roman Urdu is not standardized, so the model may struggle with unusual spellings or transliterations.
  • Generation speed depends on available computational resources.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Contact

Acknowledgements

  • Poetry is the rhythmical creation of beauty in words - Edgar Allan Poe