A deep learning model that generates Roman Urdu poetry using LSTM neural networks. This project includes both the model training pipeline and a user-friendly web interface for generating poetry.
## Table of Contents

- [Overview](#overview)
- [Features](#features)
- [Demo](#demo)
- [Project Structure](#project-structure)
- [Installation](#installation)
- [Usage](#usage)
- [Model Architecture](#model-architecture)
- [Training Process](#training-process)
- [Results](#results)
- [Future Improvements](#future-improvements)
- [Contributing](#contributing)
- [License](#license)
## Overview

This project uses deep learning to generate Urdu poetry in Roman script. It employs a word-level LSTM model, trained on a dataset of Roman Urdu poetry, that predicts the next word in a sequence. The model learns patterns, rhythms, and language structures from existing poetry to generate new, creative verses.
## Features

- Poetry Generation: Generate Urdu poetry in Roman script from a seed word or phrase
- Customizable Output: Control the length of the generated poetry
- User-friendly Interface: Easy-to-use Streamlit web application
- Text-to-Speech: Listen to the generated poetry (browser-based)
- Temperature Control: Adjust the creativity/randomness of the generated text (see the sampling sketch below)
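Temperature rescales the model's next-word distribution before sampling: values below 1 concentrate probability on the most likely words, while values above 1 flatten the distribution toward random choice. A minimal sketch of the idea (the helper name is illustrative, not the exact code in app.py):

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0):
    """Sample a word index from a probability vector rescaled by temperature."""
    # Rescale in log space, then re-normalize with a softmax
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-9) / temperature
    exp_logits = np.exp(logits - np.max(logits))
    rescaled = exp_logits / exp_logits.sum()
    return int(np.random.choice(len(rescaled), p=rescaled))
```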
## Demo

To use the poetry generator:

1. Run the Streamlit app:

   ```bash
   streamlit run app.py
   ```

2. Enter a seed word or phrase (e.g., "Muhabbat" for love)
3. Adjust the number of words to generate
4. Click "Generate Poetry"
5. View and listen to the generated verses
## Project Structure

```
Poetry-Generation-Model/
├── app.py                          # Streamlit web application
├── model-training.ipynb            # Jupyter notebook with model training code
├── requirements.txt                # Python dependencies
├── roman_urdu_poetry_model.keras   # Trained model file
├── tokenizer.pkl                   # Tokenizer for text processing
└── README.md                       # This documentation file
```
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/Poetry-Generation-Model.git
   cd Poetry-Generation-Model
   ```

2. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Usage

Start the web application:

```bash
streamlit run app.py
```

This will start the Streamlit server and open the application in your default web browser.
You can also use the model directly in your Python code:
```python
import pickle
import numpy as np
import tensorflow as tf

# Custom metric used at training time; load_model needs it to deserialize
# the model. (Standard exp(cross-entropy) form; match the exact definition
# in model-training.ipynb.)
def perplexity(y_true, y_pred):
    ce = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
    return tf.exp(tf.reduce_mean(ce))

# Load the trained model and tokenizer
model = tf.keras.models.load_model(
    "roman_urdu_poetry_model.keras",
    custom_objects={"perplexity": perplexity},
)
with open("tokenizer.pkl", "rb") as handle:
    tokenizer = pickle.load(handle)

# Generate poetry function (see the sketch below and app.py)
# ...
```
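Generation itself is a simple loop: encode the running text, take the model's next-word distribution, sample one word, append it, and repeat. A minimal sketch, assuming the `sample_with_temperature` helper above and a `SEQ_LENGTH` matching the training sequence length (both illustrative; app.py has the real implementation):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

SEQ_LENGTH = 20  # assumption: must match the length used during training

def generate_poetry(seed_text, num_words=30, temperature=1.0):
    """Extend seed_text one sampled word at a time."""
    index_to_word = {i: w for w, i in tokenizer.word_index.items()}
    text = seed_text
    for _ in range(num_words):
        # Encode the running text, then pad/truncate to SEQ_LENGTH tokens
        seq = tokenizer.texts_to_sequences([text])[0]
        padded = pad_sequences([seq], maxlen=SEQ_LENGTH)
        probs = model.predict(padded, verbose=0)[0]
        next_index = sample_with_temperature(probs, temperature)
        word = index_to_word.get(next_index)
        if word is None:  # index 0 is reserved for padding
            break
        text += " " + word
    return text

print(generate_poetry("Muhabbat", num_words=20, temperature=0.8))
```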
## Model Architecture

The poetry generator is a word-level LSTM language model built from:

- Embedding layer (128 dimensions)
- Stacked LSTM layers (256 units each)
- Dropout for regularization (0.3)
- Dense output layer with softmax activation

The model is trained to predict the next word in a sequence, which allows it to generate poetry one word at a time.
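A sketch of that stack in Keras, using the hyperparameters listed above (the vocabulary size is a placeholder, not a value from this repo; model-training.ipynb has the exact definition):

```python
import tensorflow as tf

VOCAB_SIZE = 10000  # placeholder: use len(tokenizer.word_index) + 1

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),        # 128-dim embeddings
    tf.keras.layers.LSTM(256, return_sequences=True),  # first stacked LSTM
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.LSTM(256),                         # second stacked LSTM
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax"),  # next-word probs
])
```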
## Training Process

The model was trained using:

- A dataset of Roman Urdu poetry
- Text preprocessing, including normalization and tokenization
- Sequence generation to build the training data
- The Adam optimizer with learning rate scheduling
- Early stopping and learning rate reduction to prevent overfitting
- A custom perplexity metric for evaluation
The training process is documented in detail in the model-training.ipynb notebook.
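A minimal sketch of that setup, reusing the `model` and `perplexity` definitions from the sketches above; `X` and `y` stand for the encoded training sequences and their next-word targets, and the callback settings are illustrative, not the values used in the notebook:

```python
import tensorflow as tf

# Adam optimizer plus the custom perplexity metric described above
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=[perplexity],
)

# Early stopping and learning-rate reduction to curb overfitting
callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=3, restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.5, patience=2),
]

# X: (num_sequences, SEQ_LENGTH) token ids; y: (num_sequences,) next-word ids
history = model.fit(X, y, validation_split=0.1, epochs=50,
                    batch_size=128, callbacks=callbacks)
```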
## Results

The model achieves:

- Training perplexity: ~330
- Validation perplexity: ~3400

While the validation perplexity is high, the model still produces coherent and creative poetry. High perplexity is expected here: poetry generation is open-ended, so many different continuations are plausible at each step and the model's probability mass is spread across them.
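For reference, perplexity is the exponentiated average negative log-likelihood of the next word:

$$\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\left(w_i \mid w_{<i}\right)\right)$$

Intuitively, a validation perplexity of ~3400 means the model is, on average, about as uncertain as a uniform choice among ~3400 candidate words at each position.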
## Future Improvements

Potential enhancements for the project:

- Incorporate attention mechanisms for better context awareness
- Implement beam search for improved text generation
- Add more poetry styles and formats
- Improve the UI with additional customization options
- Expand the training dataset for better generalization
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
Created with ❤️ for Urdu poetry and machine learning