This project implements a paraphrase generation model built on the T5 transformer. The model is fine-tuned on the PAWS dataset of over 49,000 labeled sentence pairs, using transfer learning to improve the accuracy and fluency of generated text. The pipeline covers data preprocessing, model fine-tuning, and performance evaluation, balancing training efficiency with generalization.
- Paraphrase Generation: Generates diverse, fluent paraphrases for input sentences (see the quick-start sketch after this list).
- Fine-tuned T5 Model: Trained using the PAWS dataset for high-quality paraphrasing.
- Optimized Training Process: Achieved a 20% reduction in fine-tuning time.
- NLP Pipeline: Covers data preprocessing, model training, optimization, and evaluation.
- Framework Compatibility: Built on standard NLP tooling, namely PyTorch and Hugging Face Transformers.
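As a quick-start illustration of the paraphrase-generation feature, the sketch below uses the Transformers `pipeline` helper. The checkpoint path `./t5-paws-paraphrase` and the `paraphrase:` input prefix are assumptions for illustration, not fixed conventions of this repo.

```python
from transformers import pipeline

# Assumed path to a fine-tuned checkpoint; point this at your own output directory.
paraphraser = pipeline("text2text-generation", model="./t5-paws-paraphrase")

sentence = "The quick brown fox jumps over the lazy dog."
# The "paraphrase: " task prefix is an assumed convention from fine-tuning.
outputs = paraphraser(
    f"paraphrase: {sentence}",
    max_length=64,
    num_beams=5,
    num_return_sequences=3,
)
for out in outputs:
    print(out["generated_text"])
```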
- Python
- PyTorch
- Hugging Face Transformers
- PAWS Dataset
- Google Colab / Jupyter Notebook
Training uses the PAWS (Paraphrase Adversaries from Word Scrambling) dataset, which:
- Contains 49,000+ labeled sentence pairs.
- Is designed to improve paraphrase generation by reducing lexical overlap biases.
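A minimal sketch of loading PAWS with the Hugging Face `datasets` library, assuming the Hub copy of the dataset (`paws`, `labeled_final` configuration); only pairs labeled as true paraphrases (`label == 1`) are useful as generation targets.

```python
from datasets import load_dataset

# Load the labeled Wikipedia portion of PAWS from the Hugging Face Hub (assumed dataset id).
paws = load_dataset("paws", "labeled_final")

# Keep only true paraphrase pairs (label == 1) for sequence-to-sequence training.
train_pairs = paws["train"].filter(lambda example: example["label"] == 1)

print(train_pairs[0]["sentence1"])
print("->", train_pairs[0]["sentence2"])
```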
- Subset Sampling: Used a 3,600-sample subset for initial fine-tuning to ensure efficient training while maintaining generalization.
- Accelerated Fine-Tuning: Reduced fine-tuning time by 20% through training-process optimizations.
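A hedged sketch of the fine-tuning step, assuming `t5-small`, a `paraphrase:` task prefix, and illustrative hyperparameters; the exact settings behind the 3,600-sample subset run and the reported 20% speed-up are not reproduced here.

```python
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

model_name = "t5-small"  # assumed variant; larger T5 models are listed under future enhancements
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Paraphrase pairs only (label == 1); a 3,600-example subset mirrors the initial fine-tuning setup.
paws = load_dataset("paws", "labeled_final")
train_pairs = paws["train"].filter(lambda ex: ex["label"] == 1).select(range(3600))

def preprocess(batch):
    # "paraphrase: " is an assumed task prefix; the target is the paired paraphrase.
    model_inputs = tokenizer(["paraphrase: " + s for s in batch["sentence1"]],
                             max_length=128, truncation=True)
    labels = tokenizer(text_target=batch["sentence2"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = train_pairs.map(preprocess, batched=True, remove_columns=train_pairs.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="./t5-paws-paraphrase",  # assumed output path, reused in the other examples
    per_device_train_batch_size=16,     # illustrative hyperparameters, not the project's exact values
    learning_rate=3e-4,
    num_train_epochs=3,
    fp16=True,                          # mixed precision is one common way to shorten training time
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
trainer.save_model("./t5-paws-paraphrase")
```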
- Experiment with larger T5 variants (T5-base, T5-large) for improved performance.
- Implement beam search and top-k sampling for more diverse paraphrase generation (see the decoding sketch after this list).
- Fine-tune on additional datasets for domain-specific applications.
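For the decoding enhancement above, both beam search and top-k sampling are exposed through `model.generate`; the sketch below assumes the fine-tuned checkpoint path used in the earlier examples.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "./t5-paws-paraphrase"  # assumed path to the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

inputs = tokenizer("paraphrase: The weather is nice today.", return_tensors="pt")

# Beam search favours high-likelihood, more conservative paraphrases.
beam_ids = model.generate(**inputs, num_beams=5, num_return_sequences=3, max_length=64)

# Top-k sampling trades some fluency for greater diversity.
sample_ids = model.generate(**inputs, do_sample=True, top_k=50, num_return_sequences=3, max_length=64)

print("Beam search:")
for ids in beam_ids:
    print(" -", tokenizer.decode(ids, skip_special_tokens=True))

print("Top-k sampling:")
for ids in sample_ids:
    print(" -", tokenizer.decode(ids, skip_special_tokens=True))
```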
Contributions are welcome! Feel free to open issues or submit pull requests.
This project is licensed under the MIT License.
- Google Research for the PAWS dataset.
- Hugging Face for the Transformers library.
- PyTorch for providing an efficient deep learning framework.
📧 Email: utkarshranaa06@gmail.com
🔗 GitHub: utkarshranaa
🔗 LinkedIn: www.linkedin.com/in/utkarshranaa
🔗 X/Twitter: @utkarshranaa