Paraphraser.io

Overview

This project implements a paraphrase generation model based on the T5 transformer. The model is fine-tuned on the PAWS dataset (over 49,000 labeled sentence pairs), leveraging transfer learning to improve text generation accuracy and fluency. The pipeline covers data preprocessing, model fine-tuning, and performance evaluation, keeping training efficient while preserving generalization.
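The inference side of the pipeline can be sketched with Hugging Face Transformers. The `build_input` helper, the `paraphrase: ` task prefix, and the `t5-small` checkpoint are assumptions for illustration, not details taken from this repository:

```python
def build_input(sentence: str) -> str:
    """T5 is text-to-text, so the task is encoded as a prefix (assumed prefix)."""
    return f"paraphrase: {sentence}"

def paraphrase(sentence: str, model_name: str = "t5-small") -> str:
    # Heavy imports kept local so build_input stays importable without a model.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    inputs = tokenizer(build_input(sentence), return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, max_length=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(paraphrase("The quick brown fox jumps over the lazy dog."))
```

A fine-tuned checkpoint would be substituted for `t5-small` once training finishes.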

Features

  • Paraphrase Generation: Generates diverse and fluent paraphrases for input sentences.
  • Fine-tuned T5 Model: Trained using the PAWS dataset for high-quality paraphrasing.
  • Optimized Training Process: Achieved a 20% reduction in fine-tuning time.
  • NLP Pipeline: Covers data preprocessing, model training, optimization, and evaluation.
  • Framework Compatibility: Built on PyTorch and Hugging Face Transformers.

Technologies Used

  • Python
  • PyTorch
  • Hugging Face Transformers
  • PAWS Dataset
  • Google Colab / Jupyter Notebook

Dataset

Training uses the PAWS (Paraphrase Adversaries from Word Scrambling) dataset, which provides:

  • 49,000+ labeled sentence pairs.
  • Adversarially constructed pairs with high lexical overlap, which reduce the model's reliance on word-overlap cues and improve paraphrase quality.

Performance Optimization

  • Subset Sampling: Used a 3,600-sample subset for initial fine-tuning to ensure efficient training while maintaining generalization.
  • Accelerated Fine-Tuning: Achieved a 20% reduction in training time by tuning the training configuration.
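Drawing a reproducible 3,600-sample subset for the initial fine-tuning run can be done with a seeded sampler; the helper name and seed here are illustrative:

```python
import random

def sample_subset(pairs: list, k: int = 3600, seed: int = 42) -> list:
    """Draw a reproducible random subset of k training pairs.

    A fixed seed keeps runs comparable across experiments; if the
    dataset is already smaller than k, it is returned unchanged.
    """
    if len(pairs) <= k:
        return list(pairs)
    return random.Random(seed).sample(pairs, k)
```

Because the seed is fixed, repeated runs fine-tune on the same subset, which makes the 20% training-time comparison fair.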

Future Enhancements

  • Experiment with larger T5 variants (T5-base, T5-large) for improved performance.
  • Implement beam search and top-k sampling for diverse paraphrase generation.
  • Fine-tune on additional datasets for domain-specific applications.
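The planned decoding strategies map directly onto keyword arguments of Hugging Face's `generate()`. `decoding_config` is a hypothetical helper, and the specific values (5 beams, top-k of 50) are illustrative defaults rather than settings from this project:

```python
def decoding_config(strategy: str = "beam") -> dict:
    """Return generate() keyword arguments for a decoding strategy."""
    if strategy == "beam":
        # Beam search: explore several hypotheses, return the top 3.
        return {"num_beams": 5, "num_return_sequences": 3, "early_stopping": True}
    if strategy == "top_k":
        # Top-k sampling: more diverse, less deterministic output.
        return {"do_sample": True, "top_k": 50, "temperature": 0.9,
                "num_return_sequences": 3}
    raise ValueError(f"unknown strategy: {strategy}")

# Usage sketch: outputs = model.generate(**inputs, **decoding_config("top_k"))
```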

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests.

License

This project is licensed under the MIT License.

Acknowledgments

  • Google Research for the PAWS dataset.
  • Hugging Face for the Transformers library.
  • PyTorch for providing an efficient deep learning framework.

📬 Contact

📧 Email: utkarshranaa06@gmail.com
🔗 GitHub: utkarshranaa
🔗 LinkedIn: www.linkedin.com/in/utkarshranaa
🔗 X/Twitter: @utkarshranaa
