This repository contains Python scripts for text processing and text generation. The scripts cover data fetching, text preprocessing, and text generation with machine learning models.
-
Web Text Scraper: This script demonstrates web scraping with BeautifulSoup, extracting the unique words from the Wikipedia page on machine learning.
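
As a rough illustration of the idea, the sketch below extracts unique words from paragraph tags with BeautifulSoup. It is not the repository's script: the function name `unique_words`, the regular expression, and the inline `sample_html` string are assumptions made for this example, and the HTML is hard-coded so the snippet runs without network access.

```python
import re

from bs4 import BeautifulSoup


def unique_words(html: str) -> set[str]:
    """Return the set of lowercase alphabetic words found in <p> tags."""
    soup = BeautifulSoup(html, "html.parser")
    # Join paragraph text with spaces so words in adjacent tags do not merge.
    text = " ".join(p.get_text(" ") for p in soup.find_all("p"))
    return set(re.findall(r"[a-z]+", text.lower()))


# Hypothetical stand-in for a downloaded page.
sample_html = (
    "<html><body><h1>Title</h1>"
    "<p>Machine learning is the study of learning.</p>"
    "<p>Learning <b>machines</b> improve.</p>"
    "</body></html>"
)

words = unique_words(sample_html)
print(sorted(words))
```

To run this against a live page, the HTML string would come from an HTTP response body instead of `sample_html`.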
-
Text Generation and TF-IDF Analysis: The script showcases text preprocessing techniques such as tokenization, lemmatization, and TF-IDF computation using NLTK and scikit-learn libraries.
-
FastText Analysis on Yelp Dataset: Utilizing NLTK and Gensim, this script preprocesses text data from the Yelp dataset and trains a FastText model for text classification.
-
CNN Text Classification on Sentiment Analysis Dataset: This script loads the Twitter Sentiment Analysis dataset with Dask and trains a CNN model for sentiment classification.
-
Wikipedia Text Processing using RNN: This script fetches content from Wikipedia, preprocesses it, and trains SimpleRNN models for character-based and word-based text generation.
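
For character-based generation, the text is typically cut into fixed-length windows, each paired with the character that follows it. The sketch below shows only that data-preparation step in plain Python; the helper name `make_char_windows` and the sample string are assumptions for this example, and the SimpleRNN model itself is not shown.

```python
def make_char_windows(text, seq_len):
    """Build (input window, next character) training pairs as integer indices."""
    chars = sorted(set(text))
    char_to_idx = {c: i for i, c in enumerate(chars)}
    inputs, targets = [], []
    for start in range(len(text) - seq_len):
        window = text[start:start + seq_len]
        inputs.append([char_to_idx[c] for c in window])
        # The target is the character immediately after the window.
        targets.append(char_to_idx[text[start + seq_len]])
    return inputs, targets, char_to_idx


# Hypothetical sample text; the real script uses fetched Wikipedia content.
inputs, targets, char_to_idx = make_char_windows("hello world", seq_len=4)
print(len(inputs), inputs[0], targets[0])
```

The word-based variant follows the same pattern with a token list in place of the character string.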