This project implements a Bidirectional LSTM (BiLSTM) model for text classification using TensorFlow/Keras. The notebook includes data preprocessing, tokenization, sequence padding, model training, and evaluation. The dataset consists of labeled text data, and the model is trained to classify text into different categories.
- Text Preprocessing: Cleans text by removing URLs, HTML tags, emojis, and punctuation
- Tokenization & Padding: Converts text to numerical sequences using `Tokenizer` and `pad_sequences`
- Deep Learning Model: Uses a Bidirectional LSTM with an embedding layer for classification
- Evaluation Metrics: Computes accuracy, precision, recall, and F1-score
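The cleaning and tokenization steps above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the regex patterns, vocabulary size (`num_words`), and sequence length (`maxlen`) are assumptions, and emoji stripping (e.g. via the `emoji` package) is omitted for brevity.

```python
import re
import string

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

def clean_text(text):
    """Remove URLs, HTML tags, and punctuation (illustrative patterns)."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"<.*?>", "", text)                  # HTML tags
    return text.translate(str.maketrans("", "", string.punctuation))

texts = [
    "Oh great, another Monday... <b>love</b> it http://example.com",
    "What a wonderful surprise",
]
cleaned = [clean_text(t) for t in texts]

# Fit a tokenizer on the cleaned corpus, then pad to a fixed length
tokenizer = Tokenizer(num_words=10000, oov_token="<OOV>")  # vocab size is an assumption
tokenizer.fit_on_texts(cleaned)
sequences = tokenizer.texts_to_sequences(cleaned)
padded = pad_sequences(sequences, maxlen=50, padding="post", truncating="post")
print(padded.shape)  # (2, 50)
```

The padded integer matrix is what feeds the embedding layer of the model described below.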
- Embedding Layer: Converts words into dense vectors
- BiLSTM Layers: Captures context from both past and future words
- Dropout Layers: Prevents overfitting
- Fully Connected Layers: Outputs final classification probabilities
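A minimal Keras sketch of this layer stack follows. The vocabulary size, sequence length, layer widths, dropout rates, and number of classes are illustrative assumptions, not the notebook's tuned values:

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense

VOCAB_SIZE = 10000   # assumption: matches the tokenizer's num_words
MAX_LEN = 50         # assumption: padded sequence length
NUM_CLASSES = 3      # assumption: number of target categories

model = Sequential([
    Input(shape=(MAX_LEN,)),
    Embedding(VOCAB_SIZE, 128),                      # words -> dense vectors
    Bidirectional(LSTM(64, return_sequences=True)),  # context from both directions
    Dropout(0.5),                                    # regularization
    Bidirectional(LSTM(32)),
    Dropout(0.5),
    Dense(64, activation="relu"),                    # fully connected layer
    Dense(NUM_CLASSES, activation="softmax"),        # class probabilities
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Stacking two BiLSTM layers (the first with `return_sequences=True`) is one common design choice; a single BiLSTM followed by the dense head works as well and trains faster.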
Install the necessary dependencies using:
pip install pandas numpy scikit-learn tensorflow keras nltk emoji joblib torch
| Name | GitHub Profile |
|---|---|
| Jayanth Srinivas Bommisetty | GitHub profile |
| Sarvan Dattu Perumalla | GitHub profile |
- Clone the repository or download the `sarcasm&irony.ipynb` file.
- Open the Jupyter Notebook and execute cells sequentially.
- The model will preprocess the dataset, train using BiLSTM, and evaluate performance.
- Modify hyperparameters such as the LSTM units, dropout rates, and learning rate for experimentation.
The model is evaluated using categorical cross-entropy loss and accuracy. It also computes precision, recall, and F1-score for performance analysis.
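The metric computation can be sketched with scikit-learn. The labels and predictions below are dummy placeholders for illustration; in the notebook they would come from the test split and `model.predict`:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Dummy true labels and predicted class indices (illustration only)
y_true = np.array([0, 1, 2, 1, 0, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2])

accuracy = accuracy_score(y_true, y_pred)
# Macro-average treats all classes equally regardless of support
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

With softmax outputs, the predicted class index is obtained via `np.argmax(model.predict(X_test), axis=1)` before computing these scores.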
Contributions are welcome! Feel free to add enhancements or optimize the existing model.