This repository contains four machine learning projects that I completed during college. The projects are as follows:
This project aims to classify emails as either spam or ham (not spam) using machine learning techniques. The dataset used for this project contains labeled emails, and various text processing and machine learning algorithms are applied to build an effective spam detection model.
The project requires the following dependencies:
- pandas
- numpy
- scikit-learn
You can install these dependencies using the following command:
pip install pandas numpy scikit-learn
The dataset used for this project is the Homework dataset from Kaggle. It contains three columns:
- text: the email content
- label: the label (spam or ham)
- label_num: the numerical representation of the label (0 for ham, 1 for spam)
The dataset is loaded and preprocessed to remove duplicates and handle missing values. The text data is extracted and labeled for further processing.
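A minimal pandas sketch of this cleaning step, assuming the three columns listed above. The toy rows are invented, and dropping (rather than filling) rows with missing text is an assumption about how missing values are handled.

```python
import pandas as pd

# Toy frame with the dataset's columns (text, label, label_num); rows are made up.
df = pd.DataFrame({
    "text": ["Win a free prize now", "Meeting at 10am", "Win a free prize now", None],
    "label": ["spam", "ham", "spam", "ham"],
    "label_num": [1, 0, 1, 0],
})

# Remove exact duplicate rows, then drop rows that have no email text (assumed policy).
clean = df.drop_duplicates().dropna(subset=["text"]).reset_index(drop=True)

X = clean["text"]       # email content
y = clean["label_num"]  # 0 = ham, 1 = spam
print(len(clean))  # 2
```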
Various feature extraction techniques are used to convert text data into numerical representations:
- CountVectorizer with different n-gram ranges
- Character-level analysis
- TF-IDF Vectorizer
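The three representations above can be sketched with scikit-learn's vectorizers. This is an illustration on a two-email toy corpus; the specific n-gram ranges and the `char_wb` analyzer are assumptions, not necessarily the settings used in the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical two-email corpus, only to show what each vectorizer produces.
docs = ["free prize now", "meeting now"]

# Word-level counts over unigrams and bigrams.
word_ngrams = CountVectorizer(ngram_range=(1, 2))
X_word = word_ngrams.fit_transform(docs)

# Character-level counts: 2- to 4-character n-grams taken inside word boundaries.
char_ngrams = CountVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X_char = char_ngrams.fit_transform(docs)

# TF-IDF weighting over word unigrams (default settings).
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)

print(sorted(word_ngrams.vocabulary_))
# ['free', 'free prize', 'meeting', 'meeting now', 'now', 'prize', 'prize now']
```

Each vectorizer returns a sparse matrix with one row per email, so any of them can feed the classifiers below.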
Several machine learning models are trained and evaluated using GridSearchCV to find the best hyperparameters:
- Support Vector Machine (SVM)
- Multinomial Naive Bayes
- Random Forest
- K-Nearest Neighbors (KNN)
The best model is selected based on the evaluation metrics, and its performance is reported using precision, recall, and F1-score.
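A hedged sketch of the search-and-report loop for one of the listed models (Multinomial Naive Bayes), using a vectorizer and classifier inside a `Pipeline` so `GridSearchCV` can tune both. The corpus and the parameter grid are hypothetical, since the README does not list the actual ranges; `SVC`, `RandomForestClassifier` or `KNeighborsClassifier` can be swapped into the `clf` step with their own grids.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny made-up corpus (1 = spam, 0 = ham); replace with the real train/test split.
train_texts = ["win cash now", "free prize inside", "claim your free cash",
               "lunch at noon", "project meeting today", "see you at lunch"]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["free cash prize", "meeting at noon"]
test_labels = [1, 0]

pipe = Pipeline([
    ("vec", TfidfVectorizer()),  # text -> TF-IDF features
    ("clf", MultinomialNB()),    # classifier being tuned
])

# Hypothetical grid: step-name prefixes route each parameter to its pipeline step.
param_grid = {
    "vec__ngram_range": [(1, 1), (1, 2)],
    "clf__alpha": [0.1, 1.0],
}

# 3-fold stratified cross-validation, choosing the setting with the best F1 on spam.
search = GridSearchCV(pipe, param_grid, cv=3, scoring="f1")
search.fit(train_texts, train_labels)

pred = search.predict(test_texts)
print(search.best_params_)
print(classification_report(test_labels, pred, target_names=["ham", "spam"]))
```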
This project aims to classify emotions in Urdu speech using machine learning techniques. The dataset used for this project contains labeled audio files, and various audio processing and machine learning algorithms are applied to build an effective emotion detection model.
The project requires the following dependencies:
- pandas
- numpy
- scikit-learn
- librosa
- matplotlib
- seaborn
You can install these dependencies using the following command:
pip install pandas numpy scikit-learn librosa matplotlib seaborn
The dataset used for this project is available on Kaggle. The dataset contains audio files labeled with different emotions in Urdu.
The dataset is loaded and split into training and test sets, and the audio files are organized into their respective directories for further processing.
Various feature extraction techniques are used to convert audio data into numerical representations:
- MFCCs (Mel-frequency cepstral coefficients)
- Mel-frequency spectrogram
Several machine learning models are trained and evaluated using different random states and hyperparameters:
- Support Vector Machine (SVM)
- Multi-layer Perceptron (MLP)
- K-Nearest Neighbors (KNN)
The best model is selected based on the evaluation metrics, and its performance is reported using precision, recall, and F1-score.
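The "different random states" comparison can be sketched as repeated stratified splits, scoring each model with macro-averaged F1 and keeping the one with the best mean. Synthetic features from `make_classification` stand in for the extracted audio features, and the four emotion classes and hyperparameters shown are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the feature matrix (4 emotion classes assumed).
X, y = make_classification(n_samples=400, n_features=40, n_informative=10,
                           n_classes=4, random_state=0)

# Factories so each split gets a freshly constructed model.
models = {
    "SVM": lambda rs: SVC(C=10),
    "MLP": lambda rs: MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=rs),
    "KNN": lambda rs: KNeighborsClassifier(n_neighbors=5),
}

results = {}
for rs in (0, 1, 2):  # several random train/test splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              stratify=y, random_state=rs)
    for name, make_model in models.items():
        clf = make_pipeline(StandardScaler(), make_model(rs))  # scale, then classify
        clf.fit(X_tr, y_tr)
        score = f1_score(y_te, clf.predict(X_te), average="macro")
        results.setdefault(name, []).append(score)

best = max(results, key=lambda n: sum(results[n]) / len(results[n]))
for name, scores in results.items():
    print(f"{name}: mean macro-F1 = {sum(scores) / len(scores):.3f}")
print("best:", best)
```

Averaging over several splits makes the choice of "best model" less sensitive to one lucky or unlucky partition of the data.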
This project aims to classify different types of voices using machine learning techniques. The dataset used for this project contains labeled audio files, and various audio processing and machine learning algorithms are applied to build an effective voice classification model.
The project requires the following dependencies:
- numpy
- pandas
- matplotlib
- seaborn
- librosa
- scikit-learn
You can install these dependencies using the following command:
pip install numpy pandas matplotlib seaborn librosa scikit-learn
The dataset used for this project is available on Kaggle. The dataset contains audio files labeled with different types of voices.
Various feature extraction techniques are used to convert audio data into numerical representations:
- MFCCs (Mel-frequency cepstral coefficients)
- Mel-frequency spectrogram
- Chroma features
- Spectral contrast
- Tonnetz
Several machine learning models are trained and evaluated using different random states and hyperparameters:
- Support Vector Machine (SVM)
- Multi-layer Perceptron (MLP)
- K-Nearest Neighbors (KNN)
- Random Forest
- Multinomial Naive Bayes
The best model is selected based on the evaluation metrics, and its performance is reported using precision, recall, and F1-score.
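One practical detail when Multinomial Naive Bayes is in the list: scikit-learn's `MultinomialNB` is designed for count-like data and raises a `ValueError` on negative inputs, while MFCCs, dB-scaled contrast and tonnetz values are often negative. A common workaround, sketched here on synthetic shifted features, is to rescale to [0, 1] with `MinMaxScaler` inside a pipeline; whether the notebook does exactly this is not stated.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic features shifted so many values are negative, like MFCC statistics.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X = X - 5.0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

# MultinomialNB().fit(X_tr, y_tr) alone would raise ValueError (negative values).
# clip=True also keeps transformed test values inside [0, 1].
nb = make_pipeline(MinMaxScaler(clip=True), MultinomialNB())
nb.fit(X_tr, y_tr)
print(f"test accuracy: {nb.score(X_te, y_te):.3f}")
```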
The implementation of the voice classification project is available in the Jupyter notebook: Voice classification.ipynb
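This project focuses on iris recognition using machine learning techniques. The goal is to accurately identify individuals based on their iris patterns.
The project requires the following dependencies:
- numpy
- pandas
- os (part of the Python standard library, no installation needed)
- cv2 (OpenCV, installed as opencv-python)
- matplotlib
You can install the third-party dependencies using the following command:
pip install numpy pandas opencv-python matplotlib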
The dataset used for this project can be found on Kaggle: Iris Recognition Dataset
The following features are extracted from the iris images:
- Mean
- Standard Deviation
- Skewness
- Kurtosis
- Entropy
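These five statistics can be computed directly from the grayscale pixel values with NumPy. This sketch assumes an 8-bit grayscale image and computes the statistics over the whole image (the README does not say whether the iris region is segmented first). Skewness and excess kurtosis use the biased definitions that match the defaults of `scipy.stats.skew` and `scipy.stats.kurtosis`, and entropy is the Shannon entropy (in bits) of the 256-bin intensity histogram.

```python
import numpy as np


def iris_stats(gray: np.ndarray) -> np.ndarray:
    """Mean, std, skewness, excess kurtosis and entropy of an 8-bit grayscale image."""
    x = gray.astype(np.float64).ravel()
    mean = x.mean()
    std = x.std()
    z = (x - mean) / std if std > 0 else np.zeros_like(x)
    skewness = np.mean(z ** 3)
    kurtosis = np.mean(z ** 4) - 3.0  # excess kurtosis: 0 for a normal distribution
    # Shannon entropy (bits) of the intensity histogram, ignoring empty bins.
    hist = np.bincount(gray.astype(np.uint8).ravel(), minlength=256)
    p = hist[hist > 0] / hist.sum()
    entropy = -np.sum(p * np.log2(p))
    return np.array([mean, std, skewness, kurtosis, entropy])


# With OpenCV (hypothetical path): gray = cv2.imread("eye.png", cv2.IMREAD_GRAYSCALE)
img = np.zeros((4, 4), dtype=np.uint8)
img[:, 2:] = 255  # half black, half white
feats = iris_stats(img)
# mean = 127.5, std = 127.5, skewness = 0, excess kurtosis = -2, entropy = 1 bit
print(feats)
```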
Several machine learning models are trained and evaluated using different random states and hyperparameters:
- Support Vector Machine (SVM)
- Multi-layer Perceptron (MLP)
- K-Nearest Neighbors (KNN)
- Random Forest
- Multinomial Naive Bayes
The best model is selected based on the evaluation metrics, and its performance is reported using precision, recall, and F1-score.
To identify a user:
- Click on "Browse files" to upload an image of an iris.
- After uploading the image, click on "Start the identification process". The model will predict whether the image is of an iris or not.
  - If the image is not of an iris, it will display "Image uploaded is invalid".
  - If the image is of an iris, the model will search the database for a match and display the result:
    - If no match is found, it will display "No matches from the dataset, user does not exist".
    - If a match is found, it will display the user ID and the image of the matched iris.
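The "search the database" step could be implemented in several ways. One plausible approach is a nearest-neighbor lookup over stored feature vectors with a rejection threshold, which reproduces both outcomes described above. The function `identify`, the user IDs, the feature values and the distance threshold are all hypothetical, and this sketch covers only the lookup, not the earlier iris/non-iris check. With real data the features should be standardized first so that large-valued statistics such as the mean do not dominate the distance.

```python
from typing import List, Optional

import numpy as np


def identify(query: np.ndarray, gallery: np.ndarray, user_ids: List[str],
             threshold: float) -> Optional[str]:
    """Return the ID of the closest enrolled iris, or None if none is close enough.

    gallery: one feature vector per enrolled image (rows), aligned with user_ids.
    threshold: hypothetical maximum Euclidean distance for accepting a match.
    """
    distances = np.linalg.norm(gallery - query, axis=1)  # distance to every enrolled row
    best = int(np.argmin(distances))
    if distances[best] > threshold:
        return None  # maps to "No matches from the dataset, user does not exist"
    return user_ids[best]


# Hypothetical enrolled features: mean, std, skewness, kurtosis, entropy.
gallery = np.array([[120.0, 40.0, 0.1, -0.5, 7.2],
                    [90.0, 55.0, 0.4, 0.2, 6.8]])
user_ids = ["user_01", "user_02"]

print(identify(np.array([119.0, 41.0, 0.1, -0.4, 7.1]), gallery, user_ids, threshold=5.0))  # user_01
print(identify(np.array([200.0, 10.0, 1.0, 2.0, 3.0]), gallery, user_ids, threshold=5.0))   # None
```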
The implementation of the iris recognition project is available in the Jupyter notebook: ei-notebook.ipynb
These projects demonstrate the application of machine learning techniques to solve real-world problems. The projects cover a wide range of topics, including text classification, audio processing, and biometric recognition. By working on these projects, I have gained valuable experience in data preprocessing, feature extraction, model training, and evaluation. I hope these projects inspire you to explore the exciting field of machine learning and develop your own innovative solutions.
These were group projects, and the authors are mentioned in each project's respective notebook.