Through the summer of 2024, I had the amazing opportunity to be a part of Cornell's Break Through Tech AI program where I learned machine learning fundamentals and tools to tackle real-world problems. I used technologies like Python and Jupyter Notebook to research datasets and build AI models.
The AI models and datasets I used across the program's 10 topics can be found in this repository.
In this topic, I learned about the business behind ML and what kinds of problems it can solve with the right datasets.
I also learned the basics of important machine learning libraries in Python like NumPy and Pandas. I used these libraries to manipulate dataframes and read large datasets.
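A minimal sketch of that kind of workflow; the file name and column names here are hypothetical stand-ins, not the course's actual data:

```python
import numpy as np
import pandas as pd

# Read a dataset into a DataFrame (file and columns are hypothetical)
df = pd.read_csv("data.csv")

# Inspect the first few rows and summary statistics
print(df.head())
print(df.describe())

# Filter rows and select columns
subset = df[df["age"] > 30][["age", "income"]]

# Use NumPy for vectorized math on a column
log_income = np.log1p(df["income"].to_numpy())
```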
In this topic, I learned how to prepare and clean large datasets, so I could use them to train AI models.
I used NumPy and Pandas to remove outliers, handle cases of missing data, perform one-hot encoding on categorical features, and analyze correlations between the features and the label.
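A minimal sketch of those cleaning steps, assuming a hypothetical dataset with `income`, `city`, and `label` columns:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical dataset

# Handle missing data: fill numeric gaps with the column mean
df["income"] = df["income"].fillna(df["income"].mean())

# Remove outliers more than 3 standard deviations from the mean
z = (df["income"] - df["income"].mean()) / df["income"].std()
df = df[np.abs(z) <= 3]

# One-hot encode a categorical feature
df = pd.get_dummies(df, columns=["city"])

# Check how strongly each feature correlates with the label
print(df.corr(numeric_only=True)["label"].sort_values(ascending=False))
```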
In this topic, I learned what Decision Trees (DTs) and K-Nearest Neighbors (KNNs) are as well as how they work and how to tune their hyperparameters.
I used the Scikit-Learn library in Python to train multiple DTs and KNNs with different hyperparameters and compared their accuracy on the same set of data. In my experiments, the DTs tended to be more accurate.
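A sketch of that comparison; it uses Scikit-Learn's built-in breast cancer dataset as a stand-in, since the course data isn't included here:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Try a few hyperparameter values for each model and compare accuracy
for depth in (3, 5, 10):
    dt = DecisionTreeClassifier(max_depth=depth).fit(X_train, y_train)
    print(f"DT  max_depth={depth}: {accuracy_score(y_test, dt.predict(X_test)):.3f}")

for k in (3, 5, 10):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"KNN n_neighbors={k}: {accuracy_score(y_test, knn.predict(X_test)):.3f}")
```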
In this topic, I learned about Logistic Regression and how to optimize it to maximize the accuracy score.
I used the Scikit-Learn library to train a Logistic Regression model while tuning its hyperparameters to achieve the highest accuracy score possible.
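A sketch of tuning Logistic Regression's regularization strength `C` by hand, again using a built-in Scikit-Learn dataset as a stand-in:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Try several regularization strengths and keep the best-scoring model
best_c, best_acc = None, 0.0
for c in (0.01, 0.1, 1.0, 10.0):
    model = LogisticRegression(C=c, max_iter=5000).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    if acc > best_acc:
        best_c, best_acc = c, acc
print(f"Best C={best_c} with accuracy {best_acc:.3f}")
```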
In this topic, I learned how to efficiently find the best hyperparameter values for a model to improve the accuracy of its predictions. Then, I learned how to evaluate a model and refine it before re-evaluating and deploying it.
I trained a KNN model and used a Grid Search technique to find the best hyperparameter values. Then, I used this technique on a Logistic Regression model before evaluating it.
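A minimal sketch of a grid search over KNN hyperparameters with Scikit-Learn's `GridSearchCV`, using a built-in dataset as a stand-in:

```python
from sklearn.cluster import KMeans  # unused here; see the clustering topic below
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Search over a grid of hyperparameter values with 5-fold cross-validation
param_grid = {"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X, y)

print("Best hyperparameters:", grid.best_params_)
print(f"Best cross-validated accuracy: {grid.best_score_:.3f}")
```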
In this topic, I learned what unsupervised ML is and how to implement it using clustering. I also learned about Ensemble Models, which provide higher accuracy in predictions.
I used a KMeans Clustering model to group examples in a dataset together. I also used different Ensemble Models like Stacking, Random Forests, and Gradient Boosted Decision Trees to make accurate predictions on data.
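A sketch of both ideas on a built-in stand-in dataset: KMeans groups the examples without using the labels, while the ensembles are scored with cross-validation:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Unsupervised: group examples into clusters without using the labels
clusters = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

# Supervised ensembles: compare cross-validated accuracy
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=42)),
                ("gb", GradientBoostingClassifier(random_state=42))],
    final_estimator=LogisticRegression(max_iter=5000),
)
for name, model in [("Random Forest", RandomForestClassifier(random_state=42)),
                    ("Gradient Boosting", GradientBoostingClassifier(random_state=42)),
                    ("Stacking", stack)]:
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.3f}")
```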
In this topic, I learned what Neural Networks are and how they work. I also learned how Convolutional Neural Networks work to process images.
I used the Keras library in Python to build both a standard neural network and a convolutional neural network to make predictions.
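A minimal sketch of both architectures in Keras; the input shapes (30 tabular features, 28x28 grayscale images) and layer sizes are assumptions for illustration, not the course's actual models:

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small fully connected ("standard") neural network for tabular input
dense_model = keras.Sequential([
    keras.Input(shape=(30,)),           # 30 input features (assumed)
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
dense_model.compile(optimizer="adam", loss="binary_crossentropy",
                    metrics=["accuracy"])

# A small convolutional neural network for 28x28 grayscale images
cnn_model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
cnn_model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
```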
In this topic, I learned about Natural Language Processing, which can read text and derive the sentiment behind it.
I used the Keras library to build a neural network to read the sentiment in pieces of text.
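A sketch of a simple Keras sentiment classifier; it uses the Keras built-in IMDB movie-review dataset as a stand-in, and the vocabulary size, sequence length, and embedding size are illustrative choices:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Load the IMDB reviews, pre-encoded as word indices, and pad to equal length
vocab_size, max_len = 10000, 200
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(
    num_words=vocab_size)
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_len)

# Embed the words, average them, and classify positive vs. negative sentiment
model = keras.Sequential([
    layers.Embedding(vocab_size, 16),
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```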
For my final project, I chose to create a KMeans model, which is an unsupervised model, to group together examples within a dataset and see whether the groups closely aligned with their correct labels.
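A minimal sketch of that idea; the project's actual dataset isn't named here, so this uses Scikit-Learn's built-in Iris data as a stand-in and scores the cluster-to-label agreement with the adjusted Rand index:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)

# Cluster the examples without looking at the labels
clusters = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Compare the discovered clusters to the true labels
# (1.0 means perfect agreement, 0.0 means a random grouping)
print(f"Adjusted Rand score: {adjusted_rand_score(y, clusters):.3f}")
```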