This Python script implements a basic K-Nearest Neighbors (KNN) classifier using the famous Iris dataset. The script performs the following steps:
- Load the Dataset: The Iris dataset is loaded using Pandas.
- Data Preprocessing: The dataset is divided randomly into training and testing sets. Each item is then numbered for easy tracking.
- Distance Calculation: The Euclidean distance is calculated between the testing data and the training data for each test sample.
- K-Nearest Neighbors: The script finds the nearest 10 neighbors for each test sample and performs classification based on the majority vote.
- Accuracy Calculation: The accuracy of the classifier is determined by comparing the predicted labels with the actual labels from the testing set.
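The distance and voting steps above can be sketched as follows. This is a minimal illustration using only the `math` and `statistics` modules, not the script's actual code: the function names `euclidean_distance` and `predict_label` and the toy data are assumptions made for the example.

```python
import math
import statistics


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_label(test_features, train_features, train_labels, k=10):
    """Majority vote among the k training samples closest to test_features.

    Hypothetical helper: k defaults to 10 to match the description above.
    """
    # Pair each training sample's distance with its label, then sort by distance.
    by_distance = sorted(
        (
            (euclidean_distance(test_features, features), label)
            for features, label in zip(train_features, train_labels)
        ),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # statistics.mode returns the most common label; on a tie it returns
    # the first one encountered (Python 3.8+).
    return statistics.mode(nearest_labels)


# Toy data (not the Iris dataset): two well-separated clusters.
toy_features = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
toy_labels = ["a", "a", "a", "b", "b", "b"]
print(predict_label((1.1, 1.0), toy_features, toy_labels, k=3))  # prints "a"
```

With an even `k` such as 10, two classes can tie in the vote, so the tie-breaking behavior of `statistics.mode` matters.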
- Python 3.x
- Pandas library
- `math` and `statistics` modules (both part of the Python standard library)
`Iris.csv`: Contains the Iris dataset with the following columns: Sepal Length, Sepal Width, Petal Length, Petal Width, and Class (label).
- Data Loading: The dataset is read from `Iris.csv` using `pandas.read_csv()`.
- Training and Testing Split: The data is split into a random 100-point training set, with the rest used as the testing set.
- Distance Calculation: The script calculates the Euclidean distance between each test sample and all training samples.
- Find Nearest Neighbors: The nearest 10 neighbors for each test sample are identified.
- Label Prediction: The class label for each test sample is predicted based on the majority vote from its nearest neighbors.
- Accuracy: The percentage of correctly predicted labels is printed as the final result.
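The split and accuracy steps in this list can be sketched as below. The script itself works on a Pandas DataFrame. This sketch uses plain Python lists so it runs without `Iris.csv`, and the names `split_rows` and `accuracy_percent` are hypothetical, chosen only for illustration.

```python
import random


def split_rows(rows, n_train=100, seed=None):
    """Shuffle a copy of rows and return (training_rows, testing_rows).

    Hypothetical helper: n_train defaults to 100 to match the split
    described above. Pass a seed to make the split repeatable.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def accuracy_percent(predicted, actual):
    """Percentage of positions where the predicted label equals the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Stand-in rows: integers in place of real samples (assumes 150 rows in total).
train_rows, test_rows = split_rows(range(150), n_train=100, seed=0)
print(len(train_rows), len(test_rows))  # prints "100 50"

# Three of four predictions match the true labels.
print(accuracy_percent(["x", "y", "y", "x"], ["x", "y", "x", "x"]))  # prints "75.0"
```

Because the split is random, the reported accuracy can vary from run to run unless a seed is fixed.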
- Ensure you have Python installed along with the required libraries.
- Place the `Iris.csv` file in the same directory as the Python script.
- Run the script by executing `python knn_classifier.py` in the terminal.
- The script will output the accuracy of the classifier based on the K-Nearest Neighbors algorithm.