
Commit 616e424

Initial project setup and refactor with KNN implementation
- Implemented K-Nearest Neighbors (KNN) classifier in `k_nearest_neighbors.py`.
- Added supporting modules: `cross_validator.py`, `data_loader.py`, `preprocessor.py`.
- Refactored project structure and moved source files to `src/knn/` and `src/app/`.
- Added comprehensive test suite with sample data in `tests/data/sample_iris.csv`.
- Updated UML diagrams: class, system interaction, and sequence diagrams.
- Updated README with project details, instructions, and PyPI installation link.
- Created distribution packages and added `setup.py` for project installation.
- Miscellaneous fixes and improvements, including `.gitignore` and `pytest.ini` setup.
0 parents, commit 616e424

31 files changed: 1339 lines added, 0 lines removed

.gitignore

Lines changed: 164 additions & 0 deletions
@@ -0,0 +1,164 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
venv_iris_knn/
venv*/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.vscode/
*.code-workspace

.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
3.9.0

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Scott Miner

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

MANIFEST.in

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
include README.md
include LICENSE
include requirements.txt
include setup.py
recursive-include src *
recursive-include data *
exclude .gitignore
global-exclude __pycache__/*
global-exclude *.pyc
global-exclude *.egg-info
global-exclude *.egg-info/*

README.md

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
# Iris Flowers Classification with k-Nearest Neighbors

This repository contains a Python package that implements the k-Nearest Neighbors (k-NN) algorithm for classifying Iris flowers into three species: setosa, versicolor, and virginica. The package uses the Iris dataset, which consists of 150 samples with 4 features each: sepal length, sepal width, petal length, and petal width.

The k-NN model is evaluated on the Iris dataset using 5-fold cross-validation with 10 neighbors; both values can be changed in the script. Afterwards, you can enter your own values for sepal length, sepal width, petal length, and petal width, and the model predicts the flower species (setosa, versicolor, or virginica) from that input.

## Installation

You can install the package directly from [PyPI](https://pypi.org/project/IrisKNNClassifier/1.0.0/) using `pip`:

```sh
pip install IrisKNNClassifier
```

Alternatively, you can install it from the source distribution:

1. Download the package from the repository or from PyPI.
2. Extract the contents of the `.tar.gz` file.
3. Navigate to the directory containing `setup.py`.
4. Run the following command:

```sh
pip install .
```

## Running the Script

After installing the package, you can run the classifier directly:

```sh
iris-classifier
```

## Class Diagram

Below is an overview of the classes and their interactions in the script:

![Class Diagram](./docs/output/Iris%20Classifier%20System%20%20Class%20Diagram.png)

* **KNearestNeighbors**: This class implements the k-NN algorithm, including training the model and making predictions.

* **DataLoader**: This class loads the Iris dataset from a file.

* **Preprocessor**: This class prepares the data for k-NN. It converts string values to floats, converts class labels to integers, and normalizes the features so they share the same scale, which matters for distance-based algorithms like k-NN.

* **CrossValidator**: This class evaluates the k-NN model using cross-validation. It splits the dataset into a specified number of folds, computes the accuracy for each fold, and returns the list of accuracy scores.
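
The bullets above describe what `KNearestNeighbors` does but not how. As a rough illustration, the core of k-NN can be sketched in plain Python. This is a minimal sketch under assumptions, not the code in `k_nearest_neighbors.py`. The function names `euclidean_distance`, `get_neighbors`, and `predict_classification` are hypothetical, and each training row is assumed to hold numeric features followed by an integer class label in its last column.

```python
from collections import Counter
from math import sqrt
from typing import List, Sequence

# Assumed (hypothetical) row layout: numeric features first, integer class label last.
Row = Sequence[float]


def euclidean_distance(features: Sequence[float], row: Row) -> float:
    """Distance between a query's features and a training row's features."""
    # row[:-1] drops the class label stored in the last column of the training row
    return sqrt(sum((x - y) ** 2 for x, y in zip(features, row[:-1])))


def get_neighbors(train: List[Row], features: Sequence[float], num_neighbors: int) -> List[Row]:
    """Return the num_neighbors training rows closest to the query, nearest first."""
    return sorted(train, key=lambda row: euclidean_distance(features, row))[:num_neighbors]


def predict_classification(train: List[Row], features: Sequence[float], num_neighbors: int) -> int:
    """Predict a label by majority vote among the nearest neighbors."""
    labels = [int(row[-1]) for row in get_neighbors(train, features, num_neighbors)]
    # most_common keeps first-encountered order for equal counts, so a tie goes
    # to whichever tied label belongs to the closer neighbor
    return Counter(labels).most_common(1)[0][0]


# Usage on a toy dataset with two features and two well-separated classes
train = [
    [1.0, 1.0, 0],
    [1.2, 0.9, 0],
    [5.0, 5.0, 1],
    [5.1, 4.8, 1],
    [4.9, 5.2, 1],
]
print(predict_classification(train, [1.1, 1.0], num_neighbors=3))  # prints 0
```

Sorting the whole training set costs O(n log n) per query. That is fine for the 150 Iris samples, but larger datasets usually call for a heap or a spatial index.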

`DataLoader` and `Preprocessor` supply the converted, normalized data that the `KNearestNeighbors` class works on, and `CrossValidator` evaluates the k-NN model on that data.
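
The data path described above (load, convert, encode, normalize) can be sketched with the standard library alone. The sketch assumes a header-less CSV whose first four columns are the measurements and whose last column is the species name. The function names are hypothetical and may not match `data_loader.py` or `preprocessor.py`. Normalization here is min-max scaling to [0, 1], which is one common choice; the project's `Preprocessor` may use a different scheme.

```python
import csv
import io
from typing import Dict, List, Sequence, TextIO, Tuple


def load_csv(stream: TextIO) -> List[List[str]]:
    """Read CSV rows from an open text stream, skipping blank lines."""
    return [row for row in csv.reader(stream) if row]


def str_column_to_float(dataset: List[list], column: int) -> None:
    """Convert one column from strings to floats in place."""
    for row in dataset:
        row[column] = float(row[column].strip())


def str_column_to_int(dataset: List[list], column: int) -> Dict[str, int]:
    """Replace class names with integers in place and return the name-to-int mapping."""
    names = sorted({row[column] for row in dataset})  # sorted for a stable mapping
    lookup = {name: index for index, name in enumerate(names)}
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup


def dataset_minmax(dataset: List[list]) -> List[Tuple[float, float]]:
    """Per-feature (min, max), ignoring the label in the last column."""
    n_features = len(dataset[0]) - 1
    return [
        (min(row[i] for row in dataset), max(row[i] for row in dataset))
        for i in range(n_features)
    ]


def normalize_dataset(dataset: List[list], minmax: Sequence[Tuple[float, float]]) -> None:
    """Rescale every feature to [0, 1] in place (constant columns become 0.0)."""
    for row in dataset:
        for i, (lo, hi) in enumerate(minmax):
            row[i] = (row[i] - lo) / (hi - lo) if hi > lo else 0.0


# Usage with an in-memory CSV standing in for the data file
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
data = load_csv(sample)
for column in range(4):
    str_column_to_float(data, column)
lookup = str_column_to_int(data, 4)
normalize_dataset(data, dataset_minmax(data))
print(lookup)  # {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
```

The min/max bounds should also be reused when scaling any new input, so that new samples land in the same feature space as the stored data.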

To customize the script, you can modify the `n_folds` and `num_neighbors` variables, which represent the number of cross-validation folds and the number of neighbors in the k-NN algorithm, respectively.
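
To show what `n_folds` and `num_neighbors` control, here is a minimal k-fold cross-validation sketch in the spirit of the `CrossValidator` description above. It is an illustration under assumptions, not the project's code. `cross_validation_split`, `knn_predict`, and `evaluate_knn` are hypothetical names. The split shuffles with a fixed seed, rows left over after dividing by `n_folds` are dropped, and accuracy is reported as a percentage.

```python
import random
from collections import Counter
from math import sqrt
from typing import List, Sequence


def cross_validation_split(dataset: Sequence, n_folds: int, seed: int = 1) -> List[list]:
    """Shuffle a copy of the dataset and deal it into n_folds equal-sized folds."""
    rows = list(dataset)
    random.Random(seed).shuffle(rows)
    fold_size = len(rows) // n_folds
    # Any remainder rows (len(rows) % n_folds) are left out of every fold
    return [rows[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]


def knn_predict(train: List[list], features: Sequence[float], num_neighbors: int):
    """Majority label among the num_neighbors training rows nearest to features."""
    def distance(row: list) -> float:
        return sqrt(sum((x - y) ** 2 for x, y in zip(features, row[:-1])))

    labels = [row[-1] for row in sorted(train, key=distance)[:num_neighbors]]
    return Counter(labels).most_common(1)[0][0]


def evaluate_knn(dataset: Sequence, n_folds: int, num_neighbors: int) -> List[float]:
    """Return one accuracy score (in percent) per fold."""
    folds = cross_validation_split(dataset, n_folds)
    scores = []
    for i, test_fold in enumerate(folds):
        # Train on every fold except the one being tested
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        correct = sum(knn_predict(train, row[:-1], num_neighbors) == row[-1] for row in test_fold)
        scores.append(100.0 * correct / len(test_fold))
    return scores


# Usage: two tight, well-separated clusters should score 100% on every fold
dataset = [[i * 0.1, i * 0.1, 0] for i in range(10)] + [
    [10 + i * 0.1, 10 + i * 0.1, 1] for i in range(10)
]
scores = evaluate_knn(dataset, n_folds=5, num_neighbors=3)
print(scores, sum(scores) / len(scores))  # [100.0, 100.0, 100.0, 100.0, 100.0] 100.0
```

Each test fold is excluded from its own training set, so the per-fold scores estimate accuracy on unseen data. Averaging them gives a single mean accuracy.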

To get started, follow the instructions in the "Getting Started" section below, and install the required libraries from the provided `requirements.txt` file. After running the script, you can enter your own values for sepal length, sepal width, petal length, and petal width to see the model's predictions.

## Getting Started

These instructions explain how to run the script on your local machine.

### Running the Script

1. Clone this repository to your local machine:

```sh
git clone https://github.com/sminerport/iris-knn-classifier.git
```

2. Navigate to the repository's directory:

```sh
cd iris-knn-classifier
```

3. Run the script:

```sh
python src/app/iris_classifier_app.py
```

The script evaluates the k-NN model on the Iris dataset using 5-fold cross-validation and 10 neighbors (both values can be changed) and prints the accuracy for each fold along with the mean accuracy.

![Data Loading Sequence Diagram](./docs/output/Sequence%20Diagram%20%20Data%20Loading%20and%20Preprocessing.png)

After the evaluation, you can enter your own values for sepal length, sepal width, petal length, and petal width, and the model predicts the flower species (setosa, versicolor, or virginica) from that input.
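
The prediction step described above can be sketched as follows. The sketch assumes that the app min-max normalizes the user's measurements with the same per-feature bounds it used for the dataset (otherwise distances would not be comparable) and maps the predicted integer back to a species name. `parse_measurements`, `classify_measurements`, the bounds, and the toy normalized training rows are hypothetical stand-ins, not values from the project. Input is passed as a string rather than read with `input()` so the sketch runs non-interactively.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def parse_measurements(text: str) -> List[float]:
    """Parse four comma- or space-separated numbers (sepal/petal length and width)."""
    parts = text.replace(",", " ").split()
    if len(parts) != 4:
        raise ValueError("expected 4 values: sepal length, sepal width, petal length, petal width")
    return [float(part) for part in parts]


def classify_measurements(
    text: str,
    train: List[list],
    minmax: Sequence[Tuple[float, float]],
    species: Dict[int, str],
    num_neighbors: int,
) -> str:
    """Scale the user's values like the training data, then predict by nearest neighbors."""
    features = [
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, (lo, hi) in zip(parse_measurements(text), minmax)
    ]

    def distance(row: list) -> float:
        return sqrt(sum((x - y) ** 2 for x, y in zip(features, row[:-1])))

    labels = [row[-1] for row in sorted(train, key=distance)[:num_neighbors]]
    return species[Counter(labels).most_common(1)[0][0]]


# Hypothetical bounds and already-normalized training rows (features in [0, 1], label last)
minmax = [(4.0, 8.0), (2.0, 4.5), (1.0, 7.0), (0.1, 2.5)]
train = [
    [0.20, 0.60, 0.05, 0.05, 0],
    [0.50, 0.30, 0.55, 0.50, 1],
    [0.70, 0.40, 0.80, 0.85, 2],
]
species = {0: "setosa", 1: "versicolor", 2: "virginica"}
print(classify_measurements("5.1, 3.5, 1.4, 0.2", train, minmax, species, num_neighbors=1))  # setosa
```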

![Prediction Sequence Diagram](./docs/output/Sequence%20Diagram%20%20Prediction.png)

![Program Interface](./images/IrisFlowerClassifier.png)

![Prediction Result](./images/Prediction.png)

## Customizing the Script

You can customize the number of cross-validation folds and the number of neighbors in the k-NN algorithm by modifying the `n_folds` and `num_neighbors` variables in the script.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

* The Iris dataset was introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems."
* The k-Nearest Neighbors algorithm is a simple yet powerful classification technique, particularly suitable for problems with small datasets and relatively low-dimensional feature spaces.
