This library detects anomalies in a dataset of Apple Watch data. It first reduces the dimensionality of the data using Principal Component Analysis (PCA), then uses the Mahalanobis distance to detect outliers, and finally uses the standard deviation to calculate the anomaly threshold. The library stores its data in an SQL database.
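The three steps above can be sketched roughly as follows. This is a minimal illustration and not the code in Healthkit.py: the function name `detect_anomalies`, the choice of 3 components, and the threshold rule (mean distance plus `n_std` standard deviations) are all assumptions made for the example, and it uses only NumPy.

```python
import numpy as np


def detect_anomalies(X, n_components=3, n_std=3.0):
    """Hypothetical helper: flag rows with a large Mahalanobis distance in PCA space."""
    X = np.asarray(X, dtype=float)

    # Step 1: PCA. The principal directions are the right singular
    # vectors of the mean-centred data matrix.
    centred = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:n_components].T

    # Step 2: Mahalanobis distance of each reduced point from the mean.
    cov = np.atleast_2d(np.cov(scores, rowvar=False))
    inv_cov = np.linalg.pinv(cov)
    diff = scores - scores.mean(axis=0)
    squared = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    distances = np.sqrt(np.maximum(squared, 0.0))

    # Step 3: threshold from the standard deviation of the distances
    # (assumed rule: mean + n_std * std).
    threshold = distances.mean() + n_std * distances.std()
    return distances, threshold, distances > threshold


# Synthetic data with one planted outlier in the first row.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))
data[0] = 25.0
distances, threshold, is_anomaly = detect_anomalies(data)
```

Points where `is_anomaly` is true lie further from the bulk of the data than the threshold allows. Raising `n_std` makes the detector more conservative.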
The library consists of 3 files:
- SQL_Interface.py: Creates a connection to a test database and returns a connection and a cursor for executing queries. In production, replace the test database with the actual database that will be used.
- XML_to_SQL.py: Reads multiple Apple HealthKit exports (zip files), extracts the required file in memory, and adds the user data to the database.
- Healthkit.py: The core of the project; it reads the data from the database and computes the metric used for anomaly detection.
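As a rough picture of how the first two files fit together, the sketch below opens a database connection and loads one export archive without unpacking it to disk. It is an illustration under stated assumptions, not the actual contents of SQL_Interface.py or XML_to_SQL.py: the function names, the `records` table and its columns, the use of SQLite, and the assumption that the archive holds an `export.xml` made of `Record` elements are all hypothetical.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
import zipfile


def connect(path=":memory:"):
    """Return a (connection, cursor) pair for a throwaway SQLite database."""
    conn = sqlite3.connect(path)
    return conn, conn.cursor()


def load_export(zip_source, user_id, cursor):
    """Read export.xml directly from the archive and insert its records."""
    cursor.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(user_id INTEGER, type TEXT, start_date TEXT, value TEXT)"
    )
    with zipfile.ZipFile(zip_source) as archive:
        # Find export.xml wherever it sits inside the archive (assumed layout).
        name = next(n for n in archive.namelist() if n.endswith("export.xml"))
        with archive.open(name) as handle:  # streamed, never written to disk
            rows = [
                (user_id, el.get("type"), el.get("startDate"), el.get("value"))
                for _, el in ET.iterparse(handle)
                if el.tag == "Record"
            ]
    cursor.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", rows)
    return len(rows)


# Build a tiny synthetic archive in memory so the sketch runs on its own.
sample_xml = (
    "<HealthData>"
    '<Record type="HeartRate" startDate="2021-01-01 08:00:00" value="62"/>'
    '<Record type="HeartRate" startDate="2021-01-01 08:05:00" value="65"/>'
    "</HealthData>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("apple_health_export/export.xml", sample_xml)

conn, cur = connect()
inserted = load_export(buffer, user_id=0, cursor=cur)
conn.commit()
stored = cur.execute("SELECT COUNT(*) FROM records").fetchone()[0]
```

Using parameterised `INSERT` statements keeps the loader independent of the values found in the export, and swapping `connect` for a production database connection leaves `load_export` unchanged as long as the driver accepts `?` placeholders.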
This library was developed for Merck in collaboration with the Purdue Data Mine.
Clone the anomaly_detection branch of the Git repository:
git clone -b anomaly_detection https://github.com/ParadoxicalNerd/datamine-merck-biometrics-ds.git
Then install the requirements:
pip install -r requirements.txt
Change the dataset path to point to a folder with the following structure (the number in parentheses becomes the assigned user id):
dataset
├── export (0).zip
├── export (1).zip
├── export (2).zip
└── export (3).zip
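The naming convention above can be turned into user ids with a short helper like the one below. This is only a sketch of the idea: `user_exports` and `EXPORT_NAME` are hypothetical names, and the library may derive the ids differently.

```python
import re
import tempfile
from pathlib import Path

# Matches file names such as "export (2).zip" and captures the number.
EXPORT_NAME = re.compile(r"export \((\d+)\)\.zip")


def user_exports(dataset_dir):
    """Map each user id to its export archive, sorted by id."""
    exports = {}
    for path in Path(dataset_dir).glob("*.zip"):
        match = EXPORT_NAME.fullmatch(path.name)
        if match:
            exports[int(match.group(1))] = path
    return dict(sorted(exports.items()))


# Demonstration on a temporary folder that mimics the layout above.
with tempfile.TemporaryDirectory() as tmp:
    for i in (2, 0, 3, 1):
        (Path(tmp) / f"export ({i}).zip").touch()
    (Path(tmp) / "notes.zip").touch()  # does not follow the convention, so it is skipped
    found = user_exports(tmp)
    user_ids = list(found)
```

Files that do not follow the `export (N).zip` pattern are ignored, so stray archives in the dataset folder do not receive a user id.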
An example implementation of the library can be seen in example.py.
Note: You need a web browser to view the 3D plot that example.py generates for your data. An example of the generated plot can be seen in scatter_plot.html.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update the README as appropriate.
Anomaly detection overview: Outlines the procedure for conducting anomaly detection
Anomaly detection code: Gives an overview of the code needed to conduct anomaly detection
PCA: See the unsupervised learning chapter of “Introduction to Machine Learning with Python: A Guide for Data Scientists”, published by O'Reilly
Mahalanobis Distance Math: Explains the math behind the Mahalanobis distance and why we use it
Mahalanobis Distance SciPy: Covers the SciPy Mahalanobis distance module
Pankaj Meghani — meghanipankaj5@gmail.com