
yotamfre/Sleep-Stage-Challenge


Sleep Stages Challenge

This project stemmed from a Kaggle competition hosted by the Child Mind Institute. Its goal is to build a machine learning model that detects sleep onset and wakeup times from accelerometer data.

Installation and Setup Instructions

Dependencies:

Clone this repository. The code uses several Python libraries, all of which need to be installed:

  • PyTorch
  • NumPy
  • matplotlib
  • pandas
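Before running anything, you may want to confirm that these libraries can be imported. The check below is a small sketch and not part of the repository. It assumes the standard import names (PyTorch is imported as torch).

```python
import importlib.util

# Import names for the libraries listed above (PyTorch is imported as "torch").
REQUIRED_MODULES = ["torch", "numpy", "matplotlib", "pandas"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

Calling missing_modules() returns an empty list when everything is installed. Any names it returns still need to be installed, for example with pip.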

Data:

Unfortunately, the data files are too large to store in a regular Git repository, and I am not sure I am allowed to include them in a public repository. For these reasons, the data must be downloaded separately.

You can download the data directly from Kaggle and save it in any directory.

Once the data is downloaded, create a new folder named data_path and, inside it, a Python file named data_path.py. In that file, define a string variable named dataPath and set it to the path of the folder containing your data.
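For example, data_path/data_path.py could contain a single assignment like the one below. The path shown is only a placeholder; replace it with the folder where you saved the Kaggle files.

```python
# data_path/data_path.py
# Example path only: replace it with the folder that holds your downloaded data.
dataPath = "/home/user/kaggle/sleep-states"
```

The notebooks can then presumably read the setting with from data_path.data_path import dataPath; this import line is an assumption about how the code loads it.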

Other needed directories:

The last step is to create a folder named batched_data, which is where the tensors of batched data are saved when you run the code.
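If you prefer to create the folder from Python, a helper like the one below is enough. The helper name is hypothetical and not part of the repository; it assumes you run it from the repository root.

```python
from pathlib import Path


def make_output_dir(repo_root="."):
    """Create batched_data/ (hypothetical helper), where the batched tensors are saved."""
    out_dir = Path(repo_root) / "batched_data"
    out_dir.mkdir(parents=True, exist_ok=True)  # safe to run more than once
    return out_dir
```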

Running the code:

The only three notebooks that need to be run for the model to work are:

  • batching.ipynb
  • resize_batched_data.ipynb
  • training.ipynb

These are all located in the essential files folder. Run all cells in each notebook, in the order listed above.
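If you would rather run the notebooks from a script than open each one, the sketch below executes them in order with Jupyter's nbconvert command. It is not part of the repository; it assumes Jupyter and nbconvert are installed and that the folder is literally named "essential files" (adjust ESSENTIAL_DIR if it is not).

```python
import subprocess
from pathlib import Path

# Assumed folder name; adjust to match the repository's essential files folder.
ESSENTIAL_DIR = Path("essential files")
NOTEBOOKS = ["batching.ipynb", "resize_batched_data.ipynb", "training.ipynb"]


def build_commands(folder=ESSENTIAL_DIR):
    """One nbconvert command per notebook, in the required order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         str(Path(folder) / notebook)]
        for notebook in NOTEBOOKS
    ]


def run_all(folder=ESSENTIAL_DIR):
    for cmd in build_commands(folder):
        # check=True raises on the first failing notebook, so later steps
        # never run on missing or incomplete batched data.
        subprocess.run(cmd, check=True)
```

Calling run_all() from the repository root runs the three notebooks one after another, saving the executed outputs back into each notebook.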

To see some of the data visualizations used during development, run the notebooks in the supplementary files folder.

Development process and scope:

Data Manipulation and Visualization:

The first step was to better understand the data.[1] After loading the files into pandas DataFrames, the tables looked like this:

[Image: features table]

[Image: labels table]

The first table contains the features, while the second table contains the labels. Next, the two tables were merged on step, timestamp, and series_ID using a left merge. For development purposes, sample_data can be set to True, which performs the merge on data from only 10 individuals. After that, the following two graphs[2] were created from the 30-minute interval around each event:
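The merge and the sample_data shortcut described above could be written roughly as follows. This is a sketch, not the repository's code: the key column names follow the text (series_ID, step, timestamp) and may need adjusting to the actual file headers, and the helper name and the choice of the first 10 series are assumptions.

```python
import pandas as pd

# Key columns as named in the text; adjust if the real headers differ.
MERGE_KEYS = ["series_ID", "step", "timestamp"]


def merge_features_and_labels(features, labels, sample_data=False, n_people=10):
    """Left-merge the label table onto the feature table (hypothetical helper).

    Every feature row is kept; rows with no matching event get NaN in the
    label columns. With sample_data=True only the first n_people series are
    kept, which makes development runs faster.
    """
    if sample_data:
        keep = features["series_ID"].drop_duplicates().iloc[:n_people]
        features = features[features["series_ID"].isin(keep)]
    return features.merge(labels, on=MERGE_KEYS, how="left")
```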

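[Image: first graph of the 30-minute interval around an event]

[Image: second graph of the 30-minute interval around an event]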

These figures show a clear change in the signal during the interval around each event.
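The interval extraction behind graphs like these could look roughly like the sketch below. It is a hypothetical helper, not the code from graphs.ipynb: it assumes one sample every 5 seconds (12 steps per minute) and reads "30-minute interval" as 15 minutes on either side of the event.

```python
import numpy as np

STEPS_PER_MINUTE = 12  # assumption: one sample every 5 seconds


def event_windows(signal, event_steps, minutes=30):
    """Stack fixed-length windows of signal centred on each event step.

    Events too close to either end of the recording to fit a full window
    are skipped.
    """
    half = minutes * STEPS_PER_MINUTE // 2
    windows = [
        signal[s - half : s + half]
        for s in event_steps
        if s - half >= 0 and s + half <= len(signal)
    ]
    if not windows:
        return np.empty((0, 2 * half))
    return np.stack(windows)
```

Averaging the rows, for example event_windows(signal, onset_steps).mean(axis=0), and plotting the result with matplotlib gives an average profile of the signal around onsets.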

Before the data could be fed into a model, it had to be cleaned. First, the beginning and end of each individual's recording were trimmed so that every person's data could be divided into a whole number of fixed-length series. Next, labels had to be created. Because the events in the dataset originate from sleep logs, their exact timestamps are unreliable, which makes it difficult to build a model that predicts exactly when an event occurred. Instead, it is more practical to build a model that predicts whether the person is asleep at each timestamp. Each label sequence therefore has the same length as its series, with 0 representing an awake state and 1 representing a sleep state.
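The labeling and trimming steps above can be sketched with NumPy. This is an illustration under assumptions, not the repository's code: the event format (a list of (step, kind) pairs), the helper names, and splitting the excess evenly between the start and end of the recording are all hypothetical.

```python
import numpy as np


def sleep_labels(n_steps, events):
    """Per-step labels: 1 = asleep, 0 = awake.

    events is a list of (step, kind) pairs with kind "onset" or "wakeup".
    Every step from an onset up to (not including) the next wakeup is marked 1.
    An onset with no following wakeup is ignored in this sketch.
    """
    labels = np.zeros(n_steps, dtype=np.int8)
    onset = None
    for step, kind in sorted(events):
        if kind == "onset":
            onset = step
        elif kind == "wakeup" and onset is not None:
            labels[onset:step] = 1
            onset = None
    return labels


def trim_to_series(values, series_len):
    """Trim the start and end so the length is a whole multiple of series_len,
    then split into shape (n_series, series_len, ...)."""
    excess = len(values) % series_len
    start = excess // 2                    # drop about half the excess at the beginning
    end = len(values) - (excess - start)   # and the rest at the end
    return values[start:end].reshape(-1, series_len, *values.shape[1:])
```

Applying the same trim_to_series call to both the feature array and its labels keeps them aligned step for step.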

The Model:

The model chosen was a Long Short-Term Memory (LSTM) network, because its recurrent design is well suited to time-series problems like this one. However, 24 hours of data is too much context for an LSTM to learn from in this case. Instead, each sample covers a 1-hour period around an event, with the event placed at a random position within that period.
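The window sampling described above might look like the following NumPy sketch. The helper name, the 720-step window (1 hour at an assumed 5-second sampling rate), and the clipping at the edges of the recording are assumptions, not details confirmed by the repository.

```python
import numpy as np

STEPS_PER_HOUR = 720  # assumption: one sample every 5 seconds


def sample_event_window(features, labels, event_step, window_len=STEPS_PER_HOUR, rng=None):
    """Cut a window_len-step window that contains event_step at a random position.

    Returns (feature_window, label_window, event_position_in_window).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(features)
    if n < window_len:
        raise ValueError("recording is shorter than the window")
    offset = int(rng.integers(0, window_len))   # desired position of the event
    start = event_step - offset
    start = min(max(start, 0), n - window_len)  # keep the window inside the recording
    end = start + window_len
    return features[start:end], labels[start:end], event_step - start
```

Clipping can move the event away from the sampled offset near the edges of a recording, but the event always stays inside the window.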

[Image: 1-hour window around an event]
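A per-timestep LSTM classifier matching the description above could be sketched in PyTorch as follows. The layer sizes, the number of input features, the single-logit output head, and the binary cross-entropy loss are illustrative choices, not the repository's actual configuration.

```python
import torch
from torch import nn


class SleepLSTM(nn.Module):
    """Per-timestep sleep classifier: one logit per step (sleep vs. awake)."""

    def __init__(self, n_features=2, hidden_size=64, num_layers=2, bidirectional=False):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=n_features,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,  # inputs are (batch, seq_len, n_features)
            bidirectional=bidirectional,
        )
        out_dim = hidden_size * (2 if bidirectional else 1)
        self.head = nn.Linear(out_dim, 1)  # maps each hidden state to a single logit

    def forward(self, x):
        out, _ = self.lstm(x)              # (batch, seq_len, out_dim)
        return self.head(out).squeeze(-1)  # (batch, seq_len) logits


def train_step(model, optimizer, x, y):
    """One optimization step with binary cross-entropy on the 0/1 sleep labels."""
    model.train()
    optimizer.zero_grad()
    logits = model(x)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, y.float())
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def predict_sleep(model, x, threshold=0.5):
    """Return 1 where the predicted sleep probability exceeds threshold, else 0."""
    model.eval()
    return (torch.sigmoid(model(x)) > threshold).long()
```

Because the model emits one logit per step, its output lines up with the 0/1 label sequences. Setting bidirectional=True would let each step also use later context within the 1-hour window.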

This approach proved effective: after training, the model achieved the following results:

[Images: training results (three figures)]

Footnotes

  1. To understand the structure of the dataset, click here.

  2. These graphs can be seen in graphs.ipynb.
