This is the PyTorch implementation for the empirical evaluation of data augmentation methods for wearable sensor data.
🚩News(May 06, 2022): Quantitatively measure the effectiveness of Data Augmentation methods.
🚩News(April 06, 2022): Implement the Differentiable Automatic Data Augmentation method.
🚩News(Mar 22, 2022): Implement the RandAugment method.
🚩News(Mar 09, 2022): Edit the README file, and upload the project to GitHub.
🚩News(Mar 08, 2022): Clean and rebuild the whole project; all backbone and DA methods have passed the test example.
- Python 3.7
- numpy
- PyTorch >= 1.7
- scikit-learn
The data used in this project comes from two main sources:
- the UCR Time Series Archive, which contains 128 univariate time series datasets (including ECG data)
- PhysioNet, which hosts various physiological datasets.
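Both sources can be exported as tab-separated files like the `data.tsv` used in the run command below. As a rough sketch (assuming the UCR Archive convention of one series per row, with the class label in the first column), such a file can be loaded with NumPy; the function name `load_ucr_tsv` is ours, not part of this repository:

```python
import numpy as np

def load_ucr_tsv(path):
    """Load a UCR-style .tsv file: each row is one time series,
    first column = class label, remaining columns = values.
    (Layout assumed from the UCR Archive convention; adjust if
    your files differ.)"""
    data = np.loadtxt(path, delimiter="\t")
    labels = data[:, 0].astype(int)       # integer class labels
    series = data[:, 1:]                  # shape: (num_samples, input_length)
    return series, labels
```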
Figure 1. Examples of different data augmentation methods.
To run a model on one dataset, you should issue the following command:
```shell
python ./main.py --jitter --data_dir=./data/ --train_data_file=data.tsv --test_data_file=data.tsv --model_weights_dir=./experiments --log_dir=./logs --num_epoches=50 --lr=0.001 --decay_rate=0.9 --step_size=5 --batch_size=100 --cuda_device=cuda:0 --model=resnet-1d --input_length=750 --input_channels=1 --num_classes=42
```
The detailed descriptions of the arguments are as follows:
Parameter name | Description of parameter |
---|---|
jitter | use the jitter augmentation method; other options include scaling, permutation, rotation, magwarp, timewarp, windowslice, windowwarp |
data_dir | folder containing the data |
train_data_file | filename of the training data |
test_data_file | filename of the testing data |
model_weights_dir | folder to save model parameters |
log_dir | folder to save log file |
num_epoches | number of training epochs |
lr | learning rate |
decay_rate | decay rate for learning rate |
step_size | the learning rate is decayed every step_size epochs |
batch_size | batch size |
cuda_device | the name of cuda device |
model | backbone model to use; options are "mlp", "conv-1d", "resnet-1d" |
input_length | the length of the input sequence |
input_channels | the number of channels for the time series data |
num_classes | the number of classes |
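For intuition, here is a minimal NumPy sketch of two of the augmentation options listed above (jitter and scaling). The `sigma` defaults are illustrative only, not necessarily the values used in this repository:

```python
import numpy as np

def jitter(x, sigma=0.03):
    """Jitter: add Gaussian noise to every time step.
    x: array of shape (input_channels, input_length).
    sigma is an illustrative default."""
    return x + np.random.normal(0.0, sigma, size=x.shape)

def scaling(x, sigma=0.1):
    """Scaling: multiply each channel by a random factor drawn around 1."""
    factors = np.random.normal(1.0, sigma, size=(x.shape[0], 1))
    return x * factors
```

Both transforms preserve the input shape, so they can be applied on the fly to training batches without changing the model's expected `input_length` or `input_channels`.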
Table 1. Performance of 8 different DA methods and three backbones (MLP, Conv-1D, and ResNet-1D) on nine datasets. Bold numbers indicate the best performance. The top three most effective DA methods are shaded in gray, and the grayscale intensity reflects the corresponding improvement.
Here are some related papers and GitHub projects.
- Data Augmentation of Wearable Sensor Data for Parkinson’s Disease Monitoring using Convolutional Neural Networks (2017)[Link]
- Data Augmentation for Time Series Classification using Convolutional Neural Networks (2016)[Link]
- Data Augmentation with Suboptimal Warping for Time-Series Classification (2020)[Link]
- Time Series Data Augmentation for Deep Learning: A Survey (IJCAI-2021)[Link]
- Role of Data Augmentation Strategies in Knowledge Distillation for Wearable Sensor Data (2022)[Link]
- Transformer Networks for Data Augmentation of Human Physical Activity Recognition (2021)[Link]
- ActivityGAN: generative adversarial networks for data augmentation in sensor-based human activity recognition (2020)[Link]
To be continued.
None.
We would like to thank the providers of the UCR Time Series Archive and PhysioNet datasets. Part of the code is adapted from the GitHub projects Time Series Augmentation and Deep Learning for Time Series Classification.