Integrated oversampling

mfrdixon edited this page Mar 14, 2017 · 23 revisions

Background

Sequence classification problems are ubiquitous and arise whenever data exhibit spatio-temporal structure. Examples include traffic prediction, earthquake prediction, and even predicting the outcome of auction systems such as those in the financial markets. Recurrent neural networks (RNNs), such as Long Short-Term Memory (LSTM) networks, are well suited to these problems. Often, however, the classes in the sequence are strongly imbalanced, and the challenge is how to resample the training set while preserving its temporal structure. Integrated oversampling provides a solution to this problem.
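To make the resampling challenge concrete, here is a minimal sketch of a simple baseline: cut the series into fixed-length windows, then duplicate whole minority-class windows at random. This is plain random oversampling, not the integrated oversampling method from the related work. The function names `make_windows` and `oversample_windows`, the toy data, and the choice to label each window by its final time step are all assumptions made for illustration.

```python
import random
from collections import Counter


def make_windows(series, labels, length):
    """Slice a series into overlapping windows of `length` steps.

    Each window keeps its original time order and is labelled with the
    class of its final step (an assumption made for this sketch).
    """
    windows, window_labels = [], []
    for end in range(length, len(series) + 1):
        windows.append(series[end - length:end])
        window_labels.append(labels[end - 1])
    return windows, window_labels


def oversample_windows(windows, window_labels, seed=0):
    """Duplicate whole minority-class windows at random until every class
    has as many windows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(window_labels)
    target = max(counts.values())
    out_windows, out_labels = list(windows), list(window_labels)
    for cls, n in counts.items():
        pool = [w for w, y in zip(windows, window_labels) if y == cls]
        for _ in range(target - n):
            # Copy an entire window so its internal ordering is untouched.
            out_windows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_windows, out_labels


# Toy imbalanced series: class 1 is rare.
series = [0.1, 0.2, 0.1, 0.9, 0.2, 0.1, 0.3, 0.2, 0.8, 0.1]
labels = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

windows, window_labels = make_windows(series, labels, length=3)
balanced_windows, balanced_labels = oversample_windows(windows, window_labels)
```

Because only whole windows are copied, the ordering inside each training example is preserved. In practice the resampling would be applied to the training split only, after a chronological train/test split, so that duplicated windows do not leak into the evaluation data.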

Related work

Cao H., Li X.-L., Woon D. Y.-K. and Ng S.-K. (2013) Integrated Oversampling for Imbalanced Time Series Classification. IEEE Transactions on Knowledge and Data Engineering, vol 25 (12).

Liang G., Zhang C. (2012) A Comparative Study of Sampling Methods and Algorithms for Imbalanced Time Series Classification. In: Thielscher M., Zhang D. (eds) AI 2012: Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol 7691. Springer, Berlin, Heidelberg.

Details of this project

The goal of this project is to implement, assess and refine the method of integrated oversampling. The technique will be demonstrated with LSTMs applied to various imbalanced time series data sets, including traffic prediction and high-frequency trading.

Expected impact

An integrated oversampling package will support the application of LSTMs and other RNNs to real-world time series problems plagued by class imbalance.

Mentors

Please contact Matthew Dixon if you are a student interested in this project.

Tests

TBA.
