
Matrix Caching


Overview

Memory on a GPU is a much scarcer resource than on a CPU (3-12 GB is currently typical). Not only that, but at least in CUDA, GPU memory allocation through the C API is very expensive: it involves interaction between the CPU and the GPU and carries a high, almost-fixed overhead. For example, a regression algorithm on the GPU (without caching) that allocated 1 MB blocks spent 98% of its time in malloc and only 2% of its time computing. So even with a custom garbage collector, fully automatic storage management on the GPU seems very difficult.

Fortunately, in machine learning applications, data is often consumed in fixed-size minibatches. The minibatches, and all the intermediate results derived from them, keep the same dimensions from one minibatch update to the next, so the buffers that hold them can be reused rather than reallocated.
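This fixed-size structure is what makes caching practical: a result buffer allocated on the first minibatch can simply be handed back on every later one. Below is a minimal sketch of that idea in Scala. The names (`FMat`, `MatCache`, `getOrAlloc`) and the string-tag keying are hypothetical simplifications for illustration, not BIDMat's actual interface.

```scala
import scala.collection.mutable.HashMap

// Hypothetical dense matrix wrapper over a flat column-major Float array.
class FMat(val nrows: Int, val ncols: Int) {
  val data = new Array[Float](nrows * ncols)
}

object MatCache {
  // Cached buffers keyed by an operation tag plus result dimensions.
  private val cache = new HashMap[(String, Int, Int), FMat]

  // Reuse the buffer allocated on the first minibatch; allocate only on a miss.
  def getOrAlloc(tag: String, nrows: Int, ncols: Int): FMat =
    cache.getOrElseUpdate((tag, nrows, ncols), new FMat(nrows, ncols))
}

// Per-minibatch update: the intermediate result lands in the same cached
// buffer on every iteration, so allocation cost is paid only once.
def minibatchStep(x: FMat): FMat = {
  val scaled = MatCache.getOrAlloc("scaled", x.nrows, x.ncols)
  var i = 0
  while (i < x.data.length) {
    scaled.data(i) = 0.5f * x.data(i)   // stand-in for real model arithmetic
    i += 1
  }
  scaled
}
```

With this pattern, the expensive GPU allocation happens once per distinct buffer size rather than once per minibatch, which is the behavior the overview above motivates.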
