
Matrix Caching


Overview

Memory on a GPU is a much scarcer resource than on a CPU (3-12 GB is currently typical). Not only that, but at least in CUDA, GPU memory allocation through the C API is very expensive: it involves interaction between the CPU and the GPU and carries a high, almost-fixed overhead. For example, a regression algorithm on the GPU (without caching) that allocated 1 MB blocks spent 98% of its time in malloc and only 2% of its time computing. So even with a custom garbage collector, fully automatic storage management on the GPU seems very difficult.

Fortunately, in machine learning applications, data is often consumed in fixed-size minibatches. The minibatches, and all the intermediate results derived from them, keep the same dimensions from one minibatch update to the next, so the buffers that hold them can be reused rather than reallocated.
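This fixed-size structure is what makes caching practical: a result buffer allocated on the first minibatch can simply be handed back on every later one. Below is a minimal sketch of that idea in Scala. The names (`FMat`, `MatCache`, `getOrAlloc`) and the string-tag keying are hypothetical simplifications for illustration, not BIDMat's actual interface.

```scala
import scala.collection.mutable.HashMap

// Hypothetical dense matrix wrapper over a flat column-major Float array.
class FMat(val nrows: Int, val ncols: Int) {
  val data = new Array[Float](nrows * ncols)
}

object MatCache {
  // Cached buffers keyed by an operation tag plus result dimensions.
  private val cache = new HashMap[(String, Int, Int), FMat]

  // Reuse the buffer allocated on the first minibatch; allocate only on a miss.
  def getOrAlloc(tag: String, nrows: Int, ncols: Int): FMat =
    cache.getOrElseUpdate((tag, nrows, ncols), new FMat(nrows, ncols))
}

// Per-minibatch update: the intermediate result lands in the same cached
// buffer on every iteration, so allocation cost is paid only once.
def minibatchStep(x: FMat): FMat = {
  val scaled = MatCache.getOrAlloc("scaled", x.nrows, x.ncols)
  var i = 0
  while (i < x.data.length) {
    scaled.data(i) = 0.5f * x.data(i)   // stand-in for real model arithmetic
    i += 1
  }
  scaled
}
```

With this pattern, the expensive GPU allocation happens once per distinct buffer size rather than once per minibatch, which is the behavior the overview above motivates.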
