Matrix Caching

Table of Contents

* Overview
* Functional Programming
* Explicit Assignment
* Safe Iteration

Overview

Memory on a GPU is a much scarcer resource than on a CPU (currently 3-12 GB is typical). Furthermore, at least in CUDA, GPU memory allocation through the C API is very expensive: it involves interaction between the CPU and GPU and carries a high, almost-fixed overhead. For example, a regression algorithm on the GPU (without caching) that allocated 1 MB blocks spent 98% of its time in malloc, and only 2% of its time computing. So even with a custom garbage collector, fully automatic storage management on the GPU seems very difficult.

Fortunately, in machine learning applications data is often consumed in fixed-size minibatches. The minibatches, and all the intermediate results derived from them, have fixed sizes from one minibatch update to the next. This suggests that caching can be a very effective strategy for GPU storage management. Caching is enabled globally by setting Mat.useCache=true, as the sketch below illustrates.
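A minimal usage sketch, assuming the standard BIDMat imports and using CPU FMat matrices for illustration: once the cache is enabled, repeating an operation on the same pair of matrices returns the same result container instead of allocating a new one.

```scala
import BIDMat.{FMat, Mat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._

Mat.useCache = true        // enable the global matrix cache

val a = rand(1000, 1000)   // a and b are created once and reused,
val b = rand(1000, 1000)   // as a minibatch buffer typically would be

val c1 = a * b             // first call allocates the result and caches it
val c2 = a * b             // second call reuses the cached container
assert(c1 eq c2)           // same object: no new allocation occurred
```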

To support caching, every matrix has a unique 64-bit GUID field which is assigned when the matrix is created. Whenever an operator or function is applied to matrix arguments, a cache key is created from the GUIDs of the arguments and the function or operator to be applied. For example,

> val c = a * b

with FMat arguments a and b will cause a check of a matrix cache with key

(a.GUID, b.GUID, "*".##)

where the third argument is a hash of the string "*" representing the operator. If the cache entry is non-null, it should be a container of the appropriate size to hold the result c. If the cache entry is null, then a new matrix of the appropriate size is created and saved in the cache. This works because the sizes (but not necessarily the contents) of matrices are immutable, and because the size of the result of an operator like * depends only on the sizes of its arguments. Thus the result c always has the same size, and its container can be reused.
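The lookup itself can be pictured as follows. This is not BIDMat's actual implementation; it is a sketch of the pattern using a hypothetical MatCache object and cachedMult function: build a key from the argument GUIDs and the operator's hash, reuse the stored container when it exists, and otherwise allocate a result of the right size and remember it.

```scala
import scala.collection.mutable
import BIDMat.FMat
import BIDMat.MatFunctions._

// Hypothetical illustration of the caching pattern described above;
// the real cache in BIDMat is more elaborate.
object MatCache {
  private val cache = mutable.HashMap.empty[(Long, Long, Int), FMat]

  // Return a result container for op(a, b), reusing a cached one when available.
  def newOrCached(a: FMat, b: FMat, op: String)(alloc: => FMat): FMat = {
    val key = (a.GUID, b.GUID, op.##)
    cache.getOrElseUpdate(key, alloc)
  }
}

def cachedMult(a: FMat, b: FMat): FMat = {
  // The result of a * b always has size a.nrows x b.ncols, so a container
  // cached under the key (a.GUID, b.GUID, "*".##) is guaranteed to fit.
  val c = MatCache.newOrCached(a, b, "*")(zeros(a.nrows, b.ncols))
  // ... the multiply that writes into c is omitted; the point here is
  // that c is the same container on every call with the same a and b ...
  c
}
```

Because the key depends only on the argument GUIDs and the operator, the same container is returned on every minibatch iteration, so allocation happens only on the first pass.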

Functional Programming

Explicit Assignment

Safe Iteration
