Matrix Caching
Memory on a GPU is a much scarcer resource than on a CPU (currently 3-12GB is typical). Not only that, but at least in CUDA, GPU memory allocation through the C API is very expensive: it involves interaction between the CPU and GPU and has a high, almost fixed overhead. For example, a regression algorithm on the GPU (without caching) that allocated 1 MB blocks spent 98% of its time in malloc and only 2% of its time computing. So even with a custom garbage collector, fully-automatic storage management on the GPU seems very difficult.
Fortunately, in machine learning applications data is often consumed in fixed-size minibatches. The minibatches, and all the intermediate results derived from them, have fixed size from one minibatch update to the next. This suggests that caching could be a very effective strategy for GPU storage management. Caching is enabled globally by setting Mat.useCache=true.
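For example, in a BIDMat/BIDMach shell the cache might be enabled before creating any matrices. This is a minimal sketch; the matrix sizes are illustrative and rand is assumed to be in scope (as it is in the standard shell):
> Mat.useCache = true       // turn on the global matrix cache
> val a = rand(1000, 1000)  // fixed-size inputs (sizes are illustrative)
> val b = rand(1000, 1000)
> val c = a * b             // first evaluation allocates the result container and caches it
> val d = a * b             // later evaluations with the same key reuse that container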
To support caching, every matrix has a unique, 64-bit GUID field which is assigned when the matrix is created. Whenever an operator or function is applied to matrix arguments, a cache key is created from the GUIDs of the arguments and the function or operator to be applied. e.g.
> val c = a * b
with FMat arguments a and b will cause a check of a matrix cache with key
(a.GUID, b.GUID, "*".##)
where the third argument is a hash of the string "*" representing the operator. If the cache entry is non-null, it should be a container of the appropriate size to hold the result c. If the cache entry is null, then a new matrix of the appropriate size is created and saved in the cache. This works because the sizes (but not necessarily the contents) of matrices are immutable, and because the size of the result of an operator like * depends only on the sizes of its arguments. Thus the size of the result c must always be the same.
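The lookup can be pictured roughly as follows. This is only a sketch of the idea in plain Scala, not BIDMat's actual implementation: the cache map, the cachedMultiply helper, and the use of a flat Array[Float] as the result container are all hypothetical.

import scala.collection.mutable

object OpCacheSketch {
  // Cache keyed by (left GUID, right GUID, operator hash); values are result containers.
  private val cache = mutable.HashMap.empty[(Long, Long, Int), Array[Float]]

  // Look up (or create) the container for a * b and fill it with the result.
  def cachedMultiply(aGUID: Long, bGUID: Long, nrows: Int, ncols: Int,
                     compute: Array[Float] => Unit): Array[Float] = {
    val key = (aGUID, bGUID, "*".##)
    val out = cache.getOrElseUpdate(key, new Array[Float](nrows * ncols))
    compute(out)    // the result is written into the (possibly reused) container
    out
  }
}

The essential point is that getOrElseUpdate allocates at most once per distinct key, so steady-state minibatch updates perform no new allocations.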
As a final postscript, we have found that using caching improves the performance of CPU code as well as GPU code, sometimes significantly. For example, on laptops with limited memory, caching sometimes doubles or triples performance.
Caching is designed to support algorithms that work with fixed size blocks of (matrix) data. With that constraint, it is quite easy to use but has some hazards which we cover here.
In a functional language, data objects are immutable and computation involves deriving new objects from old ones using operators and functions. Functional languages are referentially transparent, meaning A+B always means the same thing in the scope of a particular pair of declarations for A and B. BIDMat, like its parent language Scala, encourages functional programming but is not "pure". Some objects, matrices in particular, are mutable for very good reasons (performance being high among them). Most of the functions and operators in BIDMat can be used functionally, but also support mutation for efficiency. e.g.
> val b = sin(a)
creates a new container for b, while
> sin(a,b)
stores its result into an existing b, mutating b as a result. You can use either approach, but functional style is almost always easier to read and less error-prone. This is true in general, but especially when using caching, because there are subtle interactions that can occur with cached data. e.g.
> val c = a * b        // creates cached c with key (a.GUID, b.GUID, "*".##)
> c(ii) = vv           // a mutating operation on c, bad idea
> val d = (a * b) - x  // (a*b) container in cache, gets rewritten, modifying c
That is, there is only one container for a*b, and any changes to it will be overwritten by any later occurrence of that expression.
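One simple way to avoid this particular hazard is to copy a cached result before mutating it, so that later occurrences of the expression rewrite only the cache's own container. This is a sketch assuming the matrix copy method; the variable names follow the example above.
> val c = (a * b).copy  // c is a private container, independent of the cached (a*b) entry
> c(ii) = vv            // safe: mutates the copy, not the cached container
> val d = (a * b) - x   // the cached container is rewritten, but c is unaffected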
These problems can only occur when mutating operations are used. With functional steps only, no containers are modified, and results will always be correct. However, there will always be a need to mutate matrix contents. We quickly review mutating operations, and then discuss how to structure code to use mutation safely.