I noticed there's a lot of sparse matrix code that does things like:
```julia
for col = 1:n, p = A.colptr[col]:(A.colptr[col+1]-1)
    C.nzval[p] = A.nzval[p] * b[A.rowval[p]]
end
```
The compiler presently assumes that the contents of `C.nzval`, `A.nzval`, and `A.rowval` can alias the parts of `C` and `A` that contain the pointers to the array objects, and thus needs to reload those pointers on each loop iteration. The performance cost depends on how much is in cache, but for medium-sized sparse matrices, the following version with the field loads hoisted out of the loop can be ~30% faster on my system:
```julia
Anzval = A.nzval
Arowval = A.rowval
Cnzval = C.nzval
for col = 1:n, p = A.colptr[col]:(A.colptr[col+1]-1)
    Cnzval[p] = Anzval[p] * b[Arowval[p]]
end
```
If `@inbounds` is added, the performance difference is even larger, since the optimizer can hoist the loads of both the pointer to the array object and the pointer to the array data in the second case but not the first.
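To make the hoisted pattern above concrete, here is a minimal self-contained sketch of it as a function. The function name `scale_rows!` and the small test matrix are my own illustration, not code from Base; I am assuming the standard `SparseMatrixCSC` field layout (`colptr`, `rowval`, `nzval`).

```julia
using SparseArrays

# Scale each stored entry A[i,j] by b[i], writing the result into C.nzval.
# Hoisting the field loads outside the loop (and using @inbounds) lets the
# optimizer keep the array pointers in registers across iterations.
function scale_rows!(C::SparseMatrixCSC, A::SparseMatrixCSC, b::Vector)
    n = size(A, 2)
    Acolptr = A.colptr   # hoisted field loads: read each pointer once
    Arowval = A.rowval
    Anzval  = A.nzval
    Cnzval  = C.nzval
    @inbounds for col = 1:n, p = Acolptr[col]:(Acolptr[col+1]-1)
        Cnzval[p] = Anzval[p] * b[Arowval[p]]
    end
    return C
end

# Tiny usage example: a 3x3 diagonal sparse matrix.
A = sparse([1, 2, 3], [1, 2, 3], [1.0, 2.0, 3.0])
b = [10.0, 20.0, 30.0]
C = copy(A)
scale_rows!(C, A, b)
C.nzval  # [10.0, 40.0, 90.0]
```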
Since #8867, I suspect the performance gap would disappear if `SparseMatrixCSC` were made immutable. However, there is code that relies on its mutability, and code patterns like this are probably also present outside of Base. Thus, I wonder whether we could place some restrictions on when array pointers and mutable objects can alias each other, so that extracting variables isn't necessary to achieve optimal performance.
I wonder whether it ever happens that arrays alias objects. This is possible using `pointer_to_array(convert(Ptr{T}, pointer_from_objref(x)), ...)`, but it probably isn't common. It would also be sufficient to tell TBAA that arrays never alias the portions of objects that contain pointers.
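For completeness, here is a minimal sketch of that aliasing trick. On current Julia the spelling is `unsafe_wrap` rather than the older `pointer_to_array`; the `Wrapper` type is a hypothetical example, and the code assumes the usual inline field layout for a mutable struct with two `Int` fields.

```julia
# Hypothetical mutable type; its two Int fields are stored inline.
mutable struct Wrapper
    x::Int
    y::Int
end

w = Wrapper(1, 2)
GC.@preserve w begin
    # Reinterpret the object's field storage as a 2-element Int array.
    p = convert(Ptr{Int}, pointer_from_objref(w))
    a = unsafe_wrap(Array, p, 2)
    a[1] = 42   # writing through the array...
end
w.x             # ...is visible through the object: they alias
```

This is exactly the kind of unsafe construction that makes a blanket "arrays never alias objects" TBAA assumption technically unsound, even though real code almost never does it.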