Hi all,
I need to process batches from a sparse document-term matrix, retrieving both the densified rows and their corresponding indices. To enable JIT compilation, JAX requires me to convert the matrix into a sparse BCOO format.
However, indexing the sparse matrix efficiently is challenging. The best performance occurs when I first fully densify the matrix and then extract the batched rows. Unfortunately, this approach is impractical for large matrices, as it consumes too much memory. On the other hand, if I attempt to index the sparse matrix first and then densify only the selected rows, I run out of memory (OOM).
You can find a minimal example below. I am running the code on an AWS ml.g5.2xlarge instance. When executing the `get_batch_2()` function, I get the error:

`XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 148330749608 bytes.`

That is roughly 138 GiB. When executing the `get_batch_1()` function, things work. This is curious to me, since `get_batch_1()` densifies the whole matrix, whereas `get_batch_2()` only densifies the batch. If I double the number of documents, both approaches fail.

Can anyone help?
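A minimal sketch of the two approaches described above (the toy matrix, its shape, and the function bodies are illustrative assumptions; only the names `get_batch_1()`/`get_batch_2()` come from the post):

```python
import jax.numpy as jnp
from jax.experimental import sparse

# Toy stand-in for the document-term matrix (the real one is far larger):
# 100 "documents", 50 "terms", one nonzero per row.
dense = jnp.zeros((100, 50)).at[jnp.arange(100), jnp.arange(100) % 50].set(1.0)
mat = sparse.BCOO.fromdense(dense)  # sparse BCOO format, as required for JIT

def get_batch_1(mat, idx):
    # Densify the whole matrix, then take the batch rows.
    # Works here, but needs memory for the full dense matrix.
    return mat.todense()[idx]

def get_batch_2(mat, idx):
    # Index the sparse matrix first, then densify only the batch
    # (assuming integer-array indexing on BCOO, as the OOM report suggests).
    return mat[idx].todense()

idx = jnp.array([0, 3, 7])  # hypothetical batch indices
batch = get_batch_1(mat, idx)  # shape (3, 50)
```

Both functions should return the same dense batch; the question is why the seemingly cheaper `get_batch_2()` allocates more memory than the full densification.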
Thanks!