Memory Issues with Sparse Vectors in XGBoost4j-Spark: Disabling Sparse-to-Dense Conversion #11467
Unanswered
stepanov1997 asked this question in Q&A

Hi everyone,

I’m using sparse vectors with about 10 non-zero features out of a possible 50 million. However, the conversion to dense vectors is causing heap exhaustion. Is there a way to disable the sparse-to-dense conversion?

Right now I can’t even train on a small batch of vectors without running into memory issues, but I ultimately need to train on 200 million rows.

Any help would be greatly appreciated. I’m using XGBoost4j-Spark version 3.0.0 with the Java API.

Thanks!
Replies: 2 comments · 2 replies
-
cc @wbo4958
0 replies
-
Hey @trivialfis and @wbo4958, given the lack of proper handling for sparse vectors, can we expect this issue to be fixed in the near future, or should I consider giving up on distributed training for now?
2 replies