Some users want to train on systems with GPUs, save the model, and then load it on CPU-only systems in production for inference. This use case is fully supported, but some users are unaware of it, have concerns about it, or believe the results may differ (beyond floating-point accumulation error). Examples include this Twitter thread and several issues from the past few months (dmlc#8362, dmlc#8148, and dmlc#8047).
This might be a great thing to add to the XGBoost docs (maybe in the Model IO section).
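For reference, a minimal sketch of the workflow such docs could illustrate (assuming an XGBoost build with CUDA support on the training machine; `tree_method="gpu_hist"` is the pre-2.0 way to request GPU training, and the data here is synthetic):

```python
import numpy as np
import xgboost as xgb

# --- On the GPU machine: train and save ---
X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, size=1000)
dtrain = xgb.DMatrix(X, label=y)

# gpu_hist trains on the GPU; the saved model itself is device-agnostic
booster = xgb.train(
    {"tree_method": "gpu_hist", "objective": "binary:logistic"},
    dtrain,
    num_boost_round=10,
)
booster.save_model("model.json")

# --- On the CPU-only machine: load and predict ---
booster_cpu = xgb.Booster()
booster_cpu.load_model("model.json")
preds = booster_cpu.predict(xgb.DMatrix(X))  # runs on CPU
```

The key point the docs could make explicit is that the serialized model stores only the trees and learner configuration, not the device it was trained on, so loading and predicting on a CPU-only host just works.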