extend mergekit to make it work on xpu #580
Conversation
@cg123, could you please help review? Thanks very much.
Thank you for the PR, this is a very welcome addition! Left some comments but nothing huge.
@@ -529,7 +529,7 @@ def _move_tensors(
         self, value: Any, device: torch.device, non_blocking: Optional[bool] = None
     ) -> Any:
         if non_blocking is None:
-            non_blocking = device.type == "cuda"
+            non_blocking = device.type in ["cuda", "xpu"]
👍
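For context, here is a minimal sketch of the pattern the diff above follows; the helper name is illustrative, not mergekit's actual API. Non-blocking copies only pay off for host-to-accelerator transfers, so the flag is derived from the target device type:

```python
import torch

def move_tensor(value: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Async (non-blocking) copies are beneficial for accelerator targets;
    # for CPU targets a blocking copy is the safe default.
    non_blocking = device.type in ["cuda", "xpu"]
    return value.to(device, non_blocking=non_blocking)
```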
-        stream = torch.cuda.Stream(device=device)
-        with torch.cuda.stream(stream):
+        stream = (
+            torch.Stream(device=device)
Maybe we should just use `torch.Stream` in all cases?
We can. The only concern is that `torch.Stream` is only available from torch 2.5, but `mergekit`'s `torch` dependency is >= 2.0 now. So if a user installs torch 2.4, it will crash. Please let me know your thoughts.
Okay, fair! This is good for now then. I'll look at it if I bump the minimum torch version in the future.
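A hedged sketch of the compromise discussed above, assuming only what the thread states (`torch.Stream` is available from torch 2.5; `torch.cuda.Stream` exists on older versions). The guard and function name here are illustrative, not mergekit's actual code:

```python
import torch
from packaging import version

def make_stream(device: torch.device):
    # torch.Stream is the device-agnostic stream API added in torch 2.5;
    # on older versions only the CUDA-specific stream API exists.
    if version.parse(torch.__version__) >= version.parse("2.5"):
        return torch.Stream(device=device)
    if device.type == "cuda":
        return torch.cuda.Stream(device=device)
    raise RuntimeError(
        f"No stream API for device {device} on torch {torch.__version__}"
    )
```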
@cg123, thanks very much for your comments. I've updated per your comments; please help review again, thanks.
@cg123, could you please help review again? Thanks.
Think this is good to go now. Thanks for your patience and the PR!
Since PyTorch 2.5, xpu has been a built-in device of PyTorch. In this PR, I extend the accelerator device of `mergekit` from CUDA only to Intel XPU (which is the name of Intel's GPUs), and potentially to more devices, by adding a new `device` field in `MergeOptions`. I've passed the UT cases with:

@cg123, please help review and let me know what I need to do next, thanks very much.
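To illustrate how a device option like this might be consumed, here is a sketch under assumptions: the `device` field itself is from the PR description, but the function name and the fallback probing order below are hypothetical, not mergekit's actual wiring.

```python
from typing import Optional
import torch

def resolve_device(requested: Optional[str] = None) -> torch.device:
    # Honor an explicit device choice first (e.g. "cuda", "xpu", "cpu"),
    # otherwise probe for whichever accelerator is present.
    if requested is not None:
        return torch.device(requested)
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.xpu only exists on builds with XPU support, hence the hasattr guard.
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")
```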