Commit 6f26df9

Document the tpu-v6e-ci cluster used in E2E tests (#155)
1 parent 825320e commit 6f26df9

1 file changed: 17 additions, 0 deletions

e2e_testing/README.md

Lines changed: 17 additions & 0 deletions
@@ -21,5 +21,22 @@ model, and runs a few checks. This is implemented in
 - Check for specific log strings that indicate training success.
 - Check that there is a profile `.pb` file.
 
+## v6e XPK cluster
+
+E2E tests are launched onto an XPK cluster named `tpu-v6e-ci`.
+
+To heal or re-create the cluster, use the following:
+
+```sh
+xpk cluster create \
+--tpu-type v6e-4 \
+--cluster tpu-v6e-ci \
+--num-slices 48 \
+--on-demand \
+--zone us-central2-b \
+--project tpu-pytorch \
+--default-pool-cpu-machine-type=n2-standard-32
+```
+
 [e2e-test]: /.github/workflows/e2e_test.yml
 [e2e-check]: /.github/workflows/reusable_e2e_check.yml
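
Since the documented command provisions 48 slices, it can help to preview the exact invocation before running it. The following is a minimal sketch and not part of the commit: it wraps the same flags in shell variables and only prints the command unless `DRY_RUN=0` is set. The `DRY_RUN` variable and the `run` and `create_cluster` helper names are hypothetical; the flag values are copied from the README change above.

```sh
#!/bin/sh
# Sketch only: DRY_RUN, run, and create_cluster are illustrative names,
# not part of xpk or of the repository.

# Cluster settings, copied from the README change.
CLUSTER="tpu-v6e-ci"
TPU_TYPE="v6e-4"
NUM_SLICES=48
ZONE="us-central2-b"
PROJECT="tpu-pytorch"
CPU_MACHINE_TYPE="n2-standard-32"

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Assemble the same `xpk cluster create` call that the README documents.
create_cluster() {
  run xpk cluster create \
    --tpu-type "$TPU_TYPE" \
    --cluster "$CLUSTER" \
    --num-slices "$NUM_SLICES" \
    --on-demand \
    --zone "$ZONE" \
    --project "$PROJECT" \
    --default-pool-cpu-machine-type="$CPU_MACHINE_TYPE"
}

create_cluster
```

Saved under any file name, the script prints the assembled command by default; running it with `DRY_RUN=0` in the environment executes the command instead, which requires `xpk` on the `PATH`.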
