1 parent 825320e commit 6f26df9
e2e_testing/README.md
@@ -21,5 +21,22 @@ model, and runs a few checks. This is implemented in
 - Check for specific log strings that indicate training success.
 - Check that there is a profile `.pb` file.

+## v6e XPK cluster
+
+E2E tests are launched onto an XPK cluster named `tpu-v6e-ci`.
+
+To heal or re-create the cluster, use the following:
+
+```sh
+xpk cluster create \
+ --tpu-type v6e-4 \
+ --cluster tpu-v6e-ci \
+ --num-slices 48 \
+ --on-demand \
+ --zone us-central2-b \
+ --project tpu-pytorch \
+ --default-pool-cpu-machine-type=n2-standard-32
+```
+
 [e2e-test]: /.github/workflows/e2e_test.yml
 [e2e-check]: /.github/workflows/reusable_e2e_check.yml
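
As a rough sketch of how the re-create step in the added README section could be wrapped for repeated use, the shell function below assembles the same `xpk cluster create` invocation and, by default, only prints it for review. The function name `recreate_cluster` and the `DRY_RUN` variable are hypothetical and not part of xpk or this repository; the flag values are copied unchanged from the diff.

```sh
# Hypothetical helper: recreate_cluster and DRY_RUN are illustrative names,
# not part of xpk or this repository.
recreate_cluster() {
  # Flag values are copied verbatim from the README change above.
  set -- xpk cluster create \
    --tpu-type v6e-4 \
    --cluster tpu-v6e-ci \
    --num-slices 48 \
    --on-demand \
    --zone us-central2-b \
    --project tpu-pytorch \
    --default-pool-cpu-machine-type=n2-standard-32

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: print the assembled command instead of running it.
    echo "$*"
  else
    # Run the command; assumes xpk is installed and gcloud is authenticated.
    "$@"
  fi
}
```

Sourcing the snippet and calling `recreate_cluster` prints the command; running it with `DRY_RUN=0` would invoke xpk, assuming it is installed and authenticated for the project.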