feat(refactor): refactor for easier usage #1462
Triggered via pull request on May 19, 2025 08:12
Status: Cancelled
Total duration: 1d 0h 0m 1s
Artifacts: none
e2e_test.yaml
on: pull_request
training_4GPU
0s
training_8GPU_ISP
0s
training_8GPU_ISP_CKPT
1s
training_8GPU_4DP2PP_ZB
1s
Matrix: training_16GPU_4DP2TP2PP_FSP
Matrix: training_16GPU_4DP2TP2PP_MSP
Matrix: training_16GPU_4DP2TP2PP_MTP
Matrix: training_8GPU_4DP2PP
Matrix: training_8GPU_4DP2TP
Matrix: training_8GPU_4DP2TPSP
Matrix: training_llama2
Annotations
11 errors
training_16GPU_4DP2TP2PP_MSP (t_cluster)
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s
training_16GPU_4DP2TP2PP_FSP (t_cluster)
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s
training_8GPU_4DP2TP (t_cluster)
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s
training_8GPU_ISP
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s
training_llama2 (t_cluster)
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s
training_8GPU_4DP2PP (t_cluster)
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s
training_16GPU_4DP2TP2PP_MTP (t_cluster)
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s
training_4GPU
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s
training_8GPU_4DP2PP_ZB
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s
training_8GPU_4DP2TPSP (t_cluster)
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s
training_8GPU_ISP_CKPT
The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s