⚠️ Missing global timeout configuration in PipelineRun specs causes resource monopolization risk #1380

@coderabbitai

Problem Description

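Multiple Tekton PipelineRun specifications in the repository do not set an explicit pipeline-level timeout (spec.timeouts.pipeline). Such runs fall back to whatever default the cluster configures (default-timeout-minutes in Tekton's config-defaults ConfigMap), so a build that stalls because of network issues, registry outages, or other failures can hold cluster resources until that default expires, or indefinitely if the default is disabled.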

Impact Analysis

Resource Monopolization Risk:

  • PipelineRuns without timeouts can run indefinitely if builds stall
  • Network connectivity issues or registry outages can cause builds to hang
  • Cluster resources (CPU, memory, storage) remain allocated until manual intervention
  • Other pipeline runs may be delayed or fail due to resource exhaustion

Inconsistent Configuration:

  • Some pipelines set an explicit 8-hour timeout while others set none
  • Creates operational inconsistency and unpredictable behavior
  • Makes troubleshooting and capacity planning more difficult

Affected Files

Files Missing Timeouts (18 total):

  1. .tekton/odh-pipeline-runtime-datascience-cpu-py311-ubi9-push.yaml
  2. .tekton/odh-pipeline-runtime-datascience-cpu-py312-ubi9-push.yaml
  3. .tekton/odh-pipeline-runtime-minimal-cpu-py311-ubi9-push.yaml
  4. .tekton/odh-pipeline-runtime-minimal-cpu-py312-ubi9-push.yaml
  5. .tekton/odh-workbench-codeserver-datascience-cpu-py311-ubi9-push.yaml
  6. .tekton/odh-workbench-codeserver-datascience-cpu-py312-ubi9-push.yaml
  7. .tekton/odh-workbench-jupyter-datascience-cpu-py311-ubi9-push.yaml
  8. .tekton/odh-workbench-jupyter-datascience-cpu-py312-ubi9-push.yaml
  9. .tekton/odh-workbench-jupyter-minimal-cpu-py311-ubi9-push.yaml
  10. .tekton/odh-workbench-jupyter-minimal-cpu-py312-ubi9-push.yaml
  11. .tekton/odh-workbench-jupyter-minimal-cuda-py311-ubi9-push.yaml
  12. .tekton/odh-workbench-jupyter-minimal-cuda-py312-ubi9-push.yaml
  13. .tekton/odh-workbench-jupyter-pytorch-cuda-py311-ubi9-push.yaml
  14. .tekton/odh-workbench-jupyter-pytorch-cuda-py312-ubi9-push.yaml
  15. .tekton/odh-workbench-jupyter-pytorch-rocm-py311-ubi9-push.yaml
  16. .tekton/odh-workbench-jupyter-pytorch-rocm-py312-ubi9-push.yaml
  17. .tekton/odh-workbench-jupyter-tensorflow-cuda-py311-ubi9-push.yaml
  18. .tekton/odh-workbench-jupyter-tensorflow-cuda-py312-ubi9-push.yaml

Files With Correct Timeouts (14 total):
All remaining PipelineRun files already have the standard 8-hour timeout configuration.

Solution

Add a timeouts block to the spec section of each affected PipelineRun, following the established pattern used in other pipelines:

spec:
  timeouts:
    pipeline: 8h
  params:
    # ... existing parameters
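
The placement rule above can be applied mechanically. What follows is a minimal sketch, not the project's tooling. The helper name add_pipeline_timeout is hypothetical, and the code assumes each file holds a PipelineRun with a top-level spec: key and two-space indentation, as in the snippet above. It edits the text directly instead of round-tripping through a YAML parser so that comments and key order stay untouched.

```python
import re


def add_pipeline_timeout(yaml_text: str, duration: str = "8h") -> str:
    """Insert a timeouts block right after the top-level `spec:` line.

    Hypothetical helper. Assumes two-space indentation under `spec:`.
    Returns the text unchanged if there is no top-level `spec:` key or if
    `spec` already has a `timeouts:` key.
    """
    lines = yaml_text.splitlines(keepends=True)
    spec_index = None
    for i, line in enumerate(lines):
        if spec_index is None:
            if re.match(r"spec:\s*$", line):
                spec_index = i
        elif re.match(r"[^\s#]", line):
            break  # a new top-level key (or `---`) ends the spec mapping
        elif re.match(r"  timeouts:", line):
            return yaml_text  # already configured; leave the file alone
    if spec_index is None:
        return yaml_text
    if not lines[spec_index].endswith("\n"):
        lines[spec_index] += "\n"
    block = ["  timeouts:\n", f"    pipeline: {duration}\n"]
    return "".join(lines[: spec_index + 1] + block + lines[spec_index + 1:])
```

One way to use it would be to read each affected file under .tekton/, pass its text through the helper, and write the result back. The function is idempotent, so running it twice does not duplicate the block. Reviewing the resulting diff is still advisable.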

Acceptance Criteria

  • All 18 affected PipelineRun files have timeout configuration added
  • Timeout duration is set to 8 hours (pipeline: 8h) to align with existing patterns
  • Timeout block is placed immediately after spec: and before params: for consistency
  • No functional changes to pipeline behavior other than timeout enforcement
  • All affected pipelines can still complete successfully within the 8-hour limit
  • Documentation is updated if necessary to reflect timeout policies
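
The first two criteria could be checked automatically, for example in CI. The sketch below is an assumption-laden illustration and not an existing check in the repository: pipeline_timeout and files_needing_timeout are hypothetical names, and the scanner is a simple indentation-based reader that handles only the plain block-style YAML shown in this issue (no flow mappings, anchors, or tabs). It uses only the standard library.

```python
import re


def pipeline_timeout(yaml_text):
    """Return the value of spec.timeouts.pipeline, or None if it is not set.

    Hypothetical helper: tracks nesting by indentation only.
    """
    path = []  # stack of (indent, key) for the keys enclosing the current line
    for raw in yaml_text.splitlines():
        line = raw.split(" #", 1)[0].rstrip()  # drop trailing comments
        m = re.match(r"( *)([A-Za-z_][\w.-]*):\s*(.*)$", line)
        if not m:
            continue  # blank lines, comments, and list items are skipped
        indent, key, value = len(m.group(1)), m.group(2), m.group(3)
        while path and path[-1][0] >= indent:
            path.pop()
        path.append((indent, key))
        if [k for _, k in path] == ["spec", "timeouts", "pipeline"]:
            return value.strip("\"'") or None
    return None


def files_needing_timeout(files, expected="8h"):
    """Given {filename: yaml_text}, list files whose timeout is not `expected`."""
    return sorted(
        name for name, text in files.items()
        if pipeline_timeout(text) != expected
    )
```

The comparison is on the literal string, so an equivalent spelling such as 8h0m0s would be reported as a mismatch. That is deliberate here, since the criteria ask for the exact pipeline: 8h form.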

Implementation Guidance

  1. Consistent Placement: Add the timeout block immediately after spec: and before params:
  2. Standard Duration: Use pipeline: 8h to match existing timeout configurations
  3. Batch Processing: Consider grouping changes by pipeline type (runtime vs workbench) for easier review
  4. Testing: Verify that normal builds still complete within the timeout limit
  5. Monitoring: Consider adding alerting for pipelines approaching the timeout limit
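
For the monitoring item, one possible building block is a check for whether a run has consumed most of its timeout budget. This is only a sketch under stated assumptions: parse_duration and near_timeout are hypothetical names, the 0.9 threshold is an arbitrary illustration, and the start time is assumed to come from the PipelineRun's status.startTime. How the result is turned into an alert is left open.

```python
import re
from datetime import datetime, timedelta, timezone

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text):
    """Parse a Go-style duration limited to h, m and s units, e.g. '8h' or '1h30m'."""
    parts = re.findall(r"(\d+(?:\.\d+)?)([hms])", text)
    # Reject anything the pattern did not fully consume (e.g. '500ms', '8 h').
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=sum(float(n) * _UNIT_SECONDS[u] for n, u in parts))


def near_timeout(started, now, timeout="8h", threshold=0.9):
    """Return True if the run has used at least `threshold` of its timeout."""
    return (now - started) >= parse_duration(timeout) * threshold


# Example: a run started at midnight UTC, checked 7.5 hours later.
started = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
print(near_timeout(started, started + timedelta(hours=7, minutes=30)))
```

With an 8-hour timeout and a 0.9 threshold, the check flips to true once 7 hours and 12 minutes have elapsed.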

Context

Benefits

  • Resource Protection: Prevents indefinite resource allocation from stalled builds
  • Operational Consistency: Standardizes timeout behavior across all pipelines
  • Predictable Behavior: Makes pipeline execution time limits explicit and manageable
  • Improved Troubleshooting: Stalled builds time out and fail visibly rather than hanging indefinitely
  • Capacity Planning: Enables better cluster resource planning and scheduling
