// Module included in the following assemblies:
//
// * nodes/nodes-nodes-jobs.adoc

[id='nodes-nodes-jobs-about_{context}']
= Understanding jobs and CronJobs in {product-title}

A job tracks the overall progress of a task and updates its status with information
about active, succeeded, and failed pods. Deleting a job cleans up any pods it created.
Jobs are part of the Kubernetes API, which can be managed
with `oc` commands like other object types.

Two resource types allow you to create run-once objects in {product-title}:

Job::
A regular job is a run-once object that creates a task and ensures the job finishes.

CronJob::
A CronJob can be scheduled to run multiple times. To run a job on a recurring schedule, use a CronJob.

A _CronJob_ builds on a regular job by allowing you to specify
when the job should be run. CronJobs are part of the
link:http://kubernetes.io/docs/user-guide/cron-jobs[Kubernetes] API, which
can be managed with `oc` commands like other object types.

CronJobs are useful for creating periodic and recurring tasks, like running backups or sending emails.
CronJobs can also schedule individual tasks for a specific time, for example, to schedule a job for a period of low activity.
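
For illustration, a minimal CronJob manifest might look like the following sketch. The name `daily-report`, the schedule, and the container image are placeholders, and on older clusters the `apiVersion` might be `batch/v1beta1` rather than `batch/v1`:

[source,yaml]
----
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-report             # hypothetical name
spec:
  schedule: "0 3 * * *"          # standard cron syntax: every day at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report
            image: busybox       # placeholder image
            command: ["echo", "generating report"]
          restartPolicy: OnFailure
----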

ifdef::openshift-online[]
[IMPORTANT]
====
CronJobs are only available for _OpenShift Online Pro_. For more information about the
differences between Starter and Pro tiers, visit the
link:https://www.openshift.com/pricing/index.html[pricing page].
====
endif::[]

[WARNING]
====
A CronJob creates a job object approximately once per execution time of its
schedule, but there are circumstances in which it might fail to create a job, or
two jobs might be created. Therefore, jobs must be idempotent, and you must
configure history limits.
====

[[jobs-create]]
== Understanding how to create jobs

Both resource types require a job configuration that consists of the following key parts, shown in the sketch after this list:

- A pod template, which describes the pod that {product-title} creates.
- An optional `parallelism` parameter, which specifies how many pods should run in parallel at any point in time to execute the job. If not specified, this defaults to
  the value in the `completions` parameter.
- An optional `completions` parameter, which specifies how many successful pod completions are needed to finish a job. If not specified, this value defaults to one.
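
The following sketch shows where these parts fit in a Job manifest. The name `pi` and the Perl command are the common upstream example, used here for illustration:

[source,yaml]
----
apiVersion: batch/v1
kind: Job
metadata:
  name: pi                # hypothetical name
spec:
  parallelism: 1          # optional: pods running in parallel at any point in time
  completions: 1          # optional: successful completions needed to finish the job
  template:               # the pod template
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: OnFailure
----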

[[jobs-set-max]]
== Understanding how to set a maximum duration for jobs

When defining a job, you can define its maximum duration by setting
the `activeDeadlineSeconds` field. The value is specified in seconds and is not
set by default. When not set, no maximum duration is enforced.

The maximum duration is counted from the time the first pod is scheduled in
the system, and defines how long a job can be active. It tracks the overall time of
an execution. After the specified timeout is reached, {product-title} terminates the job.
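
For example, to terminate a job that runs longer than 30 minutes, you might set the field as in this sketch of a Job spec:

[source,yaml]
----
spec:
  activeDeadlineSeconds: 1800   # the job is terminated after 30 minutes
----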

[[jobs-set-backoff]]
== Understanding how to set a job backoff policy for pod failure

A job can be considered failed after a set number of retries, for example due to a
logical error in its configuration. Failed pods associated with the job are recreated by the controller with
an exponential backoff delay (`10s`, `20s`, `40s` …) capped at six minutes. The
limit is reset if no new failed pods appear between controller checks.

Use the `spec.backoffLimit` parameter to set the number of retries for a job.
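
For example, the following sketch of a Job spec allows up to five retries before the job is considered failed:

[source,yaml]
----
spec:
  backoffLimit: 5   # recreate failed pods up to five times
----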

[[jobs-artifacts]]
== Understanding how to configure a CronJob to remove artifacts

CronJobs can leave behind artifact resources such as jobs or pods. As a user, it is important
to configure history limits so that old jobs and their pods are properly cleaned up. Two fields in the CronJob spec are responsible for this, as shown in the example after this list:

* `.spec.successfulJobsHistoryLimit`. The number of successful finished jobs to retain (defaults to 3).

* `.spec.failedJobsHistoryLimit`. The number of failed finished jobs to retain (defaults to 1).
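
For example, the following sketch retains the three most recent successful jobs and only the most recent failed job:

[source,yaml]
----
spec:
  successfulJobsHistoryLimit: 3   # keep the last three successful jobs
  failedJobsHistoryLimit: 1       # keep only the last failed job
----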

[TIP]
====
* Delete CronJobs that you no longer need:
+
[source,bash]
----
$ oc delete cronjob/<cron_job_name>
----
+
Doing this prevents them from generating unnecessary artifacts.

* You can suspend further executions by setting `spec.suspend` to `true`. All subsequent executions are suspended until you reset the value to `false`.
====
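
For example, assuming a CronJob named `daily-report`, you could suspend it with a merge patch such as this sketch:

[source,bash]
----
$ oc patch cronjob/daily-report -p '{"spec":{"suspend":true}}'
----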

[[jobs-limits]]
== Known limitations

The job specification restart policy only applies to the _pods_, and not the _job controller_. However, the job controller is hard-coded to keep retrying jobs to completion.

As such, `restartPolicy: Never` or `--restart=Never` results in the same behavior as `restartPolicy: OnFailure` or `--restart=OnFailure`. That is, when a job fails it is restarted automatically until it succeeds (or is manually discarded). The policy only sets which subsystem performs the restart.

With the `Never` policy, the _job controller_ performs the restart. With each attempt, the job controller increments the number of failures in the job status and creates new pods. This means that with each failed attempt, the number of pods increases.

With the `OnFailure` policy, the _kubelet_ performs the restart. Each attempt does not increment the number of failures in the job status. In addition, the kubelet retries failed jobs by starting pods on the same nodes.
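
For reference, the restart policy is set in the pod template of the Job spec, as in this sketch with a placeholder container:

[source,yaml]
----
spec:
  template:
    spec:
      containers:
      - name: example             # hypothetical container
        image: busybox
        command: ["echo", "hello"]
      restartPolicy: Never        # restarts are performed by the job controller
----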