Commit 5954a16

Merge pull request #12964 from mburke5678/nodes-to-4.0
new assemblies and modules for nodes
2 parents 4e7db29 + 8779d3e

38 files changed: +2205 -0 lines changed

_topic_map.yml

Lines changed: 16 additions & 0 deletions
@@ -212,6 +212,22 @@ Name: Nodes
 Dir: nodes
 Distros: openshift-*
 Topics:
+- Name: Viewing and listing the nodes in your cluster
+  File: nodes-nodes-viewing
+- Name: Working with nodes
+  File: nodes-nodes-working
+- Name: Understanding node rebooting
+  File: nodes-nodes-rebooting
+- Name: Freeing node resources using garbage collection
+  File: nodes-nodes-garbage-collection
+- Name: Allocating resources for nodes
+  File: nodes-nodes-resources-configuring
+- Name: Advertising hidden resources for nodes
+  File: nodes-nodes-opaque-resources
+- Name: Monitoring for problems in your nodes
+  File: nodes-nodes-problem-detector
+- Name: Running tasks in pods using jobs
+  File: nodes-nodes-jobs
 - Name: Using containers
   File: nodes-containers-using
 - Name: Using Init Containers to perform tasks before a pod is deployed

modules/nodes-nodes-garbage-collection-configuring.adoc

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
// Module included in the following assemblies:
//
// * nodes/nodes-nodes-garbage-collection.adoc

[id='nodes-nodes-garbage-collection-configuring_{context}']
= Configuring garbage collection for containers and images

As an administrator, you can configure how {product-title} performs garbage collection for each type of node group.

.Procedure

. Open the appropriate node configuration map:
+
* node-config-master
* node-config-infra
* node-config-compute
* node-config-all-in-one
* node-config-master-infra

. For containers, you can specify values for these settings in the `*kubeletArguments*` section of
the node configuration map. Add the section if it does not already exist:
+
[source,yaml]
----
kubeletArguments:
  minimum-container-ttl-duration: <1>
    - "10s"
  maximum-dead-containers-per-container: <2>
    - "2"
  maximum-dead-containers: <3>
    - "240"
----
<1> Specify the minimum age at which a container is eligible for garbage collection. The
default is *0*.
<2> Specify the number of instances to retain per pod container. The default is *1*.
<3> Specify the maximum number of total dead containers in the node. The default is *-1*, which means unlimited.

. For images, you can specify values for these settings in the `*kubeletArguments*` section of
the node configuration map:
+
[source,yaml]
----
kubeletArguments:
  image-gc-high-threshold: <1>
    - "85"
  image-gc-low-threshold: <2>
    - "80"
----
<1> Specify the percent of disk usage (expressed as an integer) that triggers image
garbage collection. The default is *85*.
<2> Specify the percent of disk usage (expressed as an integer) to which image garbage
collection attempts to free disk space. The default is *80*.

. Save and close the configuration map. A sync pod on each node of that type picks up and implements the changes.

modules/nodes-nodes-garbage-collection-containers.adoc

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
// Module included in the following assemblies:
//
// * nodes/nodes-nodes-garbage-collection.adoc

[id='nodes-nodes-garbage-collection-containers_{context}']
= Understanding how terminated containers are removed through garbage collection

Container garbage collection is enabled by default and happens automatically in
response to eviction thresholds being reached. The node tries to keep any
container for any pod accessible from the API. If the pod has been deleted, the
containers will be as well. Containers are preserved as long as the pod is not
deleted and the eviction threshold is not reached. If the node is under disk
pressure, it removes containers, and their logs are no longer accessible
using `oc logs`.

The policy for container garbage collection is based on three conditions:

* The minimum age at which a container is eligible for garbage collection. The
default is *0*.

* The number of instances to retain per pod container. The default is *1*.

* The maximum number of total dead containers in the node. The default is *-1*, which means unlimited.

[IMPORTANT]
====
Garbage collection only removes containers that are not associated with any pod.
====

For container garbage collection, you can modify any of the following variables
in the `*kubeletArguments*` section of the appropriate node configuration map.

.Variables for configuring container garbage collection

[options="header",cols="1,3"]
|===

|Setting |Description

|`*minimum-container-ttl-duration*`
|The minimum age at which a container is eligible for garbage collection. The
default is *0*. Use *0* for no limit. Values for this setting can be
specified using unit suffixes such as *h* for hours, *m* for minutes, and *s* for seconds.

|`*maximum-dead-containers-per-container*`
|The number of instances to retain per pod container. The default is *1*.

|`*maximum-dead-containers*`
|The maximum number of total dead containers in the node. The default is *-1*, which means unlimited.
|===

The `*maximum-dead-containers*` setting takes precedence over the
`*maximum-dead-containers-per-container*` setting when there is a conflict. For
example, if retaining the number of `*maximum-dead-containers-per-container*`
instances would result in a total number of containers that is greater than
`*maximum-dead-containers*`, the oldest containers are removed to satisfy
the `*maximum-dead-containers*` limit.

When the node removes the dead containers, all files inside those containers are
removed as well. Only containers created by the node are removed.
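
For reference, the following `*kubeletArguments*` stanza is a minimal sketch that sets all three container garbage collection values together; the values shown are illustrative, not recommendations:

[source,yaml]
----
kubeletArguments:
  minimum-container-ttl-duration:
    - "1m"
  maximum-dead-containers-per-container:
    - "2"
  maximum-dead-containers:
    - "240"
----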

ifdef::openshift-origin[]
[NOTE]
====
Currently, Docker and rkt are supported. The following only applies to Docker;
rkt has its own garbage collection.
====
endif::[]

Each spin of the garbage collector loop goes through the following steps:

1. Retrieve a list of available containers.
2. Filter out all containers that are running or have not been dead longer than
the `*minimum-container-ttl-duration*` parameter.
3. Classify all remaining containers into equivalence classes based on pod and image name membership.
4. Remove all unidentified containers (containers that are managed by the kubelet but have malformed names).
5. For each class that contains more containers than the
`*maximum-dead-containers-per-container*` parameter allows, sort the containers in the class by
creation time.
6. Start removing containers, oldest first, until the
`*maximum-dead-containers-per-container*` parameter is satisfied.
7. If there are still more containers in the list than the
`*maximum-dead-containers*` parameter allows, the collector starts removing containers
from each class so the number of containers in each one is not greater than the
average number of containers per class, or
`<all_remaining_containers>/<number_of_classes>`.
8. If this is still not enough, sort all containers in the list and start
removing containers, oldest first, until the `*maximum-dead-containers*`
criterion is met.

modules/nodes-nodes-garbage-collection-images.adoc

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
// Module included in the following assemblies:
//
// * nodes/nodes-nodes-garbage-collection.adoc

[id='nodes-nodes-garbage-collection-images_{context}']
= Understanding how images are removed through garbage collection

Image garbage collection relies on disk usage as reported by *cAdvisor* on the
node to decide which images to remove from the node.

The policy for image garbage collection is based on two conditions:

* The percent of disk usage (expressed as an integer) that triggers image
garbage collection. The default is *85*.

* The percent of disk usage (expressed as an integer) to which image garbage
collection attempts to free disk space. The default is *80*.

For image garbage collection, you can modify any of the following variables
in the `*kubeletArguments*` section of the appropriate node configuration map.

.Variables for configuring image garbage collection

[options="header",cols="1,3"]
|===

|Setting |Description

|`*image-gc-high-threshold*`
|The percent of disk usage (expressed as an integer) that triggers image
garbage collection. The default is *85*.

|`*image-gc-low-threshold*`
|The percent of disk usage (expressed as an integer) to which image garbage
collection attempts to free disk space. The default is *80*.
|===
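
For reference, a minimal sketch of the corresponding `*kubeletArguments*` stanza follows. With these illustrative values, on a node whose image filesystem is 100 GB, collection triggers once usage exceeds 85 GB and deletes the oldest images until usage falls to 80 GB or below:

[source,yaml]
----
kubeletArguments:
  image-gc-high-threshold:
    - "85"
  image-gc-low-threshold:
    - "80"
----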

Two lists of images are retrieved in each garbage collector run:

1. A list of images currently running in at least one pod.
2. A list of images available on the host.

As new containers are run, new images appear. All images are marked with a time
stamp. If the image is running (the first list above) or is newly detected (the
second list above), it is marked with the current time. The remaining images are
already marked from the previous spins. All images are then sorted by the time
stamp.

Once the collection starts, the oldest images get deleted first until the
stopping criterion is met.

modules/nodes-nodes-jobs-about.adoc

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
// Module included in the following assemblies:
//
// * nodes/nodes-nodes-jobs.adoc

[id='nodes-nodes-jobs-about_{context}']
= Understanding jobs and CronJobs in {product-title}

A job tracks the overall progress of a task and updates its status with information
about active, succeeded, and failed pods. Deleting a job cleans up any pods it created.
Jobs are part of the Kubernetes API, which can be managed
with `oc` commands like other object types.

There are two possible resource types that allow creating run-once objects in {product-title}:

Job::
A regular job is a run-once object that creates a task and ensures the job finishes.

CronJob::
A CronJob is a job that can be scheduled to run multiple times. If you need a task to run repeatedly on a schedule, use a CronJob.

A _CronJob_ builds on a regular job by allowing you to specify
how the job should be run. CronJobs are part of the
link:http://kubernetes.io/docs/user-guide/cron-jobs[Kubernetes] API, which
can be managed with `oc` commands like other object types.
CronJobs are useful for creating periodic and recurring tasks, such as running backups or sending emails.
CronJobs can also schedule individual tasks for a specific time, such as if you want to schedule a job for a low-activity period.

ifdef::openshift-online[]
[IMPORTANT]
====
CronJobs are available only in _OpenShift Online Pro_. For more information about the
differences between the Starter and Pro tiers, visit the
link:https://www.openshift.com/pricing/index.html[pricing page].
====
endif::[]

[WARNING]
====
A CronJob creates a job object approximately once per execution time of its
schedule, but there are circumstances in which it fails to create a job or
creates two jobs. Therefore, jobs must be idempotent, and you must
configure history limits.
====

[[jobs-create]]
== Understanding how to create jobs

Both resource types require a job configuration that consists of the following key parts, shown together in the sketch after this list:

- A pod template, which describes the pod that {product-title} creates.
- An optional `parallelism` parameter, which specifies how many pods should run in parallel at any point in time while executing a job. If not specified, this defaults to
the value in the `completions` parameter.
- An optional `completions` parameter, which specifies how many successful pod completions are needed to finish a job. If not specified, this value defaults to one.
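
A minimal job definition along the following lines shows the three parts together. This is a sketch: the `pi` name, image, and command are illustrative (borrowed from the common upstream Kubernetes example), not part of {product-title} itself:

[source,yaml]
----
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  parallelism: 1 <1>
  completions: 1 <2>
  template: <3>
    metadata:
      name: pi
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: OnFailure
----
<1> Optional: the number of pods that run in parallel.
<2> Optional: the number of successful pod completions needed to finish the job.
<3> The pod template that describes the pod {product-title} creates.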

[[jobs-set-max]]
== Understanding how to set a maximum duration for jobs

When defining a job, you can define its maximum duration by setting
the `activeDeadlineSeconds` field. It is specified in seconds and is not
set by default. When not set, no maximum duration is enforced.

The maximum duration is counted from the time the first pod gets scheduled in
the system, and defines how long a job can be active. It tracks the overall time of
an execution. After reaching the specified timeout, the job is terminated by {product-title}.
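
For example, this fragment of a job specification (the value is illustrative) terminates the job if it remains active longer than 10 minutes:

[source,yaml]
----
spec:
  activeDeadlineSeconds: 600
----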

[[jobs-set-backoff]]
== Understanding how to set a job back off policy for pod failure

A job can fail repeatedly due to a logical error in its configuration or other
similar reasons. After a set number of retries, the job is considered failed. Failed pods associated with the job are recreated by the controller with
an exponential back-off delay (`10s`, `20s`, `40s` ...) capped at six minutes. The
limit is reset if no new failed pods appear between controller checks.

Use the `spec.backoffLimit` parameter to set the number of retries for a job.
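
For example, this fragment (the value is illustrative; *6* is also the upstream Kubernetes default) marks the job failed after six retries:

[source,yaml]
----
spec:
  backoffLimit: 6
----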

[[jobs-artifacts]]
== Understanding how to configure a CronJob to remove artifacts

CronJobs can leave behind artifact resources such as jobs or pods. As a user, it is important
to configure history limits so that old jobs and their pods are properly cleaned up. There are two fields in the CronJob specification responsible for this:

* `.spec.successfulJobsHistoryLimit`. The number of successfully finished jobs to retain (defaults to 3).

* `.spec.failedJobsHistoryLimit`. The number of failed finished jobs to retain (defaults to 1).
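
A sketch of where these fields sit in a CronJob specification, assuming the `batch/v1beta1` API and illustrative values:

[source,yaml]
----
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: example-cronjob
spec:
  schedule: "*/1 * * * *"
  successfulJobsHistoryLimit: 3 <1>
  failedJobsHistoryLimit: 1 <2>
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["echo", "hello"]
          restartPolicy: OnFailure
----
<1> Retain the three most recent successful jobs.
<2> Retain the most recent failed job.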

[TIP]
====
* Delete CronJobs that you no longer need:
+
[source,bash]
----
$ oc delete cronjob/<cron_job_name>
----
+
Doing this prevents them from generating unnecessary artifacts.

* You can suspend further executions by setting `spec.suspend` to `true`. All subsequent executions are suspended until you reset it to `false`.
====

[[jobs-limits]]
== Known limitations

The job specification restart policy only applies to the _pods_, and not the _job controller_. However, the job controller is hard-coded to keep retrying jobs to completion.

As such, `restartPolicy: Never` or `--restart=Never` results in the same behavior as `restartPolicy: OnFailure` or `--restart=OnFailure`. That is, when a job fails, it is restarted automatically until it succeeds (or is manually discarded). The policy only sets which subsystem performs the restart.

With the `Never` policy, the _job controller_ performs the restart. With each attempt, the job controller increments the number of failures in the job status and creates new pods. This means that with each failed attempt, the number of pods increases.

With the `OnFailure` policy, the _kubelet_ performs the restart. Each attempt does not increment the number of failures in the job status. In addition, the kubelet retries failed jobs by starting pods on the same node.
