Chapter 32. Jobs

32.1. Overview

A job, in contrast to a replication controller, runs a pod with any number of replicas to completion. A job tracks the overall progress of a task and updates its status with information about active, succeeded, and failed pods. Deleting a job cleans up any pod replicas it created. Jobs are part of the Kubernetes API and can be managed with oc commands like other object types.

See the Kubernetes documentation for more information about jobs.

32.2. Creating a Job

A job configuration consists of the following key parts:

A pod template, which describes the pod that the controller creates.
An optional parallelism parameter, which specifies how many pod replicas should run in parallel to execute a job. If not specified, this defaults to the value of the completions parameter.
An optional completions parameter, which specifies how many successful pod completions are needed to mark a job as completed. If not specified, this value defaults to one.

The following is an example of a job resource:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  parallelism: 1    
  completions: 1    
  template:         
    metadata:
      name: pi
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: OnFailure

In this example:

spec.parallelism: Optional value for how many pod replicas a job should run in parallel; defaults to completions.
spec.completions: Optional value for how many successful pod completions are needed to mark a job completed; defaults to one.
spec.template: Template for the pod the controller creates.
restartPolicy: The restart policy of the pod. This does not apply to the job controller. See Section 32.2.1, “Known Limitations” for details.

You can also create and launch a job from a single command using oc run. The following command creates and launches the same job as specified in the previous example:

oc run pi --image=perl --replicas=1  --restart=OnFailure \
    --command -- perl -Mbignum=bpi -wle 'print bpi(2000)'
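
Returning to the parallelism and completions parameters described earlier in this section, the following sketch sets both explicitly so that neither default applies. The job name pi-parallel and the values 2 and 6 are assumptions chosen for illustration and are not part of the original example. With these settings, the controller would be expected to keep up to two pods running at a time until six pods have completed successfully.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-parallel        # hypothetical name, used only for this illustration
spec:
  parallelism: 2           # run at most two pods at the same time
  completions: 6           # the job is complete after six successful pods
  template:
    metadata:
      name: pi-parallel
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: OnFailure
```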

32.2.1. Known Limitations

The job specification restart policy only applies to the pods, and not the job controller. However, the job controller is hard-coded to keep retrying jobs to completion.

As such, restartPolicy: Never or --restart=Never results in the same behavior as restartPolicy: OnFailure or --restart=OnFailure. That is, when a job fails it is restarted automatically until it succeeds (or is manually discarded). The policy only sets which subsystem performs the restart.

With the Never policy, the job controller performs the restart. With each attempt, the job controller increments the number of failures in the job status and creates new pods. This means that with each failed attempt, the number of pods increases.

With the OnFailure policy, the kubelet performs the restart. Each attempt does not increment the number of failures in the job status. In addition, the kubelet retries failed jobs by restarting pods on the same nodes.

32.3. Scaling a Job

A job can be scaled up or down by using the oc scale command with the --replicas option, which, in the case of jobs, modifies the spec.parallelism parameter. This changes the number of pod replicas that run in parallel to execute the job.

The following command uses the example job above, and sets the parallelism parameter to three:

oc scale job pi --replicas=3
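
In terms of the job resource, the command above is expected to leave the example job with the value shown in the following fragment. This is a sketch of the resulting spec based on the description above, not captured output.

```yaml
  spec:
    parallelism: 3    # value set by: oc scale job pi --replicas=3
```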

Note

Scaling replication controllers also uses the oc scale command with the --replicas option, but instead changes the replicas parameter of a replication controller configuration.

32.4. Setting Maximum Duration

When defining a job, you can set its maximum duration with the activeDeadlineSeconds field. It is specified in seconds and is not set by default. When not set, there is no maximum duration enforced.

The maximum duration is counted from the time when the first pod gets scheduled in the system, and defines how long a job can be active. It tracks the overall time of an execution and is independent of the number of completions (the number of pod replicas needed to execute a task). After reaching the specified timeout, the job is terminated by OpenShift Online.

The following example shows the part of a job that sets the activeDeadlineSeconds field to 30 minutes:

  spec:
    activeDeadlineSeconds: 1800

32.5. Job Backoff Failure Policy

A job can be considered failed after a set number of retries, for example because of a logical error in its configuration. To specify the number of retries for a job, set the .spec.backoffLimit property. This field defaults to six. Failed pods associated with the job are recreated by the controller with an exponential back-off delay (10s, 20s, 40s …) capped at six minutes. The back-off count is reset if no new failed pods appear between controller checks.
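
For example, a job that should be marked as failed after four retries instead of the default six could set the field as in the following fragment. The value 4 is an arbitrary choice for illustration.

```yaml
  spec:
    backoffLimit: 4    # illustrative value; the default is six
```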
