Chapter 2. Autoscaling

2.1. Autoscaling

Knative Serving provides automatic scaling, or autoscaling, for applications to match incoming demand. For example, if an application is receiving no traffic and scale-to-zero is enabled, Knative Serving scales the application down to zero replicas. If scale-to-zero is disabled, the application is scaled down to the minimum number of replicas configured for applications on the cluster. If traffic to the application increases, Knative Serving scales the number of replicas up to meet demand.

Autoscaling settings for Knative services can be global settings that are configured by cluster administrators (or dedicated administrators for Red Hat OpenShift Service on AWS and OpenShift Dedicated), or per-revision settings that are configured for individual services.

You can modify per-revision settings for your services by using the OpenShift Container Platform web console, by modifying the YAML file for your service, or by using the Knative (kn) CLI.

Note

Any limits or targets that you set for a service are measured against a single instance of your application. For example, setting the target annotation to 50 configures the autoscaler to scale the application so that each replica handles 50 requests at a time.
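
To make the per-instance measurement concrete, the following Python sketch estimates how many replicas a per-replica target implies for a given number of concurrent requests. It is a simplified model, not the Knative Pod Autoscaler algorithm: the function name replicas_needed is hypothetical, and the sketch ignores target utilization, scale bounds, and the time windows over which the real autoscaler averages load.

```python
import math


def replicas_needed(concurrent_requests: int, per_replica_target: int) -> int:
    """Estimate replicas so that each one handles about per_replica_target requests.

    Hypothetical helper for illustration only; the real autoscaler also applies
    target utilization, scale bounds, and averaging windows.
    """
    if per_replica_target <= 0:
        raise ValueError("per_replica_target must be a positive integer")
    # Round up: a partial replica's worth of load still needs a whole replica.
    return math.ceil(concurrent_requests / per_replica_target)


# 120 concurrent requests with a per-replica target of 50 needs 3 replicas.
print(replicas_needed(120, 50))  # 3
```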

2.2. Scale bounds

Scale bounds determine the minimum and maximum numbers of replicas that can serve an application at any given time. You can set scale bounds for an application to help prevent cold starts or control computing costs.

2.2.1. Minimum scale bounds

The minimum number of replicas that can serve an application is determined by the min-scale annotation. If scale-to-zero is not enabled, the min-scale value defaults to 1.

The min-scale value defaults to 0 replicas if the following conditions are met:

The min-scale annotation is not set
Scaling to zero is enabled
The Knative Pod Autoscaler (KPA) class is used
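
The default rules above can be summarized as a small decision function. This Python sketch is illustrative only: effective_min_scale and its uses_kpa parameter are hypothetical names, and returning 1 for combinations the text does not spell out (for example, scale-to-zero enabled but a class other than KPA) is an assumption.

```python
from typing import Optional


def effective_min_scale(
    annotation: Optional[int],
    scale_to_zero_enabled: bool,
    uses_kpa: bool,
) -> int:
    """Return the minimum replica count implied by the min-scale rules.

    annotation is the autoscaling.knative.dev/min-scale value, or None if unset.
    uses_kpa stands in for "the KPA class is used" (hypothetical parameter name).
    """
    # An explicitly set annotation always wins.
    if annotation is not None:
        return annotation
    # All three conditions are met: the service can scale down to zero replicas.
    if scale_to_zero_enabled and uses_kpa:
        return 0
    # Assumption: every other unset case falls back to 1 replica.
    return 1


print(effective_min_scale(None, True, True))   # 0
print(effective_min_scale(None, False, True))  # 1
print(effective_min_scale(2, True, True))      # 2
```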

Example service spec with min-scale annotation

```
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: showcase
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"
...
```

2.2.1.1. Setting the min-scale annotation by using the Knative CLI

Using the Knative (kn) CLI to set the min-scale annotation provides a more streamlined and intuitive user interface than modifying YAML files directly. You can use the kn service command with the --scale-min flag to create or modify the min-scale value for a service.

Prerequisites

Knative Serving is installed on the cluster.
You have installed the Knative (kn) CLI.

Procedure

Set the minimum number of replicas for the service by using the --scale-min flag:

```
$ kn service create <service_name> --image <image_uri> --scale-min <integer>
```

Example command

```
$ kn service create showcase --image quay.io/openshift-knative/showcase --scale-min 2
```

2.2.2. Maximum scale bounds

The maximum number of replicas that can serve an application is determined by the max-scale annotation. If the max-scale annotation is not set, there is no upper limit for the number of replicas created.
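
Taken together, the min-scale and max-scale annotations bound whatever replica count the autoscaler would otherwise choose. The following Python sketch, assuming a simple clamp, shows that relationship; clamp_replicas is a hypothetical helper, and passing None for max_scale models an unset max-scale annotation, which imposes no upper limit.

```python
from typing import Optional


def clamp_replicas(desired: int, min_scale: int, max_scale: Optional[int] = None) -> int:
    """Keep a desired replica count within the configured scale bounds.

    Hypothetical helper. max_scale=None models an unset max-scale annotation,
    that is, no upper limit on the number of replicas.
    """
    count = max(desired, min_scale)  # never go below the minimum bound
    if max_scale is not None:
        count = min(count, max_scale)  # never exceed the maximum bound
    return count


print(clamp_replicas(25, min_scale=0, max_scale=10))  # 10
print(clamp_replicas(0, min_scale=2))                 # 2
print(clamp_replicas(40, min_scale=1))                # 40
```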

Example service spec with max-scale annotation

```
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: showcase
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/max-scale: "10"
...
```

2.2.2.1. Setting the max-scale annotation by using the Knative CLI

Using the Knative (kn) CLI to set the max-scale annotation provides a more streamlined and intuitive user interface than modifying YAML files directly. You can use the kn service command with the --scale-max flag to create or modify the max-scale value for a service.

Prerequisites

Knative Serving is installed on the cluster.
You have installed the Knative (kn) CLI.

Procedure

Set the maximum number of replicas for the service by using the --scale-max flag:

```
$ kn service create <service_name> --image <image_uri> --scale-max <integer>
```

Example command

```
$ kn service create showcase --image quay.io/openshift-knative/showcase --scale-max 10
```

2.3. Concurrency

Concurrency determines the number of simultaneous requests that can be processed by each replica of an application at any given time. Concurrency can be configured as a soft limit or a hard limit:

A soft limit is a targeted requests limit, rather than a strictly enforced bound. For example, if there is a sudden burst of traffic, the soft limit target can be exceeded.
A hard limit is a strictly enforced upper bound on the number of requests. If concurrency reaches the hard limit, surplus requests are buffered and must wait until there is enough free capacity to execute the requests.

Important

Use a hard limit configuration only if there is a clear use case for it with your application. Specifying a low hard limit might negatively affect the throughput and latency of an application, and might cause cold starts.

If you set both a soft target and a hard limit, the autoscaler targets the soft target number of concurrent requests, but enforces the hard limit value as the maximum number of concurrent requests.

If the hard limit value is less than the soft limit value, the soft limit value is tuned down, because there is no need to target more requests than the number that can actually be handled.
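
The interaction between a soft target and a hard limit can be sketched as follows. This Python example assumes that the soft target is lowered to exactly the hard limit value when the hard limit is smaller; the text above only says that the soft limit is tuned down. The function name effective_soft_target is hypothetical, and a hard limit of None or 0 is treated as no hard limit, matching the containerConcurrency default of 0.

```python
from typing import Optional


def effective_soft_target(soft_target: int, hard_limit: Optional[int]) -> int:
    """Return the per-replica concurrency the autoscaler aims for.

    Hypothetical helper. hard_limit=None or 0 means no hard limit is configured.
    Assumption: a larger soft target is lowered to exactly the hard limit.
    """
    if hard_limit and hard_limit < soft_target:
        # No point targeting more requests than a replica can actually accept.
        return hard_limit
    return soft_target


print(effective_soft_target(200, 50))    # 50
print(effective_soft_target(50, 200))    # 50
print(effective_soft_target(100, None))  # 100
```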

2.3.1. Configuring a soft concurrency target
Copiar enlace

A soft limit is a targeted requests limit, rather than a strictly enforced bound. For example, if there is a sudden burst of traffic, the soft limit target can be exceeded. You can specify a soft concurrency target for your Knative service by setting the autoscaling.knative.dev/target annotation in the spec, or by using the kn service command with the --concurrency-target flag.

Procedure

Optional: Set the autoscaling.knative.dev/target annotation for your Knative service in the spec of the Service custom resource:

Example service spec

```
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: showcase
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "200"
```

Optional: Use the kn service command to specify the --concurrency-target flag:

```
$ kn service create <service_name> --image <image_uri> --concurrency-target <integer>
```

Example command to create a service with a concurrency target of 50 requests

```
$ kn service create showcase --image quay.io/openshift-knative/showcase --concurrency-target 50
```

2.3.2. Configuring a hard concurrency limit

A hard concurrency limit is a strictly enforced upper bound on the number of requests. If concurrency reaches the hard limit, surplus requests are buffered and must wait until there is enough free capacity to execute the requests. You can specify a hard concurrency limit for your Knative service by setting the containerConcurrency spec, or by using the kn service command with the --concurrency-limit flag.

Procedure

Optional: Set the containerConcurrency spec for your Knative service in the spec of the Service custom resource:
Example service spec
```
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: showcase
  namespace: default
spec:
  template:
    spec:
      containerConcurrency: 50
```
The default value is 0, which means that there is no limit on the number of simultaneous requests that are permitted to flow into one replica of the service at a time.
A value greater than 0 specifies the maximum number of requests that are permitted to flow into one replica of the service at a time. This example enables a hard concurrency limit of 50 requests.
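
The following Python sketch models what a hard limit means for a single replica: up to containerConcurrency requests are processed, and the surplus is buffered. It is a simplified illustration; split_requests is a hypothetical name, and the sketch does not model where the buffering happens or how buffered requests are later dispatched.

```python
from typing import Tuple


def split_requests(incoming: int, container_concurrency: int) -> Tuple[int, int]:
    """Return (in_flight, buffered) for one replica under a hard limit.

    Hypothetical helper. container_concurrency=0 means no limit, matching the
    default value of the containerConcurrency spec.
    """
    if container_concurrency == 0:
        return incoming, 0  # no hard limit: every request flows to the replica
    in_flight = min(incoming, container_concurrency)
    # Surplus requests wait in a buffer until capacity frees up.
    return in_flight, incoming - in_flight


print(split_requests(80, 50))  # (50, 30)
print(split_requests(80, 0))   # (80, 0)
```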

Optional: Use the kn service command to specify the --concurrency-limit flag:

```
$ kn service create <service_name> --image <image_uri> --concurrency-limit <integer>
```

Example command to create a service with a concurrency limit of 50 requests

```
$ kn service create showcase --image quay.io/openshift-knative/showcase --concurrency-limit 50
```

2.3.3. Concurrency target utilization

The autoscaling.knative.dev/target-utilization-percentage annotation specifies the percentage of the concurrency limit that the autoscaler actually targets. This is also known as specifying the hotness at which a replica runs, and it enables the autoscaler to scale up before the defined hard limit is reached.

For example, if the containerConcurrency value is set to 10, and the target-utilization-percentage value is set to 70 percent, the autoscaler creates a new replica when the average number of concurrent requests across all existing replicas reaches 7. Requests numbered 7 to 10 are still sent to the existing replicas, but additional replicas are started in anticipation of being required after the containerConcurrency value is reached.
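
The arithmetic in the example above can be checked with a short Python sketch. The helper names scale_up_threshold and should_add_replica are hypothetical, and the sketch assumes that reaching the threshold, rather than strictly exceeding it, triggers a new replica, which matches the wording "reaches 7".

```python
def scale_up_threshold(container_concurrency: int, utilization_percentage: float) -> float:
    """Average concurrent requests per replica at which a new replica is created."""
    return container_concurrency * utilization_percentage / 100


def should_add_replica(avg_concurrency: float, container_concurrency: int,
                       utilization_percentage: float) -> bool:
    # Assumption: scale up once the average load reaches the threshold,
    # which happens before the hard limit itself is hit.
    return avg_concurrency >= scale_up_threshold(container_concurrency, utilization_percentage)


print(scale_up_threshold(10, 70))     # 7.0
print(should_add_replica(7, 10, 70))  # True
print(should_add_replica(6, 10, 70))  # False
```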

Example service configured using the target-utilization-percentage annotation

```
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: showcase
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target-utilization-percentage: "70"
...
```

2.4. Scale-to-zero

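Knative Serving provides automatic scaling, or autoscaling, for applications to match incoming demand. When scale-to-zero is enabled, Knative Serving scales an application that is receiving no traffic down to zero replicas.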

2.4.1. Enabling scale-to-zero

You can use the enable-scale-to-zero spec to enable or disable scale-to-zero globally for applications on the cluster.

Prerequisites

You have installed the OpenShift Serverless Operator and Knative Serving on your cluster.
You have cluster administrator permissions on OpenShift Container Platform, or you have cluster or dedicated administrator permissions on Red Hat OpenShift Service on AWS or OpenShift Dedicated.
You are using the default Knative Pod Autoscaler. The scale-to-zero feature is not available if you are using the Kubernetes Horizontal Pod Autoscaler.

Procedure

Modify the enable-scale-to-zero spec in the KnativeServing custom resource (CR):
Example KnativeServing CR
```
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    autoscaler:
      enable-scale-to-zero: "false" 
```
The enable-scale-to-zero spec can be either "true" or "false". If set to "true", scale-to-zero is enabled. If set to "false", applications are scaled down to the configured minimum scale bound. The default value is "true".

2.4.2. Configuring the scale-to-zero grace period

Knative Serving provides automatic scaling down to zero pods for applications. You can use the scale-to-zero-grace-period spec to define the maximum amount of time that Knative waits for the scale-to-zero machinery to be in place before the last replica of an application is removed.

Prerequisites

You have installed the OpenShift Serverless Operator and Knative Serving on your cluster.
You have cluster administrator permissions on OpenShift Container Platform, or you have cluster or dedicated administrator permissions on Red Hat OpenShift Service on AWS or OpenShift Dedicated.
You are using the default Knative Pod Autoscaler. The scale-to-zero feature is not available if you are using the Kubernetes Horizontal Pod Autoscaler.

Procedure

Modify the scale-to-zero-grace-period spec in the KnativeServing custom resource (CR):

Example KnativeServing CR

```
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    autoscaler:
      scale-to-zero-grace-period: "30s"
```

The scale-to-zero-grace-period value sets the grace period as a time duration, such as "30s". The default value is 30 seconds.
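
One reading of the grace period described above is that the last replica is removed as soon as the scale-to-zero machinery is in place, but Knative never waits longer than the grace period. The following Python sketch encodes that simplified reading; can_remove_last_replica and its parameters are hypothetical, and the grace period is expressed in seconds rather than as a duration string such as "30s".

```python
def can_remove_last_replica(machinery_ready: bool, seconds_waited: float,
                            grace_period_seconds: float = 30.0) -> bool:
    """Decide whether the last replica may be removed.

    Hypothetical helper based on a simplified reading of the documentation:
    remove the replica once the scale-to-zero machinery is in place, and never
    wait longer than the grace period (default 30 seconds).
    """
    return machinery_ready or seconds_waited >= grace_period_seconds


print(can_remove_last_replica(True, 5))    # True
print(can_remove_last_replica(False, 10))  # False
print(can_remove_last_replica(False, 30))  # True
```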
