Chapter 2. Scalability and performance of OpenShift Serverless Serving
OpenShift Serverless consists of several components with distinct resource requirements and scaling behaviors. These components are horizontally and vertically scalable, but their resource requirements and configuration highly depend on the actual use case.
- Control-plane components
- These components are responsible for observing and reacting to custom resources and continuously reconfiguring the system, for example, the controller pods.
- Data-plane components
- These components are directly involved in request and response handling, for example, the Knative Serving activator component.
The following metrics and findings were recorded using the following test setup:
- A cluster running OpenShift Container Platform 4.13
- The cluster running 4 compute nodes in AWS with the m6.xlarge machine type
- OpenShift Serverless 1.30
2.1. Overhead of OpenShift Serverless Serving
Because OpenShift Serverless Serving components are part of the data plane, requests from clients are routed through:
- The ingress-gateway (Kourier or Service Mesh)
- The activator component
- The queue-proxy sidecar container in each Knative Service
These components introduce an additional networking hop and perform additional tasks, for example, adding observability and request queuing. The following overheads were measured:
- Each additional network hop adds 0.5 ms to 1 ms of latency to a request. For example, a request that traverses the ingress gateway, the activator, and the queue-proxy accrues roughly 1.5 ms to 3 ms of additional latency. Depending on the current load of the Knative Service, and on whether the Knative Service was scaled to zero before the request, the activator component is not always part of the data plane.
- Depending on the payload size, each of the components consumes up to 1 vCPU for handling 2,500 requests per second.
2.2. Known limitations of OpenShift Serverless Serving
The maximum number of Knative Services that can be created is 3,000. This corresponds to the OpenShift Container Platform Kubernetes Services limit of 10,000, because each Knative Service creates three Kubernetes Services.
2.3. Scaling and performance of OpenShift Serverless Serving
OpenShift Serverless Serving has to be scaled and configured based on the following parameters:
- Number of Knative Services
- Number of Revisions
- Amount of concurrent requests in the system
- Size of payloads of the requests
- The startup latency and the response latency that the user's web application adds to the Knative Service
- Number of changes of the KnativeService custom resource (CR) over time
2.3.1. KnativeServing default configuration
By default, OpenShift Serverless Serving is configured to run all components with high availability and with medium-sized CPU and memory requests and limits. This means that the `high-availability` field in the `KnativeServing` CR is automatically set to a value of `2` and that all system components are scaled to two replicas. This configuration is suitable for medium-workload scenarios and has been tested with:
- 170 Knative Services
- 1-2 Revisions per Knative Service
- 89 test scenarios mainly focused on testing the control plane
- 48 re-creating scenarios where Knative Services are deleted and re-created
- 41 stable scenarios, in which requests are slowly but continuously sent to the system
During these test cases, the system components effectively consumed:
Component | Measured Resources |
---|---|
Operator in the `openshift-serverless` project | 1 GB Memory, 0.2 Cores of CPU |
Serving components in the `knative-serving` and `knative-serving-ingress` projects | 5 GB Memory, 2.5 Cores of CPU |
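For reference, the default high-availability setting described in this section corresponds to the following snippet of the `KnativeServing` CR. This is a minimal sketch shown only for orientation; the full CR supports many more fields, as the later examples in this chapter show:

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  high-availability:
    replicas: 2  # default: all system components run with two replicas
```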
2.3.2. Minimal requirements of OpenShift Serverless Serving
While the default setup is suitable for medium-sized workloads, it might be over-sized for smaller setups or under-sized for high-workload scenarios. To configure OpenShift Serverless Serving for a minimal workload scenario, you need to know the idle consumption of the system components.
2.3.2.1. Idle consumption
The idle consumption depends on the number of Knative Services. The following memory usage has been measured for the components in the `knative-serving` and `knative-serving-ingress` OpenShift Container Platform projects:
Component | 0 Services | 100 Services | 500 Services | 1000 Services |
---|---|---|---|---|
`activator` | 55Mi | 86Mi | 300Mi | 450Mi |
`autoscaler` | 52Mi | 102Mi | 225Mi | 350Mi |
`controller` | 100Mi | 135Mi | 310Mi | 500Mi |
`webhook` | 60Mi | 60Mi | 60Mi | 60Mi |
`3scale-kourier-gateway` or `istio-ingressgateway` | 20Mi | 60Mi | 190Mi | 330Mi |
`net-kourier-controller` or `net-istio-controller` | 90Mi | 170Mi | 340Mi | 430Mi |
Either the `3scale-kourier-gateway` and `net-kourier-controller` components or the `istio-ingressgateway` and `net-istio-controller` components are installed, depending on which ingress you use. The memory consumption of `net-istio` is based on the total number of pods within the mesh.
2.3.3. Configuring Serving for minimal workloads
Procedure
You can configure Knative Serving for minimal workloads by using the `KnativeServing` custom resource (CR).

A minimal workload configuration in the `KnativeServing` CR:

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  high-availability:
    replicas: 1 # 1
  workloads:
    - name: activator
      replicas: 2 # 2
      resources:
        - container: activator
          requests:
            cpu: 250m # 3
            memory: 60Mi # 4
          limits:
            cpu: 1000m
            memory: 600Mi
    - name: controller
      replicas: 1 # 5
      resources:
        - container: controller
          requests:
            cpu: 10m
            memory: 100Mi
          limits: # 6
            cpu: 200m
            memory: 300Mi
    - name: webhook
      replicas: 2
      resources:
        - container: webhook
          requests:
            cpu: 100m # 7
            memory: 60Mi
          limits:
            cpu: 200m
            memory: 200Mi
  podDisruptionBudgets: # 8
    - name: activator-pdb
      minAvailable: 1
    - name: webhook-pdb
      minAvailable: 1
```
1. Setting this to `1` scales all system components to one replica.
2. The activator should always be scaled to a minimum of `2` instances to avoid downtime.
3. The activator CPU request should not be set lower than `250m`, because a `HorizontalPodAutoscaler` uses this value as a reference to scale up and down.
4. Adjust memory requests to the idle values from the previous table. Also adjust memory limits according to your expected load; this might need custom testing to find the best values.
5. One webhook and one controller are sufficient for a minimal-workload scenario.
6. These limits are sufficient for a minimal-workload scenario, but they also might need adjustments depending on your concrete workload.
7. The webhook CPU request should not be set lower than `100m`, because a `HorizontalPodAutoscaler` uses this value as a reference to scale up and down.
8. Adjust the `PodDisruptionBudgets` to a value lower than `replicas` to avoid problems during node maintenance.
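The example above covers the `activator`, `controller`, and `webhook` workloads. If Kourier is installed, the same `workloads` list can also override the replicas of the ingress components listed in the idle-consumption table. The following sketch uses illustrative replica values that are assumptions, not tested recommendations:

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  workloads:
    # Deployment names as listed in the idle-consumption table.
    - name: net-kourier-controller
      replicas: 1  # illustrative: one controller for a minimal setup
    - name: 3scale-kourier-gateway
      replicas: 2  # illustrative: keep two gateways to avoid ingress downtime
```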
2.3.4. Configuring Serving for high workloads
You can configure Knative Serving for high workloads by using the `KnativeServing` custom resource (CR). The following findings are relevant to configuring Knative Serving for a high workload:
These findings have been tested with request payload sizes of 0 kb to 32 kb. The Knative Service backends used in those tests had a startup latency between 0 and 10 seconds and response times between 0 and 5 seconds.
- All data-plane components mostly show increased CPU usage with higher request rates and payload sizes, so the CPU requests and limits have to be tested and potentially increased.
- The activator component might also need more memory when it has to buffer more or bigger request payloads, so the memory requests and limits might need to be increased as well.
- One activator pod can handle approximately 2,500 requests per second before it starts to increase latency and, at some point, leads to errors.
- One `3scale-kourier-gateway` or `istio-ingressgateway` pod can also handle approximately 2,500 requests per second before it starts to increase latency and, at some point, leads to errors.
- Each of the data-plane components consumes up to 1 vCPU for handling 2,500 requests per second. Note that this highly depends on the payload size and the response times of the Knative Service backend.
Fast startup and fast response times of your Knative Service user workloads are critical for good performance of the overall system. The Knative Serving components buffer incoming requests when the Knative Service user backend is scaling up or when request concurrency has reached its capacity. If your Knative Service user workload introduces long startup times or request latency, it either overloads the `activator` component (when the CPU and memory configuration is too low) or leads to errors for the calling clients.
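The buffering behavior is tied to the concurrency capacity that each Knative Service declares. As an illustration, the following sketch shows the standard Knative `containerConcurrency` field on a Service; the service name and image are placeholders, and the value of `100` is an assumption, not a recommendation:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service  # placeholder name
spec:
  template:
    spec:
      # Requests beyond 100 concurrent requests per replica are buffered
      # until capacity frees up or the autoscaler adds replicas.
      containerConcurrency: 100  # illustrative value
      containers:
        - image: registry.example.com/example-app:latest  # placeholder image
```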
Procedure
To fine-tune your installation, use the previous findings combined with your own test results to configure the `KnativeServing` custom resource.

A high workload configuration in the `KnativeServing` CR:

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  high-availability:
    replicas: 2 # 1
  workloads:
    - name: component-name # 2
      replicas: 2 # 3
      resources:
        - container: container-name
          requests:
            cpu: # 4
            memory:
          limits:
            cpu:
            memory:
  podDisruptionBudgets: # 5
    - name: name-of-pod-disruption-budget
      minAvailable: 1
```
1. Set this parameter to at least `2` to make sure you always have at least two instances of every component running. You can also use `workloads` to override the replicas for certain components.
2. Use the `workloads` list to configure specific components. Use the `deployment` name of the component and set the `replicas` field.
3. For the `activator`, `webhook`, and `3scale-kourier-gateway` components, which use horizontal pod autoscalers (HPAs), the `replicas` field sets the minimum number of replicas. The actual number of replicas depends on the CPU load and the scaling done by the HPAs.
4. Set the requested and limited CPU and memory to at least the idle consumption, while also taking the previous findings and your own test results into consideration.
5. Adjust the `PodDisruptionBudgets` to a value lower than `replicas` to avoid problems during node maintenance. The default `minAvailable` is set to `1`, so if you increase the required replicas, you must also increase `minAvailable`.
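As a concrete illustration of this template, the following sketch fills in values for the `activator` component, sized for a hypothetical target of roughly 10,000 requests per second by applying the approximately 2,500 requests per second per pod finding from above. All numbers are assumptions for illustration and need validation against your own load tests:

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  high-availability:
    replicas: 2
  workloads:
    - name: activator
      replicas: 4  # assumption: ~10,000 rps / ~2,500 rps per pod
      resources:
        - container: activator
          requests:
            cpu: 1000m    # ~1 vCPU per 2,500 rps per component (see findings)
            memory: 600Mi # headroom for request buffering; verify by testing
          limits:
            cpu: 2000m
            memory: 1200Mi
  podDisruptionBudgets:
    - name: activator-pdb
      minAvailable: 2  # raised along with replicas, per callout 5
```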
As each environment is highly specific, it is essential to test and find your own ideal configuration. Use the monitoring and alerting functionality of OpenShift Container Platform to continuously monitor your actual resource consumption and make adjustments if needed.
If you are using the OpenShift Serverless and Service Mesh integration, additional CPU processing is added by the `istio-proxy` sidecar containers. For more information, see the Service Mesh documentation.