Chapter 2. Scalability and performance of OpenShift Serverless Serving
OpenShift Serverless consists of several components that have different resource requirements and scaling behaviors. These components are horizontally and vertically scalable, but their resource requirements and configuration depend heavily on the actual use case.
- Control-plane components
- These components are responsible for observing and reacting to custom resources and continuously reconfiguring the system, for example, the controller pods.
- Data-plane components
- These components are directly involved in request and response handling, for example, the Knative Serving activator component.
The following metrics and findings were recorded using the following test setup:
- A cluster running OpenShift Container Platform 4.13
- The cluster running 4 compute nodes in AWS with a machine type of m6.xlarge
- OpenShift Serverless 1.30
2.1. Overhead of OpenShift Serverless Serving
As components of OpenShift Serverless Serving are part of the data-plane, requests from clients are routed through:
- The ingress-gateway (Kourier or Service Mesh)
- The activator component
- The queue-proxy sidecar container in each Knative Service
Each of these components introduces an additional networking hop and performs additional tasks, for example, adding observability and request queuing. The following are the measured overheads:
- Each additional network hop adds 0.5 ms to 1 ms of latency to a request. The activator component is not always part of the data-plane: this depends on the current load of the Knative Service and on whether the Knative Service was scaled to zero before the request.
- Depending on the payload size, each of the components consumes up to 1 vCPU for handling 2500 requests per second.
2.2. Known limitations of OpenShift Serverless Serving
The maximum number of Knative Services that can be created is 3,000. This corresponds to the OpenShift Container Platform Kubernetes services limit of 10,000, because each Knative Service creates 3 Kubernetes services (3,000 × 3 = 9,000).
2.3. Scaling and performance of OpenShift Serverless Serving
OpenShift Serverless Serving has to be scaled and configured based on the following parameters:
- Number of Knative Services
- Number of Revisions
- Number of concurrent requests in the system
- Size of the request payloads
- The startup latency and response latency of the Knative Service added by the user’s web application
- Number of changes of the KnativeService custom resource (CR) over time
2.3.1. KnativeServing default configuration
By default, OpenShift Serverless Serving is configured to run all components with high availability and medium-sized CPU and memory requests and limits. This means that the high-availability field in the KnativeServing CR is automatically set to a value of 2 and all system components are scaled to two replicas. This configuration is suitable for medium workload scenarios and has been tested with:
- 170 Knative Services
- 1-2 Revisions per Knative Service
- 89 test scenarios mainly focused on testing the control plane
- 48 re-creating scenarios where Knative Services are deleted and re-created
- 41 stable scenarios, in which requests are slowly but continuously sent to the system
During these test cases, the system components effectively consumed:
| Component | Measured Resources |
|---|---|
| Operator in project | 1 GB Memory, 0.2 Cores of CPU |
| Serving components in project | 5 GB Memory, 2.5 Cores of CPU |
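As a minimal sketch, the default high-availability setting described in this section corresponds to a KnativeServing CR fragment along the following lines. The API version, resource name, and namespace shown here are assumptions based on a typical installation and might differ in your cluster.

```yaml
# Sketch only: apiVersion, name, and namespace are assumptions.
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  high-availability:
    replicas: 2   # default: all system components run with two replicas
```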
2.3.2. Minimal requirements of OpenShift Serverless Serving
While the default setup is suitable for medium-sized workloads, it might be over-sized for smaller setups or under-sized for high-workload scenarios. To configure OpenShift Serverless Serving for a minimal workload scenario, you need to know the idle consumption of the system components.
2.3.2.1. Idle consumption
The idle consumption depends on the number of Knative Services. The following memory usage has been measured for the components in the knative-serving and knative-serving-ingress OpenShift Container Platform projects:
| Component | 0 Services | 100 Services | 500 Services | 1000 Services |
|---|---|---|---|---|
|  | 55Mi | 86Mi | 300Mi | 450Mi |
|  | 52Mi | 102Mi | 225Mi | 350Mi |
|  | 100Mi | 135Mi | 310Mi | 500Mi |
|  | 60Mi | 60Mi | 60Mi | 60Mi |
|  | 20Mi | 60Mi | 190Mi | 330Mi |
|  | 90Mi | 170Mi | 340Mi | 430Mi |
Either the 3scale-kourier-gateway and net-kourier-controller components or the istio-ingressgateway and net-istio-controller components are installed.
The memory consumption of net-istio is based on the total number of pods within the mesh.
2.3.3. Configuring Serving for minimal workloads
Procedure
- You can configure Knative Serving for minimal workloads by using the KnativeServing custom resource (CR). The following numbered notes apply to a minimal workload configuration in the KnativeServing CR:
- (1) Setting the high-availability replica count to 1 scales all system components to one replica.
- (2) The activator should always be scaled to a minimum of 2 instances to avoid downtime.
- (3) Activator CPU requests should not be set lower than 250m, as a HorizontalPodAutoscaler will use this as a reference to scale up and down.
- (4) Adjust memory requests to the idle values from the previous table. Also adjust memory limits according to your expected load (this might need custom testing to find the best values).
- (5) One webhook and one controller are sufficient for a minimal-workload scenario.
- (6) These limits are sufficient for a minimal-workload scenario, but they also might need adjustments depending on your concrete workload.
- (7) Webhook CPU requests should not be set lower than 100m, as a HorizontalPodAutoscaler will use this as a reference to scale up and down.
- (8) Adjust the PodDisruptionBudgets to a value lower than replicas, to avoid problems during node maintenance.
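The numbered notes above can be mapped onto a KnativeServing CR roughly as follows. This is a sketch rather than a tested configuration: the API version, resource name, and namespace are assumptions, and every memory value and every CPU or memory limit is an illustrative placeholder that you should replace with values from the idle consumption table and your own tests. Only the 250m and 100m CPU requests and the replica counts come directly from the notes. Numbers in the comments refer to the notes.

```yaml
# Sketch only: apiVersion, names, memory values, and limits are assumptions.
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  high-availability:
    replicas: 1              # (1) one replica for all system components
  workloads:
    - name: activator
      replicas: 2            # (2) keep at least two activator instances
      resources:
        - container: activator
          requests:
            cpu: 250m        # (3) do not go below 250m (HPA reference)
            memory: 60Mi     # (4) placeholder; use the measured idle value
          limits:
            cpu: 1000m       # placeholder
            memory: 600Mi    # (4) placeholder; tune for the expected load
    - name: controller
      replicas: 1            # (5) one controller is sufficient
      resources:
        - container: controller
          requests:
            cpu: 10m         # placeholder
            memory: 100Mi    # placeholder
          limits:            # (6) placeholders; adjust for your workload
            cpu: 200m
            memory: 300Mi
    - name: webhook
      replicas: 1            # (5) one webhook is sufficient
      resources:
        - container: webhook
          requests:
            cpu: 100m        # (7) do not go below 100m (HPA reference)
            memory: 60Mi     # placeholder
          limits:
            cpu: 200m        # placeholder
            memory: 200Mi    # placeholder
  podDisruptionBudgets:
    - name: activator-pdb
      minAvailable: 1        # (8) lower than the activator replica count
```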
 
2.3.4. Configuring Serving for high workloads
					You can configure Knative Serving for high workloads using the KnativeServing custom resource (CR). The following findings are relevant to configuring Knative Serving for a high workload:
				
These findings have been tested with requests with a payload size of 0-32 KB. The Knative Service backends used in those tests had a startup latency between 0 and 10 seconds and response times between 0 and 5 seconds.
- All data-plane components mainly increase their CPU usage under higher request rates and larger payloads, so the CPU requests and limits have to be tested and potentially increased.
- The activator component might also need more memory when it has to buffer more or bigger request payloads, so the memory requests and limits might need to be increased as well.
- One activator pod can handle approximately 2500 requests per second before latency starts to increase and, at some point, errors occur.
- One 3scale-kourier-gateway or istio-ingressgateway pod can also handle approximately 2500 requests per second before latency starts to increase and, at some point, errors occur.
- Each of the data-plane components consumes up to 1 vCPU for handling 2500 requests per second. Note that this highly depends on the payload size and the response times of the Knative Service backend.
Fast startup and fast response times of your Knative Service user workloads are critical for good performance of the overall system. The Knative Serving components buffer incoming requests when the Knative Service user backend is scaling up or when request concurrency has reached its capacity. If your Knative Service user workload introduces long startup or request latency, it will either overload the activator component (when the CPU and memory configuration is too low) or lead to errors for the calling clients.
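As a rough sizing aid derived from the figures above, and not a measured result, the minimum number of activator or ingress gateway pods for a target request rate R can be estimated by dividing by the approximately 2500 requests per second that one pod handled in these tests. The target of 10,000 requests per second is a hypothetical example. The estimate ignores headroom, payload size, and backend latency, so treat it as a lower bound to validate with your own load tests.

```latex
N_{\text{pods}} \;\ge\; \left\lceil \frac{R}{2500\ \text{req/s}} \right\rceil,
\qquad
R = 10\,000\ \text{req/s}
\;\Rightarrow\;
N_{\text{pods}} \;\ge\; \left\lceil \frac{10\,000}{2500} \right\rceil = 4
```

With up to 1 vCPU consumed per 2500 requests per second, the same hypothetical load corresponds to up to about 4 vCPU for each data-plane component.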
					
Procedure
- To fine-tune your installation, use the previous findings combined with your own test results to configure the KnativeServing custom resource. The following numbered notes apply to a high workload configuration in the KnativeServing CR:
- (1) Set the high-availability replica count to at least 2 to make sure you always have at least two instances of every component running. You can also use workloads to override the replicas for certain components.
- (2) Use the workloads list to configure specific components. Use the deployment name of the component and set the replicas field.
- (3) For the activator, webhook, and 3scale-kourier-gateway components, which use horizontal pod autoscalers (HPAs), the replicas field sets the minimum number of replicas. The actual number of replicas depends on the CPU load and scaling done by the HPAs.
- (4) Set the requested and limited CPU and memory according to at least the idle consumption while also taking the previous findings and your own test results into consideration.
- (5) Adjust the PodDisruptionBudgets to a value lower than replicas to avoid problems during node maintenance. The default minAvailable is set to 1, so if you increase the required replicas, you must also increase minAvailable.
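The numbered notes above might translate into a KnativeServing CR like the following sketch. The API version, resource name, and namespace are assumptions, and the activator replica count, resource values, and minAvailable value are illustrative placeholders, not tested recommendations. Numbers in the comments refer to the notes.

```yaml
# Sketch only: apiVersion, names, replica counts, and resource values are assumptions.
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  high-availability:
    replicas: 2              # (1) at least two instances of every component
  workloads:
    - name: activator        # (2) deployment name of the component
      replicas: 4            # (3) minimum replicas; the HPA can scale beyond this
      resources:
        - container: activator
          requests:          # (4) at least the idle consumption; placeholders
            cpu: 1000m
            memory: 500Mi
          limits:
            cpu: 2000m
            memory: 1Gi
  podDisruptionBudgets:
    - name: activator-pdb
      minAvailable: 3        # (5) lower than replicas, raised from the default of 1
```

The same pattern can be repeated in the workloads list for other components, for example the webhook or the ingress gateway, using their deployment and container names.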
 
As each environment is highly specific, it is essential to test and find your own ideal configuration. Use the monitoring and alerting functionality of OpenShift Container Platform to continuously monitor your actual resource consumption and make adjustments if needed.
						If you are using the OpenShift Serverless and Service Mesh integration, additional CPU processing is added by the istio-proxy sidecar containers. For more information about this, see the Service Mesh documentation.