Chapter 13. Using CPU Manager
13.1. What CPU Manager Does
CPU Manager manages groups of CPUs and constrains workloads to specific CPUs.
CPU Manager is useful for workloads that have some of these attributes:
- Require as much CPU time as possible.
- Are sensitive to processor cache misses.
- Are low-latency network applications.
- Coordinate with other processes and benefit from sharing a single processor cache.
13.2. Setting up CPU Manager
To set up CPU Manager:
Optionally, label a node:
# oc label node perf-node.example.com cpumanager=true
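If you want to confirm that the label was applied, an optional check (not part of the original procedure) is to list nodes by label selector:

# oc get nodes -l cpumanager=true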
Enable CPU manager support on the target node:

# oc edit configmap <name> -n openshift-node

For example:
# oc edit cm node-config-compute -n openshift-node

Example Output
...
kubeletArguments:
...
  feature-gates:
  - CPUManager=true
  cpu-manager-policy:
  - static
  cpu-manager-reconcile-period:
  - 5s
  system-reserved: [1]
  - cpu=500m

[1] system-reserved is a required setting. The value might need to be adjusted depending on your environment.

Restart the node service so that the updated kubelet arguments take effect:

# systemctl restart atomic-openshift-node
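After the restart, you can confirm that system-reserved is being deducted from the node's CPU capacity. This is an optional check rather than part of the original procedure; the node name and the expected 3500m value come from the four-core example node used throughout this chapter:

# oc get node perf-node.example.com -o jsonpath='{.status.allocatable.cpu}'
3500m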
Create a pod that requests one or more cores. Both limits and requests must have their CPU value set to a whole integer; that integer is the number of cores that will be dedicated to this pod:
# cat cpumanager.yaml

Example Output
apiVersion: v1
kind: Pod
metadata:
  generateName: cpumanager-
spec:
  containers:
  - name: cpumanager
    image: gcr.io/google_containers/pause-amd64:3.0
    resources:
      requests:
        cpu: 1
        memory: "1G"
      limits:
        cpu: 1
        memory: "1G"
  nodeSelector:
    cpumanager: "true"

Create the pod:
# oc create -f cpumanager.yaml

Verify that the pod is scheduled to the node that you labeled:
# oc describe pod cpumanager

Example Output
Name:               cpumanager-4gdtn
Namespace:          test
Node:               perf-node.example.com/172.31.62.105
...
    Limits:
      cpu:     1
      memory:  1G
    Requests:
      cpu:     1
      memory:  1G
...
QoS Class:       Guaranteed
Node-Selectors:  cpumanager=true
                 region=primary
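You can also read the QoS class directly from the pod status. This is an optional cross-check, not part of the original procedure; the pod name is the one generated in the example output above:

# oc get pod cpumanager-4gdtn -n test -o jsonpath='{.status.qosClass}'
Guaranteed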
Verify that the cgroups are set up correctly. Get the PID of the pause process:

# systemd-cgls -l

Example Output
├─1 /usr/lib/systemd/systemd --system --deserialize 20
├─kubepods.slice
│ ├─kubepods-pod0ec1ab8b_e1c4_11e7_bb22_027b30990a24.slice
│ │ ├─docker-b24e29bc4021064057f941dc5f3538595c317d294f2c8e448b5e61a29c026d1c.scope
│ │ │ └─44216 /pause
Pods of QoS tier Guaranteed are placed within the kubepods.slice. Pods of other QoS tiers end up in child cgroups of kubepods.

# cd /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-pod0ec1ab8b_e1c4_11e7_bb22_027b30990a24.slice/docker-b24e29bc4021064057f941dc5f3538595c317d294f2c8e448b5e61a29c026d1c.scope
# for i in `ls cpuset.cpus tasks` ; do echo -n "$i "; cat $i ; done

Example Output
cpuset.cpus 2
tasks 44216

Check the allowed CPU list for the task:
# grep ^Cpus_allowed_list /proc/44216/status

Example Output
Cpus_allowed_list:    2
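As an additional cross-check (optional, and assuming the util-linux taskset utility is installed on the node), the same CPU affinity can be read with taskset:

# taskset -cp 44216
pid 44216's current affinity list: 2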
Verify that another pod (in this case, the pod in the burstable QoS tier) on the system cannot run on the core allocated for the Guaranteed pod:

# cat /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podbe76ff22_dead_11e7_b99e_027b30990a24.slice/docker-da621bea7569704fc39f84385a179923309ab9d832f6360cccbff102e73f9557.scope/cpuset.cpus
0-1,3

# oc describe node perf-node.example.com

Example Output
...
Capacity:
 cpu:     4
 memory:  16266720Ki
 pods:    40
Allocatable:
 cpu:     3500m
 memory:  16164320Ki
 pods:    40
---
  Namespace   Name               CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------   ----               ------------  ----------  ---------------  -------------
  test        cpumanager-4gdtn   1 (28%)       1 (28%)     1G (6%)          1G (6%)
  test        cpumanager-hczts   1 (28%)       1 (28%)     1G (6%)          1G (6%)
  test        cpumanager-r9wrq   1 (28%)       1 (28%)     1G (6%)          1G (6%)
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  3 (85%)       3 (85%)     5437500k (32%)   9250M (55%)
This VM has four CPU cores. You set system-reserved to 500 millicores, meaning half of one core is subtracted from the total capacity of the node to arrive at the Node Allocatable amount.

You can see that Allocatable CPU is 3500 millicores. This means you can run three of the CPU manager pods, because each one takes one whole core and a whole core is equivalent to 1000 millicores.
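The same calculation, written out as a quick shell sketch (the 4000m capacity and 500m system-reserved values come from the example node above; integer division reflects that CPU Manager pins only whole cores):

# echo $(( (4000 - 500) / 1000 ))
3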
If you try to schedule a fourth pod, the system will accept the pod, but it will never be scheduled:

# oc get pods --all-namespaces | grep test

Example Output
test   cpumanager-4gdtn   1/1   Running   0   8m
test   cpumanager-hczts   1/1   Running   0   8m
test   cpumanager-nb9d5   0/1   Pending   0   8m
test   cpumanager-r9wrq   1/1   Running   0   8m
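To confirm why the fourth pod stays Pending, you can inspect its events. This is an optional check; the pod name is taken from the output above, and the exact event wording varies by release:

# oc describe pod cpumanager-nb9d5 -n test

The Events section typically shows a FailedScheduling warning indicating that no node has enough allocatable CPU left for another whole core.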