Appendix B. Frequently Asked Questions
B.1. Cluster Operator
B.1.1. Why do I need cluster admin privileges to install AMQ Streams?
To install AMQ Streams, you must have the ability to create Custom Resource Definitions (CRDs). CRDs instruct OpenShift about resources that are specific to AMQ Streams, such as Kafka, KafkaConnect, and so on. Because CRDs are cluster-scoped resources rather than being scoped to a particular OpenShift namespace, they typically require cluster admin privileges to install.
In addition, you must have the ability to create ClusterRoles and ClusterRoleBindings. Like CRDs, these are cluster-scoped resources that typically require cluster admin privileges.
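For illustration, the following is an abbreviated sketch of what such a CRD looks like. The complete definitions ship with the installation files; the custom resource API version shown here is an assumption and the schema is elided.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # No namespace field: CRDs are cluster-scoped, which is why installing
  # them typically requires cluster admin privileges.
  name: kafkas.kafka.strimzi.io
spec:
  group: kafka.strimzi.io
  scope: Namespaced   # the Kafka resources themselves live in namespaces
  names:
    kind: Kafka
    plural: kafkas
    singular: kafka
  versions:
    - name: v1beta2   # assumed version; check your installation files
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          x-kubernetes-preserve-unknown-fields: true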
The cluster administrator can inspect all the resources being installed (in the /install/ directory) to assure themselves that the ClusterRoles do not grant unnecessary privileges. For more information about why the Cluster Operator installation resources grant the ability to create ClusterRoleBindings, see the following question.
After installation, the Cluster Operator runs as a regular Deployment; any non-admin user with privileges to access the Deployment can configure it.
By default, normal users do not have the privileges necessary to manipulate the custom resources, such as Kafka, KafkaConnect, and so on, which the Cluster Operator deals with. These privileges can be granted using normal RBAC resources by the cluster administrator. See this procedure for more details of how to do this.
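For example, the following is a minimal sketch of a RoleBinding a cluster administrator might create to grant a user access to the AMQ Streams custom resources in one namespace. The ClusterRole name strimzi-admin, the user name, and the namespace are assumptions for illustration; check the roles shipped with your installation files.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: strimzi-admin-developer   # hypothetical name
  namespace: my-project           # namespace where the Kafka resources live
subjects:
  - kind: User
    name: developer               # the standard user being granted access
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: strimzi-admin             # assumed role granting access to Kafka, KafkaTopic, and so on
  apiGroup: rbac.authorization.k8s.io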
B.1.2. Why does the Cluster Operator require the ability to create ClusterRoleBindings? Is that not a security risk?
OpenShift has built-in privilege escalation prevention, which means that the Cluster Operator cannot grant privileges it does not have itself. This in turn means that the Cluster Operator needs to have the privileges necessary for all the components it orchestrates.
In the context of this question, there are two places where the Cluster Operator needs to create RBAC bindings for ServiceAccounts:
- The Topic Operator and User Operator need to be able to manipulate KafkaTopics and KafkaUsers, respectively. The Cluster Operator therefore needs to be able to grant them this access, which it does by creating a Role and RoleBinding. For this reason, the Cluster Operator itself needs to be able to create Roles and RoleBindings in the namespace that those operators will run in. However, because of the privilege escalation prevention, the Cluster Operator cannot grant privileges it does not have itself (in particular, it cannot grant such privileges in a namespace it cannot access).
- When using rack-aware partition assignment, AMQ Streams needs to be able to discover the failure domain (for example, the Availability Zone in AWS) of the node on which a broker pod is assigned. To do this, the broker pod needs to be able to get information about the Node it is running on. A Node is a cluster-scoped resource, so access to it can only be granted via a ClusterRoleBinding (not a namespace-scoped RoleBinding). Therefore the Cluster Operator needs to be able to create ClusterRoleBindings. But again, because of privilege escalation prevention, the Cluster Operator cannot grant privileges it does not have itself (so it cannot, for example, create a ClusterRoleBinding to a ClusterRole to grant privileges that the Cluster Operator does not already have). A sketch of such a binding appears after this list.
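The following is an illustrative sketch of the kind of ClusterRole and ClusterRoleBinding involved in the rack-awareness case. All of the names here are assumptions made for the example; the resources shipped in the /install/ directory are authoritative.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kafka-broker-node-reader   # hypothetical name
rules:
  # Broker pods only need read access to Nodes to discover the failure
  # domain (for example, the Availability Zone label) of their node.
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-cluster-kafka-node-reader   # hypothetical name
subjects:
  - kind: ServiceAccount
    name: my-cluster-kafka     # ServiceAccount used by the broker pods
    namespace: my-project
roleRef:
  kind: ClusterRole
  name: kafka-broker-node-reader
  apiGroup: rbac.authorization.k8s.io
Because the binding is cluster-scoped, the Cluster Operator can only create it if it already holds the privileges that the ClusterRole grants.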
B.1.3. Why can standard OpenShift users not create the custom resources (Kafka, KafkaTopic, and so on)?
Because, when they installed AMQ Streams, the OpenShift cluster administrator did not grant the necessary privileges to standard users.
See this FAQ answer for more details.
B.1.4. Log contains warnings about failing to acquire lock
For each cluster, the Cluster Operator always executes only one operation at a time. The Cluster Operator uses locks to make sure that there are never two parallel operations running for the same cluster. If an operation requires more time to complete, other operations wait until it is completed and the lock is released.
INFO: Examples of cluster operations are cluster creation, rolling update, scale down or scale up, and so on.
If the wait for the lock takes too long, the operation times out and the following warning message will be printed to the log:
2018-03-04 17:09:24 WARNING AbstractClusterOperations:290 - Failed to acquire lock for kafka cluster lock::kafka::myproject::my-cluster
Depending on the exact configuration of STRIMZI_FULL_RECONCILIATION_INTERVAL_MS and STRIMZI_OPERATION_TIMEOUT_MS, this warning message may appear regularly without indicating any problems. Operations that time out are picked up by the next periodic reconciliation, which tries to acquire the lock again and execute the operation.
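For reference, both settings are environment variables on the Cluster Operator Deployment. A minimal sketch of where they are configured follows; the values shown are illustrative rather than recommendations, and the container name is an assumption.
spec:
  template:
    spec:
      containers:
        - name: strimzi-cluster-operator   # assumed container name
          env:
            # How often the periodic reconciliation runs.
            - name: STRIMZI_FULL_RECONCILIATION_INTERVAL_MS
              value: "120000"
            # Timeout for internal operations, such as waiting for
            # resources to become ready.
            - name: STRIMZI_OPERATION_TIMEOUT_MS
              value: "300000"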
Should this message appear repeatedly, even in situations when there should be no other operations running for a given cluster, it might indicate that the lock was not properly released due to an error. In such cases it is recommended to restart the Cluster Operator.
B.1.5. Hostname verification fails when connecting to NodePorts using TLS
Currently, off-cluster access using NodePorts with TLS encryption enabled does not support TLS hostname verification. As a result, clients that verify the hostname will fail to connect. For example, the Java client will fail with the following exception:
Caused by: java.security.cert.CertificateException: No subject alternative names matching IP address 168.72.15.231 found
 at sun.security.util.HostnameChecker.matchIP(HostnameChecker.java:168)
 at sun.security.util.HostnameChecker.match(HostnameChecker.java:94)
 at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:455)
 at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:436)
 at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:252)
 at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136)
 at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1501)
 ... 17 more
To connect, you must disable hostname verification. In the Java client, you can do this by setting the configuration option ssl.endpoint.identification.algorithm to an empty string.
When configuring the client using a properties file, you can do it this way:
ssl.endpoint.identification.algorithm=
When configuring the client directly in Java, set the configuration option to an empty string:
props.put("ssl.endpoint.identification.algorithm", "");