Chapter 2. Troubleshooting node network configuration
If the node network configuration encounters an issue, the policy is automatically rolled back and the enactments report failure. This includes issues such as:
- The configuration fails to be applied on the host.
- The host loses connection to the default gateway.
- The host loses connection to the API server.
2.1. Troubleshooting an incorrect node network configuration policy configuration
You can apply changes to the node network configuration across your entire cluster by applying a node network configuration policy.
If you applied an incorrect configuration, you can use the following example to troubleshoot and correct the failed node network policy. The example attempts to apply a Linux bridge policy to a cluster that has three control plane nodes and three compute nodes. The policy is not applied because the policy references the wrong interface.
To find an error, you need to investigate the available NMState resources. You can then update the policy with the correct configuration.
Prerequisites
- You installed the OpenShift CLI (oc).
- You ensured that an ens01 interface does not exist on your Linux system.
Procedure
Create a policy on your cluster. The following example creates a simple bridge, br1, that has ens01 as its member:
Apply the policy to your network interface:
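The policy manifest for this step is not reproduced here. As a minimal sketch, assuming the nmstate.io/v1 API and the standard Linux bridge fields (the description, ipv4, and stp settings are illustrative assumptions, not taken from this procedure), the following snippet writes a matching policy to ens01-bridge-testfail.yaml, the file name that the oc apply command in this step uses. The policy has no nodeSelector, so it targets every node in the cluster.

```shell
# Write the intentionally broken bridge policy to the file that `oc apply` reads.
# ens01 does not exist on the nodes, which is what makes the policy fail.
# The description, ipv4, and stp values are illustrative assumptions.
cat > ens01-bridge-testfail.yaml <<'EOF'
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ens01-bridge-testfail
spec:
  desiredState:
    interfaces:
      - name: br1
        description: Linux bridge with the wrong port
        type: linux-bridge
        state: up
        ipv4:
          dhcp: true
          enabled: true
        bridge:
          options:
            stp:
              enabled: false
          port:
            - name: ens01
EOF
```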
$ oc apply -f ens01-bridge-testfail.yaml
Example output
nodenetworkconfigurationpolicy.nmstate.io/ens01-bridge-testfail created
Verify the status of the policy by running the following command:
$ oc get nncp
The output shows that the policy failed:
Example output
NAME                    STATUS
ens01-bridge-testfail   FailedToConfigure
The policy status alone does not indicate whether the policy failed on all nodes or on a subset of nodes.
List the node network configuration enactments to see if the policy was successful on any of the nodes. If the policy failed for only a subset of nodes, the output suggests that the problem is with a specific node configuration. If the policy failed on all nodes, the output suggests that the problem is with the policy.
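If you want to script the subset-versus-all check described in this step, a minimal sketch is a small filter over the enactment list. It assumes that the output of oc get nnce --no-headers has the enactment NAME (node.policy) in the first column and the STATUS in the second column; classify_enactments is a hypothetical helper, and any status other than FailedToConfigure (for example, an enactment that is still progressing) is counted as not failed.

```shell
# Hypothetical helper: reads `oc get nnce --no-headers` output on stdin and reports
# whether a policy failed on every node or only on some. Assumes NAME is column 1
# (<node>.<policy>) and STATUS is column 2. An optional argument limits the check
# to the enactments of one policy.
classify_enactments() {
  awk -v policy="${1:-}" '
    BEGIN { total = 0; failed = 0 }
    NF == 0 { next }
    # Keep only enactments of the requested policy (NAME ends in .<policy>).
    policy != "" && substr($1, length($1) - length(policy)) != ("." policy) { next }
    { total++ }
    $2 == "FailedToConfigure" { failed++ }
    END {
      if (total == 0)           print "no-enactments"
      else if (failed == total) print "failed-on-all-nodes"
      else if (failed > 0)      print "failed-on-some-nodes"
      else                      print "no-failures"
    }'
}

# Usage against a cluster (not run here):
#   oc get nnce --no-headers | classify_enactments ens01-bridge-testfail
```

A result of failed-on-all-nodes points at the policy itself, while failed-on-some-nodes points at a node-specific configuration problem.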
$ oc get nnce
The output shows that the policy failed on all nodes.
View one of the failed enactments. The following command uses the jsonpath output format to filter the output:
$ oc get nnce compute-1.ens01-bridge-testfail -o jsonpath='{.status.conditions[?(@.type=="Failing")].message}'
Example output
[2024-10-10T08:40:46Z INFO nmstatectl] Nmstate version: 2.2.37
NmstateError: InvalidArgument: Controller interface br1 is holding unknown port ens01
The previous example shows the output of an InvalidArgument error, which indicates that ens01 is an unknown port. In this example, you might need to change the port configuration in the policy configuration file.
To ensure that the policy is configured properly, view the network configuration for one or all of the nodes by requesting the NodeNetworkState object. The following command returns the network configuration for the control-plane-1 node:
$ oc get nns control-plane-1 -o yaml
The output shows that the interface name on the nodes is ens1, but the failed policy incorrectly uses ens01:
Example output
- ipv4:
    # ...
  name: ens1
  state: up
  type: ethernet
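To compare a port name in the policy with what a node actually reports, you can read only the interface names from the NodeNetworkState object instead of scanning the full YAML. A minimal sketch, assuming the status.currentState.interfaces[*].name path in the NodeNetworkState object; has_interface is a hypothetical helper, and it assumes interface names contain no whitespace or glob characters.

```shell
# Hypothetical helper: succeeds if interface name $1 appears in the
# whitespace-separated list $2 (for example, names read from NodeNetworkState).
has_interface() {
  local wanted="$1" name
  for name in $2; do
    if [ "$name" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}

# Usage against a cluster (not run here):
#   ifaces="$(oc get nns control-plane-1 -o jsonpath='{.status.currentState.interfaces[*].name}')"
#   has_interface ens01 "$ifaces" || echo "ens01 is not present on control-plane-1"
```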
Correct the error by editing the existing policy:
$ oc edit nncp ens01-bridge-testfail
# ...
port:
  - name: ens1
Save the policy to apply the correction.
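If you keep the policy in a local file rather than editing it in the cluster, you can make the same correction non-interactively and then re-apply the file. A minimal sketch, assuming GNU sed and that each bridge port is listed on its own "- name: <port>" line; fix_port_name is a hypothetical helper, and a port name that contains regular-expression characters, such as the period in a VLAN interface name, would need escaping. The oc wait line in the usage comments assumes that the policy reports an Available condition once it is applied.

```shell
# Hypothetical helper: rename a bridge port in a local copy of the policy manifest.
# Assumes GNU sed (-i without a backup suffix) and "- name: <port>" list entries.
fix_port_name() {
  local file="$1" old="$2" new="$3"
  # Only lines that end exactly in "- name: <old>" are changed, so br1 is untouched.
  sed -i "s/^\([[:space:]]*- name: \)${old}\$/\1${new}/" "$file"
}

# Usage against a cluster (not run here):
#   fix_port_name ens01-bridge-testfail.yaml ens01 ens1
#   oc apply -f ens01-bridge-testfail.yaml
#   oc wait nncp ens01-bridge-testfail --for condition=Available --timeout 2m
```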
Check the status of the policy to ensure it updated successfully:
$ oc get nncp
Example output
NAME                    STATUS
ens01-bridge-testfail   SuccessfullyConfigured
The updated policy is successfully configured on all nodes in the cluster.
2.2. Troubleshooting DNS connectivity issues in a disconnected environment
If you experience health check probe issues when configuring nmstate in a disconnected environment, you can configure the DNS server to resolve the custom domain name instead of the default root-servers.net domain.
Ensure that the DNS server includes a name server (NS) entry for the root-servers.net zone. The DNS server does not need to forward a query to an upstream resolver, but the server must return a correct answer for the NS query.
2.2.1. Configuring the bind9 DNS named server
For a cluster configured to query a bind9 DNS server, you can add a root-servers.net zone whose zone file contains at least one DNS record. For example, you can use /var/named/named.localhost as the zone file, because it already meets this criterion.
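As a minimal sketch of such a zone declaration, the following function appends a root-servers.net zone that reuses named.localhost to a named.conf file, and skips the append if the zone is already declared. It assumes that named.conf sets directory "/var/named", as the default RHEL configuration does, so that the relative file name resolves to /var/named/named.localhost; add_root_servers_zone is a hypothetical helper name. You can check the resulting file with named-checkconf before you restart named.

```shell
# Hypothetical helper: append a root-servers.net zone that reuses the existing
# named.localhost zone file. Does nothing if the zone is already declared.
add_root_servers_zone() {
  local conf="${1:-/etc/named.conf}"
  if grep -q 'zone "root-servers.net"' "$conf" 2>/dev/null; then
    echo "root-servers.net zone already present in $conf"
    return 0
  fi
  # The zone file path is relative to the "directory" option in named.conf.
  cat >> "$conf" <<'EOF'
zone "root-servers.net" IN {
        type master;
        file "named.localhost";
};
EOF
}

# Usage on the DNS server (not run here):
#   add_root_servers_zone /etc/named.conf && named-checkconf /etc/named.conf
```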
Procedure
Add the root-servers.net zone at the end of the /etc/named.conf configuration file by running the following command:
Restart the named service by running the following command:
$ systemctl restart named
Confirm that the root-servers.net zone is present by running the following command:
$ journalctl -u named | grep root-servers.net
Example output
Jul 03 15:16:26 rhel-8-10 bash[xxxx]: zone root-servers.net/IN: loaded serial 0
Jul 03 15:16:26 rhel-8-10 named[xxxx]: zone root-servers.net/IN: loaded serial 0
Verify that the DNS server can resolve the NS record for the root-servers.net domain by running the following command:
$ host -t NS root-servers.net. 127.0.0.1
Example output
Using domain server:
Name: 127.0.0.1
Address: 127.0.0.53
Aliases:
root-servers.net name server root-servers.net.
2.2.2. Configuring the dnsmasq DNS server
If you are using dnsmasq as the DNS server, you can delegate resolution of the root-servers.net domain to another DNS server, for example, by creating a new configuration file that resolves root-servers.net by using a DNS server that you specify.
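The delegation file contains a single dnsmasq server=/<domain>/<address> directive. A minimal sketch that writes it from an argument instead of a hand-edited placeholder; write_root_servers_delegation is a hypothetical helper, and its optional second argument, which overrides the /etc/dnsmasq.d directory, exists only so that the sketch can be tried outside a real DNS server.

```shell
# Hypothetical helper: write a dnsmasq drop-in that sends root-servers.net
# queries to the DNS server given as $1. The target directory defaults to
# /etc/dnsmasq.d and can be overridden with $2.
write_root_servers_delegation() {
  local dns_ip="$1" conf_dir="${2:-/etc/dnsmasq.d}"
  if [ -z "$dns_ip" ]; then
    echo "usage: write_root_servers_delegation <DNS_server_IP> [conf_dir]" >&2
    return 1
  fi
  # server=/<domain>/<ip> tells dnsmasq to forward queries for <domain> to <ip>.
  printf 'server=/root-servers.net/%s\n' "$dns_ip" > "$conf_dir/delegate-root-servers.net.conf"
}

# Usage on the DNS server (not run here):
#   write_root_servers_delegation 192.168.1.1 && systemctl restart dnsmasq
```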
Create a configuration file that delegates the root-servers.net domain to another DNS server by running the following command:
$ echo 'server=/root-servers.net/<DNS_server_IP>' > /etc/dnsmasq.d/delegate-root-servers.net.conf
Restart the dnsmasq service by running the following command:
$ systemctl restart dnsmasq
Confirm that the root-servers.net domain is delegated to another DNS server by running the following command:
$ journalctl -u dnsmasq | grep root-servers.net
Example output
Jul 03 15:31:25 rhel-8-10 dnsmasq[1342]: using nameserver 192.168.1.1#53 for domain root-servers.net
Verify that the DNS server can resolve the NS record for the root-servers.net domain by running the following command:
$ host -t NS root-servers.net. 127.0.0.1
Example output
Using domain server:
Name: 127.0.0.1
Address: 127.0.0.1#53
Aliases:
root-servers.net name server root-servers.net.
2.2.3. Creating a custom DNS host name to resolve DNS connectivity issues
In a disconnected environment where the external DNS server cannot be reached, you can resolve Kubernetes NMState Operator health probe issues by specifying a custom DNS host name in the NMState custom resource (CR).
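The configuration for this procedure is not reproduced here. As a minimal sketch, assuming that your Operator version supports a spec.probeConfiguration.dns.host field on the NMState CR named nmstate (you can check the available fields with oc explain nmstate.spec), the following function writes such a CR to a file that you can then apply with oc apply -f. write_nmstate_probe_cr is a hypothetical helper, and the host name that you pass must be resolvable by your internal DNS server.

```shell
# Hypothetical helper: write an NMState CR that sets a custom DNS host name
# for the Operator health probe. Field names are assumptions; verify them
# against the NMState CRD installed in your cluster.
write_nmstate_probe_cr() {
  local file="$1" host="$2"
  if [ -z "$file" ] || [ -z "$host" ]; then
    echo "usage: write_nmstate_probe_cr <file> <dns_host_name>" >&2
    return 1
  fi
  cat > "$file" <<EOF
apiVersion: nmstate.io/v1
kind: NMState
metadata:
  name: nmstate
spec:
  probeConfiguration:
    dns:
      host: ${host}
EOF
}

# Usage against a cluster (not run here):
#   write_nmstate_probe_cr nmstate-probe.yaml <dns_host_name>
#   oc apply -f nmstate-probe.yaml
```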
Procedure
Add the DNS host name configuration to the NMState CR of your cluster:
Apply the DNS host name configuration to your cluster by running the following command. Replace <filename> with the name of your CR file.
$ oc apply -f <filename>.yaml