Chapter 15. Troubleshooting


In some cases, the Assisted Installer cannot begin the installation, or the cluster fails to install properly. In these cases, it helps to understand the likely failure modes and how to troubleshoot them.

15.1. Prerequisites

  • You have created an infrastructure environment by using the API or have created a cluster by using the UI.

15.2. Troubleshooting discovery ISO issues

The Assisted Installer uses an ISO image to run an agent that registers the host to the cluster and performs hardware and network validations before attempting to install OpenShift. Use the following procedures to troubleshoot problems with host discovery.

After you boot the host with the discovery ISO image, the Assisted Installer discovers the host and displays it in the Assisted Service UI.

See Configuring the discovery image for additional details.

15.3. Minimal ISO Image

Use the minimal ISO image when bandwidth over the virtual media connection is limited. It contains only what is required to boot a host with networking; the host downloads most of the remaining content, including the root file system image, at boot time. The resulting ISO image is about 100 MB, compared to about 1 GB for the full ISO image.

15.3.1. Troubleshooting minimal ISO boot failures

If your environment requires static network configuration to access the Assisted Installer service, any issues with that configuration might prevent the minimal ISO from booting properly. If the boot screen shows that the host failed to download the root file system image, verify that any additional network configuration is correct. Switching to the full ISO image also makes debugging easier.

Example rootfs download failure

[Screenshot: failed root file system image download]
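If you can reach a shell on the host, a couple of basic checks can help narrow down a network problem before you switch ISO images. The following is a minimal bash sketch, not part of the Assisted Installer: it assumes that the `ip` and `getent` utilities are available on the host, and `has_default_route` and `resolves` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking basic network configuration.
# Source this file, then pipe command output into the functions.

# Succeeds if the `ip route` output read from stdin contains a default route.
has_default_route() {
  grep -qE '^default[[:space:]]'
}

# Succeeds if the given host name resolves through the system resolver.
resolves() {
  getent hosts "$1" >/dev/null
}
```

For example, where `<assisted_service_host>` is a placeholder for the host name in your Assisted Service URL:

    $ ip route show | has_default_route || echo "no default route"
    $ resolves <assisted_service_host> || echo "name resolution failed"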

15.4. Verify the discovery agent is running

Prerequisites

  • You have created an Infrastructure Environment by using the API or have created a cluster by using the UI.
  • You booted a host with the Infrastructure Environment discovery ISO and the host failed to register.
  • You have SSH access to the host.
  • You provided an SSH public key in the "Add hosts" dialog before generating the Discovery ISO so that you can SSH into your machine without a password.

Procedure

  1. Verify that your host machine is powered on.
  2. If you selected DHCP networking, check that the DHCP server is enabled.
  3. If you selected static IP, bridges, and bonds networking, check that your configuration is correct.
  4. Verify that you can access your host machine using SSH, a console such as the BMC, or a virtual machine console:

    $ ssh core@<host_ip_address>

    If your private key file is not stored in the default location, specify it with the -i option:

    $ ssh -i <ssh_private_key_file> core@<host_ip_address>

    If you cannot SSH to the host, either the host failed during boot or it failed to configure the network.

    After you log in, you should see a message similar to the following:

    Example login

    [Screenshot: Assisted Installer ISO login message]

    If you do not see this message, the host did not boot with the Assisted Installer ISO. Make sure that you configured the boot order correctly; the host should boot once from the live ISO.

  5. Check the agent service logs:

    $ sudo journalctl -u agent.service

    In the following example, the errors indicate a network issue:

    Example agent service log

    [Screenshot: agent service log]

    If there is an error pulling the agent image, check the proxy settings. Verify that the host is connected to the network. You can use nmcli to get additional information about your network configuration.
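The agent service log can be long, so it can help to filter it for error-like lines first. The following bash sketch is a hypothetical helper, not part of the Assisted Installer; the keywords it matches (`error`, `fail`, `timeout`, `timed out`) are an assumption rather than the agent's documented log format, so adjust them to what you see in your logs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print log lines (read from stdin) that look like
# errors, then print how many such lines were found.
agent_errors() {
  awk 'tolower($0) ~ /error|fail|timed? ?out/ { print; n++ }
       END { printf "error-like lines: %d\n", n }'
}
```

For example, check the service state, filter its log, and review the network devices:

    $ sudo systemctl status agent.service
    $ sudo journalctl -u agent.service --no-pager | agent_errors
    $ nmcli device status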

15.5. Verify the agent can access the assisted-service

Prerequisites

  • You have created an Infrastructure Environment by using the API or have created a cluster by using the UI.
  • You booted a host with the Infrastructure Environment discovery ISO and the host failed to register.
  • You verified the discovery agent is running.

Procedure

  • Check the agent logs to verify the agent can access the Assisted Service:

    $ sudo journalctl TAG=agent

    The errors in the following example indicate that the agent failed to access the Assisted Service.

    Example agent log

    [Screenshot: agent log showing a failure to access the Assisted Service]

    Check the proxy settings that you configured for the cluster. If you configured a proxy, it must allow access to the Assisted Service URL.
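To check proxy settings and reachability from the host itself, you can list the proxy environment variables and request the Assisted Service URL. This bash sketch is hypothetical: `show_proxy_env` and `check_url` are illustrative names, `<assisted_service_url>` is a placeholder, and it assumes that `curl` is installed on the host. The proxy variables in your shell session might not match the proxy configuration that the agent uses.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking proxy settings and service reachability.

# Print proxy-related environment variables, in upper or lower case.
show_proxy_env() {
  env | grep -iE '^(https?|no)_proxy=' || echo "no proxy variables set"
}

# Print the HTTP status code for a URL; fail if the connection fails.
# curl honors the http_proxy, https_proxy, and no_proxy variables.
check_url() {
  local code
  code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$1") || {
    echo "cannot reach $1" >&2
    return 1
  }
  echo "HTTP $code from $1"
}
```

For example:

    $ show_proxy_env
    $ check_url <assisted_service_url>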

15.6. Correcting a host’s boot order

After the installation that runs as part of the discovery image completes, the Assisted Installer reboots the host. The host must boot from its installation disk to continue forming the cluster. If you have not configured the host’s boot order correctly, it boots from another disk instead, which interrupts the installation.

If the host boots the discovery image again, the Assisted Installer immediately detects this and sets the host’s status to Installing Pending User Action. The Assisted Installer also sets this status if it does not detect that the host has booted from the correct disk within the allotted time.

Procedure

  • Reboot the host and set its boot order to boot from the installation disk. If you didn’t select an installation disk, the Assisted Installer selected one for you. To view the selected installation disk, click to expand the host’s information in the host inventory, and check which disk has the “Installation disk” role.
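On UEFI hosts, you might be able to inspect and change the boot order from a running Linux system with `efibootmgr` instead of the firmware setup menu. The following bash sketch assumes a UEFI host with `efibootmgr` installed and root access; `boot_order` is a hypothetical helper that extracts the `BootOrder` line from `efibootmgr` output. Boot entry numbers differ on every machine, so confirm which entry corresponds to the installation disk before you change anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `efibootmgr` output on stdin and print the
# BootOrder entries, one boot entry number per line, in boot order.
boot_order() {
  sed -n 's/^BootOrder: //p' | tr ',' '\n'
}
```

For example, list the disks, show the boot entries with their device paths, print the current order, and then set a new order with placeholder entry numbers:

    $ lsblk -d -o NAME,SIZE,MODEL,SERIAL
    $ sudo efibootmgr -v
    $ sudo efibootmgr | boot_order
    $ sudo efibootmgr -o <installation_disk_entry>,<other_entries>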

15.7. Rectifying partially-successful installations

In some cases, the Assisted Installer declares an installation successful even though it encountered errors:

  • If you requested to install OLM operators and one or more failed to install, log in to the cluster’s console to remediate the failures.
  • If you requested more than two worker nodes and at least one failed to install but at least two succeeded, add the failed workers to the installed cluster.
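To find which operators or worker nodes need attention, you can query the installed cluster with the `oc` CLI. In this bash sketch, `failed_csvs` and `not_ready_nodes` are hypothetical helpers that read `oc` output on stdin; they assume that you are logged in to the cluster and that OLM operators report their state in the `ClusterServiceVersion` `.status.phase` field.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reviewing a partially successful installation.

# Reads the output of:
#   oc get csv -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase
# and prints every ClusterServiceVersion whose phase is not Succeeded.
failed_csvs() {
  awk '$3 != "Succeeded" { print $1 "/" $2 ": " $3 }'
}

# Reads the output of `oc get nodes --no-headers` and prints nodes whose
# STATUS column does not start with Ready (for example, NotReady).
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 ": " $2 }'
}
```

For example:

    $ oc get csv -A --no-headers -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase | failed_csvs
    $ oc get nodes --no-headers | not_ready_nodes

A worker that failed to install might not appear in the node list at all, so also compare the node names with the hosts that you expected to join the cluster.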

© 2024 Red Hat, Inc.