Chapter 1. Configuring observability
Use observability to gain insights into the metrics, logs, and alerts from your Red Hat OpenStack Services on OpenShift (RHOSO) deployment. You can configure observability by editing the default Telemetry service (ceilometer, prometheus) in your OpenStackControlPlane custom resource (CR) file.
1.1. RHOSO observability architecture
The observability architecture in Red Hat OpenStack Services on OpenShift (RHOSO) is composed of services within Red Hat OpenShift Container Platform (RHOCP), as well as services on your Compute nodes that provide metrics, logs, and alerts. You can use Red Hat OpenShift Observability for insight into your RHOSO environment and for collecting, storing, and searching through logs.
The observability platform available with RHOSO does not guarantee the delivery of metrics. Metrics are exposed for scraping but they are not cached. If data is dropped, there is no way to retrospectively fill gaps in the data, which might result in incomplete metrics.
1.2. Configuring observability on the control plane
The Telemetry service (ceilometer, prometheus) is enabled by default in a Red Hat OpenStack Services on OpenShift (RHOSO) deployment. You can configure observability by editing the Telemetry service in your OpenStackControlPlane custom resource (CR) file.
Prerequisites
- The control plane includes initial configuration of the Telemetry service. For more information, see the telemetry configuration in Creating the control plane in Deploying Red Hat OpenStack Services on OpenShift.
Procedure
- Open your OpenStackControlPlane CR file, openstack_control_plane.yaml, on your workstation.
- Configure the Telemetry service, telemetry, as required for your environment:
  - metricStorage.monitoringStack.scrapeInterval: Specifies the interval at which new metrics are gathered. Changing this interval can affect performance.
  - metricStorage.monitoringStack.storage.retention: Specifies the length of time that telemetry metrics are stored. The duration affects the amount of storage required.
  - storage.persistent.pvcStorageRequest: Specifies the amount of storage to allocate to the Prometheus time series database.
  - autoscaling.enabled: Set to true to enable autoscaling. The autoscaling field must be present even when autoscaling is disabled. For more information about autoscaling, see Autoscaling for Instances.
  - ceilometer.enabled: Set to false to disable the ceilometer service. If you do not disable ceilometer, a Prometheus metrics exporter is created and exposed from inside the cluster at the following URL: http://ceilometer-internal.openstack.svc:3000/metrics
  - logging.enabled: Set to true to enable observability logging. For more information about configuring observability logging, see Enabling RHOSO observability logging.
  - cloudkitty.enabled: Set to true to enable the Rating service (cloudkitty). For more information about configuring chargeback and rating capabilities, see Enabling the Rating service on the control plane.
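A telemetry section that sets the fields described above might look like the following sketch. The values shown are illustrative, and the surrounding spec and template nesting is an assumption based on typical OpenStackControlPlane CRs; adapt it to your existing CR file.

```yaml
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack-control-plane   # hypothetical name; use your control plane name
spec:
  telemetry:
    enabled: true
    template:
      metricStorage:
        enabled: true
        monitoringStack:
          scrapeInterval: 30s          # interval at which new metrics are gathered
          storage:
            strategy: persistent
            retention: 24h             # how long telemetry metrics are stored
            persistent:
              pvcStorageRequest: 20G   # storage for the Prometheus time series database
      autoscaling:
        enabled: false                 # field must be present even when disabled
      ceilometer:
        enabled: true
      logging:
        enabled: false
      cloudkitty:
        enabled: false
```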
- Update the control plane:

  $ oc apply -f openstack_control_plane.yaml -n openstack

- Wait until RHOCP creates the resources related to the OpenStackControlPlane CR. Run the following command to check the status:

  $ oc get openstackcontrolplane -n openstack
  NAME                      STATUS    MESSAGE
  openstack-control-plane   Unknown   Setup started

  The OpenStackControlPlane resources are created when the status is "Setup complete".
  Tip: Append the -w option to the end of the get command to track deployment progress.

- Optional: Confirm that the control plane is deployed by reviewing the pods in the openstack namespace for each of your cells:

  $ oc get pods -n openstack

  The control plane is deployed when all the pods are either completed or running.
Verification
- Access the remote shell for the OpenStackClient pod from your workstation:

  $ oc rsh -n openstack openstackclient

- Confirm that you can query prometheus and that the scrape endpoints are active:

  $ openstack metric query up --disable-rbac -c container -c instance -c value

  Example output:
  Note: Each entry in the value field should be "1" when there are active workloads scheduled on the cluster, except for the prometheus container. The prometheus container reports a value of "0" due to TLS, which is enabled by default.
- You can find the openstack-telemetry-operator dashboards by clicking Observe and then Dashboards in the RHOCP console. For more information about RHOCP dashboards, see Reviewing monitoring dashboards as a cluster administrator in the RHOCP Monitoring Guide.
1.3. Enabling Telemetry power monitoring on the data plane
This feature is available in this release as a Technology Preview, and therefore is not fully supported by Red Hat. It should only be used for testing, and should not be deployed in a production environment. For more information about Technology Preview features, see Scope of Coverage Details.
You can enable power monitoring on the data plane to collect power consumption metrics by adding the telemetry-power-monitoring service to each OpenStackDataPlaneNodeSet custom resource (CR) defined for the data plane.
Procedure
- Open the OpenStackDataPlaneNodeSet CR definition file for the node set you want to update, for example, openstack_data_plane.yaml.
- Add the services field and include all the required services, including the default services, then add telemetry-power-monitoring after telemetry:

  For more information about deploying data plane services, see Deploying the data plane in the Deploying Red Hat OpenStack Services on OpenShift guide.
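The services list described above might look like the following sketch. The default services shown are illustrative and depend on your deployment; keep the defaults already defined for your node set and append telemetry-power-monitoring after telemetry.

```yaml
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneNodeSet
metadata:
  name: openstack-data-plane   # hypothetical node set name
spec:
  services:
    # Illustrative default service list; retain the services
    # from your existing node set definition.
    - bootstrap
    - configure-network
    - validate-network
    - install-os
    - configure-os
    - run-os
    - install-certs
    - ovn
    - neutron-metadata
    - libvirt
    - nova
    - telemetry
    - telemetry-power-monitoring   # added after telemetry
```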
- Save the OpenStackDataPlaneNodeSet CR definition file.
- Apply the updated OpenStackDataPlaneNodeSet CR configuration:

  $ oc apply -f openstack_data_plane.yaml

- Verify that the data plane resource has been updated by confirming that the status is SetupReady:

  $ oc wait openstackdataplanenodeset openstack-data-plane --for condition=SetupReady --timeout=10m

  When the status is SetupReady, the command returns a "condition met" message; otherwise, it returns a timeout error. For information about the data plane conditions and states, see Data plane conditions and states in Deploying Red Hat OpenStack Services on OpenShift.
- Create a file on your workstation to define the OpenStackDataPlaneDeployment CR:

  apiVersion: dataplane.openstack.org/v1beta1
  kind: OpenStackDataPlaneDeployment
  metadata:
    name: <node_set_deployment_name>

  - Replace <node_set_deployment_name> with the name of the OpenStackDataPlaneDeployment CR. The name must be unique, must consist of lower case alphanumeric characters, - (hyphen) or . (period), and must start and end with an alphanumeric character.
  Tip: Give the definition file and the OpenStackDataPlaneDeployment CR unique and descriptive names that indicate the purpose of the modified node set.
- Add the OpenStackDataPlaneNodeSet CR that you modified:

  spec:
    nodeSets:
      - <nodeSet_name>

  - Replace <nodeSet_name> with the name of the OpenStackDataPlaneNodeSet CR.
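Putting the fragments from the steps above together, a complete OpenStackDataPlaneDeployment CR might look like the following sketch; both names are hypothetical and must match your environment.

```yaml
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: data-plane-power-monitoring   # hypothetical, unique deployment name
spec:
  nodeSets:
    - openstack-data-plane            # hypothetical name of the modified node set
```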
- Save the OpenStackDataPlaneDeployment CR deployment file.
- Deploy the modified OpenStackDataPlaneNodeSet CR:

  $ oc create -f openstack_data_plane_deploy.yaml -n openstack

  You can view the Ansible logs while the deployment executes:

  $ oc get pod -l app=openstackansibleee -w
  $ oc logs -l app=openstackansibleee -f --max-log-requests 10

  If the oc logs command returns an error similar to the following error, increase the --max-log-requests value:

  error: you are attempting to follow 19 log streams, but maximum allowed concurrency is 10, use --max-log-requests to increase the limit

- Verify that the modified OpenStackDataPlaneNodeSet CR is deployed:

  For information about the meaning of the returned status, see Data plane conditions and states in the Deploying Red Hat OpenStack Services on OpenShift guide.
If the status indicates that the data plane has not been deployed, then troubleshoot the deployment. For information, see Troubleshooting the data plane creation and deployment in the Deploying Red Hat OpenStack Services on OpenShift guide.
- Verify that the telemetry-power-monitoring service is deployed by checking for ceilometer_agent_ipmi and kepler containers on the data plane nodes:

  $ podman ps | grep -i -e ceilometer_agent_ipmi -e kepler