검색

1장. 클러스터에 연결하여 원격 상태 모니터링

download PDF

1.1. 원격 상태 모니터링으로 수집된 데이터 표시

관리자는 Telemetry 및 Insights Operator에서 수집한 메트릭을 검토할 수 있습니다.

1.1.1. Telemetry로 수집한 데이터 표시

Telemetry에서 캡처한 클러스터 및 구성 요소 시계열 데이터를 볼 수 있습니다.

사전 요구 사항

  • OpenShift Container Platform CLI(oc)를 설치했습니다.
  • cluster-admin 역할 또는 cluster-monitoring-view 역할의 사용자로 클러스터에 액세스할 수 있습니다.

절차

  1. 클러스터에 로그인합니다.
  2. 다음 명령을 실행하여 클러스터의 Prometheus 서비스를 쿼리하고 Telemetry에서 캡처한 전체 시계열 데이터 세트를 반환합니다.

    $ curl -G -k -H "Authorization: Bearer $(oc whoami -t)" \
    https://$(oc get route prometheus-k8s-federate -n \
    openshift-monitoring -o jsonpath="{.spec.host}")/federate \
    --data-urlencode 'match[]={__name__=~"cluster:usage:.*"}' \
    --data-urlencode 'match[]={__name__="count:up0"}' \
    --data-urlencode 'match[]={__name__="count:up1"}' \
    --data-urlencode 'match[]={__name__="cluster_version"}' \
    --data-urlencode 'match[]={__name__="cluster_version_available_updates"}' \
    --data-urlencode 'match[]={__name__="cluster_version_capability"}' \
    --data-urlencode 'match[]={__name__="cluster_operator_up"}' \
    --data-urlencode 'match[]={__name__="cluster_operator_conditions"}' \
    --data-urlencode 'match[]={__name__="cluster_version_payload"}' \
    --data-urlencode 'match[]={__name__="cluster_installer"}' \
    --data-urlencode 'match[]={__name__="cluster_infrastructure_provider"}' \
    --data-urlencode 'match[]={__name__="cluster_feature_set"}' \
    --data-urlencode 'match[]={__name__="instance:etcd_object_counts:sum"}' \
    --data-urlencode 'match[]={__name__="ALERTS",alertstate="firing"}' \
    --data-urlencode 'match[]={__name__="code:apiserver_request_total:rate:sum"}' \
    --data-urlencode 'match[]={__name__="cluster:capacity_cpu_cores:sum"}' \
    --data-urlencode 'match[]={__name__="cluster:capacity_memory_bytes:sum"}' \
    --data-urlencode 'match[]={__name__="cluster:cpu_usage_cores:sum"}' \
    --data-urlencode 'match[]={__name__="cluster:memory_usage_bytes:sum"}' \
    --data-urlencode 'match[]={__name__="openshift:cpu_usage_cores:sum"}' \
    --data-urlencode 'match[]={__name__="openshift:memory_usage_bytes:sum"}' \
    --data-urlencode 'match[]={__name__="workload:cpu_usage_cores:sum"}' \
    --data-urlencode 'match[]={__name__="workload:memory_usage_bytes:sum"}' \
    --data-urlencode 'match[]={__name__="cluster:virt_platform_nodes:sum"}' \
    --data-urlencode 'match[]={__name__="cluster:node_instance_type_count:sum"}' \
    --data-urlencode 'match[]={__name__="cnv:vmi_status_running:count"}' \
    --data-urlencode 'match[]={__name__="cluster:vmi_request_cpu_cores:sum"}' \
    --data-urlencode 'match[]={__name__="node_role_os_version_machine:cpu_capacity_cores:sum"}' \
    --data-urlencode 'match[]={__name__="node_role_os_version_machine:cpu_capacity_sockets:sum"}' \
    --data-urlencode 'match[]={__name__="subscription_sync_total"}' \
    --data-urlencode 'match[]={__name__="olm_resolution_duration_seconds"}' \
    --data-urlencode 'match[]={__name__="csv_succeeded"}' \
    --data-urlencode 'match[]={__name__="csv_abnormal"}' \
    --data-urlencode 'match[]={__name__="cluster:kube_persistentvolumeclaim_resource_requests_storage_bytes:provisioner:sum"}' \
    --data-urlencode 'match[]={__name__="cluster:kubelet_volume_stats_used_bytes:provisioner:sum"}' \
    --data-urlencode 'match[]={__name__="ceph_cluster_total_bytes"}' \
    --data-urlencode 'match[]={__name__="ceph_cluster_total_used_raw_bytes"}' \
    --data-urlencode 'match[]={__name__="ceph_health_status"}' \
    --data-urlencode 'match[]={__name__="odf_system_raw_capacity_total_bytes"}' \
    --data-urlencode 'match[]={__name__="odf_system_raw_capacity_used_bytes"}' \
    --data-urlencode 'match[]={__name__="odf_system_health_status"}' \
    --data-urlencode 'match[]={__name__="job:ceph_osd_metadata:count"}' \
    --data-urlencode 'match[]={__name__="job:kube_pv:count"}' \
    --data-urlencode 'match[]={__name__="job:odf_system_pvs:count"}' \
    --data-urlencode 'match[]={__name__="job:ceph_pools_iops:total"}' \
    --data-urlencode 'match[]={__name__="job:ceph_pools_iops_bytes:total"}' \
    --data-urlencode 'match[]={__name__="job:ceph_versions_running:count"}' \
    --data-urlencode 'match[]={__name__="job:noobaa_total_unhealthy_buckets:sum"}' \
    --data-urlencode 'match[]={__name__="job:noobaa_bucket_count:sum"}' \
    --data-urlencode 'match[]={__name__="job:noobaa_total_object_count:sum"}' \
    --data-urlencode 'match[]={__name__="odf_system_bucket_count", system_type="OCS", system_vendor="Red Hat"}' \
    --data-urlencode 'match[]={__name__="odf_system_objects_total", system_type="OCS", system_vendor="Red Hat"}' \
    --data-urlencode 'match[]={__name__="noobaa_accounts_num"}' \
    --data-urlencode 'match[]={__name__="noobaa_total_usage"}' \
    --data-urlencode 'match[]={__name__="console_url"}' \
    --data-urlencode 'match[]={__name__="cluster:ovnkube_master_egress_routing_via_host:max"}' \
    --data-urlencode 'match[]={__name__="cluster:network_attachment_definition_instances:max"}' \
    --data-urlencode 'match[]={__name__="cluster:network_attachment_definition_enabled_instance_up:max"}' \
    --data-urlencode 'match[]={__name__="cluster:ingress_controller_aws_nlb_active:sum"}' \
    --data-urlencode 'match[]={__name__="cluster:route_metrics_controller_routes_per_shard:min"}' \
    --data-urlencode 'match[]={__name__="cluster:route_metrics_controller_routes_per_shard:max"}' \
    --data-urlencode 'match[]={__name__="cluster:route_metrics_controller_routes_per_shard:avg"}' \
    --data-urlencode 'match[]={__name__="cluster:route_metrics_controller_routes_per_shard:median"}' \
    --data-urlencode 'match[]={__name__="cluster:openshift_route_info:tls_termination:sum"}' \
    --data-urlencode 'match[]={__name__="insightsclient_request_send_total"}' \
    --data-urlencode 'match[]={__name__="cam_app_workload_migrations"}' \
    --data-urlencode 'match[]={__name__="cluster:apiserver_current_inflight_requests:sum:max_over_time:2m"}' \
    --data-urlencode 'match[]={__name__="cluster:alertmanager_integrations:max"}' \
    --data-urlencode 'match[]={__name__="cluster:telemetry_selected_series:count"}' \
    --data-urlencode 'match[]={__name__="openshift:prometheus_tsdb_head_series:sum"}' \
    --data-urlencode 'match[]={__name__="openshift:prometheus_tsdb_head_samples_appended_total:sum"}' \
    --data-urlencode 'match[]={__name__="monitoring:container_memory_working_set_bytes:sum"}' \
    --data-urlencode 'match[]={__name__="namespace_job:scrape_series_added:topk3_sum1h"}' \
    --data-urlencode 'match[]={__name__="namespace_job:scrape_samples_post_metric_relabeling:topk3"}' \
    --data-urlencode 'match[]={__name__="monitoring:haproxy_server_http_responses_total:sum"}' \
    --data-urlencode 'match[]={__name__="rhmi_status"}' \
    --data-urlencode 'match[]={__name__="status:upgrading:version:rhoam_state:max"}' \
    --data-urlencode 'match[]={__name__="state:rhoam_critical_alerts:max"}' \
    --data-urlencode 'match[]={__name__="state:rhoam_warning_alerts:max"}' \
    --data-urlencode 'match[]={__name__="rhoam_7d_slo_percentile:max"}' \
    --data-urlencode 'match[]={__name__="rhoam_7d_slo_remaining_error_budget:max"}' \
    --data-urlencode 'match[]={__name__="cluster_legacy_scheduler_policy"}' \
    --data-urlencode 'match[]={__name__="cluster_master_schedulable"}' \
    --data-urlencode 'match[]={__name__="che_workspace_status"}' \
    --data-urlencode 'match[]={__name__="che_workspace_started_total"}' \
    --data-urlencode 'match[]={__name__="che_workspace_failure_total"}' \
    --data-urlencode 'match[]={__name__="che_workspace_start_time_seconds_sum"}' \
    --data-urlencode 'match[]={__name__="che_workspace_start_time_seconds_count"}' \
    --data-urlencode 'match[]={__name__="cco_credentials_mode"}' \
    --data-urlencode 'match[]={__name__="cluster:kube_persistentvolume_plugin_type_counts:sum"}' \
    --data-urlencode 'match[]={__name__="visual_web_terminal_sessions_total"}' \
    --data-urlencode 'match[]={__name__="acm_managed_cluster_info"}' \
    --data-urlencode 'match[]={__name__="cluster:vsphere_vcenter_info:sum"}' \
    --data-urlencode 'match[]={__name__="cluster:vsphere_esxi_version_total:sum"}' \
    --data-urlencode 'match[]={__name__="cluster:vsphere_node_hw_version_total:sum"}' \
    --data-urlencode 'match[]={__name__="openshift:build_by_strategy:sum"}' \
    --data-urlencode 'match[]={__name__="rhods_aggregate_availability"}' \
    --data-urlencode 'match[]={__name__="rhods_total_users"}' \
    --data-urlencode 'match[]={__name__="instance:etcd_disk_wal_fsync_duration_seconds:histogram_quantile",quantile="0.99"}' \
    --data-urlencode 'match[]={__name__="instance:etcd_mvcc_db_total_size_in_bytes:sum"}' \
    --data-urlencode 'match[]={__name__="instance:etcd_network_peer_round_trip_time_seconds:histogram_quantile",quantile="0.99"}' \
    --data-urlencode 'match[]={__name__="instance:etcd_mvcc_db_total_size_in_use_in_bytes:sum"}' \
    --data-urlencode 'match[]={__name__="instance:etcd_disk_backend_commit_duration_seconds:histogram_quantile",quantile="0.99"}' \
    --data-urlencode 'match[]={__name__="jaeger_operator_instances_storage_types"}' \
    --data-urlencode 'match[]={__name__="jaeger_operator_instances_strategies"}' \
    --data-urlencode 'match[]={__name__="jaeger_operator_instances_agent_strategies"}' \
    --data-urlencode 'match[]={__name__="appsvcs:cores_by_product:sum"}' \
    --data-urlencode 'match[]={__name__="nto_custom_profiles:count"}' \
    --data-urlencode 'match[]={__name__="openshift_csi_share_configmap"}' \
    --data-urlencode 'match[]={__name__="openshift_csi_share_secret"}' \
    --data-urlencode 'match[]={__name__="openshift_csi_share_mount_failures_total"}' \
    --data-urlencode 'match[]={__name__="openshift_csi_share_mount_requests_total"}' \
    --data-urlencode 'match[]={__name__="cluster:velero_backup_total:max"}' \
    --data-urlencode 'match[]={__name__="cluster:velero_restore_total:max"}' \
    --data-urlencode 'match[]={__name__="eo_es_storage_info"}' \
    --data-urlencode 'match[]={__name__="eo_es_redundancy_policy_info"}' \
    --data-urlencode 'match[]={__name__="eo_es_defined_delete_namespaces_total"}' \
    --data-urlencode 'match[]={__name__="eo_es_misconfigured_memory_resources_info"}' \
    --data-urlencode 'match[]={__name__="cluster:eo_es_data_nodes_total:max"}' \
    --data-urlencode 'match[]={__name__="cluster:eo_es_documents_created_total:sum"}' \
    --data-urlencode 'match[]={__name__="cluster:eo_es_documents_deleted_total:sum"}' \
    --data-urlencode 'match[]={__name__="pod:eo_es_shards_total:max"}' \
    --data-urlencode 'match[]={__name__="eo_es_cluster_management_state_info"}' \
    --data-urlencode 'match[]={__name__="imageregistry:imagestreamtags_count:sum"}' \
    --data-urlencode 'match[]={__name__="imageregistry:operations_count:sum"}' \
    --data-urlencode 'match[]={__name__="log_logging_info"}' \
    --data-urlencode 'match[]={__name__="log_collector_error_count_total"}' \
    --data-urlencode 'match[]={__name__="log_forwarder_pipeline_info"}' \
    --data-urlencode 'match[]={__name__="log_forwarder_input_info"}' \
    --data-urlencode 'match[]={__name__="log_forwarder_output_info"}' \
    --data-urlencode 'match[]={__name__="cluster:log_collected_bytes_total:sum"}' \
    --data-urlencode 'match[]={__name__="cluster:log_logged_bytes_total:sum"}' \
    --data-urlencode 'match[]={__name__="cluster:kata_monitor_running_shim_count:sum"}' \
    --data-urlencode 'match[]={__name__="platform:hypershift_hostedclusters:max"}' \
    --data-urlencode 'match[]={__name__="platform:hypershift_nodepools:max"}' \
    --data-urlencode 'match[]={__name__="namespace:noobaa_unhealthy_bucket_claims:max"}' \
    --data-urlencode 'match[]={__name__="namespace:noobaa_buckets_claims:max"}' \
    --data-urlencode 'match[]={__name__="namespace:noobaa_unhealthy_namespace_resources:max"}' \
    --data-urlencode 'match[]={__name__="namespace:noobaa_namespace_resources:max"}' \
    --data-urlencode 'match[]={__name__="namespace:noobaa_unhealthy_namespace_buckets:max"}' \
    --data-urlencode 'match[]={__name__="namespace:noobaa_namespace_buckets:max"}' \
    --data-urlencode 'match[]={__name__="namespace:noobaa_accounts:max"}' \
    --data-urlencode 'match[]={__name__="namespace:noobaa_usage:max"}' \
    --data-urlencode 'match[]={__name__="namespace:noobaa_system_health_status:max"}' \
    --data-urlencode 'match[]={__name__="ocs_advanced_feature_usage"}' \
    --data-urlencode 'match[]={__name__="os_image_url_override:sum"}'

1.1.2. Insights Operator에 의해 수집된 데이터의 표시

Insights Operator가 수집한 데이터를 검토할 수 있습니다.

사전 요구 사항

  • cluster-admin 역할의 사용자로 클러스터에 액세스할 수 있어야 합니다.

절차

  1. Insights Operator에 대해 현재 실행 중인 Pod의 이름을 검색합니다.

    $ INSIGHTS_OPERATOR_POD=$(oc get pods --namespace=openshift-insights -o custom-columns=:metadata.name --no-headers  --field-selector=status.phase=Running)
  2. Insights Operator가 수집한 최근 데이터 아카이브를 복사합니다.

    $ oc cp openshift-insights/$INSIGHTS_OPERATOR_POD:/var/lib/insights-operator ./insights-data

최신 Insights Operator 아카이브는 이제 insights-data 디렉토리에서 사용할 수 있습니다.

Red Hat logoGithubRedditYoutubeTwitter

자세한 정보

평가판, 구매 및 판매

커뮤니티

Red Hat 문서 정보

Red Hat을 사용하는 고객은 신뢰할 수 있는 콘텐츠가 포함된 제품과 서비스를 통해 혁신하고 목표를 달성할 수 있습니다.

보다 포괄적 수용을 위한 오픈 소스 용어 교체

Red Hat은 코드, 문서, 웹 속성에서 문제가 있는 언어를 교체하기 위해 최선을 다하고 있습니다. 자세한 내용은 다음을 참조하세요.Red Hat 블로그.

Red Hat 소개

Red Hat은 기업이 핵심 데이터 센터에서 네트워크 에지에 이르기까지 플랫폼과 환경 전반에서 더 쉽게 작업할 수 있도록 강화된 솔루션을 제공합니다.

© 2024 Red Hat, Inc.