4.5. Configuring the Monitor tab in Jaeger UI
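You can have the request rate, error, and duration (RED) metrics extracted from traces and visualized through the Jaeger Console in the Monitor tab of the OpenShift Container Platform web console. The metrics are derived from spans in the OpenTelemetry Collector. The Prometheus instance that you can deploy in the user workload monitoring stack scrapes them from the Collector. The Jaeger UI queries these metrics from the Prometheus endpoint and visualizes them.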
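Prerequisites
- You have configured the permissions and tenants for the Distributed Tracing Platform. For more information, see "Configuring the permissions and tenants".
Procedure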
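1. In the `OpenTelemetryCollector` custom resource of the OpenTelemetry Collector, enable the Spanmetrics Connector (`spanmetrics`). The connector derives metrics from traces and exports them in the Prometheus format.

Example OpenTelemetryCollector custom resource for span RED

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  mode: deployment
  observability:
    metrics:
      enableMetrics: true # Creates a ServiceMonitor so that Prometheus scrapes the Prometheus exporter.
  config: # In the v1beta1 API, config is a YAML object, not a string.
    connectors:
      spanmetrics: # Receives traces and derives RED metrics from the spans.
        metrics_flush_interval: 15s
    receivers:
      otlp: # Receives spans over the OpenTelemetry Protocol (OTLP).
        protocols:
          grpc: {}
          http: {}
    exporters:
      prometheus: # Exposes the metrics in the Prometheus format.
        endpoint: 0.0.0.0:8889
        add_metric_suffixes: false # Keeps names without suffixes, for example duration_bucket and calls.
        resource_to_telemetry_conversion:
          enabled: true # Converts resource attributes, which are dropped by default, into metric labels.
      otlp: # Forwards the traces to the TempoStack gateway for the dev tenant.
        auth:
          authenticator: bearertokenauth
        endpoint: tempo-redmetrics-gateway.mynamespace.svc.cluster.local:8090
        headers:
          X-Scope-OrgID: dev
        tls:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
          insecure: false
    extensions:
      bearertokenauth:
        filename: /var/run/secrets/kubernetes.io/serviceaccount/token
    service:
      extensions:
      - bearertokenauth
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp, spanmetrics] # spanmetrics is an exporter in the traces pipeline...
        metrics:
          receivers: [spanmetrics] # ...and a receiver in the metrics pipeline.
          exporters: [prometheus]
# ...
```

2. In the `TempoStack` custom resource, enable the Monitor tab. Set the Prometheus endpoint to the Thanos Querier service so that the Monitor tab queries data from your user-defined monitoring stack.

Example TempoStack custom resource with the Monitor tab enabled

```yaml
apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: redmetrics
spec:
  storage:
    secret:
      name: minio-test
      type: s3
  storageSize: 1Gi
  tenants:
    mode: openshift
    authentication:
      - tenantName: dev
        tenantId: "1610b0c3-c509-4592-a256-a1871353dbfa"
  template:
    gateway:
      enabled: true
    queryFrontend:
      jaegerQuery:
        monitorTab:
          enabled: true # Enables the Monitor tab in the Jaeger UI.
          prometheusEndpoint: https://thanos-querier.openshift-monitoring.svc.cluster.local:9092 # Thanos Querier service used to query the user workload monitoring data.
          redMetricsNamespace: "" # Optional: metric namespace (name prefix) that Jaeger Query uses when it retrieves the RED metrics.
# ...
```

3. Optional: Use the span RED metrics that the `spanmetrics` connector generates in alerting rules. For example, you can alert on slow services or define service level objectives (SLOs). For these metrics, the connector creates a `duration_bucket` histogram and a `calls` counter metric. These metrics carry labels that identify the service, the API name, the operation type, and other attributes.

Table 4.4. Labels of the metrics created in the spanmetrics connector

| Label | Description | Values |
|---|---|---|
| `service_name` | Service name set by the `OTEL_SERVICE_NAME` environment variable. | `frontend` |
| `span_name` | Name of the operation. | `/`, `/customer` |
| `span_kind` | Identifies the server, client, messaging, or internal operation. | `SPAN_KIND_SERVER`, `SPAN_KIND_CLIENT`, `SPAN_KIND_PRODUCER`, `SPAN_KIND_CONSUMER`, `SPAN_KIND_INTERNAL` |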
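The following Python sketch shows how spans relate to these metrics. It aggregates a few spans into a `calls` counter and cumulative `duration_bucket` counts, keyed by the three labels in Table 4.4. It is a conceptual model only, not the connector's implementation. The `Span` type, the `aggregate()` function, the sample spans, and the bucket bounds in `BUCKETS_MS` are hypothetical. The real connector takes its bucket layout from its configuration and records further attributes.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical histogram bucket upper bounds in milliseconds, chosen only for
# this illustration; the connector's real buckets come from its configuration.
BUCKETS_MS = [100, 250, 500, 1000, 2000, 5000, float("inf")]


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified span carrying only the labels from Table 4.4."""
    service_name: str
    span_name: str
    span_kind: str
    duration_ms: float


def aggregate(spans):
    """Conceptual model of span-to-metric aggregation (not the connector's code).

    Returns a `calls` counter per label set and cumulative `duration_bucket`
    counts keyed by the label set plus the bucket's upper bound (`le`).
    """
    calls = Counter()
    duration_bucket = Counter()
    for span in spans:
        labels = (span.service_name, span.span_name, span.span_kind)
        calls[labels] += 1
        # Prometheus histogram buckets are cumulative: a span is counted in
        # every bucket whose upper bound is >= its duration.
        for le in BUCKETS_MS:
            if span.duration_ms <= le:
                duration_bucket[labels + (le,)] += 1
    return calls, duration_bucket


spans = [
    Span("frontend", "/customer", "SPAN_KIND_SERVER", 120.0),
    Span("frontend", "/customer", "SPAN_KIND_SERVER", 1800.0),
    Span("frontend", "/", "SPAN_KIND_SERVER", 40.0),
]
calls, duration_bucket = aggregate(spans)
print(calls[("frontend", "/customer", "SPAN_KIND_SERVER")])  # 2
```

Because the buckets are cumulative, the 1800 ms span on `/customer` is counted in the 2000, 5000, and +Inf buckets but not in the 1000 bucket. That layout is what lets a query estimate percentiles from `duration_bucket`.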
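Example PrometheusRule custom resource that defines an SLO alerting rule for when the frontend service does not serve 95% of requests within 2000 ms

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: span-red
spec:
  groups:
  - name: server-side-latency
    rules:
    - alert: SpanREDFrontendAPIRequestLatency
      expr: histogram_quantile(0.95, sum(rate(duration_bucket{service_name="frontend", span_kind="SPAN_KIND_SERVER"}[5m])) by (le, service_name, span_name)) > 2000
      labels:
        severity: Warning
      annotations:
        summary: "High request latency on {{$labels.service_name}} and {{$labels.span_name}}"
        # The sum() by (le, service_name, span_name) drops the instance label, and durations are in milliseconds.
        description: "{{$labels.service_name}} {{$labels.span_name}} has a 95th percentile request latency above 2s (current value: {{$value}}ms)"
```

The expression checks whether 95% of the frontend server response times are below 2000 ms. The alert fires when the estimated 95th percentile exceeds that value. The time range (`[5m]`) must be at least four times the scrape interval and long enough to accommodate changes in the metrics.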
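The alert expression depends on how `histogram_quantile` estimates a quantile from cumulative bucket counts. The following Python sketch reimplements that linear interpolation for a single series, so you can check a threshold by hand. It also encodes the four-times-the-scrape-interval rule of thumb for the `rate()` range. The function names `histogram_quantile()`, `latency_alert_fires()`, and `window_ok()` are hypothetical, as are the bucket values. The sketch skips some edge cases that Prometheus handles differently; for example, it returns NaN for q outside (0, 1]. The quantile depends only on ratios of bucket counts, so the result is the same for raw counts and for the per-second rates that `sum(rate(...)) by (le, ...)` produces.

```python
import math
from bisect import bisect_left


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs.

    A sketch of the linear interpolation that PromQL's histogram_quantile()
    applies to a single series. `buckets` must be sorted by upper bound and end
    with +Inf. Unlike Prometheus, this sketch returns NaN for q outside (0, 1].
    """
    if not 0 < q <= 1 or len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan
    bounds = [le for le, _ in buckets]
    counts = [count for _, count in buckets]
    rank = q * counts[-1]  # target position among all observations
    # First finite bucket whose cumulative count reaches the rank.
    b = bisect_left(counts, rank, 0, len(counts) - 1)
    if b == len(counts) - 1:
        # The rank lies in the +Inf bucket: fall back to the highest finite bound.
        return bounds[-2]
    start, end, in_bucket = 0.0, bounds[b], counts[b]
    if b > 0:
        start = bounds[b - 1]
        in_bucket -= counts[b - 1]
        rank -= counts[b - 1]
    # Assume observations are spread evenly inside the bucket.
    return start + (end - start) * rank / in_bucket


def latency_alert_fires(buckets, threshold_ms=2000.0):
    """Hypothetical helper mirroring the rule: p95 > 2000 (milliseconds)."""
    return histogram_quantile(0.95, buckets) > threshold_ms


def window_ok(window_s, scrape_interval_s):
    """Rule of thumb from the text: the rate() range is >= 4x the scrape interval."""
    return window_s >= 4 * scrape_interval_s


# Hypothetical per-second bucket rates for one span_name, in the shape that
# sum(rate(duration_bucket{...}[5m])) by (le, service_name, span_name) returns.
healthy = [(100, 40), (250, 70), (500, 85), (1000, 92), (2000, 96), (5000, 100), (math.inf, 100)]
slow = [(100, 10), (250, 20), (500, 30), (1000, 50), (2000, 80), (5000, 100), (math.inf, 100)]
print(histogram_quantile(0.95, healthy), latency_alert_fires(healthy))  # 1750.0 False
print(histogram_quantile(0.95, slow), latency_alert_fires(slow))  # 4250.0 True
```

For the `healthy` series, the estimated 95th percentile is 1750 ms, so the rule stays silent. For the `slow` series it is 4250 ms, so the rule fires. With the `[5m]` range (300 s), `window_ok(300, 30)` holds for a 30 s scrape interval, while a 90 s interval would fail the check.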