33.10. MetalLB 日志记录、故障排除和支持
如果您需要对 MetalLB 配置进行故障排除,请查看以下部分来了解常用命令。
33.10.1. 设置 MetalLB 日志记录级别
MetalLB 在带有默认设置 info
的容器中使用 FRRouting(FRR)会生成大量日志。您可以通过设置 logLevel
来控制生成的日志的详细程度,如下例所示。
通过将 logLevel
设置为 debug
来深入了解 MetalLB,如下所示:
先决条件
-
您可以使用具有
cluster-admin
角色的用户访问集群。 -
已安装 OpenShift CLI(
oc
)。
流程
创建一个文件,如
setdebugloglevel.yaml
,其内容类似以下示例:apiVersion: metallb.io/v1beta1 kind: MetalLB metadata: name: metallb namespace: metallb-system spec: logLevel: debug nodeSelector: node-role.kubernetes.io/worker: ""
应用配置:
$ oc replace -f setdebugloglevel.yaml
注意使用
oc replace
可以被理解为,metallb
CR 已被创建,您在此处更改了日志级别。显示
speaker
pod 的名称:$ oc get -n metallb-system pods -l component=speaker
输出示例
NAME READY STATUS RESTARTS AGE speaker-2m9pm 4/4 Running 0 9m19s speaker-7m4qw 3/4 Running 0 19s speaker-szlmx 4/4 Running 0 9m19s
注意重新创建发言人和控制器 Pod,以确保应用更新的日志记录级别。对于 MetalLB 的所有组件,会修改日志记录级别。
查看
speaker
日志:$ oc logs -n metallb-system speaker-7m4qw -c speaker
输出示例
{"branch":"main","caller":"main.go:92","commit":"3d052535","goversion":"gc / go1.17.1 / amd64","level":"info","msg":"MetalLB speaker starting (commit 3d052535, branch main)","ts":"2022-05-17T09:55:05Z","version":""} {"caller":"announcer.go:110","event":"createARPResponder","interface":"ens4","level":"info","msg":"created ARP responder for interface","ts":"2022-05-17T09:55:05Z"} {"caller":"announcer.go:119","event":"createNDPResponder","interface":"ens4","level":"info","msg":"created NDP responder for interface","ts":"2022-05-17T09:55:05Z"} {"caller":"announcer.go:110","event":"createARPResponder","interface":"tun0","level":"info","msg":"created ARP responder for interface","ts":"2022-05-17T09:55:05Z"} {"caller":"announcer.go:119","event":"createNDPResponder","interface":"tun0","level":"info","msg":"created NDP responder for interface","ts":"2022-05-17T09:55:05Z"} I0517 09:55:06.515686 95 request.go:665] Waited for 1.026500832s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operators.coreos.com/v1alpha1?timeout=32s {"Starting Manager":"(MISSING)","caller":"k8s.go:389","level":"info","ts":"2022-05-17T09:55:08Z"} {"caller":"speakerlist.go:310","level":"info","msg":"node event - forcing sync","node addr":"10.0.128.4","node event":"NodeJoin","node name":"ci-ln-qb8t3mb-72292-7s7rh-worker-a-vvznj","ts":"2022-05-17T09:55:08Z"} {"caller":"service_controller.go:113","controller":"ServiceReconciler","enqueueing":"openshift-kube-controller-manager-operator/metrics","epslice":"{\"metadata\":{\"name\":\"metrics-xtsxr\",\"generateName\":\"metrics-\",\"namespace\":\"openshift-kube-controller-manager-operator\",\"uid\":\"ac6766d7-8504-492c-9d1e-4ae8897990ad\",\"resourceVersion\":\"9041\",\"generation\":4,\"creationTimestamp\":\"2022-05-17T07:16:53Z\",\"labels\":{\"app\":\"kube-controller-manager-operator\",\"endpointslice.kubernetes.io/managed-by\":\"endpointslice-controller.k8s.io\",\"kubernetes.io/service-name\":\"metrics\"},\"annotations\":{\"endpoints.kubernetes.io/last-change-trigger-time\":\"2022-05-17T07:21:34Z\"},\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Service\",\"name\":\"metrics\",\"uid\":\"0518eed3-6152-42be-b566-0bd00a60faf8\",\"controller\":true,\"blockOwnerDeletion\":true}],\"managedFields\":[{\"manager\":\"kube-controller-manager\",\"operation\":\"Update\",\"apiVersion\":\"discovery.k8s.io/v1\",\"time\":\"2022-05-17T07:20:02Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:addressType\":{},\"f:endpoints\":{},\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:endpoints.kubernetes.io/last-change-trigger-time\":{}},\"f:generateName\":{},\"f:labels\":{\".\":{},\"f:app\":{},\"f:endpointslice.kubernetes.io/managed-by\":{},\"f:kubernetes.io/service-name\":{}},\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"0518eed3-6152-42be-b566-0bd00a60faf8\\\"}\":{}}},\"f:ports\":{}}}]},\"addressType\":\"IPv4\",\"endpoints\":[{\"addresses\":[\"10.129.0.7\"],\"conditions\":{\"ready\":true,\"serving\":true,\"terminating\":false},\"targetRef\":{\"kind\":\"Pod\",\"namespace\":\"openshift-kube-controller-manager-operator\",\"name\":\"kube-controller-manager-operator-6b98b89ddd-8d4nf\",\"uid\":\"dd5139b8-e41c-4946-a31b-1a629314e844\",\"resourceVersion\":\"9038\"},\"nodeName\":\"ci-ln-qb8t3mb-72292-7s7rh-master-0\",\"zone\":\"us-central1-a\"}],\"ports\":[{\"name\":\"https\",\"protocol\":\"TCP\",\"port\":8443}]}","level":"debug","ts":"2022-05-17T09:55:08Z"}
查看 FRR 日志:
$ oc logs -n metallb-system speaker-7m4qw -c frr
输出示例
Started watchfrr 2022/05/17 09:55:05 ZEBRA: client 16 says hello and bids fair to announce only bgp routes vrf=0 2022/05/17 09:55:05 ZEBRA: client 31 says hello and bids fair to announce only vnc routes vrf=0 2022/05/17 09:55:05 ZEBRA: client 38 says hello and bids fair to announce only static routes vrf=0 2022/05/17 09:55:05 ZEBRA: client 43 says hello and bids fair to announce only bfd routes vrf=0 2022/05/17 09:57:25.089 BGP: Creating Default VRF, AS 64500 2022/05/17 09:57:25.090 BGP: dup addr detect enable max_moves 5 time 180 freeze disable freeze_time 0 2022/05/17 09:57:25.090 BGP: bgp_get: Registering BGP instance (null) to zebra 2022/05/17 09:57:25.090 BGP: Registering VRF 0 2022/05/17 09:57:25.091 BGP: Rx Router Id update VRF 0 Id 10.131.0.1/32 2022/05/17 09:57:25.091 BGP: RID change : vrf VRF default(0), RTR ID 10.131.0.1 2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF br0 2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF ens4 2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF ens4 addr 10.0.128.4/32 2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF ens4 addr fe80::c9d:84da:4d86:5618/64 2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF lo 2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF ovs-system 2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF tun0 2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF tun0 addr 10.131.0.1/23 2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF tun0 addr fe80::40f1:d1ff:feb6:5322/64 2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF veth2da49fed 2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF veth2da49fed addr fe80::24bd:d1ff:fec1:d88/64 2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF veth2fa08c8c 2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF veth2fa08c8c addr fe80::6870:ff:fe96:efc8/64 2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF veth41e356b7 2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF veth41e356b7 addr fe80::48ff:37ff:fede:eb4b/64 2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF veth1295c6e2 2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF veth1295c6e2 addr fe80::b827:a2ff:feed:637/64 2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF veth9733c6dc 2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF veth9733c6dc addr fe80::3cf4:15ff:fe11:e541/64 2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF veth336680ea 2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF veth336680ea addr fe80::94b1:8bff:fe7e:488c/64 2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vetha0a907b7 2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vetha0a907b7 addr fe80::3855:a6ff:fe73:46c3/64 2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vethf35a4398 2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vethf35a4398 addr fe80::40ef:2fff:fe57:4c4d/64 2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vethf831b7f4 2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vethf831b7f4 addr fe80::f0d9:89ff:fe7c:1d32/64 2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vxlan_sys_4789 2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vxlan_sys_4789 addr fe80::80c1:82ff:fe4b:f078/64 2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] Timer (start timer expire). 2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] BGP_Start (Idle->Connect), fd -1 2022/05/17 09:57:26.094 BGP: Allocated bnc 10.0.0.1/32(0)(VRF default) peer 0x7f807f7631a0 2022/05/17 09:57:26.094 BGP: sendmsg_zebra_rnh: sending cmd ZEBRA_NEXTHOP_REGISTER for 10.0.0.1/32 (vrf VRF default) 2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] Waiting for NHT 2022/05/17 09:57:26.094 BGP: bgp_fsm_change_status : vrf default(0), Status: Connect established_peers 0 2022/05/17 09:57:26.094 BGP: 10.0.0.1 went from Idle to Connect 2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] TCP_connection_open_failed (Connect->Active), fd -1 2022/05/17 09:57:26.094 BGP: bgp_fsm_change_status : vrf default(0), Status: Active established_peers 0 2022/05/17 09:57:26.094 BGP: 10.0.0.1 went from Connect to Active 2022/05/17 09:57:26.094 ZEBRA: rnh_register msg from client bgp: hdr->length=8, type=nexthop vrf=0 2022/05/17 09:57:26.094 ZEBRA: 0: Add RNH 10.0.0.1/32 type Nexthop 2022/05/17 09:57:26.094 ZEBRA: 0:10.0.0.1/32: Evaluate RNH, type Nexthop (force) 2022/05/17 09:57:26.094 ZEBRA: 0:10.0.0.1/32: NH has become unresolved 2022/05/17 09:57:26.094 ZEBRA: 0: Client bgp registers for RNH 10.0.0.1/32 type Nexthop 2022/05/17 09:57:26.094 BGP: VRF default(0): Rcvd NH update 10.0.0.1/32(0) - metric 0/0 #nhops 0/0 flags 0x6 2022/05/17 09:57:26.094 BGP: NH update for 10.0.0.1/32(0)(VRF default) - flags 0x6 chgflags 0x0 - evaluate paths 2022/05/17 09:57:26.094 BGP: evaluate_paths: Updating peer (10.0.0.1(VRF default)) status with NHT 2022/05/17 09:57:30.081 ZEBRA: Event driven route-map update triggered 2022/05/17 09:57:30.081 ZEBRA: Event handler for route-map: 10.0.0.1-out 2022/05/17 09:57:30.081 ZEBRA: Event handler for route-map: 10.0.0.1-in 2022/05/17 09:57:31.104 ZEBRA: netlink_parse_info: netlink-listen (NS 0) type RTM_NEWNEIGH(28), len=76, seq=0, pid=0 2022/05/17 09:57:31.104 ZEBRA: Neighbor Entry received is not on a VLAN or a BRIDGE, ignoring 2022/05/17 09:57:31.105 ZEBRA: netlink_parse_info: netlink-listen (NS 0) type RTM_NEWNEIGH(28), len=76, seq=0, pid=0 2022/05/17 09:57:31.105 ZEBRA: Neighbor Entry received is not on a VLAN or a BRIDGE, ignoring
33.10.1.1. FRRouting(FRR)日志级别
下表描述了 FRR 日志记录级别。
日志级别 | 描述 |
---|---|
| 为所有日志记录级别提供所有日志信息。 |
|
这些信息有助于相关人员进行问题诊断。设置为 |
| 提供始终应记录的信息,但在正常情况下,不需要用户干预。这是默认的日志记录级别。 |
|
任何可能导致 |
|
对 |
| 关闭所有日志。 |
33.10.2. BGP 故障排除问题
红帽支持在 speaker
Pod 的容器中使用 FRRouting(FRR)的 BGP 实施。作为集群管理员,如果您需要对 BGP 配置问题进行故障排除,您需要在 FRR 容器中运行命令。
先决条件
-
您可以使用具有
cluster-admin
角色的用户访问集群。 -
已安装 OpenShift CLI(
oc
)。
流程
显示
speaker
pod 的名称:$ oc get -n metallb-system pods -l component=speaker
输出示例
NAME READY STATUS RESTARTS AGE speaker-66bth 4/4 Running 0 56m speaker-gvfnf 4/4 Running 0 56m ...
显示 FRR 的运行配置:
$ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show running-config"
输出示例
Building configuration... Current configuration: ! frr version 7.5.1_git frr defaults traditional hostname some-hostname log file /etc/frr/frr.log informational log timestamp precision 3 service integrated-vtysh-config ! router bgp 64500 1 bgp router-id 10.0.1.2 no bgp ebgp-requires-policy no bgp default ipv4-unicast no bgp network import-check neighbor 10.0.2.3 remote-as 64500 2 neighbor 10.0.2.3 bfd profile doc-example-bfd-profile-full 3 neighbor 10.0.2.3 timers 5 15 neighbor 10.0.2.4 remote-as 64500 4 neighbor 10.0.2.4 bfd profile doc-example-bfd-profile-full 5 neighbor 10.0.2.4 timers 5 15 ! address-family ipv4 unicast network 203.0.113.200/30 6 neighbor 10.0.2.3 activate neighbor 10.0.2.3 route-map 10.0.2.3-in in neighbor 10.0.2.4 activate neighbor 10.0.2.4 route-map 10.0.2.4-in in exit-address-family ! address-family ipv6 unicast network fc00:f853:ccd:e799::/124 7 neighbor 10.0.2.3 activate neighbor 10.0.2.3 route-map 10.0.2.3-in in neighbor 10.0.2.4 activate neighbor 10.0.2.4 route-map 10.0.2.4-in in exit-address-family ! route-map 10.0.2.3-in deny 20 ! route-map 10.0.2.4-in deny 20 ! ip nht resolve-via-default ! ipv6 nht resolve-via-default ! line vty ! bfd profile doc-example-bfd-profile-full 8 transmit-interval 35 receive-interval 35 passive-mode echo-mode echo-interval 35 minimum-ttl 10 ! ! end
<.>
router bgp
部分表示 MetalLB 的 ASN。<.> 确认存在neighbor <ip-address> remote-as <peer-ASN>
行,用于您添加的每个 BGP peer 自定义资源。<.>如果配置了 BFD,请确认 BFD 配置集与正确的 BGP peer 关联,并确认 BFD 配置集会出现在命令输出中。<.> 确认network <ip-address-range>
行与您在您在地址池自定义资源中指定的 IP 地址范围匹配。显示 BGP 概述:
$ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show bgp summary"
输出示例
IPv4 Unicast Summary: BGP router identifier 10.0.1.2, local AS number 64500 vrf-id 0 BGP table version 1 RIB entries 1, using 192 bytes of memory Peers 2, using 29 KiB of memory Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt 10.0.2.3 4 64500 387 389 0 0 0 00:32:02 0 1 1 10.0.2.4 4 64500 0 0 0 0 0 never Active 0 2 Total number of neighbors 2 IPv6 Unicast Summary: BGP router identifier 10.0.1.2, local AS number 64500 vrf-id 0 BGP table version 1 RIB entries 1, using 192 bytes of memory Peers 2, using 29 KiB of memory Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt 10.0.2.3 4 64500 387 389 0 0 0 00:32:02 NoNeg 3 10.0.2.4 4 64500 0 0 0 0 0 never Active 0 4 Total number of neighbors 2
显示接收地址池的 BGP 对等点:
$ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show bgp ipv4 unicast 203.0.113.200/30"
将
ipv4
替换为ipv6
,以显示接收 IPv6 地址池的 BGP 对等点。将203.0.113.200/30
替换为地址池的 IPv4 或 IPv6 IP 地址范围。输出示例
BGP routing table entry for 203.0.113.200/30 Paths: (1 available, best #1, table default) Advertised to non peer-group peers: 10.0.2.3 <.> Local 0.0.0.0 from 0.0.0.0 (10.0.1.2) Origin IGP, metric 0, weight 32768, valid, sourced, local, best (First path received) Last update: Mon Jan 10 19:49:07 2022
<.> 确认输出中包含 BGP peer 的 IP 地址。
33.10.3. BFD 问题故障排除
红帽支持双向转发检测(BFD)实施,在 speaker
Pod 中使用 FRRouting(FRR)。BFD 实施依赖于 BFD 对等点,也被配置为带有已建立的 BGP 会话的 BGP 对等点。作为集群管理员,如果您需要排除 BFD 配置问题,则需要在 FRR 容器中运行命令。
先决条件
-
您可以使用具有
cluster-admin
角色的用户访问集群。 -
已安装 OpenShift CLI(
oc
)。
流程
显示
speaker
pod 的名称:$ oc get -n metallb-system pods -l component=speaker
输出示例
NAME READY STATUS RESTARTS AGE speaker-66bth 4/4 Running 0 26m speaker-gvfnf 4/4 Running 0 26m ...
显示 BFD 对等点:
$ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show bfd peers brief"
输出示例
Session count: 2 SessionId LocalAddress PeerAddress Status ========= ============ =========== ====== 3909139637 10.0.1.2 10.0.2.3 up <.>
确认
PeerAddress
列包含每个 BFD 对等点。如果输出没有列出输出要包含的 BFD peer IP 地址,并与 peer 对 BGP 连接性进行故障排除。如果状态字段显示down
,请检查在节点和对等点间的链接和设备连接。您可以使用oc get pods -n metallb-system speaker-66bth -o jsonpath='{.spec.nodeName}'
命令确定 speaker pod 的节点名称。
33.10.4. BGP 和 BFD 的 MetalLB 指标
OpenShift Container Platform 捕获与 BGP peer 和 BFD 配置集相关的 MetalLB 的以下 Prometheus 指标。
-
metallb_bfd_control_packet_input
统计从每个 BFD peer 接收的 BFD 控制数据包的数量。 -
metallb_bfd_control_packet_output
统计发送到每个 BFD peer 的 BFD 控制数据包的数量。 -
metallb_bfd_echo_packet_input
统计从每个 BFD 对等点接收的 BFD echo 数据包的数量。 -
metallb_bfd_echo_packet_output
计算发送到每个 BFD 对等点的 BFD 回显数据包的数量。 -
metallb_bfd_session_down_events
统计 BFD 会话进入down
状态的次数。 -
metallb_bfd_session_up
指示与 BFD 对等点的连接状态。1
表示会话状态为up
,0
表示会话为down
。 -
metallb_bfd_session_up_events
统计 BFD 会话进入up
状态的次数。 -
metallb_bfd_zebra_notifications
统计每个 BFD Zebra 通知的数量。 -
metallb_bgp_announced_prefixes_total
计算公告给 BGP 对等的负载均衡器 IP 地址前缀的数量。术语 前缀(prefix) 和聚合路由(aggregated route) 具有相同的含义。 -
metallb_bgp_session_up
表示连接状态与 BGP peer。1
表示会话状态为up
,0
表示会话为down
。 -
metallb_bgp_updates_total
计算发送到 BGP peer 的 BGP更新
信息的数量。
其他资源
- 有关使用监控仪表板的信息,请参阅 查询指标。
33.10.5. 关于收集 MetalLB 数据
您可以使用 oc adm must-gather
CLI 命令来收集有关集群、MetalLB 配置和 MetalLB Operator 的信息。以下功能和对象与 MetalLB 和 MetalLB Operator 关联:
- 在其中部署 MetalLB Operator 的命名空间和子对象
- 所有 MetalLB Operator 自定义资源定义(CRD)
oc adm must-gather
CLI 命令会收集红帽用来实施 BGP 和 BFD 的 FRRouting(FRR)的以下信息:
-
/etc/frr/frr.conf
-
/etc/frr/frr.log
-
/etc/frr/daemons
配置文件 -
/etc/frr/vtysh.conf
上述列表中的日志和配置文件从每个 speaker
pod 中的 frr
容器收集。
除了日志和配置文件外,oc adm must-gather
CLI 命令还会从以下 vtysh
命令收集输出:
-
show running-config
-
show bgp ipv4
-
show bgp ipv6
-
show bgp neighbor
-
show bfd peer
运行 oc adm must-gather
CLI 命令时不需要额外的配置。
其他资源