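1.29. Troubleshooting Thanos compactor halts
You might receive an error message stating that the compactor has halted. This can occur when there are corrupted blocks or when the Thanos compactor persistent volume claim (PVC) runs out of space.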
1.29.1. Symptom: Thanos compactor halts
The Thanos compactor halts because there is no space left on its persistent volume claim (PVC). You receive the following message:
ts=2024-01-24T15:34:51.948653839Z caller=compact.go:491 level=error msg="critical error detected; halting" err="compaction: group 0@5827190780573537664: compact blocks [ /var/thanos/compact/compact/0@15699422364132557315/01HKZGQGJCKQWF3XMA8EXAMPLE]: 2 errors: populate block: add series: write series data: write /var/thanos/compact/compact/0@15699422364132557315/01HKZGQGJCKQWF3XMA8EXAMPLE.tmp-for-creation/index: no space left on device; write /var/thanos/compact/compact/0@15699422364132557315/01HKZGQGJCKQWF3XMA8EXAMPLE.tmp-for-creation/index: no space left on device"
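To tell this cause apart from other halts without reading the full message, you can filter the compactor log for the two phrases shown in the message above. The following is a minimal sketch; the function name `halted_for_no_space` is hypothetical, and the match strings are taken only from the sample message in this section.

```shell
#!/bin/sh
# halted_for_no_space: read compactor log lines on stdin.
# Exits 0 if any line is a halt message that also reports a full volume,
# non-zero otherwise. Both phrases come from the sample message above.
halted_for_no_space() {
  grep -q 'critical error detected; halting.*no space left on device'
}
```

One possible use is to pipe the pod log into it: `oc logs observability-thanos-compact-0 -n open-cluster-management-observability | halted_for_no_space && echo "PVC is full"`.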
1.29.2. Resolving the problem: Thanos compactor halts
To resolve the problem, increase the storage space of the Thanos compactor PVC. Complete the following steps:
- Increase the storage space for the data-observability-thanos-compact-0 PVC. For more information, see Increasing and decreasing persistent volumes and persistent volume claims.
- Restart the observability-thanos-compact pod by deleting it. A new pod is automatically created and started.
  oc delete pod observability-thanos-compact-0 -n open-cluster-management-observability
- After the observability-thanos-compact pod restarts, check the acm_thanos_compact_bulk_compactions metric. The metric value decreases as the Thanos compactor works through the backlog.
- Confirm that the metric changes in a consistent cycle, and check the disk usage. You can then try again to decrease the PVC size.
Note: This might take several weeks.
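The disk usage check in the last step can be sketched as a small filter over `df -P` output. This is an illustration under assumptions: the function name `usage_percent` is hypothetical, the `df` utility is assumed to exist in the compactor container image, and `/var/thanos/compact` is assumed to be the PVC mount point because it is the path that appears in the log messages.

```shell
#!/bin/sh
# usage_percent: read the output of `df -P <path>` on stdin and print the
# capacity column of the first data row as a bare number (for example 97).
usage_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For example: `oc exec observability-thanos-compact-0 -n open-cluster-management-observability -- df -P /var/thanos/compact | usage_percent`. A value that stays close to 100 after the restart suggests that the PVC is still too small.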
1.29.3. Symptom: Thanos compactor halts
The Thanos compactor halts because you have corrupted blocks. You might receive the following output, where the 01HKZYEZ2DVDQXF1STVEXAMPLE block is corrupted:
ts=2024-01-24T15:34:51.948653839Z caller=compact.go:491 level=error msg="critical error detected; halting" err="compaction: group 0@15699422364132557315: compact blocks [/var/thanos/compact/compact/0@15699422364132557315/01HKZGQGJCKQWF3XMA8EXAMPLE /var/thanos/compact/compact/0@15699422364132557315/01HKZQK7TD06J2XWGR5EXAMPLE /var/thanos/compact/compact/0@15699422364132557315/01HKZYEZ2DVDQXF1STVEXAMPLE /var/thanos/compact/compact/0@15699422364132557315/01HM05APAHXBQSNC0N5EXAMPLE]: populate block: chunk iter: cannot populate chunk 8 from block 01HKZYEZ2DVDQXF1STVEXAMPLE: segment index 0 out of range"
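The halt message lists every block in the compaction group, so the corrupted one is easy to misread. As a rough sketch, the ID that follows "cannot populate chunk N from block" can be pulled out with `sed`. The function name `corrupted_block_id` is hypothetical, and the pattern assumes that block IDs are 26 uppercase alphanumeric characters, as in the sample output above.

```shell
#!/bin/sh
# corrupted_block_id: read compactor log lines on stdin and print the block ID
# named in a "cannot populate chunk N from block <ID>" error, one per line.
# Lines without that phrase produce no output.
corrupted_block_id() {
  sed -nE 's/.*cannot populate chunk [0-9]+ from block ([0-9A-Z]{26}).*/\1/p'
}
```

For example: `oc logs observability-thanos-compact-0 -n open-cluster-management-observability | corrupted_block_id`.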
1.29.4. Resolving the problem: Thanos compactor halts
To resolve the problem, run the thanos tools bucket verify command with the object storage configuration. Complete the following steps:
- Resolve the block error by running the thanos tools bucket verify command with the object storage configuration. Run the following commands in the observability-thanos-compact pod:
  oc rsh observability-thanos-compact-0
  [..]
  thanos tools bucket verify -r --objstore.config="$OBJSTORE_CONFIG" --objstore-backup.config="$OBJSTORE_CONFIG" --id=01HKZYEZ2DVDQXF1STVEXAMPLE
- If the previous command does not work, you must mark the block for deletion because it might be corrupted. Run the following command:
  thanos tools bucket mark --id "01HKZYEZ2DVDQXF1STVEXAMPLE" --objstore.config="$OBJSTORE_CONFIG" --marker=deletion-mark.json --details=DELETE
- After you mark the block for deletion, clean up the marked blocks by running the following command:
  thanos tools bucket cleanup --objstore.config="$OBJSTORE_CONFIG"
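The three steps above can be chained so that the block is marked and cleaned up only when the repair attempt fails. The following is a sketch, not a supported procedure. The function name `repair_or_delete_block` is hypothetical, and it treats a non-zero exit status from `thanos tools bucket verify` as "the command does not work", which is an assumption. It is meant to run inside the compactor pod, where `thanos` is available and `OBJSTORE_CONFIG` is set as in the commands above.

```shell
#!/bin/sh
# repair_or_delete_block ID: try to verify and repair the block first.
# Only if that fails, mark the block for deletion and run the cleanup.
# All flags are copied from the commands in the steps above.
repair_or_delete_block() {
  block_id=$1
  if thanos tools bucket verify -r \
       --objstore.config="$OBJSTORE_CONFIG" \
       --objstore-backup.config="$OBJSTORE_CONFIG" \
       --id="$block_id"; then
    echo "verify succeeded for $block_id"
    return 0
  fi
  # Assumption: a failed verify means the block cannot be repaired.
  thanos tools bucket mark --id "$block_id" \
    --objstore.config="$OBJSTORE_CONFIG" \
    --marker=deletion-mark.json --details=DELETE &&
  thanos tools bucket cleanup --objstore.config="$OBJSTORE_CONFIG"
}
```

For example, `repair_or_delete_block 01HKZYEZ2DVDQXF1STVEXAMPLE`. Because marking a block for deletion discards its data, running the steps by hand and reviewing the verify output first is the more cautious choice.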