This documentation is for a release that is no longer maintained
See the documentation for the latest supported version 3 or the latest supported version 4.

7.3.4. Cleaning CRI-O storage
You can manually clear the CRI-O ephemeral storage if you experience any of the following issues:

- A node cannot run any pods, and this error appears:

    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to mount container XXX: error recreating the missing symlinks: error reading name of symlink for XXX: open /var/lib/containers/storage/overlay/XXX/link: no such file or directory

- You cannot create a new container on a worker node, and the "can't stat lower layer" error appears:

    can't stat lower layer ... because it does not exist. Going through storage to recreate the missing symlinks.

- Your node is in the NotReady state after a cluster upgrade or after you attempt to reboot it.
- The container runtime implementation (crio) is not working properly.
- You cannot start a debug shell on the node using oc debug node/<nodename> because the container runtime instance (crio) is not working.
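To check whether a log excerpt contains either of the two error signatures listed above, a small filter along these lines may help. This is only a sketch: the function name `has_crio_storage_error` is hypothetical, and it assumes you can feed the relevant log text to standard input (for example from `journalctl -u crio` in a root shell on the node, if the node is reachable).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc or crio): exits 0 if the text on stdin
# contains one of the two CRI-O storage error signatures quoted above.
has_crio_storage_error() {
  grep -q -E "error recreating the missing symlinks|can't stat lower layer"
}

# Example usage (assumption: run as root on the affected node):
#   journalctl -u crio --no-pager | has_crio_storage_error \
#     && echo "CRI-O storage errors found"
```

A match only indicates that the symptoms described here are present; it does not by itself prove that wiping the storage is the right fix.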
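Follow this process to completely wipe the CRI-O storage and resolve the errors.

Prerequisites:

- You have access to the cluster as a user with the cluster-admin role.
- You have installed the OpenShift CLI (oc).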
Procedure

Use cordon on the node. This prevents any workload from being scheduled if the node returns to the Ready state. Scheduling is disabled when SchedulingDisabled appears in the Status section:

    $ oc adm cordon <nodename>
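To confirm from a script that the cordon took effect, you could inspect the STATUS column of the node listing. The following is a minimal sketch; `is_cordoned` is a hypothetical helper name, and it assumes the default column layout of `oc get node --no-headers`, where STATUS is the second column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one or more `oc get node --no-headers` lines on
# stdin and exits 0 if any STATUS column (2nd field) contains SchedulingDisabled.
is_cordoned() {
  awk '{ print $2 }' | grep -q 'SchedulingDisabled'
}

# Example usage (replace <nodename> with your node):
#   oc get node <nodename> --no-headers | is_cordoned && echo "node is cordoned"
```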
Drain the node as the cluster-admin user:

    $ oc adm drain <nodename> --ignore-daemonsets --delete-emptydir-data
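Pods with a long terminationGracePeriodSeconds value can slow down a drain, so it may be useful to list them first. The sketch below assumes you produce tab-separated `namespace`, `pod name`, and `grace period` lines yourself (one way to do that with `oc get pods` and a JSONPath template is shown in the comment). `long_grace_pods` is a hypothetical helper name, and the 90-second default is simply the threshold this procedure warns about.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "namespace<TAB>pod<TAB>seconds" lines on stdin and
# prints the pods whose grace period is above the limit (default: 90 seconds).
long_grace_pods() {
  local limit="${1:-90}"
  awk -F '\t' -v limit="$limit" '$3 + 0 > limit + 0 { print $1 "/" $2 " " $3 }'
}

# Example usage (replace <nodename> with your node):
#   oc get pods -A --field-selector spec.nodeName=<nodename> \
#     -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.terminationGracePeriodSeconds}{"\n"}{end}' \
#     | long_grace_pods
```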
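Note: The terminationGracePeriodSeconds attribute of a pod or pod template controls the graceful termination period. This attribute defaults to 30 seconds, but can be customized for each application as necessary. If it is set to more than 90 seconds, the pod might be marked as SIGKILLed and fail to terminate successfully.

When the node returns, connect back to the node through SSH or the console. Then switch to the root user:

    $ ssh core@node1.example.com
    $ sudo -i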
Manually stop the kubelet:

    # systemctl stop kubelet
Stop the containers and pods:

    # crictl rmp -fa
Manually stop the crio service:

    # systemctl stop crio
After you run those commands, you can completely wipe the ephemeral storage:

    # crio wipe -f
Start the crio and kubelet services:

    # systemctl start crio
    # systemctl start kubelet
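The node-side commands from stopping the kubelet through restarting both services can be collected into one script. The following is a sketch under stated assumptions, not an official tool: `run` and `wipe_crio_storage` are hypothetical names, and `DRY_RUN` defaults to `1` so the commands are only printed unless you explicitly opt in.

```shell
#!/usr/bin/env bash
# Sketch of the node-side sequence described above.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 in a root
# shell on the cordoned and drained node to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Runs the steps in the documented order and stops at the first failure.
wipe_crio_storage() {
  run systemctl stop kubelet  || return 1
  run crictl rmp -fa          || return 1
  run systemctl stop crio     || return 1
  run crio wipe -f            || return 1
  run systemctl start crio    || return 1
  run systemctl start kubelet || return 1
}

# Example usage:
#   wipe_crio_storage              # dry run, prints the six commands
#   DRY_RUN=0 wipe_crio_storage    # executes them (root shell on the node)
```

Stopping at the first failing command is a design choice in this sketch so that you can inspect the node before continuing; running the steps by hand, as the procedure describes, gives you the same control.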
If the crio and kubelet services start and the node is in the Ready state, the cleanup worked:

    $ oc get nodes

Example output

    NAME                                 STATUS                     ROLES    AGE    VERSION
    ci-ln-tkbxyft-f76d1-nvwhr-master-1   Ready,SchedulingDisabled   master   133m   v1.22.0-rc.0+75ee307
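A scripted version of this check has to tell Ready apart from NotReady, because both strings contain "Ready". The sketch below splits the STATUS column on commas and looks for an exact match. `node_is_ready` is a hypothetical helper name, and it assumes the default `oc get node --no-headers` column layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `oc get node --no-headers` lines on stdin and
# exits 0 only if a STATUS column (2nd field) has an entry that is exactly
# "Ready" (so "NotReady" does not count).
node_is_ready() {
  awk '{
         n = split($2, s, ",")
         for (i = 1; i <= n; i++) if (s[i] == "Ready") ok = 1
       }
       END { exit (ok ? 0 : 1) }'
}

# Example usage (replace <nodename> with your node):
#   oc get node <nodename> --no-headers | node_is_ready && echo "node is Ready"
```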
Mark the node schedulable. Scheduling is enabled when SchedulingDisabled no longer appears in the status:

    $ oc adm uncordon <nodename>

Example output

    NAME                                 STATUS   ROLES    AGE    VERSION
    ci-ln-tkbxyft-f76d1-nvwhr-master-1   Ready    master   133m   v1.22.0-rc.0+75ee307