Fix – VMware (Broadcom) Telco Cloud Automation: containerd service will not start after resizing disks
I recently had a case with a customer where we resized the disks for Telco Cloud Automation, and after a reboot the TCA services would no longer start.
After a lot of troubleshooting, we narrowed it down to containerd not starting correctly.
This post describes a workaround to get containerd running again.
When running journalctl -xfe, we saw a lot of noise from containerd failing to run the CRI service. Below is an example of the error:
1. Collect the logs for Containerd
journalctl -xe --unit containerd -f

Dec 10 13:06:13 tcamgr.catbird.local containerd[8635]: time="2024-12-10T13:06:13.836586150Z" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve container name \"tca-database-admin-service_tca-database-admin-service-6cd59cf46f-thzt6_tca-mgr_79f08f9b-790e-471a-90ff-b663ecbb94a3_9\": name \"tca-database-admin-service_tca-database-admin-service-6cd59cf46f-thzt6_tca-mgr_79f08f9b-790e-471a-90ff-b663ecbb94a3_9\" is reserved for \"131388c0f5dfd2c73c35650c148b54148a3df321c38b6eeebd9c36f8ee534554\""
As the error message shows, containerd is blocked from recovering the container because its name is already reserved for another container ID.
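If you want to see how many containers are affected before changing anything, a quick way is to grep the containerd journal for the reserved-name errors. This is just a sanity check, and the grep pattern assumes the log format shown above:

# List every container name that containerd fails to reserve on startup
journalctl --unit containerd --no-pager | grep -o 'failed to reserve container name [^:]*' | sort -u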
2. Disable the CRI plugin in config.toml
To fix this, edit /etc/containerd/config.toml and add the CRI plugin name, io.containerd.grpc.v1.cri, to the disabled_plugins line at the top of the file:
vi /etc/containerd/config.toml

[ /home/admin ]# cat /etc/containerd/config.toml
disabled_plugins = ["io.containerd.grpc.v1.cri"]
#disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
state = "/run/containerd"
temp = ""
version = 2

[cgroup]
  path = ""

[debug]
  address = ""
  format = ""
  gid = 0
  level = ""
  uid = 0

[grpc]
  address = "/run/containerd/containerd.sock"
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216
  tcp_address = ""
  tcp_tls_ca = ""
  tcp_tls_cert = ""
  tcp_tls_key = ""
  uid = 0

[metrics]
  address = ""
  grpc_histogram = false

[plugins]

  [plugins."io.containerd.gc.v1.scheduler"]
    deletion_threshold = 0
    mutation_threshold = 100
    pause_threshold = 0.02
    schedule_delay = "0s"
    startup_delay = "100ms"

  [plugins."io.containerd.grpc.v1.cri"]
    device_ownership_from_security_context = false
    disable_apparmor = true
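If you prefer not to edit the file by hand, a one-liner like the following should achieve the same change. This is a sketch that assumes the default disabled_plugins = [] line shown above; take a backup first:

# Back up the original config, then disable the CRI plugin in place
cp /etc/containerd/config.toml /etc/containerd/config.toml.bak
sed -i 's/^disabled_plugins = \[\]/disabled_plugins = ["io.containerd.grpc.v1.cri"]/' /etc/containerd/config.toml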
3. Restart the containerd service
systemctl stop containerd.service
systemctl start containerd.service
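Before moving on, it is worth confirming that containerd now starts cleanly with the CRI plugin disabled, for example:

# Confirm the service is active and no longer logging the fatal CRI error
systemctl status containerd.service --no-pager
journalctl --unit containerd --since "5 minutes ago" --no-pager | tail -n 20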
4. List the containers in containerd
ctr -n k8s.io containers ls
5. Delete all containers in containerd
ctr -n k8s.io c rm $(ctr -n k8s.io c ls -q)
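If the command substitution aborts because one container refuses to delete, a simple loop (a sketch of the same cleanup, one container at a time) makes it easier to see which container is stuck:

# Remove containers one by one so a single failure doesn't stop the whole cleanup
for c in $(ctr -n k8s.io containers ls -q); do
  echo "removing $c"
  ctr -n k8s.io containers rm "$c" || echo "failed to remove $c"
done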
6. Stop the containerd service, edit the config.toml file again, and change the disabled_plugins value back to the default.
vi /etc/containerd/config.toml

[ /home/admin ]# cat /etc/containerd/config.toml
disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
state = "/run/containerd"
temp = ""
version = 2
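The stop and the revert can also be done from the shell in one go. Again a sketch, assuming the single-line edit shown earlier (or simply restore the backup copy if you made one):

systemctl stop containerd.service
# Revert the temporary change (or restore the backup taken earlier)
sed -i 's/^disabled_plugins = \["io.containerd.grpc.v1.cri"\]/disabled_plugins = []/' /etc/containerd/config.toml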
7. Start containerd and verify that all pods come back after some time
systemctl start containerd.service
ctr -n k8s.io containers ls
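To watch the pods recover on the Kubernetes side, something like the following works, assuming kubectl and a kubeconfig are available on the TCA Manager; crictl is an alternative at the container runtime level:

# Watch pods come back up across all namespaces (assumes kubectl + kubeconfig on the appliance)
kubectl get pods -A -w
# Or check running containers at the runtime level
crictl ps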
Finally, it is best to reboot the TCA Manager completely so that all related services start in the correct order.