Podman container not starting after unexpected shutdown, restart - Readlink overlay storage no such file error


Are you using podman containers in combination with systemd and finding that they’re no longer working after an unexpected restart, reboot or power failure? You’re probably getting an error which looks like this:

podman[1930]: Error: readlink /var/lib/containers/storage/overlay/l/X27IKW4J5ZO2ZPWJARK26JBPYS: no such file

TL;DR: The solution might be easier than you think. Just pull the 'offending' image again with podman pull.

Over the weekend something happened on my testing environment that caused a certain podman container to fail to start. I had completely forgotten that between Friday and Saturday the power had briefly gone out a few times. The server where I discovered this issue was a Raspberry Pi. I had moved it a month earlier from my desk to another spot because the Raspberry Pi 4’s fan can sometimes make an irritating noise. The downside? No UPS at the new location.

I had figured out that systemd + podman could do the trick and that an additional monit was not needed. Plus, most apps are using Elixir so they’d have fault tolerance built in.

I was working in offline mode (similar to airplane mode) and as I committed my code to fossil the expected "could not sync" happened. The next day, when I wanted to sync, it still failed, and I thought that the fossil service had crashed.

After all, crashes indicate a system which is being used and are a great source of learning.

Upon further inspection, and after restarting the service, it still wouldn’t sync. I concluded that it wasn’t fossil that had crashed; it was actually some other component. A component which I thought would NEVER crash, and indeed it never crashes by itself.

The offender was haproxy! The first questions that came to mind were: Can I trust haproxy? and What had happened?

sudo systemctl status haproxy
● haproxy.service - Podman container-haproxy.service
   Loaded: loaded (/etc/systemd/system/haproxy.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2021-02-10 10:06:43 GMT; 1s ago
     Docs: man:podman-generate-systemd(1)
  Process: 2059 ExecStartPre=/bin/rm -f /run/container-haproxy.pid /run/container-haproxy.ctr-id (code=exited, status=0/SUCCE
  Process: 2060 ExecStart=/usr/bin/podman run --conmon-pidfile /run/container-haproxy.pid --cidfile /run/container-haproxy.ct
  Process: 2094 ExecStopPost=/usr/bin/podman rm --ignore -f --cidfile /run/container-haproxy.ctr-id (code=exited, status=0/SU

Feb 10 10:06:43 raspberrypi systemd[1]: haproxy.service: Service RestartSec=100ms expired, scheduling restart.
Feb 10 10:06:43 raspberrypi systemd[1]: haproxy.service: Scheduled restart job, restart counter is at 5.
Feb 10 10:06:43 raspberrypi systemd[1]: Stopped Podman container-haproxy.service.
Feb 10 10:06:43 raspberrypi systemd[1]: haproxy.service: Start request repeated too quickly.
Feb 10 10:06:43 raspberrypi systemd[1]: haproxy.service: Failed with result 'exit-code'.
Feb 10 10:06:43 raspberrypi systemd[1]: Failed to start Podman container-haproxy.service.

Well, the systemd logs were pretty unassuming. Podman wouldn’t show any logs of its own because I constantly killed and recreated the container.

So I was stuck. What was going on? This is really interesting, because in this case I have to conclude that podman is the culprit! I hadn’t altered anything in haproxy for the past month.

I went on to inspect the whole journal of the haproxy service; maybe I could find something there: journalctl -u haproxy.service

Feb 10 10:06:40 raspberrypi systemd[1]: Failed to start Podman container-haproxy.service.
Feb 10 10:06:40 raspberrypi systemd[1]: haproxy.service: Service RestartSec=100ms expired, scheduling restart.
Feb 10 10:06:40 raspberrypi systemd[1]: haproxy.service: Scheduled restart job, restart counter is at 2.
Feb 10 10:06:40 raspberrypi systemd[1]: Stopped Podman container-haproxy.service.
Feb 10 10:06:40 raspberrypi systemd[1]: Starting Podman container-haproxy.service...
Feb 10 10:06:41 raspberrypi podman[1930]: Error: readlink /var/lib/containers/storage/overlay/l/X27IKW4J5ZO2ZPWJARK26JBPYS: no such file
Feb 10 10:06:41 raspberrypi systemd[1]: haproxy.service: Control process exited, code=exited, status=125/n/a
Feb 10 10:06:41 raspberrypi systemd[1]: haproxy.service: Failed with result 'exit-code'.

And soon enough I noticed something weird. Readlink error?
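When the journal is long, it can help to filter out just the paths podman complains about. The following is a minimal sketch, not part of my original debugging session: the function name extract_readlink_paths is made up for illustration, and it assumes the error lines have exactly the format shown in the journal output above.

```shell
#!/bin/sh
# Hypothetical helper: reads journal lines on stdin and prints only the
# paths from "Error: readlink <path>: no such file" messages.
# Assumes the message format seen in the journal excerpt above.
extract_readlink_paths() {
  sed -n 's|.*Error: readlink \(/[^:]*\): no such file.*|\1|p'
}
```

One way to use it would be `journalctl -u haproxy.service --no-pager | extract_readlink_paths | sort -u`, which lists each missing overlay path once.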

My Raspberry Pi seems to have "restarted" on the 6th, and afterwards it was missing one of podman’s overlay link files.

After some extra searching and debugging I figured: why not try to pull the haproxy image again and see what happens? AND IT WORKED!
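The recovery can be sketched roughly as follows. This is an illustration under assumptions, not the exact commands I typed: the image reference docker.io/library/haproxy:latest is a placeholder (use whatever image your unit file runs), the function name repair_container is made up, and by default the script only prints the commands instead of running them. Since the start limit had been hit ("Start request repeated too quickly"), the sketch also clears the failed state before starting the unit again.

```shell
#!/bin/sh
# Assumed values for illustration; adjust to your own setup.
IMAGE="${IMAGE:-docker.io/library/haproxy:latest}"  # placeholder image reference
UNIT="${UNIT:-haproxy.service}"
RUN="${RUN:-echo}"  # dry run by default; set RUN=sudo to really execute

# Re-pull the image so podman recreates its overlay layers,
# then clear the start-limit failure and start the unit again.
repair_container() {
  $RUN podman pull "$IMAGE"
  $RUN systemctl reset-failed "$UNIT"
  $RUN systemctl start "$UNIT"
}
```

Calling `repair_container` prints the three commands; `RUN=sudo repair_container` runs them as root, which matters here because the haproxy container was a rootful one.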

Note
My other rootless pods seem to work fine; only the rootful haproxy container bit the dust.

Later, while investigating the cause, I saw that other people have had the same issue (https://github.com/containers/podman/issues/5986). It seems that this issue also occurred with CRI-O containers in the past.

The best way to avoid this is to make sure that whenever you restart your server or virtual machine, you do it cleanly with systemctl reboot.

It seems that this occurred because something was left in an unclean state. However, it’s unclear to me why the original image of the container might be missing or in an unclean state. Probably some bidirectional link broke.
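To check that hunch, one could scan the overlay store for layers whose short link has gone missing. The sketch below rests on an assumption about the storage layout that I have not verified on every podman version: each layer directory under the overlay root holds a file called link containing a short ID, and overlay/l/ID should be a symlink back to that layer. The function name find_missing_layer_links is made up for illustration.

```shell
#!/bin/sh
# Hypothetical check: print every expected overlay/l/<ID> symlink that is missing.
# Assumed layout: <root>/<layer>/link holds the short ID of that layer.
find_missing_layer_links() {
  root="${1:-/var/lib/containers/storage/overlay}"
  for link_file in "$root"/*/link; do
    [ -f "$link_file" ] || continue   # also skips an unmatched glob
    id=$(cat "$link_file")
    [ -n "$id" ] || continue          # ignore empty link files
    if [ ! -L "$root/l/$id" ]; then
      printf '%s\n' "$root/l/$id"
    fi
  done
}
```

For the rootful store this needs root to read the directory, so it would have to run in a root shell. Any path it prints corresponds to a layer that would trigger the readlink error; re-pulling the image that owns the layer is the fix that worked for me.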

I hope this doesn’t happen too often, because in production it would certainly mean a headache.

Photo by Wynand Uys
