June 14, 2023 · 9:42 am

And it only took 21 hours.

Linux 6.4 has a bug where it hangs during boot, but probably only in about 1 in 1,000 boots (and even more rarely on Intel hardware, for some reason). I’m surprised that no one else has noticed this, but I certainly did: our nbdkit tests, which use libguestfs, were randomly hanging, always at the same place early in booting the libguestfs qemu appliance:

[    0.070120] Freeing SMP alternatives memory: 48K

So to bisect this, I had to run guestfish in a loop until it either hung or didn’t. How many times? I chose 10,000 boots without a hang as the threshold for calling a commit good. To make this easier, I wrote a test harness that runs boots in up to 8 parallel threads and parses the output to detect the hang.
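The harness itself isn't shown here. The following is only a minimal sketch of the idea in Python, built on several assumptions. A boot counts as hung if the command doesn't exit within a timeout. The "parsing" keeps the last line printed, so you can see where a hang stopped. The guestfish command line, the timeout, and the names `boot_once`, `run_boots` and `BootResult` are all hypothetical illustrations, not the author's code.

```python
import os
import signal
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical command for one appliance boot (an assumption; adjust to taste).
# "-v" makes guestfish verbose so that kernel boot messages end up in the output.
DEFAULT_CMD = ["guestfish", "-v", "-a", "/dev/null", "run"]


@dataclass
class BootResult:
    hung: bool       # True if the boot did not finish within the timeout
    last_line: str   # last non-empty line of output, i.e. where a hang stopped


def last_nonempty_line(output: bytes) -> str:
    lines = [ln for ln in output.decode(errors="replace").splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def boot_once(cmd: Sequence[str], timeout: float) -> BootResult:
    """Run one boot, treating 'no exit within `timeout` seconds' as a hang."""
    # A new session lets us kill the whole process tree (e.g. qemu) on a hang.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, start_new_session=True)
    try:
        out, _ = proc.communicate(timeout=timeout)
        return BootResult(hung=False, last_line=last_nonempty_line(out))
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process group had already exited
        out, _ = proc.communicate()  # output printed before the hang is kept
        return BootResult(hung=True, last_line=last_nonempty_line(out))


def run_boots(cmd: Sequence[str], boots: int, threads: int = 8,
              timeout: float = 120.0) -> Tuple[int, Optional[BootResult]]:
    """Boot up to `boots` times on `threads` workers, stopping at the first hang.

    Returns (number of boots that finished, first hang or None). A boot that
    exits with an error still counts as finished: only hangs matter here.
    The 120 s default timeout is an assumed, generous bound.
    """
    finished = 0
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(boot_once, cmd, timeout) for _ in range(boots)]
        for fut in as_completed(futures):
            result = fut.result()
            if result.hung:
                for f in futures:
                    f.cancel()  # drop boots that have not started yet
                return finished, result
            finished += 1
    return finished, None
```

Under these assumptions, one bisection step would call `run_boots(DEFAULT_CMD, boots=10_000)`. It marks the commit good if no hang comes back, and bad otherwise. For this bug, a hung boot's `last_line` would be expected to show the `Freeing SMP alternatives memory` message.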

After a painful bisection between v6.0 and v6.4-rc6, which took many days, I found the culprit: a regression in the printk time feature (https://lkml.org/lkml/2023/6/13/733).

To prove it, I booted Linux 292,612 times at the commit before the faulty one, all successfully. Then I booted the faulty commit, and it hung in under 1,000 boots.
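Here is a rough check on why these boot counts are convincing. It assumes each boot hangs independently with probability $p \approx 1/1000$, the rate estimated above. That is an assumption, since the real rate varies with hardware. The chance of seeing no hang in $n$ boots of a buggy kernel is then:

```latex
P(\text{no hang in } n \text{ boots}) = (1 - p)^{n} \approx e^{-np}, \qquad p = 10^{-3}

n = 10\,000:\quad (1 - 10^{-3})^{10\,000} \approx e^{-10} \approx 4.5 \times 10^{-5}

n = 292\,612:\quad (1 - 10^{-3})^{292\,612} \approx e^{-292.6} \sim 10^{-127}

\mathbb{E}[\text{boots until first hang}] = 1/p = 1000
```

So under this model, 10,000 clean boots would rarely be a false "good" during bisection. A clean run of 292,612 boots essentially rules out the bug being present before the faulty commit. A hang within about 1,000 boots is what the same model predicts after it.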
