A couple of weeks ago I had a power failure at my place, and unfortunately my vSphere cluster isn’t on UPS power (it’s hooked up to 220V and I haven’t had a chance to buy a 220V UPS yet). Most of my machines came back fine, but one of my VMs wouldn’t boot… at all. It seemed to have lost its disk, and worse, the BIOS couldn’t even see it.

Initially I thought the disk image file (vmdk) had somehow become corrupted and ESXi could no longer read it or attach it to the machine. After running several commands on the ESXi host to check the health of the vmdk (some of them outlined here), it seemed that the problem probably wasn’t the disk file itself.
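One quick sanity check in the same spirit (my own sketch, not one of the commands from the linked article) is to compare the disk size declared in the small descriptor .vmdk against the actual size of the -flat.vmdk data file it points to. The Python below assumes an ESXi-style descriptor whose extent lines look like `RW 41943040 VMFS "myvm-flat.vmdk"` (size in 512-byte sectors). The function name `check_flat_extents` and the filenames are hypothetical, and snapshot/sparse extents are skipped because their file size isn’t fixed.

```python
import os
import re
from typing import List, NamedTuple

SECTOR = 512
# Assumed extent line format: <access> <size in sectors> <type> "<filename>"
EXTENT_RE = re.compile(r'^(RW|RDONLY|NOACCESS)\s+(\d+)\s+(\w+)\s+"([^"]+)"')


class ExtentCheck(NamedTuple):
    filename: str
    declared_bytes: int  # size claimed by the descriptor
    actual_bytes: int    # size of the file on disk, -1 if it is missing

    @property
    def ok(self) -> bool:
        return self.declared_bytes == self.actual_bytes


def check_flat_extents(descriptor_path: str) -> List[ExtentCheck]:
    """Compare each VMFS extent's declared size with its -flat file's size."""
    base = os.path.dirname(descriptor_path)
    results = []
    with open(descriptor_path, "r", errors="replace") as f:
        for line in f:
            m = EXTENT_RE.match(line.strip())
            if not m:
                continue  # comments, header fields, disk database entries
            _access, sectors, ext_type, filename = m.groups()
            if ext_type != "VMFS":
                continue  # sparse/snapshot extents have no fixed file size
            path = os.path.join(base, filename)
            actual = os.path.getsize(path) if os.path.exists(path) else -1
            results.append(ExtentCheck(filename, int(sectors) * SECTOR, actual))
    return results
```

Pass it the small descriptor file, e.g. `check_flat_extents("myvm.vmdk")` (hypothetical path), not the large -flat file. A missing data file or a size mismatch would point at the vmdk itself; matching sizes suggest looking elsewhere.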

Once I concluded that it was not the file, I turned to figuring out what could have happened to make the disk invisible to the BIOS, much less bootable. After a bit of googling I came across this question/explanation of how to use Testdisk to search for and recover lost partitions. I found the step-by-step in that answer a bit meandering, but it sent me down the right path. From there I found this VERY thorough write-up on using Testdisk (and more) to recover lost data on Linux. It’s a bit scary, because the first step is to destroy whatever is left of the current partition table, but after reading through the rest of the article I felt pretty confident in their skill level. Also, the server didn’t really have any data on it that couldn’t be recreated (it was really just a worker server), though rebuilding it would probably have taken me 8-10 hours, so I gave it a shot. If you’re concerned about completely losing the data on the disk, it wouldn’t be a bad idea to first create a duplicate of the disk using dc3dd, as outlined in the first article I linked to in this paragraph (askubuntu).
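The point of using dc3dd rather than a plain copy is that it can hash the data as it copies, so you can prove the duplicate matches. Here is a minimal Python sketch of that same idea, not dc3dd itself: it copies a source (a block device or any file) to an image file in chunks and returns SHA-256 digests of what was read and of the finished image. The function name `image_and_verify` and the 1 MiB chunk size are my own choices, reading a raw device needs root, and unlike dc3dd it does nothing special about read errors.

```python
import hashlib
import os
from typing import Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def image_and_verify(source_path: str, image_path: str) -> Tuple[str, str]:
    """Copy source_path to image_path and return (source_sha256, image_sha256).

    The first digest covers the bytes read from the source during the copy;
    the second is computed by re-reading the finished image from disk.
    """
    source_hash = hashlib.sha256()
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            source_hash.update(chunk)
            dst.write(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # make sure the image is on disk before re-reading it

    # Second pass: hash the image exactly as it now exists on disk.
    image_hash = hashlib.sha256()
    with open(image_path, "rb") as img:
        for chunk in iter(lambda: img.read(CHUNK_SIZE), b""):
            image_hash.update(chunk)

    return source_hash.hexdigest(), image_hash.hexdigest()
```

Something like `image_and_verify("/dev/sda", "/media/ubuntu/backup/disk.img")` run as root from the live DVD would do it (both paths are hypothetical). Write the image to a different disk than the one you’re recovering, and only continue if the two digests match.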

I downloaded the Ubuntu 16.04 desktop ISO, booted the VM from it, and used the “Try Ubuntu” option (available only on the desktop ISO, not the server one) to run the live DVD. From there I installed Gparted (as indicated in the Dedoimedo article) and used it to wipe out the partition table.
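To make “wiping the partition table” a bit more concrete: assuming a classic MBR (msdos) disk, which is typical for a BIOS-booted VM like this one, the table lives in the disk’s first 512-byte sector. Bytes 446-509 hold four 16-byte partition entries, and bytes 510-511 hold the 0x55 0xAA boot signature. The Python sketch below reads that sector from a raw image or device and lists the non-empty entries. `parse_mbr` and `read_mbr` are hypothetical helpers of mine, not part of Gparted or Testdisk, and a GPT disk would only show its single protective entry (type 0xEE).

```python
import struct
from typing import List, NamedTuple

SECTOR_SIZE = 512
TABLE_OFFSET = 446            # first of the four primary partition entries
ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"  # bytes 510-511 of a valid MBR


class Partition(NamedTuple):
    index: int        # primary slot number, 1-4
    bootable: bool    # status byte is 0x80
    type_id: int      # e.g. 0x83 = Linux, 0x82 = Linux swap, 0x05 = extended
    start_lba: int    # first sector of the partition
    num_sectors: int  # length in 512-byte sectors


def parse_mbr(sector: bytes) -> List[Partition]:
    """Return the used primary entries of an MBR sector, or raise ValueError."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need a full 512-byte sector")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("no 0x55AA boot signature: not a valid MBR")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        status, type_id = entry[0], entry[4]
        # Start LBA and sector count are little-endian 32-bit ints at offsets 8 and 12.
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if type_id == 0:
            continue  # type 0 marks an unused slot
        partitions.append(Partition(i + 1, status == 0x80, type_id, start_lba, num_sectors))
    return partitions


def read_mbr(path: str) -> List[Partition]:
    """Read sector 0 of a raw disk image or block device and parse it."""
    with open(path, "rb") as f:
        return parse_mbr(f.read(SECTOR_SIZE))
```

Running `read_mbr("/dev/sda")` as root (device name assumed) before and after the Gparted step, I’d expect the entries to disappear once a fresh, empty msdos table is written, and to reappear after Testdisk writes the recovered table.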

I did run into an issue when I got to the Testdisk section: I couldn’t get it installed. It turns out you have to enable the universe repository (the “Community-maintained free and open-source software” option in Software & Updates) before Testdisk shows up as available to install (https://askubuntu.com/questions/398335/how-to-install-testdisk-in-ubuntu-13-10). After that you should be able to install Testdisk and run through the procedure Dedoimedo outlines in the article.
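If Testdisk still doesn’t show up, it’s worth confirming that universe really is enabled. APT’s one-line source format is `deb [options] URI suite component...`, so the check boils down to looking for `universe` among the components of an uncommented `deb` line. The Python sketch below does that; `universe_enabled` and `universe_enabled_in_file` are hypothetical helpers, they only look at /etc/apt/sources.list (not /etc/apt/sources.list.d/), and the package lists still have to be refreshed after enabling the repository.

```python
import re
from typing import Iterable


def universe_enabled(lines: Iterable[str]) -> bool:
    """True if any active binary ('deb') source line lists the universe component."""
    for raw in lines:
        line = raw.split("#", 1)[0]  # drop comments (whole-line or trailing)
        # Remove bracketed parts: "[arch=amd64]" options and the label inside
        # live-DVD URIs such as "cdrom:[Ubuntu 16.04 ...]/".
        line = re.sub(r"\[[^\]]*\]", "", line)
        tokens = line.split()
        # tokens: type, URI, suite, then zero or more components
        if len(tokens) < 4 or tokens[0] != "deb":
            continue
        if "universe" in tokens[3:]:
            return True
    return False


def universe_enabled_in_file(path: str = "/etc/apt/sources.list") -> bool:
    with open(path) as f:
        return universe_enabled(f)
```

Only `deb` lines count because Testdisk is installed as a binary package; a `deb-src` line with universe wouldn’t help.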

After that, the BIOS recognized the drive (yay!), but Ubuntu still wouldn’t boot. It would get through POST and whatever additional BIOS screens VMware machines show, then just sit at a blank black screen with “1234F:” displayed in the corner. A little googling (can you tell Linux is not my forte?) turned up this question/answer on askubuntu, which pointed me here to run Boot-Repair. So I booted into the Ubuntu live DVD once again, ran Boot-Repair, and rebooted. Success! My machine was back up and running like nothing had happened.

Even though I linked to them all above, here are all of my sources once again. As with all of my write-ups, if any of the authors of the articles below ever reads this: THANK YOU for all of the help!!!