I can’t think of a better way to kick this blog into gear than a good old-fashioned hair-on-fire story. One day last week I was rolling out some updates to a beta system for our internal users and ended up learning a rough lesson about VMware snapshots.
Before rolling out the change, I took a snapshot as usual. After doing the upgrade, I put the software into debug mode to make sure that everything was running smoothly. Once I was satisfied, I went back to the Virtual Infrastructure Client to remove the snapshot, just in time to watch the snapshot fill up the disk. Apparently, the debug logs had been writing so fast that the snapshot’s record of changes had used up the disk. Dang, not quite what I had in mind.
Well, VMware shut down the guest for me, I removed the snapshot, and rebooted the guest. I was able to free up some space to get the guest booted back up and get out of trouble... for the moment. Somewhere along the line, however, the guest had ended up with a snapshot still attached to it that didn’t show up in the Snapshot Manager. Just a few hours later, I went back to take another snapshot and noticed that the free space on the guest’s datastore had shrunk drastically.
After doing a bit of searching, I found a few others with this problem on VMware Fusion and VMware ESX. Apparently my guest was stuck in some awkward not-snapshotted-but-still-snapshotted state. There was a vmdk file for the disk, plus a second one with the same name and -000001 appended to the end. That second file was growing, which meant the guest was still running on a snapshot even though nothing showed up in the GUI.
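If you want to check for this state without trusting the GUI, a small script can look for those numbered vmdk files directly. This is only a rough sketch: it assumes you can list the guest’s directory from wherever the script runs, `find_snapshot_disks` is a name I made up, and the optional `-delta` part of the pattern is an assumption about how ESX names the data file.

```python
import re
from pathlib import Path

# Matches names like "guest-000001.vmdk". The optional "-delta" part is an
# assumption about how ESX names the snapshot data file.
SNAPSHOT_DISK = re.compile(r"-\d{6}(-delta)?\.vmdk$")


def find_snapshot_disks(vm_dir):
    """Return (file name, size in bytes) for each snapshot-style vmdk in vm_dir."""
    found = []
    for path in sorted(Path(vm_dir).iterdir()):
        if path.is_file() and SNAPSHOT_DISK.search(path.name):
            found.append((path.name, path.stat().st_size))
    return found
```

Run it twice a few minutes apart against the guest’s directory. If it lists a file the Snapshot Manager knows nothing about, and the size is larger the second time, you are probably in the same state I was.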
In the end, I had to perform the following steps to get out of this weird state:
1) shut down the guest
2) create a new snapshot
3) “Delete All” snapshots from the Snapshot Manager
4) start guest
Not cool, especially when it happens during production hours. Fortunately for me, this happened to an internal beta system, but still not cool by any stretch of the imagination.
In the end, I learned two things:
1) apparently VMware snapshots aren’t overly robust under high I/O loads, as someone else in the VMware forums also mentioned
2) never, ever fill up a vmfs volume (well, I re-learned this one)
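For the second lesson, a quick free-space check before taking a snapshot would have saved me some grief. Here is a minimal sketch, assuming the datastore is mounted at a path the script can see. The function name and the 20% threshold are my own picks for illustration, not anything VMware recommends.

```python
import shutil


def enough_headroom(path, min_free_fraction=0.20):
    """Return True if the volume holding path has at least min_free_fraction free.

    The 20% default is an arbitrary illustrative threshold.
    """
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction
```

Call it with the datastore’s mount point and hold off on the snapshot if it returns False.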
UPDATE: I’ve since learned that in most cases, Virtual Center is the one timing out here, not the actual snapshot deletion process. If you have a little patience, the snapshot deletion eventually finishes on its own. At least, that seemed to work for me most of the time.