The Situation:
1. We have two SuperMicro servers running ESXi 5.1, and they back up to each other each night using vmkfstools. Each server has a Windows 2003 Server VM running Windows Services for UNIX Version 3.5, with a system virtual hard drive and a data virtual hard drive; the data drive is an NTFS compressed volume that is shared over NFS. All VMs are thin-provisioned. This combination has worked fine since 2008. I expanded the NFS volume on Server2 from 200 GB to 500 GB. I unmounted and remounted it afterward, and it shows 499 GB.
The Problem:
1. When Server2 backs up to the NFS volume on Server1, the clones go fine. However, when Server1 backs up to Server2, the largest VM (55 GB thin, 80 GB declared) fails at the 90% point on every attempt, even if I clear space. It shouldn't need more space, as the drive only holds about 51 GB of data on a 500 GB NFS volume, and the ESXi VMware Client shows the same amount of free space no matter which server I check it from.
Destination disk format: VMFS thin-provisioned
Cloning disk '/vmfs/volumes/datastore1/my_vm/my_vm.vmdk'...
Clone: 90% done.Failed to clone disk: There is not enough space on the file system for the selected operation (13).
All of the other VMs after it back up fine. Only the larger VM fails, yet I can back it up fine to a local directory.
2. When I check the size of the virtual hard drive that holds the NFS volume, it shows a provisioned size of 500 GB with an on-disk size of 398 GB. That is odd, because the drive was only 200 GB before it was expanded (which I did because I couldn't back up the larger VM anymore), and I never tried to copy anywhere near enough data to it to inflate it like that. It also only shows 51 GB in use, which is too small, even compressed.
Thoughts: Windows sees the larger volume and ESXi sees the larger volume, but the usable space must not be any larger. I could just remove the NFS sharing from the volume and re-add it, but the on-disk size of the thin vmdk shouldn't be anywhere near that large, and the contents should be more than 51 GB, which made me wonder if there was something fishy with the .vmdk. That turned out to be exactly the case.
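The provisioned versus on-disk comparison can also be made from the ESXi shell rather than the VMware Client. The following is a minimal sketch, assuming a POSIX shell with ls, du and awk available; the function name vmdk_usage is made up for illustration, and for a thin disk you would point it at the -flat.vmdk extent rather than the small descriptor file.

```shell
#!/bin/sh
# Sketch: compare a file's apparent (provisioned) size with the space
# actually allocated to it. vmdk_usage is a hypothetical helper name.
vmdk_usage() {
    file=$1
    # Apparent size in bytes, taken from the directory entry (field 5 of ls -ln).
    apparent=$(ls -ln "$file" | awk '{print $5}')
    # Allocated size in KiB; du counts only the blocks actually in use.
    allocated_kib=$(du -k "$file" | awk '{print $1}')
    echo "apparent_bytes=$apparent allocated_kib=$allocated_kib"
}
```

Usage would look like `vmdk_usage /vmfs/volumes/datastore1/nas-2/nas-2-flat.vmdk`, where the path is an assumption about the layout, not something taken from the servers above. An allocated figure far above what the guest reports as used is the kind of mismatch described here.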
Solution:
- Remove the NFS volume as a data source on the ESXi servers (if you skip this, you may have to reboot one or more ESXi servers later)
- Inside Windows 2003 Server, remove NFS sharing on the volume (if you skip this step, it confuses Windows, and ESXi won't be able to connect)
- In the VMware Client settings for the Windows 2003 VM, remove virtual hard drive 2, and exit settings (behaves like unplugging a USB drive)
- Re-enter the VMware Client settings and make another larger virtual hard disk, and exit settings.
- Re-enter the VMware Client settings and remove the virtual hard drive I just made. (this leaves you with a virtual hard drive to copy)
- Log in to the ESXi server over SSH and rename the nas-2 directory to nas-2.old
- Make a new nas-2 directory
- vmkfstools -i admin2_1.vmdk -d thin ./nas-2/nas-2.vmdk
- Delete admin2_1.vmdk (if you skip this step, ESXi will see it, and add virtual hard disk 3 instead of virtual hard disk 2)
- In the VMware Client settings for the Windows 2003 VM, add the new nas-2.vmdk virtual hard drive (shows provisioned 500 GB, space on disk 0)
- In 2003 Server, initialize new disk, do a quick format, with compression, set volume name, and drive letter. Share as nas-2, and add NFS sharing.
- Re-add the new nas-2 as a data source on the ESXi servers
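The SSH portion of the steps above (rename, mkdir, clone, delete) could be scripted roughly as follows. This is a sketch, not the exact commands that were typed: the helper names run and rebuild_nas2 and the working-directory argument are assumptions, and it uses `vmkfstools -U` to delete the temporary disk so that both the descriptor and its extent are removed, where the steps above just say "delete". By default it only prints what it would do.

```shell
#!/bin/sh
# Dry-run sketch of the SSH steps. DRY_RUN=1 (the default) only prints commands.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# $1: directory that holds nas-2 and admin2_1.vmdk (an assumption; adjust to your layout).
rebuild_nas2() {
    workdir=$1
    run cd "$workdir" &&
    run mv nas-2 nas-2.old &&
    run mkdir nas-2 &&
    # Clone the freshly created empty disk into the new directory as a thin disk.
    run vmkfstools -i admin2_1.vmdk -d thin ./nas-2/nas-2.vmdk &&
    # Remove the temporary disk so ESXi does not offer it as another hard disk.
    run vmkfstools -U admin2_1.vmdk
}
```

Run `rebuild_nas2 <directory>` once to review the printed commands, then set `DRY_RUN=0` to execute them. The `&&` chain stops at the first failing step, so a failed clone never reaches the delete.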
Lessons learned:
- The best way to expand a virtual hard drive in ESXi is to start with a new one.
- If the size of the vmdk doesn't make sense, there is something fishy with the vmdk.
- If vmkfstools says there is a problem, there is a problem, even if everything else says it's OK.
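One way to act on the last lesson in a nightly backup script is to check the result of every clone instead of trusting the free-space figures. The sketch below assumes vmkfstools returns a non-zero exit status when a clone fails, which should be verified on your build; try_clone and the failed counter are illustrative names, not part of the original backup scripts.

```shell
#!/bin/sh
# Sketch: run each backup command and record failures by exit status.
failed=0

# $1: label for the log; remaining arguments: the command to run.
try_clone() {
    label=$1
    shift
    if "$@"; then
        echo "OK: $label"
    else
        status=$?
        echo "FAILED: $label (exit status $status)"
        failed=$((failed + 1))
    fi
}
```

A call might look like `try_clone my_vm vmkfstools -i /vmfs/volumes/datastore1/my_vm/my_vm.vmdk -d thin "$DEST/my_vm.vmdk"`, with `$DEST` standing in for the NFS destination directory. Checking `$failed` at the end of the run gives a single place to raise an alert.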