More...
The problem remains. I use the latest GhettoVCB to do thin-clone backups to a compressed volume on a Windows 2003 Server, exported as an NFS share. GhettoVCB deletes the oldest backups, so there is no reason for the volume to run out of space. Indeed, both Windows and ESXi agree on the amount of free space, and it is many times what the backup requires, even if the backups were not thin. This is what I've observed:
1. After some number of cycles, the same VM, which is also the largest, will no longer back up. The error from vmkfstools is that it is out of disk space, even though the volume shows almost a terabyte free. I assume the problem hits this VM because it is larger than all the rest put together (80 GB declared, 55 GB thin). I have others declared at 80 GB as well, but they are much smaller when thin. The backup gets to 90%+ complete before it fails.
2. Once I run into the problem, no matter how many backups I delete, and no matter how much free space the NFS volume shows, I can no longer back up the VMs. The error returned by vmkfstools is that there is not enough disk space, even when there is almost a terabyte free.
3. If I run chkdsk, no switches, from within Windows (read only), it shows no errors.
4. If I run chkdsk /f, and reboot, it finds and fixes errors. However, if I repeat the process, I will get the same errors on the same files no matter how many times I do it. This behavior is consistent for the system drive as well, which is not exported NFS and has no thin backups on it, and is not compressed. I also get the same behavior on my other 2003 Server VM. It seems chkdsk doesn't work properly inside of a VM.
5. The last thing chkdsk /f does is recalculate the free space. After that, I can back up again.
6. New information: If I run chkdsk /f and take the option to dismount the volume instead of running it at boot time, I do not get the errors, although I'm not yet certain whether it returns the space as the boot-time version does, since it doesn't end by recalculating free space. Afterward I ran chkdsk /f again and had it perform the tests during boot. This time it behaved completely differently and came back with NO ERRORS. That's a first.
Thoughts:
- The fact that Windows and ESXi agree on the amount of free space does not prove much, because NFS is not a file system. It is simply a protocol, and ESXi gets its information from Windows through it.
- It seems likely that vmkfstools actually does run out of disk space, even though Windows shows there is far more than enough.
- Since the problem is related to the number of cycles, I would theorize that the space is not being returned for use when a backup is deleted. This is supported by the fact that after chkdsk /f, I can back up again.
- I've been using this method of backing up since 2008 with no problems. The things that have changed recently are the move from ESXi 4.0 to 5.1 and the update to the latest GhettoVCB script. Since GhettoVCB simply uses VMware commands, the chances of it being script related seem remote.
- Taken together with the fact that chkdsk /f does not work properly in the Windows VMs, there appears to be a compatibility issue between Windows and VMware.
- Since I performed chkdsk /f, a new cycle has started, and it has not yet reached the point where it recycles backup space. As soon as the problem surfaces again, I plan to use the article "How to locate and correct disk space problems on NTFS volumes", pointed to by Microsoft, to attempt to determine the cause and possibly implement a cure.
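One way to test the theory that the reported free space and the usable free space disagree is to probe the share directly. Below is a minimal sketch, assuming a client with Python 3.3 or later that mounts the same NFS export. The function name probe_free_space and the mount path in the usage note are made up for illustration and are not part of GhettoVCB or any VMware tool. It writes random data because the volume is compressed and zeros would compress away to almost nothing.

```python
import errno
import os
import shutil
import tempfile


def probe_free_space(directory, probe_bytes, chunk_bytes=1024 * 1024):
    """Compare the free space a mount reports with what can actually be written.

    Hypothetical helper: writes probe_bytes of random data into a temporary
    file under directory and reports whether the write hit "no space left".
    """
    reported_free = shutil.disk_usage(directory).free
    written = 0  # approximate on failure, since writes are buffered
    write_ok = True
    # Create the probe file inside the directory under test so the write
    # lands on that volume and nowhere else.
    fd, probe_path = tempfile.mkstemp(prefix="space_probe_", dir=directory)
    try:
        with os.fdopen(fd, "wb") as probe:
            while written < probe_bytes:
                size = min(chunk_bytes, probe_bytes - written)
                # Random data does not compress, so a compressed volume
                # has to allocate real space for it.
                probe.write(os.urandom(size))
                written += size
            probe.flush()
            os.fsync(probe.fileno())
    except OSError as exc:
        if exc.errno != errno.ENOSPC:
            raise
        write_ok = False
    finally:
        # Always clean up so the probe does not consume backup space.
        os.remove(probe_path)
    return {
        "reported_free": reported_free,
        "written": written,
        "write_ok": write_ok,
    }
```

For example, probe_free_space("/mnt/backups", 60 * 1024**3) would try to write roughly the size of the large thin VM to a hypothetical mount point. If write_ok comes back False while reported_free is far larger than the probe, that would point at the server side rather than at vmkfstools.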
Parting Question:
Has anybody else had this problem, and if you found a way around it, what was it?
Thanks!