I have a weird problem here: I have a Sun v40z with ESX 4.0.0 build-504850 installed and the network cards stopped working. At first I thought it was a hardware issue but now I'm not so sure about this any more. When booting normally into ESX, on the service console I can see the following:
v40z1# mii-tool -v vmnic1
vmnic1: no link
product info: vendor 00:08:18, model 22 rev 2
basic mode: autonegotiation enabled
basic status: no link
capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
advertising: flow-control
Notice that both interfaces are reported with "no link", as if there was no cable plugged in. But when I rebooted into "Troubleshooting mode" (this loads the kernel from /boot/trouble), both interfaces appear to have link:
vmnic0: negotiated 100baseTx-HD, link ok
product info: vendor 00:08:18, model 22 rev 2
basic mode: autonegotiation enabled
basic status: autonegotiation complete, link ok
capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
advertising: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
link partner: 100baseTx-HD 10baseT-HD
vmnic1: negotiated 100baseTx-HD, link ok
product info: vendor 00:08:18, model 22 rev 2
basic mode: autonegotiation enabled
basic status: autonegotiation complete, link ok
capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
advertising: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
link partner: 100baseTx-HD 10baseT-HD
These interfaces are the onboard Broadcom NICs:
v40z1# lspci | grep Ether
02:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703 Gigabit Ethernet (rev 02)
02:03.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703 Gigabit Ethernet (rev 02)
24:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
However, in "Troubleshooting mode", no kernel modules can be loaded, because the installation got upgraded at some point in the past to kernel 2.6.18-238.ESX - but /boot/trouble/vmlinuz loads a 2.6.18-128.ESX kernel. And somehow all the kernel-modules for 2.6.18-128.ESX were nuked from /lib/modules. So, in short: while I have link in "Troubleshooting mode" I have no kernel modules to actually bring the network up. And in "normal mode" I have everything in place, except link on the NICs. Oh, what a cruel world.
An exact same v40z standing next to it is running fine with these NICs and the same Vmware ESX version. And the machine in question has been running fine in the past too. So yeah, a v40z is kinda old, so it might be a hardware issue after all. And the installed VMware version might not be the latest, but it was running fine for a year or so.
Any thoughts on this? In particular: does anybody know how I could obtain a 2.6.18-238.ESX Troubleshooting-kernel? Or, better yet: are there any magic boot options that would turn my 2.6.18-238.ESX kernel into a troubleshooting kernel?
Thanks,
Christian.