Reboot or shut down vSAN node: Take your time!

While working at a customer’s site, I noticed that a system administrator wanted to reboot a vSAN node. Shutting down the node took too long (30+ minutes) for his liking, so he used iDRAC to hard reset the host.
I asked him why he didn’t just wait until the host had shut down completely.
He answered that, in his view, it didn’t matter, because ESXi was installed on an SD card. Before hard resetting a host, he makes sure it is in maintenance mode, so no virtual machines are running on it and the vSAN data is safe.
The system administrator noted that the SD card is only used at boot time.
After ESXi has booted, you can pull out the SD card and ESXi will continue to operate as normal. So in his opinion, he could safely hard reset ESXi, as long as he ensured that no virtual machines were running on that host.

This statement is only partially true, but before I dive into it, here is some background information.
[expand title=”Read more…”]
During ESXi boot, a RAMdisk is created. This RAMdisk is used to store, among other things, the VMkernel, its modules and the scratch partition. The scratch partition is used for storing log files.
To ensure that the log files are still available after a reboot, they are copied to persistent storage (in this case an SD card) during shutdown. During boot, the same log files are copied from persistent storage (the SD card) back to the scratch partition on the RAMdisk.
So, the statement that the SD card isn’t used once the VMkernel is up and running is not completely true. The SD card is also used during a shutdown of ESXi!

In this case, the customer was using VMware vSAN. vSAN generates trace log files that are stored in the scratch partition.
Although the size of these log files is limited to 180 MB by default, copying them can take up to 30 minutes on most SD cards.
As the screenshot shows, vmhba32 is 100% utilized, yet it is copying data at only 0.02 MB per second with an average latency of 66.25 ms.
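To get a feel for these numbers, here is a small back-of-the-envelope sketch in Python. It only divides the amount of data by a sustained write rate. The 180 MB limit and the 0.02 MB/s reading come from this post; the other rates are made-up sample values, and the function names are my own. Keep in mind that the 0.02 MB/s figure is a single reading, so the average rate over a whole shutdown may well be different.

```python
def copy_time_minutes(size_mb: float, throughput_mb_per_s: float) -> float:
    """Minutes needed to copy size_mb at a sustained throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_mb / throughput_mb_per_s / 60


def max_size_mb(budget_minutes: float, throughput_mb_per_s: float) -> float:
    """Largest amount of data (MB) that fits in a given time budget."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return budget_minutes * 60 * throughput_mb_per_s


if __name__ == "__main__":
    trace_size_mb = 180  # default vSAN trace limit mentioned in this post

    # 0.02 MB/s is the reading from the post; the other rates are
    # hypothetical sample values for comparison only.
    for rate in (0.02, 0.1, 1.0, 10.0):
        minutes = copy_time_minutes(trace_size_mb, rate)
        print(f"{rate:>5} MB/s -> {minutes:7.1f} min")

    # How much trace data fits in a 5-minute shutdown at 0.1 MB/s?
    print(f"{max_size_mb(5, 0.1):.0f} MB")
```

The second function turns the question around: given the shutdown time you are willing to accept, it estimates how much log data the boot device can absorb in that window.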

So, what are your options to avoid long start-up or shutdown times?
1. Limit the size of your vSAN trace files. Keep in mind that you may miss important information when troubleshooting!
2. Store your log files on a syslog server or a central VMFS datastore.
3. Don’t install ESXi on an SD card when using VMware vSAN. A SATADOM is not that expensive and is much faster than an SD card. Of course, you can also use a local (SSD) disk.

Final note: When shutting down your VMware vSAN node, take your time![/expand]

About Michael
Michael Wilmsen is an experienced VMware Architect with more than 20 years in the IT industry. His main focus is VMware vSphere, Horizon View and Hyper Converged, with a deep interest in performance and architecture. Michael is VCDX 210 certified, has held the vExpert title since 2011, and is a Nutanix Tech Champion and a Nutanix Platform Professional.
