The last and first week of the year I normally reserve for some jobs around the house, some time with wife and kid and some studying.
This time the VCP5 exam was the goal. I combined this with updating the vSphere 4 administration and advanced administration courses I give for Ictivity Training. This way I practice the new vSphere 5 features and bring the courses up to date.
Although I have given a vSphere 5 administration course (covering only the features that were already available in vSphere 4), there were quite a few features I hadn’t had time to play with, except in the VMware vSphere: What’s New ESXi 5.0 and vCenter Server 5.0 course. These include Storage Profiles, Auto Deploy and more.
As I follow @esloof on Twitter, I read on his blog that the VCP5 exam is different from the VCP4 exam. For the previous VCP exam you had to know the configuration maximums by heart! There were quite a few questions like “What is the maximum size of a VMFS volume?”. This is no longer the case. The exam is based more on general ESXi 5.0 knowledge, which I think is a good thing.
The material I used for my preparation:
- VMware Education (by @esloof) VMware vSphere: What’s new ESXi 5.0 and vCenter Server 5.0
- Scott Lowe (@scott_lowe) Mastering VMware vSphere 5
- Duncan Epping (@DuncanYB) and Frank Denneman (@FrankDenneman) vSphere 5 Clustering Technical Deepdive
- And a lot of testing 🙂 in my home-lab
Happily all the work paid off and I passed my exam with a score of 411, where 300 is the minimum passing score.
As I’m VCAP4-DCD and VCAP4-DCA certified, I will go for VCAP5 when it’s available.
After that I may go for VCDX, but I’m wondering whether this certification would bring me more interesting work here in the Netherlands. There aren’t many corporations in the Netherlands that need a VCDX for their environment.
So we will see. For now I’m glad I can call myself VCP5.
While preparing for my VCP5 exam I was reading a lot of material including the upgrade guide.
As with all upgrades, not all upgrade paths are supported. That’s why I decided to make a list of upgrade considerations.
- vSphere5 only comes with the ESXi hypervisor architecture. The main distinction is that ESX comes with a Service Console (also called SC or COS from Console Operating System) and ESXi only has a tech support mode (busybox implementation). If you want to perform command-line administration, you can use vCLI or PowerCLI.
- You can upgrade from ESX 3.5/4.x and ESXi 4.x installations to ESXi 5.0 preserving your VMFS partitions.
- New installation and boot device options are:
- Booting through Unified Extensible Firmware Interface (UEFI), from devices such as USB
- Disks larger than 2TB if the system firmware supports it.
- A minimum of 2098MB of RAM is required.
- No graphical installer is available because this requires a Service Console.
- The (text-based) installer can be used for new installations or upgrades
- New partitions use the GUID partition table (GPT) instead of the MBR. GPT supports partition sizes larger than 2TB.
- New installations create a 4GB scratch partition. Any remaining disk space is formatted as a VMFS datastore.
- Rolling back to a previous version of ESX/ESXi is not supported.
- When using a kickstart script (ks.cfg) you can press Shift-O when the ESXi installer screen appears to edit the boot options and provide the kickstart script (example: ks=nfs://192.168.1.10/vSphere-install/esxi5.cfg nameserver=192.168.1.10 ip=192.168.1.1 netmask=255.255.255.0 gateway=192.168.1.254)
- The default database for vCenter 5 is Microsoft SQL Server 2008 R2 Express. This is bundled on the vCenter DVD. Requirements are:
- Microsoft Windows Installer version 4.5 (MSI 4.5)
- 4GB RAM
- 4GB Disk Storage
- 64-bit operating system
- An in-place upgrade on Windows XP is not supported
- When doing an in-place upgrade of your vCenter server, your vCenter server can be down for 40 to 50 minutes. During this downtime, DRS will not function. HA will.
- An in-place upgrade on a 32-bit operating system is not supported. You have to perform a migration to a 64-bit operating system.
- There is also a vCenter appliance available. This appliance is based on SUSE Linux Enterprise Server 11 (SLES11).
- Some configuration files are not migrated when performing an upgrade:
- /etc/sysconfig/mouse
- /etc/sudoers
- /etc/yp.conf
- Custom scripts that are added to /etc/rc.d
- Configuration files that are migrated are:
- /etc/vmware/esx.conf
- /etc/ntp.conf, ntp.drift, ntp.keys
- /etc/krb.*, /etc/krb5.*
- /etc/hosts, /etc/resolv.conf
- /etc/pam.d/*
- /etc/vmware/vmkiscsid/*
- Configuration files that are partially migrated are:
- /etc/passwd, /etc/shadow (only the root and vpxuser accounts)
- When performing a new vSphere 5 installation, the default partition table that is used will be GPT. When performing an in-place upgrade, the MBR partition format will be kept.
- When migrating from ESX, the Service Console network interface cards (NICs) are converted to VMkernel NICs and the Service Console port group is removed.
- Rule set files and customized firewall rules are not preserved.
- You cannot perform an in-place upgrade from ESX 4.x to ESXi 5.0 when the ESX 4.x host was itself upgraded from ESX 3.x.
- When performing an in-place upgrade from ESX 4.x, the /boot partition must have more than 350MB of free space. If the host that you are upgrading does not have more than 350MB of free space in the /boot partition, use a scripted or interactive upgrade instead.
- You can preserve your (local) VMFS datastores when upgrading. Afterward you can upgrade your VMFS3 datastore to VMFS5.
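Two of the considerations above (the ESX 3.x upgrade history and the free space in /boot) are easy to check before you start. The following is only a minimal sketch of such a check, not a VMware tool: the `EsxHost` record and the function name are hypothetical, the values have to be collected by hand, and the 350MB figure is simply taken from the list above.

```python
from dataclasses import dataclass

# Free space figure taken from the list above (assumption: "more than 350MB").
BOOT_FREE_MB_REQUIRED = 350


@dataclass
class EsxHost:
    """Hypothetical record describing an ESX 4.x host you plan to upgrade."""
    name: str
    upgraded_from_3x: bool   # was this ESX 4.x host itself upgraded from ESX 3.x?
    boot_free_mb: int        # free space in the /boot partition, in MB


def in_place_upgrade_blockers(host):
    """Return the reasons why an in-place upgrade to ESXi 5.0 is not possible."""
    blockers = []
    if host.upgraded_from_3x:
        blockers.append("host was upgraded from ESX 3.x")
    if host.boot_free_mb <= BOOT_FREE_MB_REQUIRED:
        blockers.append("/boot does not have more than 350MB of free space")
    return blockers


if __name__ == "__main__":
    host = EsxHost(name="esx01", upgraded_from_3x=False, boot_free_mb=200)
    for reason in in_place_upgrade_blockers(host):
        print(f"{host.name}: {reason}")
```

An empty list means neither of these two rules blocks the in-place upgrade; the other considerations in the list still apply.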
“The server has poor performance because the server is running virtual.”
“Virtual servers are always slower than physical servers.”
“My vSphere server has 60% CPU utilization but the performance of my Virtual Server is poor.”
When you are a VMware administrator you constantly have to defend the position that it doesn’t matter whether a server is virtualized or running on physical hardware. But how can you troubleshoot these issues?
In this Performance Troubleshooting series I’m going to explain how you can troubleshoot performance problems in your virtualized environment.
This series will contain the following chapters:
- CPU
- Memory
- Storage
- Network
- Virtual Machine
If you think I left something out, let me know!
Before you dive into the different chapters, first let me explain how esxtop works.
Running esxtop
You can run esxtop in two different ways: directly on the ESXi console (the so-called busybox), or remotely through the remote CLI command called resxtop.
If you want to run esxtop on the ESXi console, log in as root and start esxtop by typing the command esxtop followed by Enter.
esxtop
If you want to use resxtop, start resxtop on an OS where the remote CLI commands are installed, with the following parameters:
resxtop --server [DNS name or IP address of ESXi host] --username [username, probably root] --password [password of the user]
The options --username and --password are not required. If you don’t provide them as parameters, you will be prompted for them.
Alter esxtop view
Once esxtop is started (it doesn’t matter whether this is remote or local) you can alter your view.
Option | Result |
V (capital) | The esxtop screen will only show virtual machines. If V is pressed again, all other worlds are displayed again. |
f | Alters the field list. This enables you to create a custom view. |
s | Changes the refresh time of the screen. The default is 5 seconds. |
h | Help screen |
c | CPU view |
m | Memory view |
n | Network view |
i | Interrupt view |
d | Disk Adapter view |
u | Disk Devices view |
v | Disk VM view |
p | Power State view |
# | Limits the number of rows displayed in esxtop |
w | Writes the altered configuration. If you just press Enter after pressing w, the default configuration is overwritten. If you provide a path and file name, you can use this configuration the next time you start esxtop by passing the option -c with the path of the configuration file. Example: esxtop -c ~/myesxtopconf.cfg |
Running esxtop in batch mode
It’s also possible to start esxtop in batch mode with the option -b. The results are written to standard output in CSV format, so you can redirect them to a CSV file. With the option -d you can specify the delay between refreshes, and the option -n specifies the number of iterations. Example:
esxtop -c ~/myesxtopconf.cfg -b -d 5 -n 10 > ~/esxtop-output.csv
In this example esxtop is started with a custom configuration file called myesxtopconf.cfg, in batch mode, with a delay of 5 seconds and 10 iterations.
With resxtop you have to provide the IP or DNS name of the ESXi host.
The CSV file can be imported in Microsoft Perfmon for example. Note that if you have a large CSV file the import can take a very long time.
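If Perfmon is too slow for a large file, you can also process the CSV yourself. The following is a minimal sketch in Python, assuming the first row is a header and that the column names contain the counter name (for example `% Ready`). The host name `esx01`, the VM name `vm1` and all values in `SAMPLE` are made up for illustration; check the header of your own file for the exact column names.

```python
import csv
import io


def column_averages(csv_file, name_filter):
    """Average every column whose header contains name_filter."""
    reader = csv.reader(csv_file)
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if name_filter in name]
    totals = {i: 0.0 for i in wanted}
    rows = 0
    for row in reader:
        for i in wanted:
            totals[i] += float(row[i])
        rows += 1
    if rows == 0:
        return {}
    return {header[i]: totals[i] / rows for i in wanted}


# Made-up sample in the assumed layout: one timestamp column, then counters.
SAMPLE = r'''"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Group Cpu(100:vm1)\% Ready","\\esx01\Group Cpu(100:vm1)\% Used"
"01/02/2012 13:06:05","4.0","20.0"
"01/02/2012 13:06:10","8.0","30.0"
'''

if __name__ == "__main__":
    for column, average in column_averages(io.StringIO(SAMPLE), "% Ready").items():
        print(f"{column}: {average:.1f}")
```

For a real capture, open the file with `open(path, newline="")` and pass the file object instead of the `io.StringIO` sample.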
Even though vSphere is telling you that the overall CPU utilization is no more than 60%, this doesn’t mean that your VM isn’t running low on CPU resources.
In this part we’re going to take a deep dive into troubleshooting CPU performance issues in your vSphere environment. Although I’m using vSphere 5.0 for my screenshots, most of the options used also apply to vSphere 3 and 4.
The key tool for CPU performance troubleshooting is esxtop or, if you’re running it remotely, resxtop.
In my examples I’m using esxtop directly on the vSphere ESXi console, but all of these commands also work with resxtop. You just have to provide an extra parameter called --server to specify which server you want to run the command against. Of course you have to provide a username and password to get access to that server.
If we start esxtop on our vSphere ESXi host we will get the following screen:
First let me explain what we see here.
The first three lines:
1:06:05pm | The current time of your ESXi server. Notice that the time is in UTC. |
up 18 days 3:35 | How long your ESXi server has been up. |
307 worlds | A world is a process that’s running in your VMkernel |
5 VMs | The number of VMs running on your ESXi host |
12 vCPUs | The number of vCPUs provided to VMs |
CPU load average | The average CPU load over the last 1, 5 and 15 minutes. A load average of 1.00 means the physical CPUs are fully utilized; if the load average stays above 1.00, your system does not have enough CPU resources. |
PCPU USED(%) | Real-time CPU usage per CPU core, as a percentage. As you can see, my system is an 8-core system (2 quad-core CPUs). AVG is the average of all pCPU cores: (4.3 + 2.4 + 0.0 + 0.3 + 5.1 + 1.7 + 2.7 + 3.3) / 8 ≈ 2.5 |
PCPU UTIL(%) | Real-time CPU utilization per CPU core, as a percentage. AVG is the average of all pCPU cores: (4.4 + 7.2 + 100 + 5.2 + 5.9 + 2.3 + 2.4 + 4.4) / 8 ≈ 16 |
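The AVG values are plain arithmetic means over the cores. As a quick check, here is the same calculation in Python with the per-core values from the table above (the function name is just for illustration):

```python
def pcpu_average(per_core):
    """Arithmetic mean of the per-core percentages, as shown in the AVG field."""
    return sum(per_core) / len(per_core)


# Per-core values copied from the PCPU USED(%) and PCPU UTIL(%) rows above.
used = [4.3, 2.4, 0.0, 0.3, 5.1, 1.7, 2.7, 3.3]
util = [4.4, 7.2, 100.0, 5.2, 5.9, 2.3, 2.4, 4.4]

if __name__ == "__main__":
    print(f"PCPU USED AVG: {pcpu_average(used):.3f}")
    print(f"PCPU UTIL AVG: {pcpu_average(util):.3f}")
```

Note how the one core at 100% pulls the UTIL average up to roughly 16, even though every other core is nearly idle.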
After the first three lines you will see a table like the following:
So let me explain what we see there:
Field | Description | Note |
ID | The resource world ID. A world is an ESXi VMkernel schedulable entity, similar to a process or thread in other operating systems. | |
GID | The resource group world ID. A group contains one or more worlds. If you press e in esxtop and enter the number of the GID, the group expands into the individual worlds it contains. Every VM consists of at least 4 worlds. | |
NAME | The name of the world or world resource pool. | |
NWLD | The number of worlds in the world resource pool. | |
%USED | The percentage of physical CPU core cycles used by the resource pool/world. | %USED = #vCPU * 100% means the VM is using all the CPU cycles it can get, i.e. the VM is running at 100%. |
%RUN | The percentage of time scheduled. This value can be twice as large as %USED. | %RUN > %USED: the pCPU is not running at its rated clock frequency, probably due to power saving. |
%SYS | The percentage of time spent in the ESXi VMkernel on behalf of the resource pool/world to process interrupts and to perform other system activities. | If higher than 25, the VM is a high I/O VM. If you are aware of this, OK. If not, check other statistics. |
%WAIT | | |
%VMWAIT | | = %WAIT - %IDLE |
%RDY | The percentage of time the resource pool/world was ready to run. | >20% indicates that the number of pCPU cores is too low. |
%IDLE | The percentage of time the resource pool/world was idle. | |
%OVRLP | The percentage of system time that was spent on behalf of some other resource pool/world while this resource pool/world was scheduled. | |
%CSTP | The percentage of time the resource pool/world spent in the ready, co-deschedule state. | >5%: this occurs when a VM has multiple vCPUs and one vCPU has to wait for another vCPU to catch up. |
%MLMTD | The percentage of time the ESX VMkernel deliberately did not run the resource pool/world because that would violate the resource pool/world’s limit setting. | |
%SWPWT | | |
This picture (which I borrowed from a VMworld presentation) explains the overall relationship between the different variables.
OK, now we know what the different variables stand for and how they relate to each other. The next question is: which variables do I have to monitor, and what are their thresholds?
Variable | Threshold | Resolution |
%RDY | >10% | If higher than 10% for a long time, add more CPU cores to your vSphere host |
%CSTP | >5% | This only occurs in a VM with more than 1 vCPU. Add more pCPUs to the host or decrease the number of vCPUs in the VM |
%MLMTD | >0% | If higher than 0%, the vCPU is throttled because of CPU limits |
%SYS | >20% | If higher than 20%, the VM is likely a high I/O VM. Check the guest OS for problems |
%RUN | >%USED | The pCPU is not running at its rated clock frequency, probably due to power saving. |
So that’s what you need to know about monitoring your vSphere ESXi host with esxtop.
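The threshold table above translates directly into a simple check. The following is only a sketch, assuming you have already collected one sample of these counters for a VM (by reading them off the screen or from a batch mode CSV). The dictionary layout and the function name are my own, not something esxtop provides.

```python
# Thresholds copied from the table above, in percent.
THRESHOLDS = {
    "%RDY": 10.0,
    "%CSTP": 5.0,
    "%MLMTD": 0.0,
    "%SYS": 20.0,
}


def cpu_warnings(sample):
    """Return a warning for every counter in the sample that exceeds its threshold."""
    warnings = []
    for counter, limit in THRESHOLDS.items():
        value = sample.get(counter)
        if value is not None and value > limit:
            warnings.append(f"{counter} is {value:g}% (threshold >{limit:g}%)")
    run = sample.get("%RUN")
    used = sample.get("%USED")
    if run is not None and used is not None and run > used:
        warnings.append("%RUN is higher than %USED (pCPU below rated frequency?)")
    return warnings


if __name__ == "__main__":
    # Made-up values for a single VM.
    sample = {"%RDY": 12.5, "%CSTP": 0.0, "%MLMTD": 0.0,
              "%SYS": 3.0, "%RUN": 40.0, "%USED": 45.0}
    for warning in cpu_warnings(sample):
        print(warning)
```

A single sample above a threshold is not a problem in itself; as the table says, it becomes interesting when the value stays high for a longer time.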
Just a quick post about something I noticed while configuring a new vSphere 5 host with a Broadcom iSCSI adapter. This is not a real iSCSI HBA but a vmnic with TCP/IP offloading. This offloading feature benefits iSCSI performance because the VMkernel doesn’t have to do as much work as with a normal vmnic without the offloading feature.
The Broadcom vmnic will show up as a vmhba in the configuration screen of your vSphere5 host.
With the software iSCSI client provided by VMware, the IQN of the adapter is created when you enable the adapter. This IQN follows a naming convention like iqn.1998-01.com.vmware:hostname:string, where hostname is the name of your vSphere host and string is a random value. With the Broadcom card, the hostname is where the pain is. As the Broadcom card is enabled by default, the hostname is detected as localhost (as seen in the screenshot).
Of course this is not what we want. If you have 32 vSphere hosts all showing up in your iSCSI storage as localhost, it’s no longer easy to see which IQN belongs to which vSphere host. Of course the string behind the hostname will be unique and you can document the IQN, but it’s easier, and thus more manageable, if the IQN contains the real hostname of the server that you configured during installation.
As of now, the only way I have found to change the hostname in the IQN of your vmhba is to go to the properties of your vmhba, click Configure on the General tab and change the IQN in the iSCSI Name field.
After doing so you have to do a rescan of the vmhba in order to update the information in your iSCSI target. If you have already used the previous IQN containing localhost as the hostname, you have to set up your LUN masking in your iSCSI target again.
Note: I have tested this behavior with Broadcom NICs. I don’t know if the behavior is similar on other NICs that have offloading features.
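When you have to fix this on many hosts, it helps to generate the new names up front so you can paste them into the iSCSI Name field and into your storage documentation. This is a minimal sketch that only does the string manipulation, assuming the iqn.1998-01.com.vmware:hostname:string layout described above; the function name and the example values are made up, and it does not talk to the host in any way.

```python
def replace_iqn_hostname(iqn, hostname):
    """Swap the hostname part of an IQN shaped like prefix:hostname:string."""
    parts = iqn.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected IQN layout: {iqn}")
    prefix, _old_hostname, suffix = parts
    return f"{prefix}:{hostname}:{suffix}"


if __name__ == "__main__":
    # Made-up example: the random string is kept, only localhost is replaced.
    old_iqn = "iqn.1998-01.com.vmware:localhost:1a2b3c4d"
    print(replace_iqn_hostname(old_iqn, "esx01"))
```

Keeping the random string means the new IQN stays unique even if two hosts accidentally get the same hostname.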