May 23, 2017

Migrating a datacenter with PowerCLI – Introduction

For the last few months, I have been working on a project to migrate 1000 virtual machines from one datacentre to another. The two datacentres are 50 km apart. The building that houses the source datacentre will be stripped and rebuilt to current standards.
This means that every window, door, wall and all cables will be removed.
You can imagine that moving all the physical hardware (3 blade chassis, 42U of storage hardware and physical switches) and the virtual machines from one datacentre to the other is a huge operation. Because the virtual machines hosted in this datacentre are running production workloads, user impact should be minimal, and if user impact is expected, migrations should be done during maintenance windows.
Both datacentres have their own vCenter servers (6.0), multiple vSphere (6.0) clusters and storage (Fiber Channel), and are connected at L2 level. The vCenter servers are members of the same SSO domain.[expand title=”Read more…”]
Because of this, I can use long-distance vMotion to move the virtual machines from one datacentre to the other.
Of course, this can be done using the vSphere Web Client, but because of the number of virtual machines, we decided to write a PowerShell script that will do the job for us.
This script is scheduled to run during maintenance hours and will read a text file (batchXX.txt) to determine which virtual machines should be migrated.
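The batch file idea can be illustrated with a small sketch. The actual script is written in PowerShell/PowerCLI and is covered in the chapters of this series; the Python below only shows the parsing step. The file layout (one VM name per line, with blank lines and `#` comments ignored) and the function name `read_batch` are assumptions made for illustration.

```python
from pathlib import Path


def read_batch(path):
    """Return the VM names listed in a batch file (assumed: one name per line)."""
    names = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        # Assumption: blank lines and lines starting with '#' are not VM names.
        if name and not name.startswith("#"):
            names.append(name)
    return names
```

Calling `read_batch("batch01.txt")` would then return the list of names to hand over to the migration step.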

In this blog series, I will explain the PowerCLI script I created, step by step, and eventually, how the migration went.

Table of contents
Introduction
Chapter 1: Log function
Chapter 2: Notifications
Chapter 3: Reading VM attributes
Chapter 4: Storage space
Chapter 5: Move-vm
Chapter 6: Testing
[/expand]

May 5, 2017

Install VMware PSC fails: vdcsetupldu failed. Error [9234] – User invalid credentials

I was setting up a redundant VMware PSC deployment stretched across 2 datacenters. Every datacenter has 2 PSCs and a load balancer.

Eventually, the virtual machines that run the PSC services will run in a management cluster consisting of 3 nodes. These nodes are using Virtual SAN (VSAN) for storage.

I first installed 1 node with VMware vSphere and created a datastore on one of the SAS disks. Later on, the VMs will be moved to a VSAN datastore.

The first datacenter went as expected. No problems. But in the second datacenter, the installation of the PSC software failed with an error: Encountered an internal error
[expand title=”Read more…”]
PSC Error

Looking in the log file vmafd-firstboot-py-xxxx_stderr.log, I saw that vdcsetupldu failed with Error [9234] – User invalid credentials

PSC Error

I was sure that the password provided was OK. Diving deeper into the log files, I found that after installation, the VMware Identity services start and the installer tries to make an LDAP connection on TCP 389, which fails. I created a PowerShell script that checked TCP 389 every 5 seconds. I found out that eventually I could make a connection on TCP 389, but by then the installer had already given up.
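The original check was a PowerShell script that is not shown here. As a rough sketch of the same idea in Python, the code below tries a TCP connection to port 389 at a fixed interval and reports how long it took before the port answered. The function names, the timeout and the number of attempts are assumptions.

```python
import socket
import time


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port=389, interval=5.0, attempts=60):
    """Poll host:port every `interval` seconds.

    Returns the seconds elapsed until the port accepted a connection,
    or None if it never did within `attempts` tries.
    """
    start = time.monotonic()
    for _ in range(attempts):
        if port_open(host, port):
            return time.monotonic() - start
        time.sleep(interval)
    return None
```

Running `wait_for_port("psc01.example.local")` (a hypothetical host name) next to the installer shows whether LDAP comes up at all, and how long after the service start it does.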

Ok, so the services will start, but too late.

Looking at ESXTOP (the best troubleshooting program for ESXi), I saw that when the VMware Identity services started, the disk latency went up to 100ms. Could it be that the disk was slowing down the virtual machine so much that the installation failed? I moved the virtual machine to an SSD, restarted the installation, and guess what: install successful.

So the PSC installation program probably does not check whether the service is available before trying to log in. It then fails, saying that the credentials are not valid, rather than saying that it cannot make a connection.[/expand]

Jun 22, 2016

Nutanix .NEXT 2016 presentation: VCDX design for a 4000 seat Horizon View Deployment

Today I had the pleasure of presenting a session at Nutanix .NEXT 2016 in Las Vegas. My session “VCDX design for a 4000 seat Horizon View Deployment” was about my VCDX journey, the choices that I had to make, and the problems I ran into.


Although this wasn’t the first time I presented a session, it was the first time in English, and in front of a room where 90% of the crowd were native speakers. I’m very pleased with how it went. The room had a good vibe and I got good feedback.


I’ve always liked presenting: sharing your knowledge and opinion, and getting feedback from the crowd. It always gives me positive energy. This is definitely something I will be doing more in the future.

Although my presentation can probably be found on the .NEXT website, I promised to share it. If you want a copy, please contact me on Twitter: @WilmsenIT

Jun 17, 2016

Do IOPS matter?

I’ve been designing VMware vSphere clusters for the last 10 years now. And with every design, the storage part is one of the most challenging. An improper storage design results in poor virtual machine performance.

Over the years, storage vendors added all kinds of optimizations to their solutions, in the form of cache. Almost every vendor added flash as a caching tier. Some only cache reads, but most of them cache both reads and writes.

With this cache, most vendors claim that their solution can handle 100,000 IOPS or more. But we all know that adding a flash drive that can handle 100,000 IOPS won’t give you 100,000 IOPS in your vSphere environment.

We also have to deal with different block sizes, read/write ratio and write penalties.
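To make the effect of read/write ratio, write penalty and block size concrete, here is a small sketch using the common rule-of-thumb formula: front-end IOPS = raw IOPS / (read ratio + write ratio × write penalty). The penalty values are the commonly quoted ones per RAID level, not figures from a specific vendor, and the function names are made up for this example.

```python
# Commonly quoted rule-of-thumb write penalties:
# back-end I/Os needed for one front-end write.
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def frontend_iops(raw_iops, read_ratio, raid_level):
    """Estimate the IOPS a workload can get from `raw_iops` of back-end capacity.

    read_ratio is the fraction of reads (0.0 to 1.0); the rest are writes.
    """
    write_ratio = 1.0 - read_ratio
    penalty = WRITE_PENALTY[raid_level]
    return raw_iops / (read_ratio + write_ratio * penalty)


def throughput_mb_s(iops, block_size_kb):
    """Convert IOPS at a given block size (KB) to throughput in MB/s."""
    return iops * block_size_kb / 1024
```

With 70% reads on RAID 5, `frontend_iops(100000, 0.7, "raid5")` comes out at roughly 52,600 IOPS, about half of the advertised number, and that is before the storage processor and the network are taken into account.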

What I see is that most of the time, the storage processor or the storage area network is the bottleneck.

 

In the near future, NVMe will be common in our datacenters, and its successor 3D XPoint will follow shortly. While NVMe can deliver 1,800 MB/sec sequential read/write speeds and 150K/80K random IOPS, 3D XPoint is claimed to be up to 1000x faster. Both solutions have an average latency of less than 1ms.

This is going to change the way we design our storage solution!

@vcdxnz001 wrote a great article about the storage area network and the speeds that are involved with the upcoming flash devices.

 

So, to get back to the question: Do IOPS matter?

If we have an all-flash array, consisting of SSD, NVMe or 3D XPoint, IOPS are no longer the problem. All types of flash can deliver plenty of IOPS.

What matters is latency. OK, all flash devices are low latency (<1ms). But all IOPS have to be processed by the storage processor (in your array, or by your CPU if you’re running a hyper-converged solution) and by the storage area network. So these 2 components will determine your latency, and thus the performance of your virtual environment.

Therefore IOPS are no longer a concern when you’re designing your storage.

When you design your storage solution, determine the highest latency you want to encounter and monitor this carefully. Design your whole stack, from the HBA to the disk, for low latency. If your latency is low, IOPS won’t be a problem.
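The link between latency and IOPS follows from Little's law: the number of outstanding I/Os equals IOPS multiplied by latency. A minimal sketch of that relation is shown below; the function names are made up for this example, and the queue depth you plug in is an assumption about your own environment.

```python
def max_iops(outstanding_ios, latency_ms):
    """Little's law: IOPS achievable with a given number of outstanding I/Os
    when each I/O takes `latency_ms` milliseconds on average."""
    return outstanding_ios * 1000.0 / latency_ms


def latency_budget_ms(target_iops, outstanding_ios):
    """Highest average latency (ms) that still allows `target_iops`
    with `outstanding_ios` I/Os in flight."""
    return outstanding_ios * 1000.0 / target_iops
```

With 32 I/Os in flight, 1ms of latency allows 32,000 IOPS, while 10ms allows only 3,200. The flash device is the same in both cases; only the latency of the path differs.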

Feb 16, 2016

Determine your vSphere storage needs – Part 3: Availability, Security and Connectivity

 

This is the last part of the mini-series: Determine your vSphere storage needs.

In this part, we’re going to cover 3 subjects:

  • Availability
  • Security
  • Connectivity

Although important, these aren’t the parts where you have many options.

Availability

When we talk about availability in a storage solution, we’re actually talking about high availability. Most (enterprise) storage solutions are highly available by design. Almost all components are redundant, meaning that in case of a component failure, a second component takes over instantly. So this type of availability we don’t have to design. The part we may have to determine is when your storage solution is part of a Disaster Recovery (DR) solution. In this case we first have to determine the RPO and RTO for our environment. This is something you have to ask your business. How long may your storage (including all data) be unavailable (RTO), and when it’s unavailable, how much data may be lost in terms of seconds, minutes, hours or days (RPO)?

Let’s assume that our storage is located in 2 datacenters.

When you have an RPO of 0, you know you must have a storage solution that supports synchronous replication. This means that if you write in one datacenter, the write is instantly done in the second datacenter as well. The writer (VMware vSphere) gets an acknowledgement back only when both storage solutions have confirmed that the write is finished.

This option is the most expensive one. The connection between the 2 datacenters has to be low latency and high capacity (depending on your change rate).
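A back-of-the-envelope sketch of what "low latency and high capacity" means for the inter-datacenter link is given below. It assumes light travels through fibre at roughly 200,000 km/s (about 5 microseconds per km one way) and that the change rate is spread evenly over the hour. Both are simplifications: real links add equipment latency and longer cable routes, and real change rates have peaks.

```python
FIBRE_KM_PER_SECOND = 200_000.0  # assumption: approximate speed of light in fibre


def round_trip_ms(distance_km):
    """Minimum round-trip propagation delay (ms) over a fibre of this length.

    With synchronous replication, every write waits for at least one round trip.
    """
    return 2.0 * distance_km / FIBRE_KM_PER_SECOND * 1000.0


def replication_mbit_s(changed_gb_per_hour):
    """Average link bandwidth (Mbit/s) needed to keep up with the change rate.

    Uses decimal units (1 GB = 8000 Mbit) and ignores protocol overhead.
    """
    return changed_gb_per_hour * 8000.0 / 3600.0
```

For two datacenters 50 km apart, the propagation delay alone adds roughly 0.5ms to every synchronously replicated write, and a change rate of 450 GB per hour needs about 1 Gbit/s on average.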

In most cases synchronous replication also provides a low RTO. If and how the data becomes available in case of a DR depends on your storage solution. With active-active you probably won’t have to do much: in case of a DR, data is instantly available. With active-passive, you have to pull the trigger to make the data available on the passive side. This can be done manually (through a management interface), or automatically by a script or VMware SRM.

When you have an RPO greater than 0, you have the option to go for asynchronous replication. In this case the write in the second datacenter can be acknowledged later than the first one. You can also replicate data once an hour, day, week, etc. The choice is yours.

If and how the data becomes available in case of a DR is the same as in the active-passive option in the previous section (RPO=0).

Security

Most of the time, securing your storage solution means determining how the storage solution’s management interface can be accessed and which servers can access your data. Most of the time, the storage network is a dedicated, non-routed network which cannot be accessed from outside.

Normally, I advise a storage management server where the management software runs. If you make sure that this server is only accessible from your management workstations, your storage management is secure enough for most implementations.

How you control which servers can access your storage depends on the protocol you’re using. To sum it up:

  • FC -> Based on WWN
  • iSCSI -> Based on IQN
  • NFS -> Based on IP address.

The choice of your protocol also determines the way you can secure your storage. Talk to your storage vendor about best practices for securing your storage network.
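To show what these three identifiers look like, here is a small sketch that checks whether a string has the expected shape for each protocol. It is an illustration only: the patterns are simplified (for example, iSCSI also allows `eui.` names), the function name is made up, and the real enforcement is done by your fabric and array (zoning, LUN masking, export rules), not by a script.

```python
import ipaddress
import re

# Simplified patterns; assumptions for illustration, not a full validator.
WWN_RE = re.compile(r"^([0-9a-f]{2}:){7}[0-9a-f]{2}$", re.IGNORECASE)
IQN_RE = re.compile(r"^iqn\.\d{4}-\d{2}\.[a-z0-9][a-z0-9.\-]*(:.+)?$", re.IGNORECASE)


def is_valid_initiator(protocol, identifier):
    """Return True if `identifier` looks like a WWN (fc), IQN (iscsi) or IP (nfs)."""
    protocol = protocol.lower()
    if protocol == "fc":
        return bool(WWN_RE.match(identifier))
    if protocol == "iscsi":
        return bool(IQN_RE.match(identifier))
    if protocol == "nfs":
        try:
            ipaddress.ip_address(identifier)
            return True
        except ValueError:
            return False
    raise ValueError(f"unknown protocol: {protocol}")
```

A check like this can be useful before you hand a list of hosts to your storage administrator, so that a typo in a WWN or IQN is caught early.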

Connectivity

And that brings us to the last part, connectivity.

As noted in the security part, with VMware vSphere we have 3 connectivity options:

  • Fiber Channel (FC)
  • iSCSI
  • NFS

So, what’s the best protocol? As always in IT, the answer to this question is: It depends.

It depends on your storage solution. Every storage solution is built on certain principles, and these make the solution unique. These principles determine the best storage protocol for that solution. Of course, almost every storage solution supports 2 or more protocols, but only one performs best. You probably know that FC is the fastest protocol, in theory. But what if your storage solution implements NFS most efficiently? Then you’re probably going to choose NFS.

So ask your vendor. Especially if you made them responsible for the performance part, as discussed in part 1 of this series.

This ends this series on determining your storage needs. Although you can design and determine a lot more, this series will give you a head start.