August 14th, 2008 @ 8:34 am
If you are not aware yet, a major bug has been revealed in ESX 3.5 and ESXi 3.5 Update 2. Apparently, the beta was coded to expire on August 12, 2008, and this code was not removed from the actual release. Details are available in the VMware Knowledge Base and This Topic on their forums. You might also check out This Post on Matthew Marlow’s blog for more information.
On the morning of the 12th, I was greeted with several errors like this one in the logs for our ESX cluster:

VMware finally released the patch really late Tuesday night, which, unfortunately, kept me up most of the night getting our cluster patched. It involved setting back the clock on all of the hosts so VMotion would work, manually migrating VM’s off of a host, going into maintenance mode, applying the patch via the command line, then migrating the VM’s back.
VMware is one of my favorite companies and is known for delivering rock-solid, enterprise-class products, so it really disappoints me that they would let something like this slip through the cracks. Imagine how many hundreds of thousands (maybe millions?) of VM’s this affected. They do seem to be committed to fixing their mistake and making things right. You can check out the Letter From Their CEO for more info.
July 2nd, 2008 @ 2:12 am
I’m seeing an interesting and seemingly recurring issue with one particular user’s Exchange calendar. The complaint is always that when a new appointment is added to the calendar, it disappears a minute or two later. The event log on the Exchange Server always shows these errors:
Calendaring agent failed with error code 0x80040215 while saving appointment.
Calendaring agent failed to update the free/busy cache during an appointment save or delete operation.
Calendaring agent failed in message save notification with error 0x800703eb on XXXX@jfbc.org: /Calendar/test.EML.
I’ve tested, researched, tried to reproduce, etc, etc and am at a loss as to what’s causing this. The only thing I can think of is maybe it’s related to syncing the user’s Palm Pilot. Maybe the Palm device is allowing something to be input that Exchange doesn’t like?
I have figured out that there is some corrupt record that exists in the mailbox that’s causing this. Moving the mailbox to another database and then back clears things up for a while, then it pops back up. I’m curious if anyone else out there has seen anything like this?
June 4th, 2008 @ 11:06 am
Over the Memorial Day holiday, I had a “VMware Upgrade Party.” I’m not sure it was really a “Party” since I was the lone attendee, but I’ll call it one anyway. I got all of our ESX servers upgraded to the latest build of 3.5, as well as VirtualCenter to the latest 2.5 build. I also added our fourth ESX server, which is the diskless, boot-from-SAN box I talked about a couple of weeks ago.
I was a little bit hesitant to put the diskless box in production since a bug in the QLogic HBA firmware required me to run their beta or “Limited Release” firmware in order to do Jumbo Frames. So far though, it has been rock solid.
Below, you can see our current VMware environment. I get more and more excited about this every day. I can now have a new machine online in less than 20 minutes without adding any physical hardware. Awesome! We currently have one stand-alone ESX server, jfbc-ecc-esx03, that runs our virtual desktops. The other three servers are in an HA cluster sporting a total of 32GHz of CPU resources and 52GB of RAM. I like it! I hope to be able to add VMotion and Distributed Resource Scheduling later this year so we can more effectively manage our host resources.

May 28th, 2008 @ 12:04 am
While everyone was away for the holiday Monday, I took the opportunity to upgrade our SAN and ESX servers. Everything went surprisingly well.
What was really impressive is how fast the EqualLogic SAN reboots. The firmware upgrade was the first reboot since it was installed. They claimed you could reboot it “live” without causing any problems with the servers, but I had never tested that theory until now. I was sending it a ping every second during the entire process. I dropped a total of 12 pings during the reboot and the servers never knew the storage had just rebooted. Pretty impressive! Check this out (I did it from home, hence the 12-15ms latency):

I also migrated all of our ESX servers from version 3.0.2 to 3.5. For some reason, the HA agent had to be reconfigured on a couple of them, and the ESX firewall decided to block outbound iSCSI traffic on every box after the upgrade. Other than that, the ESX upgrades went great!
Our first diskless ESX server is now online also. The QLogic HBA initially wouldn’t connect to our SAN using jumbo frames. QLogic’s response was to send me their “Beta” or “Limited Release” firmware, which scares me a little. I have several production VM’s running on that host with no issues though. I hope to do some benchmarks on VMware Server vs ESX with software iSCSI vs ESX with hardware iSCSI. Stay tuned for details on that!
I love it when a project goes as planned!
May 25th, 2008 @ 3:34 am
I’ve received several comments and questions on my post from a few days ago, “iSCSI Slow? I Think Not.” The network hardware is critical for peak iSCSI performance. I think a brief follow-up with some details on our network configuration is in order.
We are using a Cisco Catalyst 6506 switch at the core of our network, which handles all of our iSCSI traffic. The current configuration looks like this:
- (1) WS-X6K-SUP2-MSFC2 with PFC2 supervisor module
- (2) WS-X6148A-GE-TX gigabit modules (connects all server and iSCSI devices)
- (1) WX-X6414-GBIC fiber module (backbone to all of our IDF’s)
All SAN ports are configured for Jumbo Frames and Flow Control.
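For anyone wondering why Jumbo Frames matter here, a rough back-of-the-envelope sketch follows. It assumes standard Ethernet framing with no VLAN tag, IPv4 and TCP headers without options, and ignores the iSCSI PDU header, so the figures are approximations rather than measurements from our switch. The function names are just for illustration.

```python
# Per-frame overhead on the wire, in bytes (standard Ethernet, no VLAN tag).
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12   # preamble/SFD + header + FCS + inter-frame gap
IP_TCP_HEADERS = 20 + 20              # IPv4 + TCP headers, assuming no options


def payload_efficiency(mtu: int) -> float:
    """Fraction of wire time that carries TCP payload for a full-size frame."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETHERNET_OVERHEAD
    return payload / on_wire


def frames_per_second(mtu: int, link_bps: int = 1_000_000_000) -> float:
    """Full-size frames per second needed to saturate a link of link_bps."""
    return link_bps / ((mtu + ETHERNET_OVERHEAD) * 8)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {payload_efficiency(mtu):.1%} payload, "
              f"{frames_per_second(mtu):,.0f} frames/s at 1 Gb/s")
```

Under these assumptions the payload share only moves from roughly 95% to roughly 99%, but the number of frames (and therefore interrupts and per-packet work) needed to fill a gigabit link drops to about a sixth, which is likely where most of the benefit comes from.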
The servers are HP DL360 G5’s with NC360T NICs. I just deployed a new ESX server with a QLogic iSCSI HBA, but I don’t have any benchmarks on that yet. I’ll post some details once I run them. I’m interested in whether there will be a big performance increase over the ESX software iSCSI initiator.
May 23rd, 2008 @ 2:00 pm
Our new Xserve arrived yesterday. I got all the initial configuration done and got it racked. Apple definitely makes some “Pretty” servers.
Over the next few weeks, I’ll be getting Open Directory and Update Services configured and rolled out to all of our Mac workstations. At some point, we’ll also be installing Final Cut Server. I’ll be posting updates as we get all of this configured. In the meantime, here’s a few pics:


May 18th, 2008 @ 2:45 pm
If you run any Linux guests under VMware, you’ve probably had issues with the clock in the VM drifting or just totally running away.
The Linux clock works by counting timer interrupts. In older kernels, this was usually done at a rate of 100Hz, or 100 times per second. Beginning with the 2.6 kernel, the interrupt timer is now set at 1000Hz, so interrupts are counted 10 times as often.
Because VMware divides the host up into “time slots” for each guest OS, interrupts are often missed in the guest machines, depending on the system load. The more often the guest kernel counts interrupts, the more apparent these “missed” interrupts become, and the result is clock skew in the guest machine. VMware Tools has the ability to sync the guest clock with the host, but this only occurs once per minute, and it can only advance the clock; it can’t slow it down. Generally, the VMware Tools clock sync alone is not enough.
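To make the effect concrete, here is a deliberately simplified toy model, not a description of how VMware actually schedules or replays timer interrupts. It assumes that while a guest is not running, one timer tick is held pending and any further ticks in that gap are simply dropped. The function name and the gap lengths are made up for illustration.

```python
def lost_seconds(hz: int, gap_ms: int, gaps: int) -> float:
    """Guest time lost after `gaps` descheduling gaps of `gap_ms` milliseconds.

    Toy assumption: of the timer ticks that fire while the guest is not
    running, one is held pending and delivered late; the others are dropped.
    """
    tick_us = 1_000_000 // hz                  # tick period in microseconds
    ticks_in_gap = (gap_ms * 1000) // tick_us  # ticks that fire during one gap
    dropped_per_gap = max(0, ticks_in_gap - 1)
    return dropped_per_gap * gaps * tick_us / 1_000_000


if __name__ == "__main__":
    # Compare a 100Hz and a 1000Hz guest kernel over 1000 gaps of 5 ms each.
    for hz in (100, 1000):
        print(f"{hz}Hz kernel: {lost_seconds(hz, gap_ms=5, gaps=1000)} s lost")
```

In this model a 5 ms gap costs the 100Hz kernel nothing, because no more than one tick fits in the gap, while the 1000Hz kernel drops four ticks every time. That is the intuition behind lowering the tick rate in the guest.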
Here’s the steps that are needed in order to keep the clock skew under control (these apply to VMware Server running on a Linux host - in my case, CentOS). The guest OS changes will also apply to ESX:
- VMware Server needs to be told what clock speed the CPU(s) run at. This can be found by running “cat /proc/cpuinfo”, which will return all kinds of information about the CPU’s, including the clock speed. You’ll need to edit /etc/vmware/config and add the following lines (where host.cpukHz is the host CPU speed in kHz; 2.8GHz in my example below):
host.cpukHz = 2800000
host.noTSC = TRUE
ptsc.noTSC = TRUE
- VMware Tools needs to be installed in the guest OS. VMware provides instructions on how to install VMware Tools in a Linux guest here.
- VMware Tools time synchronization needs to be enabled. This is done by editing the VMX file in the virtual machine directory and adding the following line:
tools.syncTime = "TRUE"
Note that the host should use NTP to sync to an outside time source, while NTP should be disabled in each guest.
- Now, we need to lower the interrupt frequency in the guest kernel. Generally, this will require installing the kernel source, modifying the CONFIG_HZ parameter to a rate of 100Hz, and then recompiling the kernel. CentOS has made this easy for us by releasing a “VM Optimized” kernel for CentOS 5. Although perfectly stable, this kernel is presently in the “Testing” repository. Here’s how to install the VM Kernel using yum in a CentOS 5 system:
Add the “Testing” repo as follows:
cd /etc/yum.repos.d
wget http://dev.centos.org/centos/5/CentOS-Testing.repo
Now, install the VM Optimized kernel:
yum --enablerepo=c5-testing install kernel-vm kernel-vm-devel
- Now, we need to make sure Grub is set to boot the new kernel, and also add the “clock=pit” parameter to the kernel boot options. We do that by editing /etc/grub.conf and making the following changes:
default=0
Where “0” is the first kernel listed. If the VM Kernel is not the first item, you’ll need to adjust the value accordingly. For example, if it’s second in the list, you’d use “default=1”.
Now, add the clock=pit parameter to the kernel boot options. That section of the grub.conf file will look something like this:
title CentOS (2.6.18-53.1.19.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-53.1.19.el5 ro root=LABEL=/ clock=pit
initrd /initrd-2.6.18-53.1.19.el5.img
Once all of the above changes are made, reboot the guest, and you should see significantly better clock performance. I had some VM’s where the time would drift by hours, and after making these changes, they stay within a few seconds.
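If you have several hosts to configure, the first step above can be scripted. The following is only a sketch: the function names are mine, and it assumes /proc/cpuinfo reports “cpu MHz” lines in the usual x86 format. On hosts with CPU frequency scaling that value can be lower than the rated speed, so check the result against the CPU’s spec before using it.

```python
import re


def cpu_khz_from_cpuinfo(cpuinfo_text: str) -> int:
    """Return the highest 'cpu MHz' value found in /proc/cpuinfo text, in kHz."""
    mhz_values = [
        float(value)
        for value in re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, re.MULTILINE)
    ]
    if not mhz_values:
        raise ValueError("no 'cpu MHz' lines found")
    return round(max(mhz_values) * 1000)


def vmware_config_lines(cpu_khz: int) -> str:
    """Format the three settings from the post for /etc/vmware/config."""
    return (
        f"host.cpukHz = {cpu_khz}\n"
        "host.noTSC = TRUE\n"
        "ptsc.noTSC = TRUE\n"
    )


if __name__ == "__main__":
    # Sample input standing in for the real file; on a host you would read
    # the text from /proc/cpuinfo instead.
    sample = "processor\t: 0\ncpu MHz\t\t: 2800.000\n"
    print(vmware_config_lines(cpu_khz_from_cpuinfo(sample)), end="")
```

The script only prints the lines; appending them to /etc/vmware/config is left as a manual step so nothing gets overwritten by accident.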
May 16th, 2008 @ 5:53 pm
We worked for a while yesterday, without much luck, to get Bob’s Windows Mobile phone to sync with Exchange (Bob just joined our IT team - welcome Bob!). Bob is our first user with Windows Mobile. Everyone else uses Blackberry devices.
We use an ISA 2006 server in the DMZ with RADIUS authentication as a front-end server to Exchange. I initially added the Microsoft-Server-ActiveSync virtual directory to the list of paths in the existing ISA rule. We got errors about not having the correct privileges to do ActiveSync, which we obviously did have. After messing with this for a little while, I realized I needed to create a separate rule for the ActiveSync path and place it above my OWA redirect rule. I have a rule that allows the user to type in just http://webmail.jfbc.org and get automatically redirected to https://webmail.jfbc.org/owa. It seems that this rule was also redirecting the ActiveSync directory. Here’s what the “Correct” setup looks like in ISA server:

Apparently, that wasn’t the only issue. Next problem: It kept complaining about an incorrect username or password. Obviously, the username and password were correct. Some monitoring in ISA server revealed the authentication didn’t seem to be happening. All of the requests were marked as “anonymous.”
You won’t believe how simple this was. On the handheld, there are 3 boxes: username, password, and domain. We run split DNS, with JFBC.ORG as the internal domain name, so that’s what we entered. Turns out that ISA server wants the NetBIOS name instead, which is simply JFBC. It’s amazing how something so simple can create such a big issue.
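The rule-ordering problem from earlier in this post is easier to see with a small model. This is a simplified sketch of ordered, first-match rule evaluation, not ISA Server’s actual engine (real rules match on much more than the path). The rule names and actions are hypothetical; only the ActiveSync path comes from our setup.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    name: str
    path_prefix: str   # request paths starting with this prefix match the rule
    action: str


def first_match(rules: List[Rule], path: str) -> Optional[Rule]:
    """Return the first rule whose prefix matches, as an ordered policy would."""
    for rule in rules:
        if path.startswith(rule.path_prefix):
            return rule
    return None


# Hypothetical rule names and actions, for illustration only.
OWA_REDIRECT = Rule("OWA redirect", "/", "redirect to /owa")
ACTIVESYNC = Rule("ActiveSync", "/Microsoft-Server-ActiveSync", "publish to Exchange")

broken_order = [OWA_REDIRECT, ACTIVESYNC]   # catch-all redirect listed first
fixed_order = [ACTIVESYNC, OWA_REDIRECT]    # specific rule above the catch-all

if __name__ == "__main__":
    request = "/Microsoft-Server-ActiveSync"
    print("broken:", first_match(broken_order, request).name)
    print("fixed: ", first_match(fixed_order, request).name)
```

With the catch-all redirect listed first, the ActiveSync request never reaches its own rule, which matches what we saw until the dedicated rule was moved above the OWA redirect.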
May 14th, 2008 @ 4:15 am
Currently, our network at JFBC is about 15% Mac. One of the big ongoing projects I’ve been working on is better integration and management of the growing number of Macs in our environment. We currently leverage Active Directory for single sign-on, but, beyond that, there are no real management tools in place for Macs.
Some things are possible by extending the Active Directory schema to add some of the Apple-specific LDAP attributes. However, this moves the AD environment into a somewhat “unsupported” configuration and still doesn’t provide for full control when it comes to Mac management.
The best way to fully manage the Mac clients - including centralized update management and general settings such as appearance, shortcuts, scripting, etc. - is through the use of Apple’s Open Directory system. There was definitely some effort put forth on Apple’s part here, because Open Directory can fully integrate with Active Directory. Basically, AD gets used for authentication, then AD users and groups can be linked to OD groups. Specific management settings can then be applied to the OD groups.
I’ve just ordered a new Apple Xserve to handle this task, which should arrive next week. I’m excited about being able to take integration and management of our Mac environment to the next level.
Other Mac stuff on my radar:
- OS X Leopard deployment (Jonathan has agreed to be my next victim beta tester).
- Office 2008 deployment.
- Possible Final Cut Server implementation (Already briefly discussed with our media team, will be exploring this further, including storage requirements).
- Migration of our closed circuit TV announcements from PowerPoint on Windows to Keynote on Mac (Currently working with our communications team on this).
Expect lots of Mac related posts in the coming weeks/months!
May 13th, 2008 @ 1:12 pm
We have had a VMware ESX cluster for a while now, but last night I put together our first diskless ESX server. I’m excited about this because it eliminates a failure point from the environment - the local disks in the servers. I’m using QLogic iSCSI HBA’s and booting from a 10GB volume on our EqualLogic SAN.
I got everything configured and tested last night. Today, it gets racked and added to our cluster in VirtualCenter. Here’s a few pictures:
No disks. The machine on the bottom and the Mac are a test environment for our upcoming Windows 2008 and Mac OS X Leopard deployment. The ProCurve switch is just for testing on the workbench; once racked, the server will be attached to our Cisco 6500 core switch.

It doesn’t even know there’s no disks (boots up really fast too)

VI Client showing specs of new machine - 8 x 2.5GHz cores and 20GB of RAM - lots of horsepower

Its home once I rearrange a few things tonight. The 4 machines at the top are our current ESX cluster. The disk array just underneath is for disk-based backup. The SAN is in another rack.
