Wednesday 13 June 2012

The setup of a fault-tolerant Citrix Provisioning Services solution and the tests performed to ensure high availability

Hi everyone, I recently completed the setup of Citrix Provisioning Services (PVS) for a client. It included two PVS servers running Windows Server 2008 R2 Enterprise.

The environment looked like this: 
  • 2 x Windows Server 2008 R2 Enterprise running Microsoft clustering services.
  • Clustered Citrix licensing service. See how to complete here
  • Clustered Citrix Web Interface. See how to complete here
  • Clustered NFS share for the ISOs that the provisioned servers boot from. The NFS share is available to the ESX hosts.
  • Clustered file server for the vDisk store, licensing server, and application profiles.
  • Clustered DFS to point to all shares used for the PVS environment.
Here is a visual look at the setup:

This environment is the bee's knees for fault tolerance and high availability. Everything is available from two servers, so the only single point of failure is the SAN. To verify high availability and fault tolerance, I performed the following tests, with the outcome listed under each.
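As a rough illustration of the single-point-of-failure claim, here is a minimal Python sketch that lists each component with the places it can run and flags anything with fewer than two. The component labels and the "SAN" entry are my own shorthand for the list above, not output from any Citrix or Microsoft tool.

```python
# Where each component can run. The node names come from this post;
# the component labels and "SAN" are illustrative shorthand.
NODES = {"CITRIX-PVS-01", "CITRIX-PVS-02"}

providers = {
    "PVS stream service": set(NODES),
    "Citrix licensing": set(NODES),
    "Citrix Web Interface": set(NODES),
    "NFS share (boot ISOs)": set(NODES),
    "File server (vDisk store, profiles)": set(NODES),
    "DFS namespace": set(NODES),
    "Shared storage": {"SAN"},
}


def single_points_of_failure(providers):
    """Return the components that can run in fewer than two places."""
    return sorted(name for name, hosts in providers.items() if len(hosts) < 2)


print(single_points_of_failure(providers))
```

With the layout described here, only the shared storage is flagged.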


Test 1 - Stopping the PVS service on CITRIX-PVS-01.  
  • Not a problem. The provisioned servers just flick over to the other provisioning server.

Test 2 - Does a rebalance occur when I restart the PVS service or a server is powered back on?
  • Not a problem. It is set on the vDisks. I have set the subnet affinity to fixed and enabled rebalance with a trigger percent of 50. After 10 minutes it does rebalance, and since I have 16 provisioned servers, 8 go to each server.
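To show why a trigger percent of 50 kicks in here, the following is a simplified model of the rebalance decision, written in Python. It is an assumption on my part and not Citrix's actual algorithm: it treats a server as overloaded when its device count is more than the trigger percent above the average, then spreads the devices evenly. The function names are hypothetical.

```python
def needs_rebalance(loads, trigger_percent):
    """Return True if any server is more than trigger_percent above the average load.

    Simplified model only; the real PVS logic may differ.
    """
    average = sum(loads.values()) / len(loads)
    if average == 0:
        return False
    return any(
        (count - average) / average * 100 > trigger_percent
        for count in loads.values()
    )


def rebalance(loads):
    """Spread the total number of target devices as evenly as possible."""
    servers = sorted(loads)
    base, extra = divmod(sum(loads.values()), len(servers))
    return {
        server: base + (1 if index < extra else 0)
        for index, server in enumerate(servers)
    }


# State right after CITRIX-PVS-01 comes back: all 16 devices are on the other server.
loads = {"CITRIX-PVS-01": 0, "CITRIX-PVS-02": 16}
if needs_rebalance(loads, trigger_percent=50):
    loads = rebalance(loads)
print(loads)
```

In this model, CITRIX-PVS-02 carries 16 devices against an average of 8, which is 100 percent over the average and well past the trigger of 50, so the devices end up split 8 and 8.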

Test 3 - Failover of server: shutting down one provisioning server (CITRIX-PVS-01).
  • Not a problem with 16 provisioned systems. It transferred the 8 systems to the other provisioning server. I did see a couple of provisioned systems not respond for about 10 seconds.
  • After powering it back up, it did rebalance after 10 minutes.
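If you want to put a number on that pause instead of eyeballing it, a small probe script can help. This is a minimal sketch, not the method I used for the tests above: it assumes the provisioned server answers on some TCP port (for example RDP on 3389), and the host name in the usage note is hypothetical.

```python
import socket
import time


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(host, port, duration=120.0, interval=1.0):
    """Poll host:port for `duration` seconds and return (timestamp, reachable) samples."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), is_reachable(host, port)))
        time.sleep(interval)
    return samples


def longest_outage(samples):
    """Return the longest gap, in seconds, from a first failed probe to the next success.

    An outage still in progress at the last sample is not counted.
    """
    longest = 0.0
    outage_start = None
    for timestamp, reachable in samples:
        if not reachable and outage_start is None:
            outage_start = timestamp
        elif reachable and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    return longest
```

To use it, start something like `samples = probe("xenapp-01", 3389)` just before shutting down the provisioning server, then call `longest_outage(samples)` once the probe finishes. The one-second interval limits how precisely the pause can be measured.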

Test 4 - Failover of cluster services: moving the service CITRIX-PVS-FS, which hosts the vDisk repository, Citrix license server, application profiles, and ISOs, from CITRIX-PVS-01 to CITRIX-PVS-02.
  • Not a problem and not noticeable to the user.

Test 5 - Taking the cluster service CITRIX-PVS-FS offline, which hosts the vDisk repository, boot ISOs, Citrix license server and application profiles.
  • The vDisk store freezes. Servers respond for about 1 minute, then stop responding. From this point the other results don't matter; however, I did test them separately to simulate the failure of the individual services:
    • Taking the NFS share offline has no effect on systems that are already loaded; however, systems cannot be restarted and new systems cannot be started.
    • Taking the Citrix license server offline forces the systems into the licensing grace period.
    • Since the streamed applications cannot be accessed, new sessions cannot be opened to the streamed applications. Not an issue with active streamed applications.

Test 6 - Moving the Citrix Web Interface service (CITRIX-PVS-WI) from one node to another.
  • Not a problem. No effect on current sessions logged on via the web interface. If they try to load new applications they need to log back into the web interface. No issue with session sharing.

Test 7 - Failover of node (CITRIX-PVS-02). Effect on Citrix web services and Web Interface.
  • No effect on Citrix web services. Current applications stay active and new applications can be opened. No effect on session sharing.
  • The Citrix Web Interface is affected in the same way as in Test 6.

Test 8 - Failover of both node members

  • As expected, nothing is available. All streamed servers stop responding. I did have Notepad running and that still responded perfectly, which was interesting.

Test 9 - Failover of both node members when I have a published application open.
  • From the results of Test 8, I would like to see the results from this test. I haven't created the XenApp 6.5 yet so I will need to wait to test this.
