Friday, 20 May 2011

Unclaiming a device from ESX.


The need to unclaim an ESX device usually arises when you want to change the plugin claiming the device or its paths. For example, if you want to mask a device, you may need to first add the mask claimrules and then unclaim the rules currently acting on the device.

Note that path-, adapter-, and plugin-based unclaims succeed only when the device is free; in other words, the device should not be actively servicing IOs. If VMs are powered on, or IOs are being issued to an RDM disk, the command is bound to fail. Unclaim also often fails on local disks, as they may have scratch and dump partitions configured on them.

There are different ways to unclaim a device.
You can unclaim on a per-device basis as follows:
~ # esxcli corestorage claiming unclaim  -t  device --device naa.6009999999999284000064c349cc3cd9

Claimrules can also be unclaimed based on the device vendor name:
~ # esxcli corestorage claiming unclaim  -t  vendor --vendor IBM

In ESX, you can unclaim based on a specific path too:
~ # esxcli corestorage claiming unclaim  -t  path --path vmhba2:C0:T0:L111

Less popular variants are:
Driver-based unclaiming
~# esxcli corestorage claiming unclaim  -t  driver --driver qla2xxx

Plugin-based unclaim.
There is also a provision to unclaim devices based on the plugin name.
~ # esxcli corestorage claiming unclaim  -t  plugin --plugin MASK_PATH

If all the claimrules are hard to remember, you can try to unclaim all the devices on the ESX host.
ESX will try to unclaim all the claimrules working on non-busy devices. Note that this command returns device-busy messages in most cases, as it tries to unclaim the local disk too, where you might have swap, dump, and scratch partitions configured.
~ # esxcli corestorage claiming unclaim  -t location
Errors:
Unable to perform unclaim.  Error message was : Unable to unclaim paths.  Busy or in use devices detected.  See VMkernel logs for more information.

After unclaiming, do not forget to load and run the new claimrules. The load and run operations read the /etc/vmware/esx.conf file and apply the claimrules to the unclaimed devices.
~ # esxcli corestorage claimrule load
~ # esxcli corestorage claimrule run
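All of these variants follow the same shape: the claim type after -t and its matching flag share a name. As a rough sketch (the build_unclaim_cmd helper is my own, not part of esxcli), the pattern can be expressed as:

```shell
#!/bin/sh
# Sketch only: composes the esxcli unclaim command line for a given
# claim type (device, vendor, path, driver, plugin). In every variant
# shown above, the flag name matches the claim type.
build_unclaim_cmd() {
    type="$1"
    value="$2"
    printf 'esxcli corestorage claiming unclaim -t %s --%s %s\n' \
        "$type" "$type" "$value"
}

# Examples (identifiers are hypothetical):
build_unclaim_cmd device naa.6009999999999284000064c349cc3cd9
build_unclaim_cmd vendor IBM
```

The helper only prints the command line; on a real host you would pipe the output to sh (or drop the helper and type the command directly).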

Saturday, 30 April 2011

Interesting storage startups

While VMware has captured a large share of the server virtualization market, there still appear to be gaps in the I/O and storage virtualization areas. With more and more companies emerging in the storage virtualization space, it is time to see which innovations could define the future of storage. For a glimpse of storage virtualization, let's look at some startups and their innovations.

StorSimple develops Hybrid Cloud Storage solutions for Windows and VMware infrastructure.
StorSimple has developed application-optimized hybrid cloud storage appliances for businesses and organizations that want to integrate the cloud securely and transparently into their existing on-premises applications. The StorSimple 5010 and 7010 appliances have recently achieved VMware Ready™ status, passing rigorous VMware testing and interoperability criteria for use in production environments, and are now listed on the VMware Partner Product Catalog.
See what StorSimple has to offer here.

Xsigo is the technology leader in data center I/O virtualization, helping organizations reduce costs and improve business agility. Xsigo's I/O Director consolidates server connectivity with a solution that provides unprecedented management simplicity and interoperability with open standards. Xsigo was actively used in VMware's VMworld demos to support IO for thousands of workloads. Xsigo helps enterprises scale their virtual infrastructure dynamically, and it also reduces maintenance costs by reducing the number of failure points.

Fusion-io has been in the enterprise market for quite some time now. It mainly deals with flash-based PCIe cards. Enterprise server vendors like IBM, HP, and Dell are already shipping Fusion-io based servers. All the latest updates on Fusion-io products are available here.

Aprius aims to maximize application performance and minimize infrastructure costs in the data center by addressing I/O bottlenecks at the server and the storage array. The company offers a simple approach to provisioning and managing I/O resources by enabling the PCI Express protocol to traverse the Ethernet data center fabric. You can find more information here.
VDI workloads are still a bottleneck in the desktop user experience, and people see a lot of performance degradation with desktop boot-storms. IO Turbine's main focus is building innovation around VDI to reduce IO bottlenecks and application latency.
TINTRI:
A company that offers storage solutions for storing VMs. Tintri is a VM-centric storage appliance built for VMware; it also has inline de-dupe and compression, and it actively uses SSDs for a performance boost.
More info about Tintri.

Sunday, 24 April 2011

Different ways of configuring PSP RR for your ESX devices.

PSP RR is one of the best PSPs (Path Selection Plugins) if you want to leverage multipathing in your SAN environment. PSP_RR can help you gain higher throughput by scheduling IOs across multiple paths.

I will not discuss the performance benefits here, as there is already a blog on that. Instead, I would like to share the different ways of configuring PSP RR for your SAN devices.

Currently there are three ways of configuring PSP RR in ESX4.1.
1. Change the default PSP of the SATP claiming your SAN device.  
2. Add a new SATP claimrule with PSP RR for device vendor and model.
3. Add a SATP claimrule with PSP RR with your device name.  

1. Change the default PSP of the SATP claiming your SAN device.
~ #  esxcli nmp satp setdefaultpsp  --satp=VMW_SATP_CX  --psp=VMW_PSP_RR                     

This is the easiest way of configuring PSP_RR for your devices. By changing the default PSP of the SATP claiming your devices, you configure all the devices on the ESX host to use PSP_RR. But this has a side effect: there might be devices from a different array vendor claimed by the same SATP, which may lead to unexpected performance problems. You should use this option only when you know which devices will be connected to your ESX host. This method is also documented in a VMware KB article.
CleanUp:
Run the same command with the default PSP name for the particular SATP.
esxcli nmp satp setdefaultpsp --satp=VMW_SATP_CX --psp=VMW_PSP_MRU

2. Add a new SATP claimrule with PSP_RR for device vendor and model.
~ # esxcli nmp satp addrule  --satp=VMW_SATP_CX --psp=VMW_PSP_RR --vendor=myVendor  --model=mymodel
This is one of the best options, as it lets you select a PSP for a specific target. Let's say you have two arrays, Array1 and Array2, both claimed by the same SATP. If you want the devices corresponding to Array1 to be configured with PSP RR without affecting the devices from Array2, this is the right option. It is also a way to mass-configure devices with PSP RR for a specific target. It does not change the default PSP for the SATP; instead, it inserts a new rule into the SATP rule list for the target with the specified vendor and model name.
Cleanup can be done using
 ~ # esxcli nmp satp deleterule  --satp=VMW_SATP_CX --psp=VMW_PSP_RR --vendor=myVendor  --model=mymodel 

3. Add a SATP-PSP claimrule with your device name.
When you have configured MSCS on your ESX host and are using some of the LUNs for the MSCS cluster, the above two options are not the right ones, because of the SCSI-3 reservations used by MSCS. There is a VMware KB article on this. When you want to configure only a few specific devices with PSP RR, you can run:
~ # esxcli nmp satp addrule  --satp=VMW_SATP_CX --psp=VMW_PSP_RR --device=naa.600a0b8000479284000004f04c8ddfa5
For cleanup:
~ # esxcli nmp satp deleterule  --satp=VMW_SATP_CX --psp=VMW_PSP_RR --device=naa.600a0b8000479284000004f04c8ddfa5
A few things to note:
For the new SATP rules to take effect, unclaim the devices, then load and run the claimrules.
~ # esxcli corestorage claiming unclaim -t location
~ # esxcli corestorage claimrule load
~ # esxcli corestorage claimrule run
Newly added rules will be visible in the satp rule list.
~ # esxcli nmp satp listrules         
                                                                                                                      
The rules are permanently added to the /etc/vmware/esx.conf file, and the changes persist across reboots. To undo the changes, use the esxcli commands mentioned above.

Without host profiles, you can use the GUI to configure PSP RR only on a per-device basis [Option 3].
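The add-rule-then-reclaim sequence above can be wrapped in one small script. This is only a sketch: it is dry-run by default (it prints the commands instead of executing them), and the SATP name and device ID are placeholder values from the examples above.

```shell
#!/bin/sh
# Dry-run sketch of the SATP rule + reclaim sequence. Set DRY_RUN=0
# on a real host to actually execute the esxcli commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

reclaim_sequence() {
    run esxcli nmp satp addrule --satp=VMW_SATP_CX --psp=VMW_PSP_RR \
        --device=naa.600a0b8000479284000004f04c8ddfa5
    run esxcli corestorage claiming unclaim -t location
    run esxcli corestorage claimrule load
    run esxcli corestorage claimrule run
}

reclaim_sequence
```

The dry-run wrapper is a handy pattern for any destructive esxcli sequence: you can review the echoed commands before flipping DRY_RUN off.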

Monday, 18 April 2011

Debugging a device in ESX using esxtop.



There is plenty of information on advanced esxtop components on the VMware and Yellow-Bricks blogs, and I believe you would get very good insight into esxtop usage from those sites.
However, I would like to discuss a small section of esxtop that is very helpful if you want to debug your storage devices or watch the IOPS exchanged over multiple paths between ESX and the storage devices (good for checking the performance of VMW_PSP_RR).
To get stats for a storage device in esxtop:
1. Execute "esxtop" on the ESX console.
2. Type "u" to view storage device stats.
3. If your devices are identified by naa IDs, press SHIFT+L.
4. Type "36"; this widens the device-name field so you can see the complete ID of the storage device.
5. Now the best part: if you want to watch IOs flowing through multiple paths, especially if you have configured VMW_PSP_RR for the device, press SHIFT+P. A prompt appears at the top of esxtop asking for a device ID. Enter the ID of the device whose stats you want to view.
You will then see all the paths for that device and the IOs flowing through each of them.


You might often have observed failures while unclaiming a device. This happens because the device might be actively servicing IOs, or some world has the device open. To find out whether a world has the device open, try option "e" (just e) after step 4.
This again asks for device info; enter the device name and hit enter.
This time you will see a list of worlds working on the device.


After you get the world ID, you can grep for the process information as follows.
~ # ps -u | grep 91559
91559 5124 vix-async-pipe hostd
In my case, the world '91559' using the device is hostd.

Saturday, 16 April 2011

Using autoclaim while connecting ESX host to existing fabric.


I came across an interesting scenario that a partner faced while testing ESX in their setup. The partner had a multi-node ESX 4.0 setup connected to an Active-Passive array. They isolated one ESX server running ESX 4.0 from the SAN, installed ESX 4.1, and plugged the FC cables back into the HBA slots on the host. Suddenly, all the shared LUNs on the other ESX hosts trespassed to the other storage processor [the ESX 4.1 host shared these LUNs too].

The reason for this trespass was that the two HBAs on each host were connected to two separate fabrics, and the storage processors were connected to the fabrics as shown below:
HBA1 --------> Fabric1 ------------> SP1
HBA2 --------> Fabric2 ------------> SP2

When the storage admin connected the HBA corresponding to the standby SP to the SAN fabric first, ESX trespassed the shared LUNs to the standby SP [activated the standby SP], as it was the only available path to the shared LUNs as seen from the ESX 4.1 host.

ESX uses VMW_PSP_MRU for Active-Passive arrays to avoid path thrashing. This results in all the ESX hosts sending IOs through a common SP for shared LUNs. To avoid an unwanted LUN trespass while connecting a live ESX host to an existing fabric, you can run the command below (on the isolated ESX host) before connecting to the SAN. Disabling autoclaim prevents ESX from claiming any new devices.
~ # esxcli corestorage claiming autoclaim --enabled false
After you restore SAN connectivity, re-enable autoclaim by executing
~ # esxcli corestorage claiming autoclaim --enabled true

Do not forget to re-enable autoclaim; otherwise ESX will never claim any new devices, even when you initiate multiple rescans manually. The possible inputs the autoclaim command can take are true, false, 1, 0, yes, no, y, and n.
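Those inputs map onto just two states. A throwaway sketch of that mapping (normalize_autoclaim_flag is my own name, not an esxcli helper):

```shell
#!/bin/sh
# Sketch: normalizes the inputs the autoclaim flag accepts
# (true, false, 1, 0, yes, no, y, n) to "true" or "false".
normalize_autoclaim_flag() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        true|1|yes|y)  echo true  ;;
        false|0|no|n)  echo false ;;
        *) echo "invalid autoclaim value: $1" >&2; return 1 ;;
    esac
}

normalize_autoclaim_flag YES   # prints: true
normalize_autoclaim_flag 0     # prints: false
```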

Disclaimer: Storage admins follow a different approach while connecting an ESX host to an existing fabric. This experiment was done in a customer's QA environment. Please try this at your own risk :).

Masking a device from ESX.

There are different ways by which an administrator can hide LUNs from ESX. You can hide a device based on its path name, driver name, vendor/model, or transport name. To hide a device, ESX makes use of a plugin called MASK_PATH, which is loaded into the vmkernel by default.

Vendor Model based Device Masking.

To mask a device in ESX based on its vendor/model, you can use:

~ # esxcli corestorage claimrule add -r 2000 -P MASK_PATH -t vendor -V IBM -M "1815"
~ # esxcli corestorage claimrule list
Rule Class Rule Class Type Plugin Matches
MP 2000 file vendor MASK_PATH vendor=IBM model=1815 



Masking the paths to the device.
Let's say you have a device with multiple paths and you do not want the device to use some of them. You can mask those paths as follows. First, find all the possible paths to the device:

~ # esxcfg-mpath -b -d naa.600a0b80002a071c00006cb04a122230
naa.600a0b80002a071c00006cb04a122230 : IBM Fibre Channel Disk (naa.600a0b80002a071c00006cb04a122230)
 vmhba2:C0:T0:L22 LUN:22 state:standby fc Adapter: WWNN: 20:00:00:1b:32:10:46:c2 WWPN: 21:00:00:1b:32:10:46:c2 Target: WWNN: 20:04:00:a0:b8:26:4e:1a WWPN: 20:34:00:a0:b8:26:4e:1a
   vmhba3:C0:T1:L22 LUN:22 state:active fc Adapter: WWNN: 20:01:00:1b:32:30:46:c2 WWPN: 21:01:00:1b:32:30:46:c2 Target: WWNN: 20:04:00:a0:b8:26:4e:1a WWPN: 20:35:00:a0:b8:26:4e:1a
   vmhba3:C0:T0:L22 LUN:22 state:standby fc Adapter: WWNN: 20:01:00:1b:32:30:46:c2 WWPN: 21:01:00:1b:32:30:46:c2 Target: WWNN: 20:04:00:a0:b8:26:4e:1a WWPN: 20:34:00:a0:b8:26:4e:1a
   vmhba2:C0:T1:L22 LUN:22 state:active fc Adapter: WWNN: 20:00:00:1b:32:10:46:c2 WWPN: 21:00:00:1b:32:10:46:c2 Target: WWNN: 20:04:00:a0:b8:26:4e:1a WWPN: 20:35:00:a0:b8:26:4e:1a



Then, if you want to mask the path "vmhba3:C0:T0:L22", add the following claimrule:

~ # esxcli corestorage claimrule add -r 2001 -P MASK_PATH -t location -A vmhba3 -C 0 -T 0 -L 22
~ # esxcli corestorage claimrule list
Rule Class Rule Class Type Plugin Matches
MP 2001 file location MASK_PATH adapter=vmhba3 channel=0 target=0 lun=22


Masking device based on its transport protocol.
If you want to mask devices based on transport (i.e. fc, sas, sata, ide), you can use the following [I do not recommend this, as it will mask all devices on that transport, including ones you may not intend to mask].

~ # esxcli corestorage claimrule add -r 2001 -P MASK_PATH -t transport --transport fc

~ # esxcli corestorage claimrule list
Rule Class Rule Class Type Plugin Matches
MP 2001 file transport MASK_PATH transport=fc

Masking device based on its driver.
There is also a provision in ESX to mask devices that are using a particular driver.

~ # esxcli corestorage claimrule add -r 2001 -P MASK_PATH -t driver --driver qla2xxx

~ # esxcli corestorage claimrule list
Rule Class Rule Class Type Plugin Matches
MP 2001 file driver MASK_PATH driver=qla2xxx



Note: After adding the above claimrules, for them to take effect you have to unclaim the existing claimrules that are acting on the device.

For unclaiming, run one of the following, based on the type of rule inserted:

~ # esxcli corestorage claiming unclaim -t vendor --vendor IBM
~ # esxcli corestorage claiming unclaim -t path --path vmhba3:C0:T0:L22
~ # esxcli corestorage claiming unclaim -t driver --driver qla2xxx



Then load and run the claimrules.

~ # esxcli corestorage claimrule load
~ # esxcli corestorage claimrule run
Note: For the new claimrules to take effect, the device should not be busy exchanging IOs.
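The whole add-mask-then-reclaim flow for a single path can be strung together as a sketch. The helper below only prints the command sequence (it executes nothing); the rule ID and path components are the example values from above.

```shell
#!/bin/sh
# Sketch: prints the command sequence to mask one path and reclaim,
# in the order described above (add, unclaim, load, run).
mask_path_sequence() {
    rule="$1"; adapter="$2"; channel="$3"; target="$4"; lun="$5"
    cat <<EOF
esxcli corestorage claimrule add -r $rule -P MASK_PATH -t location -A $adapter -C $channel -T $target -L $lun
esxcli corestorage claiming unclaim -t path --path $adapter:C$channel:T$target:L$lun
esxcli corestorage claimrule load
esxcli corestorage claimrule run
EOF
}

mask_path_sequence 2001 vmhba3 0 0 22
```

Reviewing the printed sequence before piping it to sh on a live host is a cheap safety net, since an unclaim on a busy device fails anyway.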



Deleting the mask path claimrules.
Now the most important part: deleting the claimrules. There are any number of ways to add claimrules, but deleting is the simplest part. First, list the claimrules to find the rule ID.

~ # esxcli corestorage claimrule list
Rule Class Rule Class Type Plugin Matches
MP 0 runtime transport NMP transport=usb
MP 1 runtime transport NMP transport=sata
MP 2 runtime transport NMP transport=ide
MP 3 runtime transport NMP transport=block
MP 4 runtime transport NMP transport=unknown
MP 101 runtime vendor MASK_PATH vendor=DELL model=Universal Xport
MP 101 file vendor MASK_PATH vendor=DELL model=Universal Xport
MP 2001 runtime location MASK_PATH adapter=vmhba3 channel=0 target=0 lun=22
MP 2001 file location MASK_PATH adapter=vmhba3 channel=0 target=0 lun=22
MP 65535 runtime vendor NMP vendor=* model=*


Observe rule 2001 above; this was the rule I added to mask a device path.

Delete the rule by running the following esxcli commands:

~ # esxcli corestorage claimrule delete -r 2001
~ # esxcli corestorage claiming unclaim -t path --path vmhba3:C0:T0:L22
~ # esxcli corestorage claimrule load
~ # esxcli corestorage claimrule run

A sample walkthrough of masking a path is also available in a VMware KB article here.

Thursday, 14 April 2011

Support for SSD on ESX4.0 and ESX4.1

With a lot of enterprise SSDs on the market today, one question on storage admins' minds is:
what performance benefits do I get if I connect SSDs to my ESX servers?

Sadly, ESX 4.0 and ESX 4.1 do not distinguish between a LUN backed by SATA disks and one backed by SSDs. Hence, when SSD-backed LUNs are presented to an ESX server, ESX makes no effort either to exploit the high-performance SSDs or to communicate to VC that the LUNs presented by the array are actually backed by SSDs.

The presence of SSDs could help servers in many ways.
ESX could use SSDs to store VM-related data such as swap files, which would essentially improve VM performance. Another use could be improving the boot time of ESX.

With SSD support in ESX, the use cases in a virtualized environment would be immense. It is just a matter of time before everyone sees what VMware has to offer around SSDs.


Where did my ESX boot from?

Not many might have asked themselves this question, but I have.
While testing multiple servers booted with ESX Visor, thin installer, stateless, classic, and sometimes booted from SAN, it really matters to know where your system booted from.

The command to your rescue is esxcfg-info -b


~ # esxcfg-info -b
49545178-05741713-8d31-0df7cceac42b

Another option that is useful while dealing with multiple flavors of ESX is
~ # esxcfg-info -e
boot type: visor-thin

Storage debugging on ESX4.0 and ESX4.1 using "esxcfg-mpath"


One of the best esxcfg commands, which I use most often in my ESX 4.0 and ESX 4.1 environment, is esxcfg-mpath. Though you will find a lot of variants in VMware's future implementations, the use cases of this command are nevertheless plentiful.

For instance, if you want to know the path status of the devices in your ESX setup, you can use:
esxcfg-mpath -b -d naa.600a0b8000479284000004e94c8dc18c
naa.600a0b8000479284000004e94c8dc18c : IBM Fibre Channel Disk (naa.600a0b8000479284000004e94c8dc18c)
   vmhba2:C0:T0:L100 LUN:100 state:active fc Adapter: WWNN: 20:00:00:00:c9:6f:06:f0 WWPN: 10:00:00:00:c9:6f:06:f0  Target: WWNN: 20:04:00:a0:b8:26:4e:1a WWPN: 20:34:00:a0:b8:26:4e:1a
   vmhba3:C0:T0:L100 LUN:100 state:standby fc Adapter: WWNN: 20:00:00:00:c9:6f:06:f1 WWPN: 10:00:00:00:c9:6f:06:f1  Target: WWNN: 20:04:00:a0:b8:26:4e:1a WWPN: 20:35:00:a0:b8:26:4e:1a

This gives you not only the number of redundant paths to the device but also the status of those paths. Depending on your array type and path health, the possible path states are Active, Standby, and Dead.

If you ever want to know the protocol through which ESX is accessing a device, the command below is the right choice. Observe the Transport field indicating "fc":
it shows that the device is a Fibre Channel device. If it were SAS, IDE, or SATA, you would see sas, ide, or sata respectively.
esxcfg-mpath -l -d naa.600a0b80002a071c0000499249cc2130
fc.20000000c96f06f0:10000000c96f06f0-fc.200400a0b8264e1a:203400a0b8264e1a-naa.600a0b80002a071c0000499249cc2130
   Runtime Name: vmhba2:C0:T0:L31
   Device: naa.600a0b80002a071c0000499249cc2130
   Device Display Name: IBM Fibre Channel Disk (naa.600a0b80002a071c0000499249cc2130)
   Adapter: vmhba2 Channel: 0 Target: 0 LUN: 31
   Adapter Identifier: fc.20000000c96f06f0:10000000c96f06f0
   Target Identifier: fc.200400a0b8264e1a:203400a0b8264e1a
   Plugin: NMP
   State: active
   Transport: fc
   Adapter Transport Details: WWNN: 20:00:00:00:c9:6f:06:f0 WWPN: 10:00:00:00:c9:6f:06:f0
   Target Transport Details: WWNN: 20:04:00:a0:b8:26:4e:1a WWPN: 20:34:00:a0:b8:26:4e:1a


The above command provides complete details of the device's paths.
Depending on the number of paths, the information spans multiple lines.
If you want consolidated details, use:

 esxcfg-mpath -L -d naa.600a0b80002a071c0000499249cc2130
vmhba2:C0:T0:L31 state:active naa.600a0b80002a071c0000499249cc2130 vmhba2 0 0 31 NMP active san fc.20000000c96f06f0:10000000c96f06f0 fc.200400a0b8264e1a:203400a0b8264e1a
vmhba3:C0:T0:L31 state:active naa.600a0b80002a071c0000499249cc2130 vmhba3 0 0 31 NMP active san fc.20000000c96f06f1:10000000c96f06f1 fc.200400a0b8264e1a:203500a0b8264e1a


You can even find the total number of paths to all your devices. Just run:
 esxcfg-mpath -m | wc -l
18
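Since the -L output packs one path per line, it is easy to post-process. Below is a sketch that tallies paths per state; the sample lines are captured from the output above, whereas on a live host you would pipe esxcfg-mpath -L in directly.

```shell
#!/bin/sh
# Sketch: tally path states from esxcfg-mpath -L style output.
# Field 2 of each line has the form "state:<state>".
count_path_states() {
    awk '{ split($2, a, ":"); states[a[2]]++ }
         END { for (s in states) print s, states[s] }'
}

# Captured sample (two active paths to one device):
count_path_states <<'EOF'
vmhba2:C0:T0:L31 state:active naa.600a0b80002a071c0000499249cc2130 vmhba2 0 0 31 NMP active san fc.20000000c96f06f0:10000000c96f06f0 fc.200400a0b8264e1a:203400a0b8264e1a
vmhba3:C0:T0:L31 state:active naa.600a0b80002a071c0000499249cc2130 vmhba3 0 0 31 NMP active san fc.20000000c96f06f1:10000000c96f06f1 fc.200400a0b8264e1a:203500a0b8264e1a
EOF
# prints: active 2
```

A sudden jump in the standby or dead count across all devices is a quick hint of a fabric or SP problem.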


Tuesday, 12 April 2011

Enabling Tech Support mode for ESX41

Follow the directions given in the above video to enable SSH login. This opens up the SSH port and lets you run esxcli commands directly on the ESXi console.
More details on this are available here.