Saturday, 22 September 2012

How to Tag/Enable a device as SSD in ESXi.

ESXi 5.0 supports SSDs. What does that mean?
You might remember some hints about upcoming SSD support in earlier posts on this blog. Yes, ESXi now officially supports SSDs. From vSphere 5.0 onwards you can plug an SSD into the host and configure ESXi to use it for caching or for storing VM swap files. To use the SSD for caching, create a datastore on the device and then select what percentage of its space (1-100%) ESXi may use for caching. You should see better performance than with earlier releases.

That said, here is the catch. Sometimes ESXi cannot recognize a disk as SSD even though it is one. There is a reason for that: if the disk is attached to a RAID controller, some controllers do not give ESXi access to the mode page that tells it whether the device behind the RAID is an SSD or not. For cases where the device does not show up as SSD, there is a provision to tag (enable) the disk as SSD: we can add claim rules in ESXi for a disk that we are sure is an SSD. You can also mark a magnetic disk as SSD with this method for testing.

This can be done with a PSA (SATP) claim rule:

# esxcli storage nmp satp rule add --satp VMW_SATP_ALUA_CX -t vendor -V DGC -M "RAID 0" --option="enable_ssd"

--satp = the SATP that claims the device

Note: you can find the SATP of a device with
"esxcli storage nmp device list"
vendor -- vendor of the device
model  -- model of the device
Besides vendor/model, ESXi 5.x offers other ways to match devices in SATP claim rules,
such as device, transport, etc.
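
As far as I know, the complete tagging workflow has three steps: add the SATP rule, reclaim the device so that the rule is applied, and check the "Is SSD" field. Here is a minimal sketch of it as a shell function. The function name tag_as_ssd and its arguments are my own for illustration, the device ID in the example call is a placeholder, and the sketch assumes the device is not in use (the reclaim fails on a busy device).

```shell
#!/bin/sh
# Sketch only: tag one device as SSD through a per-device SATP rule.
# tag_as_ssd is a hypothetical helper name, not an esxcli command.
#   $1 = SATP that currently claims the device (see "esxcli storage nmp device list")
#   $2 = device identifier, e.g. naa.xxxx
tag_as_ssd() {
    satp="$1"
    device="$2"
    if [ -z "$satp" ] || [ -z "$device" ]; then
        echo "usage: tag_as_ssd <satp> <device>" >&2
        return 2
    fi
    # 1. Add a rule that matches this one device and carries the enable_ssd option.
    esxcli storage nmp satp rule add --satp "$satp" --device "$device" --option=enable_ssd || return 1
    # 2. Reclaim the device so that the new rule is applied to it.
    esxcli storage core claiming reclaim --device "$device" || return 1
    # 3. Show the "Is SSD" field so the result can be checked by eye.
    esxcli storage core device list --device "$device" | grep "Is SSD"
}

# Example call (placeholder device ID):
# tag_as_ssd VMW_SATP_LOCAL naa.xxxx
```

I match on the device ID here rather than vendor/model so that only the one disk is affected; a vendor/model rule like the one above tags every matching device.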



Shell script to see module parameters in ESXi

This is a simple shell script that lists the parameters each module loaded in ESXi accepts from the user.

#!/bin/sh
# Write the parameters accepted by every loaded module to param.txt.
# The first two lines of "esxcli system module list" are the table header, so skip them.
esxcli system module list | awk 'NR > 2 {print $1}' | while read -r module
do
    echo "$module"
    echo "**************************************************************************************************" >> param.txt
    echo "$module : " >> param.txt
    # </dev/null keeps esxcli from reading the module names piped into the loop.
    esxcli system module parameters list -m "$module" </dev/null >> param.txt
done

Wednesday, 25 April 2012

Why is the device status on ESX shown as "degraded"?


Today I would like to touch on one field in the output of "esxcli storage core device list": the "Status" of the device.

~ # esxcli storage core device list
naa.90090160c1e01de150060160c1e01de1
   Display Name: DGC iSCSI Disk (naa.90090160c1e01de150060160c1e01de1)
   Has Settable Display Name: true
   Size: 0
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/naa.90090160c1e01de150060160c1e01de1
   Vendor: EFG
   Model: ABD
   Revision: 0326
   SCSI Level: 4
   Is Pseudo: true
   Status: degraded
   Is RDM Capable: true
   Is Local: false
   Is Removable: false

This field is reserved for indicating the path status of the device. When the device has more than one path to the target (storage array), the device status is "on". When all the paths to the target are down (either off or dead), the device status is "dead". If there is only one path to the target, the status is "degraded".

Also note that when the device is in PDL (Permanent Device Loss; this scenario arises when a device is unmapped from the storage array while ESX was using it), the status of the device is "not connected".

If ESX fails to recognize the state of the device (none of the scenarios above apply), the device status is "unknown".
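
With many devices the full listing gets long, so here is a small sketch that reduces it to one "device status" line per device. The helper name device_status is my own; it only assumes the layout shown in the output above (an unindented device ID followed by indented fields).

```shell
#!/bin/sh
# Sketch: reduce "esxcli storage core device list" output to "<device> <status>" lines.
# device_status is a hypothetical helper name; it reads the listing on stdin.
device_status() {
    awk '
        /^[^ \t]/ { dev = $1 }                 # unindented line = device identifier
        /^[ \t]+Status:/ {                     # indented Status field of that device
            sub(/^[ \t]+Status:[ \t]*/, "")    # keep only the value (may contain spaces)
            print dev, $0
        }
    '
}

# On an ESXi host:
# esxcli storage core device list | device_status
# Show only devices whose status is something other than "on":
# esxcli storage core device list | device_status | grep -v ' on$'
```

The value is taken as everything after "Status:" so that a two-word status such as "not connected" is not cut short.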

Permanent Device Loss - Planned/Unplanned.



Planned versus unplanned PDL

A planned PDL occurs when there is an intent to remove a device presented to the ESXi host. The datastore must first be unmounted, then the device needs to be detached before the storage device is unpresented at the storage array.
1.  esxcli storage filesystem list --> get the VMFS datastore name
2.  esxcli storage filesystem unmount -l Data_Store_Name --> unmount the datastore
3.  esxcli storage core device set --state=off -d naa.xxxx --> set the device state to off (detach) so that the device can be removed safely.
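
The order matters here: the detach should only happen once the unmount has succeeded. A minimal sketch of steps 2 and 3 with that check built in is below. The function name planned_removal is my own, and the datastore label and device ID are whatever step 1 shows on your host.

```shell
#!/bin/sh
# Sketch only: planned removal of a device, following the steps above.
# planned_removal is a hypothetical helper name.
#   $1 = datastore label (from "esxcli storage filesystem list")
#   $2 = device identifier, e.g. naa.xxxx
planned_removal() {
    label="$1"
    device="$2"
    if [ -z "$label" ] || [ -z "$device" ]; then
        echo "usage: planned_removal <datastore_label> <device>" >&2
        return 2
    fi
    # Unmount first; stop here if the datastore is still in use.
    if ! esxcli storage filesystem unmount -l "$label"; then
        echo "unmount of $label failed, device left attached" >&2
        return 1
    fi
    # Detach the device only after the unmount succeeded.
    esxcli storage core device set --state=off -d "$device"
}

# Example call (placeholder names):
# planned_removal Data_Store_Name naa.xxxx
```

Only after this returns successfully should the LUN be unpresented on the array.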

An unplanned PDL occurs when the storage device is unexpectedly unmapped from the storage array without the unmount and detach having been executed on the ESXi host. To clean up after an unplanned PDL:
  1. Power off all running virtual machines on the datastore and unregister them from the affected datastore. An easy way to power them off is to log in to the ESXi host and kill the vmx process of each VM.
  2. From the vSphere Client, go to the Configuration tab of the ESXi host, click Storage.
  3. Right-click on the datastore being removed, and choose Unmount.

    A Confirm Datastore Unmount window displays. When the prerequisite criteria have been passed, the OK button appears.
  4. Perform a rescan on all of the ESXi hosts that had visibility to the LUN.
  5. After the device is removed, its paths still show as "dead". To remove the dead paths, rescan with "esxcfg-rescan -d vmhbaxxx" or "esxcli storage core adapter rescan --adapter vmhbaxxx".

    Note: If there are active references to the device or pending I/O, the ESXi host still lists the device after the rescan.

Use of the -b|--boot option in esxcli storage nmp satp rule remove.


The -b (--boot) option is used to remove system SATP rules (the default SATP rules) so that user-defined rules can be added in their place.
You cannot add a SATP rule with the same vendor/model as a default SATP rule, because ESXi does not allow duplicate rules. So if you still want a rule with the same vendor/model but with extra options, you first have to delete the system SATP rule.

~ # esxcli storage nmp satp rule remove
Error: Missing required parameter -s|--satp

Usage: esxcli storage nmp satp rule remove [cmd options]

Description:
  remove                Delete a rule from the list of claim rules for the given SATP.

Cmd options:
  -b|--boot             This is a system default rule added at boot time. Do not modify esx.conf or add to host profile.

How to find supported esxcli commands in ESXi 5.0

To list all the esxcli commands supported in ESXi 5.0:

~ # localcli esxcli command list
Namespace                                Command
--------------------------------------------------
storage.core.device                      setconfig
storage.core.device.smart                get
storage.core.device.stats                get
storage.core.plugin.registration         remove
storage.nfs                              list
storage.nmp.psp.generic.deviceconfig     set
storage.nmp.psp                          list
storage.nmp.psp.roundrobin.deviceconfig  get
storage.nmp.psp.roundrobin.deviceconfig  set
storage.nmp.satp.generic.deviceconfig    get
storage.nmp.satp.generic.deviceconfig    set
storage.nmp.satp.generic.pathconfig      get
storage.nmp.satp.generic.pathconfig      set
storage.nmp.satp                         list

....etc....
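
The full list is long, so it helps to filter it by namespace. Here is a small sketch that takes the two-column listing on stdin and prints ready-to-type command lines for one namespace prefix. The helper name commands_under is my own, and it assumes the Namespace/Command layout shown above.

```shell
#!/bin/sh
# Sketch: turn the Namespace/Command table into command lines
# for one namespace prefix. commands_under is a hypothetical helper name.
#   $1 = namespace prefix, e.g. storage.nmp
commands_under() {
    awk -v ns="$1" '
        index($1, ns) == 1 && NF >= 2 {   # namespace column starts with the prefix
            gsub(/\./, " ", $1)           # storage.nmp.psp -> storage nmp psp
            print "esxcli", $1, $2
        }
    '
}

# On an ESXi host:
# localcli esxcli command list | commands_under storage.nmp
```

The header and separator lines drop out on their own because neither starts with the prefix.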

Friday, 20 May 2011

Unclaiming a device from ESX.


The need to unclaim an ESX device usually arises when you want to change the plugin that claims the device or the paths to the device. For example, if you want to mask a device, you first add the new claim rules and then unclaim the device from the plugin that currently owns it.

Note that unclaims based on path, adapter, plugin and so on succeed only when the device is free. In other words, the device must not be actively servicing I/O. If VMs are powered on, or I/O is being issued to an RDM disk, the command is bound to fail. Unclaim often fails on local disks, because the scratch partition and dump partition may be configured on them.

There are different ways to unclaim a device.
You can unclaim on a per-device basis as follows:
~ # esxcli corestorage claiming unclaim -t device --device naa.6009999999999284000064c349cc3cd9

Devices can also be unclaimed by vendor name.
~ # esxcli corestorage claiming unclaim -t vendor --vendor IBM

In ESX you can unclaim based on path too.
~ # esxcli corestorage claiming unclaim -t path --path vmhba2:C0:T0:L111

Less commonly used variants are:
Driver-based unclaim.
~ # esxcli corestorage claiming unclaim -t driver --driver qla2xxx

Plugin-based unclaim.
You can also unclaim devices by the name of the plugin that owns them.
~ # esxcli corestorage claiming unclaim -t plugin --plugin MASK_PATH

If all these options are hard to remember, you can try to unclaim all the devices in ESX.
ESX will try to unclaim every device that is not busy. Note that this command returns a "device busy" message in most cases, because it also tries to unclaim the local disk, where the swap, dump and scratch partitions may be configured.
~ # esxcli corestorage claiming unclaim -t location
Errors:
Unable to perform unclaim.  Error message was : Unable to unclaim paths.  Busy or in use devices detected.  See VMkernel logs for more information.

After unclaiming, do not forget to load and run the new claim rules. The load and run operations read the /etc/vmware/esx.conf file and apply the claim rules to the unclaimed devices.
~ # esxcli corestorage claimrule load
~ # esxcli corestorage claimrule run
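
To tie the pieces together, here is a rough sketch of the masking example mentioned at the start of this post: add a MASK_PATH rule for one location, unclaim that location, then load and run the rules. The function name mask_location is my own, the rule number in the example call is an arbitrary choice, and the "claimrule add" options are written from memory of the ESX 4.x syntax, so check them with "esxcli corestorage claimrule add --help" before relying on this.

```shell
#!/bin/sh
# Sketch only: mask one path by location, then re-run the claim rules.
# mask_location is a hypothetical helper name; the rule number is chosen by the caller.
#   $1 = rule number, $2 = adapter, $3 = channel, $4 = target, $5 = LUN
mask_location() {
    if [ $# -ne 5 ]; then
        echo "usage: mask_location <rule> <adapter> <channel> <target> <lun>" >&2
        return 2
    fi
    # Add a rule that hands this location to the MASK_PATH plugin.
    esxcli corestorage claimrule add --rule "$1" -t location -A "$2" -C "$3" -T "$4" -L "$5" -P MASK_PATH || return 1
    # Release the path from the plugin that owns it now (fails if the device is busy).
    esxcli corestorage claiming unclaim -t location -A "$2" -C "$3" -T "$4" -L "$5" || return 1
    # Load the rules from the configuration file and apply them to unclaimed paths.
    esxcli corestorage claimrule load || return 1
    esxcli corestorage claimrule run
}

# Example call, using the path vmhba2:C0:T0:L111 from above and an arbitrary rule number:
# mask_location 120 vmhba2 0 0 111
```

Each step stops the sequence on failure, so a busy device leaves the new rule added but not yet applied.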