VMware SRP Guide

This wiki is meant to help people get an Infiniband SRP target working under RedHat/CentOS 7.3, serving a VMware ESXi 6.0.x SRP initiator (although the process should work under any RHEL / CentOS 7.x build).

This guide should cover all of the steps from compiling, installing, and configuring the Mellanox OFED drivers, to compiling and installing SCST, to adding ZFS on Linux (ZoL), and finally configuring SCST with all of the above, as well as the few ESXi 6 steps required to remove all of the 'inband' drivers, install the OFED drivers, and become an active SRP initiator.

Not all of these steps are required for everyone, but I'm sure *someone* will appreciate them all together in one place :)

For the purposes of this guide, the syntax assumes you are always logged in as 'root'.


VMware ESX 6.0.x SRP Initiator Setup

To be continued...

RedHat/CentOS 7.3 SRP Target Server Setup

These instructions are meant to be used with: SCST 3.2.x (latest stable branch), as well as Mellanox OFED Drivers 3.4.2 (latest at the time of writing).

They should be viable for Mellanox ConnectX-2/3/4 Adapters, with or without an Infiniband Switch.
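
As a quick optional sanity check (not one of the original steps, just a common way to confirm the HCA is at least visible on the PCI bus before installing anything):

[root@NAS01 ~]$ lspci | grep -i mellanox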

NOTE: All Infiniband connectivity requires a subnet manager functioning somewhere in the fabric. I will cover the very basics of this shortly, but the gist of it is: you want one (1) subnet manager configured and running. On this subnet manager you need to configure at least one 'partition'. This acts like an ethernet VLAN, except that Infiniband won't play nice without one. For the purpose of this guide you won't need more than one. But if you are already on top of managing your subnet manager and partitions, consider the pros/cons of creating one specifically for SRP traffic, segmenting it from IPoIB, and binding all of your SRP-only interfaces to that partition. (An illustrative example follows below.)
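
Purely as a hedged illustration (the file location, partition line, and service name below are typical opensm defaults, not values taken from this guide): if you choose to run the subnet manager on the target host itself, a single default partition and the opensm service that ships with the Mellanox OFED package look roughly like this.

[root@NAS01 ~]$ cat /etc/opensm/partitions.conf
Default=0x7fff, ipoib, defmember=full : ALL, ALL_SWITCHES=full, SELF=full;
[root@NAS01 ~]$ /etc/init.d/opensmd start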


The basic order you want to do things in is: install your base OS and update it to current. A Minimal OS installation is recommended. It is also highly recommended that, during OS installation, you do NOT add any of the Infiniband or iSCSI packages that come with the OS; I can't guarantee they won't get in the way somewhere down the line. Some development-type packages may show up as missing/required when making/installing; add them manually and retry the step (see the example below).
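
As a hypothetical starting point only (your exact list will vary; just add whatever the make steps complain about), the development-type packages that commonly turn up missing look something like:

[root@NAS01 ~]$ yum install gcc make perl rpm-build redhat-rpm-config kernel-devel-$(uname -r) -y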


Mellanox OFED Driver Download Page http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers

My Driver Direct Download Link for RHEL/CentOS 7.3 x64 (Latest w/ OFED) http://www.mellanox.com/page/mlnx_ofed_eula?mtag=linux_sw_drivers&mrequest=downloads&mtype=ofed&mver=MLNX_OFED-3.4-2.0.0.0&mname=MLNX_OFED_LINUX-3.4-2.0.0.0-rhel7.3-x86_64.tgz


Step 1: Install the prerequisite packages required by the Mellanox OFED Driver package

[root@NAS01 ~]$ yum install tcl tk -y


Step 2: Download the Mellanox OFED drivers (.tgz version for this guide), and put them in /tmp

[root@NAS01 ~]$ cd /tmp
[root@NAS01 ~]$ tar xvf MLNX_OFED_LINUX-3.4-2.0.0.0-rhel7.3-x86_64.tgz
[root@NAS01 ~]$ cd MLNX_OFED_LINUX-3.4-2.0.0.0-rhel7.3-x86_64

NOTE: If you just run the ./mlnxofedinstall script against the latest & greatest RedHat/CentOS kernel, you will fail 'later down the road' in the SCST installation, specifically in the ib_srpt module, which is required for this exercise.


Step 3: Initially run the mlnxofedinstall script with the --add-kernel-support flag (REQUIRED)

[root@NAS01 ~]$ ./mlnxofedinstall --add-kernel-support --without-fw-update

NOTE: This will actually take the installation package, and use it to rebuild an entirely new installation package, customized for your specific Linux kernel. Note the name and location of the new .tgz package it creates.


Step 4: Extract the new package that was just created, customized for your Linux kernel.

[root@NAS01 ~]$ cd /tmp/MLNX_OFED_LINUX-3.4-2.0.0.0-3.10.0-514.el7.x86_64/
[root@NAS01 ~]$ tar xvf MLNX_OFED_LINUX-3.4-2.0.0.0-rhel7.3-ext.tgz
[root@NAS01 ~]$ cd MLNX_OFED_LINUX-3.4-2.0.0.0-rhel7.3-ext

NOTE: In my example I'm using the RedHat 7.3 OFED driver, so my file names may differ from yours. Look for the -ext suffix before the .tgz extension.


Step 5: Now we can run the Mellanox OFED installation script

[root@NAS01 ~]$ ./mlnxofedinstall


Step 6: Validate the new Mellanox Drivers can Stop/Start

[root@NAS01 ~]$ /etc/init.d/openibd restart
Unloading HCA driver: [ OK ]
Loading HCA driver and Access Layer: [ OK ]

NOTE: If you get an error here about iSCSI or SRP being 'used', and the service doesn't automagically stop and start, then you have a conflict with an 'inband' driver. You should try to resolve that conflict before you move forward (a hypothetical check is sketched below).
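
A hedged sketch of chasing that down (the module names below are the usual in-box suspects, not necessarily what is loaded on your system): list the upper-layer Infiniband modules that are loaded, unload them, and retry the restart.

[root@NAS01 ~]$ lsmod | grep -E 'ib_srp|ib_iser|ib_isert'
[root@NAS01 ~]$ modprobe -r ib_isert ib_iser ib_srp
[root@NAS01 ~]$ /etc/init.d/openibd restart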

Step 7: Validate the new Mellanox Drivers using the supplied Self Test script

[root@NAS01 ~]$ hca_self_test.ofed

The validation output, for me, looks like:

---- Performing Adapter Device Self Test ----
Number of CAs Detected ................. 1
PCI Device Check ....................... PASS
Kernel Arch ............................ x86_64
Host Driver Version .................... MLNX_OFED_LINUX-3.4-2.0.0.0 (OFED-3.4-2.0.0): modules
Host Driver RPM Check .................. PASS
Firmware on CA #0 HCA .................. v2.10.0720
Host Driver Initialization ............. PASS
Number of CA Ports Active .............. 0
Error Counter Check on CA #0 (HCA)...... PASS
Kernel Syslog Check .................... PASS
Node GUID on CA #0 (HCA) ............... NA
------------------ DONE ---------------------


Step 8: Prepare to install the SCST Package

[root@NAS01 ~]$ yum install svn
[root@NAS01 ~]$ cd /tmp
[root@NAS01 ~]$ svn checkout svn://svn.code.sf.net/p/scst/svn/branches/3.2.x/ scst-svn

Step 9: Install the SCST Package

[root@NAS01 ~]$ cd /tmp/scst-svn/
[root@NAS01 ~]$ make 2perf
[root@NAS01 ~]$ cd scst
[root@NAS01 ~]$ make install
[root@NAS01 ~]$ cd ../scstadmin
[root@NAS01 ~]$ make install
[root@NAS01 ~]$ cd ../srpt
[root@NAS01 ~]$ make install
[root@NAS01 ~]$ cd ../iscsi-scst
[root@NAS01 ~]$ make install

Now, if the correct Mellanox OFED drivers are loaded with kernel support, and no conflicting 'inband' drivers got in the way, then the srpt install above should have created a module called ib_srpt. The whole trick for me in getting this setup right was understanding that the SCST make depends on the OFED package having been built with kernel support.

Step 10: Validate the correct ib_srpt.ko file is loaded for the module ib_srpt

[root@NAS01 ~]$ modinfo ib_srpt

Output should look like:

filename:       /lib/modules/3.10.0-514.el7.x86_64/extra/ib_srpt.ko
license:        Dual BSD/GPL
description:    InfiniBand SCSI RDMA Protocol target v3.2.x#MOFED ((not yet released))
author:         Vu Pham and Bart Van Assche
rhelversion:    7.3
srcversion:     D993FDBF1BE83A3622BF4CC
depends:        rdma_cm,ib_core,scst,mlx_compat,ib_cm,ib_mad
vermagic:       3.10.0-514.el7.x86_64 SMP mod_unload modversions
parm:           rdma_cm_port:Port number RDMA/CM will bind to. (short)
parm:           srp_max_rdma_size:Maximum size of SRP RDMA transfers for new connections. (int)
parm:           srp_max_req_size:Maximum size of SRP request messages in bytes. (int)
parm:           srp_max_rsp_size:Maximum size of SRP response messages in bytes. (int)
parm:           use_srq:Whether or not to use SRQ (bool)
parm:           srpt_srq_size:Shared receive queue (SRQ) size. (int)
parm:           srpt_sq_size:Per-channel send queue (SQ) size. (int)
parm:           use_port_guid_in_session_name:Use target port ID in the session name such that redundant paths between multiport systems can be masked. (bool)
parm:           use_node_guid_in_target_name:Use HCA node GUID as SCST target name. (bool)
parm:           srpt_service_guid:Using this value for ioc_guid, id_ext, and cm_listen_id instead of using the node_guid of the first HCA.
parm:           max_sge_delta:Number to subtract from max_sge. (uint)

i.e. /lib/modules/`uname -r`/extra/ib_srpt.ko  <-- where `uname -r` is whatever comes up for you. In my case, 3.10.0-514.el7.x86_64.

If it looks different, like the below example, you have a problem with an 'inband' driver conflict.

filename:       /lib/modules/3.10.0-514.el7.x86_64/extra/mlnx-ofa_kernel/drivers/infiniband/ulp/srpt/ib_srpt.ko
version:        0.1
license:        Dual BSD/GPL
description:    ib_srpt dummy kernel module
author:         Alaa Hleihel
rhelversion:    7.3
srcversion:     646BEB37C9062B1D74593ED
depends:        mlx_compat
vermagic:       3.10.0-514.el7.x86_64 SMP mod_unload modversions

If you have this problem, perform the following steps: manually remove the ib_srpt.ko file from whatever location it is in, reboot, re-make the SCST package per the above instructions, then re-check with modinfo ib_srpt (a hypothetical command sequence is sketched below).
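
A hedged sketch of that sequence, assuming the conflicting file sits at the path modinfo reported above (use whatever path modinfo shows on your system):

[root@NAS01 ~]$ rm /lib/modules/$(uname -r)/extra/mlnx-ofa_kernel/drivers/infiniband/ulp/srpt/ib_srpt.ko
[root@NAS01 ~]$ depmod -a
[root@NAS01 ~]$ reboot

After the reboot, repeat the Step 9 make/install, then run modinfo ib_srpt again.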

If you don't have this problem, you are getting close :) and have the foundations ready; we just need to set up /etc/scst.conf (a hypothetical sketch of that file follows).
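
To give a feel for what that file looks like, here is a minimal hypothetical sketch only; the device name, backing zvol path, and target name below are placeholders, not the configuration this guide actually builds up.

HANDLER vdisk_blockio {
        DEVICE disk01 {
                filename /dev/zvol/tank/vm01
        }
}

TARGET_DRIVER ib_srpt {
        TARGET fe80:0000:0000:0000:0002:c903:00a0:1234 {
                enabled 1
                LUN 0 disk01
        }
}

The real SRPT target name is whatever ib_srpt registers under /sys/kernel/scst_tgt/targets/ib_srpt/ once the module is loaded.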


Final Notes:

Working with the SCST Service:

To check the status of, stop, start, or restart the service:

[root@NAS01 ~]$ service scst status
[root@NAS01 ~]$ service scst stop
[root@NAS01 ~]$ service scst start
[root@NAS01 ~]$ service scst restart

You want to restart the service after making changes to /etc/scst.conf. Be aware that when restarting the service, there will be a temporary disruption in all SCST-presented traffic. The SCSI disks in ESX may disappear/reappear; this may take up to 60 seconds or so while the SRP discovery/login process happens. So if you have critical VMs/databases running that have strict time-outs, you want to plan accordingly.

I don't want to oversell the speed/flexibility of a quick 'add a LUN mapping' and restart. I've tested it with a Windows 2016 Server with Resource Monitor open, and you can see it hang for a bit, and then voilà, it's back and life is good. If that works for you, then don't worry about it.

However, if you make a mistake in your /etc/scst.conf, for example, and it takes you longer to get back up than you planned, well then, there's that...