This post briefly describes what OFED is and then how to install it. The installation is needed for both Intel True Scale InfiniBand and Mellanox InfiniBand adapters.
The OpenFabrics Enterprise Distribution (OFED) is a collection of InfiniBand hardware diagnostic utilities, the InfiniBand fabric management daemon, the InfiniBand kernel module loader, as well as libraries and development packages for writing applications that use Remote Direct Memory Access (RDMA) technology.
What is RDMA:
Put simply, RDMA is a technology that lets data be transferred from one machine to another with much less work done by the CPUs of either system.
Our Scenario
We have True Scale InfiniBand in the master node and in all compute nodes. Keep in mind that the following procedure is much the same for Mellanox InfiniBand. I will go through the Mellanox OFED installation in the next step.
Installation/Configuration (Intel True Scale)
a. On the master server
First, we can check which type of InfiniBand adapter is in the system:
[root@qingcl-master ~]# lspci | grep -i infiniband
82:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)
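The same check can be scripted so that a provisioning script picks the matching installer. This is only a minimal sketch: the function name detect_ib_vendor and the labels it prints are hypothetical, and the patterns simply match the vendor strings that lspci prints for the two adapter families covered in this post.

```shell
# detect_ib_vendor: read `lspci` output on stdin and print which
# InfiniBand family was found. The labels are illustrative only.
# If both vendors are present, the QLogic/True Scale match wins.
detect_ib_vendor() {
    lspci_out=$(cat)
    if printf '%s\n' "$lspci_out" | grep -qi 'qlogic'; then
        echo "truescale"
    elif printf '%s\n' "$lspci_out" | grep -qi 'mellanox'; then
        echo "mellanox"
    else
        echo "unknown"
    fi
}

# Usage on a real node:
#   lspci | detect_ib_vendor
```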
What we need here is the Intel True Scale Fabric Host Channel Adapter Host Drivers package,
in the version that matches our CentOS 7.2 system. We download it from:
Then untar it (tar -xvf IntelIB….), change into the extracted directory, and run the following command:
./INSTALL
I chose the following options for my case:
Please Select Install Action (screen 1 of 3):
0) OFED IB Stack [Install][Available] 3.18.1.21
1) True Scale HCA Libs [ Install ][Available] 3.3.0.0.75029
2) OFED mlx4 Driver [Don’t Install][Available] 3.18.1.21
3) IB Tools [ Install ][Available] 7.4.1.0.5
4) OFED IB Development [ Install ][Available] 3.18.1.21
5) FastFabric [Don’t Install][Not Avail]
6) OFED IP over IB [ Install ][Available] 3.18.1.21
7) OFED IB Bonding [Don’t Install][Not Avail]
8) OFED SDP [ Install ][Available] 3.18.1.21
9) IFS FM [Don’t Install][Not Avail]
a) MVAPICH (gcc) [Don’t Install][Available] 3.18.1.21
b) MVAPICH2 (gcc) [Don’t Install][Available] 3.18.1.21
c) OpenMPI (gcc) [Don’t Install][Available] 3.18.1.21
d) MVAPICH/PSM (gcc) [Don’t Install][Available] 1.2.0-3635
Please Select Install Action (screen 2 of 3):
0) MVAPICH/PSM (PGI) [Don’t Install][Available] 1.2.0-3635
1) MVAPICH/PSM (Intel) [Don’t Install][Available] 1.2.0-3635
2) MVAPICH2/PSM (gcc) [Don’t Install][Available] 2.1-1
3) MVAPICH2/PSM (PGI) [Don’t Install][Available] 2.1-1
4) MVAPICH2/PSM (Intel)[Don’t Install][Available] 2.1-1
5) OpenMPI/PSM (gcc) [Don’t Install][Available] 1.10.0-1
6) OpenMPI/PSM (PGI) [Don’t Install][Available] 1.10.0-1
7) OpenMPI/PSM (Intel) [Don’t Install][Available] 1.10.0-1
8) SHMEM [Install][Available] 3.3-75029.1218_rhel7_qlc
9) MPI Source [Install][Available] 3.18.1.21
a) OFED uDAPL [Install][Available] 3.18.1.21
b) OFED RDS [Install][Available] 3.18.1.21
c) OFED SRP [Install][Available] 3.18.1.21
d) OFED SRP Target [Don’t Install][Available] 3.18.1.21
We chose the options above and hit () to start the installation. In our case it stopped because of missing dependencies, so I installed the following packages:
yum install tcl tk
Afterward it stopped again because of another missing dependency, glibc.i686. It is important to understand that even though our system is x86_64, we still need to install glibc.i686:
yum install glibc.i686
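Since the installer stops at each missing dependency, it can save a round trip to check for them up front. The following is a minimal sketch, assuming an RPM-based system such as the CentOS node used here. The helper name missing_packages is hypothetical, and the package list is just the three packages that were missing in this case; your system may need others.

```shell
# missing_packages: print every package name given as an argument
# that `rpm -q` does not report as installed.
missing_packages() {
    for pkg in "$@"; do
        if ! rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg"
        fi
    done
}

# Usage (as root, before starting ./INSTALL):
#   missing=$(missing_packages tcl tk glibc.i686)
#   [ -n "$missing" ] && yum install -y $missing
```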
After that, we got the following error:
Uninstalling previous OFED RPMs
error: Failed dependencies:
libibverbs.so.1()(64bit) is needed by (installed) qemu-kvm-10:1.5.3-105.el7_2.4.x86_64
libibverbs.so.1(IBVERBS_1.0)(64bit) is needed by (installed) qemu-kvm-10:1.5.3-105.el7_2.4.x86_64
libibverbs.so.1(IBVERBS_1.1)(64bit) is needed by (installed) qemu-kvm-10:1.5.3-105.el7_2.4.x86_64
librdmacm.so.1()(64bit) is needed by (installed) qemu-kvm-10:1.5.3-105.el7_2.4.x86_64
librdmacm.so.1(RDMACM_1.0)(64bit) is needed by (installed) qemu-kvm-10:1.5.3-105.el7_2.4.x86_64
Unable to uninstall previous OFED RPMs
Unable to Prepare OFED IB Stack for Install
Hit any key to continue…
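The installer cannot uninstall the distribution's libibverbs and librdmacm because the installed qemu-kvm package depends on them. Just to give you a clue how to solve this kind of issue: find out which package provides the library, then remove the package that depends on it (here qemu-kvm, assuming you do not need it on this node):
yum provides libibverbs.so.1
yum install libibverbs-1.1.8-8.el7.i686
yum remove qemu-kvm-10:1.5.3-105.el7_2.4.x86_64
Then proceed with the installation.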
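The package that blocks the uninstall can be read straight from the error text: it is the last field of every "is needed by (installed)" line. A minimal sketch of extracting it follows; the function name blocking_packages and the file name in the usage comment are hypothetical.

```shell
# blocking_packages: read rpm's "Failed dependencies" text on stdin and
# print the unique installed packages that still need the old libraries.
blocking_packages() {
    awk '/is needed by [(]installed[)]/ { print $NF }' | sort -u
}

# Usage, assuming the error text was saved to install-error.txt:
#   blocking_packages < install-error.txt
```

Whether a reported package can safely be removed depends on what the node is used for, so treat the output as a list to review rather than something to pass blindly to yum remove.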
The first thing we need to make sure of is that openibd is running without any problems.
[root@node601 ~]# systemctl status openibd
openibd.service – openibd – configure RDMA devices
Loaded: loaded (/usr/lib/systemd/system/openibd.service; enabled)
Active: active (exited) since Fri 2016-08-05 15:15:05 CEST; 4 months 4 days ago
Docs: file:/etc/infiniband/openib.conf
Main PID: 1013 (code=exited, status=0/SUCCESS)
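For use in a script, the same check can be reduced to an exit code instead of reading the status output by eye. This is a minimal sketch that relies on systemctl is-active, which reports a unit in the "active (exited)" state as active; the function name check_openibd is hypothetical.

```shell
# check_openibd: succeed only if systemd reports the openibd unit as active.
check_openibd() {
    if systemctl is-active --quiet openibd; then
        echo "openibd is active"
    else
        echo "openibd is NOT active; see: systemctl status openibd" >&2
        return 1
    fi
}

# Usage:
#   check_openibd || exit 1
```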
Installation/Configuration (Mellanox)
a. First, we can check which type of InfiniBand adapter is in the system:
[root@fadmin1 ~]# lspci | grep -i mellanox
02:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
b. As before, we need to download OFED from the Mellanox website for our OS distribution. We can also install it with a script (command), as I do here:
#install Mellanox OFED
cd /tmp/software/mellanox/ofed/MLNX_OFED_LINUX-3.4-2.0.0.0-rhel7.2-x86_64/
./mlnxofedinstall --force -c ofed-hossein.conf --skip-kmp-verify
I copied ofed-hossein.conf from the ../docs/conf directory that comes with the OFED package. You can change it based on your needs.
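Because the installer is run non-interactively with a custom configuration file, it can help to confirm that both the installer and the configuration file are in place before starting. The following is only a sketch: the function name preflight is hypothetical, and it assumes the configuration file has been placed in the unpacked OFED directory, as the -c argument in the command above implies.

```shell
# preflight: verify that the unpacked OFED directory contains an executable
# installer and a readable, non-empty package configuration file.
#   $1 = unpacked MLNX_OFED directory
#   $2 = configuration file name inside that directory
preflight() {
    ofed_dir=$1
    conf=$2
    if [ ! -x "$ofed_dir/mlnxofedinstall" ]; then
        echo "installer not found or not executable in $ofed_dir" >&2
        return 1
    fi
    if [ ! -s "$ofed_dir/$conf" ]; then
        echo "config file $conf is missing or empty in $ofed_dir" >&2
        return 1
    fi
    echo "ok"
}

# Usage with the paths from this post:
#   d=/tmp/software/mellanox/ofed/MLNX_OFED_LINUX-3.4-2.0.0.0-rhel7.2-x86_64
#   preflight "$d" ofed-hossein.conf && cd "$d" && \
#       ./mlnxofedinstall --force -c ofed-hossein.conf --skip-kmp-verify
```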
c. The openibd service should now be present, and we can check its status afterward:
[root@fadmin ~]# systemctl status openibd.service
● openibd.service – openibd – configure Mellanox devices
Loaded: loaded (/usr/lib/systemd/system/openibd.service; enabled; vendor preset: disabled)
Active: active (exited) since Sun 2017-02-19 12:00:48 CET; 1 weeks 2 days ago
Docs: file:/etc/infiniband/openib.conf
Process: 1067 ExecStart=/etc/init.d/openibd start bootid=%b (code=exited, status=0/SUCCESS)
Main PID: 1067 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/openibd.service
Feb 19 12:00:34 fadmin1 systemd[1]: Starting openibd – configure Mellanox devices…
Feb 19 12:00:38 fadmin1 openibd[1067]: Unloading HCA driver:[ OK ]
Feb 19 12:00:48 fadmin1 openibd[1067]: Loading HCA driver and Access Layer:[ OK ]
Feb 19 12:00:48 fadmin1 systemd[1]: Started openibd – configure Mellanox devices.