xCAT Cluster Management

xCAT is open-source cluster management software, originally developed by IBM, used for the deployment and administration of Linux-based clusters. It can create and manage both diskless and diskful nodes.

[Figure: xCAT cluster overview — the management node running the xCAT daemon deploys the cluster]

As can be seen in the figure above, the management node (master node), which runs the xCAT daemon, is used for the complete deployment of the cluster. The default database is SQLite, but it can be changed to MySQL or PostgreSQL.

Understanding the Concepts

PXE (Preboot eXecution Environment) enables hands-free provisioning of nodes over the network. What we need here is a server (the PXE server) and a client, and the NICs of both must be PXE-capable (PXE is also enabled in the BIOS). The important point is that the server must run DHCP and TFTP services. When the client is powered on, it sends DHCPDISCOVER packets (this capability is part of the client's PXE firmware). The server answers with a DHCPOFFER containing network information (an IP address) and the address of the TFTP server (usually the same machine). Afterwards the Linux kernel and initrd are transferred via the TFTP protocol.
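For orientation, a minimal hand-written ISC dhcpd snippet of the kind a PXE server needs is shown below. This is illustrative only; xCAT generates the real /etc/dhcp/dhcpd.conf for us later, and the addresses simply match our internal network:

subnet 192.168.1.0 netmask 255.255.255.0 {
  range 192.168.1.50 192.168.1.70;   # dynamic pool for clients that are not yet known
  next-server 192.168.1.21;          # TFTP server the client should contact
  filename "pxelinux.0";             # bootloader file handed to the PXE client
}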

gPXE is another version of PXE with more features; it also allows nodes without built-in PXE support to boot from the network. The main difference for us here is that it adds several other transfer protocols besides TFTP, such as HTTP. In practice, HTTP is usually used here to transfer the kernel and initrd.

Xnba is a slightly improved variant of gPXE and is known as the xCAT network boot agent. It usually uses HTTP to transfer the kernel and initrd. It is important to note that old xCAT versions used plain PXE, which is why many documents mention a TFTP server on the master node. In newer versions that use Xnba, we do not need a TFTP server; by default the httpd daemon is installed on the master node instead. However, as I explain later, the directory holding the bootloader, the bootloader configuration files, the initrd and the osimage is still called /tftpboot/..., which should not cause confusion.

Diskless vs Diskful installation

In both cases the clients (nodes) get the kernel and initrd over the network from the master node (the PXE server). Let's assume we are using Xnba, so HTTP is used for transferring data between server and client. In both cases we have a directory on the master node, as follows:

[root@hrz-master nodes]# pwd
/tftpboot/xcat/xnba/nodes

Inside it there are several files, one per node to be deployed, for both diskless and diskful installation. These files define exactly what has to be loaded, in order, until the installation completes. Let's first have a look at the file for a diskless node:

[root@hrz-master nodes]# cat /tftpboot/xcat/xnba/nodes/node01
#!gpxe
#netboot centos7.2-x86_64-compute
imgfetch -n kernel http://${next-server}/tftpboot/xcat/osimage/centos7.2-x86_64-netboot-compute-ig/kernel
imgload kernel
imgargs kernel imgurl=http://${next-server}:80//install/netboot/centos7.2/x86_64/compute/rootimg.gz XCAT=${next-server}:3001 NODE=node01 FC=0  BOOTIF=01-${netX/machyp}
imgfetch http://${next-server}/tftpboot/xcat/osimage/centos7.2-x86_64-netboot-compute-ig/initrd-stateless.gz
imgexec kernel

As can be seen, three things are loaded: first the kernel, then the kernel arguments pointing to our root file system compressed as rootimg.gz, and then initrd-stateless.gz. In the diskless method, we install everything into the rootimg directory (/install/netboot/centos7.2/x86_64/compute/rootimg/) and it is loaded into the RAM of the booted node.

Now let's have a look at a diskful installation.

[root@hrz-master nodes]# cat hadoop-01
#!gpxe
#install centos7.2-x86_64-storage
imgfetch -n kernel http://${next-server}/tftpboot/xcat/osimage/centos7.2-x86_64-install-hort/vmlinuz
imgload kernel
imgargs kernel quiet inst.repo=http://${next-server}:80/install/centos7.2/x86_64 inst.ks=http://${next-server}:80/install/autoinst/hadoop-01 ip=dhcp  BOOTIF=01-${netX/machyp}
imgfetch http://${next-server}/tftpboot/xcat/osimage/centos7.2-x86_64-install-storage/initrd.img
imgexec kernel

First it loads vmlinuz, which is the Linux kernel executable. It is bootable, meaning it can load the entire OS into memory so the node becomes usable; executable simply means it can run as a program. The difference from the diskless kernel is that here vmlinuz is compressed and also bootable. The reason we need a bootable kernel here is that it will later be written to the hard disk (the /boot directory), so that the next time we boot the node from the hard disk it can load the whole system into memory. As can be seen, here we also use the HTTP protocol to transfer the data.

At the next stage it loads the whole OS tree from the /install/centos7.2/x86_64 directory. As we will see later, this directory is created by the copycds command. The important file, however, is the kickstart file, loaded from the /install/autoinst/ directory, which drives the automatic installation and configuration of the OS on the hard disk.

Diskless Deployment

A stateless node is defined as one whose "state" (configuration changes, software updates, etc.) is not stored persistently on local disk; it lives only temporarily in RAM. This is extremely useful in a cluster for the following reasons: all nodes have a much greater likelihood of staying consistent, and if the administrator suspects that a node is out of sync with the rest of the cluster, they can simply reboot it and know that it is back in its original state.

In our setup we have following design:

  • One Master node (Centos 7.2)

It has two SSDs configured as RAID 1 (hardware RAID configuration from LSI) for the OS, and a RAID 6 group providing 12 TB of space. We only have one RAID controller to serve both RAID groups.

It also has two Ethernet ports and one dedicated BMC (IPMI) port. Of the two Ethernet ports, we will use one for the external network (the login node IP address) and one for the internal network. Since we do not have enough ports here, the internal port will be used both for deployment (management) and for the IPMI network, by configuring an alias for the IPMI network. We should not confuse the dedicated IPMI port with the port that will carry the IPMI network (internal cluster).

  • 4 Compute nodes (planning to have also Centos 7.2)
  • Intel True Scale Fabric Edge Switch 12300
  • One Manageable Ethernet Switch

In our scenario, I will use Master node also as Login node. So basically since our cluster is quite small, I use one server to play both roles.

Steps need to be followed very carefully.

*************************************************************************

Stage 1: Master node installation

1. The two 1 TB disks are configured as RAID 1 in the RAID controller menu (hardware RAID configuration). We have an external LSI controller (not onboard).

2. Enter Centos 7.2 bootable USB and start installation.

Explanation: In case you don't know how to easily and quickly create a bootable USB, use the following command after downloading the CentOS 7.2 ISO file:

[root@hrz iso]# dd if=/home/hossein/iso/CentOS-7-x86_64-Everything-1511.iso  of=/dev/sdc bs=8M && sync

The last part, the sync, is very important; it makes sure everything is written to the USB stick. Please check the integrity of the USB (the second menu entry after booting from it) before starting the installation.

My recommendation is to choose GNOME Desktop and the following Add-Ons for the Selected Environment during the installation:

  • Development Tools
  • Compatibility Libraries
  • GNOME Applications
  • Security tools

I also recommend enabling kdump.

Explanation: kdump is a feature used to capture crash dumps when the kernel or system crashes. When we enable it, part of the RAM is reserved for this purpose, so when a kernel panic or system crash happens we can analyze the crash dump written to disk and figure out the root cause of the failure.
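Once the system is up, a quick check that kdump is actually active and that memory has been reserved can look like this (illustrative; kdump.service and the crashkernel= boot parameter are the standard CentOS 7 pieces):

[root@hrz-master ~]# systemctl status kdump.service
[root@hrz-master ~]# grep -o 'crashkernel=[^ ]*' /proc/cmdline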

3. We select to partition manually the Harddisk during the OS installation (Raid 1 which is /dev/sda). So we choose like this:

  •  512 MB for /boot with ext4
  • 25 GB for swap (2 times of RAM)
  • rest for / with xfs

and continue the installation and boot the system.

4. The first thing we do after the OS has booted is to configure the interfaces. We configure the interface used for the internal cluster network, from which the compute nodes will boot over the network.

[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-enp3s0f0

TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=no
PEERDNS=no
PEERROUTES=no
IPV4_FAILURE_FATAL=no
IPADDR=192.168.1.21
PREFIX=24
IPV6INIT=no
NAME=enp3s0f0
UUID=7f3925c2-25af-4cee-9c55-b7ba27466f18
DEVICE=enp3s0f0
ONBOOT=yes
NM_CONTROLLED=no

As can be seen, several lines have been added or modified. A few settings in the above configuration file are quite important.

Explanation:

The default route (gateway) is used for packets for which no more specific route can be determined. So basically, when the master node's routing table cannot find a matching destination for a packet, the default gateway is used. To find the default gateway on our master node we can use the 'route -n' command; the line with the 'UG' flag is the default route.
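Illustrative only (the gateway address depends on your external network); the default route is the line whose destination is 0.0.0.0 and whose flags include UG:

[root@hrz-master ~]# route -n | grep '^0.0.0.0'
0.0.0.0     <external-gateway-IP>     0.0.0.0     UG    ...    enp3s0f1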

DEFROUTE=yes means "make this interface the default gateway for us". We chose no here, since this is only the internal interface.

PEERDNS=no prevents /etc/resolv.conf from being modified by a DHCP server. I recommend setting this on all interfaces and instead defining /etc/resolv.conf manually.

If the public-facing NIC on your management node is configured by DHCP, you may want to set PEERDNS=no in the NIC's config file to prevent dhclient from rewriting /etc/resolv.conf. This would be important if you will be configuring DNS on the management node and want the management node itself to use that DNS. In this case, set PEERDNS=no in each /etc/sysconfig/network-scripts/ifcfg-* file that has BOOTPROTO=dhcp. On the other hand, if you do want dhclient to configure /etc/resolv.conf on your management node, then don't set PEERDNS=no in the NIC config files.

Since we want to use the same port also for the IPMI network, we can create an alias with a different IP address as follows:

[root@hrz-master network-scripts]# cp ifcfg-enp3s0f0 ifcfg-enp3s0f0-ipmi

[root@hrz-master network-scripts]# cat ifcfg-enp3s0f0-ipmi

TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=no
PEERDNS=no
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPADDR=192.168.2.21
PREFIX=24
IPV6INIT=no
NAME=enp3s0f0-ipmi
UUID=7f3925c2-25af-4cee-9c55-b7ba27466f18
DEVICE=enp3s0f0:0
ONBOOT=yes
NM_CONTROLLED=no

5. We disable NetworkManager so that the service does not come up on the master node at the next boot.

  • [root@localhost ~]# systemctl stop NetworkManager
  • [root@localhost ~]# systemctl disable NetworkManager

Explanation: NetworkManager is a service for providing detection and configuration for systems to automatically connect to network.

6. We also recommend to disable the selinux

[root@hrz-master ~]# cat /etc/sysconfig/selinux
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled

Explanation: SELinux is an access control security mechanism implemented in the kernel.
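The file above only takes effect at the next boot. To stop enforcement immediately without rebooting (standard SELinux tooling, shown as a quick sketch), we can switch to permissive mode and verify:

[root@hrz-master ~]# setenforce 0
[root@hrz-master ~]# getenforce
Permissive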

7. Then we change the hostname of the Master-node from localhost to intended name

a. First modify the /etc/hosts like this

  • [root@hrz-master ~]# cat /etc/hosts
  • 127.0.0.1 localhost
  • 192.168.1.21 hrz-master hrz-master.hpc.cluster

b. Second we modify the /etc/sysconfig/network

  • [root@hrz-master ~]# cat /etc/sysconfig/network
  • # Created by anaconda
  • NOZEROCONF=yes
  • HOSTNAME=hrz-master.hpc.cluster
  • NETWORKING=yes

The domain we have chosen here is hpc.cluster; you are of course free to pick your own.
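On CentOS 7 the hostname can also be set and verified with hostnamectl, which is an equivalent alternative to editing the files above (a sketch; the FQDN matches our chosen domain):

[root@hrz-master ~]# hostnamectl set-hostname hrz-master.hpc.cluster
[root@hrz-master ~]# hostnamectl status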

We reboot the system at the next step and it must come up with the new name.

Explanation: Every time the system boots, the zeroconf route (169.254.0.0) is enabled. To disable the zeroconf route during system boot, we can put the NOZEROCONF=yes.

8. Installing the EPEL repository is useful.

  • [root@hrz-master ~]# yum install epel-release

9. Installing the xCAT latest version.

I believe the best and easiest way is to download the repository files and then install xCAT from them. You can download them from http://xcat.org. We need two files in our repository directory: xCAT-dep.repo and xCAT-core.repo.

[xcat-dep]
name=xCAT 2 dependencies
baseurl=http://xcat.org/files/xcat/repos/yum/xcat-dep/rh7/x86_64
enabled=1
gpgcheck=1
gpgkey=http://xcat.org/files/xcat/repos/yum/xcat-dep/rh7/x86_64/repodata/repomd.xml.key

The latest xCAT-core repo file can be found here: https://xcat.org/files/xcat/repos/yum/latest/xcat-core/xCAT-core.repo

So in the same location (/etc/yum.repos.d) we create a file called xCAT-core.repo and copy following as we have downloaded.

[xcat-2-core]
name=xCAT 2 Core packages
baseurl=http://xcat.org/files/xcat/repos/yum/latest/xcat-core
enabled=1
gpgcheck=1
gpgkey=http://xcat.org/files/xcat/repos/yum/latest/xcat-core/repodata/repomd.xml.key

and then install the xCAT like this:

  • xcat1 # yum clean metadata
  • xcat1 # yum install xCAT.x86_64
  • At the end we will see a message like the following:
  • Installed: xCAT.x86_64 0:2.12.1-snap201607070618

10. Now we need to add the xCAT paths to our default shell PATH.

Explanation: we can see our current path by following ways:

  • [root@hrz-master ~]# env

and there is a line like this:

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

or we can use following command:

  • [root@hrz-master ~]# echo "$PATH"
  • which shows only the PATH=/usr/… line.

PATH is an environment variable: a colon-delimited list of directories that your shell searches through when you enter a command. Executables are kept in various directories on Linux and Unix-like operating systems.

So, in order to add the needed xCAT directories to our shell (bash here), we use the source command with a script (/etc/profile.d/xcat.sh) that comes with the xCAT installation.

  • [root@hrz-master ~]# source /etc/profile.d/xcat.sh
  • [root@hrz-master ~]# echo "$PATH"
  • /opt/xcat/bin:/opt/xcat/sbin:/opt/xcat/share/xcat/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

So as can be seen the new xcat directories are added to our path.

11. So by now the xCAT service should be up and running. We can check it by:

  • [root@hrz-master ~]# systemctl status xcatd.service
  • ● xcatd.service – LSB: xcatd
  • Loaded: loaded (/etc/rc.d/init.d/xcatd)
  • Active: active (running) since Thu 2016-07-14 09:19:02 CEST; 46min ago
  • Docs: man:systemd-sysv-generator(8)
  • CGroup: /system.slice/xcatd.service
  • ├─14457 /usr/sbin/in.tftpd -v -l -s /tftpboot -m /etc/tftpmapfile4xcat.conf
  • ├─14458 xcatd: SSL listener
  • ├─14459 xcatd: DB Access
  • ├─14460 xcatd: UDP listener
  • ├─14461 xcatd: install monitor
  • ├─14462 xcatd: Discovery worker
  • └─14463 xcatd: Command log writer

Verify that it always comes up when the system boots:

xcat1 # chkconfig --list xcatd

xcatd 0:off 1:off 2:off 3:on 4:on 5:on 6:off

or simply by following command:

[root@hrz-master ~]# systemctl list-units | grep -i xcatd

12. We can install “Development tools” if not already installed. We already installed it during the OS installation.

  • [root@hrz-master ~]# yum groups list installed | grep -i "Development Tools"
  • Development Tools

13. By now you should have a directory /install that is used by xCAT. Next, we create the following directories for use with xCAT (with mkdir):

  • /install/iso
  • /install/custom
  • /install/software/rpm
  • /install/software/src

14. Creating netboot image: The following figure gives a good overview about the concept of creating a netboot image (diskless).

[Figure: overview of building a diskless (netboot) image with copycds, genimage and packimage]

In summary, to create the root image for diskless provisioning:

  •  Unpack the ISO image of the OS distribution using the xCAT copycds tool.
  • Generate a stateless root image for net-booting the operating system from the management node using the xCAT genimage tool.
  • Modify or install whatever we need inside the image.
  • Pack the stateless root image into a compressed file that the net-booted nodes fetch using the xCAT packimage tool.

So practically we do following steps:

a. First copy the iso of Centos 7.2 (the one which we installed OS from) to the /install/iso directory. Then we use following command.

  • [root@hrz-master iso]# copycds -n centos7.2 -a x86_64 CentOS-7-x86_64-Everything-1511.iso

Explanation: The copycds command copies the entire contents of the ISO to a destination directory, which can be specified with the -p option. If no path is specified, the default destination is formed from the installdir site attribute plus the distro name and architecture; here that is /install/centos7.2/x86_64/.

At this point we should have default osimage definitions created for netboot (stateless) and stateful installation, which can be seen with the following command:

[root@hrz-master x86_64]# lsdef -t osimage
centos7.2-x86_64-install-compute  (osimage)
centos7.2-x86_64-netboot-compute  (osimage)

Afterwards, we have the option to create a local repository if we need to install anything into the image. However, we prefer to use Internet repositories. If needed it can be done like this (I don't use this method here):

  • [root@hrz- master]# cat /etc/yum.repos.d/centos7-local.repo
  • [centos7-local]
  • name = Local repository for CentOS 7.x
  • baseurl = file:///install/centos7/x86_64
  • gpgcheck=0
  • enabled=1

There is an option inside the image settings (netboot) to choose what to include or exclude when the image is built. This is controlled by two files, called *.exlist and *.pkglist. Default versions come with the xCAT installation and we can copy them to our desired location, like this:

  • mkdir -p /install/custom/netboot/centos7.2
  • [root@hrz-master centos7.2]# cp /opt/xcat/share/xcat/netboot/centos/compute.centos7.* /install/custom/netboot/centos7.2

so now we have 3 files here which are:

  • [root@hrz-master centos7.2]# ls
  • compute.centos7.exlist compute.centos7.pkglist compute.centos7.postinstall

Now if we want to include or exclude more packages, we can do it here by editing the exlist or pkglist files. The default exlist file has some problems, and I advise checking it carefully so that nothing important gets excluded.

b. Now we want to create our own osimage definition specific to this project. The advantage is that we can create different OS images for different groups of nodes. The easiest and most efficient way is to first create a stanza file, modify it as shown below, and then create the new osimage definition from it.

The list of all osimage of the system (xcat by default):

  • [root@hrz-master centos7.2]# lsdef -t  osimage
  • centos7.2-x86_64-install-compute (osimage)
  • centos7.2-x86_64-netboot-compute (osimage)

Since we want to install the compute nodes in stateless mode, we need centos7.2-x86_64-netboot-compute (osimage). However the target is to create something like this but specific for our project (new name).

Explanation: the attributes of the image can be seen here:

  • [root@hrz-master centos7.2]# lsdef -t osimage -o centos7.2-x86_64-netboot-compute
  • Object name: centos7.2-x86_64-netboot-compute
  • exlist=/opt/xcat/share/xcat/netboot/centos/compute.centos7.exlist
  • imagetype=linux
  • osarch=x86_64
  • osdistroname=centos7.2-x86_64
  • osname=Linux
  • osvers=centos7.2
  • otherpkgdir=/install/post/otherpkgs/centos7.2/x86_64
  • pkgdir=/install/centos7.2/x86_64
  • pkglist=/opt/xcat/share/xcat/netboot/centos/compute.centos7.pkglist
  • postinstall=/opt/xcat/share/xcat/netboot/centos/compute.centos7.postinstall
  • profile=compute
  • provmethod=netboot
  • rootimgdir=/install/netboot/centos7.2/x86_64/compute

The above is the default; now we want to change it based on our needs. For example, we want to point exlist to our own directory (/install/custom/netboot/centos7.2).

So the first step is to create the default stanza file. We can use following command for that:

  • [root@hrz-master centos7.2]# lsdef -t osimage centos7.2-x86_64-netboot-compute -z > /install/custom/netboot/centos7.2/compute.centos7.stanza

-z writes stanza format. What happens here is that the attributes of the default image are written to a stanza file saved in /install/custom/netboot/centos7.2.

we can see the stanza file here:

  • [root@hrz-master centos7.2]# cat /install/custom/netboot/centos7.2/compute.centos7.stanza
  • centos7.2-x86_64-netboot-compute:
  • objtype=osimage
  • exlist=/opt/xcat/share/xcat/netboot/centos/compute.centos7.exlist
  • imagetype=linux
  • osarch=x86_64
  • osdistroname=centos7.2-x86_64
  • osname=Linux
  • osvers=centos7.2
  • otherpkgdir=/install/post/otherpkgs/centos7.2/x86_64
  • pkgdir=/install/centos7.2/x86_64
  • pkglist=/opt/xcat/share/xcat/netboot/centos/compute.centos7.pkglist
  • postinstall=/opt/xcat/share/xcat/netboot/centos/compute.centos7.postinstall
  • profile=compute
  • provmethod=netboot
  • rootimgdir=/install/netboot/centos7.2/x86_64/compute

As can be seen, the stanza file is exactly the same as the default osimage definition. So before creating our own osimage definition from it, we modify the stanza file based on our needs. The parts that need to change are exlist, pkglist and postinstall, plus the name of the osimage definition (with -ig appended). After editing it with vim it looks like this:

  • [root@hrz-master centos7.2]# cat /install/custom/netboot/centos7.2/compute.centos7.stanza
  • centos7.2-x86_64-netboot-compute-ig:
  • objtype=osimage
  • exlist=/install/custom/netboot/centos7.2/compute.centos7.exlist
  • imagetype=linux
  • osarch=x86_64
  • osdistroname=centos7.2-x86_64
  • osname=Linux
  • osvers=centos7.2
  • otherpkgdir=/install/post/otherpkgs/centos7.2/x86_64
  • pkgdir=/install/centos7.2/x86_64
  • pkglist=/install/custom/netboot/centos7.2/compute.centos7.pkglist
  • postinstall=/install/custom/netboot/centos7.2/compute.centos7.postinstall
  • profile=compute
  • provmethod=netboot
  • rootimgdir=/install/netboot/centos7.2/x86_64/compute

Now out of Osimage definition (stanza) that we created here, we can create a new osimage.

  • [root@hrz-master centos7.2]# cat compute.centos7.stanza | mkdef -z osimage=centos7.2-x86_64-netboot-compute-ig

Look at the name: we added -ig at the end. Now if we check the attributes of this new image, we have what we wanted:

  • [root@hrz-master centos7.2]# lsdef -t osimage -o centos7.2-x86_64-netboot-compute-ig
  • Object name: centos7.2-x86_64-netboot-compute-ig
  • exlist=/install/custom/netboot/centos7.2/compute.centos7.exlist
  • imagetype=linux
  • osarch=x86_64
  • osdistroname=centos7.2-x86_64
  • osname=Linux
  • osvers=centos7.2
  • otherpkgdir=/install/post/otherpkgs/centos7.2/x86_64
  • pkgdir=/install/centos7.2/x86_64
  • pkglist=/install/custom/netboot/centos7.2/compute.centos7.pkglist
  • postinstall=/install/custom/netboot/centos7.2/compute.centos7.postinstall
  • profile=compute
  • provmethod=netboot
  • rootimgdir=/install/netboot/centos7.2/x86_64/compute

The next step is to create the stateless image, using the genimage command that comes with xCAT. In the xCAT version used here, we just give it the name of the osimage definition; it reads all the information from that definition and builds the root image. The package directory it uses is taken from the osimage definition, pkgdir=/install/centos7.2/x86_64, i.e. the directory created by copycds in the previous step. So we run the following command:

  • [root@hrz-master rootimg]# genimage centos7.2-x86_64-netboot-compute-ig

Explanation: The genimage command must be run on a system that is the same architecture and same distro with same major release version as the nodes it will be used on. If the management node is not the same architecture or same distro level, copy the contents of /opt/xcat/share/xcat/netboot/ to a system that is the proper architecture, and mount /install from the management node to that system. Then change directory to /opt/xcat/share/xcat/netboot/ and run ./genimage.

The rootimage directory here is:

rootimgdir=/install/netboot/centos7.2/x86_64/compute

as can be seen in the osimage definition. Therefore after successful execution of above command, we will have a directory created (/install/netboot/centos7.2/x86_64/compute).

There are several things located inside this newly created directory which are:

  • [root@hrz-master compute]# ls
  • initrd-stateless.gz  initrd-statelite.gz  kernel  rootimg

As can be seen, running the above genimage command creates several things. The initial ramdisk initrd-stateless.gz and the kernel are the ones that matter for the stateless deployment of the nodes.

Explanation: The initial RAM disk (initrd) is an initial root file system that is mounted prior to when the real root file system is available. The initrd is bound to the kernel and loaded as part of the kernel boot procedure. The kernel then mounts this initrd as part of the two-stage boot process to load the modules to make the real file systems available and get at the real root file system. The initrd contains a minimal set of directories and executable to achieve this, such as the insmod tool to install kernel modules into the kernel.

Up to now, everything needed to prepare the root image for stateless deployment is done. Next we compress (pack) the root image with the packimage command. This tool compresses (with gzip) the contents of the root image, in our case /install/netboot/centos7.2/x86_64/compute/rootimg, and creates rootimg.gz in the compute directory. Before doing that, we can rsync our repository files from the master node into the image; we need them for installing packages inside the image:

[root@hrz-master ~]# rsync -av /etc/yum.repos.d/ /install/netboot/centos7.2/x86_64/compute/rootimg/etc/yum.repos.d/

(We can remove the xCAT repos from the image afterwards, as we don't need them there.)

  • [root@hrz-master compute]# packimage centos7.2-x86_64-netboot-compute-ig
  • Packing contents of /install/netboot/centos7.2/x86_64/compute/rootimg
  • compress method:gzip

With this, the first stage is finished and we continue with configuring the xCAT tables.

**********************************************************************

Stage 2: Configuration (tables)

xCAT keeps all the settings of the cluster in a database. Different database applications can be used, such as SQLite and PostgreSQL; the default is SQLite, which we use here.

The database consists of a series of tables, which makes managing it much easier. The tables can be edited manually with the tabedit command, or modified with other commands such as chtab.
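As a quick illustration of both styles (a sketch; tabdump prints a table in CSV form, and chtab changes a single row without opening an editor):

[root@hrz-master ~]# tabdump site
[root@hrz-master ~]# chtab key=domain site.value=hpc.cluster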

1. Site table: It is a table with Global settings for the whole cluster. By default, when we install the xCAT, it will fill most parts and we just need to take care of following parts:

  • [root@hrz-master ~]# tabedit site
  • "master","192.168.1.21",,
  • "domain","hpc.cluster",,
  • "forwarders","ExternalIPadd,192.168.4.50",,
  • "nameservers","192.168.1.21",,
  • "dhcpinterfaces","enp3s0f0",,

master is the IP address of the interface from which the cluster is booted and deployed, in our case enp3s0f0. forwarders are the IP addresses of the external DNS servers, and nameservers is the local DNS server, which will be our master node. xCAT automatically installs a DNS and a DHCP server for us, so there is no need for a separate installation, as I will explain later.

2. Networks table: I would say this is the most sensitive table; any minor mistake can cause huge problems. This table defines the whole cluster's networks and their important settings.

  • [root@hrz-master compute]# tabedit networks
"hpc-net","192.168.1.0","255.255.255.0","enp3s0f0","192.168.1.21","192.168.1.21","192.168.1.21","192.168.1.21","192.168.1.21","192.168.1.21","192.168.1.50-192.168.1.70","192.168.1.22-192.168.1.30","1",,"hpc.cluster",,"hpc.cluster",,
"ipmi-net","192.168.2.0","255.255.255.0",,"192.168.2.21","192.168.2.21",,,,,"192.168.2.50-192.168.2.70","192.168.2.22-192.168.2.30","1",,"hpc.cluster",,"hpc.cluster",,
"ib-net","192.168.3.0","255.255.255.0",,,,,,,,,,,,"hpc.cluster",,"hpc.cluster",,
"external","172.16.0.0","255.255.0.0","enp3s0f1",,,,,,,,,,,,,,,"1"

We can copy the above completely into our networks table. The important parts to understand are the dynamic range and the static range. The dynamic range is the range of IP addresses a compute node gets while it boots the genesis image; afterwards, based on its MAC address (or the switch table, in the case of automatic discovery), it receives a fixed IP address from the static range via DHCP.

As can be seen, we chose hpc.cluster as the internal domain name of the cluster. I plan to have three internal subnets: one for the internal cluster network (hpc-net), one for the IPMI network and one for the InfiniBand network. Together with the external network, that gives the four lines above.
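Since later steps (makedns, makedhcp) are generated from this table, it is worth sanity-checking it right away; for example (illustrative):

[root@hrz-master ~]# tabdump networks
[root@hrz-master ~]# lsdef -t network -l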

3. passwd table: it contains the default username/password that xCAT uses to access the cluster. xCAT sets most of this table automatically, and we only need to add the IPMI username/password.

  • [root@hrz-master ~]# tabdump passwd
  • #key,username,password,cryptmethod,authdomain,comments,disable
  • "system","root","RootPasswd",,,,
  • "ipmi","UsernameOfIPMI","PasswordOfIPMI",,,,

If the IPMI interfaces of the nodes already have a default username and password, we write them here; otherwise we can choose whatever we want and xCAT will write it to the BMCs.

4. nodelist table: here we list all the nodes in the cluster and the groups they belong to. The really important part is the compute nodes, which in our case all belong to the compute group. Instead of filling this table manually, we can use the nodeadd command, which is much easier for big clusters.

To fill it we can use the following commands; the switch entry can be written manually or added with a command, as shown below.

  • [root@hrz-master compute]# nodeadd node-[01-04] groups=compute,ipmi
  • [root@hrz-master compute]# nodeadd cluster-switch groups=switch
  • [root@hrz-master compute]# tabdump nodelist
  • "node-01","compute,ipmi",,,,,,,,,,,
  • "node-02","compute,ipmi",,,,,,,,,,,
  • "node-03","compute,ipmi",,,,,,,,,,,
  • "node-04","compute,ipmi",,,,,,,,,,,
  • "cluster-switch","switch",,,,,,,,,,,

5. hosts table: the idea is that we want to map the nodes to their IP addresses, which means filling the /etc/hosts file. One way, of course, is to fill /etc/hosts manually, in which case we do not need to touch the hosts table at all. Another way, which I believe is the best, is to use the hosts table as we do here and then run the makehosts command, which reads the information from the hosts table and fills /etc/hosts automatically. We can also use this table to fill DNS via the makedns command, as I will discuss later.

For this purpose, we can directly edit the hosts table or use chtab command. For example for entering a line for our Master node we use following command:

[root@hrz-master ~]# chtab node=qingcl-master hosts.ip="192.168.1.21" hosts.otherinterfaces="-ib:192.168.3.21"

For our compute nodes we can do following:

[root@hrz-master ~]# chtab node=compute hosts.ip=’|192.168.1.($1+0)|’ hosts.otherinterfaces=’|-ipmi:192.168.2.($1+0),-ib:192.168.3.($1+0)|’

or just for all of them manually adding following lines to the hosts table.

  • "cluster-switch","192.168.1.100",,,,
  • "compute","|192.168.1.($1+0)|",,"|-ipmi:192.168.2.($1+0),-ib:192.168.3.($1+0)|",,
  • "hadoo-master2","10.151.50.1",,"-ipmi:10.158.50.1,-ib:10.157.50.1",,

As can be seen, I used regular expressions here, which saves time. The first column of the row names the node group, compute, as we already defined it in the nodelist table. The second column is the regular expression substitution. Here we match every non-digit part (\D+) and capture the digit portion in (\d+). We then build an IP address from the host name using that captured digit portion. For example, node010 would have 010 captured; the substitution appends it to 192.168.1., and adding 0 to 010 removes the leading zeros, giving 192.168.1.10. The -ipmi and -ib entries work the same way, only appending those suffixes to the name. Afterwards, in order to fill /etc/hosts we just need to execute the following command:

  • makehosts

It uses the hosts table and fills /etc/hosts properly for us. So, as an example, for node01 we should have the following in our /etc/hosts file:

192.168.1.1 node01 node01.hpc.cluster
192.168.2.1 node01-ipmi node01-ipmi.hpc.cluster
192.168.3.1 node01-ib node01-ib.hpc.cluster
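We can also ask xCAT directly how the regular expression resolves for a particular node, which is a handy check (illustrative, assuming a node named node01 in the compute group; gettab expands the per-node value from the hosts table):

[root@hrz-master ~]# gettab node=node01 hosts.ip
192.168.1.1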

6. nodehm table: node hardware management. We now need to define how to manage the nodes, e.g. how to turn them on and off remotely. Since our plan is to use IPMI, we configure it with the following command:

chtab node="compute" nodehm.power="ipmi" nodehm.mgt="ipmi"

or just manually entering following line to the table:

"compute","ipmi","ipmi",,,,,,,,,,,,

7. ipmi table: since the nodehm table says the nodes are managed via IPMI, we need to configure the IPMI settings in the ipmi table. If IP addresses and passwords are already defined in the firmware of these nodes, we just fill in what we have. If nothing has been set yet, we fill in this table and, as xCAT discovers the nodes, it will configure them for us. I did not use the username/password here, since I already put it in the passwd table and there is no need to repeat it.

  • #node,bmc,bmcport,taggedvlan,bmcid,username,password,comments,disable
  • "compute","|192.168.2.($1+0)|",,,,,,,

If we have other standalone nodes, we also need to put them here. For example, say we have a login node:

  • "loginNodeName","192.168.1.150",,,,,,,

In the bmc part, we can put the hostname or IP add of the BMC adapter.

8. chain table: this table defines a series of tasks or operations that are executed in order on the targeted node. The chain mechanism is part of the xCAT genesis system, which is installed by default together with xCAT.

Explanation: Genesis is a customized Linux system that is booted on the target node mainly to perform discovery and configuration. The kernel of the genesis image is located in the /tftpboot/xcat directory, in our case called genesis.kernel.x86_64. If it is not there, we need to make sure the genesis packages are installed:

  • [root@hrz-master ~]# rpm -qa | grep -i genesis
    xCAT-genesis-scripts-x86_64-2.11.1-snap201604140932.noarch
    xCAT-genesis-base-x86_64-2.12-snap201605051534.noarch

Among the 3 attributes (currstate, currchain and chain), the important one for us is chain that need to be set. There are several tasks that can be set here in chain attribute, but the ones which are important and can meet our needs are: runcmd and osimage.

Explanation: runcmd is used to configure the BMC (baseboard management controller) of the compute nodes. At the moment, only the bmcsetup command is supported by xCAT for configuring the BMC port of the nodes. In our case, the bmcsetup script is located in the following place and can even be modified if necessary:

  • [root@hrz-master ~]# cat /opt/xcat/share/xcat/netboot/genesis/x86_64/fs/bin/bmcsetup

Osimage: This task is used to specify the image that should be deployed onto the compute node. We can use chdef command to change the chain attribute before booting the nodes.

  • [root@hrz-master bin]# chdef compute chain='runcmd=bmcsetup,osimage=centos7.2-x86_64-netboot-compute-ig'
  • [root@hrz-master bin]# tabdump chain
  • #node,currstate,currchain,chain,ondiscover,comments,disable
  • "node01",,,"runcmd=bmcsetup,osimage=centos7.2-x86_64-netboot-compute-ig",,,
  • "node02",,,"runcmd=bmcsetup,osimage=centos7.2-x86_64-netboot-compute-ig",,,
  • "node03",,,"runcmd=bmcsetup,osimage=centos7.2-x86_64-netboot-compute-ig",,,
  • "node04",,,"runcmd=bmcsetup,osimage=centos7.2-x86_64-netboot-compute-ig",,,

Note: the command mknb is needed before rebooting the nodes.

So the above chain simply means: first configure the BMC port for us (assign the right IP address and, if needed, the right username/password) and then boot the right image on the node.

After the systems have booted for the first time and the BMC ports are configured correctly, the order in the chain table changes automatically, so that osimage comes first and runcmd second. This is useful, since the next time we reboot the compute nodes there is no need to go through the BMC setup again.
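The current chain and state can be checked per node at any time, for example (illustrative):

[root@hrz-master ~]# lsdef node01 -i chain,currstate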

9. switches and switch table

In order to automate the discovery of nodes, there needs to be a mapping of nodes to switch ports; this relationship is defined in the switch table. The method of communicating with our Ethernet switch, on the other hand, is defined in the switches table.

a. switches table

  • [root@hrz-master ~]# tabch switch=cluster-switch switches.snmpversion=2c switches.password=public
    or just manually copy the following line into the switches table:
    "cluster-switch","2c",,"public",,,,,,,,,

So basically the above command means that we want to communicate with our Ethernet switch using SNMP version 2c, with the community string 'public'.

Explanation: The SNMP community string is like a password that allows access to our Ethernet switch. The switch always checks whether the community string is correct before responding; otherwise it simply discards the request.

Important: We must configure our Ethernet switch(es) for this purpose. This is quite easy: we just connect to the switch configuration page, enable SNMP version 2 and set the community string, in our case to public.

b. switch table: this table records which switch port corresponds to which node of the cluster. We need it only if we want automatic discovery. We fill it assuming that the nodes are connected in order to the Ethernet switch ports. This table lets xCAT determine which compute node sits behind each switch port and figure out its name; based on the name it then knows which IP address to hand out via DHCP.

The following command adds node01 and maps it to port 1 of the Ethernet switch called cluster-switch:

nodech node01 switch.switch=cluster-switch switch.port=1

However, if we have lots of compute nodes, we can write a small loop and do it automatically for all of them:

for i in 0{1..9} {10..32} ; do  nodech node${i} switch.switch=cluster-switch switch.port=${i}; done

We can also do it manually, directly with the tabedit switch command. For our case the result looks like this:
[root@hrz-master ~]# tabdump switch
#node,switch,port,vlan,interface,comments,disable
"node01","cluster-switch","1",,,,
"node02","cluster-switch","2",,,,
"node03","cluster-switch","3",,,,
"node04","cluster-switch","4",,,,

Through SNMP and the switch table, xCAT works out which compute node corresponds to each switch port and, based on that, hands out the proper IPMI IP address first and then the cluster IP address. Both are given out by the DHCP server, using the addresses defined in the hosts table / /etc/hosts.

10. noderes table: resources and settings to use when installing nodes.

We just add following line:

"compute",,"xnba",,,,,,,,,,,,,,,,,,

The third field, which we set to xnba, is the netboot method. For our x86_64 architecture xnba is the right choice. The rest we leave empty, as it is handled automatically.
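Instead of editing the table by hand, the same attribute can be set on the whole group with chdef (a sketch; netboot is the node/group attribute backing this column):

[root@hrz-master ~]# chdef -t group -o compute netboot=xnba
[root@hrz-master ~]# lsdef -t group -o compute -i netboot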

To understand the concept of xnba, please refer to the beginning of this page. In older versions of xCAT the netboot method was pxe, and in that case we had to fill the table like this:

"compute",,"pxe","192.168.1.21",,"192.168.1.21",,,"eth0","eth0","eth0",,"192.168.1.21",,,,,,

***************************************************************************

Stage 3: Advanced Configuration

1. mknb: a command that creates a network boot root image used for node discovery and flashing. It is normally run by xCAT when it is first installed. I recommend running it again, since I have seen that some files may still be missing in the /tftpboot/xcat directory.

[root@hrz-master ~]# mknb x86_64

Creating genesis.fs.x86_64.gz in /tftpboot/xcat

2. DNS : we need to take care of several things regarding the DNS.

a. We first need to set up the /etc/resolv.conf file properly. Here we add our domain name, hpc.cluster, and a nameserver, which is our master node's internal IP address:

  • search lan.hpc hpc.cluster
  • nameserver 192.168.1.21

Important: If the PEERDNS option of the external interface is set to 'yes', it automatically rewrites the /etc/resolv.conf file for us. So I suggest setting peerdns to 'no', since we already set the external DNS servers in the forwarders field of the site table. However, as a fallback we can also add the forwarder IP addresses here as nameservers.

  • [root@hrz-master network-scripts]# cat ifcfg-enp4s0f1
    TYPE=Ethernet
    BOOTPROTO=static
    DEFROUTE=no
    PEERDNS=no
    PEERROUTES=yes
    IPV4_FAILURE_FATAL=no
    IPV6INIT=yes
    NAME=enp4s0f1
    UUID=c259b2ba-eba7-4ab6-b91a-f1030cca9662
    DEVICE=enp4s0f1
    ONBOOT=yes
    IPADDR=10.0.0.1
    NETMASK=255.255.0.0
    GATEWAY=10.0.0.254

b. Configuration: by default, xCAT installs the DNS server for us. We can confirm it by checking:

  • [root@hrz-master ~]# yum list installed | grep -i bind*

So the bind and bind-utils packages must already be installed. The BIND process is well known as named, which is why many of the files refer to 'named' instead of 'bind'.

The configuration of the DNS server is located at /etc/named.conf and the database for each zone is located in /var/named. However, we do not need to change anything manually here, since xCAT provides a script that configures the DNS server completely. For this we run the following command:

  • makedns -n

It might take some time and it has to finish without any error. If we run into errors, the first things to check thoroughly are the networks table and then /etc/hosts. Afterwards, we restart the DNS server:

  • [root@hrz-master ~]# systemctl restart named.service

We need to make sure that DNS is working properly; for this we can use the nslookup command.

  • [root@hrz-master]# nslookup node01
  • Server: 192.168.1.21
  • Address: 192.168.1.21#53
  • Name: node01.hpc.cluster
  • Address: 192.168.1.1
  • [root@hrz-master]# nslookup node01-ipmi
  • Server: 192.168.1.21
  • Address: 192.168.1.21#53
  • Name: node01-ipmi.hpc.cluster
  • Address: 192.168.2.1

After the compute nodes have booted, we need to make sure that a proper /etc/resolv.conf has been created on them, such as:

  • [root@node01 ~]# cat /etc/resolv.conf
    ; generated by /usr/sbin/dhclient-script
    search hpc.cluster
    nameserver 192.168.1.21

3. DHCP: by default, xCAT installs the DHCP server for us. We can confirm it by checking:

  • [root@hrz-master ~]# yum list installed | grep -i dhcp*

We must have dhcp.x86_64, which is the DHCP server installed by xCAT. The configuration of the DHCP server is located at /etc/dhcp/dhcpd.conf. DHCP also uses the file /var/lib/dhcpd/dhcpd.leases to store the client lease database.

Here, too, we do not need to configure the DHCP server ourselves, since xCAT provides a script that does it automatically. We just need to execute the following command:

  • makedhcp -n
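Once the node definitions are complete, the node entries can be pushed into the DHCP configuration and the service restarted (illustrative; makedhcp -a adds the defined nodes):

  • makedhcp -a
  • systemctl restart dhcpd.service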

4. SNMP: this part is necessary if we are planning automatic discovery via the switch. We need to enable SNMP on the switch and choose SNMP version 2 for our case, and then set a community string on the switch, which we set to public.

Then we need to test whether SNMP is working properly. For this we first install the following package on the master node.

yum -y install net-snmp-utils
test if snmp works:
[root@haswell-master ~]# snmpwalk -v 2c -c public 192.168.1.100

This should return lots of information from the switch. 192.168.1.100 is the IP of the Ethernet switch, which we already set.

5. Now we boot the compute nodes. They should all come up without any problem. Since we defined our chain carefully, with both the BMC setup and the proper image, we probably do not need to use nodeset.

Explanation: nodeset configures the next boot state for a node or range of nodes. So basically it tells xCAT what needs to be done the next time the nodes are booted, by rewriting the network boot files under /tftpboot (the pxelinux.0 configuration for plain PXE, or the per-node files under /tftpboot/xcat/xnba/nodes in our xnba setup).
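If we do need to (re)set the boot target explicitly, it is a one-liner for the whole group (the same kind of command reappears in step 10 below):

[root@hrz-master ~]# nodeset compute osimage=centos7.2-x86_64-netboot-compute-ig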

6. IPoIB: we also need to assign IP addresses to the InfiniBand ports of the compute nodes. We can automate this with a script listed in the postscripts table of xCAT.

So first we add the script name for the 'compute' group to the postscripts table of xCAT:

[root@haswell-master ~]# tabdump postscripts
#node,postscripts,postbootscripts,comments,disable
"xcatdefaults","syslog,remoteshell,syncfiles","otherpkgs",,
"service","servicenode",,,
"compute","infiniband",,,

Then we create our script inside the /install/postscripts directory as follows (we make it executable afterwards with chmod +x infiniband):

#!/bin/bash

H=$(hostname -s)
HB=${H}-ib

IPoverIB=$(/usr/bin/dig  ${HB}  +search +short)

cat > /etc/sysconfig/network-scripts/ifcfg-ib0 << EOF
ONBOOT=yes
BOOTPROTO=none
DEVICE=ib0
IPADDR=${IPoverIB}
PREFIX=24
MTU=65520
TYPE=Infiniband
CONNECTED_MODE=yes
EOF

Explanation: dig is a tool that queries a DNS server and returns useful information. We take advantage of it to look up the IP address of the InfiniBand interface for the corresponding node name. So the first step is to make sure that DNS on the master node works properly and returns the proper IP address when the IB interface name is queried. We also need to make sure that the package providing dig is installed in the compute image; otherwise we can install it like this:

  • [root@hrz-master ~]# yum --installroot=/install/netboot/centos7.2/x86_64/compute/rootimg/ install bind-utils.x86_64
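A quick sanity check from the master node (illustrative; with the hosts/DNS setup above, the -ib name should resolve to the node's IPoIB address):

  • [root@hrz-master ~]# dig node01-ib +search +short
  • 192.168.3.1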

7. Kernel update of Image in xCAT

This is only needed if you really require it; otherwise it can be skipped. Say we ran a 'yum update' on the master node and the kernel got updated there. If we run the same 'yum update' inside the image and reboot the nodes, the kernel will not be updated, since the kernel is loaded separately from a directory (as I explained before). If we want to update the image kernel as well, we do the following:

[root@hrz-master ~]# genimage --onlyinitrd -k 3.10.0-327.22.2.el7.x86_64 centos7.2-x86_64-netboot-compute-ig

I recommend simply removing the old kernel from the image (inside rootimg) and immediately installing kernel-devel inside the image:

[root@jameson ~]# yum --installroot=/install/netboot/centos7.2/x86_64/compute/rootimg/ install kernel-devel

It is very important that the kernel and kernel-devel versions match exactly. Since I removed the previous kernel from the image, the command above installs only the kernel-devel matching the installed kernel version. So we have the following here:

[root@jameson ~]# yum --installroot=/install/netboot/centos7.2/x86_64/compute/rootimg/ list installed | grep -i kernel
kernel.x86_64                 3.10.0-327.28.3.el7      @updates
kernel-devel.x86_64     3.10.0-327.28.3.el7      @updates

Since I ran 'yum update' on the master node, the kernel-related packages were updated there too, but the old ones remained installed. I removed the old kernel and kernel-devel from the master node simply with 'yum remove', and as a result everything kernel-related is the same version, as can be seen here:

[root@jameson ~]#  yum list installed | grep -i kernel*
abrt-addon-kerneloops.x86_64           2.1.11-36.el7.centos            @base
kernel.x86_64                                 3.10.0-327.28.3.el7             @updates
kernel-devel.x86_64                     3.10.0-327.28.3.el7             @updates
kernel-headers.x86_64                 3.10.0-327.28.3.el7             @updates
kernel-tools.x86_64                      3.10.0-327.28.3.el7             @updates
kernel-tools-libs.x86_64             3.10.0-327.28.3.el7             @updates

8. I also suggest installing the following packages in the image in order to have command auto-completion:

yum --installroot=/install/netboot/centos7.2/x86_64/compute/rootimg/ install bash-completion.noarch bash-completion-extras.noarch

9. To remove an osimage from xCAT, we can use the following command:

[root@hrz-master ~]# rmdef -t osimage centos7.2-x86_64-netboot-compute

10. The first time we create the osimage, xCAT automatically creates one file per node in the /tftpboot/xcat/xnba/nodes directory; as explained before, each file specifies which image, kernel and other pieces have to be loaded on the node. So if in the future we create a new image, say for CentOS 7.3, we need to update these files. For this we can use the following command:

[root@jameson nodes]# nodeset compute osimage=centos7.3-x86_64-netboot-compute-ig

11. I recommend installing yum-utils inside the image, since it is not there by default. It is useful to have the 'yum' command on the nodes for testing and troubleshooting.

[root@jameson ~]# yum --installroot=/install/netboot/centos7.2/x86_64/compute/rootimg/ install yum-utils

12. We can change the pkgdir of an image (stateless or stateful) and afterwards install whatever we want, directly or via a script, from that directory. As an example, we can create a repository on the master node and add its directory to the pkgdir of the image.
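A minimal sketch of building such a local repository, assuming the extra RPMs are collected under /install/software/rpm (the directory we created in step 13); createrepo writes the metadata that yum needs:

[root@hrz-master ~]# yum -y install createrepo
[root@hrz-master ~]# createrepo /install/software/rpm

An osimage that already carries several extra directories in its pkgdir then looks like this: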

[root@fadmin1 ~]# lsdef -t osimage rhels7.2-x86_64-install-compute
Object name: rhels7.2-x86_64-install-compute
imagetype=linux
osarch=x86_64
osdistroname=rhels7.2-x86_64
osname=Linux
osvers=rhels7.2
otherpkgdir=/install/post/otherpkgs/rhels7.2/x86_64
pkgdir=/install/rhels7.2/x86_64,/install/software/slurm/160508,/install/software/gpfs/4.2.1-2
pkglist=/install/custom/install/rh/compute.rhels7.pkglist
profile=compute
provmethod=install
synclists=/install/custom/install/rh/compute.synclist
template=/install/custom/install/rh/compute.rhels7.tmpl

To add those new directories we can use the chdef command, for example (for the osimage shown above):

chdef -t osimage -o rhels7.2-x86_64-install-compute pkgdir=/install/rhels7.2/x86_64,/install/software/slurm/160508,/install/software/gpfs/4.2.1-2

Diskful Deployment

Diskful deployment simply means installing the OS on the local drive of the target system. So we install once over the network, and for subsequent boots the node can boot locally from its drive. All the steps up to step 14 of the diskless deployment are the same here, so I continue from step 14:

14. [root@hrz-master iso]# copycds -n centos7.2 -a x86_64 CentOS-7-x86_64-Everything-1511.iso

So now we should have a default osimage for diskful installation, as can be seen here:

  • [root@hrz-master centos7.2]# lsdef -t  osimage
  • centos7.2-x86_64-install-compute (osimage)
  • centos7.2-x86_64-netboot-compute (osimage)

There is an option inside the image settings (stateful) to choose what gets installed when the OS is put on the compute node's local hard disk. This is controlled by two files, *.pkglist and *.tmpl, as can be seen in the stateful osimage:

  • [root@hadoop-master x86_64]# lsdef -t osimage -o centos7.2-x86_64-install-compute
  • Object name: centos7.2-x86_64-install-compute
  • imagetype=linux
  • osarch=x86_64
  • osdistroname=centos7.2-x86_64
  • osname=Linux
  • osvers=centos7.2
  • otherpkgdir=/install/post/otherpkgs/centos7.2/x86_64
  • pkgdir=/install/centos7.2/x86_64
  • pkglist=/opt/xcat/share/xcat/install/centos/compute.centos7.pkglist
  • profile=compute
  • provmethod=install
  • template=/opt/xcat/share/xcat/install/centos/compute.centos7.tmpl

Both *.pkglist and *.tmpl refer to the default files that come with the xCAT installation. We can use them as templates for our installation and modify them based on our needs.

Explanation: The main file used in a stateful installation is the *.tmpl file. It is exactly the same concept as kickstart in Red Hat, with the same command format. The kickstart file (here our *.tmpl) is used mainly to perform an unattended OS installation and configuration automatically.

The first step is to copy the default template files that come with xCAT to our directory:

  • mkdir -p /install/custom/install/centos7.2
  • [root@hrz-master centos7.2]# cp /opt/xcat/share/xcat/install/centos/compute.centos7.* /install/custom/install/centos7.2

so now we have 2 files which are:

  • [root@hadoop-master centos7.2]# ls
  • compute.centos7.pkglist compute.centos7.tmpl

15. Now we want to create our own OS definition specific to this stateful deployment of the nodes. The advantage is that we can create different OS images for different groups of nodes. The easiest and most efficient way is to first create the stanza file, modify it as shown below, and then create the new OS definition from it.

Since we want to install the compute nodes in stateful mode, we need centos7.2-x86_64-install-compute. However, the target is to create something like it but specific to our project (with a new name).

Explanation: the attributes of the image can be seen here:

  • [root@hadoop-master centos7.2]# lsdef -t osimage -o centos7.2-x86_64-install-compute
  • Object name: centos7.2-x86_64-install-compute
  • imagetype=linux
  • osarch=x86_64
  • osdistroname=centos7.2-x86_64
  • osname=Linux
  • osvers=centos7.2
  • otherpkgdir=/install/post/otherpkgs/centos7.2/x86_64
  • pkgdir=/install/centos7.2/x86_64
  • pkglist=/opt/xcat/share/xcat/install/centos/compute.centos7.pkglist
  • profile=compute
  • provmethod=install
  • template=/opt/xcat/share/xcat/install/centos/compute.centos7.tmpl

So the first step is to create the default stanza file. We can use following command for that:

  • [root@hrz-master centos7.2]# lsdef -t osimage -o centos7.2-x86_64-install-compute -z > /install/custom/install/centos7.2/compute.centos7.stanza

-z writes stanza format. What happens here is that the attributes of the default image are written to a stanza file saved in /install/custom/install/centos7.2.

we can see the stanza file here:

  • [root@hadoop-master centos7.2]# cat compute.centos7.stanza
  • #
  • centos7.2-x86_64-install-compute:
  • objtype=osimage
  • imagetype=linux
  • osarch=x86_64
  • osdistroname=centos7.2-x86_64
  • osname=Linux
  • osvers=centos7.2
  • otherpkgdir=/install/post/otherpkgs/centos7.2/x86_64
  • pkgdir=/install/centos7.2/x86_64
  • pkglist=/opt/xcat/share/xcat/install/centos/compute.centos7.pkglist
  • profile=compute
  • provmethod=install
  • template=/opt/xcat/share/xcat/install/centos/compute.centos7.tmpl

As can be seen, the stanza file is exactly the same as the default OS definition. So before creating our own osimage definition from it, we modify the stanza file based on our needs. The parts that need to change are the pkglist and template paths, plus the name of the osimage definition (with -hort appended). After editing with vim it looks like this:

  • [root@hrz-master centos7.2]# cat compute.centos7.stanza
  • #
  • centos7.2-x86_64-install-compute-hort:
  • objtype=osimage
  • imagetype=linux
  • osarch=x86_64
  • osdistroname=centos7.2-x86_64
  • osname=Linux
  • osvers=centos7.2
  • otherpkgdir=/install/post/otherpkgs/centos7.2/x86_64
  • pkgdir=/install/centos7.2/x86_64
  • pkglist=/install/custom/install/centos7.2/compute.centos7.pkglist
  • profile=compute
  • provmethod=install
  • template=/install/custom/install/centos7.2/compute.centos7.tmpl

Now out of Osimage definition (stanza) that we created here, we can create a new osimage.

  • [root@hrz-master centos7.2]# cat compute.centos7.stanza | mkdef -z osimage=centos7.2-x86_64-install-compute-hort
  • 1 object definitions have been created or modified.

Look at the name: we added -hort at the end. Now if we check the attributes of this new image, we have what we wanted:

[root@hrz-master centos7.2]# lsdef -t osimage -o centos7.2-x86_64-install-compute-hort
Object name: centos7.2-x86_64-install-compute-hort
imagetype=linux
osarch=x86_64
osdistroname=centos7.2-x86_64
osname=Linux
osvers=centos7.2
otherpkgdir=/install/post/otherpkgs/centos7.2/x86_64
pkgdir=/install/centos7.2/x86_64
pkglist=/install/custome/install/centos7.2/compute.centos7.pkglist
profile=compute
provmethod=install
template=/install/custome/install/centos7.2/compute.centos7.tmpl

It is important to note that the pkgdir=/install/centos7.2/x86_64 directory, where our OS is located, is the same for both stateful and stateless. However, in stateless mode we used the genimage command to generate an image (rootimg), whereas here in stateful mode we do not need to do that.
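Also keep in mind that the pkglist and template referenced in the new osimage must really exist under the custom directory. In our setup they were first copied from the xCAT defaults, roughly like this, and then edited:

cp /opt/xcat/share/xcat/install/centos/compute.centos7.pkglist /install/custome/install/centos7.2/
cp /opt/xcat/share/xcat/install/centos/compute.centos7.tmpl /install/custome/install/centos7.2/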

Let's have a look at compute.centos7.tmpl and change it based on our needs. As I mentioned earlier, xCAT uses the kickstart format.

Explanation: Kickstart is the mechanism for answering all installation questions automatically in order to enable an unattended installation. The main difference from the Diskless setup is that here we only have a pure OS that is loaded from the Master node (here: /install/centos7.2/x86_64/), and we use a kickstart file (here: compute.centos7.tmpl) to drive the unattended installation. Any further packages need to be installed by script(s) that are part of the kickstart file and run during the installation.

In general there are 3 ways of creating a kickstart file such as compute.centos7.tmpl:

  • Manually as we do here

  • A GUI, such as the "system-config-kickstart" tool

  • Anaconda, the standard Red Hat installer (it produces an anaconda-ks.cfg file)
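Whichever way the file is created, it can be sanity-checked with ksvalidator from the pykickstart package (if installed). Since the template still contains xCAT #...# placeholders, it makes more sense to validate the fully resolved per-node file that nodeset later writes under /install/autoinst, for example:

ksvalidator /install/autoinst/node01    # node01 is just an example node name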

Let's have a look at the default compute.centos7.tmpl file, which is normally good enough for a simple OS installation, with more explanation for better understanding. (I copied only the parts that are not commented out by default and removed the unnecessary parts.)

lang en_US

** If we also need to define the keyboard, we can do it here. For a German keyboard we can add:

keyboard "de-latin1-nodeadkeys"

** or for a US English keyboard: keyboard "us"

** I suggest making sure which disks need to be ignored during the installation. For example, the node might be connected to external storage, or have local disks that hold a lot of data. Also, when deploying on a SAN cluster the kickstart would fail, because the installer detects passive paths to the SAN that return no partition table. To ignore disks we can use the following command:

ignoredisk --drives=drive1,drive2,...

where drive1, drive2 can be sda, sdb, ..., hda, ...

** Another way of using this command is to specify only the drives that should be used and ignore the rest, like the following command:

ignoredisk --only-use=sde

** Clear the MBR. This destroys all content of disks with invalid partition tables.

zerombr

** Wipe the disk if needed. There is a command called clearpart which removes partitions from the system. The --initlabel option initializes the disk label to the default for your architecture (for example msdos for x86). It is useful so that the installation program does not ask whether it should initialize the disk label when installing to a brand-new hard drive.

clearpart --all --initlabel

** I suggest specifying the drives that need to be wiped, even though we already used ignoredisk above.

clearpart --drives=sde --initlabel --all

Important: There are directories in the Red Hat distribution (/dev/disk/by-*) that identify the disks precisely (via symlinks), in case we have a problem with /dev/sd*-style identifiers. As an example, we can use something like the following instead of --onpart=sda1:

--onpart=/dev/disk/by-path/pci-0000:00:05.0-scsi-0:0:0:0-part1
--onpart=/dev/disk/by-id/ata-ST3160815AS_6RA0C882-part1
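To find these stable names on a running node, you can simply list the symlinks, for example:

ls -l /dev/disk/by-id/ /dev/disk/by-path/    # shows which symlink points to which sdX device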

** The above commands (zerombr and clearpart) will write an MBR (msdos label) to the disk. If for any reason we need GPT, such as having a drive with more than 2 TB capacity, we can use the following commands in a %pre section to change the label:

%pre
parted -s /dev/sda mklabel gpt
%end

** RAID configuration. If there is a need for software RAID, we can also configure it here. However, I recommend using hardware RAID for enterprise setups.

The raid.* partitions are used for software RAID. In the first step we carve out about 10 GB (10240 MB) from each disk (sda and sdb).

part raid.01 --size 10240 --ondisk sda
part raid.02 --size 10240 --ondisk sdb

We also allocate the rest of the capacity of each disk.

part raid.11 --size 1 --grow --ondisk sda
part raid.12 --size 1 --grow --ondisk sdb

Now we dedicate partitions raid.01 and raid.02 to swap with level 1 (RAID 1) and name the resulting device md0. We do the same for partitions raid.11 and raid.12, but here we also define the file-system type, which is ext4.

raid swap --level 1 --device md0 raid.01 raid.02
raid / --level 1 --fstype ext4 --device md1 raid.11 raid.12

** No RAID configuration. If there is no need for software RAID, probably because it has already been configured in hardware, then we partition the disk without it. So we basically only see sda here, either because of the hardware RAID configuration or because no RAID is in place.

part swap --size 10240 --ondisk=sda
part / --size 1 --grow --fstype=ext4 --ondisk=sda

** Bootloader configuration. The bootloader is a small program that loads the OS into RAM [technically it loads the kernel (with the proper kernel parameters) and the initial RAM disk before the whole OS comes up]. When the system is powered on, the BIOS or UEFI firmware transfers control of the system to where the bootloader is located. That location is, in general, the MBR or GPT, which stores the required information, including where partitions start and end, so your operating system knows which sectors belong to each partition and which partition is bootable.

Explanation: If the system uses BIOS, then we normally have to use MBR (for the OS disk), which has some limitations, in particular drive support is limited to 2 TB. So if we want to install our OS on a drive larger than 2 TB, we must use UEFI mode with GPT.

Important: I would divide the bootloader part into 2 different categories. If the drive is smaller than 2 TB, then it is easy. We write the MBR to the disk, use the commands above for partitioning, and can simply use the following command for the bootloader option:

bootloader --append="crashkernel=256M" --location=mbr

However, if the drive has more than 2 TB capacity, we have to follow several steps. First the firmware has to be switched from BIOS to UEFI. Second we need to write a GPT label, and then use the following commands for partitioning:

part /boot --fstype=ext4 --size=500
part /biosboot --fstype=biosboot --size=1
part swap --size 20240 --ondisk=sda
part / --size 1 --grow --fstype=ext4 --ondisk=sda

and then write the bootloader as follows:

bootloader --location=mbr --driveorder=sda --append="crashkernel=auto rhgb quiet"
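One caveat here: as far as I know, the biosboot partition above is what GPT needs when the machine still boots in legacy BIOS mode. For a machine that really boots in UEFI mode, an EFI System Partition mounted at /boot/efi is normally required instead; a minimal sketch (the size is just an example) would be:

part /boot/efi --fstype=efi --size=200 --ondisk=sda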

** There is an option called --append with which we can specify kernel parameters. For example, we can enable kdump by adding the crashkernel option to the bootloader:

bootloader --append="crashkernel=256M" --location=mbr

** install or upgrade. Tells the system to install a fresh system rather than upgrade an existing system.

install

** text mode install (default is graphical)

text

** Network configuration. It is completely optional and can also be done afterwards. The important point is to use the correct interface names here.

network --device=enp4s0f0 --onboot=yes --noipv6
network --device=enp4s0f1 --noipv6

** firewall

firewall --disabled

** Select a time zone. Add the --utc switch if your hardware clock is set to GMT.

timezone --utc "#TABLE:site:key=timezone:value#"
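The #TABLE:site:key=timezone:value# placeholder is resolved by xCAT from the site table, so on the deployed node this line ends up as something like the following (assuming the site timezone is set to Europe/Berlin):

timezone --utc Europe/Berlin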

** Don’t do X

skipx

** To set on the deployed node an encrypted root password that is exactly the same as the one on the master node:

rootpw --iscrypted #CRYPT:passwd:key=system,username=root:password#
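The #CRYPT:...# placeholder takes the root password from the xCAT passwd table, so make sure there is an entry for key=system and username=root. It can be checked and, if necessary, set roughly like this (the password value is just a placeholder):

tabdump passwd                                                   # show the current entries
chtab key=system passwd.username=root passwd.password=cluster    # set/replace the root entry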

** authentication

auth --useshadow --enablemd5
selinux --disabled

** Reboot after installation

reboot

** Here is the last part of the template, where I changed the default one a bit.

#end of section

%packages

** Here we can list the packages or groups that need to be installed one by one, instead of using the INCLUDE_DEFAULT_PKGLIST; an example would be the following:

@ X Window System 
@ GNOME Desktop Environment 
@ Graphical Internet

#INCLUDE_DEFAULT_PKGLIST#

%end

** The %pre section runs inside the installer environment, so the system is not yet completely installed and the target file system may not be completely created yet. Here I just include the default xCAT pre-install script (referenced via the #ENV:XCATROOT# environment variable).

%pre
#INCLUDE:#ENV:XCATROOT#/share/xcat/install/scripts/pre.rh.rhel7#
%end

** Here are the scripts that run after the system is completely installed, so we can call it the post-installation phase. As an example, I created a directory called scripts inside the /install/custom/install/centos/ directory, which is basically the same directory where our kickstart file (*.tmpl) is located.

%post
#INCLUDE:#ENV:XCATROOT#/share/xcat/install/scripts/post.xcat#
#INCLUDE:#ENV:XCATROOT#/share/xcat/install/scripts/post.rhels7#
#INCLUDE:/install/custom/install/centos7.2/scripts/post.centos7#
#INCLUDE:/install/custom/install/centos7.2/scripts/post.XXXX#
%end

Explanation: The parts that start with INCLUDE have a '#' at the beginning, but that does not mean they are commented out. Above, I wrote post.XXXX to indicate that you can add another script based on your needs; if you need more, it is possible to add further INCLUDE lines at the bottom.

The default template part around these INCLUDEs looks like the following, and we can change it based on our needs.

%packages
#INCLUDE_DEFAULT_PKGLIST#
%end
%pre
{
echo "Running Kickstart Pre-Installation script..."
#INCLUDE:#ENV:XCATROOT#/share/xcat/install/scripts/pre.rh.rhels7#
} >>/tmp/pre-install.log 2>&1
%end
%post
mkdir -p /var/log/xcat/
{
cat >> /var/log/xcat/xcat.log << "EOF"
%include /tmp/pre-install.log
EOF
echo "Running Kickstart Post-Installation script..."
#INCLUDE:#ENV:XCATROOT#/install/custome/install/centos7.2/scripts/post.xcat#
#INCLUDE:#ENV:XCATROOT#/install/custome/install/centos7.2/scripts/post.rhels7#
} >>/var/log/xcat/xcat.log 2>&1
%end

I changed the name of post.xcat to post.centos7.2 and wrote the script based on my needs, as can be seen here:

#!/bin/bash
PATH=$PATH:/usr/bin:/usr/sbin
# Disable Zero config
echo "NOZEROCONF=yes" >> /etc/sysconfig/network
# Disable IPv6
echo "net.ipv6.conf.all.disable_ipv6 = 1" >> /etc/sysctl.conf
echo "net.ipv6.conf.default.disable_ipv6 = 1" >> /etc/sysctl.conf
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
# Disable SSH strict host key checking
echo "StrictHostKeyChecking no" >> /etc/ssh/ssh_config

Advanced Concepts

In a Diskfull installation two directories need to be populated. One is /tftpboot/xcat/xnba/nodes and the other is /install/autoinst. In contrast to the Diskless installation, where the image itself is prepared with the genimage command, here in Diskfull we must use the nodeset command.

The nodeset command fills both of the above-mentioned directories for us. The /tftpboot/xcat/xnba/nodes directory, which has a file for each node, is important for booting the node, as I mentioned earlier. And in the /install/autoinst directory we have a file for each node that says exactly what needs to be done during the installation. It basically consists of the compute.centos7.tmpl file and all the scripts referenced inside this kickstart file.

So the nodeset command works as follows:

nodeset compute osimage=centos7.2-x86_64-install-compute-hort

However, we have to make sure that the mac table is already filled (automatically or manually). And that's all. Every time we make a change in the *.tmpl file, we need to run the nodeset command again in order to rewrite the /install/autoinst directory. Keep in mind that in a Diskless installation the nodeset command needs to be run only once.
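So after editing the template, a typical re-provisioning cycle for a single node could look roughly like this (node01 is just an example name):

nodeset node01 osimage=centos7.2-x86_64-install-compute-hort    # regenerate the xnba and autoinst files
rsetboot node01 net                                             # force the next boot from network
rpower node01 reset                                             # restart the node to start the installation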

The part about filling the tables and the advanced options is similar to the Diskless deployment and can be followed after this stage.

Important

After the first successful installation of the Diskfull image, the OS is on the hard disk. Therefore, on the next reboot of the system, the OS will be booted from the hard disk, unless we specifically tell xCAT to boot it from the network with the following command:

[root@fadmin1 ~]# rsetboot ffs1 net
ffs1: Network
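To check which boot device is currently configured for a node (before or after the change), rsetboot also accepts stat:

rsetboot ffs1 stat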
