So… interested in building a Linux cluster, huh? If so, you have a few choices of OS. The big names (and the ones I would call the most reliable) are Red Hat Enterprise Linux (along with its rebuilds, CentOS and Oracle Linux) and SUSE Linux Enterprise Server (a.k.a. SLES).
Here is a complete outline of the steps to build your cluster. If you need more detail on each step, I encourage you to check out my book.
- Install your two SLES 11 SP2 Linux servers. Straightforward for the most part. I usually add the development packages and turn off the firewall and Novell AppArmor.
- Copy the .iso to local drive and add to software repository
Especially if you do installs remotely, it is much nicer to use the .iso than to have people insert and eject discs. You can add it in YaST -> Software Management -> Software Repositories.
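If you prefer the command line to YaST, zypper can register an .iso as a repository too. Here is a minimal sketch; the .iso path and the repository alias are placeholders I made up, so substitute your own. It only prints the command so you can review it before running it as root.

```shell
#!/bin/sh
# Hypothetical .iso location and repository alias; substitute your own.
ISO_PATH="${ISO_PATH:-/isos/SLES-11-SP2-DVD-x86_64-GM-DVD1.iso}"
REPO_ALIAS="${REPO_ALIAS:-sles11sp2-iso}"

# Build the zypper repository URI for a local .iso file.
iso_repo_uri() {
    printf 'iso:/?iso=%s' "$1"
}

# Print the zypper command instead of running it, so it can be reviewed first.
add_iso_repo_cmd() {
    printf 'zypper addrepo "%s" %s\n' "$(iso_repo_uri "$1")" "$2"
}

add_iso_repo_cmd "$ISO_PATH" "$REPO_ALIAS"
```

Once the printed command has been run, "zypper repos" should list the new alias.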
- Add two shared drives, one for quorum, one for data
For physical servers, this usually means fiber storage through a switch or a direct connection. For VMware, there are a few extra things you’ll need to do to add disks to the VMs and make them shared.
- Add multipath support to your nodes
Not required, but very handy for cluster builds
(The above steps are covered in detail in the free excerpt of my ebook.)
- Install your cluster software and scripts
This is on the SLES HA media (not the base OS install media). Once you have added the disc or .iso as a repository, you will see the “High Availability” option in YaST. On the left side, choose “High Availability” in the “Groups” view and install everything.
- Add your multicast cluster config info
You will find a sample in corosync.conf.example, which is installed with the cluster software.
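To give an idea of what the multicast part looks like, here is a small sketch that prints a totem section. The network, multicast address, and port are example values I picked for illustration, not values from my cluster. Whatever you choose, both nodes need the same multicast address and port. Treat corosync.conf.example on your system as the reference for the full file.

```shell
#!/bin/sh
# Print a totem section for corosync.conf.
# Usage: render_totem BIND_NETWORK MCAST_ADDR MCAST_PORT
# BIND_NETWORK is the network address of the heartbeat interface, not a host IP.
render_totem() {
    cat <<EOF
totem {
    version: 2
    interface {
        ringnumber: 0
        bindnetaddr: $1
        mcastaddr: $2
        mcastport: $3
    }
}
EOF
}

# Example values only (assumptions); use your own heartbeat network.
render_totem 192.168.1.0 239.255.1.1 5405
```

Compare the output with your existing config before copying anything over, since the real file contains other sections as well.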
- Start and verify your cluster
With both nodes using the same multicast heartbeat settings, you should be able to see both nodes once you start the cluster. The cluster is started and stopped with the “openais” service. You can verify it by looking in /var/log/messages for the string “TOTEM”. The “ps” command should also now show the corosync process and its child heartbeat processes. You can also start using the crm commands to take a look at your cluster:
- crm configure show
- crm_verify -L
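The log and process checks above can be wrapped in a couple of small helpers. This is just a sketch of how I would script it; the function names are my own, and the default log path is the /var/log/messages location mentioned above.

```shell
#!/bin/sh
# Count lines containing TOTEM in a log file (defaults to /var/log/messages).
totem_line_count() {
    grep -c 'TOTEM' "${1:-/var/log/messages}" 2>/dev/null || true
}

# Succeed if a corosync process is visible in the process table.
corosync_running() {
    ps -e -o comm= 2>/dev/null | grep -q '^corosync'
}

echo "TOTEM log lines: $(totem_line_count)"
if corosync_running; then
    echo "corosync is running"
else
    echo "corosync is not running"
fi
```

If the TOTEM count is zero or corosync is not running, start the cluster before moving on to the crm commands.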
- Disable stonith and quorum
Since you have not set up a quorum disk yet, tell your cluster to disable these for now:
- crm configure property stonith-enabled=false
- crm configure property no-quorum-policy=ignore
- In YaST, configure and enable csync2
You will want to make sure you do these things:
- Auto Generate Node ID
- Enable Security Auth
- Generate Auth keyfile
- Enable mgmtd
- Add hosts and files to Csync2
- Generate pre-shared keys
- Turn Csync2 on
- Copy over your Csync2 files to second node
Make sure your initial config files are the same on both nodes. These are the files in the “Add Suggested Files” portion of your Csync2 config in YaST.
- Turn on the Csync2 service (part of xinetd)
You will need to enable both csync2 and xinetd using chkconfig.
- Run your sync using “csync2 -xv”
If it finishes with errors, check the files marked dirty, copy them over manually, then run the sync again.
- Start crm_gui (via vncserver session)
It is much easier to configure and manage the cluster from here on out with the GUI. If you are remote and use a Windows machine as your personal computer (like me), you will need to use VNC or else cygwin-x (I have a cygwin-x tutorial here).
You’re ALMOST THERE!!
- Set a password for “hacluster” and start the crm_gui and log in with the hacluster user/pass
- Now you can start adding resources. First add a resource group
A resource group is what gets failed over from node to node. You then put your resources in that group, i.e. all the things you want to fail over together.
- Add an IP address primitive resource to the resource group
- At this point you can start your resource group and you will see the cluster IP on the active node
Now you can fail over the resource from node to node using reboots or anything else that takes a node out of the cluster. I always do this as my initial test because an IP address does not depend on anything else being correct. Shared storage, by contrast, brings a lot of variables. So to make sure my cluster is working, I start with just the IP address.
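I do this step in the GUI, but the same group and IP can be expressed in crm shell syntax. Here is a rough sketch that prints the configuration; the resource name cluster_ip, the group name, the address, and the monitor interval are all placeholders I made up for illustration.

```shell
#!/bin/sh
# Emit crm shell configuration for a virtual IP inside a resource group.
# Usage: render_ip_group GROUP_NAME IP_ADDRESS NETMASK_BITS
# "cluster_ip" is a hypothetical resource name.
render_ip_group() {
    cat <<EOF
primitive cluster_ip ocf:heartbeat:IPaddr2 params ip="$2" cidr_netmask="$3" op monitor interval="10s"
group $1 cluster_ip
EOF
}

# Example values only (assumptions); use your own group name and address.
render_ip_group app_group 192.168.1.100 24
```

You could save the output to a file and load it with "crm configure load update <file>". For a gentler failover test than a reboot, "crm node standby <node>" followed by "crm node online <node>" should move the group away and let the node back in.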
- Add your shared data disk now as a resource
The same way you added a primitive IP resource to your resource group, you can now add a filesystem/device.
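For reference, a filesystem resource looks something like this in crm shell syntax. This is a sketch only: the resource name shared_fs, the device path, the mount point, and the filesystem type are assumptions for illustration.

```shell
#!/bin/sh
# Emit a crm shell primitive for a shared filesystem.
# Usage: render_fs_primitive DEVICE MOUNT_POINT FSTYPE
# "shared_fs" is a hypothetical resource name.
render_fs_primitive() {
    cat <<EOF
primitive shared_fs ocf:heartbeat:Filesystem params device="$1" directory="$2" fstype="$3" op monitor interval="20s"
EOF
}

# Example values only (assumptions); point this at your shared data disk.
render_fs_primitive /dev/mapper/example-data-disk /data ext3
```

The primitive still has to be made a member of your resource group, either in the GUI or by editing the group definition with "crm configure edit".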
- Add your quorum disk
First you use the command “sbd -d /dev/[path to quorum disk] create” to make your device usable as a quorum disk. Next you create the file “/etc/sysconfig/sbd” with these contents
# The next line enables the watchdog support:
Now restart your openais service (there will be an error upon stopping it, but it should start fine).
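As a rough guide, an sbd sysconfig file on SLES 11 commonly sets the device and the watchdog option. The sketch below follows the layout in the SUSE documentation rather than my exact file, and the device path is a made-up placeholder. It writes to a scratch file so nothing under /etc is touched until you have reviewed the result.

```shell
#!/bin/sh
# Write an sbd sysconfig file for the given device.
# Usage: write_sbd_sysconfig DEVICE [TARGET_FILE]
write_sbd_sysconfig() {
    target="${2:-/etc/sysconfig/sbd}"
    cat > "$target" <<EOF
SBD_DEVICE="$1"
# The next line enables the watchdog support:
SBD_OPTS="-W"
EOF
}

# Hypothetical device path (assumption), written to a scratch file for review.
write_sbd_sysconfig /dev/disk/by-id/example-sbd-disk /tmp/sbd.sysconfig
cat /tmp/sbd.sysconfig
```

After the “create” step, “sbd -d <device> list” should show the node slots on the disk, which is a quick way to confirm the device is usable.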
- Add your stonith disk to your cluster in the crm_gui
This time you add a primitive resource of type “Stonith” and add your stonith (a.k.a. quorum) disk. Do not add it to your resource group; it is a resource on its own, separate from the IP/shared data resource group.
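If you would rather not click through the GUI for this one, the same resource can be written in crm shell syntax. Here is a minimal sketch; the resource name stonith_sbd and the device path are placeholders of mine, and the external/sbd agent is the one I would expect on SLES 11 HA.

```shell
#!/bin/sh
# Emit a crm shell primitive for an sbd-based stonith resource.
# Usage: render_stonith_primitive SBD_DEVICE
# "stonith_sbd" is a hypothetical resource name.
render_stonith_primitive() {
    cat <<EOF
primitive stonith_sbd stonith:external/sbd params sbd_device="$1"
EOF
}

# Hypothetical device path (assumption); use the disk you prepared with sbd.
render_stonith_primitive /dev/disk/by-id/example-sbd-disk
```

Note that there is no group line here, which matches keeping the stonith resource outside the resource group.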
- Use command line to add “stonith-enabled” back into your cluster config
I haven’t found a way to do this one in the GUI, so we go back to the command line (remember we turned it off earlier?). Use the command “crm configure property stonith-enabled=true”.
- Back in crm_gui, you can now start your stonith resource
SERIOUSLY… ALMOST DONE!
- Create an application start script
It needs to be LSB compliant and answer to
The easiest way is to take an existing script in /etc/init.d and modify it to your liking. Put this script in the directory /etc/ha.d/resource.d.
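If you want to start from scratch instead, here is a bare-bones sketch of an init-style script. It handles start, stop, status, and restart with the usual LSB exit codes (0 for running, 3 for not running, 2 for a bad argument). The application command and pid file path are placeholders I made up, and a real script would also want the LSB header comments.

```shell
#!/bin/sh
# Hypothetical application command and pid file; replace with your own.
PIDFILE="${PIDFILE:-/var/run/myapp.pid}"
APP_CMD="${APP_CMD:-sleep 300}"

# LSB status: 0 = running, 3 = not running.
app_status() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null && return 0
    return 3
}

# Starting an already running service counts as success.
app_start() {
    app_status && return 0
    $APP_CMD >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
}

# Stopping an already stopped service counts as success.
app_stop() {
    app_status || return 0
    kill "$(cat "$PIDFILE")" 2>/dev/null
    rm -f "$PIDFILE"
}

main() {
    case "$1" in
        start)   app_start ;;
        stop)    app_stop ;;
        status)  app_status ;;
        restart) app_stop; app_start ;;
        *)       echo "Usage: $0 {start|stop|status|restart}" >&2; return 2 ;;
    esac
}

# Dispatch only when an action is given, so the file can also be sourced for testing.
[ "$#" -eq 0 ] || main "$@"
```

The cluster relies on the status exit code to decide whether the application is up, so test that action by hand on both nodes before handing the script to the cluster.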
- In crm_gui, add the application to your resource group
Now you can add an application to your resource group. It’s a primitive type resource again.
Your cluster should now have a primitive resource quorum disk and a resource group containing an IP address, a shared data disk, and an application. This is a very basic two-node cluster, but it models well what you will do in real life, for example building Apache or Oracle clusters on Linux. At the most basic level, those are applications with a shared disk that you want to fail over from one node to the other.