OpenSSI

From OptionC

Introduction

We decided to build an OpenSSI cluster. We followed the OpenSSI Xen HOWTO (http://openssi.org/cgi-bin/view?page=docs2/1.9/debian/xen-howto.txt) closely, up to a point. The following are the steps we took.

OpenSSI - First Node

Directory Structure

Create your directory structure. We chose to keep each node's files (mainly swap files) in a separate directory, and to store the configurations together. This document uses the following layout:

/images/ssi
        |--- boot
        |--- configs
        |--- node1
        |--- node2
        |--- node3
        | ... and so on
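
If you prefer to script this step, the layout can be created with a short Python 3 sketch like the one below. The function name make_ssi_layout is our own, and the root path and node count are assumptions you should adjust; directories that already exist are left untouched.

```python
import os


def make_ssi_layout(root="/images/ssi", num_nodes=3):
    """Create boot/, configs/ and node1..nodeN/ under root.

    Existing directories are left as they are, so it is safe to rerun
    with a larger num_nodes when the cluster grows.
    """
    subdirs = ["boot", "configs"] + [f"node{n}" for n in range(1, num_nodes + 1)]
    for sub in subdirs:
        os.makedirs(os.path.join(root, sub), exist_ok=True)
    return subdirs
```

For example, make_ssi_layout("/images/ssi", 3), run as root on the host, creates the tree shown above.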

Base DomU

Copy the Debian base system (found at http://sourceforge.net/projects/xenstuff/), or create your own using debootstrap or some other method. We ended up with an image file called ssi1.img in the node1 subdirectory, plus an accompanying swap image file, ssi1_swap.img.

Note: You will almost certainly want to increase the size of the base system's
      partition or disk file. If you use the one we created, it's stored in a 
      256MB file. Since the idea is to share this disk space between multiple
      nodes, you'll want to increase the size. Here's how to do it.
a) Create a new file or partition. For a file, use the following method 
   (this creates a 1GB file):
      # dd if=/dev/zero of={outfile} bs=1k seek=1024k count=1
      # mkfs.ext3 {outfile}
b) Next copy from the source file to the destination file (or partition) by 
   mounting them locally.
      # mkdir src dest
      # mount -o loop ssi1.img src
      # mount -o loop {outfile} dest   (if using an image file as in (a), above)
      # cp -dpr src/* dest
      # umount src dest
      # rmdir src dest
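c) If using image files, clean up: replace ssi1.img with the new, larger file 
   (or point the disk entry in your configuration at it), and remove the old 
   image once you've confirmed the new one works.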

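The dd command in step (a) works by seeking past the end of a new file and writing a single block: with bs=1k and seek=1024k it skips 1024 × 1024 one-kilobyte blocks (1GB) and writes one more, leaving a sparse file just over 1GB. The Python 3 sketch below does the same thing; create_sparse_image is a hypothetical helper, not part of any Xen or OpenSSI tool. You still need to run mkfs.ext3 on the result, which will ask for confirmation because the target is not a block device. The swap files for later nodes are made the same way (seek=64k gives roughly 64MB).

```python
import os


def create_sparse_image(path, size_mib, block_size=1024):
    """Rough Python equivalent of:
        dd if=/dev/zero of=PATH bs=1k seek=<size_mib * 1024> count=1

    Seeks size_mib MiB into a new file and writes a single zero-filled
    block. The apparent size is size_mib MiB + block_size bytes; on
    filesystems that support holes, almost no space is actually allocated.
    Returns the resulting file size in bytes.
    """
    offset = size_mib * 1024 * 1024
    with open(path, "wb") as f:
        f.seek(offset)
        f.write(b"\0" * block_size)
    return os.path.getsize(path)
```

Calling create_sparse_image("new.img", 1024) corresponds to the 1GB example in step (a).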
Base Configuration

Create a base configuration file to boot the base image. Ours looked like the following:

kernel = "/boot/xen-linux-2.6.11-ocxenu"
memory = 128
name = "ssi1"
disk = ['file:/images/ssi/node1/ssi1.img,hda1,w','file:/images/ssi/node1/ssi1_swap.img,hda2,w']
root = "/dev/hda1 ro"
vif = ['mac=0A:00:00:00:11:D1']
ip = "192.168.64.141"
netmask = "255.255.255.0"
gateway = "192.168.64.1"

We give it plenty of memory because it will be easier to create the ramdisk if we have enough (but that's a later step). It's important to have a constant MAC address because that will be used by the first node to help find all other nodes.

Test and Configure DomU

Boot the node:

# xm create ssi1.cfg -c

If you used your own DomU file, then you may be able to skip this step. We chose to set up the cluster network as a static one. The /etc/network/interfaces file should have a section similar to the following:

iface eth0 inet static
  address 192.168.64.141
  netmask 255.255.255.0
  broadcast 192.168.64.255
  gateway 192.168.64.1
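
The four values in this stanza have to agree with one another. As a quick sanity check, the Python 3 sketch below uses the standard ipaddress module to derive the broadcast address from the address and netmask and to confirm that the gateway is on the same subnet; check_static_config is a hypothetical name of our own.

```python
import ipaddress


def check_static_config(address, netmask, broadcast, gateway):
    """Return a list of problems with a static interfaces stanza (empty if consistent)."""
    # strict=False lets us pass a host address rather than the network address.
    net = ipaddress.IPv4Network(f"{address}/{netmask}", strict=False)
    problems = []
    if str(net.broadcast_address) != broadcast:
        problems.append(f"broadcast should be {net.broadcast_address}")
    if ipaddress.IPv4Address(gateway) not in net:
        problems.append(f"gateway {gateway} is outside {net}")
    return problems
```

With the values above, check_static_config("192.168.64.141", "255.255.255.0", "192.168.64.255", "192.168.64.1") returns an empty list.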

Restart the interface by typing:

# ifdown eth0
# ifup eth0

Other changes include making sure the /etc/fstab values match those in your configuration file. For example, in the Base Configuration section we set our root device to /dev/hda1. Xen will let a machine work with either /dev/hda1 or /dev/sda1, but clustering won't (at least not without some nasty workarounds). Make sure things are consistent.

Change the hostname in /etc/hostname:

# echo "ssi1" > /etc/hostname
# hostname ssi1

Update Modules

You will need to copy the appropriate modules over in order to ensure that the kernel can support loop mounting. If you are using the Option-C kernel (2.6.11-ocxenu) then the loop device is available as a module. Use the command:

 # modprobe loop
 loop: loaded (max 8 devices)

The important point is that you will need the loop device in order to make the ramdisk.

Install OpenSSI

Next you will need to install the OpenSSI-Xen packages. Append the following entries to /etc/apt/sources.list:

 deb http://radian.org/openssi-deb/openssi-v2/ ./
 deb-src http://radian.org/openssi-deb/openssi-v2/ ./

Set up pinning for the Radian packages by adding the following entries to /etc/apt/preferences (create the file if needed):

Package: *
Pin: origin radian.org
Pin-Priority: 1001 

If you need to configure the http proxy, do so. Then:

# apt-get update
# apt-get dist-upgrade

This will cause several packages to be downgraded, which is needed because of the OpenSSI packages and their dependencies (a Pin-Priority above 1000 is what allows apt to downgrade). Next, install the OpenSSI Xen package:

# apt-get install openssi-xen

You will get a couple of messages saying that certain utilities require IPVS support in the kernel and that using the software without IPVS modules is useless. Ignore these because we'll eventually boot our cluster using a different kernel which _does_ have IPVS.

Towards the end of the installation you'll get the chance to set up the first node in the cluster. The default responses are fine for everything up to the point where you're asked for the clustername; there, enter an appropriate name. We used the hostname (ssi1).

Enter a clustername or (?): ssi1

Create Ramdisk

Now it's time to make a ramdisk which will be used to start the cluster manager during the boot process. Edit the /etc/mkinitrd/mkinitrd.conf file. You need to set the following values:

OPENSSI_XEN=yes
MODULES=none
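
If you'd rather not edit the file by hand, a sketch like the following sets those values in a shell-style KEY=VALUE file. It is a helper of our own (set_conf_values is not part of mkinitrd): it rewrites uncommented assignments in place, appends keys that aren't present, and leaves comments alone.

```python
import re


def set_conf_values(text, values):
    """Set KEY=VALUE assignments in shell-style config text such as mkinitrd.conf.

    Every uncommented assignment to a key in `values` is rewritten; keys
    that never appear are appended at the end. Comment lines are untouched.
    """
    lines = text.splitlines()
    found = set()
    for i, line in enumerate(lines):
        m = re.match(r"\s*([A-Za-z_][A-Za-z0-9_]*)=", line)
        if m and m.group(1) in values:
            key = m.group(1)
            lines[i] = f"{key}={values[key]}"
            found.add(key)
    lines += [f"{k}={v}" for k, v in values.items() if k not in found]
    return "\n".join(lines) + "\n"
```

Keep a copy of the original /etc/mkinitrd/mkinitrd.conf before writing the result back.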

Run mkinitrd:

# mkinitrd -o /tmp/initrd

If you get an error such as "/usr/sbin/mkinitrd: device /dev/sda1 is not a block device", make sure the partition values in your DomU's /etc/fstab match those in your configuration file (see Test and Configure DomU, above). If you have to make changes here, you'll probably need to make the same changes in the /etc/clustertab file as well, or you'll get other errors when making your ramdisk.

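Because this error comes down to a mismatch between two files, it can be checked before running mkinitrd. The Python 3 sketch below (fstab_root_device and check_root_matches are hypothetical helpers) reads the device mounted on / from an fstab and compares it with the first word of the root option in the Xen configuration, for example "/dev/hda1 ro". It only looks at the root entry; swap and any other devices need the same check by hand.

```python
def fstab_root_device(fstab_text):
    """Return the device mounted on / in fstab text, or None if there is none."""
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#") and fields[1] == "/":
            return fields[0]
    return None


def check_root_matches(fstab_text, xen_root):
    """Compare fstab's root device with the Xen `root` option.

    Returns (matches, fstab_device, xen_device).
    """
    xen_dev = xen_root.split()[0]  # "/dev/hda1 ro" -> "/dev/hda1"
    fstab_dev = fstab_root_device(fstab_text)
    return fstab_dev == xen_dev, fstab_dev, xen_dev
```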
Once the ramdisk has been made, copy it to the host system (by scp, ftp or whatever you prefer) and keep it safe. The configurations below expect it at /images/ssi/boot/initrd.

Note: The ramdisk is actually gzip'ed. If it looks large (more than 2MB) then it may not have been created correctly. This will cause kernel panics when it's time to boot.
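
A quick way to apply this rule of thumb is to check the file's first two bytes, which are 0x1f 0x8b for gzip data, along with its size. The sketch below assumes "2MB" means 2 MiB; check_initrd is our own name, not an OpenSSI tool.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"
# Rule of thumb from the note above, read as 2 MiB (an assumption).
SIZE_LIMIT = 2 * 1024 * 1024


def check_initrd(path, limit=SIZE_LIMIT):
    """Return a list of warnings about an initrd image; empty if it looks sane."""
    warnings = []
    with open(path, "rb") as f:
        if f.read(2) != GZIP_MAGIC:
            warnings.append("not gzip-compressed")
    size = os.path.getsize(path)
    if size > limit:
        warnings.append(f"{size} bytes is larger than {limit}")
    return warnings
```

Run it on the host against the copy you kept, for example check_initrd("/images/ssi/boot/initrd"). An empty list only means nothing obviously wrong was found, not that the ramdisk is guaranteed to boot.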

Test OpenSSI

Now you can shut down your DomU node. It's time to get an OpenSSI kernel. You can use a prebuilt one (recommended initially):

# wget http://radian.org/openssi-deb/openssi-v2/xenU-vmlinuz

Alternatively, follow these steps (based closely on the OpenSSI Xen document (http://openssi.org/cgi-bin/view?page=docs2/1.9/debian/xen-howto.txt)) to build your own kernel:

1) Download the source from CVS:

cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/ci-linux login
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/ci-linux co  -r OPENSSI-DEBIAN ci
cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/ssic-linux login
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/ssic-linux co  -r OPENSSI-DEBIAN openssi

2) Get a copy of the Linux 2.6.10 kernel from www.kernel.org (this particular version is needed because the patches above are based on it).

3) Extract the Linux kernel to the linux directory. You should have a directory layout as follows:

 $ ls 
 ci    linux    linux-2.6.10.tar.bz2    openssi

4) Next:

 $ cd openssi
 $ make xenkern
 $ cd ..
 $ ls
 ci    linux    linux-2.6.10.tar.bz2    linux-ssi    openssi

5) Configure the kernel, starting from the supplied OpenSSI Xen configuration:

 $ cd linux-ssi
 $ cp ../openssi/kernel.configs/kernel-ssi-xenU.config .config
 $ make ARCH=xen menuconfig 

6) Build the kernel:

 $ make ARCH=xen 

7) Install any modules that were built into the /lib/modules directory:

 $ make ARCH=xen modules_install

8) Copy the kernel to wherever you want it. We use the /boot directory:

 $ cp vmlinuz /boot/xen-ssi-2.6.10
 $ cp .config /boot/config.xen-ssi-2.6.10

Update Configuration

Modify your configuration file to look similar to:

kernel = "/images/ssi/boot/xenU-vmlinuz"
ramdisk = "/images/ssi/boot/initrd"
memory = 128
name = "ssi1"
disk = ['file:/images/ssi/node1/ssi1.img,hda1,w','file:/images/ssi/node1/ssi1_swap.img,hda2,w']
root = "/dev/hda1 ro"
vif = ['mac=0A:00:00:00:11:D1']
ip = "192.168.64.141"
netmask = "255.255.255.0"
gateway = "192.168.64.1"

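The quotes around the ip, netmask and gateway values matter: xm reads domain configuration files as Python source, and an unquoted address such as 192.168.64.141 is a Python syntax error. The sketch below evaluates a configuration file in roughly the same way, so mistakes show up before you run xm create. load_xm_config and check_xm_config are hypothetical helpers, and the list of required keys is our assumption rather than a rule enforced by xm.

```python
def load_xm_config(path):
    """Evaluate a domain configuration file as Python and return its variables.

    This only approximates what xm does. It executes the file, so use it
    only on configuration files you wrote yourself. Syntax errors, such as
    an unquoted IP address, raise SyntaxError.
    """
    with open(path) as f:
        source = f.read()
    namespace = {}
    exec(compile(source, path, "exec"), {}, namespace)
    return namespace


def check_xm_config(path, required=("kernel", "memory", "name", "disk", "root")):
    """Return the required settings that the configuration does not define."""
    cfg = load_xm_config(path)
    return [key for key in required if key not in cfg]
```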
Boot

Boot as usual. Hopefully everything works. If so, you've got the beginnings of a cluster. Congratulations.


Adding New Nodes

This section shows how to add new nodes, each with its own swap space (but no additional disk space). Since we've already defined node 1's swap space as the /dev/hda2 partition, we'll use that as the standard.

Configuration

Set up a configuration file for the new node. It should look something like this:

kernel = "/images/ssi/boot/xenU-vmlinuz"
ramdisk = "/images/ssi/boot/initrd"
memory = 64
name = "ssi2"
disk = ['file:/images/ssi/node2/ssi2_swap.img,hda2,w']
root = "/dev/hda1 ro"
vif = ['mac=0A:00:00:00:11:D2']

Note: you no longer need the ip, netmask or gateway fields; these are taken care of by node1.

Note: the MAC address must be static and must be different (obviously) from node 1's. We chose to use a set of sequential numbers (D1 == node1, D2 == node2, etc.).
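
Since the new node configurations differ only in the node number, they are easy to generate. The Python 3 sketch below produces the configuration above for node n. node_config is a hypothetical helper; extending the D1, D2, ... scheme by treating the last MAC octet as the hex value 0xD0 + n is our own assumption, which gives DA for node 10 and runs out at node 47.

```python
def node_config(n, boot_dir="/images/ssi/boot", image_root="/images/ssi", memory=64):
    """Build the text of an xm configuration for cluster node n (2 <= n <= 47).

    Follows this page's scheme: name ssiN, a per-node swap image on hda2,
    and a static MAC whose last octet is 0xD0 + n (D2 for node 2, and so on).
    """
    if not 2 <= n <= 0xFF - 0xD0:
        raise ValueError("node number out of range for this MAC scheme")
    mac = f"0A:00:00:00:11:{0xD0 + n:02X}"
    return "\n".join([
        f'kernel = "{boot_dir}/xenU-vmlinuz"',
        f'ramdisk = "{boot_dir}/initrd"',
        f"memory = {memory}",
        f'name = "ssi{n}"',
        f"disk = ['file:{image_root}/node{n}/ssi{n}_swap.img,hda2,w']",
        'root = "/dev/hda1 ro"',
        f"vif = ['mac={mac}']",
    ]) + "\n"
```

For example, writing node_config(3) to /images/ssi/configs/ssi3.cfg gives a configuration for a third node.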

Swap File

Make sure you have the swap file set up:

# dd if=/dev/zero of=/images/ssi/node2/ssi2_swap.img bs=1k seek=64k count=1
# mkswap /images/ssi/node2/ssi2_swap.img

Update Node1

Inside node1 you'll need to add the information for the new node.

# ssi-addnode --xennode --hwaddr=0A:00:00:00:11:D2 --ipaddr=192.168.64.142

As before, enter the nodename (ssi2, or whatever you want) when prompted. You should read through the prompts, but you can take the defaults.

Since we're letting each domU in the cluster have its own swap space, we need to change /etc/fstab. Find the line that looks like:

/dev/hda2       swap    swap    sw,node=1       0       0

and modify it to look like:

/dev/hda2       swap    swap    sw,node=*       0       0
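
If you'd rather make this change with a script, the sketch below rewrites the node= option on the swap entry for /dev/hda2 and leaves every other line alone. share_swap_line is our own hypothetical helper; note that it rejoins the edited line with tabs, so that line's original spacing is not preserved.

```python
def share_swap_line(fstab_text, device="/dev/hda2"):
    """Change the node=... option on the swap entry for `device` to node=*."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == device and fields[2] == "swap":
            # fields[3] is the comma-separated mount options, e.g. "sw,node=1".
            opts = ["node=*" if o.startswith("node=") else o for o in fields[3].split(",")]
            fields[3] = ",".join(opts)
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```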

Ramdisk

Now recreate the ramdisk and copy it back to the host machine:

# mkinitrd -o /tmp/initrd
# scp /tmp/initrd {host}:/images/ssi/boot

Testing

Back on the host machine, we should now be able to boot our new node:

# xm create -c ssi2.cfg

Note: node1 must be running; otherwise node2 has nothing to boot from. node2 should boot over Ethernet using node1's image. After logging in, you'll be running a new node in your cluster.


Extra Tips, Notes and Hints

  • You'll get all sorts of messages about being unable to load modules. To fix these, do the normal thing: copy the modules into the node1 DomU. If you used the precompiled kernel then you won't be able to do this.
  • Don't use "shutdown" to stop a domU: it will work, but it will also shut down the rest of the machines in the cluster. Instead, use the "clusternode_shutdown" command.
  • Use the "cluster -V" command to get information about all nodes.
  • You may want to minimise how often you set up nodes under node1 (and recreate the ramdisk) by setting up several at once. OpenSSI supports up to 125 nodes.
  • The cluster lends itself nicely to scripting under Xen, for example by passing in variables such as the MAC address.
  • You should be able to set node 1 up as a DHCP server that assigns IP addresses based on MAC address.

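Following the tips on setting up several nodes at once and on scripting, here is a sketch that generates the ssi-addnode command lines for a range of nodes using the numbering from this page (MAC last octet 0xD0 + n, IP 192.168.64.140 + n). addnode_commands is a hypothetical helper that only formats the flags shown earlier; ssi-addnode itself still prompts for each node name, so run the commands one at a time inside node1 and recreate the ramdisk once at the end.

```python
def addnode_commands(first, last, base_ip="192.168.64.", ip_offset=140):
    """Return ssi-addnode command lines for nodes first..last (inclusive).

    Numbering follows this page: node n gets MAC 0A:00:00:00:11:Dn, taken
    here as last octet 0xD0 + n (our assumption past node 9), and IP
    base_ip + (ip_offset + n), so node 2 gets 192.168.64.142.
    """
    if first < 2 or 0xD0 + last > 0xFF or ip_offset + last > 254:
        raise ValueError("node range does not fit this numbering scheme")
    commands = []
    for n in range(first, last + 1):
        mac = f"0A:00:00:00:11:{0xD0 + n:02X}"
        ip = f"{base_ip}{ip_offset + n}"
        commands.append(f"ssi-addnode --xennode --hwaddr={mac} --ipaddr={ip}")
    return commands
```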
