Howto Build a Basic Gentoo Beowulf Cluster

From NBSWiki


Introduction

There are already many Linux-based clustering solutions that claim to provide an easy way to obtain or build a Linux Beowulf Cluster[1]. In practice, most of the links to them are either dead or badly outdated. We'll concentrate on a more specific class of cluster that uses a diskless node approach, where the nodes boot off a Single System Image through the network interface (this process is explained in the Creating the SSI section below). The Clustermonkey web site[2] has an article[3] that weighs such a configuration against local disks, depending on the intended use of the cluster. One of the key conditions where diskless nodes are useful is when the nodes need to share file-based data during the runs. However, if all processes compute and manipulate their data independently, local storage becomes more attractive. In our case, the nodes do have a local disk that we could configure as a local "scratch pad" for such purposes.

Motivation

With the existence of commercial and non-commercial solutions (refer to the References at the end of the article), one may wonder why we would want to build our own cluster solution from the bottom up. We'll give a few of the key reasons here.

Why build a Network Bootable Beowulf Cluster

  • The key element here is maintenance and management. A typical Beowulf Cluster is made of many identical nodes that run essentially identical copies of the same software. Duplicating the installation onto those nodes and managing each node individually is inefficient. For this reason, a Single System Image (SSI) booted off the network makes much more sense. Furthermore, some of the changes to the SSI can be propagated to all nodes instantly.
  • Adding a node becomes a simple task of adding its MAC address to a config file and booting the node on the Beowulf network[4].

Why use Gentoo as the Base Distribution

Gentoo is becoming more and more popular due to its flexibility and manageability. Recent IEEE journal articles have been written on this subject, so we won't debate it here[5]. Since hardware and software in the Linux and research world evolve at a frantic rate, a fast-evolving OS is a necessity. Gentoo offers this technological bleeding edge and also provides the means to easily integrate new packages into the system through Portage's ebuilds.

Material

In our proof of concept, we will be using the following material to build our mini-cluster:

Master Node

  • Dual AMD Opteron(tm) Processor 244
  • 2Gigs of RAM
  • ROOT (/) partition in RAID1
  • "ScratchPad" area made up of 4*SATA 120Gig in RAID0

Slave Nodes

  • AMD Athlon(TM) XP 2500+
  • 1Gig of RAM
  • ROOT (/) is NFS mounted
  • "ScratchPad" (/ScratchPad) is NFS mounted
  • 1 local disk but not used in the present test configuration

Network Cards

  • 8 port 10/100 D-Link switch
  • 3c905C-TX Ethernet NICs on both Master and Slave nodes

Network Topology

We will use the most basic and common network topology for building this cluster. All Slave Nodes connect to one switch, which in turn connects to the Master Node through a 100BaseT Ethernet network. The Master Node has two Ethernet devices to ensure that the Slave Nodes are on an isolated network. In theory, the nodes should not be accessed directly by the users, and jobs are to be launched through the Master Node.

SimpleBeowulfNetTopology.png

Creating the SSI

The steps for creating a Gentoo-based Single System Image are documented in the Gentoo Diskless Client section of this wiki. Although somewhat general, it contains the key elements for creating an SSI that will be usable with the Gentoo Headnode Configuration document. However, there are some applications that we additionally install on the nodes. Here is a short listing of some of the Gentoo packages:

sys-cluster/openmpi
sys-cluster/torque
sys-cluster/ganglia

Ganglia is used for monitoring the entire cluster. Its installation is detailed in the Gentoo Headnode Configuration document. Note that the openmpi and torque ebuilds come from the www.gentooscience.org overlay.


Configuring the Master Node

This section is detailed by the Gentoo Headnode Configuration Document. Please refer to it for details on how to configure the Master Node.

Passwordless SSH Logon to the Nodes and Node List Creation

We use SSH rather than RSH to launch commands on each node. Many argue for using RSH to avoid the overhead of SSH. However, SSH is more portable than RSH when it comes to carrying the environment over to the other nodes. Since SSH is only used for launching commands and not for the actual communications, it adds no overhead to the computation itself.

Creating a passwordless key for all nodes

Since your home directory is mounted across all nodes, you only need to create one key in your home directory and it will automatically be present on all nodes due to the NFS mounted nature of your $HOME. Here is the sequence to perform:

cd ~/.ssh/
ssh-keygen -t dsa -b 1024 -f id_dsa

The ssh-keygen command will prompt for a passphrase. Leave it empty, since we don't want to be asked for one when logging onto the nodes. We then add the newly generated key to the authorized_keys file:

cat id_dsa.pub >> authorized_keys
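
With its default StrictModes setting, sshd refuses to use an authorized_keys file when the file or the ~/.ssh directory is writable by other users, and the login then falls back to a password prompt. The following is a minimal sketch for tightening the permissions; the function name fix_ssh_perms is made up for this example and is not part of OpenSSH.

```shell
#!/bin/sh
# Hypothetical helper: restrict permissions on an SSH directory so that
# sshd's StrictModes check accepts the authorized_keys file inside it.
fix_ssh_perms() {
    dir=${1:-$HOME/.ssh}                  # defaults to the user's ~/.ssh
    chmod 700 "$dir" || return 1          # only the owner may enter the directory
    if [ -f "$dir/authorized_keys" ]; then
        # owner read/write only
        chmod 600 "$dir/authorized_keys" || return 1
    fi
}

# Example usage:
#   fix_ssh_perms              # operates on ~/.ssh
```

Since $HOME is NFS mounted, running this once on the Master Node should be enough for all nodes.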

Now we must log onto all the nodes once so that each node's host key is added to our known_hosts file. To make this simpler, we can use a loop:

for Num in $(seq 1 24); do ssh thinkbig${Num} hostname; done

This will connect to each node and print its hostname (we use hostname so that ssh only launches a simple command and doesn't actually open a session on the node). Here is an example output. Note that some of the nodes aren't available (for example, ssh: thinkbig20: Name or service not known) and some of them were already registered (they simply return their hostname):

eric@headless ~ $ for Num in $(seq 1 24); do ssh thinkbig${Num} hostname; done
thinkbig1
The authenticity of host 'thinkbig2 (10.0.1.12)' can't be established.
RSA key fingerprint is 22:c1:2a:28:44:f2:1d:a6:7e:57:72:16:ee:d5:28:4c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'thinkbig2,10.0.1.12' (RSA) to the list of known hosts.
thinkbig2
The authenticity of host 'thinkbig3 (10.0.1.13)' can't be established.
RSA key fingerprint is 22:c1:2a:28:44:f2:1d:a6:7e:57:72:16:ee:d5:28:4c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'thinkbig3,10.0.1.13' (RSA) to the list of known hosts.
thinkbig3
ssh: thinkbig4: Name or service not known
The authenticity of host 'thinkbig5 (10.0.1.15)' can't be established.
RSA key fingerprint is 22:c1:2a:28:44:f2:1d:a6:7e:57:72:16:ee:d5:28:4c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'thinkbig5,10.0.1.15' (RSA) to the list of known hosts.
thinkbig5
ssh: thinkbig6: Name or service not known
thinkbig7
ssh: thinkbig8: Name or service not known
thinkbig9
thinkbig10
ssh: thinkbig11: Name or service not known
thinkbig12
thinkbig13
ssh: thinkbig14: Name or service not known
thinkbig15
thinkbig16
thinkbig17
thinkbig18
thinkbig19
ssh: thinkbig20: Name or service not known
thinkbig21
ssh: thinkbig22: Name or service not known
thinkbig23
thinkbig24 
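
One way to avoid answering the prompts by hand is to collect the host keys beforehand with ssh-keyscan, which ships with OpenSSH. This is an alternative to the loop used in this article, not the method that produced the output above. The function names node_names and scan_nodes are made up for this sketch, and thinkbig is the node prefix taken from the example output. Like answering yes to every prompt, it trusts whatever keys the isolated cluster network returns.

```shell
#!/bin/sh
# Hypothetical helper: print node names PREFIX1 .. PREFIXn, one per line.
node_names() {
    prefix=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s%s\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# Hypothetical helper: print the public host keys of every reachable node.
# "-f -" makes ssh-keyscan read host names from standard input and
# "-T 5" limits each connection attempt to 5 seconds.
scan_nodes() {
    node_names "$1" "$2" | ssh-keyscan -T 5 -f - 2>/dev/null
}

# Example usage:
#   scan_nodes thinkbig 24 >> ~/.ssh/known_hosts
```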

Auto Creating the HOSTFILE

The loop described above can also be used to generate a list of available nodes as such:

for Num in $(seq 1 24); do ssh thinkbig${Num} hostname 2> /dev/null | grep -e '^think' >> hostfile ; done

Only run this after having added the hosts to your known_hosts as performed by the loop above. The file named hostfile now contains:

eric@headless ~ $ cat hostfile
thinkbig1
thinkbig2
thinkbig3
thinkbig5
thinkbig7
thinkbig9
thinkbig10
thinkbig12
thinkbig13
thinkbig15
thinkbig16
thinkbig17
thinkbig18
thinkbig19
thinkbig21
thinkbig23
thinkbig24 
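
Because the loop appends to hostfile, running it a second time lists every node twice. The sketch below is a variant that rebuilds the file from scratch and never waits on a password or host-key prompt. The function name build_hostfile and the five-second timeout are assumptions made for this example; BatchMode and ConnectTimeout are standard ssh options.

```shell
#!/bin/sh
# Hypothetical helper: write the names of all reachable nodes to a hostfile.
#   $1 = node name prefix (e.g. thinkbig)
#   $2 = highest node number to try
#   $3 = output file (defaults to ./hostfile)
build_hostfile() {
    prefix=$1
    count=$2
    out=${3:-hostfile}
    : > "$out"                  # truncate, so reruns do not duplicate entries
    i=1
    while [ "$i" -le "$count" ]; do
        # BatchMode=yes makes ssh fail instead of prompting;
        # -n keeps ssh from reading standard input.
        # A node that does not answer is simply skipped.
        ssh -n -o BatchMode=yes -o ConnectTimeout=5 \
            "${prefix}${i}" hostname 2>/dev/null |
            grep -e "^${prefix}" >> "$out" || true
        i=$((i + 1))
    done
}

# Example usage:
#   build_hostfile thinkbig 24 hostfile
```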

What Works

  1. Network booted nodes
  2. Ganglia monitoring of the nodes
  3. Passwordless login onto the nodes with SSH
  4. Local execution of OpenMPI on the Master Node:
kyron@headless ~ $ export LD_LIBRARY_PATH=/usr/lib/openmpi/1.0.2-gcc-4.1/lib64; /usr/lib/openmpi/1.0.2-gcc-4.1/bin/mpirun -np 2 hello
Hello, world. I am 1 of 2
Hello, world. I am 0 of 2

What Doesn't Work

  • Execution of OpenMPI across the 32-bit nodes together with the 64-bit head node. This is a heterogeneity issue (mixed 32-bit and 64-bit architectures).

To Do

  1. Configure OpenLDAP authentication for the nodes
  2. Install and configure a job management system such as Torque and/or Maui

References

Diskless Solutions

Bootable Cluster CD (BCCD)[6]: Based on a mix of BSD ports and Gentoo ebuilds, this bootable "cluster in a pocket" is geared towards educational use.
Scyld: Commercial, based on Red Hat Enterprise, and created by the founders of Beowulf.
ClusterKnoppix: Free, based on the Knoppix bootable CD, and geared towards using openMosix for distributing the processing (process migration).

Other

Here, in no particular order:
LinuxHPC
Beowulf homepage: Don't judge it by the homepage; check out the mailing list, which is where all the knowledge and experience lies.

Footnotes

  1. We refer to a Beowulf Cluster in the classical sense of the definition, where the nodes can only be accessed through the head/master node.
  2. www.clustermonkey.net
  3. So Why Use Disks on Clusters?
  4. Depending on the management tools used, some other files might require modification and some daemons might need to be restarted or reloaded. We won't get into these details at this point.
  5. Gentoo Linux: the next generation of Linux
  6. Something Wonderful this Way Comes