Installing Gentoo as a Beowulf Head node

From NBSWiki

This is my repository of misc notes taken while building the head node for a Beowulf cluster under Gentoo with SSI netbooted nodes. This is obviously a work in progress, but I'd like it to evolve into a respectable HOWTO which could be added to the Gentoo Wiki.

Intent

This Beowulf Head Node is intended to be used as the front-end to PXE booted nodes using a Single System Image (the SSI config is discussed in my Gentoo Diskless Client section).

Reference articles

I try to keep these in order of importance/relevance.
This link is of main interest (Wilhelm Meier's web site on the subject)
Gentoo Diskless Clients Installation, by Wilhelm Meier, Markus Müller, Zweibrücken
SLIM: A Solution to Large Size Networked Linux System Administration, Management and Deployment
Some other external references on [Beowulf](ish) clustering:
A _must_, Cluster Monkey contains an important collection of technical articles on the subject of HPC computing: http://www.clustermonkey.net/
Ian Foster's website, for those interested in Grid Computing

Parallel work in progress

Not a pun... but I ain't the only geek working on this ;) Beocat is an installation very similar to this one (with the exception that my nodes don't use an initramfs... will learn that technique one day ;P). You will notice that we cross-ref each other; answers to questions about either setup can be obtained on #gentoo-cluster on freenode.net (just poke me, kyron, or kuffs, the Beocat author).

General references

The obvious: Beowulf's website. Do not neglect the mailing list, which contains hordes of discussions going back to the mid-nineties (!)
As an example, David Kewley indicated the following useful directives when building a cluster:

A few elements of manageability that I use all the time:

  • the ability to turn nodes on or off in a remote, scripted, customizable manner
  • the ability to reinstall the OS on all your nodes, or specific nodes, trivially (e.g. as provided by Rocks or Warewulf) {... or SSI netbooted nodes ...obviously ;P Kyron}
  • the ability to get remote console so you can fix problems without getting out the crash cart -- hopefully you don't have to use this much (because it means paying attention to individual systems), but when you need it, it will speed up your work compared to the alternative
  • the ability to gather and analyze node health information trivially, using embedded hardware management tools and system software tools
  • the ability to administratively close a node that has problems, so that you can deal with the problem later, and meanwhile jobs won't get assigned to it

Software Listing

System Base

NOTE: In the examples below, we'll assume that ETH0 is the public NIC (dhcp configured) and ETH1 is the client-side NIC.

net-misc/dhcpcd

Since ETH0 is configured using DHCP, dhcpcd will overwrite /etc/resolv.conf, which we need to edit for the ETH1 side of the network (the nodes). To prevent the DHCP client on ETH0 from overwriting the file, pass the -R option to dhcpcd in /etc/conf.d/net, as shown at the end of the net-dns/dnsmasq section below.

net-dns/dnsmasq

This list of DHCP options will come in handy when configuring dnsmasq since it relies on the knowledge of these options. We set this package up first since you need it to boot your nodes and send them proper configuration directives.
In this setup, we will use net-dns/dnsmasq as both a caching DNS server and a DHCP server. The automatic integration of DHCP and DNS resolution comes in handy when you want to dynamically add nodes to your cluster. We chose dnsmasq because it is easier to configure than the full-blown DHCP and BIND daemons.

First, add or modify the following lines in dnsmasq.conf; the rest of the file can usually be left alone:

# nodes are connected to eth1 and we want dnsmasq to only listen on that specific NIC
interface=eth1
except-interface=eth0
# If you set domain, don't forget to add it to your /etc/resolv.conf next to "search"
domain=cluster.local
# Pull in the Athlon-XP nodes definition file (you can obviously have more than one):
conf-file=/etc/AthlonXP.dnsmasq.conf

This is an example of a {ARCH}.dnsmasq.conf file.

File: /etc/AthlonXP.dnsmasq.conf
#The group name,address range and lease time:
dhcp-range=AthlonXP,192.168.1.20,192.168.1.50,12h
# The GROUP_NAME's,option 3:default Gateway (if you really need this, nodes shouldn't require routed access):
dhcp-option=AthlonXP,3,192.168.1.1
# The GROUP_NAME's, option 42:time server address:
dhcp-option=AthlonXP,42,0.0.0.0
# This is required for PXE booting
dhcp-boot=net:AthlonXP,/pxelinux.0,boothost,192.168.1.2
# As can be seen in the dnsmasq.conf, this option is not guaranteed to work (DN search order)
dhcp-option=AthlonXP,119,cluster.local
# Now for the host listing, format is: 
# dhcp-host=MACADDRESS,net:GROUP_NAME,NODE_NAME,IP_ADDRESS
dhcp-host=00:C0:4F:91:C6:03,net:AthlonXP,node0,192.168.1.50
dhcp-host=00:50:04:99:2A:DD,net:AthlonXP,node1,192.168.1.51
dhcp-host=00:40:ca:14:45:ef,net:AthlonXP,node2,192.168.1.52
dhcp-host=00:40:ca:15:7e:43,net:AthlonXP,node3,192.168.1.53
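
Typing the dhcp-host lines by hand gets tedious as the node count grows. Here is a minimal sketch that generates them from a list of MAC addresses, one per line on standard input. The function name gen_dhcp_hosts and its arguments are hypothetical (they are not part of dnsmasq), and it assumes nodes are named node0, node1, ... with consecutive IP addresses, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: emit dnsmasq dhcp-host lines for one group of nodes.
# Usage: gen_dhcp_hosts GROUP SUBNET FIRST_OCTET < file_with_one_mac_per_line
gen_dhcp_hosts() {
    group=$1     # dnsmasq group/tag name, e.g. AthlonXP
    subnet=$2    # first three octets, e.g. 192.168.1
    octet=$3     # last octet assigned to the first node, e.g. 50
    index=0
    while read -r mac; do
        # Skip blank lines and comment lines
        case $mac in ''|'#'*) continue ;; esac
        printf 'dhcp-host=%s,net:%s,node%d,%s.%d\n' \
            "$mac" "$group" "$index" "$subnet" "$octet"
        index=$((index + 1))
        octet=$((octet + 1))
    done
}

# Example: regenerate the first two entries of the listing above
printf '%s\n' 00:C0:4F:91:C6:03 00:50:04:99:2A:DD |
    gen_dhcp_hosts AthlonXP 192.168.1 50
```

The output can be appended to the {ARCH}.dnsmasq.conf file after checking it by eye; dnsmasq has to be restarted to pick up the new hosts.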

NOTE: Don't forget to add the following lines to your /etc/resolv.conf. If you don't, you won't be able to ping the nodes by name, and all sorts of other weird problems will appear with the cluster usage and management tools.

File: /etc/resolv.conf
search cluster.local
nameserver 127.0.0.1

ALSO, you must modify the way dhcpcd configures ETH0, since you don't want it to overwrite your resolv.conf. You do this by adding "-R" to the dhcpcd arguments:

File: /etc/conf.d/net
dhcpcd_eth0="-R"
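
After the next DHCP lease renewal on ETH0, it is worth confirming that resolv.conf survived. The following is a small sketch of such a check; the function name check_resolv is made up for illustration, and it only looks for the two lines shown above (search cluster.local and nameserver 127.0.0.1).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given resolv.conf still contains
# the cluster search domain and points at the local dnsmasq instance.
check_resolv() {
    file=$1
    grep -q '^search[[:space:]].*cluster\.local' "$file" 2>/dev/null &&
    grep -q '^nameserver[[:space:]][[:space:]]*127\.0\.0\.1' "$file" 2>/dev/null
}

if check_resolv /etc/resolv.conf; then
    echo "resolv.conf looks fine"
else
    echo "resolv.conf is missing the cluster settings (overwritten by dhcpcd?)"
fi
```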

net-ftp/tftp-hpa and sys-boot/syslinux

NOTE: dnsmasq should do this now!!!

Refer to Chapter 4 of Diskless Nodes with Gentoo for these two packages; they are easy to install and well explained there. We will extend the pxelinux.0 boot mechanism by strategically assigning IP addresses and using the pxelinux.0 configuration files. The Gentoo Diskless Client article also has some example files.
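
The reason IP assignment matters here: pxelinux.0 looks in its pxelinux.cfg/ directory for a configuration file named after the client's IP address written in uppercase hexadecimal, then tries progressively shorter prefixes of that name, and finally falls back to a file called default. The sketch below computes that file name for a given address; the function name ip_to_pxe_hex is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted-quad IPv4 address into the
# uppercase hex file name that pxelinux.0 looks for under pxelinux.cfg/.
ip_to_pxe_hex() {
    old_ifs=$IFS
    IFS=.
    # Split the address on dots into $1..$4 (unquoted on purpose)
    set -- $1
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

# node0 from the dnsmasq example; prints C0A80132
ip_to_pxe_hex 192.168.1.50
```

Since all the nodes of a group share the leading hex digits, a single file named after the common prefix (C0A8013 would cover 192.168.1.48 to 192.168.1.63) can serve a whole block of addresses.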

Cluster Management

* sys-cluster/c3 [ Masked ]

Latest version available: 4.0.1
Latest version installed: [ Not Installed ]
Size of downloaded files: 53 kB
Homepage: http://www.csm.ornl.gov/torc/C3/
Description: The Cluster Command and Control (C3) tool suite
License: C3

This package contains a set of small applications used to perform basic tasks simultaneously on all nodes of a cluster through commands using RSH or SSH. This is especially useful for setting and cleaning up the execution environment (for example). See the tool suite description for more details.

The config file

File: /etc/c3.conf
cluster local {
        headless:headless  #head node
        thinkbig[1-24]      #compute nodes
}

Example command

Retrieve the hostname of all nodes as defined in the config file:

eric@headless ~ $ cexec -p hostname
local thinkbig1: thinkbig1
local thinkbig2: thinkbig2
local thinkbig3: thinkbig3
local thinkbig4: thinkbig4
local thinkbig5: thinkbig5
local thinkbig6: thinkbig6
local thinkbig7: thinkbig7
local thinkbig8: thinkbig8
local thinkbig9: thinkbig9
local thinkbig10: thinkbig10
local thinkbig11: thinkbig11
local thinkbig12: thinkbig12
local thinkbig13: thinkbig13
local thinkbig14: thinkbig14
local thinkbig15: thinkbig15
local thinkbig16: thinkbig16
local thinkbig17: thinkbig17
local thinkbig18: thinkbig18
local thinkbig19: thinkbig19
local thinkbig20: thinkbig20
local thinkbig21: thinkbig21
local thinkbig22: thinkbig22
local thinkbig23: thinkbig23
local thinkbig24: thinkbig24
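
The thinkbig[1-24] entry in c3.conf is a range, and the output above shows it covering thinkbig1 through thinkbig24. As an illustration only, here is a sketch that expands such a range in plain shell, which can be handy for feeding the same node names to other tools. The function expand_range is hypothetical and assumes the range is a simple pair of integers counted upwards.

```shell
#!/bin/sh
# Hypothetical helper: expand "name[first-last]" into one host name per line.
expand_range() {
    prefix=${1%%\[*}        # text before the opening bracket
    range=${1#*\[}          # "first-last]"
    range=${range%\]}       # "first-last"
    first=${range%-*}
    last=${range#*-}
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "$prefix$i"
        i=$((i + 1))
    done
}

# Prints thinkbig1 through thinkbig4, one per line
expand_range 'thinkbig[1-4]'
```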

* sys-cluster/torque

Latest version available: 2.0.0_p7
Latest version installed: 2.0.0_p7
Size of downloaded files: 2,237 kB
Homepage:    http://www.clusterresources.com/products/torque/
Description: A freely downloadable cluster resource manager and queuing system based on OpenPBS
License:     openpbs

Torque 2.0.0p7 is now available as an ebuild (see Bug #115189)

* sys-cluster/maui [ Masked ]

Latest version available: 3.2.6_p13-r1
Latest version installed: 3.2.6_p13-r1
Size of downloaded files: 863 kB
Homepage: http://www.clusterresources.com/products/maui/
Description: Maui Cluster Scheduler
License: maui

Cluster Monitoring

* dev-lang/php

     Latest version available: 5.1.4
     Latest version installed: [ Not Installed ]
     Size of downloaded files: 18,812 kB
     Homepage:    http://www.php.net/
     Description: The PHP language runtime engine.
     License:     PHP-3

Needed for the Ganglia monitoring and visualization website. You will need at least USE="apache2 xml xmlread" for ganglia to work.

* net-www/apache

     Latest version available: 2.0.55-r1
     Latest version installed: [ Not Installed ]
     Size of downloaded files: 14,053 kB
     Homepage:    http://httpd.apache.org/
     Description: The Apache Web Server
     License:     Apache-2.0

Needed for the Ganglia monitoring and visualization website. We also add the following entry to /etc/apache2/vhosts.d/00_default_vhost.conf between the <IfDefine DEFAULT_VHOST> and </IfDefine> markers (at the beginning).

File: /etc/apache2/vhosts.d/00_default_vhost.conf
<VirtualHost *:80>
# ServerName ganglia.livia.etsmtl.ca
# Serveralias ganglia.livia.etsmtl.ca
# ServerAdmin eric@livia.etsmtl.ca
 Setenv VLOG
 DirectoryIndex index.php index.html
 DocumentRoot /var/www/ganglia.livia.etsmtl.ca/htdocs/
   <Directory "/var/www/ganglia.livia.etsmtl.ca/htdocs/">
        Order allow,deny
        Allow from all 
        Options +Indexes
   </Directory>
</VirtualHost>

* sys-cluster/ganglia

     Latest version available: 3.0.2
     Latest version installed: 3.0.2
     Size of downloaded files: 3,328 kB
     Homepage:    http://ganglia.sourceforge.net/
     Description: Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids
     License:     BSD

/etc/gmond.conf

You first generate the default gmond.conf file with:

gmond --default_config > /etc/gmond.conf

Then edit it to your liking. I modified the following sections (note the added mcast_if=eth1):

File: /etc/gmond.conf
cluster {
  name = "Kluster"
}
/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  mcast_if=eth1
  mcast_join = 239.2.11.71
  port = 8649
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  mcast_if=eth1
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}

NOTE: As specified in the FAQ (see How should I configure multihomed machines?), in a multihomed environment, you must add the following to your /etc/conf.d/net:

routes_eth1=( "-host 239.2.11.71" )

Which is the equivalent to:

route add -host 239.2.11.71 dev eth1

OS Tweaking

* sys-kernel/ck-sources [ Masked ]

Latest version available: 2.6.15_p2
Latest version installed: [ Not Installed ]
Size of downloaded files: 38,991 kB
Homepage:    http://members.optusnet.com.au/ckolivas/kernel/
Description: Full sources for the Linux kernel with Con Kolivas' high performance patchset and Gentoo's basic patchset.
License:     GPL-2

* sys-process/schedtool

Latest version available: 1.2.3
Latest version installed: [ Not Installed ]
Size of downloaded files: 24 kB
Homepage:    http://freequaos.host.sk/schedtool
Description: A tool to query or alter a process' scheduling policy.
License:     GPL-2


Important files

Modify /etc/make.conf

As always, USE flags and CFLAGS depend greatly on intended use and actual hardware profile of the machine. Some relevant USE flags in our case:

  • server --> added to torque 2.0.0-p7
  • pbsserver
  • mpi
  • blas

File: /etc/make.conf
USE="-* sse sse2 blas mpi zlib nls userlocales utf8 vim X pbsserver server"
CFLAGS="-march=opteron -O2 -pipe"
CXXFLAGS="${CFLAGS}"
PORT_LOGDIR=/var/log/portage
PORTDIR_OVERLAY=/usr/local/portage
MAKEOPTS="-j6"
PORTAGE_NICENESS=-1
AUTOCLEAN="yes"
PORTAGE_TMPFS="/dev/shm"
FEATURES="buildpkg fixpackages sandbox ccache"
CCACHE_SIZE="2G"
CLEAN_DELAY=1
LINGUAS="fr en"

LDAP Auth

I still have to write up how to set up OpenLDAP on the head node. For the moment, here are some of the important files that need to be modified for LDAP auth to work.

/etc/openldap/slapd.conf

I recommend you keep the comments and find where to change these lines in the slapd.conf file if you're new to LDAP.

File: /etc/openldap/slapd.conf
include         /etc/openldap/schema/core.schema
include         /etc/openldap/schema/cosine.schema
include         /etc/openldap/schema/inetorgperson.schema
include         /etc/openldap/schema/openldap.schema
include         /etc/openldap/schema/nis.schema
include         /etc/openldap/schema/java.schema
include         /etc/openldap/schema/corba.schema
include         /etc/openldap/schema/misc.schema
pidfile         /var/run/openldap/slapd.pid
argsfile        /var/run/openldap/slapd.args
database        bdb
suffix          "ou=cluster,dc=livia,dc=etsmtl,dc=ca"
rootdn          "cn=Manager,ou=cluster,dc=livia,dc=etsmtl,dc=ca"
rootpw          dotchos
directory       /var/lib/openldap-data/cluster
index   objectClass     eq
loglevel none

/etc/pam.d/system-auth

The modification of this file is necessary if you want LDAP to store the passwords and if you want to be able to change them with passwd(1).

File: /etc/pam.d/system-auth
#%PAM-1.0
# /etc/pam.d/system-auth
auth       required pam_env.so
auth       sufficient   pam_unix.so likeauth nullok
auth       sufficient   pam_ldap.so use_first_pass
auth       required pam_deny.so

account    required pam_unix.so

password   required pam_cracklib.so difok=2 minlen=8 dcredit=2 ocredit=2 retry=3
password   sufficient   pam_unix.so nullok md5 shadow use_authtok
password   sufficient   pam_ldap.so use_authtok use_first_pass
password   required pam_deny.so

session    required pam_limits.so
session    required pam_unix.so
session    optional pam_ldap.so