Installing Gentoo as a Beowulf Head node
This is my repository of miscellaneous notes taken while building the head node for a Beowulf cluster under Gentoo with SSI netbooted nodes. It is obviously a work in progress, but I'd like it to evolve into a respectable HOWTO that could be added to the Gentoo Wiki.
Intent
This Beowulf Head Node is intended to be used as the front-end to PXE booted nodes using a Single System Image (the SSI config is discussed in my Gentoo Diskless Client section).
Reference articles
I try to keep these in order of importance/relevance.
This link is of main interest (Wilhelm Meier's web site on the subject)
Gentoo Diskless Clients Installation, by Wilhelm Meier, Markus Müller, Zweibrücken
SLIM: A Solution to Large Size Networked Linux System Administration, Management and Deployment
Some other external references on Beowulf(ish) clustering:
A _must_: Cluster Monkey contains an important collection of technical articles on the subject of HPC: http://www.clustermonkey.net/
Ian Foster's website, for those interested in Grid Computing
Parallel work in progress
Not a pun... but I ain't the only geek working on this ;) Beocat is an installation very similar to this one (with the exception that my nodes don't use an initramfs... will learn that technique one day ;P). You will notice that we cross-reference each other; answers to questions about either setup can be had on #gentoo-cluster at freenode.net (just poke me, kyron, or kuffs, the Beocat author).
General references
The obvious: Beowulf's website, do not neglect the mailing list which contains hordes of discussions going back to the mid nineties (!)
As an example, David Kewley indicated the following useful directives when building a cluster:
A few elements of manageability that I use all the time:
- the ability to turn nodes on or off in a remote, scripted, customizable manner
- the ability to reinstall the OS on all your nodes, or specific nodes, trivially (e.g. as provided by Rocks or Warewulf) {... or SSI netbooted nodes ...obviously ;P Kyron}
- the ability to get remote console so you can fix problems without getting out the crash cart -- hopefully you don't have to use this much (because it means paying attention to individual systems), but when you need it, it will speed up your work compared to the alternative
- the ability to gather and analyze node health information trivially, using embedded hardware management tools and system software tools
- the ability to administratively close a node that has problems, so that you can deal with the problem later, and meanwhile jobs won't get assigned to it
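The last item maps fairly directly onto Torque, the resource manager listed under Cluster Management. A minimal sketch, assuming Torque's pbsnodes command is on the PATH of the head node; the helper names close_node and reopen_node and the DRY_RUN variable are made up for this example and are not part of Torque:

```shell
#!/bin/sh
# Hypothetical helpers (the names are ours, not Torque's).
# pbsnodes -o marks a node offline so that no new jobs get assigned to it;
# pbsnodes -c clears that flag again once the node has been fixed.

run() {
    # With DRY_RUN=1, only print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

close_node()  { run pbsnodes -o "$1"; }
reopen_node() { run pbsnodes -c "$1"; }

# Dry run: shows what would be sent, nothing reaches the pbs_server.
DRY_RUN=1 close_node node3
```

Jobs already running on the node are not touched by this; the node simply stops receiving new ones until it is reopened.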
Software Listing
System Base
NOTE: In the examples below, we'll assume that ETH0 is the public NIC (dhcp configured) and ETH1 is the client-side NIC.
net-misc/dhcpcd
Since ETH0 is configured using DHCP, dhcpcd will overwrite /etc/resolv.conf each time it obtains a lease, and we would lose the entries that the ETH1 side of the network (the nodes) depends on. To prevent this, dhcpcd on ETH0 must be told not to touch /etc/resolv.conf; this is done with its "-R" option (see the /etc/conf.d/net entry at the end of the dnsmasq section).
net-dns/dnsmasq
This list of DHCP options will come in handy when configuring dnsmasq since it relies on the knowledge of these options. We set this package up first since you need it to boot your nodes and send them proper configuration directives.
In this setup, we will use net-dns/dnsmasq as both a caching DNS server and a DHCP server. The automatic integration of DHCP and DNS resolution comes in handy when you want to dynamically add nodes to your cluster. We choose dnsmasq because it is easier to configure than the full-blown DHCP and BIND daemons.
First, add/modify the following lines in dnsmasq.conf; the rest of the file can usually be left alone:
# nodes are connected to eth1 and we want dnsmasq to only listen on that specific NIC
interface=eth1
except-interface=eth0
# If you set domain, don't forget to add it to your /etc/resolv.conf next to "search"
domain=cluster.local
# Pull in the Athlon-XP nodes definition file (you can obviously have more than one):
conf-file=/etc/AthlonXP.dnsmasq.conf
This is an example of a {ARCH}.dnsmasq.conf file.
| File: /etc/AthlonXP.dnsmasq.conf |
# The group name, address range and lease time:
dhcp-range=AthlonXP,192.168.1.20,192.168.1.50,12h
# The GROUP_NAME's, option 3: default gateway (if you really need this, nodes shouldn't require routed access):
dhcp-option=AthlonXP,3,192.168.1.1
# The GROUP_NAME's, option 42: time server address:
dhcp-option=AthlonXP,42,0.0.0.0
# This is required for PXE booting
dhcp-boot=net:AthlonXP,/pxelinux.0,boothost,192.168.1.2
# As can be seen in the dnsmasq.conf, this option is not guaranteed to work (DN search order)
dhcp-option=AthlonXP,119,cluster.local
# Now for the host listing, format is:
# dhcp-host=MACADDRESS,net:GROUP_NAME,NODE_NAME,IP_ADDRESS
dhcp-host=00:C0:4F:91:C6:03,net:AthlonXP,node0,192.168.1.50
dhcp-host=00:50:04:99:2A:DD,net:AthlonXP,node1,192.168.1.51
dhcp-host=00:40:ca:14:45:ef,net:AthlonXP,node2,192.168.1.52
dhcp-host=00:40:ca:15:7e:43,net:AthlonXP,node3,192.168.1.53
|
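Typing the host listing by hand gets old once there are more than a handful of nodes. Here is a small sketch that generates the dhcp-host lines from a list of MAC addresses, one per line on stdin. The function name gen_dhcp_hosts and its argument order are my own invention; only the output format comes from the file above.

```shell
#!/bin/sh
# gen_dhcp_hosts GROUP PREFIX NETWORK FIRST_OCTET < macs.txt
# Hypothetical helper: numbers the nodes from 0 and hands out consecutive
# addresses starting at NETWORK.FIRST_OCTET, in the order the MACs are read.
gen_dhcp_hosts() {
    group=$1; prefix=$2; net=$3; octet=$4
    i=0
    while read -r mac; do
        # Skip blank lines in the input.
        [ -z "$mac" ] && continue
        echo "dhcp-host=${mac},net:${group},${prefix}${i},${net}.${octet}"
        i=$((i + 1))
        octet=$((octet + 1))
    done
}

# Example with the first two MAC addresses from the listing above:
printf '00:C0:4F:91:C6:03\n00:50:04:99:2A:DD\n' |
    gen_dhcp_hosts AthlonXP node 192.168.1 50
```

The output can be appended to the {ARCH}.dnsmasq.conf file; dnsmasq has to be restarted before it picks up the new hosts.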
NOTE: Don't forget to add the following lines to your /etc/resolv.conf. If you don't, you won't be able to ping the nodes by name, and all sorts of other weird problems will appear with the cluster usage and management tools.
| File: /etc/resolv.conf |
search cluster.local
nameserver 127.0.0.1
|
ALSO, you must modify the way dhcpcd configures ETH0, since you don't want it to overwrite your resolv.conf. You get this by adding "-R" to the dhcpcd arguments:
| File: /etc/conf.d/net |
dhcpcd_eth0="-R"
|
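Since a clobbered resolv.conf produces rather confusing symptoms, it may be worth checking it after the next lease renewal on ETH0. A minimal sketch; check_resolv is a made-up name, and the test only looks for the two entries shown above:

```shell
#!/bin/sh
# check_resolv FILE DOMAIN
# Hypothetical helper: succeeds only if FILE lists 127.0.0.1 as a nameserver
# and contains DOMAIN on its "search" line.
check_resolv() {
    awk -v dom="$2" '
        $1 == "nameserver" && $2 == "127.0.0.1" { ns = 1 }
        $1 == "search" { for (i = 2; i <= NF; i++) if ($i == dom) s = 1 }
        END { exit !(ns && s) }
    ' "$1"
}

# Usage on the head node:
#   check_resolv /etc/resolv.conf cluster.local || echo "resolv.conf was overwritten"
```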
net-ftp/tftp-hpa and sys-boot/syslinux
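NOTE: dnsmasq should be able to do this now!!! Recent versions ship a built-in TFTP server (the enable-tftp and tftp-root options), so a separate tftp-hpa daemon may not be needed.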
Refer to Chapter 4 of Diskless Nodes with Gentoo for these two packages; they are easy to install and well explained there. We will extend the use of the pxelinux.0 boot mechanism by strategically assigning IP addresses and using the pxelinux.0 configuration files. The Gentoo Diskless Client article also has some example files.
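The "strategic" part relies on how pxelinux.0 looks for its configuration: after trying a file named after the client's MAC address, it tries the client's IP address written as eight uppercase hex digits, then drops one digit at a time, and finally falls back to "default". The sketch below prints the IP-based part of that search list. The function names are mine, and the MAC-based name is left out for brevity.

```shell
#!/bin/sh
# ip_to_hex A.B.C.D -> eight uppercase hex digits, e.g. 192.168.1.50 -> C0A80132
ip_to_hex() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the dotted quad into four positional parameters
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

# pxe_config_names A.B.C.D -> the IP-based file names pxelinux.0 will ask for,
# most specific first (hypothetical helper, for illustration only).
pxe_config_names() {
    hex=$(ip_to_hex "$1")
    while [ -n "$hex" ]; do
        echo "pxelinux.cfg/$hex"
        hex=${hex%?}     # drop the last hex digit
    done
    echo "pxelinux.cfg/default"
}

pxe_config_names 192.168.1.50
```

With the addresses used in the dnsmasq example (192.168.1.50 to 192.168.1.53), all four nodes share the prefix C0A8013, so a single pxelinux.cfg/C0A8013 file would cover that whole group while other address blocks can boot something else.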
Cluster Management
* sys-cluster/c3 [ Masked ]
Latest version available: 4.0.1
Latest version installed: [ Not Installed ]
Size of downloaded files: 53 kB
Homepage: http://www.csm.ornl.gov/torc/C3/
Description: The Cluster Command and Control (C3) tool suite
License: C3
This package contains a set of small applications used to perform basic tasks simultaneously on all nodes of a cluster through commands using RSH or SSH. This is especially useful for setting and cleaning up the execution environment (for example). See the tool suite description for more details.
The config file
| File: /etc/c3.conf |
cluster local {
headless:headless #head node
thinkbig[1-24] #compute nodes
}
|
Example command
Retrieve the hostname of all nodes as defined in the config file:
eric@headless ~ $ cexec -p hostname
local thinkbig1: thinkbig1
local thinkbig2: thinkbig2
local thinkbig3: thinkbig3
local thinkbig4: thinkbig4
local thinkbig5: thinkbig5
local thinkbig6: thinkbig6
local thinkbig7: thinkbig7
local thinkbig8: thinkbig8
local thinkbig9: thinkbig9
local thinkbig10: thinkbig10
local thinkbig11: thinkbig11
local thinkbig12: thinkbig12
local thinkbig13: thinkbig13
local thinkbig14: thinkbig14
local thinkbig15: thinkbig15
local thinkbig16: thinkbig16
local thinkbig17: thinkbig17
local thinkbig18: thinkbig18
local thinkbig19: thinkbig19
local thinkbig20: thinkbig20
local thinkbig21: thinkbig21
local thinkbig22: thinkbig22
local thinkbig23: thinkbig23
local thinkbig24: thinkbig24
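As the output shows, the thinkbig[1-24] range in c3.conf stands for the 24 hosts thinkbig1 through thinkbig24. To make that concrete, here is a rough sketch of the same expansion in plain shell, which can also serve as a poor man's fallback while C3 is still masked. expand_range is a made-up helper and has nothing to do with how C3 itself is implemented.

```shell
#!/bin/sh
# expand_range PREFIX FIRST LAST -> PREFIXFIRST ... PREFIXLAST, one name per line
expand_range() {
    i=$2
    while [ "$i" -le "$3" ]; do
        echo "$1$i"
        i=$((i + 1))
    done
}

expand_range thinkbig 1 3

# Sequential stand-in for "cexec hostname" (assumes password-less ssh to the nodes):
#   for h in $(expand_range thinkbig 1 24); do ssh "$h" hostname; done
```

Unlike the -p run shown above, the loop in the comment visits the nodes one after the other, so it is only reasonable for quick commands.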
* sys-cluster/torque
Latest version available: 2.0.0_p7
Latest version installed: 2.0.0_p7
Size of downloaded files: 2,237 kB
Homepage: http://www.clusterresources.com/products/torque/
Description: A freely downloadable cluster resource manager and queuing system based on OpenPBS
License: openpbs
Torque 2.0.0p7 is now available as an ebuild (see Bug #115189)
* sys-cluster/maui [ Masked ]
Latest version available: 3.2.6_p13-r1
Latest version installed: 3.2.6_p13-r1
Size of downloaded files: 863 kB
Homepage: http://www.clusterresources.com/products/maui/
Description: Maui Cluster Scheduler
License: maui
Cluster Monitoring
* dev-lang/php
Latest version available: 5.1.4
Latest version installed: [ Not Installed ]
Size of downloaded files: 18,812 kB
Homepage: http://www.php.net/
Description: The PHP language runtime engine.
License: PHP-3
Needed for the Ganglia monitoring and visualization website. You will need at least USE="apache2 xml xmlread" for Ganglia to work.
* net-www/apache
Latest version available: 2.0.55-r1
Latest version installed: [ Not Installed ]
Size of downloaded files: 14,053 kB
Homepage: http://httpd.apache.org/
Description: The Apache Web Server
License: Apache-2.0
Needed for the Ganglia monitoring and visualization website. We also add the following entry to /etc/apache2/vhosts.d/00_default_vhost.conf, between the <IfDefine DEFAULT_VHOST> and </IfDefine> markers (at the beginning).
| File: /etc/apache2/vhosts.d/00_default_vhost.conf |
<VirtualHost *:80>
# ServerName ganglia.livia.etsmtl.ca
# Serveralias ganglia.livia.etsmtl.ca
# ServerAdmin eric@livia.etsmtl.ca
Setenv VLOG
DirectoryIndex index.php index.html
DocumentRoot /var/www/ganglia.livia.etsmtl.ca/htdocs/
<Directory "/var/www/ganglia.livia.etsmtl.ca/htdocs/">
Order allow,deny
Allow from all
Options +Indexes
</Directory>
</VirtualHost>
|
* sys-cluster/ganglia
Latest version available: 3.0.2
Latest version installed: 3.0.2
Size of downloaded files: 3,328 kB
Homepage: http://ganglia.sourceforge.net/
Description: Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids
License: BSD
/etc/gmetad.conf
You first generate the default gmond.conf file with:
gmond --default_config > /etc/gmond.conf
Then edit it to your liking. I modified the following sections (note the added mcast_if=eth1):
| File: /etc/gmond.conf |
cluster {
name = "Kluster"
}
/* Feel free to specify as many udp_send_channels as you like. Gmond
used to only support having a single channel */
udp_send_channel {
mcast_if=eth1
mcast_join = 239.2.11.71
port = 8649
}
/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
mcast_if=eth1
mcast_join = 239.2.11.71
port = 8649
bind = 239.2.11.71
}
|
NOTE: As specified in the FAQ (see How should I configure multihomed machines?), in a multihomed environment, you must add the following to your /etc/conf.d/net:
routes_eth1=( "-host 239.2.11.71" )
Which is the equivalent to:
route add -host 239.2.11.71 dev eth1
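Once the route is in place, a quick way to see whether the multicast traffic from the nodes actually reaches the head node is to ask the local gmond which hosts it knows about. With the default configuration, gmond dumps its state as XML to anyone connecting to TCP port 8649. The sketch below only extracts the host names from that XML; gmond_hosts is a made-up name, and the nc (netcat) call in the comment assumes that tool is installed.

```shell
#!/bin/sh
# gmond_hosts: read gmond's XML dump on stdin and print the NAME attribute
# of every <HOST ...> element, one per line (hypothetical helper).
gmond_hosts() {
    sed -n 's/.*<HOST [^>]*NAME="\([^"]*\)".*/\1/p'
}

# Usage on the head node (assumes gmond's default tcp_accept_channel on 8649):
#   nc localhost 8649 | gmond_hosts
```

If only the head node shows up in the list, the multicast packets are most likely still leaving or arriving through the wrong interface.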
OS Tweaking
* sys-kernel/ck-sources [ Masked ]
Latest version available: 2.6.15_p2
Latest version installed: [ Not Installed ]
Size of downloaded files: 38,991 kB
Homepage: http://members.optusnet.com.au/ckolivas/kernel/
Description: Full sources for the Linux kernel with Con Kolivas' high performance patchset and Gentoo's basic patchset.
License: GPL-2
* sys-process/schedtool
Latest version available: 1.2.3
Latest version installed: [ Not Installed ]
Size of downloaded files: 24 kB
Homepage: http://freequaos.host.sk/schedtool
Description: A tool to query or alter a process' scheduling policy.
License: GPL-2
Important files
Modify /etc/make.conf
As always, USE flags and CFLAGS depend greatly on intended use and actual hardware profile of the machine. Some relevant USE flags in our case:
- server --> Added to torque 2.0.0-p7
- pbsserver
- mpi
- blas
| File: /etc/make.conf |
USE="-* sse sse2 blas mpi zlib nls userlocales utf8 vim X pbsserver server"
CFLAGS="-march=opteron -O2 -pipe"
CXXFLAGS="${CFLAGS}"
PORT_LOGDIR=/var/log/portage
PORTDIR_OVERLAY=/usr/local/portage
MAKEOPTS="-j6"
PORTAGE_NICENESS=-1
AUTOCLEAN="yes"
PORTAGE_TMPFS="/dev/shm"
FEATURES="buildpkg fixpackages sandbox ccache"
CCACHE_SIZE="2G"
CLEAN_DELAY=1
LINGUAS="fr en"
|
LDAP Auth
I still have to write up how to set up OpenLDAP on the head node. For the moment, here are some of the important files that need to be modified for LDAP auth to work.
/etc/openldap/slapd.conf
I recommend you keep the comments and find where to change these lines in the slapd.conf file if you're new to LDAP.
| File: /etc/openldap/slapd.conf |
include /etc/openldap/schema/core.schema
include /etc/openldap/schema/cosine.schema
include /etc/openldap/schema/inetorgperson.schema
include /etc/openldap/schema/openldap.schema
include /etc/openldap/schema/nis.schema
include /etc/openldap/schema/java.schema
include /etc/openldap/schema/corba.schema
include /etc/openldap/schema/misc.schema
pidfile /var/run/openldap/slapd.pid
argsfile /var/run/openldap/slapd.args
database bdb
suffix "ou=cluster,dc=livia,dc=etsmtl,dc=ca"
rootdn "cn=Manager,ou=cluster,dc=livia,dc=etsmtl,dc=ca"
rootpw dotchos
directory /var/lib/openldap-data/cluster
index objectClass eq
loglevel none
|
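Note that the rootpw above is stored in clear text. slapd also accepts a hashed value with a scheme prefix such as {SSHA}, which the slappasswd tool shipped with OpenLDAP can generate. The small check below is only a sketch: rootpw_is_hashed is a made-up name, and it looks at nothing but the prefix of the rootpw value.

```shell
#!/bin/sh
# rootpw_is_hashed FILE
# Hypothetical helper: succeeds if the rootpw line in FILE starts with a
# {SCHEME} prefix (for example {SSHA}) rather than a clear-text password.
rootpw_is_hashed() {
    grep -Eq '^rootpw[[:space:]]+\{[A-Za-z0-9-]+\}' "$1"
}

# Usage:
#   rootpw_is_hashed /etc/openldap/slapd.conf || echo "rootpw is in clear text"
# To generate a hashed value to paste after "rootpw" (prompts for the password):
#   slappasswd -h '{SSHA}'
```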
/etc/pam.d/system-auth
The modification of this file is necessary if you want LDAP to store the passwords and if you want to be able to change them with passwd(1).
| File: /etc/pam.d/system-auth |
#%PAM-1.0 /etc/pam.d/system-auth
auth       required     pam_env.so
auth       sufficient   pam_unix.so likeauth nullok
auth       sufficient   pam_ldap.so use_first_pass
auth       required     pam_deny.so
account    required     pam_unix.so
password   required     pam_cracklib.so difok=2 minlen=8 dcredit=2 ocredit=2 retry=3
password   sufficient   pam_unix.so nullok md5 shadow use_authtok
password   sufficient   pam_ldap.so use_authtok use_first_pass
password   required     pam_deny.so
session    required     pam_limits.so
session    required     pam_unix.so
session    optional     pam_ldap.so
|
