TNT works with the Open MPI (OMPI) implementation, version 4.1.0. Installations of TNT and OMPI are independent; since the MPI libraries are not included in the TNT files, the user must compile OMPI from source. TNT loads those libraries automatically at execution time (the ordinary linking behavior for any shared library, such as OMPI's). The supported platforms are Linux and Cygwin (tested on Windows 7 and higher).
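A quick way to confirm that the TNT binary resolves the OMPI shared libraries at execution time is to inspect it with ldd (assuming, for illustration, an executable named tnt in the current directory):
ldd ./tnt | grep -i mpi
(each libmpi entry should point into <MPI INSTALL DIRECTORY>/lib, rather than appear as "not found")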
On Linux:
A common procedure is:
./configure --prefix=<INSTALL DIRECTORY>
make all
make install
(these steps show a lot of output and usually take a while)
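For instance, a complete source build, assuming version 4.1.0 and a hypothetical install prefix of $HOME/ompi (substitute your own path):
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.0.tar.gz
tar xzf openmpi-4.1.0.tar.gz
cd openmpi-4.1.0
./configure --prefix=$HOME/ompi
make all
make install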
A very common setting is:
export PATH="<MPI INSTALL DIRECTORY>/bin:$PATH"
export LD_LIBRARY_PATH="<MPI INSTALL DIRECTORY>/lib:$LD_LIBRARY_PATH"
export OMPI_MCA_btl_tcp_if_include=<IP ADDRESS>/<MASK>
(we have experienced some issues when more than one network interface is in use, and this setting corrects them. Use the IP address of your computer, with the last field set to 0; the proper mask is generally 24, although this depends on the exact configuration of your network)
export OMPI_MCA_rmaps_base_oversubscribe=1
(otherwise, you get crashing errors when
oversubscribing processes)
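Putting these together, the settings might look as follows in ~/.bashrc; the install directory (/opt/openmpi-4.1.0) and network (192.168.1.0/24) are hypothetical examples, to be replaced with your own values:
export PATH="/opt/openmpi-4.1.0/bin:$PATH"
export LD_LIBRARY_PATH="/opt/openmpi-4.1.0/lib:$LD_LIBRARY_PATH"
export OMPI_MCA_btl_tcp_if_include=192.168.1.0/24
export OMPI_MCA_rmaps_base_oversubscribe=1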
On Cygwin:
With the Cygwin setup application (e.g. setup-x86_64.exe or some such), install the Open MPI packages, and then set these environment variables:
OMPI_MCA_btl_tcp_if_include = <IP ADDRESS>/<MASK>
(same reason as on Linux: this corrects issues we have experienced when more than one network interface is in use. Use the IP address of your computer with the last field as 0; the proper mask is generally 24, although this depends on the exact configuration of your network)
OMPI_MCA_rmaps_base_oversubscribe = 1
(otherwise, runs crash when more processes than CPUs are requested)
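These can be defined as Windows environment variables or, equivalently, exported from Cygwin's bash profile; a sketch of the latter, using the same hypothetical 192.168.1.0/24 network as above:
export OMPI_MCA_btl_tcp_if_include=192.168.1.0/24
export OMPI_MCA_rmaps_base_oversubscribe=1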
Requirements:
The TNT and MPI binaries/libraries must be accessible to every node of the cluster. The preferred approach is a common file system, such as NFS (Network File System): just one computer holds the libraries/binaries and shares them, through a network directory, with all the nodes of the cluster. Another way is to install everything on the local hard drive of each node; this clearly makes maintenance more difficult, since upgrading TNT or MPI means reinstalling the upgrades on each node's hard drive (whereas, with the first method, this work needs to be done on just one machine). Both approaches work the same once they are set up. In less usual scenarios, the overhead of a networked file system may count as a negative factor; even so, NFS is the most frequent answer.
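As an illustration of the NFS approach, with hypothetical values throughout (a master at 192.168.1.1 sharing /opt/cluster with the 192.168.1.0/24 network), the export and mount might look like:
(on the master, add to /etc/exports and then run exportfs -ra)
/opt/cluster 192.168.1.0/24(ro,sync,no_subtree_check)
(on each node)
mkdir -p /opt/cluster
mount -t nfs 192.168.1.1:/opt/cluster /opt/cluster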
hostfile
Unlike PVM, MPI systems do not require preloading hosts to create a virtual machine. Hosts are simply listed in an ordinary text file, called hostfile by convention, which TNT reads at runtime. An example of its content:
master slots=16 max_slots=16
fast1  slots=16 max_slots=16
fast2  slots=16 max_slots=8
slow1  slots=8  max_slots=8
slow2  slots=8  max_slots=8
Here, there are 5 rows (one per host) arranged in 3 columns: the first is the host's name on the network; the second (slots) is the number of processes that can be started on that host, typically its number of CPU cores; and the third (max_slots) is the hard limit on the number of processes when oversubscribing.
The hostfile would usually reside only on the master node (the name given to the computer from which TNT launches parallel jobs to the other ones in the network), and TNT will read it from the current working directory, or from a user-defined one.
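Independently of TNT, a standard way to check that the hostfile works is to launch a trivial command with Open MPI itself (assuming mpirun is on the PATH and the hostfile is in the current directory):
mpirun --hostfile hostfile -np 5 hostname
(this should print one host name per process, drawn from the slots listed above)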
Notes: