Setting up accounts on cheetah
----

Initially accounts are empty. The first thing to do is change your password from the default. All password changes must be made on carnivore, since carnivore is the file and password server. Passwords are synchronized every hour, so a new password reaches all the nodes within an hour of the change. In the meantime you may have to use the old password. I'll use the example account 'jojo' to demonstrate commands.

    [jojo@node1 jojo]$ ssh carnivore
    jojo@carnivore's password:
    Last login: Mon Jan 12 13:32:50 2004 from cheetah
    [jojo@carnivore jojo]$ passwd
    Changing password for jojo
    (current) UNIX password:
    New password:
    Retype new password:
    passwd: all authentication tokens updated successfully
    Updating...
    Password successfully changed.
    [jojo@carnivore jojo]$

Check that you can log in to each node. There are seven nodes besides cheetah, called node1 - node7. Cheetah is aliased to node0.

    [jojo@cheetah jojo]$ ssh node1
    jojo@node1's password:
    Last login: Mon Jan 12 14:21:56 2004 from node0
    [jojo@node1 jojo]$

Note that the first time you log in to a new computer, ssh gives you an authenticity warning. Just type yes. Your home directory is automatically mounted on each node, so if you create something on cheetah you should be able to see it while on any node. Usually you should only do work while on cheetah.

In order to log in automatically, you need to create an RSA public/private key pair. From your home directory, go into the subdirectory ".ssh". The leading "." makes it normally hidden.

    [jojo@cheetah jojo]$ cd .ssh
    [jojo@cheetah .ssh]$ ls
    known_hosts
    [jojo@cheetah .ssh]$

The file known_hosts lists the computers you've connected to. After you connect to all of the nodes (and also other outside computers) they are listed in known_hosts. Create an RSA key pair in the following way:

    [jojo@cheetah .ssh]$ ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/jojo/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/jojo/.ssh/id_rsa.
    Your public key has been saved in /home/jojo/.ssh/id_rsa.pub.
    The key fingerprint is:
    92:eb:8b:53:00:4b:3a:6f:59:0f:03:5f:15:a7:20:a1 jojo@node1
    [jojo@cheetah .ssh]$ ls
    id_rsa  id_rsa.pub  known_hosts
    [jojo@cheetah .ssh]$

Just press enter at each prompt. The two files created are id_rsa and id_rsa.pub. id_rsa is the private key, and id_rsa.pub is the public key that you would distribute to other computers. ssh and scp use these keys to authenticate you, so you can log in without typing a password.

Now, remember that your home directory is automatically exported to each node. This means you don't have to copy anything between computers; you just have to put the key in the correct place. For RSA, just copy the public key to authorized_keys. After this you will be able to ssh to and from any computer within the cluster.

    [jojo@cheetah .ssh]$ cp id_rsa.pub authorized_keys
    [jojo@cheetah .ssh]$ ls
    authorized_keys  id_rsa  id_rsa.pub  known_hosts
    [jojo@cheetah .ssh]$ ssh node1
    Last login: Mon Jan 12 14:24:28 2004 from node0
    [jojo@node1 jojo]$ ssh node4
    The authenticity of host 'node4 (10.0.4.104)' can't be established.
    RSA key fingerprint is 9a:1e:0f:ce:2a:27:d4:b8:56:cf:72:1b:1f:fa:d6:6f.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'node4,10.0.4.104' (RSA) to the list of known hosts.
    Last login: Tue Feb 4 01:25:55 2003 from node0
    [jojo@node4 jojo]$ logout
    [jojo@node1 jojo]$ logout
    [jojo@cheetah jojo]$

Now that you can ssh to any node without a password, you can use LAM (an MPI distribution) without further setup. Note that to use LAM properly, you must have set up ssh to log in automatically and must already have ssh'ed to each node once, so that the authenticity check no longer appears. This is because LAM depends upon ssh to run remote programs.
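Since LAM needs every node to accept a login with no prompt at all, it can help to check them all in one pass. The following is only a sketch, not part of the cluster's own tooling: the function name check_nodes and the SSH_CMD variable are made up for illustration, and the node names node1 - node7 are taken from the text above. It relies on ssh's BatchMode option, which makes ssh fail instead of asking for a password or a host-key confirmation.

```shell
#!/bin/sh
# Sketch: report which nodes accept a fully non-interactive ssh login.
# SSH_CMD is a hypothetical override (default: ssh) so that the loop
# can be tried out without touching the real cluster.
SSH_CMD=${SSH_CMD:-ssh}

check_nodes() {
    failed=0
    for i in 1 2 3 4 5 6 7; do
        node="node$i"
        # BatchMode=yes: fail rather than prompt for a password or
        # for the "Are you sure you want to continue" question.
        if $SSH_CMD -o BatchMode=yes "$node" true </dev/null >/dev/null 2>&1; then
            echo "$node ok"
        else
            echo "$node FAILED"
            failed=1
        fi
    done
    # Non-zero exit status if any node could not be reached silently.
    return $failed
}
```

After loading these definitions, running check_nodes on cheetah prints one line per node. Any node reported as FAILED probably still needs a manual "ssh nodeN" to accept its host key, or the authorized_keys step above has not taken effect.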

To start the LAM process, use lamboot. The -v option means verbose, and will show you if there are any problems. The command automatically uses the system node list and takes about one second per node.

    [jojo@cheetah jojo]$ lamboot -v

    LAM 6.5.4/MPI 2 C++/ROMIO - University of Notre Dame

    Executing hboot on n0 (node0 - 1 CPU)...
    Executing hboot on n1 (node1 - 1 CPU)...
    Executing hboot on n2 (node2 - 1 CPU)...
    Executing hboot on n3 (node3 - 1 CPU)...
    Executing hboot on n4 (node4 - 1 CPU)...
    Executing hboot on n5 (node5 - 1 CPU)...
    Executing hboot on n6 (node6 - 1 CPU)...
    Executing hboot on n7 (node7 - 1 CPU)...
    topology done
    [jojo@cheetah jojo]$

To compile programs, use mpicc. This is a wrapper program for gcc. There is also mpiCC for C++ and mpif77 for Fortran 77. The wrapper automatically adds the libraries and other MPI necessities to your program. Any other options, for example -lm for the math library, are passed on to gcc. Example:

    [jojo@cheetah jojo]$ mpicc msum.c -o msum -lm
    [jojo@cheetah jojo]$ ls
    msum  msum.c
    [jojo@cheetah jojo]$

msum.c is an example program to test MPI message passing. It sums the square roots of the numbers from 1 to a preset maximum. Ranges are assigned and distributed to each node.

After compiling, you can run with mpirun. The syntax of mpirun is:

    mpirun -np # PROGRAM [arguments]

or

    mpirun N PROGRAM [arguments]

PROGRAM is the name of your executable, and [arguments] are passed on to your program. # is the number of processes to start, normally one per node. We have seven nodes plus the head node, so one per node would be "-np 8". You can use fewer to see how the number of nodes affects execution time. (Note: execution time may also be affected by other people running programs. Check with 'who' or 'finger'.) mpirun N ... automatically runs one copy of the program per node.

    [jojo@cheetah jojo]$ mpirun N msum
    Hello. I am n0.
    Hello. I am n4.
    Hello. I am n1.
    Hello. I am n2.
    Hello. I am n5.
    Hello. I am n6.
    Hello. I am n3.
    Hello. I am n7.
    Sent range [0-142857142].
    Sent range [142857142-285714284].
    Sent range [285714284-428571426].
    Sent range [428571426-571428568].
    Sent range [571428568-714285710].
    Sent range [714285710-857142852].
    Sent range [857142852-1000000000].
    got one from node1
    got one from node2
    got one from node3
    got one from node4
    got one from node5
    got one from node6
    got one from node7
    Sum is: 1609
    [jojo@cheetah jojo]$

Use the 'time' command to easily test execution time. There are other ways within an MPI program to do more advanced timing and profiling.

    [jojo@cheetah jojo]$ time mpirun N msum
    Hello. I am n0.
    Sent range [0-142857142].
    [...]
    Sum is: 161085

    real    0m29.600s
    user    0m0.008s
    sys     0m0.006s
    [jojo@cheetah jojo]$

---
Joshua Blake
blakjr@jmu.edu