The Marmot software libraries provide runtime checking of MPI calls and
can point out incorrect usage and deadlocks. They are a helpful
complement to classical debuggers when debugging MPI code. Marmot
libraries are installed for the default MPI implementations on NCSA
systems, and the table below shows example linking and usage for
Fortran and C. Each example launches three processes while the
application reports only two, because Marmot reserves one additional
process for its debug server. Commands are shown in bold so that they
can easily be copied and pasted.
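The C examples in the table all link the same three Marmot libraries; only the compiler wrapper, the library directory, and any MPI or compiler runtime libraries differ between systems. The following is a minimal sketch of that common part as shell helpers. The function names `marmot_c_flags` and `marmot_f_flags` are hypothetical and are not provided by Marmot or by NCSA; the library directory is passed in because it is site-specific.

```shell
# Hypothetical helper (not part of Marmot): print the Marmot link flags
# that the C examples share.
# $1 = directory that holds the Marmot libraries (site-specific).
marmot_c_flags() {
    printf '%s\n' "-L$1 -lmarmot-profile -lmarmot-core -lmarmot-trace"
}

# Hypothetical helper: same flags plus -lmarmot-fortran for Fortran objects.
# The Fortran runtime libraries of the local compiler (for example -lifcore
# or -lxlf) are NOT included and still have to be added per system.
marmot_f_flags() {
    printf '%s\n' "-L$1 -lmarmot-profile -lmarmot-fortran -lmarmot-core -lmarmot-trace"
}

# Example use, assuming an MPI C++ wrapper named mpiCC is on the PATH:
#   mpiCC -g -o hello_hang_m hello_hang.o $(marmot_c_flags /usr/local/lib)
```

Note that the final link step in the examples is done with a C++ compiler or wrapper (`icpc`, `mpCC_r`, `mpiCC`), or with `-lstdc++` added explicitly, and that some systems also need their MPI libraries listed by hand.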
[arnoldg@tune ~/mpi]$ icpc -g -o hello_hang_m hello_hang.o -L/usr/apps/mpi/marmot/lib \
-lmarmot-profile -lmarmot-core -lmarmot-trace \
-lcmpi -lpthread -lstdc++ -L/opt/gm/lib -lgm
[arnoldg@tune ~/mpi]$
[arnoldg@tune ~/mpi]$ /usr/bin/cmpirun -np 3 -machinefile $HOME/lamhosts hello_hang_m
Hello world! I'm 0 of 2 on tund
Hello world! I'm 1 of 2 on tunb
WARNING: all clients are pending!
Last calls (max. 10) on node 0:
timestamp 1: MPI_Init(*argc, ***argv)
timestamp 3: MPI_Comm_rank(comm = MPI_COMM_WORLD, *rank)
timestamp 5: MPI_Comm_size(comm = MPI_COMM_WORLD, *size)
timestamp 7: MPI_Get_processor_name(*name, *resultlen)
timestamp 9: MPI_Finalize()

Last calls (max. 10) on node 1:
timestamp 2: MPI_Init(*argc, ***argv)
timestamp 4: MPI_Comm_rank(comm = MPI_COMM_WORLD, *rank)
timestamp 6: MPI_Comm_size(comm = MPI_COMM_WORLD, *size)
timestamp 8: MPI_Get_processor_name(*name, *resultlen)
timestamp 10: MPI_Send(*buf, count = 1, datatype = MPI_CHAR, dest = 1, tag = 55, comm MPI_COMM_WORLD)

Killed by signal 2.
Killed by signal 2.
|
[arnoldg@tund TEST_F]$ cmpifc -c -g ring1.f
[arnoldg@tund TEST_F]$
[arnoldg@tund TEST_F]$ icpc -g -o ring1 ring1.o -L/usr/apps/mpi/marmot/lib \
-lmarmot-profile -lmarmot-fortran -lmarmot-core -lmarmot-trace -lcmpi_fort \
-lcmpi -lcmpi_fort_io -lcmpi_io -lpthread -lstdc++ -L/opt/gm/lib -lifport -lifcore -limf \
-lgm /usr/local/intel/9.0.026/lib/for_main.o
[arnoldg@tund TEST_F]$
[arnoldg@tund TEST_F]$ /usr/bin/cmpirun -np 3 -machinefile $HOME/lamhosts ring1
WARNING: all clients are pending!
Last calls (max. 10) on node 0:
timestamp 1: MPI_INIT(ierror)
timestamp 3: MPI_COMM_RANK(comm = MPI_COMM_WORLD, *rank, ierror)
timestamp 5: MPI_COMM_SIZE(comm = MPI_COMM_WORLD, *size, ierror)
timestamp 7: MPI_SSEND(*buf, count = 1, datatype = MPI_INTEGER, dest = 1, tag = 201, comm = MPI_COMM_WORLD, ierror)

Last calls (max. 10) on node 1:
timestamp 2: MPI_INIT(ierror)
timestamp 4: MPI_COMM_RANK(comm = MPI_COMM_WORLD, *rank, ierror)
timestamp 6: MPI_COMM_SIZE(comm = MPI_COMM_WORLD, *size, ierror)
timestamp 8: MPI_SSEND(*buf, count = 1, datatype = MPI_INTEGER, dest = 0, tag = 201, comm = MPI_COMM_WORLD, ierror)
|
cu11:~/mpi181% mpCC_r -g -o hello_hang_m hello_hang.c -L/usr/local/lib \
-lmarmot-profile -lmarmot-core -lmarmot-trace
cu11:~/mpi182%
cu11:~/mpi183% poe hello_hang_m -procs 3
Hello world! I'm 0 of 2 on cu11
Hello world! I'm 1 of 2 on cu11
WARNING: all clients are pending!
Last calls (max. 10) on node 0:
timestamp 2: MPI_Init(*argc, ***argv)
timestamp 4: MPI_Comm_rank(comm = MPI_COMM_WORLD, *rank)
timestamp 6: MPI_Comm_size(comm = MPI_COMM_WORLD, *size)
timestamp 7: MPI_Get_processor_name(*name, *resultlen)
timestamp 9: MPI_Finalize()

Last calls (max. 10) on node 1:
timestamp 1: MPI_Init(*argc, ***argv)
timestamp 3: MPI_Comm_rank(comm = MPI_COMM_WORLD, *rank)
timestamp 5: MPI_Comm_size(comm = MPI_COMM_WORLD, *size)
timestamp 8: MPI_Get_processor_name(*name, *resultlen)
timestamp 10: MPI_Recv(*buf, count = 1, datatype = MPI_CHAR, source = 1, tag = 55, comm = MPI_COMM_WORLD, *status)

^CERROR: 0031-250 task 2: Interrupt
ERROR: 0031-250 task 0: Interrupt
ERROR: 0031-250 task 1: Interrupt
|
Copper Fortran example
cu11:~/MARMOT/TEST_F222% mpxlf_r -c ring1.f
** ring === End of Compilation 1 ===
1501-510 Compilation successful for file ring1.f.
cu11:~/MARMOT/TEST_F223% mpCC_r -g -o ring1 ring1.o -L/usr/local/lib \
-lmarmot-profile -lmarmot-core -lmarmot-trace -lmarmot-fortran -lxlf -lxlf90
cu11:~/MARMOT/TEST_F224%
cu11:~/MARMOT/TEST_F225% poe ./ring1 -procs 3
WARNING: all clients are pending!
Last calls (max. 10) on node 0:
timestamp 1: MPI_INIT(ierror)
timestamp 3: MPI_COMM_RANK(comm = MPI_COMM_WORLD, *rank, ierror)
timestamp 5: MPI_COMM_SIZE(comm = MPI_COMM_WORLD, *size, ierror)
timestamp 7: MPI_SSEND(*buf, count = 1, datatype = MPI_INTEGER, dest = 1, tag = 201, comm = MPI_COMM_WORLD, ierror)

Last calls (max. 10) on node 1:
timestamp 2: MPI_INIT(ierror)
timestamp 4: MPI_COMM_RANK(comm = MPI_COMM_WORLD, *rank, ierror)
timestamp 6: MPI_COMM_SIZE(comm = MPI_COMM_WORLD, *size, ierror)
timestamp 8: MPI_SSEND(*buf, count = 1, datatype = MPI_INTEGER, dest = 0, tag = 201, comm = MPI_COMM_WORLD, ierror)

^CERROR: 0031-250 task 1: Interrupt
ERROR: 0031-250 task 0: Interrupt
ERROR: 0031-250 task 2: Interrupt
|
[arnoldg@co-login1 ~/mpi]$ icc -g -c hello_hang.c
hello_hang.c(38): warning #266: function declared implicitly
exit(0);
^

[arnoldg@co-login1 ~/mpi]$ icpc -g -o hello_hang_m hello_hang.o \
-L/usr/apps/mpi/marmot/lib -lmarmot-profile -lmarmot-core -lmarmot-trace -lmpi
[arnoldg@co-login1 ~/mpi]$
[arnoldg@co-login1 ~/mpi]$ mpirun -np 3 hello_hang_m
Hello world! I'm 0 of 2 on co-login1.ncsa.uiuc.edu
Hello world! I'm 1 of 2 on co-login1.ncsa.uiuc.edu
WARNING: all clients are pending!
Last calls (max. 10) on node 0:
timestamp 1: MPI_Init(*argc, ***argv)
timestamp 4: MPI_Comm_rank(comm = MPI_COMM_WORLD, *rank)
timestamp 5: MPI_Comm_size(comm = MPI_COMM_WORLD, *size)
timestamp 8: MPI_Get_processor_name(*name, *resultlen)
timestamp 9: MPI_Finalize()

Last calls (max. 10) on node 1:
timestamp 2: MPI_Init(*argc, ***argv)
timestamp 3: MPI_Comm_rank(comm = MPI_COMM_WORLD, *rank)
timestamp 6: MPI_Comm_size(comm = MPI_COMM_WORLD, *size)
timestamp 7: MPI_Get_processor_name(*name, *resultlen)
timestamp 10: MPI_Recv(*buf, count = 1, datatype = MPI_CHAR, source = 1, tag = 55, comm = MPI_COMM_WORLD, *status)
|
[arnoldg@co-login1 TEST_F]$ ifort -g -c ring1.f
[arnoldg@co-login1 TEST_F]$
[arnoldg@co-login1 TEST_F]$ icpc -g -o ring1 ring1.o -L../LIB \
-lmarmot-profile -lmarmot-fortran -lmarmot-core -lmpi \
-lifcore /usr/local/intel/9.0.028/lib/for_main.o
[arnoldg@co-login1 TEST_F]$
[arnoldg@co-login1 TEST_F]$ mpirun -np 3 ring1
WARNING: all clients are pending!
Last calls (max. 10) on node 0:
timestamp 1: MPI_INIT(ierror)
timestamp 4: MPI_COMM_RANK(comm = MPI_COMM_WORLD, *rank, ierror)
timestamp 5: MPI_COMM_SIZE(comm = MPI_COMM_WORLD, *size, ierror)
timestamp 8: MPI_SSEND(*buf, count = 1, datatype = MPI_INTEGER, dest = 1, tag = 201, comm = MPI_COMM_WORLD, ierror)

Last calls (max. 10) on node 1:
timestamp 2: MPI_INIT(ierror)
timestamp 3: MPI_COMM_RANK(comm = MPI_COMM_WORLD, *rank, ierror)
timestamp 6: MPI_COMM_SIZE(comm = MPI_COMM_WORLD, *size, ierror)
timestamp 7: MPI_SSEND(*buf, count = 1, datatype = MPI_INTEGER, dest = 0, tag = 201, comm = MPI_COMM_WORLD, ierror)
|
arnoldg/mpi> mpicc -c -g hello_hang.c
hello_hang.c(38): warning #266: function declared implicitly
exit(0);
^

arnoldg/mpi> mpiCC -g -o hello_hang_m hello_hang.o -L/usr/local/lib \
-lmarmot-profile -lmarmot-core -lmarmot-trace
arnoldg/mpi>
arnoldg/mpi> mpirun -np 3 -machinefile ~/lamhosts hello_hang_m
Warning: No xauth data; using fake authentication data for X11 forwarding.
/usr/bin/X11/xauth: error in locking authority file /home/ncsa/arnoldg/.Xauthority
Hello world! I'm 0 of 2 on tg-login3.ncsa.teragrid.org
Hello world! I'm 1 of 2 on tg-login4.ncsa.teragrid.org
WARNING: all clients are pending!
Last calls (max. 10) on node 0:
timestamp 2: MPI_Init(*argc, ***argv)
timestamp 3: MPI_Comm_rank(comm = MPI_COMM_WORLD, *rank)
timestamp 5: MPI_Comm_size(comm = MPI_COMM_WORLD, *size)
timestamp 7: MPI_Get_processor_name(*name, *resultlen)
timestamp 9: MPI_Finalize()

Last calls (max. 10) on node 1:
timestamp 1: MPI_Init(*argc, ***argv)
timestamp 4: MPI_Comm_rank(comm = MPI_COMM_WORLD, *rank)
timestamp 6: MPI_Comm_size(comm = MPI_COMM_WORLD, *size)
timestamp 8: MPI_Get_processor_name(*name, *resultlen)
timestamp 10: MPI_Recv(*buf, count = 1, datatype = MPI_CHAR, source = 1, tag = 55, comm = MPI_COMM_WORLD, *status)

Killed by signal 2.
|
MARMOT/TEST_F> mpif77 -g -c ring1.f
MARMOT/TEST_F>
MARMOT/TEST_F> mpiCC -g -o ring1 ring1.o -L$HOME/src/MARMOT/LIB \
-lmarmot-profile -lmarmot-fortran -lmarmot-core -lmarmot-trace -limf \
-lifcore -lifport -ldl /opt/intel/compiler80/lib/for_main.o
MARMOT/TEST_F>
MARMOT/TEST_F> mpirun -np 3 -machinefile ~/lamhosts ring1
Warning: No xauth data; using fake authentication data for X11 forwarding.
/usr/bin/X11/xauth: error in locking authority file /home/ncsa/arnoldg/.Xauthority
WARNING: all clients are pending!
Last calls (max. 10) on node 0:
timestamp 2: MPI_INIT(ierror)
timestamp 3: MPI_COMM_RANK(comm = MPI_COMM_WORLD, *rank, ierror)
timestamp 5: MPI_COMM_SIZE(comm = MPI_COMM_WORLD, *size, ierror)
timestamp 7: MPI_SSEND(*buf, count = 1, datatype = MPI_INTEGER, dest = 1, tag = 201, comm = MPI_COMM_WORLD, ierror)

Last calls (max. 10) on node 1:
timestamp 1: MPI_INIT(ierror)
timestamp 4: MPI_COMM_RANK(comm = MPI_COMM_WORLD, *rank, ierror)
timestamp 6: MPI_COMM_SIZE(comm = MPI_COMM_WORLD, *size, ierror)
timestamp 8: MPI_SSEND(*buf, count = 1, datatype = MPI_INTEGER, dest = 0, tag = 201, comm = MPI_COMM_WORLD, ierror)
|
[arnoldg@honest2 hang]$ mpicc -g -o hello_hang_m hello_hang.c \
-L/usr/apps/mpi/marmot_mvapich2_intel/lib -lmarmot-profile -lmarmot-core \
-lmarmot-trace -lstdc++
hello_hang.c(37): warning #266: function "exit" declared implicitly
exit(0);
|
|