TotalView Debugger

Table of Contents
  1. Overview
  2. TotalView on NCSA Linux Clusters
  3. TotalView on NCSA Shared Memory Machines
  4. General TotalView usage
  5. Using the command line interface (CLI)
  6. Enabling Memory Debugging

1. Overview

TotalView, from Etnus, is a full-featured, source-level, graphical debugger based on the X Window System. It handles C, C++, Fortran (77 and 90), assembler, and mixed source/assembler codes, and it supports MPI, PVM and HPF.

Information on TotalView is available in the release notes and user guide at the Etnus Online Documentation page, as well as in the /usr/apps/tools/toolworks/totalview/doc directory. Also see "man totalview" for command syntax and options.

Note: In order to use TotalView, you must be using a terminal or workstation capable of displaying X Windows. See Using the X Window System for more information.

2. TotalView on NCSA Linux Clusters

TotalView is available on NCSA's Linux Clusters. There are 4 TotalView licenses for jobs up to 8 processes, and 1 license for jobs up to 128 processes. We do not currently have a way to guarantee you will get a license when your job starts if you run in batch.

GNU and Intel compilers are supported.

Important: For both compilers you need to compile and link your code with -g to enable source code listing within TotalView.

2.1 TG-NCSA (Mercury)

Issue with GPFS and the 2.4 kernel (2008)

After all filesystems moved to GPFS in late 2007, an issue appeared when using TotalView to debug MPI applications whose executable resides on a GPFS filesystem and/or whose dynamically linked MPICH-GM libraries reside on a GPFS filesystem.

The steps below include a workaround: link the MPICH-GM libraries statically, and place the application executable and input data on the node-local scratch disk of each node in the debug session.

Before you begin
  1. Compile and link your code with the compiler/linker flags '-O0 -g' to provide symbolic debug information and predictable TotalView behavior with the Intel compilers. If you are using MPICH-GM and the mpicc, mpicxx, mpif77 and mpif90 compiler scripts, you need to set the environment variable MPICH_USE_SHLIB to no before linking. This can be done in your .soft file by adding the line:
    MPICH_USE_SHLIB=no
    If you are NOT using the compiler scripts and prefer to add the libraries needed for linking yourself, use:
    f77/f90: -L $MPICH_GM_HOME/lib -lmpich -lpmpich -L $GM_HOME/lib -lgm -lpthread
    c: -L $MPICH_GM_HOME/lib -lmpich -lpmpich -L $GM_HOME/lib -lgm -lpthread
    c++: -L $MPICH_GM_HOME/lib -lpmpich++ -lmpich -lpmpich -L $GM_HOME/lib -lgm -lpthread

    Afterwards, running ldd on the executable should not show any MPICH-GM libraries (GM libraries are fine).
  2. Add +totalview to your .soft file in your HOME directory and issue the resoft command. This will add TotalView to your environment.
  3. Make sure you have your DISPLAY environment variable set correctly. See the discussions in Using the X Window System and/or Running from a PBS Interactive batch session.
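
The ldd check in step 1 can be scripted. The following is a minimal sketch for sh or bash, not an NCSA-provided tool: the function name no_dynamic_mpich is hypothetical, and it assumes that dynamically linked MPICH-GM libraries would appear in ldd output under names beginning with libmpich or libpmpich, matching the -lmpich and -lpmpich link flags listed above.

```shell
# Hypothetical helper: read ldd output on stdin and succeed (exit status 0)
# only when no MPICH library appears among the shared-library dependencies.
# Assumption: the MPICH-GM shared libraries are named libmpich* / libpmpich*.
no_dynamic_mpich() {
    if grep -E 'libp?mpich' >/dev/null; then
        return 1    # an MPICH library is still dynamically linked
    fi
    return 0        # only non-MPICH libraries (GM, libc, ...) were listed
}

# Typical use, with ./program.exe standing in for your executable:
#   ldd ./program.exe | no_dynamic_mpich && echo "MPICH-GM is statically linked"
```

The pattern matches on the library file name rather than on the word mpich anywhere in the line, so a GM library installed under a directory whose path contains "mpich" is not reported by mistake.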

Serial Code Debugging

If the memory requirements of your code fit within the limits of a shell on the front-end host (tg-login), you can run TotalView directly. If not, you will need to run on a compute node via an interactive batch session.

From the tg-login front-end host, start the TotalView debugging session with the following command:

% totalview ./program.exe [program args]

If you do not see the TotalView process manager window, you should first consult the Using the X Window System page.
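
A missing or empty DISPLAY variable is one common reason for no window appearing. As a rough pre-flight check, a sketch along the following lines could be used in sh or bash. The function name require_display is hypothetical, and the check only confirms that DISPLAY is set, not that the X server it names is actually reachable.

```shell
# Hypothetical pre-flight check: succeed only when DISPLAY is set and non-empty.
require_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set: log in with ssh -X or set DISPLAY first" >&2
        return 1
    fi
    return 0
}

# Typical use:
#   require_display && totalview ./program.exe
```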

MPI Debugging with MPICH-GM

First, start an interactive batch session with the number of nodes/processes needed for debugging your application. This will put you onto the launch host for the job.

As mentioned above, your application executable needs to be placed on a non-GPFS filesystem which in this case is the local scratch disk on each node (/scr). To efficiently copy the executable (and any other input files needed by the executable), use pbsdsh as follows from within the launch host of the interactive job:

pbsdsh -u cp $HOME/path/to/my/executable /scr/$PBS_JOBID
pbsdsh -u cp $HOME/path/to/my/inputdata /scr/$PBS_JOBID
cd /scr/$PBS_JOBID

If you are not using ssh X11 tunnelling and have instead set DISPLAY to point directly at your local X server, you can skip ahead to the mpirun command below. Otherwise, do the following steps.

cp $PBS_NODEFILE machinefile
hostname

and from another login session on tg-login.ncsa.teragrid.org:

ssh -X launch_hostname
cd /scr/pbs_jobid_from_above

Then start mpirun as shown below, replacing $PBS_NODEFILE with machinefile as needed.

MPICH-GM provides a TotalView switch, -tv, for the mpirun script. Once you are ready to debug, run:

% mpirun -np XX -machinefile $PBS_NODEFILE -tv ./program.exe [program args]

where XX is the number of processes needed.

MPI Debugging with MPICH-VMI

First, start an interactive batch session with the number of nodes/processes needed for debugging your application.

Follow the steps in the MPICH-GM description above to copy your executable and input files to local scratch (/scr), and then proceed with the following steps.

Once you are ready to debug, MPICH-VMI provides a TotalView switch to the mpirun script that enables the 'attach to a paused process' method. This differs from the other methods of using TotalView with an MPI application but is just as valid.

% mpirun -np XX -machinefile $PBS_NODEFILE -debugger totalview ./program.exe [program args]

where XX is the number of processes needed. mpirun will then report that the process is waiting for 300 seconds and will also provide the PID of the application process that TotalView should attach to.

In another window, ssh (with -X) to the host machine that you ran mpirun on. Then change directory to the location where you ran your application. Finally:

% setenv LD_LIBRARY_PATH /opt/vmi-2.0.1-1-gcc/lib:$LD_LIBRARY_PATH
% totalview -pid PID ./program.exe

TotalView will start up, attach to the process given by PID, and read the symbol information from ./program.exe.
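
The setenv line above is csh/tcsh syntax. For sh or bash users, an equivalent might look like the sketch below; prepend_lib_path is a hypothetical helper name, not something provided on the system. The ${VAR:+...} expansion is used so that no stray trailing colon is left behind when LD_LIBRARY_PATH starts out unset or empty.

```shell
# Hypothetical helper: put a directory first on LD_LIBRARY_PATH and export it.
# ${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} expands to nothing when the variable
# is unset or empty, so no stray trailing ":" is added.
prepend_lib_path() {
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}

# Typical use before attaching, with the VMI library directory from above:
#   prepend_lib_path /opt/vmi-2.0.1-1-gcc/lib
#   totalview -pid PID ./program.exe
```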

Running from a PBS Interactive batch session

For an interactive batch session you need to specify the number of compute nodes and the amount of wall clock time for which you will need them. The example below asks for 2 compute nodes with 2 processes per node for 30 minutes.

% qsub -I -V -lwalltime=00:30:00 -lnodes=2:ppn=2

When the session begins, it starts a shell on the launch node.

There are two ways to get an X display once the session has started: setting environment variables or using X11 tunnelling with ssh. X11 tunnelling is preferred, but the PBS batch system does not use ssh to put you on the compute node, so the tunnel has to be set up separately.

Setting Environment Variables directly is described in the Using the X Window System page. In this mode you need to set your DISPLAY variable to the X display of your local machine.

The other option is to use X11 tunnelling with ssh. Find the name of the launch host for the interactive batch session above by typing hostname in the window that has the interactive batch session in progress. You should also echo the environment variable $PBS_NODEFILE, whose value will be needed below.

From another TG-NCSA login session, log on to the launch host via ssh with tunnelling enabled.

% ssh -X launch_host

Once connected to the launch host, change to the directory from which you run your application. Next, set the environment variable PBS_NODEFILE to the value reported in the interactive batch session.
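
In the second session PBS_NODEFILE has to be carried over by hand. The sketch below assumes sh or bash; the path in the comments is a made-up placeholder for whatever echo $PBS_NODEFILE printed in the batch session, and nodefile_slots is a hypothetical helper. It relies on the usual PBS behaviour of writing one line per requested process slot to the node file, which you may want to confirm on your system before using the count for -np.

```shell
# Hypothetical helper: count the non-empty lines of a PBS node file.
# Assumption: PBS writes one line per process slot (nodes x ppn).
nodefile_slots() {
    grep -c . "$1"
}

# Typical use in the second (ssh -X) session; the path is a placeholder
# for the value echoed in the interactive batch session:
#   export PBS_NODEFILE=/path/reported/by/batch/session
#   mpirun -np "$(nodefile_slots "$PBS_NODEFILE")" -machinefile "$PBS_NODEFILE" -tv ./program.exe
```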

2.2 Abe (Intel 64 Linux Cluster)

TotalView will be supported on Abe. This support is currently in progress.

Before you begin
  1. Compile and link your code with the compiler/linker flags '-O0 -g' to provide symbolic debug information and predictable TotalView behavior with the Intel compilers.
  2. Add +totalview to your .soft file in your HOME directory and issue the resoft command. This will add TotalView to your environment.
  3. Make sure you have your DISPLAY environment variable set correctly. See the discussions in Using the X Window System and/or Running from a PBS Interactive batch session.

Serial Code Debugging

If the memory requirements of your code fit within the limits of a shell on a front-end host (one of the honest nodes), you can run TotalView directly. If not, you will need to run on a compute node via an interactive batch session.

From the front-end host, start the TotalView debugging session with the following command:

% totalview ./program.exe [program args]

If you do not see the TotalView process manager window, you should first consult the Using the X Window System page.

MPI Debugging with MVAPICH2

First, add +mvapich2-0.9.8p2patched-intel-ofed-1.2-dbg, or the -dbg build of the mvapich2 version you are using, to your ~/.soft file, then resoft and relink your application. If you are using mvapich2-1.2, you are ready to go, as this build already has TotalView enabled.

Next, start an interactive batch session with the number of nodes/processes needed for debugging your application.

Start up the mpd processes in this session as you would do for a batch job. See the sample batch job file for MVAPICH2.

From another terminal session connected to the PBS launch host:

% mpiexec -tv -n XX ./program.exe [program args]

where XX is the number of processes needed. mpiexec should connect to the mpd console of the launch host.

When done debugging, issue the command mpdallexit.

MPI Debugging with Open MPI

First, add +openmpi-1.2.4-intel or +openmpi-1.2.4-gcc to your ~/.soft file, resoft and build your application if you are not already using Open MPI.

NOTE: Versions of Open MPI prior to 1.2.4 will not work with TotalView 8.

Next, start an interactive batch session with the number of nodes/processes needed for debugging your application.

From another terminal session connected to the PBS launch host:

% mpirun -tv -np XX -machinefile ${PBS_NODEFILE} ./program.exe [program args]

where XX is the number of processes needed.

For more information on using TotalView with Open MPI, please see the discussion here.

MPI Debugging with MPICH-VMI

First, start an interactive batch session with the number of nodes/processes needed for debugging your application.

Once you are ready to debug, MPICH-VMI provides a TotalView switch to the mpirun script that enables the 'attach to a paused process' method. This differs from the other methods of using TotalView with an MPI application but is just as valid.

% mpirun -np XX -machinefile $PBS_NODEFILE -debugger totalview ./program.exe [program args]

where XX is the number of processes needed. VMI will then report that the process is waiting for 300 seconds and will also provide the host and PID of the application process that TotalView should attach to, for example:

Connect to TotalView on Host: 10.1.68.172 PID: 12673. Waiting for 300 Seconds

In another window, ssh -X (with tunnelling) to the host IP address reported by VMI. Next, change directory to the location where you ran your application. Finally, start TotalView with the PID and program name:

% totalview -pid PID ./program.exe

TotalView will start up, attach to the process given by PID, and read the symbol information from ./program.exe.

You should now be able to use TotalView.
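
If you would rather not copy the host and PID by eye, they can be pulled out of the VMI message with sed. This is a sketch for sh or bash that assumes the message has exactly the shape of the example line shown above; vmi_attach_target is a hypothetical helper name, and a different VMI version might word the message differently.

```shell
# Hypothetical helper: read mpirun/VMI output on stdin and print "HOST PID"
# for a line shaped like the example message above.
vmi_attach_target() {
    sed -n 's/.*Connect to TotalView on Host: \([^ ]*\) PID: \([0-9][0-9]*\)\..*/\1 \2/p'
}

# Typical use, pasting the line that VMI printed:
#   echo 'Connect to TotalView on Host: 10.1.68.172 PID: 12673. Waiting for 300 Seconds' | vmi_attach_target
# The two fields are then used as:  ssh -X HOST   and   totalview -pid PID ./program.exe
```

Lines that do not match the expected message are silently dropped, so the whole mpirun output can be piped through the function.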

Running from a PBS Interactive batch session

The Torque qsub command now supports the use of X11 tunnelling directly to the launch host for interactive batch sessions via the -X switch.

First, be sure you enabled ssh tunnelling in the ssh session you used to connect to one of the honest hosts. Second, submit an interactive batch job with the -X switch:

% qsub -I -X -V -lwalltime=00:30:00 -lnodes=2:ppn=8

Finally, once PBS has put you on the launch host, you need only use one of the above MPI start-up sequences to start debugging.

Old way without using the -X option for qsub

For an interactive batch session you need to specify the number of compute nodes and the amount of wall clock time for which you will need them. The example below asks for 2 compute nodes with 8 processes per node for 30 minutes.

% qsub -I -V -lwalltime=00:30:00 -lnodes=2:ppn=8

When the session begins, it starts a shell on the launch node.

There are two ways to get an X display once the session has started: setting environment variables or using X11 tunnelling with ssh. X11 tunnelling is preferred, but the PBS batch system does not use ssh to put you on the compute node, so the tunnel has to be set up separately.

Setting Environment Variables directly is described in the Using the X Window System page. In this mode you need to set your DISPLAY variable to the X display of your local machine.

The other option is to use X11 tunneling with ssh. Find the name of the launch host for the interactive batch session above by typing hostname in the window that has the interactive batch session in progress.

From another Abe login session (on an honest node), log on to the launch host via ssh with tunnelling enabled.

% ssh -X launch_host

If the MPD ring has been started in the PBS session, you can start your mpiexec session as above.

3. TotalView on NCSA Shared Memory Machines

TotalView is available on NCSA's shared memory machines.

3.1 Cobalt (SGI Altix)

There are 32 TotalView licenses for jobs up to 32 processes. We do not currently have a way to guarantee you will get a license when your job starts if you run in batch.

Before you begin
  1. Compile and link your code with the compiler/linker flags '-O0 -g' to provide symbolic debug information and predictable TotalView behavior with the Intel compilers.
  2. Add +totalview to your .soft file in your HOME directory and issue the resoft command. This will add TotalView to your environment.

Serial and OpenMP Debugging

If the memory requirements of your code fit within the limits of a shell on the front-end machine (cobalt), you can run TotalView directly. If not, you will need to run via an interactive batch session.

On the interactive host co-login1, start the TotalView debugging session with the following command:

% totalview ./program.exe [ program args ]

MPI Debugging with MPT

There is currently an issue with MPT 1.23 (the default MPI) and breakpoints. When debugging your application, please rebuild with MPT 1.25 by adding it to your environment:

% soft add ++sgi-mpt-1.25

The MPT include files and libraries will be added automatically.

Because of changes in shell limits, the MPI_MEMMAP feature, and the use of the mpirun wrapper, you need to disable MPI_MEMMAP for the shell. In csh or tcsh:

% setenv MPI_MEMMAP_OFF

or, in sh or bash (the trailing = assigns an empty value, as the csh form does, so that the variable is actually placed in the environment):

% export MPI_MEMMAP_OFF=

Then start TotalView on mpirun:

% totalview /usr/bin/mpirun -a [ mpirun arguments ] ./program.exe [ program args ]

For example, here is how you would run a code called xhpl with 4 processors:

% totalview /usr/bin/mpirun -a -np 4 ./xhpl
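
In sh and bash, a variable reaches child processes such as mpirun only once it has been both assigned (an empty value is enough) and exported. A bare export NAME on a variable that has never been assigned marks it for export but does not yet place it in the environment. The following sketch, using a hypothetical helper named in_child_env, is one way to confirm what mpirun will see. It only checks that the variable is present; whether MPT expects any particular value for MPI_MEMMAP_OFF is not something this check can tell you.

```shell
# Hypothetical helper: succeed when the named variable is present in the
# environment that child processes (such as mpirun) will inherit.
in_child_env() {
    env | grep "^$1=" >/dev/null
}

unset MPI_MEMMAP_OFF
export MPI_MEMMAP_OFF          # marked for export, but still unset
in_child_env MPI_MEMMAP_OFF || echo "not visible to child processes yet"

export MPI_MEMMAP_OFF=         # assigned (empty value) and exported
in_child_env MPI_MEMMAP_OFF && echo "visible to child processes"
```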

Running from a PBS Interactive batch session

For an interactive batch session you need to specify the number of cpus, the wall clock time and memory you will need. The example below asks for 4 cpus for 30 minutes and 2gb of memory:

% qsub -I -V -lwalltime=00:30:00 -lncpus=4 -lmem=2gb

When the session begins, it starts a shell on the launch node.

There are two ways to get an X display once the session has started: setting environment variables or using X11 tunnelling with ssh. X11 tunnelling is preferred, but the PBS batch system does not use ssh to put you on the compute node.

Setting Environment Variables. This is described in the Using the X Window System page. In this mode you need to set your DISPLAY variable to the X display of your local machine.

X11 tunnelling. If you specify the debug queue via the -qdebug option to qsub, your interactive batch job will be run on the login host. Since the DISPLAY variable is set correctly by specifying -V, the job is ready for TotalView debugging.

4. General TotalView usage

Serial and OpenMP Debugging

As TotalView starts up, you will see two windows appear: the Control window and the Process window. In the Process window you can start inserting breakpoints and so on, and then click on the GO button. Happy debugging.

MPI Debugging

As TotalView starts up, you will see two windows appear: the Control window and the Process window. In the Process window, click on the GO button and then, when prompted by the window "Process XXX is a parallel job. Do you want to stop the job now ?", click "Yes". You will arrive at the MPI_Init() breakpoint as shown here for a code using SGI's MPT. You are now ready to debug in parallel.

Note: If you are debugging a code using MPICH-GM on Tungsten, you will want to insert a breakpoint at some point after the call to MPI_Init(), as the built-in breakpoint for MPI_Init() does not appear fully functional.

Some comments from Etnus about breakpoints and MPI_Init:
"Be very cautious in placing breakpoints at or before a line that calls MPI_Init() or MPL_Init() because timeouts can occur while your program is being initialized. After you allow the parallel processes to proceed into the MPI_Init() or MPL_Init() call, allow all of the parallel processes to proceed through it within a short time."

"Timeouts can occur if you place breakpoints that stop other processes too soon after calling MPI_Init() or MPL_Init(). If you create "stop all" breakpoints, the first process that gets to the breakpoint stops all the other parallel processes that have not yet arrived at the breakpoint. This can cause a timeout."

More on Breakpoints

To get all processes to stop at the same action point (breakpoint), instead of stopping the group of processes as a whole when the current process hits the action point, go to File -> Preferences -> Action Points and select "When breakpoint hit, stop: Process" rather than Group. You can also set this preference for an individual breakpoint by opening its properties dialog (right click on the action point and select Properties).

5. Using the command line interface (CLI)

Using the TotalView command line interface with SGI MPT applications

Put TotalView in your environment:

soft add +totalview

Launch TotalView using the CLI

totalviewcli /usr/bin/mpirun -a -np 4 ./mpihw

For more information on using the CLI, consult the following Etnus pages

6. Enabling Memory Debugging

For all platforms, be sure to add +totalview to your ${HOME}/.soft file and issue the resoft command. Add the options listed below for your platform to your linking step, and then see the last part of this section, Making sure Memory Debugging is enabled, for how to check that Memory Debugging is on.

Copper (rs6000)

First try

setenv LIBPATH ${TOTALVIEW_HOME}/rs6000/lib/tvheap_mr:${TOTALVIEW_HOME}/rs6000/lib:${LIBPATH}

and then launch TotalView as usual. If the above does not work, you need to relink your application as follows:

32-bit compiling (-q32):

-L${TOTALVIEW_HOME}/rs6000/lib/tvheap_mr -L${TOTALVIEW_HOME}/rs6000/lib ${TOTALVIEW_HOME}/rs6000/lib/aix_malloctype.o

64-bit compiling (-q64):

-L${TOTALVIEW_HOME}/rs6000/lib/tvheap_mr -L${TOTALVIEW_HOME}/rs6000/lib ${TOTALVIEW_HOME}/rs6000/lib/aix_malloctype64_5.o

If you tire of seeing the TotalView reminder about using the Memory Debugger, add the following to ${HOME}/.tvdrc: dset TV::MEMDEBUG::hia_allow_ibm_poe equal true

Tungsten (linux-x86)

Relinking is recommended:

-L${TOTALVIEW_HOME}/linux-x86/lib -ltvheap -Wl,-rpath,${TOTALVIEW_HOME}/linux-x86/lib

Mercury (linux-ia64)

If you are using MPICH-GM, you need to build and link against a version of MPICH-GM built with disable-register for the ch_gm driver by adding +mpich-gm-1.2.6..14b-intel90-tvdebug to your $HOME/.soft and running resoft. For convenience, use the MPI compiler utilities such as mpicc and mpif77.

Relinking is recommended:

-L${TOTALVIEW_HOME}/linux-ia64/lib -ltvheap -Wl,-rpath,${TOTALVIEW_HOME}/linux-ia64/lib

Cobalt (linux-ia64)

Relinking is recommended:

-L${TOTALVIEW_HOME}/linux-ia64/lib -ltvheap -Wl,-rpath,${TOTALVIEW_HOME}/linux-ia64/lib
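
The Tungsten, Mercury and Cobalt link options above differ only in the platform directory name, so they can be generated by a small helper. This is a sketch for sh or bash, not a supplied tool: tvheap_ldflags is a hypothetical name, it assumes TOTALVIEW_HOME is already set in your environment, and the icc command line in the comment uses placeholder file names.

```shell
# Hypothetical helper: print the memory-debugging link flags for one
# TotalView platform directory (linux-x86 or linux-ia64 in this document).
# Assumption: TOTALVIEW_HOME is already set in the environment.
tvheap_ldflags() {
    printf '%s\n' "-L${TOTALVIEW_HOME}/$1/lib -ltvheap -Wl,-rpath,${TOTALVIEW_HOME}/$1/lib"
}

# Typical use on Mercury or Cobalt; icc and the file names are placeholders:
#   icc -g -O0 -o program.exe program.o $(tvheap_ldflags linux-ia64)
```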

Making sure Memory Debugging is enabled

After launching TotalView as discussed above for each platform, but before running the application within TotalView (before clicking on Go), check whether the Memory Debugger is enabled: go to Tools > Memory Debugging and, on the Configuration tab, click the radio button labeled 'Enable memory debugging' if it is not already selected. Then click on the main TotalView window and click on Go, or first insert breakpoints at the areas where you want to inspect memory usage.

For more information on using the TotalView debugger click here.