TotalView Debugger

NCSA's Help Desk is available 24 hours a day, seven days a week, 365 days a year:
help.ncsa.illinois.edu
217-244-0710
help@ncsa.illinois.edu

Table of Contents
  1. Overview
  2. TotalView on NCSA Linux Clusters
  3. TotalView on NCSA Shared Memory Machines
  4. General TotalView usage
  5. Using the command line interface (CLI)
  6. Enabling Memory Debugging

1. Overview

TotalView, from Etnus, is a full-featured, source-level, graphical debugger based on the X Window System. It handles C, C++, Fortran (77 and 90), assembler, and mixed source/assembler codes, and it supports MPI, PVM and HPF.

Information on TotalView is available in the release notes and user guide at the Etnus Online Documentation page. Also see "man totalview" for command syntax and options.

Note: In order to use TotalView, you must be using a terminal or workstation capable of displaying X Windows. See Using the X Window System for more information.

2. TotalView on NCSA Linux Clusters

TotalView is available on NCSA's Linux Clusters. On Abe there is a 384-token TotalView license, and you check out only the number of licenses you need. If you run in batch, we do not currently have a way to guarantee that you will get a license when your job starts.

GNU and Intel compilers are supported.

Important: For both compilers you need to compile and link your code with -g to enable source code listing within TotalView.

2.1 Abe (Intel 64 Linux Cluster)

TotalView is supported on Abe.

Before you begin
  1. Compile and link your code with the compiler/linker flags '-O0 -g' to provide symbolic debug information and predictable TotalView behavior with the Intel compilers.
  2. Add +totalview to your .soft file in your HOME directory and issue the resoft command. This will add TotalView to your environment.
  3. Make sure your DISPLAY environment variable is set correctly. See Using the X Window System and/or Running from a PBS Interactive batch session.
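
The last two checks in the list above can be scripted. The following is a minimal sketch, assuming a POSIX shell; tv_preflight is a hypothetical helper name, not an NCSA-provided tool. It only reports whether DISPLAY is set and whether the debugger executable is on your PATH.

```shell
# Hypothetical helper: verify the environment before launching TotalView.
# Usage: tv_preflight [command]   (command defaults to "totalview")
tv_preflight() {
    cmd=${1:-totalview}
    rc=0
    # TotalView needs an X display to draw its windows.
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set" >&2
        rc=1
    fi
    # After resoft, the debugger executable should be found on PATH.
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "$cmd not found in PATH" >&2
        rc=1
    fi
    return "$rc"
}
```

Running tv_preflight before starting the debugger returns a non-zero status and prints the reason if either check fails. It does not verify that the X display is actually reachable.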

Serial Code Debugging

If the memory requirements of your code fit within the limits of a shell on a front-end host (one of the honest nodes), you can run TotalView directly. If not, you will need to run on a compute node via an interactive batch session.

From the login front-end host, start the TotalView debugging session with the following command:

% totalview ./program.exe [program args]

If you do not see the TotalView process manager window, you should first consult the Using the X Window System page.

MPI Debugging with MVAPICH2

MVAPICH2-1.2 is built with TotalView debugging enabled.

First, start an interactive batch session with the number of nodes/processes needed for debugging your application.

Start up the mpd processes in this session as you would do for a batch job. See the sample batch job file for MVAPICH2.

Now start your application with the debugger:

% mpiexec -tv -n XX ./program.exe [program args]

where XX is the number of processes needed. mpiexec should connect to the mpd console of the launch host.

When done debugging, issue the command mpdexitall.

MPI Debugging with Open MPI

First, if you are not already using Open MPI, add +openmpi-1.2.4-intel or +openmpi-1.2.4-gcc to your ~/.soft file, issue the resoft command, and rebuild your application.

NOTE: Versions of Open MPI prior to 1.2.4 will not work with TotalView 8.

Next, start an interactive batch session with the number of nodes/processes needed for debugging your application.

Now start your application with the debugger:

% mpirun -tv -np XX -machinefile ${PBS_NODEFILE} ./program.exe [program args]

where XX is the number of processes needed.
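
Rather than typing XX by hand, the process count can be derived from the node file. This is a minimal sketch, assuming that Torque writes one line to ${PBS_NODEFILE} per allocated process slot; count_slots is a hypothetical helper name.

```shell
# Hypothetical helper: count the non-empty lines (process slots) in a node file.
count_slots() {
    grep -c . "$1"
}

# Inside an interactive batch session one could then write (not run here):
#   NP=$(count_slots "$PBS_NODEFILE")
#   mpirun -tv -np "$NP" -machinefile "$PBS_NODEFILE" ./program.exe
```

Use a smaller value for -np if you want to debug with fewer processes than the session provides.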

For more information on using TotalView with Open MPI, please see the discussion here.

MPI Debugging with MPICH-VMI

First, start an interactive batch session with the number of nodes/processes needed for debugging your application.

Once you are ready to debug, MPICH-VMI provides a TotalView switch to the mpirun script that enables the 'attach to a paused process' method. This differs from the other methods of using TotalView with an MPI application, but it is just as valid.

% mpirun -np XX -machinefile $PBS_NODEFILE -debugger totalview ./program.exe [program args]

where XX is the number of processes needed. VMI will then report that the process is waiting for 300 seconds, along with the host and PID of the process that TotalView should attach to. For example:

Connect to TotalView on Host: 10.1.68.172 PID: 12673. Waiting for 300 Seconds

In another window, ssh -X (with X11 tunnelling) to the host IP address reported by VMI. Next, change directory to the location where you ran your application. Finally, start TotalView with the PID and program name:

% totalview -pid PID ./program.exe

TotalView will start up, attach to the process given by PID, and read the symbol information from ./program.exe.

You should now be able to use TotalView.
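
The host and PID can be extracted from the VMI message instead of being copied by hand. The sketch below assumes the message has exactly the form shown in the example above; parse_vmi_line is a hypothetical helper name, and the script only prints the follow-up commands rather than running them.

```shell
# Hypothetical helper: print "HOST PID" from a VMI "Connect to TotalView" line.
parse_vmi_line() {
    printf '%s\n' "$1" |
        sed -n 's/.*Host: *\([^ ]*\) *PID: *\([0-9][0-9]*\).*/\1 \2/p'
}

line='Connect to TotalView on Host: 10.1.68.172 PID: 12673. Waiting for 300 Seconds'

# Split the two fields into shell variables.
read -r host pid <<EOF
$(parse_vmi_line "$line")
EOF

# Show the commands to run in the second window.
echo "ssh -X $host"
echo "totalview -pid $pid ./program.exe"
```

If the line does not match the expected form, parse_vmi_line prints nothing, so check that host and pid are non-empty before using them.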

Running from a PBS Interactive batch session

The Torque qsub command now supports the use of X11 tunnelling directly to the launch host for interactive batch sessions via the -X switch.

First, be sure you enabled ssh tunneling in the ssh session you used to connect to one of the honest hosts. Second, submit an interactive batch job with the -X switch:

% qsub -I -X -V -lwalltime=00:30:00 -lnodes=2:ppn=8

Finally, once PBS has put you on the launch host, you need only use one of the above MPI start-up sequences to start debugging.
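
The node and ppn values in the qsub request determine how many MPI processes the session can hold, which is the upper bound for XX in the MPI start-up commands. A minimal sketch with hypothetical helper names follows; the helpers only print text and never submit a job.

```shell
# Hypothetical helper: print the interactive qsub command for
# NODES nodes with PPN processes per node and the given WALLTIME.
# Usage: tv_qsub_cmd NODES PPN WALLTIME
tv_qsub_cmd() {
    echo "qsub -I -X -V -lwalltime=$3 -lnodes=$1:ppn=$2"
}

# Hypothetical helper: print the number of process slots in such a request.
# Usage: tv_slots NODES PPN
tv_slots() {
    echo $(( $1 * $2 ))
}

tv_qsub_cmd 2 8 00:30:00   # prints the qsub line used in this section
tv_slots 2 8               # prints 16
```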

3. TotalView on NCSA Shared Memory Machines

TotalView is available on NCSA's shared memory machines.

3.1 Ember (SGI UV)

There are currently 384 licenses being shared with x86_64 based machines (Ember and Abe/Lincoln). We do not currently have a way to guarantee you will get a license when your job starts if you run in batch.

Before you begin
  1. Compile and link your code with the compiler/linker flags '-O0 -g' to provide symbolic debug information and predictable TotalView behavior with the Intel compilers.
  2. Add module --silent add totalview-8.8.0-1 to your .profile or .login file in your HOME directory and issue the command module add totalview-8.8.0-1. This will add TotalView to your environment.

Serial and OpenMP Debugging

If the cpu time, thread count and memory requirements of your code fit within the limits of a shell on the front-end machine (ember), you can run TotalView directly. If not, you will need to run via an interactive batch session.

On the login host ember, start the TotalView debugging session with the following command:

% totalview ./program.exe [ program args ]

MPI Debugging with MPT

If the cpu time, MPI task count and memory requirements of your code fit within the limits of a shell on the front-end machine (ember), you can run TotalView directly. If not, you will need to run via an interactive batch session.

% totalview mpirun -a [ mpirun arguments ] ./program.exe [ program args ]

For example, here is how you would run a code called xhpl with 4 processors:

% totalview mpirun -a -np 4 ./xhpl

Proceed to the usual MPI debugging process section below.

Running from a PBS batch session

Under construction.

Using TotalView Remote Display Client

You will need to get the client appropriate to your desktop system. There are clients for Windows, Mac OS X, Linux x86 and Linux x86_64. See the TotalViewTech Download page for a list of the available systems with links to their ftp site for each platform.

See Using the Remote Display Client for installation instructions.

Once the TotalView Remote Viewer is installed on your desktop, you need to start the client and configure it to connect to Ember.

The following images show 3 different configurations.

  1. Debugging on ember login node
  2. Debugging sequential batch
  3. Debugging mpi batch
When editing your settings, replace gbauer with your username.

Debugging on the Ember front-end or logon node

After making the changes shown in Debugging on ember login node, click on the "Launch Debug Session" button at the bottom of the window.

Depending on which client you use, you may see a sequence of two terminals (term1, term2) that are needed to start the VNC server on Ember. You will need to log in to Ember in those terminals (the first one will cause the second one to appear). If you use keys, you may need to enter the passphrase for the key or your Kerberos password. Do not close those terminals until you are done with the session.

Next, a series of RFB windows may appear (RFBConnection, RFBclose) while the VNC desktop and TotalView GUI appear. As above, do not close any terminals, windows, etc. associated with the session until you are done with it.

You are now able to use TotalView as you would directly on the login node.

Serial or core
Follow the GUI buttons and fields to run a sequential application or debug a core or process running on the login node.

MPI
First, select the MPI application you wish to debug. Then click on the Parallel tab, specify the Parallel System as MPT, and select the number of MPI tasks. Keep the number down to 4 or so, as there are only 12 processors in the logon CPU set. Then click OK to proceed with the usual MPI debugging process.

Debugging in a batch job
Under construction.

3.2 Cobalt (SGI Altix)

There are 32 TotalView licenses for jobs up to 32 processes. We do not currently have a way to guarantee you will get a license when your job starts if you run in batch.

Before you begin
  1. Compile and link your code with the compiler/linker flags '-O0 -g' to provide symbolic debug information and predictable TotalView behavior with the Intel compilers.
  2. Add +totalview to your .soft file in your HOME directory and issue the resoft command. This will add TotalView to your environment.

Serial and OpenMP Debugging

If the memory requirements of your code fit within the limits of a shell on the front-end machine (cobalt), you can run TotalView directly. If not, you will need to run via an interactive batch session.

On the interactive host co-login1, start the TotalView debugging session with the following command:

% totalview ./program.exe [ program args ]

MPI Debugging with MPT

There is currently an issue with MPT 1.23 (default MPI) and breakpoints. When debugging your application, please rebuild with MPT 1.25 by adding it to your environment:
% soft add +sgi-mpt-1.25
The MPT include files and libraries will be added automatically.

Due to changes in shell limits, the MPI_MEMMAP feature, and the use of the mpirun wrapper, you need to disable MPI_MEMMAP in your shell. For csh/tcsh:
% setenv MPI_MEMMAP_OFF
or, for sh/bash (the variable must be given a value to be exported):
% export MPI_MEMMAP_OFF=1

Then start TotalView:
% totalview mpirun -a [ mpirun arguments ] ./program.exe [ program args ]
For example, here is how you would run a code called xhpl with 4 processors:
% totalview mpirun -a -np 4 ./xhpl

Running from a PBS Interactive batch session

For an interactive batch session you need to specify the number of cpus, the wall clock time and memory you will need. The example below asks for 4 cpus for 30 minutes and 2gb of memory:

% qsub -I -V -lwalltime=00:30:00 -lncpus=4 -lmem=2gb

When the session begins, it will start up a shell on the launch node.

There are two options once you have a session started: using X11 tunneling with ssh or setting environment variables. The preferred way is to use X11 tunnelling with ssh but the PBS batch system does not use ssh to put the user on the compute node.

Setting Environment Variables. This is described in the Using the X Window System page. In this mode you need to set your DISPLAY variable to the X display of your local machine.

X11 tunneling. If you specify the debug queue via the -qdebug option to qsub, your interactive batch job will be run on the login host. Since the DISPLAY variable is set correctly by specifying -V, the job is ready for TotalView debugging.

4. General TotalView usage

Serial and OpenMP Debugging

As TotalView starts up, you will see two windows appear: the Control window and the Process window. In the Process window you can start inserting breakpoints etc and then click on the GO button. Happy debugging.

MPI Debugging

As TotalView starts up, you will see two windows appear: the Control window and the Process window. In the Process window, click on the GO button and then, when prompted by the window "Process XXX is a parallel job. Do you want to stop the job now?", click "Yes". You will arrive at the MPI_Init() breakpoint, as shown here for a code using SGI's MPT. You are now ready to debug in parallel.

Note: If you are debugging a code using MPICH-GM on Tungsten, you will want to insert a breakpoint at some point after the call to MPI_Init(), as the built-in breakpoint for MPI_Init() does not appear to be fully functional.

Some comments from Etnus about breakpoints and MPI_Init:
"Be very cautious in placing breakpoints at or before a line that calls MPI_Init() or MPL_Init() because timeouts can occur while your program is being initialized. After you allow the parallel processes to proceed into the MPI_Init() or MPL_Init() call, allow all of the parallel processes to proceed through it within a short time."

"Timeouts can occur if you place breakpoints that stop other processes too soon after calling MPI_Init() or MPL_Init(). If you create "stop all" breakpoints, the first process that gets to the breakpoint stops all the other parallel processes that have not yet arrived at the breakpoint. This can cause a timeout."

More on Breakpoints

To have every process run until it reaches the action point (breakpoint) itself, instead of stopping the whole group of processes as soon as the current process hits it, go to File -> Preferences -> Action Points and select "When breakpoint hit, stop: Process" rather than Group. You can also set this for an individual breakpoint by opening its properties dialog (right-click on the action point and select Properties).

5. Using the command line interface (CLI)

Using the TotalView command line interface with SGI MPT applications

Put TotalView in your environment:

soft add +totalview

Launch TotalView using the CLI

totalviewcli /usr/bin/mpirun -a -np 4 ./mpihw

For more information on using the CLI, consult the following Etnus pages

6. Enabling Memory Debugging

For all platforms, be sure to add +totalview to your ${HOME}/.soft file and issue the resoft command. Add the following options to your link step, then see the last part of this section for how to check that Memory Debugging is enabled.

Cobalt (linux-ia64)

Relinking is recommended:

-L${TOTALVIEW_HOME}/linux-ia64/lib -ltvheap -Wl,-rpath,${TOTALVIEW_HOME}/linux-ia64/lib
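
To show where these options go, the sketch below assembles them from the TotalView install directory and the platform subdirectory. tvheap_flags is a hypothetical helper name, and the compiler and file names in the commented link command are assumptions for illustration only.

```shell
# Hypothetical helper: print the memory-debugging link options.
# Usage: tvheap_flags TOTALVIEW_HOME PLATFORM_DIR
tvheap_flags() {
    libdir="$1/$2/lib"
    echo "-L${libdir} -ltvheap -Wl,-rpath,${libdir}"
}

# On Cobalt the link step might then look like this (assumed compiler
# and object file names, not run here):
#   icc -g -O0 -o program.exe program.o $(tvheap_flags "$TOTALVIEW_HOME" linux-ia64)
```

The -Wl,-rpath option records the library directory in the executable, so the tvheap library is found at run time without further environment changes.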

Making sure Memory Debugging is enabled

After launching TotalView as discussed above for each platform, but before running the application within TotalView (before clicking on Go), check whether the Memory Debugger is enabled: go to Tools > Memory Debugging and, on the Configuration tab, click the radio button labeled 'Enable memory debugging' if it is not already selected. Then return to the main TotalView window and click on Go, or first insert breakpoints where you want to inspect memory usage.

For more information on using the TotalView debugger, click here.