Debugging on Copper

Contents


Overview

Debugging can be frustrating; people don't typically use debugging tools because they're fun. Look over the list of available tools above and their descriptions in the table below while considering questions like: Does the problem happen with every execution, or only when scaling up parameters? Is the program source code available?

No debugging tool is perfect. If an approach or tool shown here doesn't yield results, try another one with similar capabilities, or consider a tool that analyzes your program from a different perspective. For example, a Totalview session combined with the output from the source code analyzer "ftnchek" may help pinpoint a bug.

Before diving into a debugger, there are a few things that can be done quickly.

1) If a specific error message is produced with a popular application or community code, try looking for it in the FAQ for the package or do an internet search with the application name and the error. For IBM error messages of the form NNNN-MMM, try typing the error into the search box at the top right of this page before you try a general internet search.

2) If a core file was created, the serial debugger dbx may quickly point to the problem area with a command similar to the one shown here [in this example, the application is named a.out]:

  dbx a.out core		# command line dbx, type "where" at the (dbx) prompt

3) Try compiling the program with the parallel environment compilers [even if it's not MPI code] using mpxlf, mpcc, or similar. The parallel environment libraries include more error trapping routines and can sometimes isolate the source code line number causing an error. Set the MP_COREFILE_FORMAT environment variable with "setenv MP_COREFILE_FORMAT STDERR" and run the program again. It may point out the line number causing the problem.
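
For sh, ksh, or bash users, steps 2 and 3 might look like the sketch below. It assumes the executable is a.out, the core file is named core, and the source is a C file called prog.c [a hypothetical name]. "export" replaces the csh "setenv" shown above. Feeding "where" to dbx on standard input is an assumption about non-interactive use; the interactive form in step 2 is the one described on this page. The checks for dbx and mpcc only keep the script harmless on a system where they are missing.

```shell
#!/bin/sh
# Sketch only: prog.c is a hypothetical source file name.
exe=a.out
core=core

# Step 2: print the call stack from an existing core file.
if command -v dbx >/dev/null 2>&1 && [ -f "$core" ]; then
    echo where | dbx "$exe" "$core"
else
    echo "skipping dbx: no dbx command or no file named $core"
fi

# Step 3: rebuild with a parallel environment compiler.
MP_COREFILE_FORMAT=STDERR      # sh/ksh form of: setenv MP_COREFILE_FORMAT STDERR
export MP_COREFILE_FORMAT
if command -v mpcc >/dev/null 2>&1 && [ -f prog.c ]; then
    mpcc -g -o "$exe" prog.c   # then run the program again as usual
else
    echo "skipping rebuild: no mpcc command or no prog.c"
fi
```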

Most of the tools available will provide more information if your code was compiled with the -g flag [and with -O3 or higher optimizations disabled], so it's a good idea to rebuild your code with that flag before using a debugger. Higher levels of optimization can lead to incorrect debugging results for values and locations of variables.
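
One way to keep the rebuild consistent is a small helper that rewrites a flag list: it adds -g and drops -O3 along with the higher XL levels -O4 and -O5. This is a hypothetical sh function, not an NCSA or IBM tool, and the sample flags passed to it are only illustrations.

```shell
#!/bin/sh
# debug_flags: hypothetical helper, not part of any compiler distribution.
# Prints the given compiler flags with -g added and -O3/-O4/-O5 removed.
debug_flags() {
    out="-g"
    for flag in "$@"; do
        case "$flag" in
            -g|-O3|-O4|-O5) ;;          # -g is already present; drop high optimization
            *) out="$out $flag" ;;      # keep every other flag unchanged
        esac
    done
    echo "$out"
}

debug_flags -O3 -q64    # prints: -g -q64
```

It could then be used in a build line such as: xlc $(debug_flags $CFLAGS) -o a.out prog.c [CFLAGS and prog.c are placeholder names].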


Each entry below lists the debugger / tool, followed by its description, its mode [serial, openmp/threaded, or parallel], its strengths, and its limitations.

dbx, pdbx [serial and parallel debuggers]
  description: dbx is a command line symbolic debugger. It can be used to debug serial or threaded programs. pdbx is a version that can be used with MPI programs built for IBM's parallel environment [poe].
  mode: serial or openmp/threaded [dbx]; parallel [pdbx]
  strengths: best command line debugger for XL compiled code
  limitations: there's a learning curve if you've never used dbx or pdbx

idebug [classic serial debuggers]
  description: idebug is IBM's graphical user interface debugger and it works well with 32 bit code [NOT built with -q64 or OBJECT_MODE=64] from the XL compilers.
  mode: serial or openmp/threaded
  strengths: GUI is easy to use
  limitations: not reliable for use with 64 bit code; requires X windows

gdb [classic serial debuggers]
  description: The GNU gdb debugger is available and may be used with serial programs or core files from serial or parallel programs. For information on using gdb see the online man page and the gdb user manual.
  mode: serial or openmp/threaded
  strengths: best debugger for gnu compilers; can attach to running processes
  limitations: gdb sometimes has difficulty with IBM XL compiled code; 32 bit code only

ddd [classic serial debuggers]
  description: The ddd graphical interface for gdb is available on the login nodes. ddd can be used with c, c++, fortran, and perl source code. See the ddd user guide for more information and examples.
  mode: serial or openmp/threaded
  strengths: intuitive GUI; clicking on a variable in ddd will display its value
  limitations: requires X windows; the interface can be slow to draw for sites far away from NCSA; 32 bit code only

Totalview [classic serial debuggers] [parallel debuggers]
  description: The Totalview debugger [with graphical user interface] works with the supported MPI environment. It is our recommended debugger for MPI code. Information on starting Totalview on Copper (and other NCSA platforms) can be found here. Totalview documentation can be found in /usr/apps/tools/toolworks/totalview/doc and on the Etnus website here.
  mode: parallel, serial, or openmp/threaded
  strengths: intuitive graphical interface and debugger for use with default MPI environment; can attach to running processes
  limitations: requires X windows; can debug up to 32 MPI ranks with our license

-qheapdebug [memory allocation debuggers]
  description: The "-qheapdebug" compiler flag for the c compilers enables debug versions of memory management functions like malloc().
  mode: serial, openmp/threaded, or parallel
  strengths: scales with MPI or threads; can run in batch mode
  limitations: c/c++ only, no fortran support; may slow program execution, disable when not debugging; recompile/relink required

Parallel Environment debug options [MPI specific debugging tools]
  description: The Parallel Environment provides environment variable settings and flags to support debugging MPI code. Output may be labeled by rank, and you can request extra runtime checking of MPI function arguments for correctness.
  mode: parallel
  strengths: can run in batch mode; scales with MPI; no recompile/relink required
  limitations: may not find MPI deadlocks

Marmot MPI check libraries [MPI specific debugging tools]
  description: These libraries can be linked with your program to provide runtime checks for common MPI programming problems and MPI deadlock detection. This debugging aid can scale with your MPI application to the maximum number of processes you can employ. It's a good option when bugs appear only when running at scale.
  mode: parallel
  strengths: can run in batch mode; can find MPI deadlocks; scales with MPI
  limitations: may not find some bugs; recompile/relink required

floating point exceptions [general purpose tools and techniques]
  description: The techniques for trapping floating point exceptions [which are not always bugs] vary by compiler and operating system. Since each case is a little different, see the link at left for the examples that match your situation. Fortran compilers tend to have a more straightforward approach to floating point exceptions than c compilers.
  mode: serial, openmp/threaded, or parallel
  strengths: scales with MPI; useful with batch mode; minimal performance impact for most codes
  limitations: floating point exceptions are not necessarily bugs; recompile/relink required

source code analysis tools [general purpose tools and techniques]
  description: lint [for c code] and ftnchek [fortran] are available on tungsten. While not true debuggers, source code analysis tools can be very helpful when trying to track down a program bug. They can also help you write clean, maintainable code by providing useful feedback about coding style, unused variables, non-portable practices, and so on. Don't be alarmed by the number of warnings generated by these tools; they're designed to detect a great variety of potential problems.
  mode: serial, openmp/threaded, or parallel
  strengths: can pinpoint problem areas of source code; can search for problems without running code; ftnchek can generate a call graph
  limitations: source code required

truss [general purpose tools and techniques]
  description: truss produces a system call trace for any program you can run. Each line in the trace contains the system call name as used by your program, followed by its arguments in parentheses and its return value. If you know what sort of system call may be failing, truss can be quite powerful. For more information, see: man truss
  mode: serial, openmp/threaded, or parallel
  strengths: scales with MPI; can be used without source code; guaranteed to produce some output; can attach to running processes; -c option can do profiling for your code's system calls; recompile/relink not required
  limitations: extremely verbose output
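
The truss usage described above might look like the sketch below. The -c option and attaching to a process come from the table; the -o option for writing the trace to a file, the output file name truss.out, the helper name failed_calls, and the assumption that failed calls carry an "Err#" marker in the trace are illustrative and should be checked against "man truss" on Copper.

```shell
#!/bin/sh
# failed_calls: hypothetical helper that keeps only the trace lines for
# system calls that returned an error (assumed to be marked with "Err#").
failed_calls() {
    grep 'Err#' "$1"
}

if command -v truss >/dev/null 2>&1 && [ -x ./a.out ]; then
    truss -o truss.out ./a.out      # write the full system call trace to truss.out
    failed_calls truss.out          # show only the calls that failed
    truss -c ./a.out                # summary counts instead of a line per call
    # truss -p PID                  # attach to a running process with id PID
else
    echo "skipping: no truss command or no ./a.out"
fi
```

Filtering the trace first is one way to cope with the very verbose output when you already suspect a failing system call.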

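For the Parallel Environment debug options entry, a batch script might set variables like the ones below before launching the program. MP_LABELIO and MP_EUIDEVELOP are Parallel Environment variables, but the values shown are a starting point and not a site recommendation; csh users would write "setenv MP_LABELIO yes" instead of the sh/ksh form used here.

```shell
#!/bin/sh
# Sketch of Parallel Environment debug settings for a batch script (sh/ksh syntax).
MP_LABELIO=yes        # label each line of output with the MPI task (rank) that wrote it
MP_EUIDEVELOP=yes     # request extra checking of MPI function arguments at run time
export MP_LABELIO MP_EUIDEVELOP

# Launch the MPI program as usual after this point; no recompile or relink is needed.
echo "MP_LABELIO=$MP_LABELIO MP_EUIDEVELOP=$MP_EUIDEVELOP"
```

Because the extra checking costs some run time, unset these variables again once the problem is found.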

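For the gdb entry, a backtrace from a core file can be produced without an interactive session. The sketch assumes a gdb recent enough to support the -batch and -ex options and reuses the file names a.out and core from earlier on this page; gdb_backtrace is a hypothetical helper name.

```shell
#!/bin/sh
# gdb_backtrace: hypothetical helper; prints the call stack recorded in a core file.
# Usage: gdb_backtrace EXECUTABLE COREFILE
gdb_backtrace() {
    if command -v gdb >/dev/null 2>&1 && [ -f "$1" ] && [ -f "$2" ]; then
        gdb -batch -ex bt "$1" "$2"     # run the "bt" command, then exit
    else
        echo "skipping gdb: need gdb, $1, and $2"
    fi
}

gdb_backtrace a.out core
# To attach to a running process instead: gdb a.out PID   (PID is the process id)
```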

References