
Debugging on Ember


NCSA's Help Desk is available 24 hours a day, seven days a week, 365 days a year:
help.ncsa.illinois.edu
217-244-0710
help@ncsa.illinois.edu



Overview

Debugging can be frustrating, and people don't typically use debugging tools because they are fun. Review the available tools and their descriptions below while considering questions such as: Does the problem happen with every execution, or only when scaling up parameters? Is the program source code available? No debugging tool is perfect, so if an approach or tool shown here doesn't yield results, try another one with similar capabilities, or consider a tool that analyzes your program from a different perspective. For example, a TotalView session combined with the output from the source code analyzer "ftnchek" may work in concert to help pinpoint a bug. Before diving into a debugger, there are a few things that can be done quickly.

1) If a specific error message is produced with a popular application or community code, try looking for it in the FAQ for the package or do an Internet search with the application name and the error.

2) If a core file was created, the serial debuggers may quickly point to the problem area with a command similar to one shown here [in these examples, the application is named a.out]:

gdb a.out core     # command-line gdb
idbc a.out core    # command-line idb
ddd a.out core     # ddd graphical debugger
idb a.out core     # idb graphical debugger
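
If no core file appears after a crash, the shell's core file size limit may be set to zero. The following is a minimal sketch, assuming a POSIX-style shell with gdb on the path. The helper name core_backtrace and the default file names are illustrative only and are not NCSA-provided commands. It raises the core limit for the current shell and prints a backtrace without starting an interactive session:

```shell
# Allow core files in this shell; many systems default to a limit of 0.
ulimit -c unlimited 2>/dev/null || echo "could not raise the core file limit" >&2

# core_backtrace is a hypothetical helper, not an NCSA-provided command.
# Usage: core_backtrace [executable] [core file]
core_backtrace() {
    exe=${1:-./a.out}
    core=${2:-./core}
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found in PATH" >&2
        return 1
    fi
    if [ ! -f "$exe" ] || [ ! -f "$core" ]; then
        echo "need both $exe and $core" >&2
        return 1
    fi
    # -batch exits after running the -ex commands; "bt" prints the call stack.
    gdb -batch -ex "bt" "$exe" "$core"
}
```

Calling core_backtrace with no arguments looks for ./a.out and ./core. Some systems name core files differently (for example with a process ID suffix), so pass the actual file name as the second argument in that case.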

3) If the program is written in Fortran, try recompiling with the Intel Fortran compiler flags: "-check bounds -traceback -g". Then run the program again.

Most of the tools available will provide more information if your code was compiled with the -g flag [and with -O3 or higher optimizations disabled]; therefore, it is a good idea to rebuild your code with that flag when proceeding to use a debugger. Higher levels of optimization can lead to incorrect debugging results for values and locations of variables.
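
A debug rebuild can be sketched as follows. This is only an illustration: debug_flags is a hypothetical helper, prog.f90 is a made-up source file name, and -O0 is chosen here as the most conservative setting (the text above only asks that -O3 or higher be avoided). The gfortran flags are the GNU counterparts of the Intel bounds-checking and traceback options.

```shell
# debug_flags is a hypothetical helper that prints debug-friendly flags
# for a given compiler name.
debug_flags() {
    case "$1" in
        ifort)    echo "-g -O0 -check bounds -traceback" ;;    # Intel Fortran flags from step 3
        gfortran) echo "-g -O0 -fcheck=bounds -fbacktrace" ;;  # GNU Fortran counterparts
        *)        echo "-g -O0" ;;                             # icc, gcc, and similar
    esac
}

# Example: rebuild a hypothetical source file prog.f90 if it is present.
if [ -f prog.f90 ] && command -v ifort >/dev/null 2>&1; then
    # $(debug_flags ifort) is left unquoted on purpose so it splits into separate flags.
    ifort $(debug_flags ifort) -o a.out prog.f90
fi
```

Keeping the flags in one place makes it easy to switch back to the optimized build once the bug is found.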



Each entry below gives the debugger or tool and its category, a description, the kinds of programs it applies to (serial, OpenMP/threaded, or parallel), its strengths, and its limitations.

Tool: gdb [classic serial debuggers]
Description: The GNU gdb debugger is available and may be used with serial programs or with core files from serial or parallel programs. For information on using gdb, see the online man page and the gdb user manual.
Applies to: serial or OpenMP/threaded
Strengths: best debugger for GNU compilers; can attach to running processes
Limitations: sometimes has difficulty with Intel-compiled code

Tool: ddd [classic serial debuggers]
Description: The ddd graphical interface for gdb is available on the login node. ddd can be used with C, C++, Fortran, and Perl source code. See the ddd user guide for more information and examples.
Applies to: serial or OpenMP/threaded
Strengths: intuitive GUI; clicking on a variable in ddd displays its value
Limitations: requires X Windows; the interface can be slow to draw for sites far away from NCSA

Tool: idbc, idb (GUI version of idbc) [classic serial debuggers]
Description: The Intel debugger idbc is installed along with the Intel compilers. For more information on the Intel debugger, see idbc -help. idbc is similar to gdb in operation and will recognize most gdb commands if started with the -gdb flag. idbc works well with C, C++, and Fortran codes.
Applies to: serial or OpenMP/threaded
Strengths: best debugger for Intel-compiled code; can debug Fortran 77, 90, and 95; can attach to running processes
Limitations: the default idbc interface is dbx; use "idbc -gdb" for the gdb-compatible interface

Tool: TotalView [classic serial debuggers] [parallel debuggers]
Description: The TotalView debugger [with graphical user interface] works with the supported MPI environment. It is our recommended debugger for MPI code. Information on starting TotalView on Ember (and other NCSA platforms) can be found here. TotalView documentation can be found in /usr/apps/tools/toolworks/totalview/doc and on the Etnus website here.
Applies to: parallel, serial, or OpenMP/threaded
Strengths: intuitive graphical interface and debugger for use with the default MPI environment; can attach to running processes; also has a command-line interface (tv8cli)
Limitations: can debug up to 128 MPI ranks with our license

Tool: Valgrind [memory allocation debuggers]
Description: Valgrind can detect memory management bugs and threading bugs in C/C++ code. Since Valgrind can add a lot of diagnostic information to the output and slow execution, it is best used only when debugging.
Applies to: serial or OpenMP/threaded
Strengths: can run in batch mode; recompile/relink not needed
Limitations: can slow execution; potentially verbose output; C/C++ only, no Fortran

Tool: MALLOC_CHECK_ [memory allocation debuggers]
Description: C/C++ programs using malloc(), calloc(), or realloc() can set the MALLOC_CHECK_ environment variable. From "man malloc": If MALLOC_CHECK_ is set to 0, any detected heap corruption is silently ignored; if set to 1, a diagnostic is printed on stderr; if set to 2, abort() is called immediately.
Applies to: serial, OpenMP/threaded, or parallel
Strengths: can run in batch mode; environment setting, recompile/relink not required
Limitations: C/C++ only, not Fortran; can slow performance, so leave MALLOC_CHECK_ unset for production runs

Tool: MPI_CHECK_ARGS [MPI specific debugging tools]
Description: Set the MPI_CHECK_ARGS environment variable (a toggle) to enable runtime checking of MPI function arguments. Segmentation faults might occur if bad arguments are passed to MPI, so this is useful for debugging purposes. Using argument checking adds several microseconds to latency. Default: not enabled.
Applies to: parallel
Strengths: can run in batch mode; can find MPI programming errors; scales with MPI; no recompile/relink required
Limitations: may not find some bugs

Tool: floating point exceptions [general purpose tools and techniques]
Description: The techniques for trapping floating point exceptions [which are not always bugs] vary by compiler and operating system. Since each case is a little different, follow the floating point exceptions link for the examples that match your situation. Fortran compilers tend to have a more straightforward approach to floating point exceptions than C compilers.
Applies to: serial, OpenMP/threaded, or parallel
Strengths: scales with MPI; useful with batch mode; minimal performance impact for most codes
Limitations: floating point exceptions are not necessarily bugs; recompile/relink required

Tool: source code analysis tools [general purpose tools and techniques]
Description: Splint [for C code] and ftnchek [Fortran] are available on Ember. While not true debuggers, source code analysis tools can be very helpful when trying to track down a program bug. They can also help you write clean, maintainable code by providing useful feedback about coding style, unused variables, non-portable practices, and so on. Don't be alarmed by the number of warnings generated by these tools; they're designed to detect a great variety of potential problems.
Applies to: serial, OpenMP/threaded, or parallel
Strengths: can pinpoint problem areas of source code; can search for problems without running code; ftnchek can generate a call graph
Limitations: source code required

Tool: strace [general purpose tools and techniques]
Description: strace produces a system call trace for any program you can run. Each line in the trace contains the system call name as used by your program, followed by its arguments in parentheses and its return value. If you know what sort of system call may be failing, strace can be quite powerful. For more information, see man strace.
Applies to: serial, OpenMP/threaded, or parallel
Strengths: scales with MPI; can be used without source code; guaranteed to produce some output; can attach to running processes; the -c option can profile your code's system calls; recompile/relink not required
Limitations: extremely verbose output

Tool: ltrace [general purpose tools and techniques]
Description: ltrace produces a library call trace for any program you can run. Each line in the trace contains the library call name as used by your program, followed by its arguments in parentheses and its return value. If you know what sort of library call may be failing, ltrace can be quite powerful. For more information, see man ltrace.
Applies to: serial, OpenMP/threaded, or parallel
Strengths: scales with MPI; can be used without source code; guaranteed to produce some output; can attach to running processes; the -c option can profile your code's library calls; recompile/relink not required
Limitations: extremely verbose output; can dramatically slow program execution
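
Several of the tools described above wrap an ordinary command line rather than requiring a rebuild. The sketch below shows typical invocations, assuming the tools are installed. The helper name run_checked and the output file names strace.out and ltrace.out are hypothetical choices made for this illustration.

```shell
# run_checked is a hypothetical helper: run a command under one of the
# no-recompile tools described above.
# Usage: run_checked <malloc|valgrind|strace|ltrace> command [args...]
run_checked() {
    tool=$1
    shift
    case "$tool" in
        # Abort immediately when glibc detects heap corruption.
        malloc)   MALLOC_CHECK_=2 "$@" ;;
        # Report memory errors and list leaked blocks in detail.
        valgrind) valgrind --leak-check=full "$@" ;;
        # Follow child processes and threads (-f); write the trace to a file (-o).
        strace)   strace -f -o strace.out "$@" ;;
        ltrace)   ltrace -f -o ltrace.out "$@" ;;
        *)        echo "unknown tool: $tool" >&2
                  return 2 ;;
    esac
}
```

For example, "run_checked valgrind ./a.out" runs the program under Valgrind, and "run_checked strace ./a.out" leaves the system call trace in strace.out so the verbose output does not mix with the program's own output. For a summary of call counts and times instead of a full trace, use the -c option of strace or ltrace directly.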

