
IBM pSeries 690 Debugging, Performance, and Timing FAQ

 

How can I verify that my OpenMP or threaded program is really using all the threads I expect?

    Use ps to find the process ID of your program, then pass that ID to the ps thread options shown below to list each thread and its CPU utilization.
    Cu12:% ps -eaf|grep a.out
     arnoldg 1646806 1491002   0 15:38:04  pts/5  0:00 grep a.out
     arnoldg 2154738 1310950 974 15:37:57 pts/18  1:31 ./a.out
    Cu12:% ps -m -o THREAD -p 2154738
        USER     PID    PPID      TID ST  CP PRI SC    WCHAN        F     TT BND COMMAND
     arnoldg 2154738 1310950        - A  1920  36 15        *   200001 pts/18   - ./a.out
           -       -       -  2187497 R  120  36  1        -   400000      -   - -
           -       -       -  2760845 R  120  36  1        -   400000      -   - -
           -       -       -  3203205 R  120  36  1 2020a5b0   c00000      -   - -
           -       -       -  3473513 R  120  36  1        -   400000      -   - -
           -       -       -  3514471 R  120  36  1        -   400000      -   - -
           -       -       -  3793105 R  120  36  1        -   400000      -   - -
           -       -       -  4202571 R  120  36  1        -   400000      -   - -
           -       -       -  4210911 R  120  36  1        -   400000      -   - -
           -       -       -  4219045 R  120  36  1        -   400000      -   - -
           -       -       -  5308641 R  120  36  1        -   400000      -   - -
           -       -       -  5333211 R  120  36  1 2020a5b0   c00000      -   - -
           -       -       -  5423199 R  120  36  1        -   400000      -   - -
           -       -       -  5439669 R  120  36  1 2020a5b0   c00010      -   - -
           -       -       -  5456063 R  120  36  0        -   400000      -   - -
           -       -       -  5578907 R  120  36  1        -   400000      -   - -
           -       -       -  5824681 R  120  36  1 2020a5b0   400000      -   - -
    
    Cu12:~239% ps -m -o pcpu,thcount,pid,tid,cpu -p 1622126
     %CPU THCNT     PID TID  CP
     87.5    16 1622126   - 1920
        -     -       - 3178537 120
        -     -       - 3211443 120
        -     -       - 3473551 120
        -     -       - 3735735 120
        -     -       - 3801255 120
        -     -       - 3833959 120
        -     -       - 5185595 120
        -     -       - 5234729 120
        -     -       - 5316839 120
        -     -       - 5374173 120
        -     -       - 5406749 120
        -     -       - 5415025 120
        -     -       - 5521643 120
        -     -       - 5603467 120
        -     -       - 5701753 120
        -     -       - 6029345 120
    # In the CP column, non-zero values indicate high CPU utilization;
    # zero values would indicate threads waiting on I/O.
    
    See also: llq -w
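
    To reduce the THREAD listing above to a single number, the thread rows can be counted with awk. This is a minimal sketch rather than a site-provided tool: count_running_threads is a hypothetical helper name, and it assumes the column layout shown above (a "-" in the USER column on thread rows, the TID in column 4, and the state in column 5).

```shell
# count_running_threads: read `ps -m -o THREAD -p PID` output on stdin and
# print how many thread rows are in state R.
# Assumes the column layout of the listing above (hypothetical helper).
count_running_threads() {
    awk '$1 == "-" && $4 ~ /^[0-9]+$/ && $5 == "R" { n++ } END { print n + 0 }'
}

# Demo on two thread rows copied from the listing above.
printf '%s\n' \
  '       -       -       -  2187497 R  120  36  1        -   400000      -   - -' \
  '       -       -       -  2760845 R  120  36  1        -   400000      -   - -' |
  count_running_threads    # prints 2
```

    On the machine itself the helper would be used as "ps -m -o THREAD -p 2154738 | count_running_threads"; for the 16-thread listing above it would print 16.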

How do I find out how much memory my program used?

    You can use the llhist command.
     
    Or you can use llsummary to see the maximum resident set size (maxrss) used by your program and all of its processes and threads. The maxrss field is reported in 1 KB units, so dividing it by 1024 gives megabytes and dividing it by 1048576 gives gigabytes:
    Cu12:~/loadleveler315% expr `llsummary -l -j cu12.4123 | grep 'Step maxrss'|cut -d':' -f2` / 1024
    229051  # megabytes
    Cu12:~/loadleveler316% expr `llsummary -l -j cu12.4123 | grep 'Step maxrss'|cut -d':' -f2` / 1048576
    223     # gigabytes
    
    For programs running on the interactive machine, try "ps u" and look at the RSS field (like llsummary, the units are 1 KB):
    Cu12:~/loadleveler329% ps u
    USER         PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
    arnoldg  3686468  4.2  7.0 887704 887732 pts/17 A    11:45:17  0:02 ./malloctest
    Cu12:~/loadleveler330% expr 887732 / 1024
    866  # megabytes
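
    Note that expr performs integer division, so the results above are truncated. Where fractional values matter, the same conversion can be done with awk. This is a small sketch, assuming the input is a count of 1 KB units as described above; kb_to_units is a hypothetical helper name.

```shell
# kb_to_units: convert a size in 1 KB units to megabytes and gigabytes,
# keeping the fractional part that expr would discard (hypothetical helper).
kb_to_units() {
    awk -v kb="$1" 'BEGIN { printf "%.1f MB, %.2f GB\n", kb / 1024, kb / 1048576 }'
}

kb_to_units 887732    # prints: 866.9 MB, 0.85 GB
```

    The RSS value from the ps example is used here; the expr result of 866 is the same figure with the fraction dropped.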
    

I see Segmentation fault (core dumped) on a program I know to be correct and debugged.

    The maximum stack size may be too small. If your program is 32-bit (the default), try relinking with -bmaxstack:0x80000000 to raise the maximum stack size to 2 GB. If that does not work, try compiling with -q64 to enable 64-bit addressing.

    If your program is 64-bit and you are trying to run interactively, the program may need more memory than is available: try running in a batch job. If you are already running within a batch job, try setting a higher value for ConsumableMemory.

Program exits with: Error encountered while attempting to allocate a data object.

    The maximum data size may be too small. If your program is 32-bit (the default), try relinking with -bmaxdata:0x80000000 to raise the maximum data size to 2 GB. If that does not work, try compiling with -q64 to enable 64-bit addressing.

    If your program is 64-bit and you are trying to run interactively, the program may need more memory than is available: try running in a batch job. If you are already running within a batch job, try setting a higher value for ConsumableMemory.
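
    The values passed to -bmaxstack and -bmaxdata are byte counts written in hexadecimal. As a quick check of the arithmetic, the sketch below converts such a value to megabytes. It assumes a shell whose arithmetic expansion accepts 0x constants (bash and ksh93 do); hex_to_mb is a hypothetical helper name, not a system command.

```shell
# hex_to_mb: print a hexadecimal byte count in whole megabytes
# (hypothetical helper; relies on 0x constants in shell arithmetic).
hex_to_mb() {
    echo $(( $1 / 1048576 ))
}

hex_to_mb 0x80000000    # prints 2048, i.e. 2048 MB = 2 GB
```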

How do I tell which CPU(s) my code is running on?

    There is an undocumented function, mycpu, that returns an integer from 0 to N-1, where N is the number of CPUs on the node.
    C:
    int i;
    i = mycpu();
    Fortran:
    INTEGER i
    i = MYCPU()
