Run the program with /usr/bin/time, compare the user and elapsed times it reports, and note the %CPU value. (Comparing cputime and wall clock time from "qhist jobid" would also work.) Here is sample output from /usr/bin/time for a job that requested 4 cpus and ran with 4 threads.
The batch script contained a line like this:
/usr/bin/time a.out
The stderr file contained this output from /usr/bin/time:
176.04user 0.28system 0:44.93elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+4569minor)pagefaults 0swaps
Note that 176.04user/44.93elapsed = 3.918, which corresponds to the 392 %CPU figure (the reported %CPU also counts the 0.28 seconds of system time). The sample program used 392/400, or about 98%, of its 4 cpus.
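The same arithmetic can be scripted. The following is only a minimal sketch in Python: the helper names parse_elapsed and cpu_utilization are hypothetical, and it assumes the default /usr/bin/time output format shown in the sample above.

```python
import re

# Sample line copied from the /usr/bin/time output shown above.
line = "176.04user 0.28system 0:44.93elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k"

def parse_elapsed(text):
    """Convert an elapsed field such as 0:44.93 or 1:02:03 to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def cpu_utilization(time_line, ncpus):
    """Return (cpus_used, fraction_of_request) for one /usr/bin/time line.

    Hypothetical helper: uses user time divided by elapsed time,
    the same ratio as in the text above.
    """
    m = re.search(r"([\d.]+)user\s+[\d.]+system\s+([\d:.]+)elapsed", time_line)
    if m is None:
        raise ValueError("unrecognized /usr/bin/time output")
    user = float(m.group(1))
    elapsed = parse_elapsed(m.group(2))
    cpus_used = user / elapsed
    return cpus_used, cpus_used / ncpus

cpus_used, fraction = cpu_utilization(line, ncpus=4)
print(f"{cpus_used:.3f} cpus used, {fraction:.0%} of the request")
```

To check a different job, paste its own /usr/bin/time line into the script and set ncpus to the number of cpus that job requested.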
Qhist for the job showed:
...
Usage:
wall clock: 00:00:56
cputime: 00:02:59
That works out to (2min*60sec+59sec)/56sec = 179/56 = 3.20. The wall clock time for a short job includes some batch system overhead, so the ratio is a little smaller than it should be for the test job, but the qhist data confirm that more than 3 cpus were utilized.
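The qhist ratio can be computed the same way. This is a small Python sketch, assuming the cputime and wall clock fields are always in HH:MM:SS form as in the sample; hms_to_seconds is a hypothetical helper name.

```python
def hms_to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Values copied from the qhist sample above.
wall = hms_to_seconds("00:00:56")
cpu = hms_to_seconds("00:02:59")

# Average number of cpus kept busy over the wall clock time.
ratio = cpu / wall
print(f"{cpu} s cputime / {wall} s wall clock = {ratio:.2f} cpus")
```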
This can happen when the scheduler attempts to start the job before
sufficient resources to run it are available. The job should start as
soon as the resources become available.
Use /usr/bin/mpirun with totalview and see whether that fixes the problem.
Also, take a look at the gdb web page.