Intel 64 Linux Cluster Running Jobs FAQ

NCSA's Help Desk is available 24 hours a day, seven days a week, 365 days a year:
help.ncsa.illinois.edu
217-244-0710
help@ncsa.illinois.edu

 

I have a job in the queue and it's not running yet--why?

    At NCSA, we run the same job scheduling package on all of our batch systems. Jobs are scheduled first-in, first-out (FIFO), with some important modifications to make sure that all jobs and users get a fair chance to run. Some of the scenarios you may observe are explained here.

    Q: Why are many machines or resources currently idle?

    A: The scheduler has to let resources sit idle while it gathers enough of them to start a large CPU-count job, so the system always looks this way just before such a job starts. The large job will probably start very soon, and the idle resources will be in use once again.

    Q: User X submitted a job like mine after I submitted mine, yet their job has already started. Why?

    A: Our scheduler keeps a recent history of how many jobs each user has run. If User X has run few jobs recently and you have run more, User X gets higher priority so that they can catch up and everyone gets a fair share of the system.

    The scheduler will also backfill smaller and shorter jobs to utilize any idle resources.

    Q: How can I get a test, debug, or any of my jobs to start sooner--I really need to try something out now!

    A: The debug queue is available for jobs that use a small number of CPUs. In addition, because the scheduler has to let resources sit idle before starting any large (many-CPU) job, you can take advantage of that window by submitting a job that requests a short walltime, which makes it a candidate for backfill.

    Q: I don't understand why my job hasn't started. How can anyone get work done when the queues are so busy?

    A: See the output of the qs command. It shows the queued jobs along with the time they have been waiting in the queue. Comparing your jobs with others that have waited a similar time will show whether other users are experiencing the same delays.

I have an account and can log in, but when I try to submit jobs I get a message saying I have no accounts that can be used for batch jobs on this system. Why?

    This usually means that your account has expired or has used up its allocation. You can use the batch_accts command to check your accounts.

    You can also check details of your allocation and usage on the TeraGrid Portal or using the TeraGrid tgusage command.

I get the error message "qsub: illegal -N value " when I submit a PBS job. What does this mean?

    This error message occurs when the job name given to the -N option is longer than 15 characters or when its first character is not alphabetic. From the qsub man page:

    -N name  Declares  a name for the job.  The name specified may be up to
              and including 15 characters in length.   It  must  consist  of
              printable, non white space characters with the first character
              alphabetic.
    
    If your batch script does not specify the -N option, the name of the batch script is used as the jobname, so the above limitations will apply for the batch script name in this case.
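
    The rules quoted from the man page can be checked before submitting. The following is a minimal sketch, not part of PBS: valid_jobname is a hypothetical helper name, and it assumes a POSIX shell with character-class support in case patterns.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if NAME satisfies the qsub -N rules quoted
# above (1 to 15 printable, non-whitespace characters, first one alphabetic).
valid_jobname() {
    name=$1
    # Length check: at least 1 and at most 15 characters.
    [ "${#name}" -ge 1 ] && [ "${#name}" -le 15 ] || return 1
    case $name in
        [!A-Za-z]*)     return 1 ;;  # first character is not alphabetic
        *[![:graph:]]*) return 1 ;;  # contains whitespace or a non-printable
    esac
    return 0
}
```

    For example, valid_jobname "$(basename myscript.pbs)" || echo "rename the script or add -N" would flag a script whose file name cannot serve as the default job name (myscript.pbs is a placeholder).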

I get a warning message "Warning: no access to tty (Bad file descriptor). Thus no job control in this shell." in my stdout file when I run a batch job.

    This message is harmless and can safely be ignored. It just means that there is no interactive shell access for this job.

How do I enable X11 forwarding for an interactive batch session?

    Use the "-X" option/flag on your qsub line.

    Ex.
         qsub -I -V -X -l walltime=00:05:00,nodes=1:ppn=2

I got an error message: SEEK_SET is #defined but must not be for the C++ binding of MPI. What does that mean?

    Users may get such error messages when using +mvapich2-intel and +openmpi-1.2-intel. The problem is that both stdio.h and the MPI C++ interface use SEEK_SET, SEEK_CUR, and SEEK_END. You can try adding

        #undef SEEK_SET
        #undef SEEK_END
        #undef SEEK_CUR
    
    before mpi.h is included, or add the definition -DMPICH_IGNORE_CXX_SEEK for +mvapich2-intel, or -DOMPI_IGNORE_CXX_SEEK for +openmpi-1.2-intel to the command line (this will cause the MPI versions of SEEK_SET etc. to be skipped). (Please also refer to the MPICH2 FAQ.)

Errors in Nullcomm::Clone with C++.

    Users of +mvapich2-intel, particularly with older C++ compilers, may see error messages of the form

    "error C2555: 'MPI::Nullcomm::Clone' : overriding virtual function differs from
    'MPI::Comm::Clone' only by return type or calling convention".
    
    This happens when the compiler does not implement part of the C++ standard (covariant return types for overriding virtual functions). To work around this problem, add the definition
         -DHAVE_NO_VARIABLE_RETURN_TYPE_SUPPORT
    
    to the CXXFLAGS variable or add a
         #define HAVE_NO_VARIABLE_RETURN_TYPE_SUPPORT 1
    
    before including mpi.h. (Please also refer to the MPICH2 FAQ)

My job runs fine at small scale, but fails with a segmentation fault at larger scale. What may be wrong?

    As you scale out, MPI needs more memory for buffers (how much depends on your communication pattern). The application itself may also need more memory at scale. Try some tests with ppn=6 or ppn=4 instead of ppn=8, or specify the himem resource, so that each process can access more memory.
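
    When you lower ppn, the number of MPI processes to launch changes as well. The sketch below derives that count from the PBS node file instead of hard-coding it. It assumes the node file lists each host once per requested processor; count_slots is a hypothetical helper name, and the mpirun options shown in the comments vary by MPI implementation.

```shell
#!/bin/sh
# Hypothetical helper: print the number of process slots PBS granted,
# assuming the node file has one line per host per requested ppn.
count_slots() {
    # grep -c . counts the non-empty lines, one per slot.
    grep -c . "$1"
}

# Inside a batch job this might be used as:
#   NP=$(count_slots "$PBS_NODEFILE")
#   mpirun -np "$NP" ./a.out
```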

My MPICH-VMI job fails, but does not provide any error messages. How can I diagnose the problem?

    MPICH-VMI provides the following verbose options to mpirun that may be helpful:

        -v                  Verbose level 1         MPIRUN verbose & VMI startup
        -vv                 Verbose level 2         Warning messages
        -vvv                Verbose level 3         Error messages
        -vvvv               Verbose level 10:       Excess Debug (Everything)
    

I am using mvapich2-intel; why did I get this error message: mpiexec_abe1134: cannot connect to local mpd (/tmp/mpd2.console_userid)?

    To run jobs with mvapich2-intel, please make sure you are using the sample batch script from /usr/local/doc/batch_scripts. The mpd must be set up before mpiexec is called; otherwise you may get this kind of error message.

    If, together with the above error, you also get the message failed to ping mpd on abe0159 (or another node); recvd output={}, please check the .mpd.conf file in your home directory. The MPD_SECRETWORD entry must contain at least one character.
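
    A quick way to check the file is sketched below. It assumes the entry is written as a single MPD_SECRETWORD=value line; mpd_conf_ok is a hypothetical helper name, not an mpd command.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given mpd configuration file
# (default: $HOME/.mpd.conf) has an MPD_SECRETWORD entry with at least
# one character after the "=".
mpd_conf_ok() {
    conf=${1:-$HOME/.mpd.conf}
    [ -f "$conf" ] || return 1
    grep -Eq '^[[:space:]]*MPD_SECRETWORD=[^[:space:]]+' "$conf"
}
```

    Running mpd_conf_ok || echo "fix ~/.mpd.conf" before submitting would catch a missing file or an empty secret word.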

My default shell is bash; Why do my MVAPICH2 jobs fail with the error message: mpdboot_abe0295 (handle_mpd_output 406): from mpd on abe0293, invalid port info: no_port?

    This error occurs when a user whose default shell is bash runs "mvapich2-start-mpd" in a batch job that uses two or more nodes, and the $HOME/.bashrc file either does not exist or has had the default content (see "/usr/local/etc/skel/.bashrc") removed or commented out. To resolve the issue, place a copy of "/usr/local/etc/skel/.bashrc" in your $HOME directory, or prepend its contents to your existing file.
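
    The fix described above could be scripted roughly as follows. This is only a sketch: ensure_bashrc is a hypothetical helper name, it does not check whether the defaults are already present, and running it twice will prepend them twice.

```shell
#!/bin/sh
# Hypothetical helper: make sure HOME_DIR/.bashrc begins with the site
# defaults in SKEL. Copies SKEL if the file is missing, otherwise prepends.
ensure_bashrc() {
    home_dir=$1
    skel=$2
    target=$home_dir/.bashrc
    if [ ! -f "$target" ]; then
        cp "$skel" "$target"
        return $?
    fi
    tmp=$(mktemp) || return 1
    # Build "defaults + existing content" in a temp file, then copy it back.
    cat "$skel" "$target" > "$tmp" && cp "$tmp" "$target"
    status=$?
    rm -f "$tmp"
    return $status
}
```

    On this system the call would be ensure_bashrc "$HOME" /usr/local/etc/skel/.bashrc.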

I got error messages like: Unable to allocate QP. Unable to get response data from sock_cm! OPENFABRIC Device(fatal):VMI_Buffer_Allocate(): Error registering memory region. when I scale out with VMI. What do those error messages mean?

    These error messages are likely symptoms of nodes running out of memory. Keep in mind that MPI itself needs some memory and may need more for larger-scale runs. Please try ppn=6 or ppn=4, or specify the himem resource.

My job got error messages with both OpenMPI and MVAPICH2: Error: Unsupported datatype passed to ADIOI_Count_contiguous_blocks, [aim-nano_02:22229] MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 1. What does it mean?

    One possible cause is that MPI treats some data as first-class MPI-2 derived datatypes, whereas ROMIO assumes that they are built on top of MPI-1 derived datatypes. You may want to try VMI2, which is built on top of MPI-1.

My job got this error message while writing output: Error: No space left on device. What does it mean?

    The Lustre filesystem is served by several file servers, and if one of them is full, you may see that message even though the filesystem shows available space. The message comes from the errno.h system header file:

    [username@honest2 ~]$ grep ENOSPC /usr/include/asm-x86_64/errno.h
    #define ENOSPC          28      /* No space left on device */
    
    You may observe a file server [OST in the output below] close to capacity after you've seen that error:
    [username@honest2 ~]$ lfs df /cfs/scratch
    UUID                 1K-blocks      Used Available  Use% Mounted on
    scr1_UUID            186017976  11347984 174669992    6% /cfs/scratch[MDT:0]
    ost1_UUID            5676562992 2151041964 3525521028   37% /cfs/scratch[OST:0]
    ost2_UUID            5676562992 2202281616 3474281376   38% /cfs/scratch[OST:1]
    ost3_UUID            5676562992 2895861636 2780701356   51% /cfs/scratch[OST:2]
    ost4_UUID            5676562992 2117350856 3559212136   37% /cfs/scratch[OST:3]
    ost5_UUID            5676562992 2117267876 3559295116   37% /cfs/scratch[OST:4]
    ost6_UUID            5676562992 2131391836 3545171156   37% /cfs/scratch[OST:5]
    ost7_UUID            5676562992 2124743176 3551819816   37% /cfs/scratch[OST:6]
    ost8_UUID            5676562992 2148473868 3528089124   37% /cfs/scratch[OST:7]
    ost9_UUID            5676562992 2232595836 3443967156   39% /cfs/scratch[OST:8]
    ost10_UUID           5676562992 2326244020 3350318972   40% /cfs/scratch[OST:9]
    ost11_UUID           5676562992 2334170504 3342392488   41% /cfs/scratch[OST:10]
    ost12_UUID           5676562992 2229796680 3446766312   39% /cfs/scratch[OST:11]
    ost13_UUID           5676562992 2162607760 3513955232   38% /cfs/scratch[OST:12]
    ost14_UUID           5676562992 2299783112 3376779880   40% /cfs/scratch[OST:13]
    ost15_UUID           5676562992 2163394280 3513168712   38% /cfs/scratch[OST:14]
    ost16_UUID           5676562992 2376409332 3300153660   41% /cfs/scratch[OST:15]
    filesystem summary:  90825007872 36013414352 54811593520   39% /cfs/scratch
    
    In any event, if you see that error, please report it to consult@ncsa.uiuc.edu.
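
    To spot a nearly full file server without reading the whole listing, the lfs df output can be filtered. This is a minimal sketch that assumes the column layout shown above (Use% in the fifth column, OST lines tagged with [OST:n]); full_osts is a hypothetical helper name and the default threshold of 90 is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: read `lfs df` output on stdin and print each OST
# whose Use% is at or above the given threshold (default 90).
full_osts() {
    awk -v limit="${1:-90}" '
        /\[OST:[0-9]+\]/ {
            pct = $5              # fifth column, e.g. "51%"
            sub(/%/, "", pct)     # strip the percent sign
            if (pct + 0 >= limit + 0) print $1, $5
        }'
}
```

    For example, lfs df /cfs/scratch | full_osts 90 prints nothing when every OST is below 90% full.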

Why did mpirun start np separate instances of my code, each running as process 0, instead of one job running across np cores? (np is the process count given to mpirun.)

    We have found that mixing MPI implementations (building your code with one, and running in an environment of another implementation) can cause this to happen. Please verify that your environment, build, and batch script commands are consistent.

I have a serial program, and I want to run multiple simulations with it on a set of machines as one batch job. How can I do that?

    This job script is an example of how you can run a serial program or command concurrently on a set of machines using ssh. Note, in order to make efficient use of the machines, it's important that the instances of your program on each machine complete in about the same time. Otherwise, machines that finish early will be idle and wasting resources.

    #!/bin/sh
    #PBS -l nodes=4:ppn=7      # 4 machines , 7 cpus per machine [28 cpus ]
    #PBS -l walltime=01:00:00  # Specify job run time limit of 1 hour
    #PBS -A abc                # Charge job to project abc (recommended for users
                               # with multiple projects)
    #PBS -o testjob.out        # Store the standard output and standard error of the
                               # job in file testjob.out
    #PBS -m e                  # Send mail when job terminates (optional)
    #PBS -N testjob            # Specify job name (optional)
    
    
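    # This shell script runs a command or set of commands for you on each
    # machine in your job. $SCR (the job scratch directory) must already be
    # set in this script's environment; it is expanded here, before ssh runs.
    
    for host in $(sort -u "$PBS_NODEFILE")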
    do
             ( ssh -a -q -x $host "$HOME/bin/a.out.sh $SCR" ) &
                                 # ^^^^^^^^^^^^^^^^^^^^^^ your commands in quotes
    done
    wait   # waits for all the outstanding ssh subshells above to complete
    
    

    The a.out.sh script could resemble the one below if you want to run several copies of the same serial program on each machine to use the available CPUs.

    #!/bin/sh
    
    N=7    # run this many copies per host
    SCR=$1
    PROGRAM="${HOME}/a.out.serial"
    
    # change to job scratch directory $SCR; stop if it is not usable
    cd "$SCR" || exit 1
    
    # make a directory for this machine/node and move into it
    HOST=`hostname`
    mkdir -p "$HOST"
    cd "$HOST" || exit 1
    for ITERATION in `seq 1 $N`
    do
      # open a sub shell and setup the serial run there, backgrounding the subshell
      (
        mkdir $ITERATION
        cd $ITERATION
        # copy any needed input files to here, untar a bundle here ...
        cp $HOME/input.dat .
        $PROGRAM > output
      ) &
      # ^ important, do not omit the ampersand
    done
    
    # wait until each of subshells from the for loop above completes
    wait
    
    

Why does my program work with MVAPICH2, but not OpenMPI?

    You may get errors like:
    [abe0872:6882] *** An error occurred in MPI_Sendrecv
    [abe0872:6882] *** on communicator MPI_COMM_WORLD
    [abe0872:6882] *** MPI_ERR_RANK: invalid rank
    [abe0872:6882] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
    
    These errors appear with OpenMPI, but not with MVAPICH2, when the code explicitly passes rank -1 in send/recv calls. In MVAPICH2, MPI_PROC_NULL is defined as -1, but in OpenMPI it is defined as -2, hence the difference in runtime behavior. The code should use MPI_PROC_NULL (instead of a literal -1) when that is the intention.