In order to properly intercept termination signals, pbspvm must be exec'ed, replacing the shell which invokes it. pbspvm checks for this, and will complain if it is not in the proper location within the PBS process hierarchy.
When PBS initiates a job, the list of allocated hosts is made available in a file which is referenced via the PBS_NODEFILE environment variable. pbspvm uses this file to construct a PVM hostfile and starts a single PVM daemon (pvmd) on each unique node. The master PVM daemon resides on the first host listed in PBS_NODEFILE. PVM lacks a mechanism for indicating the number of processors available on a node, so if PBS allocates multiple virtual processors on a host, it is up to the PVM application to determine which nodes were allocated and how many processes should be started on each of them. The pvm_config() function can be used to determine which hosts have been assigned to a job, but not the number of CPUs allocated.
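Since pvm_config() does not report CPU counts, one possible way to recover them is to count how often each host appears in the PBS_NODEFILE. This is a minimal sketch, not part of pbspvm. It assumes that PBS lists a host once for every virtual processor allocated on it. It is written in POSIX sh rather than csh, and count_slots is a hypothetical helper name.

```shell
# count_slots FILE
# Print "<host> <count>" for each unique host in a PBS-style node file,
# in order of first appearance, so the first line is the master host.
# Assumption: a host is listed once per allocated virtual processor.
count_slots() {
    awk 'NF { if (!($1 in n)) order[++k] = $1; n[$1]++ }
         END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }' "$1"
}

# Only report when running inside a PBS job with a readable node file.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    count_slots "$PBS_NODEFILE"
fi
```

A PVM application or wrapper script could read this output to decide how many tasks to spawn on each host.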
pbspvm supports three different ways of running PVM programs. In the case of a typical batch job, program is a PVM executable which uses pvm_config() to determine which nodes are available and then calls pvm_spawn() to start tasks on those nodes. Alternatively, program can be a shell script which invokes a series of PVM programs, one after another, possibly interspersed with shell commands or other operations. Finally, PVM jobs can be run interactively by using "qsub -I" to start an interactive PBS session and then setting program to be the PVM console, pvm. In this case, the PVM console automatically detects the presence of the PVM daemons and configures its virtual machine accordingly. PVM tasks may then be spawned interactively via the console interface.
Example 1: Run a single PVM program as a batch job:
qsub -l nodes=8:compute -l walltime=15:00
#!/usr/bin/csh
cd ~/mydir
exec pbspvm -v -L ./mylog -P "~/bin:~/pvm3/bin/$PVM_ARCH" myprog arg1 arg2 arg3
^D
Example 2: Execute a script that runs two PVM programs in succession and then copies the resulting output files from the master node's local scratch directory to the user's home directory:
qsub -l nodes=8:c2:ppn=2 -l walltime=1800
#!/usr/local/bin/tcsh
cd ~/mydir
exec pbspvm -X 0.05 -W ~/lscr/output -P ~/bin ./myscript >&myprog.out
^D
where myscript might look like:
#!/usr/local/bin/tcsh
~/bin/myprog1 arg1
~/bin/myprog2 arg2
if ( ! -d ~/outdir ) mkdir ~/outdir
cp ~/lscr/output/* ~/outdir
exit 0
Example 3: Start up an interactive PVM console:
qsub -I -l nodes=10:typhoon -l walltime=30:00
cd ~/mydir
exec pbspvm -v -P "~/bin:~/pvm3/bin/$PVM_ARCH:$PVM_ROOT/lib/$PVM_ARCH" pvm
To perform its startup and cleanup functions, pbspvm invokes rsh multiple times for every node allocated to the job. If several very short jobs with large numbers of nodes execute in rapid succession on the same master node, the pool of reserved network ports available to rsh may be exhausted. This produces error messages such as "socket: All ports in use", and the job may fail. The networking protocol frees these ports for reuse in about two minutes, so a simple workaround is to sleep for that long before executing the next pbspvm command:
qsub -l nodes=64:c1+32:c2:ppn=2 -l walltime=600
#!/usr/bin/csh
cd ~/mydir
sleep 120
exec pbspvm -v -P ~/bin/pvm bigjob
^D
Circumstances may arise in which pbspvm (or any other PBS job, for that matter) might not be able to find and kill all of the processes belonging to it. If a pristine execution environment is essential, additional checks (beyond -C or -X) may be needed to ensure that no stray processes reside on a node.
Only one PVM job per user may be active on a node. pbspvm checks for this, and will abort the job if an existing PVM daemon belonging to the same user is found on any of the assigned nodes. To avoid this problem, users should be careful to submit only one PVM job at a time, or else to specify PBS node properties in such a way that multiple jobs will not share nodes (e.g., by avoiding the "#shared" attribute and by using "ppn=" to allocate all of the virtual processors on a node). The "-W depend=" option of qsub may also be helpful in enforcing dependencies between jobs.
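The "-W depend=" approach can be sketched as a small wrapper that submits job scripts one after another, so that no two of them run at the same time. This is an illustration only. It assumes that qsub prints the new job identifier on standard output and that the "afterany" dependency type is available at the site. It is written in POSIX sh, and submit_chain and the script names are hypothetical.

```shell
# submit_chain SCRIPT...
# Submit each job script so that it starts only after the previously
# submitted job has ended, whatever its exit status (afterany).
# Assumption: qsub writes the new job identifier to standard output.
submit_chain() {
    prev=""
    for script in "$@"; do
        if [ -z "$prev" ]; then
            # First job: no dependency.
            prev=$(qsub "$script") || return 1
        else
            # Later jobs: wait for the previous job to finish.
            prev=$(qsub -W depend=afterany:"$prev" "$script") || return 1
        fi
        echo "submitted $script as $prev"
    done
}
```

For example, "submit_chain job1.csh job2.csh" would queue job2.csh so that it runs only after job1.csh has finished.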