RUSH RENDER QUEUE
(C) Copyright 1995,2000 Greg Ercolano. All rights reserved.
V 101.84 09/06/00
Strikeout text indicates features not yet implemented


Command Reference


 Submit Command Reference

Submit Commands

AutoDump      Dump job on completion
Criteria      Criteria for matching hosts
Command       Render script to execute
Cpus          Hosts (or hostgroups) to use for rendering
DoneCommand   Command to run when job done
DoneMail      Send mail when job done
Frames        Frame ranges to render
LogDir        Directory for log files
LogFlags      Controls logfile behavior
NeverCpus     Cpus to never use for rendering
Notes         Job notes
Priority      Default priority
Ram           Ram job expects to use (max)
State         Initial state for job
Title         Title for job
WaitFor       Wait for other jobs to complete

AutoDump
(rush -autodump)

Command

Cpus
(rush -ac/-rc)

Criteria
(rush -criteria)


[erco@howland] % rush -lac
IP               Hostname   Ram  Cpus Pri Criteria
192.168.10.3     rotwang    100  2    0   +any,linux,linux6.0,intel,+dante
192.168.10.2     how        256  2    0   +any,sgi,irix,irix6.2
192.168.10.1     nt         256  1    0   +any,winnt,+dante
 
criteria ( linux | ( irix6 & octane ) )    # Use linux machines OR irix6 octanes.
criteria ( linux | irix6.2 )               # Only linux machines OR irix6.2 machines.
criteria ( linux & !alpha )                # Use only linux machines that are NOT dec-alphas.
criteria ( linux & alpha & carrera )       # Use only linux dec-alphas built by Carrera.
criteria ( +any )                          # Use all available machines.
criteria ( !intel )                        # Use all machines that are NOT intel based machines.
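
To make the boolean semantics of these expressions concrete, here is a minimal Python sketch that evaluates a criteria expression against the comma-separated Criteria column shown by 'rush -lac'. It is not rush's own matcher. The function name criteria_matches is hypothetical, names are assumed to match exactly (so 'irix6' does not match 'irix6.2'), and '&' is assumed to bind tighter than '|'.

```python
import re

# Operators and parentheses are single-character tokens; anything else is a name.
TOKEN_RE = re.compile(r"[()|&!]|[^\s()|&!]+")


def criteria_matches(expr, host_criteria):
    """Return True if the boolean expression `expr` holds for one host.

    `host_criteria` is the comma-separated Criteria column from 'rush -lac'.
    Assumption: a name is true only if it appears verbatim in that list.
    """
    tokens = TOKEN_RE.findall(expr)
    have = set(host_criteria.split(","))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        value = parse_unary()
        while peek() == "&":
            take()
            rhs = parse_unary()
            value = value and rhs
        return value

    def parse_unary():
        tok = take()
        if tok == "!":
            return not parse_unary()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok in (")", "|", "&"):
            raise ValueError("unexpected %r" % tok)
        return tok in have

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result
```

For example, criteria_matches("( linux & !alpha )", "+any,linux,linux6.0,intel,+dante") is true for the host rotwang listed above, while the same expression is false for the host nt.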

DoneCommand

DoneMail
(rush -donemail)

Frames
(rush -af/-rf)

LogDir

LogFlags

NeverCpus
(rush -an/-rn)

Notes
(rush -notes)

Priority
(rush -priority)

Ram
(rush -ram)

State
(rush -pause/-cont)

Title
(rush -title)

WaitFor


Rush Command Line


Rush Command Line Arguments
-ac -af -an -autodump -checkconf -checkhosts
-cont -criteria -deltaskfu -dexit -dexitnow -deltask_fu
-dlog -done -donemail -down -dump -end
-fail -fu -getoff -hold -jobnotes -lac
-laj -lajf -lc -lcf -lf -lff
-lfi -lj -ljf -notes -offline -online
-pause -ping -que -ram -rc -reorder
-rf -rn -rotate -status -submit -tasklist
-title -trs -tss -uping


Configuration File
$RUSH_DIR/etc/rush.conf




Hosts File
$RUSH_DIR/etc/hosts




Cpu Accounting File
$RUSH_DIR/var/cpu.acct


The location of the cpu accounting file is set with the CpuAcctPath command in the rush.conf file. Each time a frame finishes executing, a new entry is added to the cpu accounting file, logging the name of the job, how long the frame ran, etc.

Cpu Accounting File Example

u  948242700 53
p  948242783 tahoe-798    WERNER/C33 erco     0106  superior 100k  122  0   0
p  948242783 tahoe-798    WERNER/C33 erco     0107  superior 100k  122  0   0
p  948242865 tahoe-797    KILLER     erco     0504  superior 200   121  0   0
u  948246300 5
u  948249900 0

Process Entries


p  948242783 tahoe-798 WERNER/C33 erco  0106  superior  100k  122  0   0
p  948242783 tahoe-798 WERNER/C33 erco  0107  superior  100k  122  0   0
p  948242865 tahoe-797 KILLER     erco  0504  superior  200   121  0   0
-  --------- --------- ---------- ----  ----  --------  ----  ---  -   -
|  |         |         |          |     |     |         |     |    |   |
|  |         |         |          |     |     |         |     |    |   #Secs User Time
|  |         |         |          |     |     |         |     |    #Secs System Time
|  |         |         |          |     |     |         |     #Secs Wall Clock Time
|  |         |         |          |     |     |         Priority
|  |         |         |          |     |     Host that ran the process
|  |         |         |          |     Frame that ran
|  |         |         |          User
|  |         |         Title of job
|  |         Jobid
|  time(2) process started
'p' indicates 'process entry'

Utilization Entries


u  948242700 53
u  948246300 5
-  --------- --
|  |         |
|  |         Percent of time processor(s) were busy rendering. (0-100)
|  time(2) utilization recorded
'u' indicates 'utilization entry'
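
As an illustration of the two entry layouts above, the following Python sketch parses lines of the cpu accounting file and totals wall clock seconds per job. The names ProcessEntry, UtilizationEntry, parse_acct_line and wall_secs_by_job are hypothetical. It assumes fields are whitespace-separated, that job titles contain no spaces, and that the time fields are whole seconds, as in the example.

```python
from typing import NamedTuple


class ProcessEntry(NamedTuple):
    started: int      # time(2) value when the process started
    jobid: str
    title: str
    user: str
    frame: str        # kept as text to preserve padding, e.g. "0106"
    host: str
    priority: str     # e.g. "100k"
    wall_secs: int
    sys_secs: int
    user_secs: int


class UtilizationEntry(NamedTuple):
    recorded: int      # time(2) value when utilization was recorded
    percent_busy: int  # 0-100


def parse_acct_line(line):
    """Parse one accounting line; return None for a blank line."""
    fields = line.split()
    if not fields:
        return None
    if fields[0] == "p" and len(fields) == 11:
        times = [int(f) for f in fields[8:11]]
        return ProcessEntry(int(fields[1]), *fields[2:8], *times)
    if fields[0] == "u" and len(fields) == 3:
        return UtilizationEntry(int(fields[1]), int(fields[2]))
    raise ValueError("unrecognized accounting line: %r" % line)


def wall_secs_by_job(lines):
    """Sum wall clock seconds of all process entries, keyed by jobid."""
    totals = {}
    for line in lines:
        entry = parse_acct_line(line)
        if isinstance(entry, ProcessEntry):
            totals[entry.jobid] = totals.get(entry.jobid, 0) + entry.wall_secs
    return totals
```

A typical use would be wall_secs_by_job(open(path)) with path pointing at the accounting file. The billing caveats that apply to these values still hold for any totals derived this way.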
Note: the process execution times are problematic for billing purposes. Wall clock time includes time the process may have spent waiting on network load. User and System times cover only the Render Script itself, not its sub-processes (eg. the renderer).

To properly bill for cpu time, you would need to either enable unix process accounting to obtain the accumulated cpu time for all sub-processes in the user's render script, or create wrapper scripts that use programs like timex(1) to monitor the execution time of the critical render/compositor processes.

Tools like timex(1) indicate in their documentation that they need unix process accounting enabled to show sub-process totals. Enabling it is usually prohibitive on production machines, due to the resources the unix process accounting system uses.
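
One possible shape for such a wrapper, sketched in Python rather than with timex: it runs a command and reports the user and system cpu time accumulated by that command and by any of its descendants that it waited for, using the standard getrusage(RUSAGE_CHILDREN) counters. The function name run_timed is hypothetical, and this is an untested idea for a rush installation, not part of rush.

```python
import resource
import subprocess
import sys
import time


def run_timed(argv):
    """Run argv and return its exit code plus wall, user and system seconds.

    RUSAGE_CHILDREN covers terminated children that have been waited for,
    including their own waited-for descendants, so a renderer started by
    the wrapped command is counted as long as the command waits for it.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    wall_start = time.monotonic()
    result = subprocess.run(argv)
    wall_secs = time.monotonic() - wall_start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "exit_code": result.returncode,
        "wall_secs": wall_secs,
        "user_secs": after.ru_utime - before.ru_utime,
        "sys_secs": after.ru_stime - before.ru_stime,
    }


# Example: time a trivial child process started with the current interpreter.
# stats = run_timed([sys.executable, "-c", "pass"])
```

Processes that detach from the wrapped command, or that it never waits for, are not included in these totals.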


Administration


    Unix Installation Instructions

1) Choose a local directory to install rush.

    /usr/local/rush is recommended. To install the system on a large network, first install the software on one machine, get everything working, then rdist(1) the directory tree to all the machines on your network.

    WARNING: As with all daemons, do *not* install rush binaries in NFS mounted directories; NFS hiccups will cause the executing daemons to hang, since the binaries will be demand paging over NFS. Keep rush binaries local on each machine.

2) The RUSH_DIR environment variable should be set.

    Set it to the directory where rush is installed (eg. /usr/local/rush). This setting should be in all environments that run the rush binaries, including boot scripts that start the rushd(8) daemon and user environments.

3) Configure the $RUSH_DIR/etc/rush.conf file.

    For most situations the defaults suffice.

      Be sure to register your settings for serverport in /etc/services, or equivalent. See serverport for an example entry.

    If security is an issue at your site, be sure to check ALL settings, esp. UidRange and GidRange. Also, correctly configure AdminUser and WebUser for your environment. Read about them before accepting the defaults.

    If you want to make changes, see Configuration File for more info.


4) Configure the $RUSH_DIR/etc/hosts file

    It should contain the names of all hosts that participate in rendering. See Hosts File for more information.

5) Configure the $RUSH_DIR/etc/templates file.

    Customize the template render/submit scripts for your local environment. TDs use these templates to create their submit scripts and render scripts via 'rush -tss/-trs', and they will want to inherit settings for typical situations.

6) Configure the $RUSH_DIR/etc/.submit and $RUSH_DIR/etc/.render files.

    These files are sourced by the default Submit Script and Render Scripts respectively.

7) Configure daemon to start on boot.

8) Configure regular log rotations.

    This usually just involves invoking rush -rotate via cron(8) on a nightly basis.

9) Security issues.

    To avoid root loopholes, be sure all subdirs in the path to the setuid binaries and config files have tight permissions, eg. if rush is installed in /usr/local/rush/bin:
        chmod go-w /usr \
                   /usr/local \
                   /usr/local/rush \
                   /usr/local/rush/bin \
                   /usr/local/rush/bin/* \
                   /usr/local/rush/etc

        chmod 4755 /usr/local/rush/bin/rush \
                   /usr/local/rush/bin/rushd

        chown 0.0 /usr/local/rush/bin/rush \
                  /usr/local/rush/bin/rushd
         

Network Install

Basically, you want to rdist(1) the /usr/local/rush directory to all the machines, start the daemons, and verify they're running. It's recommended you start the rush daemon after the boot scripts have enabled networking, but BEFORE enabling nfs and rpc services.


    # LINUX
    foreach i ( linux1 linux2 linux3 linux4 )
        echo -n Working on ${i}: dist..
        # Copy the rush tree and install the boot script
        rdist -c /usr/local/rush             ${i}:/usr/local/rush
        rdist -c /usr/local/rush/etc/S99rush ${i}:/etc/rc.d/init.d/rush
        # Start the daemon in runlevels 3 and 5
        echo -n rc3..
        rsh $i ln -s /etc/rc.d/init.d/rush /etc/rc.d/rc3.d/S29rush
        echo -n rc5..
        rsh $i ln -s /etc/rc.d/init.d/rush /etc/rc.d/rc5.d/S29rush
        echo -n daemon..
        rsh $i /etc/rc.d/init.d/rush start
    end

    # IRIX
    foreach i ( octane1 octane2 octane3 octane4 )
        echo -n Working on ${i}: dist..
        rdist -c /usr/local/rush             ${i}:/usr/local/rush
        rdist -c /usr/local/rush/etc/S99rush ${i}:/etc/init.d/rush
        echo -n rc..
        rsh $i ln -s /etc/init.d/rush /etc/rc2.d/S35rush
        echo -n daemon..
        rsh $i /etc/init.d/rush start
    end
    

Now verify all the daemons have started.

    rush -ping +any           # pings all daemons in rush/etc/hosts
    

    NT Installation Instructions

TBD


FAQ - Frequently Asked Questions




TD Questions

How can I use padded frame numbers (0000) in my render script?
Use $RUSH_PADFRAME, it is created for you automatically.

If you want to do your own frame number padding, you can use this unix technique:

set padframe = `perl -e 'printf("%04d",$ENV{RUSH_FRAME});'`
To use different padding widths, just change the '4' (in '%04d') to a different number.
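
For render scripts written in Python instead of csh, the same padding can be sketched as below. The helper names pad_frame and padded_rush_frame are hypothetical; only the RUSH_FRAME environment variable comes from rush.

```python
import os


def pad_frame(frame, width=4):
    """Zero-pad a frame number, like perl's printf("%04d") for width 4."""
    return "%0*d" % (width, int(frame))


def padded_rush_frame(width=4):
    """Pad the frame number rush passes in the RUSH_FRAME variable."""
    return pad_frame(os.environ["RUSH_FRAME"], width)
```

Here pad_frame("7") gives "0007", and pad_frame("106", 6) gives "000106".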

My renders are coming up 'FAIL'. How do I figure out what's wrong?
Check the frame logs being generated by your render script.

Frame logs contain the error messages from each rendered frame which should help you determine the problem. Make sure your submit script has logdir pointing to a valid directory, which is where your frame logs can be found.

Also, make sure your render script is returning the proper exit code. The most common problem is a render script that does not properly handle returning exit codes. Your render script must 'exit 0' for a frame to show up 'DONE' in the frame list. Make sure your script is properly checking the error returns from your renderer, and translating them into the codes rush expects. See Render Scripts for more.
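
A minimal Python sketch of the exit code handling described above: it runs a renderer command and reduces whatever the renderer returns to an exit code for the render script. Only 'exit 0' meaning DONE comes from the text above. The function name run_renderer and the non-zero value 1 are placeholders assumed for this sketch, so check Render Scripts for the codes rush actually expects.

```python
import subprocess
import sys

EXIT_DONE = 0      # per the text above: the frame shows up 'DONE'
EXIT_NOT_DONE = 1  # placeholder non-zero code, assumed for this sketch


def run_renderer(argv):
    """Run the renderer and return the exit code the render script should use."""
    try:
        result = subprocess.run(argv)
    except OSError as err:
        # The renderer binary is missing or not executable.
        print("cannot start renderer: %s" % err, file=sys.stderr)
        return EXIT_NOT_DONE
    if result.returncode == 0:
        return EXIT_DONE
    # Non-zero exit, or a negative value if the renderer died on a signal.
    print("renderer failed with code %d" % result.returncode, file=sys.stderr)
    return EXIT_NOT_DONE


# In a render script (the command name here is hypothetical):
# sys.exit(run_renderer(["my_renderer", "scene.file"]))
```

The messages go to stderr so that they end up in the frame log alongside the renderer's own output, assuming the frame log captures stderr.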


How do I have rush automatically retry frames? How do I set the number of retries?
See Retrying Frames.

My job isn't starting renders on my cpus. What's going on?
Use 'rush -lc' and check the NOTES column for messages.

If you know the remote cpus aren't just busy with other jobs, the NOTES column shows the reasons the system is giving for rejecting your cpus.

The job might be in Pause, there might be no more frames to render, the available machines might not have as much ram as your job needs, etc. Here are some typical situations:

[erco@howland]% rush -lc
CPUSPEC            STATE  FRM  PID   ELAPSED  NOTES
placid=3@100k      Idle   -    -     00:04:37 Job state is 'Pause'
tahoe=1@1          Idle   -    -     00:02:08 No more frames
superior=1@1       Idle   -    -     00:02:08 Not enough ram
waccubuc=1@1       Idle   -    -     00:02:08 This is a 'neverhost'
ontario=1@1        Idle   -    -     00:02:08 Failed 'criteria' check
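
If a job has many cpus, it can help to group the idle ones by the reason shown. The Python sketch below does this for text in the layout of the listing above. The function name idle_reasons is hypothetical, and the column layout is assumed from this example only.

```python
from collections import defaultdict


def idle_reasons(lc_output):
    """Group Idle cpus from 'rush -lc' style output by their NOTES text."""
    reasons = defaultdict(list)
    for line in lc_output.splitlines():
        # Columns: CPUSPEC STATE FRM PID ELAPSED NOTES (NOTES may hold spaces)
        fields = line.split(None, 5)
        if len(fields) < 6 or fields[0] == "CPUSPEC":
            continue  # header, shell prompt, or a line without notes
        cpuspec, state, notes = fields[0], fields[1], fields[5].strip()
        if state == "Idle":
            reasons[notes].append(cpuspec)
    return dict(reasons)
```

Fed the listing above, this maps "Not enough ram" to ['superior=1@1'], "No more frames" to ['tahoe=1@1'], and so on.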

How do I set up my submit script to only render on certain platforms or operating systems?
Use the Criteria submit script command.

This command allows you to build a list of platforms, operating systems, or other general criteria to limit which machines will run your renders.

You can see the different criteria names in the output of 'rush -lac'. It is up to your sysadmin to maintain the criteria names.


How can I render several frames in one process using rush?
With clever scripting. See Batching Multiple Frames for how to render several frames at a time.

Sometimes it pays to render several frames at a time rather than one at a time, to decrease the amount of time the renderer spends loading files.

If you have existing script filters which monitor the progress of renders to determine which frames are rendering, you can probably easily modify these scripts to work with rush to reflect changes in the frame list, using either frame notes (rush -notes) or frame state change operations (rush -que/rush -done). 


My job has its 'k' flag set; why isn't it bumping off other jobs' frames?
For a job to bump another off a cpu, these things must be true:
  • A job can only bump other jobs of lower priority (ie. not the same priority). 
  • A job can't be bumped if its almighty flag ('a') is set. 
  • A job can't be bumped unless its entry in the -tasklist is either in the Avail or Run state.
When a frame is bumped, the bumped frame will show a message in its frame list indicating the job that bumped it, e.g.:
% rush -lf erie-790
STAT FRAME TRY HOSTNAME PID   ELAPSED  NOTES
Run  0100  0   tahoe    10290 00:00:26 
Run  0101  0   tahoe    10291 00:00:26 
Que  0102  1   tahoe    10292 00:00:09 Bumped by ralph's superior-791,KILLER @300ka
Que  0103  0   -        0     00:00:00 
[..]
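
The bump rules listed above can be written as a single predicate. The Python sketch below is only a reading of those rules, not rush's scheduler. The names parse_priority and can_bump are hypothetical, and priority strings are assumed to be a number followed by optional flag letters, as in '300ka' or '100k'.

```python
import re


def parse_priority(text):
    """Split a priority such as '300ka' into (300, {'k', 'a'})."""
    m = re.fullmatch(r"(\d+)([a-z]*)", text)
    if m is None:
        raise ValueError("unrecognized priority: %r" % text)
    return int(m.group(1)), set(m.group(2))


def can_bump(bumper_priority, victim_priority, victim_task_state):
    """Return True if the rules above allow the bumper to bump the victim."""
    bumper_value, bumper_flags = parse_priority(bumper_priority)
    victim_value, victim_flags = parse_priority(victim_priority)
    return (
        "k" in bumper_flags                          # bumper has its 'k' flag set
        and victim_value < bumper_value              # strictly lower priority
        and "a" not in victim_flags                  # victim is not almighty
        and victim_task_state in ("Avail", "Run")    # -tasklist state
    )
```

Under this reading, can_bump("300ka", "100k", "Run") is true, while can_bump("100k", "100k", "Run") is false because the priorities are equal.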

Is there an easier way to set the RUSH_JOBID environment variable?
You can use eval `submit` to set it automatically, or a simple alias to set it manually. However, cutting and pasting the setenv command is not so hard.

Some people like to use this alias to make it easy to set new jobid variables:

      # Put this in your .cshrc
      alias jid 'setenv RUSH_JOBID "\!*"'
Then you can use it on the command line to set one or more jobids:
      erco@tahoe % jid tahoe-932 tahoe-933
If you want to have the RUSH_JOBID variable set automatically in your shell whenever you invoke your submit script, then use 'eval':
      erco@tahoe % eval `my_submit_script`
..the shell automatically parses the 'setenv RUSH_JOBID' command rush prints on stdout when a job is successfully submitted. Error messages are not affected by 'eval', so you don't have to worry about losing error messages when using this technique.

How can my render script detect it's being 'bumped' by a higher priority job?
    Not directly; it takes some clever scripting.

    Usually the desire to do this stems from wanting to clean up left over temporary files generated by renders. In most cases, you can avoid left over files by putting temporary files in $RUSH_TMPDIR, which rush cleans automatically, even after bumps.

    Bumps and dumps use SIGKILL to kill the render script and its children. This signal is NOT trappable. There's a reason:

      Under many circumstances SIGTERM, the 'trappable' kill, is not effective, especially during heavy rendering. Bumped frames then fail to bump, which screws up unattended use and leaves processors unproductive.

      Since bumps can happen just as readily as dumps, both use SIGKILL, untrappable, and always effective (except in pathological cases where the process is hung).

      So do not expect to be able to trap interrupts to detect bumps/dumps.

    If you need a way to determine if you are re-rendering a frame that was previously killed mid-execution (ie. bumped by a higher priority job), you can put some logic into your render script:

        #!/bin/csh -f
        ..
        if ( -e /somewhere/$RUSH_FRAME.busy ) then
            echo We are picking up a frame that was killed.
            echo Do pickup stuff here..
        endif

        # Create a 'busy' file for this frame
        #    If we are bumped, busy file is left behind 
        #    so that the above logic can detect it.
        #
        touch /somewhere/$RUSH_FRAME.busy
        echo Do rendering here..
        rm -f /somewhere/$RUSH_FRAME.busy
        



Systems Administrator Questions

What's the best way to verify all the daemons are running?

    Use:

      rush -ping +any

    This 'pings' all the daemons in the rush hosts file with a TCP message.

    If the daemon isn't running, tail(1) the daemon's log file in $RUSH_DIR/var/rushd.log.


Is there an example boot script I can use to invoke rush?

    Yes; see $RUSH_DIR/etc/S99rush.

Is there a way to partition a network into separate render queue 'domains'?
    Yes; configure the serverport value differently in the $RUSH_DIR/etc/rush.conf file, and maintain separate $RUSH_DIR/etc/hosts files. 

    For instance, if you have a network of four hosts, A, B, C, and D, and don't want A/B's render queue to communicate with the C/D machines, then configure the rush.conf file on the A/B machines: 
     

      serverport 696


    ..and on the C/D machines: 
     

      serverport 10002


    By doing this, the A/B host's render queue will not communicate with the C/D render queue, and vice versa. Both the daemons and the rush(1) client will refer to these values automatically. 

    You can use any port numbers, provided they don't conflict with existing networking protocols. Be sure to reserve both sets of numbers you use in your /etc/services file, for documentation purposes.
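
    Before settling on a port number, you can check whether it is already listed in a services file. The Python sketch below scans text in the usual 'name port/protocol' layout of /etc/services. The function name services_using_port is hypothetical, and a port that is absent from the file may still be in use by software that never registered it.

```python
def services_using_port(services_text, port):
    """Return 'name/protocol' for every services entry that uses `port`."""
    matches = []
    for line in services_text.splitlines():
        line = line.split("#", 1)[0]       # drop trailing comments
        fields = line.split()
        if len(fields) < 2:
            continue                       # blank or malformed line
        number, _, proto = fields[1].partition("/")
        if number.isdigit() and int(number) == port:
            matches.append("%s/%s" % (fields[0], proto))
    return matches


# Example use on a real system:
# with open("/etc/services") as f:
#     print(services_using_port(f.read(), 10002))
```

    An empty list means the file has no entry for that port.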


How do I update changes to the rush hosts file (or rush.conf file) to the network?