RUSH RENDER QUEUE
(C) Copyright 1995,2000 Greg Ercolano. All rights reserved.
V 101.84 09/06/00

Strikeout text indicates features not yet implemented

Command Reference

Submit Command Reference

Submit Commands

AutoDump      Dump job on completion
Criteria      Criteria for matching hosts
Command       Render script to execute
Cpus          Hosts (or hostgroups) to use for rendering
DoneCommand   Command to run when job done
DoneMail      Send mail when job done
Frames        Frame ranges to render
LogDir        Directory for log files
LogFlags      Controls logfile behavior
NeverCpus     Cpus to never use for rendering
Notes         Job notes
Priority      Default priority
Ram           Ram job expects to use (max)
State         Initial state for job
Title         Title for job
WaitFor       Wait for other jobs to complete

AutoDump (rush -autodump)

autodump off         # Don't autodump; job remains when frames are done.
autodump done        # Job dumps itself when all frames are DONE
autodump donefail    # Job dumps itself if all frames are DONE or FAIL

In the case of 'autodump done', the job will not dump if there were any FAIL frames. If you want it to dump anyway, then use 'autodump donefail'.
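The three settings can be summarized as a small decision function. The Python sketch below is illustrative only and is not part of rush; the function name `should_autodump` and the use of plain strings for frame states are assumptions made for the example, as is the choice to never dump a job with an empty frame list.

```python
def should_autodump(mode, frame_states):
    """Return True if a job with these frame states would dump itself.

    mode is one of 'off', 'done' or 'donefail' (the autodump settings).
    frame_states is a list of state names such as 'Done', 'Fail', 'Hold'.
    """
    states = [s.lower() for s in frame_states]
    if mode == "off" or not states:
        # Assumption: an empty frame list never triggers a dump.
        return False
    if mode == "done":
        return all(s == "done" for s in states)
    if mode == "donefail":
        return all(s in ("done", "fail") for s in states)
    raise ValueError("unknown autodump mode: %r" % mode)


print(should_autodump("done", ["Done", "Done", "Fail"]))
print(should_autodump("donefail", ["Done", "Done", "Fail"]))
```

With one FAIL frame in the list, the first call reports that the job stays and the second reports that it dumps, matching the description above.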

Command

Usually, this is an absolute NFS path to the Render Script.

It can, however, be an absolute path to any executable or script, provided it returns rush exit codes (0,1,2), and knows how to access RUSH_FRAME to determine which frame it's working on.

    command /job/MARINER/DIVE/rush/render-script 640 480

WinNT Note - Use UNC paths for the absolute path to the render script. This prevents problems with inconsistently mapped drive letters. A UNC example:

command //server-01/large6/rush/bin/myrender.bat 640 480

Cpus (rush -ac/-rc)

When specifying a cpu, you are telling rush at least three things:

The name of the host (or hosts when host groups are specified)
The number of cpus to use on that host (or hosts)
The job's priority to use when running on that host

The number of cpus defaults to 1 if unspecified.

If unspecified, the priority value defaults to the Priority value for the job.

Priority is a value between 1 and 999, with 999 being highest priority, 1 being lowest. Priority values can be followed by optional flags 'k' and/or 'a'. See Priority Description for a full description of how the priority mechanism works.

cpus pabst               # 1 cpu on pabst, default priority
cpus pabst=4             # 4 cpus on pabst, default priority
cpus pabst=4@900         # 4 cpus at 900 priority
cpus pabst=4@900,2@500   # 4 cpus at 900 priority, 2 cpus at 500
cpus +any=10@1           # use up to 10 cpus on 'any available machine'
cpus +farm=50@1          # use up to 50 machines on the 'farm' host group
cpus pabst=4/2           # 4 cpus, 2 cpus per frame (threading)
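To make the shape of a cpu specification concrete, here is a hedged Python sketch that splits the forms shown above into their parts. It is not rush's own parser; `parse_cpus`, the dictionary keys and the regular expression are all hypothetical names invented for illustration, and the 'k' and 'a' flags are only carried along as text since their meaning is covered in the Priority Description.

```python
import re

# [cpus[/cpus_per_frame]][@priority[flags]]  -- every part is optional
SPEC = re.compile(r"^(?:(\d+)(?:/(\d+))?)?(?:@(\d+)([ka]*))?$")


def parse_cpus(arg, job_priority=1):
    """Split one 'cpus' argument into a list of per-entry dictionaries."""
    host, sep, rest = arg.partition("=")
    if not sep:
        # Forms without '=', such as 'pabst' or 'tahoe@300'
        host, at, pri = arg.partition("@")
        rest = at + pri
    entries = []
    for part in rest.split(","):
        m = SPEC.match(part)
        if m is None:
            raise ValueError("bad cpu specification: %r" % arg)
        cpus, per_frame, pri, flags = m.groups()
        entries.append({
            "host": host,
            "cpus": int(cpus) if cpus else 1,            # defaults to 1
            "cpus_per_frame": int(per_frame) if per_frame else None,
            "priority": int(pri) if pri else job_priority,  # job's Priority
            "flags": flags or "",
        })
    return entries


for spec in ("pabst", "pabst=4@900,2@500", "+any=10@1", "pabst=4/2"):
    print(spec, "->", parse_cpus(spec, job_priority=100))
```

The `job_priority` argument stands in for the job's Priority value, which is what an entry falls back to when no '@' part is given.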

Host Group

Host Groups are configured by your sysadmin in the Hosts file.

Criteria (rush -criteria)

&

|

rush -lac


[erco@howland] % rush -lac
IP               Hostname   Ram  Cpus Pri Criteria
192.168.10.3     rotwang    100  2    0   +any,linux,linux6.0,intel,+dante
192.168.10.2     how        256  2    0   +any,sgi,irix,irix6.2
192.168.10.1     nt         256  1    0   +any,winnt,+dante

When you specify hosts to render, any Criteria you specify will limit which machines your renders will run on; if the criteria you specify don't match a particular host, even if the host is specifically requested by a Cpus command, frames will be turned away from rendering on that machine.

For instance, if your job depends on using only linux machines or sgis running IRIX 6.2, you might submit your job with a criteria line that reads:

criteria ( linux | irix6.2 )

The above presumes your sysadmin uses 'linux' and 'irix6.2' as qualifiers in the host list. If you need new criteria strings configured, ask your sysadmin to add them to the rush system's hosts file.

Only one Criteria command should appear in a submit script; multiple instances of the command are not cumulative.

Here are some more examples:

criteria ( linux | ( irix6 & octane ) )   # Use linux machines OR irix6 octanes.
criteria ( linux | irix6.2 )              # Only linux machines OR irix6.2 machines.
criteria ( linux & !alpha )               # Use only linux machines that are NOT dec-alphas.
criteria ( linux & alpha & carrera )      # Use only linux dec-alphas built by Carrera.
criteria ( +any )                         # Use all available machines
criteria ( !intel )                       # Use all machines that are NOT intel based machines.

Caveat: There is currently no default precedence for logical operators; they are simply evaluated from left to right. Be sure to use parens to make the intended precedence explicit, as shown above.
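The left-to-right rule is easy to get wrong, so here is a small Python sketch of how such an expression could be evaluated against one host's criteria list. This is an illustration written for this page, not rush's actual evaluator; `tokenize` and `matches` are hypothetical names, and malformed expressions are not handled.

```python
import re


def tokenize(expr):
    """Split a criteria expression into parens, operators and words."""
    return re.findall(r"[()&|!]|[^\s()&|!]+", expr)


def matches(expr, host_criteria):
    """Evaluate expr strictly left to right; parens group, '!' negates."""
    tokens = tokenize(expr)
    pos = 0

    def operand():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "!":
            return not operand()
        if tok == "(":
            val = expression()
            pos += 1                      # skip the closing ')'
            return val
        return tok in host_criteria       # plain word: is it in the list?

    def expression():
        nonlocal pos
        val = operand()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            rhs = operand()
            val = (val and rhs) if op == "&" else (val or rhs)
        return val

    return expression()


# Criteria lists taken from the 'rush -lac' report shown earlier
rotwang = {"+any", "linux", "linux6.0", "intel", "+dante"}
how = {"+any", "sgi", "irix", "irix6.2"}

print(matches("( linux | irix6.2 )", rotwang), matches("( linux | irix6.2 )", how))
print(matches("linux | irix & sgi", rotwang))
print(matches("linux | ( irix & sgi )", rotwang))
```

The last two lines show why parens matter: without them, 'linux | irix & sgi' is read as '( linux | irix ) & sgi', which rejects rotwang, while the parenthesized form accepts it.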

DoneCommand

LogDir

For the DoneCommand to be executed, the job must dump. For automatic invocation, the AutoDump command must be enabled so that the job dumps when all the frames are done. If AutoDump is disabled, the only way the DoneCommand will execute is if someone manually invokes 'rush -dump'.

DoneCommand scripts are passed the jobid in the RUSH_JOBID environment variable, so it's possible for the script to use rush(1) commands to query the job. Exit codes are currently ignored. The stdout and stderr output from the DoneCommand is written to a file called 'done.log' in the LogDir.

#!/bin/csh -f

# EXAMPLE 'DoneCommand' SCRIPT

set wwwreport = /somewhere/MYPROJECT/html/`logname`/jobreport.html

# CREATE A CUSTOMIZED WEBPAGE REPORT
set logdir = `dirname $RUSH_LOGFILE`
cat $logdir/framelist | \
    my_report_generator > $wwwreport

# MAIL THE REPORT TO SOMEONE
Mail -s "$RUSH_JOBID Html Report" `logname` < $wwwreport

The DoneCommand should avoid doing anything to the job that might make it continue running. Though possible, this would confuse someone manually trying to dump the job, only to find it requeuing itself.

donecommand -               # Disable done commands
donecommand $cwd/cleanup    # Set up script to run before job dumps itself

DoneMail (rush -donemail)

Arguments should all be valid email addresses. If more than one address needs to be specified, separate with commas. There should be no spaces in the list of addresses. Use '-' to disable sending completion mail (default). Some possible settings for DoneMail:

donemail  -                  # Mail disabled
donemail  erco@netcom.com    # Send mail to erco@netcom.com
donemail  fred,jack          # Send mail to fred and jack

LogDir

framelist

jobinfo

Frames (rush -af/-rf)

frames  1-10           # Frames 1 thru 10
frames  100-150,2      # Frames 100 thru 150 on twos
frames  500 507 615    # Frames 500, 507 and 615

Frame States. You can set the initial state for the frame on a per-frame basis. Possible frame state values are Done|Fail|Hold|Queue:

frames  1-5=Done       # Frames 1 thru 5 in DONE state
frames  6-10=Fail      # Frames 6 thru 10 in FAIL state
frames  11-15=Hold     # Frames 11 thru 15 in HOLD state
frames  16-20=Queue    # Frames 16 thru 20 in QUEUE state (default)

Notes In Frames. On a per-frame basis, frames can contain notes, which show up in the last column of frame lists. Notes can be specified in the Frames command:

frames  1-10:Black            # Notes for frames 1 thru 10 is "Black"
frames  11:Fade_up_on_sc17    # Frame 11 has note "Fade_up_on_sc17"

The above example creates a frame list that looks like:


    [erco@howland]% rush -lf
    STAT FRAME TRY HOSTNAME PID   START          ELAPSED  NOTES
    Que  0001  0   -        0     00/00,00:00:00 00:00:00 Black
    Que  0002  0   -        0     00/00,00:00:00 00:00:00 Black
    Que  0003  0   -        0     00/00,00:00:00 00:00:00 Black
    Que  0004  0   -        0     00/00,00:00:00 00:00:00 Black
    Que  0005  0   -        0     00/00,00:00:00 00:00:00 Black
    Que  0006  0   -        0     00/00,00:00:00 00:00:00 Black
    Que  0007  0   -        0     00/00,00:00:00 00:00:00 Black
    Que  0008  0   -        0     00/00,00:00:00 00:00:00 Black
    Que  0009  0   -        0     00/00,00:00:00 00:00:00 Black
    Que  0010  0   -        0     00/00,00:00:00 00:00:00 Black
    Que  0011  0   -        0     00/00,00:00:00 00:00:00 Fade_up_on_sc17

States and Notes.

and

frames 1-5=Done:This_is_a_test

Caveat: In submit scripts, frame notes currently cannot contain spaces; use underbars instead.
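The frame range, state and notes syntax can be pictured as 'range[,step][=State][:Notes]'. The Python sketch below expands one such argument into individual frames. It is only an illustration of the syntax described in this section; `parse_frames` is a hypothetical helper, it assumes ascending ranges, and it is not how rush itself is implemented.

```python
VALID_STATES = {"done": "Done", "fail": "Fail", "hold": "Hold", "queue": "Queue"}


def parse_frames(spec):
    """Expand one Frames argument into (frame, state, notes) tuples."""
    rng, _, note = spec.partition(":")     # optional ':Notes'
    rng, _, state = rng.partition("=")     # optional '=State'
    rng, _, step = rng.partition(",")      # optional ',step'
    first, _, last = rng.partition("-")    # 'N' or 'A-B'
    first = int(first)
    last = int(last) if last else first
    step = int(step) if step else 1
    state = state.lower() or "queue"       # Queue is the default state
    if state not in VALID_STATES:
        raise ValueError("unknown frame state in %r" % spec)
    return [(f, VALID_STATES[state], note)
            for f in range(first, last + 1, step)]


print(parse_frames("100-150,2")[:3])
print(parse_frames("1-5=Done:This_is_a_test"))
print(parse_frames("11:Fade_up_on_sc17"))
```

A line such as 'frames 500 507 615' would simply be three separate calls, one per argument.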

LogDir

When a job is dumped, two other files appear in this directory;

framelist -- The job's frame list (ie. 'rush -lf') at the time the job was dumped
jobinfo -- The 'rush -ljf' report

The directory must exist relative to both the job server and all machines participating in rendering, and the directory must be read/writable by the user submitting the job.

logdir /jobs/myjob/logs    # Logs are dumped in this directory
logdir -                   # Disable log files

WinNT Note - Use a UNC path for the absolute path to the logdir. This prevents problems with inconsistently mapped drive letters. A UNC example:

logdir //server-01/large6/rush/soft/logs

LogFlags

The default behavior is to overwrite frame logfiles, each time a frame renders.

KeepLast tells the system to always keep the previous logfile, if there is one. It does this by renaming the previous log to a ".old" file, before creating the new log for a running frame, similar to running the command:

mv logs/0055 logs/0055.old

KeepAll is similar to KeepLast, with the additional behavior that all 'previous' logs are kept; before a framelog is overwritten, it is concatenated to the .old file, similar to running:

cat logs/0055 >> logs/0055.old

Beware: if your logfiles are long, KeepAll can consume significant disk space, since the logs accumulate. This is a good reason to use KeepLast instead.

`logflags -`	*# Logs are overwritten (Default)*
`logflags keepall`	*# Keep all logs; concatenate old logs in 0000.old*
`logflags keeplast`	*# Like 'keepall', but only keeps last log (don't concatenate)*
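The difference between the three settings shows up after a frame has been rendered a few times. The following Python sketch mimics the 'mv' and 'cat' behavior described above on a scratch file; `prepare_log` and `render_frame` are hypothetical helpers written for this illustration and do not reflect rush internals.

```python
import os
import shutil
import tempfile


def prepare_log(path, logflags="-"):
    """Apply the logflags policy to an existing frame log before a re-render."""
    if not os.path.exists(path):
        return
    old = path + ".old"
    if logflags == "keeplast":
        os.replace(path, old)                 # like: mv logs/0055 logs/0055.old
    elif logflags == "keepall":
        with open(path, "rb") as src, open(old, "ab") as dst:
            shutil.copyfileobj(src, dst)      # like: cat logs/0055 >> logs/0055.old
    # With '-', nothing is saved; the new render overwrites the log.


def render_frame(path, text, logflags):
    """Stand-in for a render: write a fresh log after applying the policy."""
    prepare_log(path, logflags)
    with open(path, "w") as f:
        f.write(text)


for flags in ("-", "keeplast", "keepall"):
    log = os.path.join(tempfile.mkdtemp(), "0055")
    for attempt in (1, 2, 3):
        render_frame(log, "try %d\n" % attempt, flags)
    old = open(log + ".old").read() if os.path.exists(log + ".old") else None
    print(flags, repr(open(log).read()), repr(old))
```

After three tries, the '.old' file is absent with '-', holds only the second try with 'keeplast', and holds the first and second tries with 'keepall'.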

NeverCpus (rush -an/-rn)

Cpus

nevercpus tahoe rotwang   # Never use tahoe or rotwang for rendering

Notes (rush -notes)

..
notes Please don't dump this job until you have visually
notes verified the matte transition at frames 205-219.
notes Call me at home if there are problems! -fred
..

Notes for the job appear in the 'rush -ljf' reports:

    [erco@howland]% rush -ljf
        :
        :
        Elapsed: 00:00:00
        Frames: 22
        Cpus: rotwang=2@100k
        Cpus: how=3@100k
        Notes[0]: Please don't dump this job until you have visually
        Notes[1]: verified the matte transition at frames 205-219.
        Notes[2]: Call me at home if there are problems! -fred

Priority (rush -priority)

cpus

See Priority Description for a description of how priority values work.

Ram (rush -ram)

While the job is running, the configured Ram value is compared against the available ram on the remote processors. If the amount of ram your job wants is more than the remote machine has available, then the frame will not be started. This behavior prevents the remote machines from swapping.

ram 128      # Only run on machines that have at least 128MB of ram available

State (rush -pause/-cont)

    :
    state  Pause        # Submit job in paused state
    :

    [erco@howland]% rush -lj
    STATUS JOBID       TITLE        OWNER    %DONE BUSY NOTES 
    ------ ----------- ------------ -------- ----- ---- -----------
    Pause  how-857     THX/LOGO     erco     %0    0    Job paused.

Title (rush -title)

    title THX/LOGO

    [erco@howland]% rush -lj 
    STATUS JOBID       TITLE        OWNER    %DONE BUSY NOTES 
    ------ ----------- ------------ -------- ----- ---- ----------- 
    Run    how-857     THX/LOGO     erco     %0    0    00:00:05

WaitFor

Chaining Jobs

waitfor radius-445 radius-446

Rush Command Line

Rush Command Line Arguments

`-ac`	`-af`	`-an`	`-autodump`	`-checkconf`	`-checkhosts`
`-cont`	`-criteria`	`-deltaskfu`	`-dexit`	`-dexitnow`	`-deltask_fu`
`-dlog`	`-done`	`-donemail`	`-down`	`-dump`	`-end`
`-fail`	`-fu`	`-getoff`	`-hold`	`-jobnotes`	`-lac`
`-laj`	`-lajf`	`-lc`	`-lcf`	`-lf`	`-lff`
`-lfi`	`-lj`	`-ljf`	`-notes`	`-offline`	`-online`
`-pause`	`-ping`	`-que`	`-ram`	`-rc`	`-reorder`
`-rf`	`-rn`	`-rotate`	`-status`	`-submit`	`-tasklist`
`-title`	`-trs`	`-tss`	`-uping`

rush -ac <cpuspec..> [jobid..]

    % rush -ac tahoe@300 +rfarm=10@100k +any=10@10

Cpus

rush -af <framerange..> [jobid..]

    % rush -af 100-150      # Add frames 100 thru 150 to the current job

See Frames submit command for more info.

rush -an <hostname|+group..> [jobid..]

    % rush -an tahoe +rfarm

Nevercpus

rush -autodump <off|done|donefail> [jobid..]

AutoDump

rush -checkconf <filename>

    #!/bin/csh -f
    ###
    ### Script to edit the rush.conf file, and rdist it out
    ###
    set TMPFILE=/usr/tmp/rush.conf.$$
    cp /usr/local/rush/etc/rush.conf $TMPFILE
    vi $TMPFILE
    rush -checkconf $TMPFILE
    if ( $status ) then
        echo You lose, game over.
        set err=1
    else
        foreach i ( tahoe superior erie )
            rdist -c $TMPFILE ${i}:/usr/local/rush/etc/rush.conf
        end
        set err=0
    endif
    rm -f $TMPFILE
    exit $err

rush -checkhosts <filename>

(Administration) Checks a hosts file for errors. Returns an exit code indicating success or failure; 0=ok, 1=error(s) found.

rush -cont [jobid..]

rush -pause

rush -criteria 'criteria strings' [jobid..]

Criteria

rush -deltaskfu [..]

(Administrative) Delete a task from a cpu server.

rush -dexit [remotehost..]

(Administration) Tells a daemon to exit immediately via TCP. Acknowledges success.

rush -dexitnow [remotehost..]

(Administration) Tells daemon to exit via UDP. No acknowledge.

rush -dlog [remotehost..]

Logging Flags

These are flags that can be used with rush(1)'s -dlog flag, rushd(8)'s -d flag, and the rush.conf file's LogFlags <logflags>. These flags can be combined to accumulate logging verbosity. All flags can be enabled by specifying 'a'.

    a - all
    b - bump mechanism logging
    d - log duplicate/redundant receipt of packet drops
    e - events (time oriented, async)
    f - fork
    h - hostname lookups
    j - Log job submissions
    k - Log bumped/killed/usurped tasks
    l - Logical string evaluations
    o - connect()/open()/close()/bind()/socket() (low level)
    p - parse command line arguments, submit scripts
    m - memory calculations (RAM) during priority battles, etc
    n - network commands (udp/tcp)
    r - reboot management/transactions
    s - signals
    t - tcp
    u - udp
    w - 'waitfor' checks
    y - yp lookups
    C - class ToWords/FromWords
    F - File loading line-by-line debugging
    E - Errors not normally displayed (benign, but suspect)
    T - task/taskack transactions
    U - update (scheduling, priority mechanism, idle cpu management)
    R - Reaper msgs
    S - Server/Client context switches
    X - Random UDP message dropping -- TESTING ONLY!!
        ('a' does not affect this option, it must be specified)

rush -done <framerange|framestate..> [jobid..]

killed

The frames to affect can either be specified as a frame range (ie. 1-100) or as a frame state, for which all frames matching that state will be changed. Examples:

% rush -done 1-100       # 'Done' frames 1 through 100
% rush -done fail        # 'Done' all frames currently Fail
% rush -done fail que    # 'Done' all frames currently Fail or Que

rush -donemail user[@domain.com[,user..]] [jobid..]

DoneMail

rush -dump [jobid..]

Dump the job. Busy frames are killed right away.

rush -end [jobid..]

End the job. Busy frames are allowed to finish.

rush -fail <framerange|framestate..> [jobid..]

killed

The frames to affect can either be specified as a frame range (ie. 1-100) or as a frame state, for which all matching frames will be affected. Examples:

% rush -fail 1-100        # Fail frames 1 through 100
% rush -fail done         # Fail all frames currently Done
% rush -fail done hold    # Fail all frames currently Done or Hold

rush [operation] -fu

For instance, if you want to control another person's job, you might get an error, eg:

% rush -an vaio va-229             # Attempt to add 'vaio' as a neverhost to someone's job
rush: va-229: you're not owner!    # Fails because you're not the job's owner

% rush -an vaio va-229 -fu         # Same command with -fu to force it..
Add Neverhosts vaio                # ..now it works!

See also the RUSH_FU environment variable.

¹ This acronym is rumored to have an alternate, pejorative expansion.

rush -getoff [remotehost..]

Offlines the local [remote] processors, kills/requeues any running frames, causing them to start elsewhere.

rush -hold <framerange|framestate..> [jobid..]

killed

When a frame is in the Hold state, the frame will not be rendered, and a job will not autodump if there are any Hold frames in the frame list.

The frames to affect can either be specified as a frame range (ie. 1-100) or as a frame state, for which all matching frames will be affected. Examples:

% rush -hold 1-100        # Hold frames 1 through 100
% rush -hold fail         # Hold all frames currently Fail
% rush -hold fail done    # Hold all frames currently Fail or Done

rush -jobnotes '<notes>' [jobid..]

Adds 'Job Notes'

Notes

rush -lac [hostname..]

List All Cpus.

If 'rush -lac hostname' is used, the information comes from the cache of the daemon running on the specified host; useful in determining hostname caching problems.

rush -laj

List All Jobs.

rush -lajf

List All Jobs Full.

rush -lc [jobid..]

List Cpus

rush -lcf [jobid..]

List Cpus Full.

rush -lf [jobid..]

List Frames

rush -lff [jobid..]

List Frames Full.

rush -lfi [jobid..]

List Frame Information

rush -lj [remotehost..]

List Jobs.

rush -ljf [jobid..]

List Jobs Full. Lists all jobs running on the local [remote] host, showing 'full information' in a multi-line format.

rush -notes <framerange>:'notes..' [jobid..]

rush -lf

Examples:

    rush -notes 155:"license error"
    rush -notes 200-250:"redo later"

rush -offline [remotehost|+group..]

Offline the local [remote] daemon. Lets frames complete that are busy rendering. No new frames will be started.

rush -online [remotehost|+group..]

Online the local [remote] daemon. Frames will be allowed to start running on the host.

rush -pause [jobid..]

Pause the current [specific] job. Busy frames will be allowed to finish, no new frames will be started.

rush -ping [remotehost|+group..]

(Administrative). Pings the local [remote] daemon to see if it's running, and what it's doing.

rush -priority [jobid..]

Priority

Priority Description

rush -que <framerange|framestate..> [jobid..]

killed

The frames to affect can either be specified as a frame range (ie. 1-100) or as a frame state, for which all frames matching that state will be changed. Examples:

% rush -que 1-100        # Que frames 1 through 100
% rush -que fail         # Que all frames currently Fail
% rush -que fail done    # Que all frames currently Fail or Done

rush -ram <ramval> [jobid..]

Ram

rush -rc <cpuspec|tidspec|hostname> [jobid..]

Cpus can be removed in one of several ways:

Individually by the JOBTID number, eg. rush -rc .32
By hostname, eg. rush -rc tahoe
By a particular cpu specification as shown in the 'rush -lc' report, eg. rush -rc tahoe=4@200

To remove a cpu via a JOBTID, you must precede each job tid number with a period, eg. rush -rc .12 .13 .14. When you delete by JOBTID, you are deleting a single cpu from the 'rush -lc' report. If the cpu is part of a larger specification (eg. tahoe=4@12), then the cpu count in the spec will be decremented (eg. tahoe=3@12).

If you remove a hostname (eg. 'rush -rc tahoe') then all cpu specifications that have that host name (eg. tahoe=3@100) will be removed. Also, any host groups that expand to include that host will have that host removed from the expansion (eg. +any=3@100, which includes tahoe).

If you remove a cpu specification (eg. 'rush -rc +any=3@100'), it must match character-for-character the entry shown in the 'rush -lc' report for the job:

% rush -ac tahoe@100                       # Add a cpu.
% rush -rc tahoe@100                       # Now try to remove it
'tahoe@100' no such cpu specification      # FAILED: need to use spec shown in 'rush -lc'

% rush -lc                                 # Look at 'rush -lc' report
CPUSPEC     STATE FRM  PID   ELAPSED  .. 
tahoe=1@100 Run   0002 26747 00:00:11 ..   # More complete specification in report.

% rush -rc tahoe=1@100                     # Remove using spec shown in report
'tahoe=1@100' removed.                     # It works

rush -reorder <framerange..> [jobid..]

Changing the order of the framelist affects the order frames are rendered, since frames are issued from the top of the list, down.

Example. If the frame list is currently:

        1 2 3 4 5 6 7 8 9 10

rush -reorder

        rush -reorder 10-1                  -> 10 9 8 7 6 5 4 3 2 1
        rush -reorder 1-10,2 2-10,2        -> 1 3 5 7 9 2 4 6 8 10
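The two examples above follow directly from expanding each range in the order given. The Python sketch below reproduces that expansion, including descending ranges such as '10-1'. It is a sketch for illustration only; `expand` and `new_order` are hypothetical names, and what rush does with frames that are not named in the command is not covered here.

```python
def expand(rng):
    """Expand 'A-B' or 'A-B,step' into a list; A > B counts downward."""
    span, _, step = rng.partition(",")
    first, _, last = span.partition("-")
    first = int(first)
    last = int(last) if last else first
    step = int(step) if step else 1
    if first <= last:
        return list(range(first, last + 1, step))
    return list(range(first, last - 1, -step))


def new_order(ranges):
    """Concatenate the expanded ranges in the order they were given."""
    order = []
    for rng in ranges:
        order.extend(expand(rng))
    return order


print(new_order(["10-1"]))
print(new_order(["1-10,2", "2-10,2"]))
```

The second call yields the odd frames followed by the even frames, which is a common way to get an early look at a whole shot on twos.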

rush -rf <framerange..> [jobid..]

Remove Frames from current [specific] job.

rush -rn <hostname|+group..> [jobid..]

Remove Neverhosts for current [specific] job.

rush -rotate [remotehost|+group..]

Logs can be automatically rotated with the LogRotateHour command in the rush.conf file.

rush -status [-s secs] [-c count] [remhost ..]

'rush -status' quickly reports the status of jobs and renders on the local [remote] hosts.

Optionally, a continuously updating report can be generated, where [-c count] specifies the number of updates, and [-s secs] specifies the seconds delay between each report. If -c is zero, updating is continuous.

The output contains several records of information for each host, one record per line. Host records start with an 'h' record, and terminate with a line of '---'.

4 different types of data records are possible. Data record types are defined by the first character in each record line, and can be one of:

    h - hostname header. Leads off records for the specified host.
    d - daemon information. Info about the running daemon.
    j - job information. One line per job.
    p - processor status. One line per processor.

    h <hostname>
    d <sequence-id> <daemon> <version> <PID=pid> <online state> <jobs> <busy procs> <total procs>
    j <owner> <jobid> <job title> <job state> <elapsed> <percent done> <percent fail> <# frms busy>
    p <owner> <jobid> <job title> <frame> <priority> <pid> <elapsed>
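Because every record is tagged by its first character, the report is straightforward to post-process. The Python sketch below groups records by host. Note the assumptions: `parse_status` is a hypothetical helper, and the SAMPLE text is hand-written to follow the field layout above, not captured from a real daemon, so actual field values may look different.

```python
# Hand-written sample that follows the record layout described above.
# ASSUMPTION: illustrative values only, not real 'rush -status' output.
SAMPLE = """\
h tahoe
d 12 rushd 101.84 PID=345 Online 1 1 2
j erco tahoe-798 WERNER/C33 Run 00:10:00 50 0 1
p erco tahoe-798 WERNER/C33 0106 100k 26747 00:00:11
---
h erie
d 7 rushd 101.84 PID=212 Online 0 0 1
---
"""


def parse_status(text):
    """Group 'd', 'j' and 'p' records under the 'h' record that leads them."""
    hosts = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("---"):         # terminator for one host
            current = None
            continue
        kind, _, rest = line.partition(" ")
        if kind == "h":
            current = {"host": rest.strip(), "d": [], "j": [], "p": []}
            hosts.append(current)
        elif kind in ("d", "j", "p") and current is not None:
            current[kind].append(rest.split())
    return hosts


for host in parse_status(SAMPLE):
    print(host["host"], "jobs:", len(host["j"]), "busy procs:", len(host["p"]))
```

The fields are split on white space, so this sketch assumes job titles do not contain spaces.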

rush -submit [remotehost]

Submits a job to be managed by the local [remote] job server.

rush -tasklist [remotehost..]

List tasks on local [remote] cpu server.

rush -title <text> [jobid..]

Set title for current [specific] job.

rush -trs

% rush -trs > render_me

% chmod +x render_me

% vi render_me

### YOUR RENDER COMMAND(S) HERE

rush -tss

% rush -tss > submit_me

% chmod +x submit_me

% vi submit_me

command

title

cpus

ram

rush -uping [-c count] [remotehost..]

(Administrative) UDP Ping the local [remote] daemon. Useful for checking for UDP packet dropping. Optional argument '-c' can set the number of udp transmissions to send to the remote.

Configuration File
`$RUSH_DIR/etc/rush.conf`

The configuration file should be customized by the systems administrator. Most settings are used only for fine tuning, but some control important security settings (uidrange/gidrange/forceuid/forcegid), and process auditing/logging (cpuacctpath).

The rush.conf file can be updated on the fly; simply edit a copy, make changes, then rdist(1) the copy to all the machines, and the daemons will pick up your changes within one minute.

To make changes to this file and update this to the network, use these commands.

Command	Description	Example
LogFlags	Not to be confused with submit script LogFlags, Configures daemon logging features. Most are debugging flags used to track operation of the system. Flags can be combined to enable multiple debugging features. LogFlags affect both the daemon AND user applications. To affect only the daemon, specify flags on daemon's command line, or use '`rush -dlog <flags>`'. See Logging Flags Table for a complete list of all the one letter log flags.	`logflags jE`
UdpTimeout	The number of seconds between udp re-transmissions.	`udptimeout 8`
UdpMaxRetries	The number of re-transmissions until 'retry time-out' occurs	`udpmaxretries 5`
UdpRestTimeOut	How many secs to rest before recovering from a 'retry time-out'	`udpresttimeout 40`
InMaxMsgs	(Version 101.83+) How many messages (tcp/udp) are received from the input queue at a time, before re-checking output service routines. eg. for (t=0; t < inmaxmsgs; t++ ) select(..) if ( no data ) break;	`inmaxmsgs 30`
LogRotateHour	(Version 101.81+) Sets the hour (0-23) that the logs automatically rotate. A value of -1 disables automatic log rotation.	`logrotatehour 0`
JobUpdateThrottle	Don't advertise jobs' cpus faster than jobthrottlesecs. The daemon will re-advertise cpus that haven't been acknowledged by the remotes at about this rate.	`jobupdatethrottle 10`
JobPassTimeout	The `'jobpasstimeout'` value configures how many seconds the task will remain in the JOBPASS state before re-entering an IDLE state by itself. When a task on a remote cpu becomes IDLE, it tries to convince a job to use its cpu. If the job 'passes' on this request (no more frames to render, etc), the remote task enters a JOBPASS state, to avoid contacting the job again for a while. After the timeout period, the task re-enters an IDLE state to see if maybe the job had a FAIL frame, and has more frames to render after all.	`jobpasstimeout 150`
DaemonHostCache	rushd(8)'s hostname caching options. Only affects the way the daemon caches information. Options can be none, demand or boot: none - No caching. If using only /etc/hosts, or if hostnames change a lot. demand - Cache on demand, or whenever new hostlist reloaded. Prevents repetitive NIS/DNS traffic. boot - Cache entire hostlist and IP mappings on boot, or whenever hostlist changes.	`daemonhostcache boot`
AppHostCache	rush(1) 's host caching option; only affects the rush(1) client application's method of host caching. Can be none or demand.	`apphostcache demand`
NtRushUid	The uid used if an NT submitted job is to run on unix machines. Since the NT version of rush doesn't know how to map the name 'ntrush' to the equivalent uid value, NtRushUid is used to resolve it. Basically, this value should be the same as the uid value for the Unix user 'ntrush'.	`ntrushuid 100`
NtRushGid	The gid used if an NT submitted job is to run on unix machines. Since the NT version of rush doesn't know how to map the name 'ntrush' to the equivalent gid value, NtRushGid is used to resolve it. Basically, this value should be the same as the gid value for the Unix user 'ntrush'.	`ntrushgid 100`
UidRange	Disallow render queue to run processes with a uid outside this range. First value is a minimum, second value is a maximum. When a job is submitted, if the user's uid value is outside the range specified here, an error message is printed, and the job will not be submitted.	`uidrange 100 65000`
GidRange	Controls gid values the same way UidRange controls uid values.	`gidrange 100 65000`
ForceUid	Forces all user processes to run as this uid. Default is -1, allowing user processes to run as the UID of the user who submitted the job.	`forceuid -1` `forceuid 100`
ForceGid	Same as ForceUid for GID values.	`forcegid -1` `forcegid 100`
ServerPort	Set the rushd(8) server daemon's port numbers for UDP/TCP connections. Though unnecessary for proper operation of the render queue, you should register the ServerPort value in your `/etc/services` file, e.g.: rushd 696/tcp # rush server rushd 696/udp # rush server	`serverport 696`
ClientPort	ClientPort is a vestige from the past that is now obsolete and unrecognized. Please remove from all rush.conf files.
CpuAcctPath	Path to cpu accounting file. Set to '-' to disable generation of cpu accounting data.	`cpuacctpath /var/logs/cpu.acct`
AdminUser	Sets login name for user allowed to administer the rush daemons. Commands such as `'rush -dexit', 'rush -dlog a'` and others are limited to root and this user. Set to 'root' if there is no special rush administrative login.	`adminuser root`
DisableFu	Allows administrator to control whether users can use '`rush -fu`' and `$RUSH_FU` to control other people's jobs. If `disablefu` is set to 1, users can't control each other's jobs; only root and `adminuser` can do this. Normally, users should be able to control each other's jobs, allowing local policies, peer pressure (and auditing daemon logs) to prevent pandemonium.	`disablefu 0`
~~WebUser~~	Sets login name for user the httpd daemon runs as, in cases where rush is being controlled by web interfaces. This user is allowed to use the RUSH_USER environment variable to pose as other users for the purpose of cgi-bin scripts being able to submit jobs as the user on the other end of Netscape. ~~Set to "root" to disable this feature (default).~~	~~`webuser guest`~~

Hosts File
`$RUSH_DIR/etc/hosts`

The $RUSH_DIR/etc/hosts file must contain the names of all hosts that participate in rendering.
The hosts file can be updated on the fly; simply edit a copy, make changes, then rdist(1) the copy to all the machines, and the daemons will pick up your changes within one minute.
To make changes to this file and update this to the network, use these commands.
The format of the hosts file is single lines of 5 white space separated fields, one line per host:

<Hostname> <#Cpus> <Ram> <Minimum Priority> <Criteria>

Blank lines and lines starting with '#' are ignored.

Hosts File Field Descriptions

<Hostname>

This is the name of the host, and should be the shortest name possible (e.g. host aliases can be used here).
This is the name that will be used in jobids and other cpu reports, so it is best if short names are used (10 chars or less). Longer names are ok, but will misalign columnar reports. Avoid using FQDN hostnames (e.g. foo.domain.com).

<#Cpus>

This should be the number of cpus the host has. This is how many processes the host will run at the same time. This value can be larger or smaller than the actual number of physical cpus the machine has.
'0' is an acceptable value that essentially disables the machine from participating in rendering, while allowing the host to be specified in submit scripts.

<Ram>

This is the amount of ram the machine has. This value can be less or more than the actual ram the machine has; usually this value takes into account some percentage of the host's swap space as well. This value is used when accepting frames to render; a frame that asks for more ram than the machine has will be turned away.
On multiprocessor machines, this value is a total from which rendering frames subtract their estimated ram use. For instance, if a 4 cpu machine is configured with a Ram value of 512, and 2 frames are currently rendering each with ram values 200, then only 112 will be left for rendering on the other two processors (112 = 512 - ( 200 x 2 ) ).
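The bookkeeping in the example above is plain subtraction. A minimal Python sketch, using hypothetical helper names `ram_left` and `accepts`, shows the same arithmetic and the resulting accept/turn-away decision; it illustrates the rule as described here and is not taken from the rush source.

```python
def ram_left(host_ram, running_frame_ram):
    """Configured host Ram minus the ram values of frames already rendering."""
    return host_ram - sum(running_frame_ram)


def accepts(host_ram, running_frame_ram, job_ram):
    """A frame is turned away if it asks for more ram than is left."""
    return job_ram <= ram_left(host_ram, running_frame_ram)


# The 4 cpu machine from the text: Ram 512, two frames rendering at 200 each
print(ram_left(512, [200, 200]))
print(accepts(512, [200, 200], 100))
print(accepts(512, [200, 200], 128))
```

With 112 left, a job submitted with 'ram 100' can still start a frame on this host, while a job submitted with 'ram 128' cannot.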

<Minimum Priority>

This prevents frames from rendering on this machine if their priority is lower than this value. If zero, all frames will be accepted.

<Criteria>

This is a list of comma separated strings that define platform or operating system specific features for the host. These can be arbitrary alpha-numeric strings that may also contain dashes, underbars and periods, but must not contain any white space. '+' characters have the special purpose of leading off a Host Group specification.
The <Criteria> field might be set to:
+any,linux,linux6.1,prman3.7
These strings can then be used in TD's submit scripts to limit which hosts will render their frames. See the Criteria Submit Script command for more info. All hosts should have a criteria entry that at least contains +any.
Host Group names are configured in this field too. To add a host group called +servers to the above example:
+any,linux,linux6.1,prman3.7,+servers

Example Hosts File

# RUSH HOSTS
#
# The 'Host' field should contain short names for hosts (aliases are ok),
# and must be unique.
#
# The 'Criteria' field must *NOT* contain white space, and words are 
# comma delimited. All hosts should contain '+any' in the criteria field.
#
#Host    Cpus Ram   MinPri Criteria
#-----   ---- ----  ------ -----------
tahoe    2    256   0      +any,+work,sgi,irix,irix6.5
superior 2    256   0      +any,+work,sgi,irix,irix6.2
ontario  1    128   0      +any,+work,linux,linux6.0,intel
erie     1    128   0      +any,+work,sgi,irix,irix6.4
rf1      1    512   0      +any,+rfarm,linux
rf2      1    512   0      +any,+rfarm,linux
rf3      1    512   0      +any,+rfarm,linux
rf4      1    512   0      +any,+rfarm,linux
rf5      1    512   0      +any,+rfarm,linux
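
For site tools that need to read this file, a minimal Python sketch follows. It assumes only what the example shows: '#' comment lines and five white space separated fields per host. The `Host` type and `parse_hosts` function are hypothetical names, and the real file may allow syntax this sketch does not handle.

```python
from typing import List, NamedTuple


class Host(NamedTuple):
    name: str
    cpus: int
    ram: int
    minpri: int
    criteria: List[str]


def parse_hosts(text):
    """Parse hosts-file text into Host records, skipping comments and blank lines."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, cpus, ram, minpri, criteria = line.split()
        hosts.append(Host(name, int(cpus), int(ram), int(minpri), criteria.split(",")))
    names = [h.name for h in hosts]
    if len(names) != len(set(names)):
        raise ValueError("host names must be unique")
    return hosts


SAMPLE = """\
#Host    Cpus Ram   MinPri Criteria
tahoe    2    256   0      +any,+work,sgi,irix,irix6.5
rf1      1    512   0      +any,+rfarm,linux
"""

hosts = parse_hosts(SAMPLE)
print([h.name for h in hosts if "+rfarm" in h.criteria])   # ['rf1']
print(sum(h.cpus for h in hosts))                          # 3
```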

Cpu Accounting File
`$RUSH_DIR/var/cpu.acct`

The cpu accounting file is configured with the rush.conf file's CpuAcctPath command. Each time a frame finishes executing, a new entry is created in the Cpu Accounting file logging the name of the job, how long the frame ran, etc.

Cpu Accounting File Example

u 948242700 53
p 948242783 tahoe-798    WERNER/C33 erco     0106 superior 100k  122 0   0
p 948242783 tahoe-798    WERNER/C33 erco     0107 superior 100k  122 0   0
p 948242865 tahoe-797    KILLER     erco     0504 superior 200   121 0   0
u 948246300 5
u 948249900 0

Process Entries


p  948242783 tahoe-798 WERNER/C33 erco  0106  superior  100k  122  0   0
p  948242783 tahoe-798 WERNER/C33 erco  0107  superior  100k  122  0   0
p  948242865 tahoe-797 KILLER     erco  0504  superior  200   121  0   0
-  --------- --------- ---------- ----  ----  --------  ----  ---  -   -
|      |         |          |      |     |       |       |     |   |   |
|      |         |          |      |     |       |       |     |   |   #Secs User Time
|      |         |          |      |     |       |       |     |   |                 
|      |         |          |      User  |       |       |     |   #Secs System Time
|      |         |          |            |       |       |     |
|      |         |          Title of job |       |       |     #Secs Wall Clock Time
|      |         Jobid                   |       |       |
|      |                                 |       |       Priority
|      time(2) process started           |       |
|                                        |       Host that ran the process
'p' indicates 'process entry'            |
                                         Frame that ran
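
The field layout above maps directly onto a small parser. The Python sketch below totals wall clock seconds per jobid from the example lines; it assumes job titles contain no white space, and the names `ProcessEntry` and `parse_process_entry` are hypothetical.

```python
from typing import NamedTuple


class ProcessEntry(NamedTuple):
    started: int      # time(2) value when the process started
    jobid: str
    title: str
    user: str
    frame: str
    host: str
    priority: str     # kept as text; may carry a suffix such as 'k'
    wall_secs: int
    sys_secs: int
    user_secs: int


def parse_process_entry(line):
    """Parse one 'p' line of the cpu accounting file."""
    f = line.split()
    if len(f) != 11 or f[0] != "p":
        raise ValueError("not a process entry: %r" % line)
    return ProcessEntry(int(f[1]), f[2], f[3], f[4], f[5], f[6], f[7],
                        int(f[8]), int(f[9]), int(f[10]))


SAMPLE = [
    "p 948242783 tahoe-798    WERNER/C33 erco     0106 superior 100k  122 0   0",
    "p 948242783 tahoe-798    WERNER/C33 erco     0107 superior 100k  122 0   0",
    "p 948242865 tahoe-797    KILLER     erco     0504 superior 200   121 0   0",
]

wall_by_job = {}
for entry in map(parse_process_entry, SAMPLE):
    wall_by_job[entry.jobid] = wall_by_job.get(entry.jobid, 0) + entry.wall_secs
print(wall_by_job)   # {'tahoe-798': 244, 'tahoe-797': 121}
```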

Utilization Entries


u  948242700 53
u  948246300 5
-  --------- --
|      |      |
|      |      Percent of time processor(s) were busy rendering. (0-100)
|      |
|      time(2) utilization recorded
|
'u' indicates 'utilization entry'
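
Utilization entries can be read the same way. This Python sketch converts the time(2) stamp to a UTC date and averages the samples from the example file; `parse_utilization` and `average_utilization` are hypothetical names, and the percentage is assumed to be a whole number as in the examples.

```python
from datetime import datetime, timezone


def parse_utilization(line):
    """Parse one 'u' line into (UTC datetime, percent busy)."""
    tag, stamp, percent = line.split()
    if tag != "u":
        raise ValueError("not a utilization entry: %r" % line)
    return datetime.fromtimestamp(int(stamp), tz=timezone.utc), int(percent)


def average_utilization(lines):
    """Plain average of the percent values in the given 'u' lines."""
    samples = [parse_utilization(line)[1] for line in lines]
    return sum(samples) / len(samples)


SAMPLE = ["u 948242700 53", "u 948246300 5", "u 948249900 0"]

when, percent = parse_utilization(SAMPLE[0])
print(when.isoformat(), percent)   # 2000-01-19T00:45:00+00:00 53
print(round(average_utilization(SAMPLE), 1))
```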

Note: values showing process execution time are problematic for billing purposes. Wall clock time includes time the process may have spent waiting on network load. User and System times report only the time spent in the Render Script itself, not in its sub-processes (e.g. the renderer).
To properly bill for cpu time, you would need either to enable unix process accounting to obtain accumulated cpu time for all sub-processes in the user's render script, or to create wrapper scripts that use programs like timex(1) to monitor the execution time of the critical render/compositor processes.
The documentation for tools like timex(1) indicates that unix process accounting must be enabled to show sub-process totals. This is usually prohibitive on production machines, due to the resources used by the unix process accounting system.
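
As one possible shape for such a wrapper, the Python sketch below runs a command and reports the user and system cpu time charged to it, using getrusage(2) with RUSAGE_CHILDREN. This is a sketch under assumptions, not a rush feature: it is Unix only, `run_and_measure` is a hypothetical name, and RUSAGE_CHILDREN only counts descendants that have been waited for, so a renderer that leaves processes running in the background will be under-reported.

```python
import resource
import subprocess
import sys


def run_and_measure(argv):
    """Run argv and return (exit code, user cpu secs, system cpu secs) for it
    and for any of its descendants that were waited for."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    returncode = subprocess.call(argv)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (returncode,
            after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


# Stand-in for the real render command: a short cpu-bound Python one-liner.
rc, user_secs, sys_secs = run_and_measure(
    [sys.executable, "-c", "sum(range(10**6))"])
print("exit=%d user=%.2fs sys=%.2fs" % (rc, user_secs, sys_secs))
```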

Administration

Unix Installation Instructions

1) Choose a local directory to install rush.

/usr/local/rush is recommended. To install the system on a large network, first install the software on one machine, get everything working, then rdist(1) the directory tree to all the machines on your network.

WARNING: As with all daemons, do *not* install rush binaries in NFS mounted directories; NFS hiccups will cause the executing daemons to hang, since the binaries will be demand paging over NFS. Keep rush binaries local on each machine.

2) The RUSH_DIR environment variable should be set.

Set it to the directory where rush is installed (e.g. /usr/local/rush). This setting should be in all environments that run the rush binaries. This includes boot scripts that start the rushd(8) daemon and user environments.

3) Configure the $RUSH_DIR/etc/rush.conf file.

For most situations the defaults suffice.

Be sure to register your settings for serverport in /etc/services, or equivalent. See serverport for an example entry.

If security is an issue at your site, be sure to check ALL settings, esp. UidRange and GidRange. Also, correctly configure AdminUser and WebUser for your environment. Read about them before accepting the defaults.

If you want to make changes, see Configuration File for more info.

4) Configure the $RUSH_DIR/etc/hosts file

It should contain the names of all hosts that participate in rendering. See Hosts File for more information.

5) Configure the $RUSH_DIR/etc/templates file.

Customize the template render/submit scripts for your local environment. TDs use these templates to create their submit scripts and render scripts via 'rush -tss/-trs', and they will want to inherit settings for typical situations.

6) Configure the $RUSH_DIR/etc/.submit and $RUSH_DIR/etc/.render files.

These files are sourced by the default Submit Script and Render Scripts respectively.

7) Configure daemon to start on boot.

example boot script

8) Configure regular log rotations.

rush -rotate

9) Security issues.

    chmod go-w /usr \
               /usr/local \
               /usr/local/rush \
               /usr/local/rush/bin \
               /usr/local/rush/bin/* \
               /usr/local/rush/etc

    # chown before chmod: on some systems (e.g. Linux) chown clears the setuid bit
    chown 0.0 /usr/local/rush/bin/rush \
              /usr/local/rush/bin/rushd

    chmod 4755 /usr/local/rush/bin/rush \
               /usr/local/rush/bin/rushd

Network Install

Basically, you want to rdist(1) the /usr/local/rush directory to all the machines, start the daemons, and verify they're running. It's recommended you start the rush daemon after the boot scripts have enabled networking, but BEFORE enabling nfs and rpc services.


    # LINUX
    foreach i ( linux1 linux2 linux3 linux4 )
        echo -n Working on ${i}: dist..
        rdist -c /usr/local/rush             ${i}:/usr/local/rush
	rdist -c /usr/local/rush/etc/S99rush ${i}:/etc/rc.d/init.d/rush
	echo -n rc3..
        rsh $i ln -s /etc/rc.d/init.d/rush /etc/rc.d/rc3.d/S29rush
	echo -n rc5..
	rsh $i ln -s /etc/rc.d/init.d/rush /etc/rc.d/rc5.d/S29rush
	echo -n daemon..
	rsh $i /etc/rc.d/init.d/rush start
    end

    # IRIX
    foreach i ( octane1 octane2 octane3 octane4 )
        echo -n Working on ${i}: dist..
        rdist -c /usr/local/rush             ${i}:/usr/local/rush
	rdist -c /usr/local/rush/etc/S99rush ${i}:/etc/init.d/rush
	echo -n rc..
	rsh $i ln -s /etc/init.d/rush /etc/rc2.d/S35rush
	echo -n daemon..
	rsh $i /etc/init.d/rush start
    end

Now verify all the daemons have started.

    rush -ping +any           # pings all daemons in rush/etc/hosts

NT Installation Instructions

TBD

FAQ - Frequently Asked Questions

TD Questions

How can I use padded frame numbers (0000) in my render script?

Use $RUSH_PADFRAME; rush creates it for you automatically.
If you would rather do your own frame number padding, you can use this unix technique:

set padframe = `perl -e 'printf("%04d",$ENV{RUSH_FRAME});'`
To use different padding widths, just change the '4' (in '%04d') to a different number.
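
For render scripts written in Python rather than csh, the same padding can be sketched as below. The helper `pad_frame` is a hypothetical name; inside a real render script the frame number would come from the RUSH_FRAME environment variable, e.g. `pad_frame(os.environ["RUSH_FRAME"])`.

```python
def pad_frame(frame, width=4):
    """Zero-pad a frame number (int or string) to the given width."""
    return "%0*d" % (width, int(frame))


print(pad_frame("17"))       # 0017
print(pad_frame("17", 6))    # 000017
print(pad_frame(12345))      # 12345 (never truncated)
```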

My renders are coming up 'FAIL'. How do I figure out what's wrong?

Check the frame logs being generated by your render script.
Frame logs contain the error messages from each rendered frame which should help you determine the problem. Make sure your submit script has logdir pointing to a valid directory, which is where your frame logs can be found.
Also, make sure your render script is returning the proper exit code. The most common problem is a render script that does not properly handle returning exit codes. Your render script must 'exit 0' for a frame to show up 'DONE' in the frame list. Make sure your script is properly checking the error returns from your renderer, and translating them into the codes rush expects. See Render Scripts for more.
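
The exit code handling described above might look like this in a Python render script. The text only states that 'exit 0' yields DONE; treating exit code 1 as the failure code is an assumption here (see Render Scripts for the codes rush actually expects), and the names `rush_exit_code` and `render` are hypothetical.

```python
import subprocess
import sys

RUSH_DONE = 0   # stated in the text: 'exit 0' shows the frame as DONE
RUSH_FAIL = 1   # assumption: used in this sketch to report a failed frame


def rush_exit_code(renderer_returncode):
    """Translate the renderer's return code into the code handed back to rush."""
    return RUSH_DONE if renderer_returncode == 0 else RUSH_FAIL


def render(argv):
    """Run the renderer command and return the exit code for the render script."""
    try:
        returncode = subprocess.call(argv)
    except OSError as err:
        # The renderer could not even be started; report it and fail the frame.
        print("cannot start renderer: %s" % err, file=sys.stderr)
        return RUSH_FAIL
    return rush_exit_code(returncode)


# Stand-in for a renderer that fails with its own error code 3.
print(render([sys.executable, "-c", "import sys; sys.exit(3)"]))   # 1
```

A real script would end with `sys.exit(render([...]))`, passing its own renderer command line.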

How do I have rush automatically retry frames? How do I set the number of retries?

See Retrying Frames.

My job isn't starting renders on my cpus. What's going on?

Use 'rush -lc' to list your cpus, and check the NOTES column for messages.
If you know the remote cpus aren't just busy with other jobs, the NOTES column may show the reason your cpus are being rejected.
The job might be in Pause, there might be no more frames to render, none of the available machines might have as much ram as your job needs, etc. Here are some typical situations:

[erco@howland]% rush -lc
CPUSPEC            STATE  FRM  PID   ELAPSED  NOTES
placid=3@100k      Idle   -    -     00:04:37 Job state is 'Pause'
tahoe=1@1          Idle   -    -     00:02:08 No more frames
superior=1@1       Idle   -    -     00:02:08 Not enough ram
waccubuc=1@1       Idle   -    -     00:02:08 This is a 'neverhost'
ontario=1@1        Idle   -    -     00:02:08 Failed 'criteria' check
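
If you check this report often, a short script can pull out the reasons for you. The Python sketch below assumes the column layout shown in the sample above (six white space separated columns, with NOTES last and free-form); `idle_reasons` is a hypothetical helper name.

```python
SAMPLE = """\
CPUSPEC            STATE  FRM  PID   ELAPSED  NOTES
placid=3@100k      Idle   -    -     00:04:37 Job state is 'Pause'
tahoe=1@1          Idle   -    -     00:02:08 No more frames
superior=1@1       Idle   -    -     00:02:08 Not enough ram
"""


def idle_reasons(report):
    """Map each Idle cpuspec in a 'rush -lc' style report to its NOTES text."""
    reasons = {}
    for line in report.splitlines()[1:]:          # skip the header line
        fields = line.strip().split(None, 5)      # NOTES may contain spaces
        if len(fields) == 6 and fields[1] == "Idle":
            reasons[fields[0]] = fields[5]
    return reasons


for cpuspec, note in idle_reasons(SAMPLE).items():
    print("%-16s %s" % (cpuspec, note))
```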

How do I setup my submit script to only render on certain platforms or operating systems?

Use the Criteria submit script command.
This command allows you to build a list of platforms, operating systems, or other general criteria to limit which machines will run your renders.
You can see the different criteria names in the output of 'rush -lac'. It is up to your sysadmin to maintain the criteria names.

How can I render several frames in one process using rush?

With clever scripting. See Batching Multiple Frames for how to render several frames at a time.
Sometimes it pays to render several frames at a time rather than one at a time, to decrease the amount of time the renderer spends loading files.
If you have existing script filters which monitor the progress of renders to determine which frames are rendering, you can probably easily modify these scripts to work with rush to reflect changes in the frame list, using either frame notes (rush -notes) or frame state change operations (rush -que/rush -done).

My job has its 'k' flag set; why isn't it bumping off other jobs' frames?

For a job to bump another off a cpu, these things must be true:

A job can only bump other jobs of lower priority (i.e. not the same priority).

A job can't be bumped if its almighty flag ('a') is set.

A job can't be bumped unless its entry in the -tasklist is either in the Avail or Run state.

When a frame is bumped, the bumped frame will show a message in its frame list indicating the job that bumped it, e.g.:
% rush -lf erie-790
STAT FRAME TRY HOSTNAME PID   ELAPSED  NOTES
Run  0100  0   tahoe    10290 00:00:26 
Run  0101  0   tahoe    10291 00:00:26 
Que  0102  1   tahoe    10292 00:00:09 Bumped by ralph's superior-791,KILLER @300ka
Que  0103  0   -        0     00:00:00 
[..]
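
The three conditions listed above can be written as a single check. In this Python sketch the priority notation (a number followed by flag letters, as in '300ka') is inferred from the examples in this document, and `parse_priority` and `can_bump` are hypothetical names; rush's real scheduling logic may consider more than this.

```python
import re


def parse_priority(spec):
    """Split a priority such as '300ka' into (300, {'k', 'a'})."""
    match = re.fullmatch(r"(\d+)([a-z]*)", spec)
    if not match:
        raise ValueError("bad priority: %r" % spec)
    return int(match.group(1)), set(match.group(2))


def can_bump(bumper_priority, victim_priority, victim_task_state):
    """Apply the bump conditions from the text to one bumper/victim pair."""
    bumper_value, bumper_flags = parse_priority(bumper_priority)
    victim_value, victim_flags = parse_priority(victim_priority)
    return ("k" in bumper_flags                       # bumper has its 'k' flag set
            and victim_value < bumper_value           # strictly lower priority
            and "a" not in victim_flags               # victim is not almighty
            and victim_task_state in ("Avail", "Run"))


print(can_bump("300ka", "100k", "Run"))   # True
print(can_bump("300k", "300", "Run"))     # False: same priority
print(can_bump("300k", "100a", "Run"))    # False: victim is almighty
```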

Is there an easier way to set the RUSH_JOBID environment variable?

You can use eval `submit` to set it automatically, or a simple alias to set it manually. That said, cutting and pasting the setenv command is not so hard.
Some people like to use this alias to make it easy to set new jobid variables:

# Put this in your .cshrc

alias jid 'setenv RUSH_JOBID "\!*"'

Then you can use it on the command line to set one or more jobids:

erco@tahoe % jid tahoe-932 tahoe-933

If you want to have the RUSH_JOBID variable set automatically in your shell whenever you invoke your submit script, then use 'eval':

erco@tahoe % eval `my_submit_script`

..the shell automatically parses the 'setenv RUSH_JOBID' command rush prints on stdout when a job is successfully submitted. Error messages are not affected by 'eval', so you don't have to worry about losing error messages when using this technique.

How can my render script detect it's being 'bumped' by a higher priority job?

It can't directly; it takes some clever scripting.
Usually the desire to do this stems from wanting to clean up left over temporary files generated by renders. In most cases, you can avoid left over files by putting temporary files in $RUSH_TMPDIR, which rush cleans automatically, even after bumps.
Bumps and dumps use SIGKILL to kill the render script and its children. This signal is NOT trappable. There's a reason:
Under many circumstances SIGTERM, the 'trappable' kill signal, is not effective, especially during heavy rendering. Bumped frames then fail to bump, which disrupts unattended use and leaves processors unproductive.
Since bumps can happen just as readily as dumps, both use SIGKILL, which is untrappable and always effective (except in pathological cases where the process is hung).
So do not expect to be able to trap signals to detect bumps/dumps.

If you need a way to determine if you are re-rendering a frame that was previously killed mid-execution (i.e. bumped by a higher priority job), you can put some logic into your render script:
    #!/bin/csh -f
    ..
    if ( -e /somewhere/$RUSH_FRAME.busy ) then
	echo We are picking up a frame that was killed.
	echo Do pickup stuff here..
    endif

    # Create a 'busy' file for this frame
    #    If we are bumped, busy file is left behind 
    #    so that the above logic can detect it.
    #
    touch /somewhere/$RUSH_FRAME.busy
    echo Do rendering here..
    rm -f /somewhere/$RUSH_FRAME.busy
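
The same 'busy' file technique can be sketched in Python for render scripts written in that language. The function name `render_with_busy_marker` and the `render` callback are hypothetical; in a real script the frame would come from RUSH_FRAME and the marker directory would be somewhere that survives the kill.

```python
import os
import tempfile


def render_with_busy_marker(frame, marker_dir, render):
    """Run render(frame) around a '<frame>.busy' marker file.
    Returns True if a marker left by an earlier, killed attempt was found."""
    marker = os.path.join(marker_dir, "%s.busy" % frame)
    picked_up = os.path.exists(marker)
    if picked_up:
        print("Picking up frame %s that was killed earlier" % frame)
    open(marker, "w").close()   # like touch(1)
    render(frame)               # a SIGKILL here leaves the marker behind
    os.remove(marker)           # only reached when the render completed
    return picked_up


with tempfile.TemporaryDirectory() as marker_dir:
    # Normal run: no marker beforehand, none afterwards.
    print(render_with_busy_marker("0100", marker_dir, lambda frame: None))   # False
    # Simulate an earlier bump by leaving a marker for frame 0101.
    open(os.path.join(marker_dir, "0101.busy"), "w").close()
    print(render_with_busy_marker("0101", marker_dir, lambda frame: None))   # True
```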
    

Systems Administrator Questions

What's the best way to verify all the daemons are running?

rush -ping +any

This 'pings' all the daemons in the rush hosts file with a TCP message.

If the daemon isn't running, tail(1) the daemon's log file in $RUSH_DIR/var/rushd.log.

Is there an example boot script I can use to invoke rush?

$RUSH_DIR/etc/S99rush

Is there a way to partition a network into separate render queue 'domains'?

Yes; see serverport, $RUSH_DIR/etc/rush.conf, and $RUSH_DIR/etc/hosts.

For instance, if you have a network of four hosts, A, B, C and D, and don't want A/B's render queue to communicate with the C/D machines, then configure the rush.conf file on the A/B machines:

serverport 696

..and on the C/D machines:

serverport 10002

By doing this, the A/B hosts' render queue will not communicate with the C/D render queue, and vice versa. Both the daemons and the rush(1) client will refer to these values automatically.

You can use any port numbers, provided they don't conflict with existing networking protocols. Be sure to reserve both sets of numbers you use in your /etc/services file, for documentation purposes.

How do I update changes to the rush hosts file (or rush.conf file) to the network?


    # SEND A NEW rush.conf
    foreach i ( `awk '/^[a-z]/{print $1}' /usr/local/rush/etc/hosts` )
       rdist -c /usr/tmp/newconf ${i}:/usr/local/rush/etc/rush.conf
    end

    # SEND A NEW RUSH hosts
    foreach i ( `awk '/^[a-z]/{print $1}' /usr/tmp/newhosts` )
       rdist -c /usr/tmp/newhosts ${i}:/usr/local/rush/etc/hosts
    end

NOTE: When sending out new files, you must use rdist(1), and not cp(1) or rcp(1). rdist(1) uses a special 'tmp-file/rename' technique that prevents the daemon from parsing the file before it's finished being written.

Command Reference

Submit Command Reference

Rush Command Line

rush -ac <cpuspec..> [jobid..]

rush -af <framerange..> [jobid..]

rush -an <hostname|+group..> [jobid..]

rush -autodump <off|done|donefail> [jobid..]

rush -checkconf <filename>

rush -checkhosts <filename>

rush -cont [jobid..]

rush -criteria 'criteria strings' [jobid..]

rush -deltaskfu [..]

rush -dexit [remotehost..]

rush -dexitnow [remotehost..]

rush -dlog [remotehost..]

rush -done <framerange|framestate..> [jobid..]

rush -donemail user[@domain.com[,user..]] [jobid..]

rush -dump [jobid..]

rush -end [jobid..]

rush -fail <framerange|framestate..> [jobid..]

rush [operation] -fu

rush -getoff [remotehost..]

rush -hold <framerange|framestate..> [jobid..]

rush -jobnotes '<notes>' [jobid..]

rush -lac [hostname..]

rush -laj

rush -lajf

rush -lc [jobid..]

rush -lcf [jobid..]

rush -lf [jobid..]

rush -lff [jobid..]

rush -lfi [jobid..]

rush -lj [remotehost..]

rush -ljf [jobid..]

rush -notes <framerange>:'notes..' [jobid..]

rush -offline [remotehost|+group..]

rush -online [remotehost|+group..]

rush -pause [jobid..]

rush -ping [remotehost|+group..]

rush -priority [jobid..]

rush -que <framerange|framestate..> [jobid..]

rush -ram <ramval> [jobid..]

rush -rc <cpuspec|tidspec|hostname> [jobid..]

rush -reorder <framerange..> [jobid..]

rush -rf <framerange..> [jobid..]

rush -rn <hostname|+group..> [jobid..]

rush -rotate [remotehost|+group..]

rush -status [-s secs] [-c count] [remhost ..]

rush -submit [remotehost]

rush -tasklist [remotehost..]

rush -title <text> [jobid..]

rush -trs

rush -tss

rush -uping [-c count] [remotehost..]
