Use $RUSH_PADFRAME; rush creates it for you automatically, padded to 4 digits. To do your own custom frame number padding, use this unix technique:
set padframe = `perl -e 'printf("%04d",$ENV{RUSH_FRAME});'`
To use different padding widths, just change the '4' (in '%04d') to a different number.
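If your render script is a Bourne shell (sh) script rather than csh, here's a minimal sketch of the same padding done with printf(1) instead of starting perl. The function name pad_frame is just for illustration:

```shell
#!/bin/sh
# Zero-pad a frame number to a given width, e.g. pad_frame 4 7 prints 0007.
#   $1 = padding width (the '4' in '%04d')
#   $2 = frame number (e.g. $RUSH_FRAME)
pad_frame() {
    width=$1
    frame=$2
    printf "%0${width}d\n" "$frame"
}

# Typical use inside a render script:
#   padframe=$(pad_frame 4 "$RUSH_FRAME")
```

As with the perl version, change the first argument to get a different padding width. Numbers already wider than the width are printed unchanged.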
The most common problem is a render script that does not properly handle exit codes. Make sure your render script returns an appropriate render script exit code: 0=OK, 1=FAIL, 2=RETRY.
Also, check the frame logs generated by your render script. Frame logs contain the error messages for each rendered frame, which should help you determine the problem. Make sure your submit script has logdir pointing to a valid directory; that is where your frame logs will be found.
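Here's a rough sh sketch of the tail end of a render script that turns the renderer's result into one of the three rush exit codes. The helper name rush_exit_code, the renderer name 'myrenderer' and the image path are placeholders, and the policy shown (retry when the renderer claims success but left no image, fail on any renderer error) is only an example; adjust it to however your renderer actually reports problems:

```shell
#!/bin/sh
# Hypothetical helper: pick the rush exit code for a frame.
#   $1 = exit status of the renderer command
#   $2 = image file the renderer was supposed to write
# Returns 0 (OK), 1 (FAIL) or 2 (RETRY). The policy below is only an example.
rush_exit_code() {
    status=$1
    image=$2
    if [ "$status" -eq 0 ] && [ -s "$image" ]; then
        return 0    # renderer succeeded and wrote a non-empty image
    fi
    if [ "$status" -eq 0 ]; then
        return 2    # renderer claims success but left no image: retry
    fi
    return 1        # renderer reported an error: fail the frame
}

# At the end of a render script ("myrenderer" is a placeholder):
#   myrenderer scene.file "$RUSH_FRAME"
#   rush_exit_code $? "/images/foo.$RUSH_PADFRAME.tif"
#   exit $?
```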
See Retrying Frames.
Use 'rush -lc' and check the NOTES column for messages. If you know the remote cpus aren't just busy with other jobs, the NOTES column may tell you why your cpus are being rejected.
For example, the job might be paused, there might be no more frames to render, or none of the available machines may have as much ram as your job needs. Here are some typical situations:
[erco@howland]% rush -lc
CPUSPEC[HOST]  STATE FRM PID JOBTID ELAPSED  NOTES
placid=3@100k  Idle  -   -   1      00:04:37 Job state is 'Pause'
tahoe=1@1      Idle  -   -   2      00:02:08 No more frames
superior=1@1   Idle  -   -   3      00:02:08 Not enough ram
waccubuc=1@1   Idle  -   -   4      00:02:08 This is a 'neverhost'
ontario=1@1    Idle  -   -   5      00:02:08 Failed 'criteria' check
Use the Criteria submit script command. This command lets you build a list of platforms, operating systems, or other general criteria to limit which machines will run your renders.
You can see the different criteria names in the output of 'rush -lah'. It is up to your sysadmin to maintain the criteria names.
With clever scripting. Sometimes it pays to render several frames at a time rather than one at a time, to decrease the amount of time the renderer spends loading files. See Batching Multiple Frames for how to render several frames at a time.
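A minimal sh sketch of the frame arithmetic involved, assuming (hypothetically) that the job's frame list only holds the first frame of each batch, e.g. 1, 5, 9, ... for batches of 4, so each invocation renders $RUSH_FRAME plus the next few frames, clipped to the last frame of the shot. The helper name batch_frames is made up; see Batching Multiple Frames for how rush itself expects this to be set up:

```shell
#!/bin/sh
# Hypothetical helper: print the frames one render script invocation
# should handle when frames are batched.
#   $1 = first frame of this batch (e.g. $RUSH_FRAME)
#   $2 = number of frames per batch
#   $3 = last frame of the shot (the batch is clipped to this)
batch_frames() {
    first=$1
    batch=$2
    last=$3
    f=$first
    while [ "$f" -lt $((first + batch)) ] && [ "$f" -le "$last" ]; do
        echo "$f"
        f=$((f + 1))
    done
}

# Example: render every frame in this batch with one renderer session.
#   for f in $(batch_frames "$RUSH_FRAME" 4 100); do ... ; done
```

With RUSH_FRAME set to 5 and a batch size of 4, the loop would visit frames 5 through 8.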
If you have existing script filters which monitor the progress of renders to determine which frames are rendering, you can probably easily modify these scripts to work with rush to reflect changes in the frame list, using either frame notes (rush -notes) or frame state change operations (rush -que/rush -done).
For a job to bump another off a cpu, these things must be true:
- A job can only bump jobs of lower priority (ie. not the same priority).
- A job can't be bumped if its almighty flag ('a') is set.
- A job can't be bumped unless its entry in 'rush -tasklist' is either in the Avail or Run state.
When a frame is bumped, the bumped frame will show a message in its frame list indicating the job that bumped it, e.g.:
% rush -lf erie-790
STAT FRAME TRY HOSTNAME PID   ELAPSED  NOTES
Run  0100  0   tahoe    10290 00:00:26
Run  0101  0   tahoe    10291 00:00:26
Que  0102  1   tahoe    10292 00:00:09 Bumped by ralph's superior-791,KILLER @300ka
Que  0103  0   -        0     00:00:00
[..]
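To make the three bump rules concrete, here's an illustrative sh function that applies them. It is not how rush implements bumping; it assumes larger priority numbers mean higher priority, that job flags are given as letters (like the 'ka' after the priority in '@300ka'), and it reads the third rule literally as applying to the job being bumped. The name can_bump is hypothetical:

```shell
#!/bin/sh
# Illustration of the three bump rules; NOT rush's actual implementation.
# Assumes larger priority numbers mean higher priority.
#   $1 = priority of the job that wants the cpu
#   $2 = priority of the job currently on the cpu
#   $3 = flags of the job currently on the cpu (e.g. "ka"; 'a' = almighty)
#   $4 = tasklist state of the job currently on the cpu (Avail, Run, ...)
# Returns 0 if the bump is allowed, 1 if not.
can_bump() {
    new_pri=$1
    old_pri=$2
    old_flags=$3
    old_state=$4
    # Rule 1: only strictly lower priority jobs can be bumped.
    [ "$new_pri" -gt "$old_pri" ] || return 1
    # Rule 2: a job with the almighty flag can't be bumped.
    case "$old_flags" in
        *a*) return 1 ;;
    esac
    # Rule 3: the tasklist entry must be in the Avail or Run state.
    case "$old_state" in
        Avail|Run) return 0 ;;
    esac
    return 1
}
```

Note the strict '-gt' comparison: two jobs of the same priority never bump each other.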
You can use eval `submit` to set RUSH_JOBID automatically, or a simple alias to set it manually, though cutting and pasting the setenv command is not so hard. Some people like to use this alias to make it easy to set new jobid variables:
Then you can use it on the command line to set one or more jobids:
If you want to have the RUSH_JOBID variable set automatically in your shell whenever you invoke your submit script, then use 'eval':
..the shell automatically parses the 'setenv RUSH_JOBID' command that rush prints on stdout when a job is successfully submitted. Error messages are not affected by 'eval', so you don't have to worry about losing error messages when using this technique.
What does 'rush' stand for?
Rush is not an acronym, though surely there are some TDs that would like to think it stands for "Render, yoU f*cking piece of SH*t".
Can my render script detect being 'bumped' by higher priority jobs?
Not without clever scripting.
Usually the desire to do this stems from wanting to clean up left over temporary files generated by renders. In most cases, you can avoid left over files by putting temporary files in $RUSH_TMPDIR, which rush cleans automatically, even after bumps.
Bumps and dumps use SIGKILL to kill the render script and its children. This signal is NOT trappable. There's a reason:
Under many circumstances SIGTERM, the 'trappable' kill, is not effective, especially during heavy rendering; bumped frames then fail to bump, which screws up unattended use and leaves processors unproductive.
Since bumps can happen just as readily as dumps, both use SIGKILL, which is untrappable and always effective (except in pathological cases where the process is hung).
So do not expect to be able to trap interrupts to detect bumps/dumps.
If you need a way to determine whether you are re-rendering a frame that was previously killed mid-execution (ie. bumped by a higher priority job), you can put some logic into your render script:
#!/bin/csh -f
..
if ( -e /somewhere/$RUSH_FRAME.busy ) then
    echo We are picking up a frame that was killed.
    echo Do pickup stuff here..
endif

# Create a 'busy' file for this frame.
# If we are bumped, the busy file is left behind
# so that the above logic can detect it.
#
touch /somewhere/$RUSH_FRAME.busy
echo Do rendering here..
rm -f /somewhere/$RUSH_FRAME.busy
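The same busy-file logic in sh, split into two hypothetical helper functions. The marker directory must not be $RUSH_TMPDIR: rush cleans that automatically, even after bumps, so the marker would never survive long enough to be detected:

```shell
#!/bin/sh
# $1 = directory for marker files (NOT $RUSH_TMPDIR, which rush cleans)
# $2 = frame number
# Prints "resumed" if a marker from a killed run was found, else "fresh",
# then creates the marker for this run.
start_frame() {
    busy="$1/$2.busy"
    if [ -e "$busy" ]; then
        echo resumed     # a previous run of this frame was killed
    else
        echo fresh
    fi
    touch "$busy"        # left behind if this run is killed with SIGKILL
}

# Remove the marker; only reached if the render wasn't killed.
finish_frame() {
    rm -f "$1/$2.busy"
}
```

In a render script you'd call start_frame /somewhere "$RUSH_FRAME" before rendering, do your pickup work if it prints "resumed", and call finish_frame /somewhere "$RUSH_FRAME" after the render completes.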
Can I chain separate jobs together, so that one waits for the other to get done?
Yes, see the submit script command WaitFor to have a job wait for others to dump before starting.
Also, see DependOn to have a job wait for frames in another job to get done, ie. rather than waiting for the entire job to complete.
Is it possible to use negative frame numbers in rush?
No. You are evil.
If you are trying to include 'handles' and 'slates' by using negative numbers, don't.
Is there a way to see just the cpus busy running my job?
Yes. In unix:
rush -lc | grep Busy
rush -lf | grep Run
..and on WinNT, if you don't have grep(1):
rush -lc | findstr Busy
rush -lf | findstr Run
Is there a way to see what jobs a machine is busy rendering?
In unix:
rush -tasklist host | grep Busy
..and on WinNT, if you don't have grep(1):
rush -tasklist host | findstr Busy
Is there a way to requeue a busy frame for a host that is down?
If a machine goes down while rendering a frame, the frame stays in the Busy state until the machine is rebooted. Once rush realizes the remote machine rebooted, it requeues the frame.
But if the machine never reboots, the frame will stay in the Busy state indefinitely, unless you take the following action.
Assuming you're *sure* the machine is down, and not just 'slow', use the following command:
% rush -down hosta hostb
..where 'hosta' is the name of the machine that is down, and 'hostb' is the name of the machine that's the server for the job(s) with the hung frame(s).
Beware: if the remote machine is not really down, and is still running the frame, doing the above will start the frame running on another machine, and the two frames will overwrite each other.
How do I list all the machines in a hostgroup?
Just grep the output of 'rush -lah', or parse the contents of the $RUSH_DIR/etc/hosts file. For instance, to print all the hosts in the "+foo" hostgroup:
rush -lah | grep +foo
..or to precisely parse them from the hosts file with awk (which you should be able to cut and paste into a unix tcsh shell):
awk 'BEGIN { s="+foo"; } \
    { if (match($0,"^#")) next; n = split($5,arr,","); \
      for (i=1; i<=n; i++) { \
        if(arr[i]==s) { print $1; break; } \
      } \
    }' < /usr/local/rush/etc/hosts
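If you'd rather have this as a reusable command, here's the same awk logic wrapped in an sh function that takes the hostgroup and the hosts file as arguments. The function name hostgroup_members is made up; like the awk above, it assumes the hostname is field 1 and the comma separated hostgroup list is field 5 of each hosts line:

```shell
#!/bin/sh
# Print the hosts that belong to a hostgroup.
#   $1 = hostgroup name (e.g. +foo)
#   $2 = path to the rush hosts file
hostgroup_members() {
    awk -v s="$1" '
        /^#/ { next }                       # skip comment lines
        {
            n = split($5, arr, ",")         # field 5: hostgroup list
            for (i = 1; i <= n; i++)
                if (arr[i] == s) { print $1; break }
        }' "$2"
}
```

Usage: hostgroup_members +foo /usr/local/rush/etc/hosts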
Systems Administrator Questions
- What does 'rresvport(): Permission denied' mean?
- What's the best way to verify all the daemons are running?
- How do I stop/start the daemons? (Unix/NT)
- Is there an example boot script I can use to invoke rush?
- Is there a way to run 'rush -online' automatically when someone logs out?
- Is there a way to run 'rush -online' automatically when someone's screensaver pops on?
- What kinds of security issues are there with rush?
- How do I update changes to the rush hosts file (or rush.conf file) to the network?
- Is there a way to track whose jobs are bumping whose?
- Is there a way to track who's changing other people's jobs?
- Can rush be told to use a different network interface, other than the machine's hostname?
- Where can I get perl for windows?
What does 'rresvport(): Permission denied' mean?
Usually one gets this error in the context of running 'rush' from the command line:
% rush -ping tahoe: rush: rresvport(): Permission denied
Rush uses a reserved port to communicate with the daemon, and therefore needs to run SUID root.
Make sure the SUID bit is on for the rush(1) binary, and the owner is root:
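chown 0:0 /usr/local/rush/bin/rush
chmod 4755 /usr/local/rush/bin/rush
Run the chown first: on many systems (Linux included), changing a file's owner clears its set-uid bit.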
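A small sh sketch for checking that the fix took, e.g. when verifying many machines. check_suid_root is a hypothetical helper; it reads the numeric owner from 'ls -ln' so it doesn't depend on any particular stat(1) variant:

```shell
#!/bin/sh
# Check that a binary has the set-uid bit on and is owned by root (uid 0).
# Prints a message and returns 1 if either condition fails.
check_suid_root() {
    bin=$1
    if [ ! -u "$bin" ]; then
        echo "$bin: set-uid bit is not set"
        return 1
    fi
    # 'ls -ln' shows the numeric owner uid in the 3rd column.
    owner=$(ls -ln "$bin" | awk '{ print $3 }')
    if [ "$owner" != 0 ]; then
        echo "$bin: owned by uid $owner, not root"
        return 1
    fi
    return 0
}

# Usage: check_suid_root /usr/local/rush/bin/rush
```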
What's the best way to verify all the daemons are running?
Use:
rush -ping +any
This 'pings' all the daemons in the $RUSH_DIR/etc/hosts file with a TCP message.
If the daemon isn't running, tail(1) the daemon's log file in $RUSH_DIR/var/rushd.log.
How do I stop/start the daemons? (Unix/NT)
Irix:
    /etc/init.d/rush stop
    /etc/init.d/rush start
Linux/RedHat 6.x:
    /etc/rc.d/init.d/rush stop
    /etc/rc.d/init.d/rush start
Windows NT:
    NET STOP RUSHD
    NET START RUSHD
All the daemons can be stopped via:
rush -dexit +any
Is there an example boot script I can use to invoke rush?
Is there a way to run 'rush -online' automatically when someone logs out?
Yes; when a user logs out of the window manager, the sysadmin can configure the following files to run 'rush -online':
Irix:             /usr/lib/X11/xdm/Xreset
Linux/RedHat 6.x: /etc/X11/xdm/TakeConsole
A literal example of what should be added to these files would be:
/usr/local/rush/bin/rush -online
logger -t RUSH "Rush online (user logout)"
Use of logger(1) is optional; it leaves an audit trail in the syslog. Include the full path to logger(1) if security is an issue.
Is there a way to run 'rush -online' automatically when someone's screensaver pops on?
There probably is, but I don't know how to do it.
If you have any suggestions on how to do it on various platforms, please send me email.
What kinds of security issues are there with rush?
To avoid root loopholes, be sure all subdirs in the path to the setuid binaries and config files have tight permissions, eg. if rush is installed in /usr/local/rush/bin:
chown 0:0 /usr/local/rush/bin/rush \
          /usr/local/rush/bin/rushd
chmod go-w /usr \
           /usr/local \
           /usr/local/rush \
           /usr/local/rush/bin \
           /usr/local/rush/bin/* \
           /usr/local/rush/etc \
           /usr/local/rush/var \
           /usr/local/rush/var/*
chmod 4755 /usr/local/rush/bin/rush
chmod 755 /usr/local/rush/bin/rushd
By default, rush uses reserved port 696 to communicate udp/tcp packets. For secure networks, make sure users do not have access to root, to prevent renegade software from exploiting the port.
Rush daemons will not run any job as a uid or gid less than 100. You can further restrict which uids/gids rush can run processes as via UidRange and GidRange, or even ForceUid/ForceGid.
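To audit an existing install, here's a sh sketch that reports any of the listed paths that are still group or other writable. check_not_go_writable is a made-up name; it uses find(1)'s -prune so only each named path itself is checked (not its contents), and -H so a symlinked directory is judged by its target rather than by the symlink's own mode:

```shell
#!/bin/sh
# Report paths that are group- or other-writable (what 'chmod go-w' fixes).
# Returns 1 if any path is insecure or missing.
check_not_go_writable() {
    bad=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p"
            bad=1
            continue
        fi
        # -perm -020: group write bit set; -perm -002: other write bit set.
        if [ -n "$(find -H "$p" -prune \( -perm -020 -o -perm -002 \) -print)" ]; then
            echo "insecure: $p"
            bad=1
        fi
    done
    return $bad
}

# Usage: check_not_go_writable /usr /usr/local /usr/local/rush \
#            /usr/local/rush/bin /usr/local/rush/etc /usr/local/rush/var
```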
Rush daemons will only trust remote machines that are configured in their hosts file. Rush logs all connection attempts from machines not configured in the hosts file. Sysadmins can grep the rushd.log files for the string 'SECURITY' to detect security related problems.
How do I update changes to the rush hosts file (or rush.conf file) to the network?
You should use rdist(1); the daemons will pick up the changed files automatically within a minute. Here are some examples:
# SEND A NEW rush.conf
foreach i ( `awk '/^[a-z]/{print $1}' /usr/local/rush/etc/hosts` )
    rdist -c /usr/tmp/newconf ${i}:/usr/local/rush/etc/rush.conf
end

# SEND A NEW RUSH hosts
foreach i ( `awk '/^[a-z]/{print $1}' /usr/tmp/newhosts` )
    rdist -c /usr/tmp/newhosts ${i}:/usr/local/rush/etc/hosts
end
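For sites that script in sh rather than csh, here's a sketch of the same host loop. rush_hosts is a hypothetical helper that uses the same awk filter as the csh examples (lines starting with a lowercase letter, which skips comments and blank lines):

```shell
#!/bin/sh
# Print the hostname (1st field) of every hosts-file line that starts
# with a lowercase letter.
#   $1 = path to a rush hosts file
rush_hosts() {
    awk '/^[a-z]/ { print $1 }' "$1"
}

# Send a new rush.conf to every host, as in the csh example:
#   for i in $(rush_hosts /usr/local/rush/etc/hosts); do
#       rdist -c /usr/tmp/newconf "${i}:/usr/local/rush/etc/rush.conf"
#   done
```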
NOTE: When sending out new files, you must use rdist(1), and not cp(1) or rcp(1). rdist(1) uses a special 'tmp-file/rename' technique that prevents the daemon from parsing the file before it's finished being written.
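The same 'write to a temp file, then rename' idea is presumably worth using when you edit the master copy of the hosts file or rush.conf on the local machine. Here's a minimal sh sketch; atomic_install is a hypothetical name, and it only covers a single machine, not distribution over the network:

```shell
#!/bin/sh
# Replace a file so readers see either the old or the complete new contents.
#   $1 = file with the new contents
#   $2 = file to replace
# The temp file lives in the same directory as the target, so it is on the
# same filesystem and mv(1) can do a rename(2) rather than a copy.
atomic_install() {
    src=$1
    dst=$2
    tmp="$dst.tmp.$$"
    cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }
    mv -f "$tmp" "$dst"
}

# Usage: atomic_install /usr/tmp/newhosts /usr/local/rush/etc/hosts
```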
Is there a way to track whose jobs are bumping whose?
Grep the $RUSH_DIR/var/rushd.log file for BUMP messages.
Is there a way to track who's changing other people's jobs?
Grep the $RUSH_DIR/var/rushd.log file for SECURITY messages.
Can rush be told to use a different network interface, other than the machine's hostname?
Yes. In the rush hostlist, the hostname can actually be a pair of hostnames separated by a ':', eg. tahoe:tahoe-eth.
The name on the left of the ':' is the familiar hostname(1) of the machine, and the name that follows the ':' is the alternate network interface you want to use.
See also the Hosts File: Hostname section on the hostname field.
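A tiny sh sketch of how such an entry splits apart, using only POSIX parameter expansion. split_host_entry is hypothetical, and treating an entry without a ':' as using the plain hostname for both is an assumption, not something stated above:

```shell
#!/bin/sh
# Split a hosts-file hostname entry like "tahoe:tahoe-eth" into
# "<hostname> <interface>".
#   $1 = hostname field from the rush hosts file
split_host_entry() {
    entry=$1
    host=${entry%%:*}               # text before the first ':'
    case "$entry" in
        *:*) iface=${entry#*:} ;;   # text after the first ':'
        *)   iface=$host ;;         # assumed: no ':' means same name
    esac
    echo "$host $iface"
}
```

For example, split_host_entry tahoe:tahoe-eth prints "tahoe tahoe-eth".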
Where can I get perl for windows?
It is highly recommended you use ActiveState Perl.
It's definitely the best: well integrated with and documented specifically for the windows platform, highly cross-platform compatible, and shipped with excellent windows-specific modules and many of the standard internet modules, including Mail/FTP/NNTP, etc.
I've personally tested and used it extensively in various production environments and found it to be the most stable perl available.
It's a free download.