On 11/03/11 11:03, Lutz Paelike wrote:
> we sometimes have MaxTime failures in our rush queue, and the frames
> are then killed after MaxTime is reached. This is fine, but still every
> frame is rendered, reaches MaxTime, and is finally killed.
> We would like to monitor a job, and if more than, let's say, 5 frames
> are killed due to MaxTime, the job (or a series of jobs) should be
> skipped completely and no more frames should be rendered.
For this specific set of circumstances, I'd suggest not
using MaxTime; instead, put some logic in your render
script to handle the more specific behavior you want.
For instance, I could see having your own "Render Max Time:"
field in the submit form that passes its value to the render
script, which would then fork() the render off as a child
process and monitor its execution time.
This way the script can decide whether it should kill the render,
and if so, implement its own logic to modify the job.
For instance, I could see logic that adds a job remark
(rush -jobremark) and frame notes (rush -notes) to tell the user
what happened, and have the script then either pause the job (rush -pause)
or fail all the Que frames (rush -fail que) so that the job
simply fails itself quickly.
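To make that concrete, here's a rough sketch in Python of such a wrapper,
assuming a unix-like render host. The RENDER_MAX_TIME environment variable
(fed from the hypothetical submit-form field) and the handle_timeout()
helper are names I made up for the example, and the rush invocations are
left as comments since the exact argument syntax belongs in the rush manual:

```python
#!/usr/bin/env python
# Sketch: render wrapper that enforces its own max render time,
# instead of relying on rush's MaxTime to do the kill.
import os
import subprocess
import sys

def run_with_maxtime(cmd, maxtime_secs):
    """Run cmd as a child process and kill it if it runs longer than
    maxtime_secs. Returns (timed_out, exit_code)."""
    proc = subprocess.Popen(cmd)
    try:
        return (False, proc.wait(timeout=maxtime_secs))
    except subprocess.TimeoutExpired:
        proc.kill()      # our own kill, so the script *knows* it was a timeout
        proc.wait()
        return (True, -1)

def handle_timeout():
    # On a timeout, modify the job as described above. The exact
    # argument order is omitted here -- see the rush manual:
    #   rush -jobremark ...   (tell the user what happened)
    #   rush -notes ...       (leave a per-frame note)
    #   rush -fail que ...    (fail the Que frames so the job fails fast)
    # ..or alternatively, pause the job with: rush -pause ...
    pass

if __name__ == "__main__" and len(sys.argv) > 1:
    # RENDER_MAX_TIME would come from the "Render Max Time:" submit field.
    maxtime = int(os.environ.get("RENDER_MAX_TIME", "3600"))
    timed_out, rc = run_with_maxtime(sys.argv[1:], maxtime)
    if timed_out:
        handle_timeout()
        sys.exit(1)
    sys.exit(rc)
```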
> Because we usually chain several jobs together with the WaitFor command,
> a single job with 100 frames reaching MaxTime blocks the render farm for
> several hours, which is mostly a problem at night when the farm is not
> watched.
If you used the above technique to 'Fail' all the Que frames,
then the job would quickly fail itself, allowing the
other waitfor jobs to start running.
Just curious though: are you using 'waitfor' to simulate
a FIFO queue? If so, did you rule out using rush's FIFO
scheduling? (e.g. 'sched fifo' in the rush.conf file)
Perhaps that's not what you need, but since it sounds like
you want the other jobs to continue if this one keeps hanging,
then I imagine the jobs really shouldn't be dependent on
each other, and perhaps just FIFO scheduled..
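For reference, FIFO scheduling would just be a one-line setting; a sketch
of the relevant rush.conf fragment, assuming the directive is written as
quoted above (check the rush documentation for the exact form):

```
# rush.conf -- schedule jobs first-in, first-out
sched fifo
```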
> A solution would be to have something like a TimeOutCommand that
> calls a script that can take appropriate action (this would be on a
> per-frame basis), or even better, a general StatusCommand that could be
> called for every frame, or for every job, and additional information
> could be passed via environment variables.
I think this kind of thing is best done as logic in the
script itself; background the command, and monitor its
execution time.. if it exceeds the max, the script can
choose what to do.
> Since the killing of the process is initiated by rush, my custom render
> script can not detect that it was killed because it reached MaxTime.
Right -- a good reason not to use MaxTime in this case,
and to use the above approach instead, I would think.
> The only solution I can think of right now is to go through every job in
> the queue and check the log files for any MAXTIME entries.
I once investigated trying to make a 'callback option'
for maxtime so that when it expires, a script could be
run to do post-kill logic.. but I soon realized there
would need to be all kinds of options to do what someone
would want: run the script BEFORE the kill occurs, or
AFTER it occurs, or have the script decide whether to
kill it or not, etc.
Seemed best to implement such things in the script itself.
--
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed) ext.23
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)