From: Lutz Paelike <lp@(email surpressed)>
Subject: How to detect and handle Frame MaxTime failures
Date: Thu, 03 Nov 2011 14:03:54 -0400
Msg# 2142
Hi,

we sometimes get MaxTime failures in our rush queue, and the frames are then killed after MaxTime is reached. This is fine, but every frame is still rendered, reaches MaxTime, and is finally killed.

We would like to monitor a job, and if more than, let's say, 5 frames are killed due to MaxTime, the job (or a series of jobs) should be skipped completely and no more frames should be rendered.

Because we usually chain several jobs together with the WaitFor command, a single job with 100 frames reaching MaxTime blocks the render farm for several hours, which is mostly a problem at night when the farm is not watched.

A solution would be to have something like a TimeOutCommand that calls a script that can take appropriate action (this would be on a per-frame basis), or even better, a general StatusCommand that could be called for every frame or for every job, with additional information passed via environment variables.

Since the killing of the process is initiated by rush, my custom render script cannot detect that it was killed because it reached MaxTime.

The only solution I can think of right now is to go through every job in the queue and parse the log files for any MAXTIME entry.

Am I missing something here, or what would be the best approach for this problem?

Cheers,
Lutz Paelike
Pipeline Supervisor
D-Facto-Motion GmbH
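The log-scanning workaround described above could look roughly like the following Python sketch. It assumes one log file per frame in a single directory and that a timed-out frame's log contains the literal string MAXTIME; the directory layout and the exact wording of the log entry are assumptions, not something the thread confirms, and the function names are hypothetical.

```python
from pathlib import Path


def count_maxtime_frames(log_dir, marker="MAXTIME"):
    """Count the log files directly under log_dir that contain `marker`.

    Assumes one log file per frame (an assumption about the log layout).
    """
    hits = 0
    for log_file in sorted(Path(log_dir).iterdir()):
        if not log_file.is_file():
            continue
        # errors="replace" keeps odd bytes in render output from raising
        if marker in log_file.read_text(errors="replace"):
            hits += 1
    return hits


def should_skip_job(log_dir, limit=5):
    """True once more than `limit` frame logs mention MAXTIME."""
    return count_maxtime_frames(log_dir) > limit
```

A watcher script could call should_skip_job() periodically for each job's log directory and act on the result; what action to take is left out here.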
From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: How to detect and handle Frame MaxTime failures
Date: Thu, 03 Nov 2011 18:58:03 -0400
Msg# 2144
On 11/03/11 11:03, Lutz Paelike wrote:
> we have sometimes some MaxTime failures in our rush queue and the frames
> are then killed after MaxTime is reached. This is fine but still every
> frame is rendered, reaches MaxTime and is finally killed.
> We would like to monitor a job and if more then, let's say 5 frames,
> are killed due to Maxtime the job (or a series of jobs) should be
> skipped completely and no more frames should be renderered.

I'd suggest that instead of using MaxTime for this specific set of circumstances, you put some logic in your script to handle the more specific behavior you want.

For instance, I could see having your own "Render Max Time:" field in the submit form that passes the value to the render script, which in turn would take this value, fork() the render off as a child, and then monitor the execution time of the render.

This way the script can decide if it should kill the render, and if so, implement its own logic to modify the job.

For instance, I could see logic that adds a job remark (rush -jobremark) and frame notes (rush -notes) to tell the user what happened, and have the script then either pause the job (rush -pause) or have it fail all the Que frames (rush -fail que) so that the job simply fails itself quickly.

> Because we usually chain several jobs together with the WaitFor command,
> a single jobs with 100 frames reaching MaxTime blocks the renderfarm for
> several hours which is mostly a problem at night when the farm is not
> watched.

If you used the above technique to 'Fail' all the Que frames, then the job would quickly fail itself, allowing the other waitfor jobs to start running.

Just curious though: are you using 'waitfor' to simulate a FIFO queue? If so, did you rule out using rush's FIFO scheduling? (eg. 'sched fifo' in the rush.conf file)

Perhaps that's not what you need, but since it sounds like you want the other jobs to continue if this one keeps hanging, I imagine the jobs really shouldn't be dependent on each other, and should perhaps just be FIFO scheduled.

> A solution would be to have something like a TimeOutCommand, that
> calls a script that can take appropriate action (This would be on a per
> frame basis), or even better a general StatusCommand that could be
> called for every frame, or for every job and additional information
> could be passed via environment variables.

I think this kind of thing is best done as logic in the script itself: background the command and monitor its execution time. If it exceeds the max, the script can choose what to do.

> Since the killing of the process is initiated by rush, my custom render
> script can not detect that it was killed because it reached MaxTime.

Right -- a good reason not to use it in this case, and to use the above instead, I would think.

> The only solution i can think of right now is to go through every job in
> the queue and parse the log files if there is any MAXTIME entry.

I once investigated making a 'callback option' for maxtime so that when it expires, a script could be run to do post-kill logic. But I soon realized there would need to be all kinds of options to do what someone would want: run the script BEFORE the kill occurs, or AFTER it occurs, or have the script decide whether to kill it or not, etc.

It seemed best to implement such things in the script itself.

--
Greg Ercolano, erco@(email surpressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)ext.23
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)
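Because each frame runs the render script separately, the "more than 5 frames timed out" decision needs some state shared between frames. One possible way to keep it, sketched in Python: each timed-out frame drops a marker file into a per-job directory, and the script counts the markers. The marker directory, the file naming, and the function names are all hypothetical. The only rush command used is 'rush -fail que' as named in the reply above; whatever extra arguments it may need (a job ID, for example) are not shown in the thread and are not guessed at here.

```python
from pathlib import Path


def record_timeout(marker_dir, frame):
    """Mark `frame` as timed out; return how many frames are marked so far.

    One file per frame means re-marking the same frame does not inflate
    the count. marker_dir is a hypothetical per-job directory that all
    render nodes can reach.
    """
    marker_dir = Path(marker_dir)
    marker_dir.mkdir(parents=True, exist_ok=True)
    (marker_dir / f"{frame}.maxtime").touch()
    return sum(1 for _ in marker_dir.glob("*.maxtime"))


def post_kill_commands(timeouts, limit=5):
    """Return the rush command lines to run after a kill (possibly none).

    Only 'rush -fail que' comes from the thread; its full argument
    syntax is an assumption left to the reader to verify.
    """
    if timeouts > limit:
        return [["rush", "-fail", "que"]]
    return []
```

The render script would call record_timeout() right after killing a render and then run each returned command, for example with subprocess.call().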
From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: How to detect and handle Frame MaxTime failures
Date: Thu, 03 Nov 2011 20:54:18 -0400
Msg# 2147
On 11/03/11 15:58, Greg Ercolano wrote:
> I'd suggest instead of using MaxTime, to handle this
> specific set of circumstances, you'd probably want to
> instead put some logic in your script to handle the
> more specific behavior you want.
>
> For instance, I could see having your own "Render Max Time:"
> field in the submit form that passes the value to the
> render script, which in turn would take this value,
> fork()s the render off as a child, and then monitors
> the execution time of the render.
>
> This way the script can decide if it should kill the render,
> and if so, implement its own logic to modify the job.

As an actual perl coding example, here's a unix-specific technique that defines a function called 'RunCommandMaxTime()' that takes two arguments: the command to run, and the max # seconds.

So calling it is as simple as:

    my $cmd = "yourcommand -arg1 -arg2 ..";   # COMMAND TO RUN
    my $maxsecs = 800;                        # HOW MANY SECONDS IS 'TOO LONG'..
    RunCommandMaxTime($cmd, $maxsecs);

What follows is the definition of that function, which you can customize to include whatever post-kill logic you want (see '# ADD POST KILL LOGIC HERE').

You could add this function to the .common.pl file, so that any of the submit scripts could use it if you wanted.

--- snip

use POSIX qw(setsid :sys_wait_h);       # setsid() and WNOHANG

# RUN A COMMAND WITH A MAXIMUM TIME
#     Unix only.
#     $1 -- command to run
#     $2 -- maximum number of seconds command should take before being killed
#
sub RunCommandMaxTime($$) {
    my ($cmd, $maxtime) = @_;
    my $starttime = time();
    my $pid = fork();
    if ( !defined($pid) ) {             # ERROR (perl's fork() returns undef on failure)
        print "ERROR: fork() failed?! $!\n";
        exit(1);
    } elsif ( $pid == 0 ) {             # CHILD PROCESS
        POSIX::setsid();                # own session, so child leads its own process group
        exec($cmd);
        print "ERROR: exec() failed: $!\n";
        exit(1);
    } else {                            # PARENT -- WATCH CHILD
        my $childpid = $pid;
        my $exitstatus = 0;
        my $killed = 0;
        while ( 1 ) {
            # WATCH THE CHILD PROCESS
            #     See if it finished, and if so, reap.
            #     If it didn't, see if maxtime expired. If so, kill and reap.
            #     Otherwise, keep waiting..
            #
            my $kid = POSIX::waitpid($childpid, WNOHANG);       # see if child finished
            if ( $kid > 0 ) { $exitstatus = $?; last; }         # finished? reap + break loop

            # SEE IF MAXTIME EXPIRED (only kill once)
            if ( !$killed && ( time() - $starttime ) > $maxtime ) {
                print STDERR "\n--- MAXTIME EXPIRED! Killing child..\n";
                kill(-9, $childpid);    # -9 means kill *process group*
                $killed = 1;
                # ADD POST KILL LOGIC HERE
            }
            sleep(1);
        }

        # CHILD FINISHED
        if ( $killed ) {
            print STDERR "--- Render took too long and was killed.\n";
            exit(1);
        }
        print STDERR "Child finished in time. EXITCODE=" . ($exitstatus >> 8) .
                     " (status=$exitstatus)\n";
    }
}

--- snip

PS. If you're instead using windows, you'd have to replace the fork()/exec() stuff with the WIN32 equivalent, which in activestate perl is possible with 'use Win32::Process;' and a combo of Win32::Process::Create() to background the child, Wait() with some number of seconds, and GetExitCode(). There's actually an example of this in .common.pl.

To handle killing the process, I would stay away from any of the win32 stuff, and simply call 'rush -fail $ENV{RUSH_FRAME}' to cause the script to commit suicide "cleanly", as the logic for getting that right is tricky to do from a script.

If I decide on vacationing in the sixth circle of hell, I can follow up with the WIN32 equivalent code.
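For comparison, the same unix-only idea (run the command in its own session, wait up to a maximum time, then kill the whole process group) can be sketched with Python's subprocess module. This is a rough analogue, not a translation of the function above: the name run_command_max_time is hypothetical, the command is passed as an argument list rather than a shell string, and the function returns a result instead of exiting so the caller can add its own post-kill logic.

```python
import os
import signal
import subprocess


def run_command_max_time(cmd, max_secs):
    """Run cmd (an argv list) and kill it if it exceeds max_secs. Unix only.

    Returns (exit_code, killed). exit_code is negative if the child died
    from a signal, which is how subprocess reports it.
    """
    # start_new_session=True makes the child call setsid(), so it leads
    # its own process group and proc.pid doubles as the group ID.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=max_secs), False
    except subprocess.TimeoutExpired:
        try:
            # Kill the child and anything it spawned in the same group
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass                        # group already gone
        proc.wait()                     # reap the child
        return proc.returncode, True
```

A caller would check the second element of the result and, if it is true, run whatever job-level cleanup it wants before exiting with a failure code.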
From: Lutz Paelike <lp@(email surpressed)>
Subject: Re: How to detect and handle Frame MaxTime failures
Date: Fri, 04 Nov 2011 08:31:12 -0400
Msg# 2149
Hey Greg,

> I'd suggest instead of using MaxTime, to handle this
> specific set of circumstances, you'd probably want to
> instead put some logic in your script to handle the
> more specific behavior you want.

OK, I will change my script as you suggested.

> If I decide on vacationing in the sixth circle of hell, I can follow up with
> the WIN32 equivalent code.

Thanks for your example script.
If I use perl I will join you on your vacation ;)

I will stick to python; these things are nicely encapsulated in the subprocess module.

Cheers,
Lutz Paelike
Pipeline Supervisor
D-Facto-Motion GmbH
From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: How to detect and handle Frame MaxTime failures
Date: Fri, 04 Nov 2011 12:18:07 -0400
Msg# 2150
On 11/04/11 05:31, Lutz Paelike wrote:
>> If I decide on vacationing in the sixth circle of hell, I can
>> follow up with the WIN32 equivalent code.
>
> Thanks for your example script.
> If i use perl i will join you on your vacation ;)

Ha, I guess I should have asked you if you were using python.

> I will stick to python, these things are nicely encapsuled in the
> subprocess module.

That's interesting; I guess you could do a non-blocking read on the subprocess.Popen() pipe, in which case that would probably work OK, because then your read loop wouldn't hang if the program stopped outputting data, so it can detect a timeout.

If you can, post a simplified version of what you come up with. If I get a chance, I'll try to post some code that does what I describe above.

I think the above technique could have been done in perl, but I didn't investigate non-blocking reads, as I knew waitpid() would work.. but that might be easier. It also gives you the option to parse the output of the render while it runs, so you can catch errors as they happen.

Be aware that when you 'kill' the render, the renderer MIGHT have started children, so you want to use a process group to be sure to kill not only the immediate child, but all its children too.

For sure, running 'rush -fail' with the value of os.environ["RUSH_FRAME"] as its argument would clean all this up for you, killing your own script as well as the render and any of its children. So if you're worried about using kill correctly, you could use that instead. (Just be sure that's the /last/ thing you do, as your script will probably be unceremoniously killed within the next fraction of a second.)

--
Greg Ercolano, erco@(email surpressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)ext.23
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)
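A rough Python sketch of the read-loop idea described above, unix only. Instead of switching the pipe to non-blocking mode, it uses select() with a timeout, which gives the same property: the loop never hangs when the render stops producing output, so the deadline is always checked. The names run_and_watch and on_line are hypothetical, and the child runs in its own session so the whole process group can be killed.

```python
import os
import select
import signal
import subprocess
import time


def _kill_group(pgid):
    """SIGKILL a whole process group, ignoring 'already gone'."""
    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass


def run_and_watch(cmd, max_secs, on_line=None):
    """Run cmd, feed each output line to on_line, kill on timeout.

    Returns (exit_code, killed).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            start_new_session=True)   # own process group
    fd = proc.stdout.fileno()
    deadline = time.monotonic() + max_secs
    buf, killed = b"", False
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:                            # max time expired
            killed = True
            _kill_group(proc.pid)
            break
        # Wait at most 1s for output so the deadline is re-checked
        ready, _, _ = select.select([fd], [], [], min(remaining, 1.0))
        if not ready:
            continue
        chunk = os.read(fd, 4096)
        if not chunk:                                 # EOF: output closed
            break
        buf += chunk
        while b"\n" in buf:                           # hand over whole lines
            line, buf = buf.split(b"\n", 1)
            if on_line:
                on_line(line.decode(errors="replace"))
    if buf and on_line:                               # trailing partial line
        on_line(buf.decode(errors="replace"))
    if not killed:
        # Output closed, but the process may still be running
        try:
            proc.wait(timeout=max(0.0, deadline - time.monotonic()))
        except subprocess.TimeoutExpired:
            killed = True
            _kill_group(proc.pid)
    proc.stdout.close()
    proc.wait()                                       # reap the child
    return proc.returncode, killed
```

The on_line callback is where render output could be parsed while the render runs, for example to spot error messages early.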