From: Dylan Penhale <dylan@(email surpressed).au>
Subject: Requeuing failed frames from a batch
   Date: Thu, 09 Mar 2006 19:04:46 -0500
Msg# 1252
I know this has been covered before, but for the life of me I can't find it.

When submitting batches of frames to render in Maya, I understand that the batch is under the control of mayabatch, with Rush just waiting for an exit code. So if mayabatch were to fail a frame for whatever reason, Rush wouldn't know which frame had failed; it would only know there was a problem because of the non-zero exit code.* We can, however, assume that if a batch fails, every frame after the point of failure fails too (it wouldn't fail a frame in the middle and then go on to render the remaining frames successfully).

*That said, we sometimes see Maya exit with a code of zero when no cameras are set in the scene and no image has been rendered. Technically there is no error, so we parse the log file for this condition and fail the batch ourselves.
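For anyone wanting to do the same, here is a minimal sketch of that kind of log check in Python. The pattern text and the function name are assumptions made for illustration; substitute whatever message actually shows up in your own Maya logs.

```python
import re

# Hypothetical pattern: replace with the text your Maya logs really contain.
ERROR_PATTERNS = [re.compile(r"no renderable cameras", re.IGNORECASE)]

def batch_exit_code(render_exit_code, log_text, patterns=ERROR_PATTERNS):
    """Return the exit code a wrapper script should report to the queue."""
    # A real failure from the renderer is passed through unchanged.
    if render_exit_code != 0:
        return render_exit_code
    # The renderer claimed success, so look for known "silent" failures.
    for line in log_text.splitlines():
        if any(p.search(line) for p in patterns):
            return 1
    return 0
```

The wrapper would run the render, capture its output, and exit with the value returned here, so the queue sees a non-zero code even when Maya itself reported zero.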

I need to find a way to re-submit/re-queue failed frames when they were submitted in batches.

My thoughts:

o Monitor the logs and append all successful frames to a temp file, then calculate the missing frames and resubmit them as a separate job. I'm not sure how this could be done, though. I thought perhaps a waitfor job submitted at the same time, but if the batch job fails then the waitfor job would never get to run.

o Monitor the logs as they are being written and log failed frames to a temp file. Then append a "check" frame at the end of the job that re-queues whatever is listed in that file. Sounds tricky too.

o Upon completion of each batch, check the logs for output lines and do a file size check on each frame. If the number of frames output doesn't equal the number of frames in the batch, then re-queue from the failed frame. This is kind of what we do already: we check the frames with an image size check, but re-queuing is the part we still need. This method only checks frames that are reported in the log file, though, so it doesn't know about possible missing frames. It also doesn't deal with a job that may hang.
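One possible way around the "only knows about frames in the log" gap is to ignore the log and walk the batch's expected frame range on disk instead. The sketch below is only that, a sketch: missing_frames, to_ranges, the frame_path callback and the min_bytes threshold are all hypothetical names chosen for illustration, and how the resulting ranges get resubmitted is left to whatever your submit script already does.

```python
import os

def missing_frames(start, end, frame_path, min_bytes=1):
    """Return frames in [start, end] whose image is absent or too small.

    frame_path is a caller-supplied function mapping a frame number
    to the expected image file path (an assumption of this sketch).
    """
    missing = []
    for frame in range(start, end + 1):
        try:
            ok = os.path.getsize(frame_path(frame)) >= min_bytes
        except OSError:
            # File does not exist or cannot be read.
            ok = False
        if not ok:
            missing.append(frame)
    return missing

def to_ranges(frames):
    """Collapse sorted frame numbers into 'a-b' style range strings."""
    ranges = []
    for f in frames:
        if ranges and f == ranges[-1][1] + 1:
            ranges[-1][1] = f  # extend the current run
        else:
            ranges.append([f, f])  # start a new run
    return ["%d" % a if a == b else "%d-%d" % (a, b) for a, b in ranges]
```

Because it checks every frame in the range, this also catches frames that never made it into the log at all. It still does nothing about a batch that hangs; that would need a timeout on the render itself.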

It's easy to re-queue the whole batch, but if only one frame has failed that re-renders all the good frames again. I wonder if anyone is running similar checks on batches, or has a clever way of dealing with failed batches?


_________________________________________

Dylan Penhale
Systems Administrator
Fuel International



