I know this has been covered before but I don't seem able to find it
for the life of me.
When submitting batches of frames to render in maya I understand that
the batch command is under the control of mayabatch, with rush
waiting for an exit code. Therefore if mayabatch were to fail a frame
for whatever reason, rush wouldn't know which frame had failed; it
would only know there was a problem due to the non-zero exit code.* We
can however assume that if the batch of frames was to fail it would
fail all frames after the failure (it wouldn't fail a frame in the
middle and then continue to render the remaining frames successfully).
*this said we sometimes see maya exit with a code of zero when no
cameras are set in the scene and no image has been rendered.
Technically there is no error, so we parse the log file for this
error and fail the batch.
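
For anyone wanting to do the same, here is a rough Python sketch of that kind of log check. It is not our production script: the function name and the failure pattern are made up for illustration, and the pattern is not actual Maya output, so substitute whatever text appears in your own logs.

```python
import re

def batch_exit_code(exit_code, log_text, failure_patterns):
    """Return the exit code the queue should see for this batch.

    A non-zero exit code from the renderer is passed through unchanged.
    A zero exit code is turned into 1 if the log matches any of the
    caller-supplied failure patterns (regular expressions).
    """
    if exit_code != 0:
        return exit_code
    for pattern in failure_patterns:
        if re.search(pattern, log_text, re.IGNORECASE):
            return 1
    return 0

# Hypothetical pattern for illustration only; use the text from your own logs.
EXAMPLE_PATTERNS = [r"no renderable camera"]
```

The wrapper script around mayabatch would then exit with the returned value, so the batch shows up as failed even though the renderer itself reported success.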
I need to find a way to re-submit/re-queue failed frames when
submitting in batches.
My thoughts:
o monitor the logs and append all successful frames to a temp file,
then calculate the missing frames and resubmit as a separate job. Not
sure how this could be done though, I thought perhaps a waitfor job
submitted at the same time, but if the batch job fails then the
waitfor job would never get to run.
o monitor the logs as they are being written and log failed frames to
a temp file. Then append a "check" frame at the end of the job to do
the re-queue function acting on that file. Sounds tricky too.
o upon completion of each batch check the logs for output lines and
do a file size check on each frame. If the number of good frames
doesn't equal the number of frames in the batch then re-queue
from the failed frame. This is kind of what we do already: we already
check the frames with an image size check, but we need to re-queue next.
This method only checks frames that are output into the log file
though, and doesn't know about possible missing frames. It also
doesn't deal with a job that may hang.
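
To make the last idea concrete, here is a minimal Python sketch that checks the expected frames on disk rather than relying on the log, so missing frames are caught too. Everything named here (bad_frames, to_ranges, the path_for_frame callback, the min_bytes threshold) is an assumption for illustration, and how the resulting ranges get re-queued is left to whatever mechanism the queue provides.

```python
import os

def bad_frames(frames, path_for_frame, min_bytes=1):
    """Return the frames whose image is missing or smaller than min_bytes.

    frames         -- iterable of frame numbers expected from the batch
    path_for_frame -- callable mapping a frame number to its image path
    """
    bad = []
    for frame in frames:
        try:
            size = os.path.getsize(path_for_frame(frame))
        except OSError:
            size = -1  # a missing or unreadable file counts as failed
        if size < min_bytes:
            bad.append(frame)
    return bad

def to_ranges(frames):
    """Collapse frame numbers into range strings, e.g. [3, 4, 5, 9] -> ['3-5', '9']."""
    ranges = []
    start = prev = None
    for f in sorted(frames):
        if start is None:
            start = prev = f
        elif f == prev + 1:
            prev = f
        else:
            ranges.append((start, prev))
            start = prev = f
    if start is not None:
        ranges.append((start, prev))
    return ["%d" % a if a == b else "%d-%d" % (a, b) for a, b in ranges]
```

If the assumption holds that everything after the first failure also fails, to_ranges() will normally return a single range running from the first bad frame to the end of the batch. Like the log-based check, this only runs once the batch has exited, so it still does nothing for a job that hangs.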
It's easy to re-queue the whole batch, but if only one frame has
failed that means re-rendering all the good frames again. I wonder if
anyone is running similar checks on batches, or has any clever way
of dealing with failed batches?
_________________________________________
Dylan Penhale
Systems Administrator
Fuel International