From: Mat X <info@matx.ca>
Subject: Nuke GenArts Sapphire renders failing
Date: Mon, 07 Feb 2011 22:53:30 -0800
Msg# 2010
I wanted to give a heads-up to anyone else having issues with GenArts Sapphire (I know of at least one other facility) and failing frames in Nuke.
The solution, for those not wanting to read through a long rambling post: set the Nuke disk cache to the Rush temp dir, and everyone lives happily ever after. For the long story version, read on...

I ran into a really weird issue when we upgraded our farm to Mac OS X 10.6.4 and our GenArts Sapphire Nuke renders started failing. We had upgraded to Sapphire v5 previously, so I didn't think that was the issue, but the errors were a mixture of "plugin not installed", "unknown plugin", "unknown command" and "corrupt nuke script". Of course I contacted The Foundry and GenArts support, but they were stumped and could not really reproduce the errors (the Foundry did release new Nuke versions which supposedly fixed some Sapphire issues, but not my failing frames).

What I tried:

- I reinstalled Sapphire - seemed to work, but would start failing again soon enough.
- I copied the Sapphire plugin bundle into the Nuke built-in plugins folder - seemed to work, but would start failing again soon enough.
- I set the SAPPHIRE_OFX_DIR and RLM_LICENSE variables in the submit script and moved the Sapphire bundle properly to our central plugin fileserver - seemed to work, but would start failing again soon enough.

In conclusion: most of my solutions "seemed to work", but would start failing again soon enough.

Then I noticed an artist clearing his local disk cache because his local renders were failing. So, on a hunch, I wrote a simple script to clear the local disk cache on the render nodes and set it up with a Rush submit-generic so the artists could get their renders to stop failing frames. And it worked: when a render started failing frames, they would run the submit-generic script and the renders would work again. The problem was that it was a manual procedure, and the artists did not find it simple enough. Fair enough; it was a workaround. But I couldn't automate it, since if one person cleared the cache on a node while other renders were running, those renders would fail frames too.
I did not want to set it up as a pre- or post-render action for that reason. The other option was to go back to rendering as unique users, instead of forcing renders to run as one user (set in rush.conf). But I had Linux, Windows and Mac renders all working as the same user, so I didn't want to change that now.

The best solution to all this was Greg Ercolano's idea to tie the Nuke temp directory to the Rush temp directory. Since each launch of the submit process brings a new Rush process with its own temp dir, that's a useful place to stash the Nuke disk cache as well. And since Rush cleans up its temp dir when it's done, there's no need to run cleanup scripts afterwards.
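A minimal sketch of the idea in Python, for illustration only: it assumes Rush exposes its per-process temp dir in a RUSH_TMPDIR environment variable (check what your Rush version actually sets), and relies on Nuke honoring NUKE_TEMP_DIR for its disk cache location.

```python
import os

def nuke_env(base_env):
    """Return a copy of base_env with Nuke's disk cache pointed inside
    Rush's per-process temp dir, so each render gets a private cache
    and Rush's normal temp-dir cleanup removes it when the job ends."""
    env = dict(base_env)
    # RUSH_TMPDIR is assumed to be the per-process temp dir Rush
    # provides; fall back to /var/tmp when testing off the farm.
    rush_tmp = env.get("RUSH_TMPDIR", "/var/tmp")
    env["NUKE_TEMP_DIR"] = os.path.join(rush_tmp, "nuke_cache")
    return env
```

The render command in the submit script would then launch Nuke with this environment (e.g. via subprocess), so no two render processes ever share a disk cache.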
From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: Nuke GenArts Sapphire renders failing
Date: Tue, 08 Feb 2011 10:43:10 -0500
Msg# 2011
From: Craig Allison <craigallison@(email surpressed)>
Subject: < MinTime then ReQueue
Date: Wed, 16 Feb 2011 12:03:56 -0500
Msg# 2019
Hey Greg,

For the last couple of years I've been using Rush to create standardised production QuickTimes from rendered frames, email the relevant groups, publish frames, move data around the network, etc., and it's been working really well overall. But I'm getting a recurring problem with the initial render, where it immediately moves to state "Done" without the render ever happening; when I click "Que", the job will then render properly without complaint.

As the time elapsed on these problem jobs shows as 00:00:00, I thought there might be a way of saying: if job < 00:00:01 then ReQueue?

Regards,
Craig

Craig Allison
Digital Systems & I/O Manager
The Senate Visual Effects
Twickenham Film Studios
St. Margarets
Middlesex TW1 2AW
+44208 607 8866
craigallison@(email surpressed)
skype: craig_9000
From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: < MinTime then ReQueue
Date: Wed, 16 Feb 2011 12:19:05 -0500
Msg# 2020
Craig Allison wrote:
> For the last couple of years I've been using Rush to create standardised
> production QuickTimes from rendered frames, email the relevant groups,
> publish frames, move data around the network etc and it's been working
> really well overall, but I'm getting a recurring problem with the
> initial render where it's immediately moving to state "Done" without the
> render ever happening, when I click "Que" the job will then render
> properly without complaint.

Hmm, can you include some more info.. when this happens, paste me:

1) The 'Frames' report for the first time it says 'Done' (but no render). I'd like to see what machine it picked up on where it quickly became "done".

1a) Is a frame log generated for that frame? If so, paste that here too.

2) The 'Jobs Full' report for this job.

3) The script that submits the job, and the script (if separate) that renders it. It's possible the script is submitting the job in such a way that the frame is forced to start in the 'Done' state on submit. (It is possible to do this.)

4) The rushd.log from the machine that took the render the first time (where the render time is 00:00:00).

> As the time elapsed on these problem jobs shows as 00:00:00 I thought
> there might be a way of saying if job < 00:00:01 then ReQueue?

You could make a done command that checks this and requeues it, but before you try covering up the problem, let's first try to determine the cause.

--
Greg Ercolano, erco@(email surpressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed) ext.23
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)
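For reference, the elapsed-time check behind the "if job < 00:00:01 then ReQueue" idea could be sketched like this. It is only the detection half; the actual requeue command and the exact report field it would parse depend on the Rush setup, and as Greg says, this covers up the symptom rather than fixing the cause.

```python
def suspiciously_fast(elapsed, min_seconds=1):
    """Return True when a frame marked Done ran for less than
    min_seconds. `elapsed` is an HH:MM:SS string, as shown in the
    elapsed-time column of the Frames report."""
    h, m, s = (int(x) for x in elapsed.split(":"))
    return (h * 3600 + m * 60 + s) < min_seconds
```

A done command could run this over each finished frame's elapsed time and requeue any frame that "finished" in zero seconds.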
From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: < MinTime then ReQueue
Date: Thu, 17 Feb 2011 12:33:34 -0800
Msg# 2022
Craig has followed up with me offline on this; it seems it might just be a small omission in the script he's working with.

The script writes out a shake file, then submits a shake job to render it.

The problem just might be an issue of a missing close() in the script: after writing the shake file, it submits the job. But without the close(), if the job picks up quickly, shake will see an empty file and finishes immediately with the frame "Done".

The intermittent behavior seems due to how quickly the job picks up; if it picks up quickly, shake sees an empty file. But if it takes an extra second or two to pick up, the submit script finishes executing, automatically closing the file and flushing it to disk.. then when the job kicks in, shake reads the proper file.

This would also explain why re-queueing the frame always renders successfully.

Adding a close() to the custom script will likely fix the problem. Otherwise, if there's still trouble, I'd suggest experimenting with sync(1) and/or fsync(2) to ensure the OS commits the file to the remote server before continuing. (But that really shouldn't be necessary.)
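The race can be sketched in Python (the language and contents of Craig's actual submit script aren't stated, and the filename below is made up): the key point is that the file must be closed before the job is submitted, not left to be closed implicitly when the script exits.

```python
import os
import tempfile

def write_shake_script(path, contents):
    """Write the shake script and close() it BEFORE submitting the job.
    Without the close(), the text can still be sitting in this process's
    write buffer when a fast render node opens the file, so shake sees
    an empty script and the frame finishes "Done" in 00:00:00."""
    f = open(path, "w")
    f.write(contents)
    f.close()  # the missing line: flush the buffer and commit the file now
    # If trouble persisted over NFS, calling os.fsync(f.fileno()) just
    # before the close() would force the data out to the server, but as
    # noted above that really shouldn't be necessary.

# Usage sketch:
path = os.path.join(tempfile.gettempdir(), "comp_v001.shk")
write_shake_script(path, "// shake script body\n")
# ...only now submit the render job that reads `path`...
```

With the close() in place, the render node sees the complete script no matter how quickly the job picks up.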
From: Craig Allison <craigallison@(email surpressed)>
Subject: Re: < MinTime then ReQueue
Date: Fri, 18 Feb 2011 07:46:14 -0500
Msg# 2023
I can confirm that the close() command has fixed the issue; haven't had a problem since. Awesome work once again, Greg! Thank you.

Craig

Craig Allison
Digital Systems & I/O Manager
The Senate Visual Effects
Twickenham Film Studios
St. Margarets
Middlesex TW1 2AW
+44208 607 8866
craigallison@(email surpressed)
skype: craig_9000

On 17 Feb 2011, at 20:33, Greg Ercolano wrote:
> [posted to rush.general]
>
> Craig has followed up with me offline on this; seems it might just be
> a small omission in the script he's working with.
>
> The script writes out a shake file then submits a shake job to render it.
>
> The problem just might be an issue of a missing close() in the script;
> after writing the shake file it submits the job. But without the close(),
> if the job picks up quickly, shake will see an empty file, and finishes
> immediately with the frame "Done".
>
> The intermittent behavior seems due to how quickly the job picks up;
> if it picks up quickly, shake sees an empty file. But if it takes
> an extra second or two to pick up, the submit script finishes executing,
> automatically closing the file, flushing it to disk.. then when the job
> kicks in, shake reads the proper file.
>
> This would also explain why re-queueing the frame always renders successfully.
>
> Adding a close() to the custom script will likely fix the problem.
> Otherwise, if there's still trouble, I'd suggest experimenting with sync(1)
> and/or fsync(2) to ensure the OS commits the file to the remote server before
> continuing. (But that really shouldn't be necessary.)