I wanted to give a heads-up to anyone else having issues with GenArts
Sapphire (I know of at least one other facility) and failing frames
in Nuke.
The solution, for those not wanting to read through a long rambling
post, is to set the nuke disk cache to the rush temp dir, and everyone
lives happily ever after.
For the long story version read on....
I ran into this really weird issue when we upgraded our farm to Mac OS
X 10.6.4 and our GenArts Sapphire Nuke renders started failing.
We had upgraded to Sapphire v5 previously so I didn't think that was
the issue, but the errors were a mixture of "plugin not installed",
"unknown plugin", "unknown command" and "corrupt nuke script".
Of course I contacted The Foundry and GenArts support, but they were
stumped and could not really reproduce the errors (the Foundry did
release new Nuke versions that supposedly fixed some Sapphire issues,
but my frames kept failing).
What I tried:
- Reinstalled Sapphire.
  - Seemed to work, but would start failing again soon enough.
- Copied the Sapphire plugin bundle into the Nuke built-in plugins folder.
  - Seemed to work, but would start failing again soon enough.
- Set the SAPPHIRE_OFX_DIR and RLM_LICENSE variables in the submit
  script and moved the Sapphire bundle properly to our central plugin
  fileserver.
  - Seemed to work, but would start failing again soon enough.
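For reference, the environment setup in the submit script looked
something like this. The plugin path and the license server are
placeholders I've made up for illustration; substitute your own
fileserver path and RLM host:

```shell
#!/bin/sh
# Hypothetical paths -- substitute your own fileserver and license host.
export SAPPHIRE_OFX_DIR="/Volumes/plugins/OFX/Sapphire"  # central Sapphire bundle (assumed path)
export RLM_LICENSE="5053@licserver"                      # port@host of the RLM server (assumed)

# The render command itself would follow, e.g.:
# nuke -x -F "$RUSH_FRAME" /path/to/shot.nk
```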
In conclusion: every one of my solutions "seemed to work", but would
start failing again soon enough.
Then I noticed an artist clearing his local disk cache because his
local renders were failing. So, on a hunch, I wrote a simple script to
clear the local disk cache on the render nodes and set it up with a
RUSH submit-generic so the artists could get their renders to stop
failing frames. And it worked: when a render started failing frames,
they would run the submit-generic script and the renders would work
again.
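The script itself was nothing more than removing the cache directory.
A self-contained sketch (the cache path here is a stand-in I created
for the demo; point it at wherever Nuke's preferences put the disk
cache on your nodes):

```shell
#!/bin/sh
# Minimal sketch of the cache-clearing script. The path below is a
# demo stand-in, not Nuke's real default -- check Preferences > Disk cache.
CACHE_DIR="${TMPDIR:-/tmp}/nuke_cache_demo"

# Simulate a stale cache, then clear it, as the submit-generic script did.
mkdir -p "$CACHE_DIR"
touch "$CACHE_DIR/stale_frame.cache"
rm -rf "$CACHE_DIR"
echo "cleared $CACHE_DIR"
```

The caveat discussed below applies: this is only safe when no other
render is using that cache on the same node.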
The problem was that it was a manual procedure, and the artists did
not find it simple enough. Fair enough; it was a workaround. But I
couldn't automate it, because if one person cleared the cache on a
node while other renders were running, those renders would fail their
frames too. For that reason I did not want to set it up as a pre- or
post-render action.
The other solution was to go back to rendering as unique users instead
of forcing all renders to run as one user (set in rush.conf). But I
had linux, windows and mac renders all working as the same user, so I
didn't want to change that now.
The best solution to all this was Greg Ercolano's idea to tie the nuke
temp directory to the rush temp directory. Since each launch of the
submit process brings up a new rush process with its own temp dir,
that is a useful place to stash the nuke disk cache as well. And rush
cleans that directory up when it's done, which removes the need to run
cleanup scripts afterwards.
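In practice that means pointing Nuke's disk cache at rush's
per-process temp dir before launching the render. A minimal sketch:
NUKE_DISK_CACHE is Nuke's environment variable for the disk cache
location, but the RUSH_TMPDIR name is my assumption here -- use
whatever variable your rush install actually sets for its temp dir:

```shell
#!/bin/sh
# Point the Nuke disk cache at rush's per-process temp dir, so rush's
# normal cleanup removes the cache when the render finishes.
# RUSH_TMPDIR is an assumed name -- check your rush environment;
# fall back to /tmp so the sketch runs standalone.
export NUKE_DISK_CACHE="${RUSH_TMPDIR:-/tmp}/nuke_disk_cache"
mkdir -p "$NUKE_DISK_CACHE"
echo "nuke disk cache: $NUKE_DISK_CACHE"

# The render command would follow, e.g.:
# nuke -x -F "$RUSH_FRAME" /path/to/shot.nk
```

Because the cache dies with the rush temp dir, no render ever sees a
cache another job has half-cleared, which is what the manual
submit-generic workaround could not guarantee.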