From: Gary Jaeger <gary@(email surpressed)>
Subject: aerender CS5 write permissions
   Date: Mon, 07 Mar 2011 12:37:23 -0500
Msg# 2041
View Complete Thread (7 articles) | All Threads
So we're having an odd thing with CS 5:

aerender Error: After Effects error: Error in output for render queue item 1, output module 1. Can not create a file in directory /foo/bar Try checking write permissions.
aerender version 10.0x458

What makes it odd is that those machines have no trouble writing files to that directory; in fact they are writing files just fine in other frame batches. It seems random: after re-queuing (sometimes more than once) the frames eventually "stick" and render out fine, often on the same machine that gave the earlier permissions error. I think it's happened to every machine we have at least once.

Does aerender prefer to not have multiple instances running at the same time? Can't quite figure out why this would happen.

. . . . . . . . . . . .
Gary Jaeger // Core Studio
86 Graham Street, Suite 120
San Francisco, CA 94129
(Tel# suppressed)
http://corestudio.com	


   From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: aerender CS5 write permissions
   Date: Mon, 07 Mar 2011 13:16:40 -0500
Msg# 2042
Gary Jaeger wrote:
> [posted to rush.general]
> 
> So we're having an odd thing with CS 5:
> 
> aerender Error: After Effects error: Error in output for render queue
> item 1, output module 1. Can not create a file in directory /foo/bar Try
> checking write permissions.
> aerender version 10.0x458
> 
> what makes it odd is that in fact those machines have no trouble writing
> files to that directory. In fact they are writing files just fine in
> other frame batches. It seems random, and re-queuing (sometimes more than
> once) it will eventually "stick" and render out fine, often with the
> same machine that gave the earlier permissions error. It appears random,
> though I think it's happened to every machine we have at least once.
> 
> Does aerender prefer to not have multiple instances running at the same
> time? Can't quite figure out why this would happen.

	Permission issues? Hmm, tell me more about your render node/file server
	config:

		1) What OS on the file server (Mac OSX 10.??, Linux, Netapp, etc)
		2) What OS on the render nodes?
		3) What file network system (NFS, SMB/CIFS, AFP, etc)

	I've seen this problem appear with AE only, where the issue is with
	the file server and not an AE problem. (AE just seems to be good
	at exacerbating it.) In particular, I've seen it when the file server
	is OSX (specifically, Snow Leopard), with both NFS and Samba.

	In the case of NFS, checking the system.log showed weird errors from,
	IIRC, gssd, that pointed to some kind of default behavior of stock OSX
	NFS which tries to involve Kerberos, and which would fail even under
	very light load, causing random perm errors. If you have OSX as your
	file server, look in your system.log (and perhaps others in /var/log)
	for OS errors around the time of the failures. (Use the 'Frames' report's
	'Start Time' column to find the approximate time the failed frame ran,
	so you can correlate.)

	In the case of SMB, check your samba logs on the server for any weird errors
	on or around the time of the failed render.

	BTW, paste the complete frame log here showing the perm error
	so we can see some context.

-- 
Greg Ercolano, erco@(email surpressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)ext.23
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)

   From: Gary Jaeger <gary@(email surpressed)>
Subject: Re: aerender CS5 write permissions
   Date: Tue, 08 Mar 2011 11:36:43 -0500
Msg# 2043
On Mar 7, 2011, at 10:16 AM, Greg Ercolano wrote:

> 1) What OS on the file server (Mac OSX 10.??, Linux, Netapp, etc)

OSX 10.6 Server

> 2) What OS on the render nodes?

10.6

> 3) What file network system (NFS, SMB/CIFS, AFP, etc)

AFP

I'll get you the log asap. 

. . . . . . . . . . . .
Gary Jaeger // Core Studio
86 Graham Street, Suite 120
San Francisco, CA 94129
415 543 8140


   From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: aerender CS5 write permissions
   Date: Tue, 08 Mar 2011 12:16:08 -0500
Msg# 2044
Gary Jaeger wrote:
> On Mar 7, 2011, at 10:16 AM, Greg Ercolano wrote:
> 
>> 1) What OS on the file server (Mac OSX 10.??, Linux, Netapp, etc)
> 
> OSX 10.6 Server
> 
>> 2) What OS on the render nodes?
> 
> 10.6

	So far, sounds like the exact combo where I've seen this be a problem.

	I assume you've been using AE for a while, and only noticed this
	recently. Did the problem show up after a recent upgrade of either
	the server to 10.6, or AE to CS5?

	By the sounds of it (random permission errors), the culprit is
	the file system or operations related to the file system
	(eg. authentication).

>> 3) What file network system (NFS, SMB/CIFS, AFP, etc)
> 
> AFP

	Hmm, first time I've heard of the issue with AFP, so there
	might be a subtle difference.

	Funny thing is, it's only been in the last few weeks this
	problem was popping up, I can't tell if it's something that
	changed in AE that exacerbates the problem, or a 10.x update.
	My bet is on a 10.x update, since the time scale is so narrow,
	and the common factor has been a 10.6 server.

	Sniff around in the server logs (and perhaps, client logs as well)
	to see if there are any complaints about the network file system.

	Given the intermittent nature of the permission errors,
	the cause is likely random behavior of either the network file system
	or related subsystems, eg. authentication (LDAP, NIS, Kerberos..)

-- 
Greg Ercolano, erco@(email surpressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)ext.23
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)

   From: Victor DiMichina <victor@(email surpressed)>
Subject: Re: aerender CS5 write permissions
   Date: Tue, 08 Mar 2011 22:31:00 -0500
Msg# 2045
Gary, I knew this sounded familiar. I have what sounds like the exact same setup as you: OS X servers, AFP shares, AE, etc.

I got those permissions problems you described, and with the help of a certain perl expert I know (cough cough...greg...cough), I put the following into my submit.afterfx.pl to parse for that error. It's since become a distant memory. I actually had this in previous versions of AE as well.

I could sit with you and discuss *many* things about aerender that bother me; it seems to get worse with each version.

# INVOKE AERENDER, CHECK FOR ERRORS
print "\nExecuting: $command\n";
$exitcode = RunCommand($command, \$errmsg);
if ( $exitcode == 9 )
{
    # VICTOR WANTS CHECK FOR "FALSE ERRORS" FROM AE
    #    "there is an annoying failure that AE exits on, and it actually does
    #    fail the frame. It's a false error, just ae being stupid. It says
    #    'Can not create a file in directory /blah/blah. Try checking write permissions.'
    #    and returns an exit code of 9. I don't want to auto-requeue on every exit code 9,
    #    just the one that says 'Try checking write permissions.'"
    my $logmsg = LogCheck("^Executing: ", ( "Try checking write permissions." ) );
    if ( $logmsg ne "" )
    {
        print STDERR "--- AE EXIT 9: FALSE ERROR DETECTED: $logmsg\n";
        system("rush -fu -notes $ENV{RUSH_FRAME}:\"RETRY: AE EXIT 9 FALSE ERROR\"");
        exit(2);         # RETRY
    }
}


Best, 

Vic


   From: Gary Jaeger <gary@(email surpressed)>
Subject: Re: aerender CS5 write permissions
   Date: Wed, 09 Mar 2011 01:21:08 -0500
Msg# 2046
oh yes, that would be a long conversation :)

thanks for that. i'll try dropping that in!

On Mar 8, 2011, at 7:31 PM, Victor DiMichina wrote:

> I could sit with you and discuss *many* things about aerender that bother me,  it seems to get worse with each version.


. . . . . . . . . . . .
Gary Jaeger // Core Studio
86 Graham Street, Suite 120
San Francisco, CA 94129
415 543 8140


   From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: aerender CS5 write permissions
   Date: Wed, 09 Mar 2011 06:51:24 -0500
Msg# 2047
Victor DiMichina wrote:
> Gary,  I knew this sounded familiar.   I have what sounds like the exact
> setup as you,  OS X servers,  AFP shares, AE,  etc.

    Thanks for the post, Victor!

    I was actually waiting to see Gary's frame log to see if the actual error
    messages had those same 'exit 9' errors.

    Do you know if you were getting these same random errors with file servers
    running Tiger? I'm curious if the problem correlates to newer OSX servers
    (10.6), or if it happened with Tiger/10.4 as well.

> I got those permissions problems you described,  and with the help of a
> certain perl expert I know (cough cough...greg...cough),   I put the
> following into my submit.afterfx.pl to parse
> for that error.    It's since become a distant memory.

    It's a reasonable workaround, but be careful:

    In cases where there might actually be a /real/ permission error,
    the logic shown would just keep retrying. But based on Victor's request,
    we coded it that way.

    The 'exit 9' might make it unique, but I don't think we tested for
    the condition of an actual perm error to see if it throws the same error code.

    You might want to have it fail after some number of retries
    so it doesn't retry forever. And perhaps a sleep(5) in there too
    so that it doesn't spin.

    If you wanted that behavior, in place of these three lines:


        print STDERR "--- AE EXIT 9: FALSE ERROR DETECTED: $logmsg\n";
        system("rush -fu -notes $ENV{RUSH_FRAME}:\"RETRY: AE EXIT 9 FALSE ERROR\"");
        exit(2);         # RETRY


    ..you could use this, which adds a retry limit, a sleep, and a final failure:
   

        if ( $ENV{RUSH_TRY} < 10 )         # limit to x10 retries
        {
            print STDERR "--- AE EXIT 9: FALSE ERROR DETECTED: $logmsg\n";
            system("rush -fu -notes $ENV{RUSH_FRAME}:\"RETRY: AE EXIT 9 FALSE ERROR\"");
            sleep(5);                      # prevent spin
            exit(2);                       # retry (up to 10 times)
        }
        print STDERR "--- AE EXIT 9: FAILING AFTER 10 RETRIES\n";
        system("rush -fu -notes $ENV{RUSH_FRAME}:\"FAIL AFTER 10 TRYS: AE EXIT 9 FALSE ERROR\"");
        exit(1);                           # fail
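
    One way to guard against retrying on a /real/ permission error is to probe
    the output directory from the submit script before deciding to retry. What
    follows is only a sketch under assumptions: ProbeWrite() is a hypothetical
    helper (not part of Rush or aerender), and the scratch directory stands in
    for the render's real output directory. Because the failures are
    intermittent, a probe that succeeds doesn't prove the earlier error was
    false; a probe that fails at least reports the OS error text that aerender
    leaves out.

```perl
#!/usr/bin/perl -w
use strict;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);

# HYPOTHETICAL HELPER (not part of Rush or aerender):
# Try to create and then remove a small scratch file in $dir.
# Returns (1, "") on success, or (0, "<what failed>: <OS error>") on failure.
sub ProbeWrite {
    my ($dir) = @_;
    my $probe = "$dir/.probe.$$." . time();
    my $fh;
    unless ( sysopen($fh, $probe, O_WRONLY | O_CREAT | O_EXCL, 0644) ) {
        return (0, "create $probe: $!");
    }
    close($fh);
    unless ( unlink($probe) ) {
        return (0, "unlink $probe: $!");
    }
    return (1, "");
}

# Demo against a scratch directory. In a submit script you would pass
# the job's actual output directory instead (an assumption on our part).
my $outdir = tempdir(CLEANUP => 1);

my ($ok, $err) = ProbeWrite($outdir);
my $msg = $ok ? "probe OK: $outdir is writable" : "probe FAILED: $err";
print "$msg\n";

# A directory that does not exist shows what a real failure looks like.
my ($ok2, $err2) = ProbeWrite("$outdir/no-such-subdir");
my $msg2 = $ok2 ? "probe OK" : "probe FAILED: $err2";
print "$msg2\n";
```

    In the retry branch, a failed probe could be treated as grounds for
    exit(1) instead of exit(2), so a genuinely unwritable directory fails
    right away rather than burning through all the retries.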


    In the cases where I've actually looked into these errors with sysadmins,
    the actual file system /was/ throwing real OS errors in the actual OS logs
    (eg. /var/log/system.log)

    In one case I helped troubleshoot where we were testing AE for this intermittent
    perm error:

----
aerender Error: After Effects error: Error in output for render queue item 2, output module 1.
                Can not create a file in directory /snowserver/foo/bar. Try checking write permissions.
----

    ..we found the following error messages in the system.log of the Snow Leopard file server
    while we had an otherwise idle network all to ourselves for our tests:

----
 Feb 23 15:32:20 snowserver gssd[35670]: Error returned by svc_mach_gss_init_sec_context:
 Feb 23 15:32:20 snowserver gssd[35670]:      Major error = 851968: Unspecified GSS failure.  Minor code may provide more information
 Feb 23 15:32:20 snowserver gssd[35670]:      Minor error = 100006:
 Feb 23 15:40:14 snowserver sshd[35675]: USER_PROCESS: 35680 ttys000
 Feb 23 15:42:23 snowserver gssd[35699]: Error returned by svc_mach_gss_init_sec_context:
 Feb 23 15:42:23 snowserver gssd[35699]:      Major error = 851968: Unspecified GSS failure.  Minor code may provide more information
 Feb 23 15:42:23 snowserver gssd[35699]:      Minor error = 100006:

----

    The timestamps correlated to the random AE perm errors.
    Googling these gssd errors suggested they were Kerberos related (even though
    in our case Kerberos was not enabled; these were static NFS mounts with local
    user accounts).
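
    Matching timestamps by eye gets tedious, so here is a rough sketch of doing
    the correlation in perl. The log lines are the ones quoted above; everything
    else is an assumption made for illustration: the failure time is made up (in
    practice it would come from the 'Frames' report), the year is assumed to be
    2011 because syslog stamps omit it, and StampToEpoch() and LinesNear() are
    hypothetical helper names.

```perl
#!/usr/bin/perl -w
use strict;
use Time::Local qw(timelocal);

my %MON = ( Jan=>0, Feb=>1, Mar=>2, Apr=>3, May=>4,  Jun=>5,
            Jul=>6, Aug=>7, Sep=>8, Oct=>9, Nov=>10, Dec=>11 );

# Convert a syslog style "Feb 23 15:32:20" stamp to epoch seconds.
# syslog omits the year, so the caller has to supply one.
sub StampToEpoch {
    my ($stamp, $year) = @_;
    my ($mon, $day, $h, $m, $s) =
        ( $stamp =~ /^\s*(\w{3})\s+(\d+)\s+(\d+):(\d+):(\d+)/ ) or return;
    return unless exists $MON{$mon};
    return timelocal($s, $m, $h, $day, $MON{$mon}, $year);
}

# Return the lines matching $pattern that are stamped within
# $window seconds of the epoch time $when.
sub LinesNear {
    my ($when, $window, $year, $pattern, @lines) = @_;
    my @found;
    foreach my $line (@lines) {
        next unless $line =~ $pattern;
        my $t = StampToEpoch($line, $year);
        next unless defined $t;
        push(@found, $line) if abs($t - $when) <= $window;
    }
    return @found;
}

# Sample input: the system.log lines quoted in this post.
my @log = (
    "Feb 23 15:32:20 snowserver gssd[35670]: Error returned by svc_mach_gss_init_sec_context:",
    "Feb 23 15:32:20 snowserver gssd[35670]:      Major error = 851968: Unspecified GSS failure.  Minor code may provide more information",
    "Feb 23 15:32:20 snowserver gssd[35670]:      Minor error = 100006:",
    "Feb 23 15:40:14 snowserver sshd[35675]: USER_PROCESS: 35680 ttys000",
    "Feb 23 15:42:23 snowserver gssd[35699]: Error returned by svc_mach_gss_init_sec_context:",
    "Feb 23 15:42:23 snowserver gssd[35699]:      Major error = 851968: Unspecified GSS failure.  Minor code may provide more information",
    "Feb 23 15:42:23 snowserver gssd[35699]:      Minor error = 100006:",
);

my $year   = 2011;                                    # ASSUMED year
my $failed = StampToEpoch("Feb 23 15:32:05", $year);  # MADE-UP failure time
my @hits   = LinesNear($failed, 60, $year, qr/ gssd\[\d+\]: /, @log);

print scalar(@hits), " gssd line(s) within 60 secs of the failed frame:\n";
print "  $_\n" foreach @hits;
```

    Pointing this at the real system.log would only mean reading the file into
    @log; the 60 second window is a guess and may need widening if the clocks
    on the server and render nodes aren't in sync.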

    Google showed many other folks were encountering this random perm behavior
    in other contexts outside render farms.

    These errors were truly random; requeuing the same job over and over, clearing
    the output directory before each test, random frames/random machines would
    have this problem. When a machine 'caught' the problem, it would have it several
    times in a row, then it would go back to working again.

    It was interesting that on the same farm, maya renders never had a problem.
    Only AE renders had the issue. So it seemed only AE caused this. But the problem
    was traceable to actual file system errors. (gssd in this case)

    In another case, it was with samba: complaints about oplocks failing, causing
    not perm errors but 'drop outs' in connections and truncated render logs.
    (This was with maya + windows farm + Snow Leopard server with a samba config)

    An interesting test with folks having this intermittent behavior from AE
    with an OSX file server would be to put the test data on a /non-OSX server/
    (eg. linux) and see if you can replicate the problem. If you can get random
    perm errors with that too, then that would nail AE being nutty. But if it
    goes away... might be the server! (Apple)
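
    A complementary check that takes AE out of the picture entirely is to
    hammer the same directory with plain file creates and see whether the OS
    ever refuses. The sketch below is hypothetical (WriteTest() is not part of
    Rush or aerender, and the file count and naming are arbitrary); it creates,
    writes, and removes a batch of small files and records the OS error text
    for anything that fails. With no argument it runs against a scratch
    directory just to show that it works.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);

# HYPOTHETICAL TEST (not part of Rush or aerender):
# Create, write, and remove $count small files in $dir.
# Returns a list of timestamped messages, one per failed step.
sub WriteTest {
    my ($dir, $count) = @_;
    my @problems;
    for my $i ( 1 .. $count ) {
        my $file = sprintf("%s/writetest.%d.%04d", $dir, $$, $i);
        my $fh;
        unless ( open($fh, ">", $file) ) {
            push(@problems, scalar(localtime) . " open $file: $!");
            next;
        }
        unless ( print {$fh} "frame $i\n" ) {
            push(@problems, scalar(localtime) . " write $file: $!");
        }
        unless ( close($fh) ) {
            push(@problems, scalar(localtime) . " close $file: $!");
        }
        unless ( unlink($file) ) {
            push(@problems, scalar(localtime) . " unlink $file: $!");
        }
    }
    return @problems;
}

# Directory to test: first command line argument, else a scratch dir.
my $dir      = @ARGV ? $ARGV[0] : tempdir(CLEANUP => 1);
my $count    = 25;
my @failures = WriteTest($dir, $count);

printf("%d failure(s) over %d files in %s\n", scalar(@failures), $count, $dir);
print "  $_\n" foreach @failures;
```

    Run from a render node against a share on the OSX server and again against
    a share on the non-OSX server, the two failure counts can be compared, and
    the timestamps on any failures can be matched against the server's logs.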

> I could sit with you and discuss *many* things about aerender that
> bother me,  it seems to get worse with each version.

    Me too, don't get me wrong ;)

    My biggest peeves: inconsistently printing filenames during rendering,
    not printing actual OS error messages, disconnecting processes from the
    process hierarchy (re-parenting AfterFx to launchd! CS4 and CS5),
    interacting with the window manager for command line rendering (!),
    ignoring frame ranges when comp names aren't specified, etc. etc.
-- 
Greg Ercolano, erco@(email surpressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)ext.23
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)