From: Dylan Penhale <dylan@(email surpressed).au>
Subject: Shake INIT_Processeses problem
   Date: Mon, 24 Jul 2006 22:58:23 -0400
Msg# 1351
Has anyone seen the following error when trying to render shake jobs through rush?

Executing: shake -exec /var/tmp/.RUSH_TMP.42/re_245_330_x005sc_F003.shk -t 26-26 -proxyscale Base -vv -cpus 2
INIT_Processeses(), could not establish the default connection to the WindowServer.
--- shake: terminated by signal 6

This is only happening on 3 machines, the others are fine.
The 3 machines are able to resolve DNS, and get the UID/GID of the submitting user.
Shake runs fine on these boxes.

I notice that this may be similar to the AE issue listed here: http://seriss.com/rush-current/issues-afterfx-6.5/index.html

Should I change the shake owner to 0:0 on the problem hosts? I can't figure out why only some boxes have the problem.


Regards

Dylan Penhale
Systems Administrator
Fuel International




   From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: Shake INIT_Processeses problem
   Date: Mon, 24 Jul 2006 23:16:20 -0400
Msg# 1352
> INIT_Processeses(), could not establish the default
> connection to the WindowServer.--- shake: terminated by signal 6

Sounds like shake is trying to access the window manager
when it shouldn't be.

The two most common causes of this:

    1) User error -- the shake file is trying to render
       to the screen, instead of rendering to a file.

    2) Bad OS library (e.g. QuickTime) loaded by shake
       that is trying to manipulate the window manager.

Regarding #1, try running the same shake command from a terminal
to see if it opens a GUI. If it does, that's the problem.

If it doesn't, then it's probably #2, which means some OSX library
(that shake is loading) is trying to access the window manager when
the library is loaded and initialized.

In the past I've seen QuickTime libraries cause this: either someone
updated the QuickTime libs from Apple and picked up a buggy version,
or the OS was recently re-installed from CDs and DIDN'T get the
latest updates from Apple.

> This is only happening on 3 machines, the others are fine.

Check the patch level of the machines (i.e. run 'sw_vers' on each box).
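
One way to compare patch levels once the 'sw_vers' output has been collected from each box (how it gets collected, e.g. over ssh, is left out here). This is only a sketch: the host names and version strings are sample values for illustration, and the function names are hypothetical.

```python
from collections import defaultdict


def parse_sw_vers(text):
    """Turn 'Key: value' lines, as printed by sw_vers, into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def group_hosts_by_version(outputs):
    """Map (ProductVersion, BuildVersion) -> sorted list of hosts."""
    groups = defaultdict(list)
    for host, text in outputs.items():
        info = parse_sw_vers(text)
        key = (info.get("ProductVersion"), info.get("BuildVersion"))
        groups[key].append(host)
    return {key: sorted(hosts) for key, hosts in groups.items()}


# Hypothetical sample data; real output would come from each render node.
sample = {
    "node01": "ProductName:\tMac OS X\nProductVersion:\t10.4.7\nBuildVersion:\t8J135\n",
    "node02": "ProductName:\tMac OS X\nProductVersion:\t10.4.7\nBuildVersion:\t8J135\n",
    "node03": "ProductName:\tMac OS X\nProductVersion:\t10.4.6\nBuildVersion:\t8I127\n",
}

for version, hosts in sorted(group_hosts_by_version(sample).items()):
    print(version, hosts)
```

Any host that lands in a group of its own is a good candidate for a closer look.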

You can probably replicate this problem by ssh'ing into the same machine
that rendered the frame and failed, and logging in as the same user the
rush render was running shake as. This user likely doesn't match the user
logged into the window manager, and thus the error about being unable to
connect to the window manager.

Shake renders should not be trying to access the window manager
unless something is wrong, i.e. #1 or #2 above.
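
A side note on the "terminated by signal 6" line in the log: signal 6 is SIGABRT, i.e. the process aborted itself. A small sketch of decoding such exit statuses, assuming the command were launched from Python via subprocess (an assumption purely for illustration; the helper name is made up):

```python
import signal


def describe_exit(returncode):
    """Describe a subprocess-style return code.

    subprocess reports death-by-signal as a negative return code,
    e.g. -6 for a process killed by signal 6.
    """
    if returncode < 0:
        return "terminated by " + signal.Signals(-returncode).name
    if returncode == 0:
        return "exited normally"
    return "exited with status %d" % returncode


print(describe_exit(-6))
```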

--
Greg Ercolano, erco@(email surpressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)

   From: Dylan Penhale <dylanpenhale@(email surpressed)>
Subject: RE: Shake INIT_Processeses problem
   Date: Tue, 01 Aug 2006 22:32:07 -0400
Msg# 1359
Thanks Greg

If I ssh into the problem machine as the user that submits the job and try
to launch shake I get:

kCGErrorRangeCheck : Window Server communications from outside of session allowed for root and console user only
INIT_Processeses(), could not establish the default connection to the WindowServer.
Abort trap

However I get that error on other machines that "are" able to render out the
frame fine.

This problem is intermittent, too. Sometimes the machine can render;
occasionally we get this:

Executing: shake -exec /var/tmp/.RUSH_TMP.42/re_245_330_x005sc_F003.shk -t 26-26 -proxyscale Base -vv -cpus 2
INIT_Processeses(), could not establish the default connection to the WindowServer.
--- shake: terminated by signal 6

I think the errors are linked, though. The user who is having the problem has
a lower ID than the others (169 compared to the usual 1000+), and I do remember
reading something about low IDs being a problem on Macs.

I will change his ID and report back.



 



   From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: Shake INIT_Processeses problem
   Date: Wed, 02 Aug 2006 00:39:45 -0400
Msg# 1360
Dylan Penhale wrote:
Should I change the shake owner to 0:0 on the problem hosts?

	Doing a chmod 4755; chown 0:0 on the shake binary will surely bypass the problem,
	similar to how that 'fixes' the problem with AfterFx.

	It's not a great solution, of course, as it makes the program
	run as root, and the files it reads/writes are accessed as root too.
	But in a production environment, a sysadmin has to 'do what you gotta do'
	to keep the production locomotive running on the track, permissions be damned.
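
To make the '4755' above concrete: the leading 4 is the setuid bit, and combined with root (uid 0) ownership it is what makes the program run as root. A sketch of checking for that combination; the helper name is hypothetical, and it works on plain mode/uid numbers rather than on a real shake binary:

```python
import stat


def is_setuid_root(mode, owner_uid):
    """True if a file with this mode and owner is setuid root."""
    is_setuid = bool(mode & stat.S_ISUID)   # the '4' in 4755
    return is_setuid and owner_uid == 0


# How 'ls -l' would display a regular file with mode 4755.
print(stat.filemode(stat.S_IFREG | 0o4755))

print(is_setuid_root(0o4755, 0))    # setuid bit set, owned by root
print(is_setuid_root(0o755, 0))     # owned by root, but no setuid bit
```

On a real file, os.stat(path).st_mode and os.stat(path).st_uid would supply the two values.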

I can't figure why only some boxes have the problem.

	I'd bet it's a library issue or plugin issue, or a combo of the two
	where some machines have different versions of libraries and/or plugins
	than others.

	In ssh, you might try ktrace'ing the binary to see if you can
	determine /which/ library is being initialized that is causing
	the problem.

	Sometimes libraries initialize right after they load, giving a
	tell-tale sign as to the problem. If you can figure out which
	lib it is, you might then be able to compare the file size or
	rev number of that lib against the working machines.
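
Once a suspect library is identified, comparing it across machines could look something like the sketch below. It assumes the size and a checksum of the file have already been gathered from each host; the host names, sizes and checksums are placeholders, not real data.

```python
from collections import Counter


def find_outliers(fingerprints):
    """Return hosts whose (size, checksum) differs from the most common one.

    fingerprints: dict mapping host name -> (size_in_bytes, checksum_string)
    """
    if not fingerprints:
        return []
    majority, _count = Counter(fingerprints.values()).most_common(1)[0]
    return sorted(h for h, fp in fingerprints.items() if fp != majority)


# Placeholder values for illustration only.
libs = {
    "node01": (1843200, "aaa111"),
    "node02": (1843200, "aaa111"),
    "node03": (1790464, "bbb222"),
    "node04": (1843200, "aaa111"),
}
print(find_outliers(libs))
```

This treats the most common version as the 'good' one, which only holds if most of the farm is healthy.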



   From: Dylan Penhale <dylanpenhale@(email surpressed)>
Subject: RE: Shake INIT_Processeses problem
   Date: Fri, 04 Aug 2006 03:41:49 -0400
Msg# 1362
I have just noticed that this is happening on some boxes that are not
rendering. We started rolling out the 102.42a6 update to a few machines
on the farm today. Do you think this is related?




   From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: Shake INIT_Processeses problem
   Date: Fri, 04 Aug 2006 04:12:29 -0400
Msg# 1364
Dylan Penhale wrote:
I have just noticed that this is happening on some boxes that are not rendering.

	Hmm, not sure I follow.

	This error shouldn't have anything to do with whether the machines
	are rendering anything.

	The error is caused by shake trying to access the window manager,
	and failing because it isn't being invoked by the same user logged
	into the window manager.

	Actually I doubt it's shake's code that's responsible for the error
	(unless the user is rendering to the screen). It's more likely that shake
	is loading a dynamic library from the OS (like the quicktime lib), and
	the library's initialization code is trying to manipulate or in some way
	access the window manager.

We have just started rolling out a few of the 102.42a6 update to
the farm today. Do you think this is related?

	No, I can't see how the Rush install could impact shake.

	You can replicate this problem with ssh entirely outside of rush,
	so the only possible way rush could affect shake is if rush manipulated
	the shake directory or the OS libraries, and it doesn't.

	Maybe I'm missing something about your question.
	By what means do you think one program could affect the other?

	The only files the rush install modifies outside of the rush
	directory are the rush boot script and the csh and sh startup files,
	where it adds rush/bin to the PATH.


   From: Dylan Penhale <dylanpenhale@(email surpressed)>
Subject: RE: Shake INIT_Processeses problem
   Date: Tue, 22 Aug 2006 01:31:19 -0400
Msg# 1377
To follow up on this issue: we found that it WAS in fact QuickTime that was
failing, but only on certain machines, and only when rendering shake scenes
containing QuickTime files. When QuickTime was unable to open an error
dialog box to inform the user of the failure, we got the error about the
window manager in the shake log, presumably because it gets written to
stderr.

Reinstalling QuickTime on the few boxes with this problem has fixed it.
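
Since the message ends up in the frame log, a quick way to find affected frames is to search the logs for it. A minimal sketch, assuming the log text has already been read into a string; the function name is made up, and the marker string is taken from the error quoted earlier in this thread:

```python
WINDOWSERVER_MARKER = (
    "could not establish the default connection to the WindowServer"
)


def has_windowserver_error(log_text):
    """True if the log contains the WindowServer connection error.

    Whitespace is collapsed first because the message may be wrapped
    across lines in the log.
    """
    flattened = " ".join(log_text.split())
    return WINDOWSERVER_MARKER in flattened


sample_log = (
    "Executing: shake -exec\n"
    "/var/tmp/.RUSH_TMP.42/re_245_330_x005sc_F003.shk -t 26-26\n"
    "INIT_Processeses(), could not establish the default\n"
    "connection to the WindowServer.--- shake: terminated by signal 6\n"
)
print(has_windowserver_error(sample_log))
```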

Thanks Greg




   From: Dylan Penhale <dylanpenhale@(email surpressed)>
Subject: RE: Shake INIT_Processeses problem
   Date: Fri, 04 Aug 2006 04:21:56 -0400
Msg# 1366
Another thing. When I kill the rshd, I notice mayabatch restarts several
times afterwards. I have to kill it about 3 times. It looks like something
is trying to relaunch it. Would the perl script that rush calls do something
like this?



   From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: Shake INIT_Processeses problem
   Date: Fri, 04 Aug 2006 05:07:10 -0400
Msg# 1367
Dylan Penhale wrote:
Another thing. When I kill the rshd..

	rshd or rushd?

	I'm guessing you mean rushd, as rush doesn't make use of rsh or rshd.

	Not sure why you're killing rushd. You should probably just requeue
	the frame via irush (or via 'rush -que') so the script and its
	process hierarchy get killed correctly.

	If you try to kill the mayabatch process, the render script will probably
	think the render failed due to an error, and retry it up to three
	times before giving up on the machine. (The user probably had "Retries: 3"
	set when they submitted the job; this retry behavior is in the render script.)
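
The retry behavior amounts to something like the following. This is not the actual Rush render script (those are perl); it is a Python sketch with made-up names, and it assumes "Retries: 3" means three re-runs after the first attempt. It shows why a render killed by hand gets relaunched until the retry count runs out:

```python
import subprocess
import sys


def render_with_retries(command, retries=3):
    """Run command; on failure, re-run it up to `retries` more times.

    Returns the number of attempts made if the command eventually
    succeeds, or None if every attempt fails.
    """
    for attempt in range(1, retries + 2):
        result = subprocess.run(command)
        if result.returncode == 0:
            return attempt
        # A render killed by hand looks like any other failure here,
        # so it is simply started again.
        print("attempt %d failed (code %d)" % (attempt, result.returncode))
    return None


# Stand-in for a render command that always fails.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
print(render_with_retries(failing, retries=3))
```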

I notice mayabatch restart several times afterwards.

	mayabatch, or shake? (This thread is about a problem with shake,
	so I guess I'm not sure how mayabatch snuck in.. maybe you're
	trying to kill other renders to see if they're affecting shake)

I have to kill it about 3 times. It looks like something
is trying to relaunch it. Would the perl that rush calls do something like
this?

	The log for the frame you're trying to kill will probably
	show the retry messages from the script.

	The way rush kills a frame is to kill the entire process group,
	starting at the perl script. So if the process tree is something
	like this:

111 rushd
      \
 112  perl /path/to/renderscript
        \
   113   maya -batch

	..then rush would invoke killpg(2) on PID 112 to kill perl and maya
	with a SIGKILL.

	Under most versions of Unix I've seen, kill(1) (ie. /bin/kill) can signal
	a process group by specifying a negative number for the PID.

	Oddly, the OSX man page for kill(1) makes no mention of process groups
	at all (!) Maybe this is another great man page omission.

	The TCSH and BASH built-in versions of kill probably support this,
	but I'm not sure.

	The easy thing to do would be to use 'rush -getoff; rush -online'
	to quickly kill any renders on the local box, regardless of platform.
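
For anyone scripting this by hand, the process group kill described above can be sketched as follows. This is not Rush's own code, just an illustration in Python of the same killpg(2) idea on a Unix-like system; the sleeping child stands in for a render script and its children:

```python
import os
import signal
import subprocess
import sys

# start_new_session=True puts the child in its own session and process
# group, so its process group ID equals its PID.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"],
    start_new_session=True,
)
pgid = os.getpgid(child.pid)

# Signal every process in that group, like killpg(2) with SIGKILL.
os.killpg(pgid, signal.SIGKILL)

child.wait()
print("child return code:", child.returncode)
```

Killing the group rather than a single PID is what keeps the render script from surviving and relaunching the renderer.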


   From: Dylan Penhale <dylanpenhale@(email surpressed)>
Subject: RE: Shake INIT_Processeses problem
   Date: Fri, 04 Aug 2006 20:53:19 -0400
Msg# 1368
You are right. I have no idea how I got started back on this old thread;
Friday night was pretty busy  :)

Sorry about this, I'll kill this thread and open another.


