From: Greg Ercolano <erco@(email suppressed)>
Subject: OSX + Xserves + 4 proc = rushd stopping?
   Date: Thu, 28 Jun 2007 15:24:55 -0400
Msg# 1591
How many of you are using 2 or 4 proc Xserves, and if so,
have you noticed problems with the rushd daemon just stopping
for no apparent reason? (i.e. 'Connection refused', and the
daemon not in the process table)

It seems possibly specific to 4 proc Xserves, but I'm
still gathering info, as this is the first I've heard of it.

This is happening with both the new and previous release,
so the fact it's showing up now seems like it might be
hardware specific, or possibly being caused by an OS update.

The crash log evidence I have so far shows it as a threading
problem in the OS, but I'm still investigating. The funny thing
is, the Unix rush daemons don't USE threads, so it's interesting
the OS is showing the daemon stopping in a threading mode.

The immediate workaround my customer is using is a cron script
that checks whether the daemon is running and restarts it if not,
which at least keeps production moving.
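
For anyone wanting something similar, here is a minimal sketch of that kind
of watchdog. It is not the customer's actual script, just one way to do it
under some assumptions: that the daemon appears in the process table under
the name "rushd", that `ps -A -o comm=` is available, and that the restart
command is passed on the command line (the script does not know how your
site starts the daemon). The function names are made up for illustration.

```python
#!/usr/bin/env python3
"""Watchdog sketch: restart a daemon if it is missing from the process table."""
import os
import subprocess
import sys

DAEMON_NAME = "rushd"  # assumed process name, as seen in the process table


def list_process_names():
    """Return the base command name of every running process."""
    out = subprocess.run(
        ["ps", "-A", "-o", "comm="],  # one command name (or path) per line
        check=True, capture_output=True, text=True,
    ).stdout
    return [os.path.basename(line.strip())
            for line in out.splitlines() if line.strip()]


def ensure_running(name, restart_cmd,
                   list_names=list_process_names, run=subprocess.call):
    """Run restart_cmd if `name` is absent; return True if a restart was attempted."""
    if name in list_names():
        return False
    run(restart_cmd)
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the restart command.
    if ensure_running(DAEMON_NAME, sys.argv[1:]):
        print(f"{DAEMON_NAME} was not running; restart attempted",
              file=sys.stderr)
```

Run from cron every few minutes, with your own restart command as the
arguments. Note this only masks the problem; it does nothing to explain
why the daemon went away.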

I treat any kind of problem with daemons as a "Big Deal", so I'm
trying to gather as much info as possible to identify and solve this
in a timely manner.


   From: "Flynn, Daniel" <Daniel.Flynn@(email suppressed)>
Subject: Re: OSX + Xserves + 4 proc = rushd stopping?
   Date: Thu, 28 Jun 2007 16:33:49 -0400
Msg# 1592


On 6/28/07 3:24 PM, "Greg Ercolano" <erco@(email suppressed)> wrote:

> [posted to rush.general]
> 
> How many of you are using 2 or 4 proc Xserves, and if so,
> have you noticed problems with the rushd daemon just stopping
> for no apparent reason? (ie. 'Connection refused', and the
> daemon not in the process table)
> 

Running 2 proc Xserve G5 cluster nodes under 10.4.7 and rush 102.42a without
a problem. 10.4.3 was good too.

-dan


   From: Brent Hensarling <brenth@(email suppressed)>
Subject: Re: OSX + Xserves + 4 proc = rushd stopping?
   Date: Thu, 28 Jun 2007 17:00:12 -0400
Msg# 1593
We have a bunch of 4 proc (dual proc, dual core) Xserves and a bunch of
2 proc (G5) Xserves running 10.4.8 and rush 102.42a8 without any issues
at all.
Thanks,
Brent
_________________________________________________
Brent Hensarling
Luma Pictures
luma-pictures.com
_________________________________________________



   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: OSX + Xserves + 4 proc = rushd stopping?
   Date: Thu, 28 Jun 2007 17:13:00 -0400
Msg# 1594
Brent Hensarling wrote:
> We have a bunch of 4 proc ( dual proc dual core) xserves and a bunch  
> of 2 proc ( G5) xserves running 10.4.8 and rush 102.42a8 without any  
> issues at all.

	Interesting; thanks.

	Client was running 10.4.9, and recently upgraded to 10.4.10.
	Problem was happening on both revs. I wonder if it's the newer
	releases that are causing this.

	Client also has large jobs (e.g. single jobs with 35,000 frames)
	and quick frame turnaround, so it might be something they're seeing
	due to the high volume.

	They're replicating the problem about 3x a day per box on
	4 machines, with both 102.42a7 and 102.42a8. Not seeing it on
	other platforms, just the 4 proc Xserves.

	I'll follow up here when their tests today are complete using
	new binaries I've provided them.

-- 
Greg Ercolano, erco@(email suppressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)