How many of you are using 2 or 4 proc Xserves, and if so, have you noticed problems with the rushd daemon just stopping for no apparent reason? (i.e. 'Connection refused', and the daemon not in the process table)

It seems possibly specific to 4 proc Xserves, but I'm still gathering info, as this is the first I've heard of it.

This is happening with both the new and previous release, so the fact it's showing up now suggests it might be hardware specific, or possibly caused by an OS update.

The crash log evidence I have so far shows it as a threading problem in the OS, but I'm still investigating. The funny thing is, the Unix rush daemons don't USE threads, so it's interesting the OS is showing the daemon stopping in a threading mode.

The immediate workaround my customer is using is a cron script that checks whether the daemon is running and restarts it if not, which at least keeps production moving.

I treat any kind of problem with daemons as a "Big Deal", so I'm trying to gather as much info as possible to identify and solve this in a timely manner.
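For anyone wanting to set up a similar stopgap, here is a rough sketch of what such a cron-driven check might look like. This is not the customer's actual script, just an illustration under some assumptions: it looks for a process named rushd in the output of the POSIX command `ps -A -o comm=`, and the restart command is whatever you pass on the command line (the script name and paths in the usage note are hypothetical placeholders).

```python
#!/usr/bin/env python3
"""Watchdog sketch: restart a daemon if it is missing from the process table."""
import os
import subprocess
import sys

DAEMON_NAME = "rushd"  # daemon name as it appears in the process table


def list_process_names():
    """Return the command name of every process, using POSIX `ps -A -o comm=`."""
    out = subprocess.run(
        ["ps", "-A", "-o", "comm="],
        check=True, capture_output=True, text=True,
    ).stdout
    # Some systems print a full path in this column, so keep only the basename.
    return [os.path.basename(line.strip())
            for line in out.splitlines() if line.strip()]


def ensure_running(name, restart_cmd,
                   list_names=list_process_names, run=subprocess.call):
    """Run restart_cmd if `name` is not in the process table.

    Returns True if a restart was attempted, False if the daemon was found.
    """
    if name in list_names():
        return False
    run(restart_cmd)
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the restart command.
    # It is supplied by the caller; nothing here assumes how rushd is started.
    if ensure_running(DAEMON_NAME, sys.argv[1:]):
        print("%s was not running; restart attempted" % DAEMON_NAME,
              file=sys.stderr)
```

A crontab entry along the lines of `*/5 * * * * /path/to/watchdog.py /path/to/your-rushd-start-command` (both paths are placeholders) would run the check every five minutes. This only masks the symptom; it does nothing about whatever is making the daemon stop.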

On 6/28/07 3:24 PM, "Greg Ercolano" <erco@(email suppressed)> wrote:

> [posted to rush.general]
>
> How many of you are using 2 or 4 proc Xserves, and if so,
> have you noticed problems with the rushd daemon just stopping
> for no apparent reason? (ie. 'Connection refused', and the
> daemon not in the process table)
>

Running 2 proc Xserve G5 cluster nodes under 10.4.7 and rush 102.42a without a problem. 10.4.3 was good too.

-dan

On Jun 28, 2007, at Thu 12:24|Jun28, Greg Ercolano wrote:

[posted to rush.general]

How many of you are using 2 or 4 proc Xserves, and if so,
have you noticed problems with the rushd daemon just stopping
for no apparent reason? (ie. 'Connection refused', and the
daemon not in the process table)

It seems possibly specific to 4 proc Xserves, but I'm
still gathering info, as this is the first I've heard of it.

This is happening with both the new and previous release,
so the fact it's showing up now seems like it might be
hardware specific, or possibly being caused by an OS update.

The crash log evidence I have so far shows it as a threading
problem in the OS, but I'm still investigating. The funny thing
is, the Unix rush daemons don't USE threads, so it's interesting
the OS is showing the daemon stopping in a threading mode.

The immediate workaround my customer is using was to create a
cron script that checks if the daemon isn't running and restarts it,
which at least keeps production moving.

I treat any kind of problems with daemons as a "Big Deal", so I'm
trying to gather as much info as possible to identify and solve this
in a timely manner.

Brent Hensarling wrote:
> We have a bunch of 4 proc ( dual proc dual core) xserves and a bunch
> of 2 proc ( G5) xserves running 10.4.8 and rush 102.42a8 without any
> issues at all.

Interesting; thanks.

Client was running 10.4.9, and recently upgraded to 10.4.10.
Problem was happening on both revs. I wonder if it's the newer
releases that are causing this.

Client also has large jobs (e.g. single jobs with 35,000 frames)
and quick frame turnaround, so it might be something they're seeing
due to the high volume.

They're replicating the problem about 3x a day per box on 4 machines
with both 102.42a7 and 102.42a8. Not seeing it on other platforms,
just the 4 proc Xserves.

I'll follow up here when their tests today are complete using
new binaries I've provided them.

-- 
Greg Ercolano, erco@(email suppressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed) Fax: (Tel# suppressed) Cel: (Tel# suppressed)