From: "Mr. Daniel Browne" <d.list@(email suppressed)>
Subject: Rush 102.42a9c and Snow Leopard
   Date: Mon, 07 Dec 2009 15:03:23 -0500
Msg# 1907
Hi Greg,

	The 102.42a9c update seems to have fixed most of our operational problems with Rush on SL (we love the new GUI look too, by the way), but there is one thing we've noticed: rush on SL machines seems to stop responding after a hosts file is pushed. The failure happens even if the file hasn't been changed. Recovering requires restarting the rush daemon, just as we had to before this release whenever rush on an SL machine stopped responding.

Thanks,

-Dan


----------
Dan "Doc" Browne
System Administrator

Evil Eye Pictures
d.list at evileyepictures.com
Office: (415) 777-0666


   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Rush 102.42a9c and Snow Leopard
   Date: Mon, 07 Dec 2009 15:52:47 -0500
Msg# 1908
Mr. Daniel Browne wrote:
> [posted to rush.general]
> 
> Hi Greg,
> 
> 	The 102.42a9c update seems to have fixed most of our operation
> problems with Rush on SL (we love the new GUI look too, by the way), but
> there is one thing we've noticed; rush on SL machines seems to stop
> responding after pushing a hosts file. The failure happens even if the
> file hasn't been changed. This requires you to restart the rush daemon,
> much like how before this release we had to do whenever rush on an SL
> machine stopped responding.

	Hmm, can you send me the contents of the rushd.log from that machine
	where the daemon stopped responding?

-- 
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)

   From: "Mr. Daniel Browne" <d.list@(email suppressed)>
Subject: Re: Rush 102.42a9c and Snow Leopard
   Date: Mon, 07 Dec 2009 17:01:26 -0500
Msg# 1909
Here you go (IP addresses removed):


12/07,00:00:04 ROTATE     rushd.log rotated. pid=203, 0/1 busy, ONLINE
12/07,00:00:04 ROTATE     wayne RUSHD 102.42a9c PID=203     Boot=12/04/09,14:07:03
12/07,09:47:26 SECURITY   Daemon changed to Offline by root@wayne[], Remark:offline by dbrowne via login script state (offline) by root@wayne[]
12/07,12:20:35 SECURITY   Daemon changed to Online by root@wayne[], Remark:online by dbrowne via logout script state (online) by root@wayne[]
12/07,12:20:36 SECURITY   Daemon changed to Getoff by root@wayne[] state (getoff) by root@wayne[]
12/07,12:20:40 INIT.D     Shutdown script '/usr/local/rush/etc/S99rush stop'
12/07,12:20:40 EXIT       Shutdown script '/usr/local/rush/etc/S99rush stop'
---------
12/07,12:21:34 INIT.D     Startup script '/usr/local/rush/etc/S99rush start'
12/07,12:21:34 INIT.D     Executing: ( /usr/local/rush/etc/S99rush waitlocal && cd /usr/local/rush/var && /usr/libexec/StartupItemContext /usr/local/rush/bin/rushd ) &
12/07,12:21:34 INIT.D     Local hostname wayne won't resolve (ping: cannot resolve wayne: Unknown host): 10 sec retries
12/07,12:21:45 LICENSE    validated with server vegas
12/07,12:21:45 LICENSE    expires 03/15/2036
12/07,12:21:45 START      wayne RUSHD 102.42a9c PID=189     Boot=12/07/09,12:21:45  Online
12/07,12:21:45 INFO       TCP listening on port 696, service 'rushd', sockfd=4
12/07,12:21:45 INFO       UDP listening on port 696, service 'rushd', sockfd=5
12/07,12:21:45 CHECKPOINT START: Loading /usr/local/rush/var/jobs-checkpoint
12/07,12:21:45 CHECKPOINT DONE
12/07,12:21:51 SECURITY   Daemon changed to Offline by root@wayne[], Remark:offline by dbrowne via login script state (offline) by root@wayne[172.21.7.9]
12/07,13:57:54 PUSH       /usr/local/rush/etc/hosts pushed from ?@vegas:52291[]
12/07,13:58:45 INFO/HOST  Reloading /usr/local/rush/etc/hosts




----------
Dan "Doc" Browne
System Administrator

Evil Eye Pictures
d.list at evileyepictures.com
Office: (415) 777-0666


   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Rush 102.42a9c and Snow Leopard
   Date: Mon, 07 Dec 2009 17:30:20 -0500
Msg# 1910
Hmm, at what point in the rushd.log did the daemon stop?

Every START entry seems to have a matching EXIT entry.
Is it possible the problem happened on a different day from today?
(e.g. if it was yesterday, paste the Orushd.log, where O = old)

It would appear there was a temporary problem with the machine's own hostname (wayne)
not being resolvable during startup, where it says:

	Local hostname wayne won't resolve (ping: cannot resolve wayne: Unknown host): 10 sec retries

The startup script won't start the rush daemon until 'ping' can resolve the local machine's
own hostname, to make sure hostname resolution is working before rush is started.
In pseudocode, basically:

	while ( ping `hostname` doesn't work )
	{
	   sleep 10
	   try again
	}
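
A minimal runnable sketch of that loop in POSIX shell, assuming `ping -c 1` is available on the machine. The function name `wait_for_local_hostname` and the `RETRY_SECS` variable are hypothetical names chosen for illustration; this is not the actual S99rush code:

```shell
#!/bin/sh
# Hypothetical helper (not taken from S99rush): block until the given
# hostname, by default the local one, can be pinged.
wait_for_local_hostname() {
    target="${1:-$(hostname)}"
    retry_secs="${RETRY_SECS:-10}"    # assumed default, mirroring the 10 sec retries in the log

    # 'ping -c 1' sends one packet and exits nonzero if the name
    # cannot be resolved or the host does not answer.
    until ping -c 1 "$target" >/dev/null 2>&1; do
        echo "Local hostname $target won't resolve: $retry_secs sec retries" >&2
        sleep "$retry_secs"
    done
}

# Usage (commented out so that sourcing this file has no side effects):
#   wait_for_local_hostname && echo "hostname resolution is working"
```

The loop only returns once the ping succeeds, so anything chained after it with `&&` starts only when name resolution is working.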

Mr. Daniel Browne wrote:
> Here you go (IP addresses removed):
> 
> 12/07,00:00:04 ROTATE     rushd.log rotated. pid=203, 0/1 busy, ONLINE
> 12/07,00:00:04 ROTATE     wayne RUSHD 102.42a9c PID=203     Boot=12/04/09,14:07:03
> 12/07,09:47:26 SECURITY   Daemon changed to Offline by root@wayne[], Remark:offline by dbrowne via login script state (offline) by root@wayne[]
> 12/07,12:20:35 SECURITY   Daemon changed to Online by root@wayne[], Remark:online by dbrowne via logout script state (online) by root@wayne[]
> 12/07,12:20:36 SECURITY   Daemon changed to Getoff by root@wayne[] state (getoff) by root@wayne[]
> 12/07,12:20:40 INIT.D     Shutdown script '/usr/local/rush/etc/S99rush stop'
> 12/07,12:20:40 EXIT       Shutdown script '/usr/local/rush/etc/S99rush stop'
> ---------
> 12/07,12:21:34 INIT.D     Startup script '/usr/local/rush/etc/S99rush start'
> 12/07,12:21:34 INIT.D     Executing: ( /usr/local/rush/etc/S99rush waitlocal && cd /usr/local/rush/var && /usr/libexec/StartupItemContext /usr/local/rush/bin/rushd ) &
> 12/07,12:21:34 INIT.D     Local hostname wayne won't resolve (ping: cannot resolve wayne: Unknown host): 10 sec retries
> 12/07,12:21:45 LICENSE    validated with server vegas
> 12/07,12:21:45 LICENSE    expires 03/15/2036
> 12/07,12:21:45 START      wayne RUSHD 102.42a9c PID=189     Boot=12/07/09,12:21:45  Online
> 12/07,12:21:45 INFO       TCP listening on port 696, service 'rushd', sockfd=4
> 12/07,12:21:45 INFO       UDP listening on port 696, service 'rushd', sockfd=5
> 12/07,12:21:45 CHECKPOINT START: Loading /usr/local/rush/var/jobs-checkpoint
> 12/07,12:21:45 CHECKPOINT DONE
> 12/07,12:21:51 SECURITY   Daemon changed to Offline by root@wayne[], Remark:offline by dbrowne via login script state (offline) by root@wayne[172.21.7.9]
> 12/07,13:57:54 PUSH       /usr/local/rush/etc/hosts pushed from ?@vegas:52291[]
> 12/07,13:58:45 INFO/HOST  Reloading /usr/local/rush/etc/hosts


-- 
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)

   From: "Mr. Daniel Browne" <d.list@(email suppressed)>
Subject: Re: Rush 102.42a9c and Snow Leopard
   Date: Mon, 07 Dec 2009 17:32:23 -0500
Msg# 1911
The machine stops responding after the last line, which corresponds to the new hosts file being acknowledged:

> 12/07,13:58:45 INFO/HOST  Reloading /usr/local/rush/etc/hosts





----------
Dan "Doc" Browne
System Administrator

Evil Eye Pictures
d.list at evileyepictures.com
Office: (415) 777-0666


   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Rush 102.42a9c and Snow Leopard
   Date: Mon, 07 Dec 2009 18:45:25 -0500
Msg# 1912
Mr. Daniel Browne wrote:
> The machine stops responding after the last line, which corresponds to
> the new hosts file being acknowledged:
> 
>> 12/07,13:58:45 INFO/HOST  Reloading /usr/local/rush/etc/hosts

	Hmm, we'll have to work on this offline; I can't seem to replicate it here.
	We'll contact each other by phone.

-- 
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)

   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Rush 102.42a9c and Snow Leopard
   Date: Mon, 07 Dec 2009 19:31:21 -0500
Msg# 1913
Greg Ercolano wrote:
> Mr. Daniel Browne wrote:
>> The machine stops responding after the last line, which corresponds to
>> the new hosts file being acknowledged:
>>
>>> 12/07,13:58:45 INFO/HOST  Reloading /usr/local/rush/etc/hosts
> 
> 	Hmm, we'll have to work offline on this; I can't seem to replicate here.
> 	Will contact each other by phone.

   OK, after a phone conversation, it seems on Snow Leopard machines
   when the rush/etc/hosts file reloads, the UDP listener is getting closed,
   and replaced with a unix domain socket.

   Since rush doesn't use unix domain sockets, this seems a bit odd.
   Also, I can't seem to replicate this on other platforms, only snow leopard.

   Whatever the case, this causes the daemon to become unresponsive to
   UDP transactions, while it still responds to TCP (e.g. 'rush -ping' still works).

   This can be seen pretty clearly with 'lsof' before and after:

BEFORE:
$ lsof | grep rushd
rushd     1447           root  cwd       DIR       14,5       374 600231 /usr/local/rush/var
rushd     1447           root  txt       REG       14,5   3086040 599807 /usr/local/rush/bin/rushd
rushd     1447           root  txt       REG       14,5   1054960  24705 /usr/lib/dyld
rushd     1447           root  txt       REG       14,5 199118848 526079 /private/var/db/dyld/dyld_shared_cache_i386
rushd     1447           root    0r      CHR        3,2       0t0    303 /dev/null
rushd     1447           root    1w      REG       14,5      3169 600232 /usr/local/rush/var/rushd.log
rushd     1447           root    2w      REG       14,5      3169 600232 /usr/local/rush/var/rushd.log
rushd     1447           root    3w      REG       14,5         5 600323 /usr/local/rush/var/.rushd.LCK
rushd     1447           root    4u     IPv4 0x040f1ef8       0t0    TCP *:rushd (LISTEN)
rushd     1447           root    5u     IPv4 0x047cc9c8       0t0    UDP snow.erco.x:rushd
                                 ^^                                  ^^^^^^^^^^^^^^^^^^^^^

AFTER:
$ lsof | grep rushd
tail      1100           root    3r      REG       14,5      3231 600232 /usr/local/rush/var/rushd.log
rushd     1447           root  cwd       DIR       14,5       374 600231 /usr/local/rush/var
rushd     1447           root  txt       REG       14,5   3086040 599807 /usr/local/rush/bin/rushd
rushd     1447           root  txt       REG       14,5   1054960  24705 /usr/lib/dyld
rushd     1447           root  txt       REG       14,5 199118848 526079 /private/var/db/dyld/dyld_shared_cache_i386
rushd     1447           root    0r      CHR        3,2       0t0    303 /dev/null
rushd     1447           root    1w      REG       14,5      3231 600232 /usr/local/rush/var/rushd.log
rushd     1447           root    2w      REG       14,5      3231 600232 /usr/local/rush/var/rushd.log
rushd     1447           root    3w      REG       14,5         5 600323 /usr/local/rush/var/.rushd.LCK
rushd     1447           root    4u     IPv4 0x040f1ef8       0t0    TCP *:rushd (LISTEN)
rushd     1447           root    5u     unix 0x037c8910       0t0        ->0x03fbf1c0
                                 ^^     ^^^^                             ^^^^^^^^^^^^
                                        Unix socket                      What the heck?

    I'll follow up when I've done more tracing to see what's causing that.
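
The before/after comparison above can also be checked mechanically. A minimal sketch, assuming the `lsof` column layout shown in the listings (command name in the first field, protocol in the eighth); the function name `rushd_has_udp` is hypothetical and not part of Rush:

```shell
#!/bin/sh
# Hypothetical helper (not part of Rush): reads 'lsof' output on stdin
# and succeeds only if some rushd line shows a UDP socket.
# Assumes the column layout in the listings above:
#   COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
# so for IPv4 sockets the eighth field holds the protocol (TCP or UDP).
rushd_has_udp() {
    awk '$1 == "rushd" && $8 == "UDP" { found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Usage (commented out so that sourcing this file has no side effects):
#   lsof | rushd_has_udp || echo "rushd no longer has a UDP socket"
```

Run against the BEFORE listing the function succeeds; against the AFTER listing it fails, because descriptor 5 has become a unix socket and no rushd line carries UDP in that field.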

   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Rush 102.42a9c and Snow Leopard
   Date: Tue, 08 Dec 2009 01:39:08 -0500
Msg# 1914
> Greg Ercolano wrote:
>> Mr. Daniel Browne wrote:
>>> The machine stops responding after the last line, which corresponds to
>>> the new hosts file being acknowledged:
> 
>    OK, after a phone conversation, it seems on Snow Leopard machines
>    when the rush/etc/hosts file reloads, the UDP listener is getting closed,
>    and replaced with a unix domain socket.
> 
>    Since rush doesn't use unix domain sockets, this seems a bit odd.
> 
>    Also, I can't seem to replicate this on other platforms, only snow leopard.


   Indeed, the same code running on Tiger and Leopard is fine,
   so this seems to be specific to Snow Leopard.

   I think I've found the solution now -- running some tests.

   Dan, I'll have a new binary for you Tuesday to try.

   Seems to have to do with Snow Leopard's changes to hostname resolution
   that use unix domain sockets.

   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Rush 102.42a9c and Snow Leopard
   Date: Fri, 11 Dec 2009 23:57:14 -0500
Msg# 1915
Looks like Daniel has confirmed the fix, so I bumped the OSX version
number to 102.42a9d (emphasis on "d") which has that fix, and modified
the 102.42a9c release page so that downloading the mac version gets you
the 'd' release binary with the fix.

So if you want the 'd' fix release, go to the public upgrade page
and download the Mac version. (ie. go to http://seriss.com/rush/
and click on the red upgrade monkeys)

Greg Ercolano wrote:
>> Greg Ercolano wrote:
>>> Mr. Daniel Browne wrote:
>>>> The machine stops responding after the last line, which corresponds to
>>>> the new hosts file being acknowledged:
>>    OK, after a phone conversation, it seems on Snow Leopard machines
>>    when the rush/etc/hosts file reloads, the UDP listener is getting closed,
>>    and replaced with a unix domain socket.
>>
>>    Since rush doesn't use unix domain sockets, this seems a bit odd.
>>
>>    Also, I can't seem to replicate this on other platforms, only snow leopard.
> 
> 
>    Indeed, the same code running on Tiger and Leopard is fine,
>    so this seems to be specific to Snow Leopard.
> 
>    I think I've found the solution now -- running some tests.
> 
>    Dan, I'll have a new binary for you Tuesday to try.
> 
>    Seems to have to do with Snow Leopard's changes to hostname resolution
>    that use unix domain sockets.