From: Greg Ercolano <erco@(email suppressed)>
Subject: OSX 10.3.9 + Rush 102.42a -- problems with rush taking up to 15 mins
Date: Sat, 19 Nov 2005 10:19:50 -0800
Msg# 1116

Problem Description
-------------------
Specific to Rush on OSX machines (seen on 10.3.9 with 102.42a).

Customer reports that after rebooting, Rush would be stuck for about
15 minutes trying to access the license server, repeating in the rushd.log:

    11/18,15:41:16 LICENSE select() on connect(): Connection refused
    11/18,15:41:16 LICENSE no servers could validate license (30 sec retries)

...then finally, after exactly 15 minutes, it would suddenly kick in by itself:

    11/18,15:56:16 LICENSE validated with server CGISVR1    <--
    11/18,15:56:16 LICENSE expires 08/04/2032
    11/18,15:56:16 START r120 RUSHD 102.42 PID=377 Boot=11/18/05,15:41:16 Online
    11/18,15:56:16 INFO TCP listening on port 696, service 'rushd', sockfd=5
    11/18,15:56:16 INFO UDP listening on port 696, service 'rushd', sockfd=6

This 15 minute delay prevents users from being able to submit jobs from
that machine until the daemon kicks back in.

Cause
-----
It was determined the problem is in the OS. At boot time the rushd service
has a boot script dependency on the "Resolver" service, so as to not start
before name lookups are working properly.

What was happening is OSX would start "lookupd", then tell rush to start
before lookupd was fully working.

Debugging
---------
We were able to verify name lookups were not working yet when the rush
boot script ran, by adding 'ping -c 1 <local_hostname>' commands to the
rush boot script. ping reported 'unknown host', making it clear OSX was
prematurely invoking the rush boot script, regardless of the dependency.
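
For anyone wanting to repeat the check described under Debugging, here is a minimal sketch of that kind of probe. It is not the exact code that was added to the customer's boot script; the function name check_lookup and the LOOKUP_CMD variable are made up for illustration, and the default simply reuses the one-packet ping mentioned above.

```shell
#!/bin/sh
# Hypothetical debugging helper, NOT part of the stock Rush boot script.
# LOOKUP_CMD is an illustrative variable; it defaults to the one-packet
# ping used in the thread.
LOOKUP_CMD=${LOOKUP_CMD:-"ping -c 1"}

# Print one timestamped line saying whether the probe of the given
# host name succeeded or failed.
check_lookup() {
    host=$1
    stamp=$(date '+%m/%d,%H:%M:%S')
    # LOOKUP_CMD is left unquoted on purpose so "ping -c 1" splits into words
    if $LOOKUP_CMD "$host" >/dev/null 2>&1; then
        echo "$stamp lookup OK: $host"
    else
        echo "$stamp lookup FAILED: $host"
    fi
}
```

One way to use it would be a line such as check_lookup "$(hostname)" >> /tmp/rush-boot-debug.log placed just before the daemon is started (the log path is only an example). Note that a ping failure can also mean the host is unreachable, so this is only a rough indicator when pointed at anything other than the local hostname.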

Solution
--------
Customer modified the /usr/local/rush/etc/S99rush script to preface
the starting of the daemon with an 'ipconfig waitall' command, i.e.:

BEFORE:

    # Start in background, incase name lookups are slow
    ( cd $RUSH_DIR/var && $RUSH_DIR/bin/rushd ) &

AFTER:

    # Start in background, incase name lookups are slow
    ( /usr/sbin/ipconfig waitall; cd $RUSH_DIR/var && $RUSH_DIR/bin/rushd ) &
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^

This causes the boot script to delay starting rush until all network
services have confirmed starting.

Caveats
-------
10.3.9 machines have the ipconfig command, but no man page for it.
In 10.4.x, a man page is included which clearly documents the
'waitall' option.
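
As a sketch of how the Solution could be written a little more defensively, the fragment below only calls 'ipconfig waitall' when the binary is actually present, then starts the daemon the same way the AFTER line does. This is an illustration under assumptions, not the stock S99rush script: the function name start_rushd_after_network and the WAIT_CMD variable are made up here, and the default RUSH_DIR is taken from the script path given in the Solution.

```shell
#!/bin/sh
# Sketch only. WAIT_CMD and start_rushd_after_network are illustrative
# names, not part of the stock S99rush script.
RUSH_DIR=${RUSH_DIR:-/usr/local/rush}
WAIT_CMD=${WAIT_CMD:-/usr/sbin/ipconfig}

start_rushd_after_network() {
    # Block until the network is configured, but only if the
    # ipconfig binary exists and is executable on this machine.
    if [ -x "$WAIT_CMD" ]; then
        "$WAIT_CMD" waitall
    fi
    # Same start command as the original boot script, run in a subshell
    # so the caller's working directory is left alone.
    ( cd "$RUSH_DIR/var" && "$RUSH_DIR/bin/rushd" )
}
```

In a boot script this would be invoked as start_rushd_after_network & so that, as in the original, a slow wait does not hold up the rest of the boot.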

From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: OSX 10.3.9 + Rush 102.42a -- problems with rush taking up to
Date: Wed, 07 Dec 2005 16:07:20 -0800
Msg# 1149

[This followup article was rescinded by the author -- its contents were incorrect -ed]

CORRECTION: [..incorrect information supplied..]

From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: OSX 10.3.9 + Rush 102.42a -- problems with rush taking up to 15
Date: Fri, 09 Dec 2005 04:37:11 -0800
Msg# 1152

Greg Ercolano wrote:
> Greg Ercolano wrote:
>> Solution
>> --------
>> Customer modified the /usr/local/rush/etc/S99rush script to preface
>> the starting of the daemon with an 'ipconfig waitall' command..
>
> CORRECTION: [..incorrect information supplied..]

You know, I spaced on the above "correction".

The original article was correct -- the mod should be made
to the rush/etc/S99rush script.

I'll re-edit the articles on the newsgroup to show the correct info,
and remove this confusing thread :/