From: Patrick Boucher <patrickb@(email suppressed)>
Subject: Rush didn't start jobs after system reboot
   Date: Thu, 09 Nov 2006 19:07:32 -0500
Msg# 1426
I've never ever had this happen to me, and I've been using Rush for a while now.

Jobs were submitted from a workstation (3DFX037), the checkpoint was written, and then the workstation rebooted.

On restart, the workstation tried to read its checkpoint but didn't restart a bunch of jobs. Here is an excerpt of the log from that system.

Any help would be appreciated.

--
Patrick Boucher
TD - Coder - Resident geek
Buzz Image Group
Tel 514.848.0579
Fax 514.848.6371

www.buzzimage.com
www.xsi-blog.com



11/09,18:05:08 START     3dfx037 RUSHD 102.41 PID=284 Boot=11/09/06,18:05:08 Offline
11/09,18:05:08 INFO      TCP listening on port 696, service 'rushd', sockfd=196
11/09,18:05:08 INFO      UDP listening on port 696, service 'rushd', sockfd=176
11/09,18:05:08 CHECKPOINT START: Loading c:\rush/var/jobs-checkpoint
11/09,18:05:10 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 124: gethostbyname_NET(ren001): unknown host
11/09,18:05:10 ALERT     *NOT* starting jobid 3dfx037.459 due to above error
11/09,18:05:10 JOB START martinp@3dfx037.459(BBJB60_006F_RENDER_01a.scn),48 frames
11/09,18:05:12 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 397: gethostbyname_NET(ren001): unknown host
11/09,18:05:12 ALERT     *NOT* starting jobid 3dfx037.460 due to above error
11/09,18:05:12 JOB START martinp@3dfx037.460(BBJB60_007F_RENDER_01a.scn),56 frames
11/09,18:05:15 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 670: gethostbyname_NET(ren001): unknown host
11/09,18:05:15 ALERT     *NOT* starting jobid 3dfx037.461 due to above error
11/09,18:05:15 JOB START martinp@3dfx037.461(BBJB60_007F_RENDER_01a.scn),56 frames
11/09,18:05:17 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 957: gethostbyname_NET(ren001): unknown host
11/09,18:05:17 ALERT     *NOT* starting jobid 3dfx037.462 due to above error
11/09,18:05:17 JOB START martinp@3dfx037.462(BBJB60_009F_RENDER_01a.scn),70 frames
11/09,18:05:19 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 1266: gethostbyname_NET(ren001): unknown host
11/09,18:05:19 ALERT     *NOT* starting jobid 3dfx037.463 due to above error
11/09,18:05:19 JOB START martinp@3dfx037.463(BBJB60_010F_RENDER_01a.scn),92 frames
11/09,18:05:21 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 1575: gethostbyname_NET(ren001): unknown host
11/09,18:05:21 ALERT     *NOT* starting jobid 3dfx037.464 due to above error
11/09,18:05:21 JOB START martinp@3dfx037.464(BBJB60_010F_RENDER_01a.scn),92 frames
11/09,18:05:24 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 1836: gethostbyname_NET(ren001): unknown host
11/09,18:05:24 ALERT     *NOT* starting jobid 3dfx037.465 due to above error
11/09,18:05:24 JOB START martinp@3dfx037.465(BBJB60_011F_RENDER_01a.scn),44 frames
11/09,18:05:26 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 2131: gethostbyname_NET(ren001): unknown host
11/09,18:05:26 ALERT     *NOT* starting jobid 3dfx037.466 due to above error
11/09,18:05:26 JOB START martinp@3dfx037.466(BBJB60_012F_RENDER_01a...GORDON_Props_OCC,GORDON_Props_RGB.scn),78 frames
11/09,18:05:28 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 2504: gethostbyname_NET(ren001): unknown host
11/09,18:05:28 ALERT     *NOT* starting jobid 3dfx037.467 due to above error
11/09,18:05:28 JOB START martinp@3dfx037.467(BBJB60_012F_RENDER_01a.scn),156 frames
11/09,18:05:30 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 2877: gethostbyname_NET(ren001): unknown host
11/09,18:05:30 ALERT     *NOT* starting jobid 3dfx037.468 due to above error
11/09,18:05:30 JOB START martinp@3dfx037.468(BBJB60_012F_RENDER_01a.scn),156 frames
11/09,18:05:31 JOB START martinp@3dfx037.469(BBJB60_013F_RENDER_01a.scn),76 frames
11/09,18:05:31 JOB START martinp@3dfx037.470(BBRD_006A_Doors_LIGHTING_01b.scn),55 frames
11/09,18:05:31 JOB START martinp@3dfx037.471(BBJB60_014F_RENDER_01a.scn),61 frames
11/09,18:05:31 JOB START martinp@3dfx037.472(BBJB60_014F_RENDER_01a.scn),61 frames
11/09,18:05:31 CHECKPOINT DONE
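When a log like this runs long, a few lines of Python can list which jobs were skipped and which hostname blocked each one. This is a minimal sketch: the regular expressions are assumptions based only on the lines in this excerpt (other Rush versions may log differently), and `find_skipped_jobs` is a hypothetical helper, not part of Rush.

```python
import re

# Patterns inferred from the rushd log excerpt above (assumed format).
CHECKPOINT_ERR = re.compile(
    r"CHECKPOINT .*: Line (\d+): gethostbyname_NET\(([^)]+)\): unknown host")
NOT_STARTING = re.compile(
    r"ALERT\s+\*NOT\* starting jobid (\S+) due to above error")


def find_skipped_jobs(lines):
    """Return (jobid, unresolved_host, checkpoint_line) for each skipped job."""
    skipped = []
    last_err = None  # most recent (host, line_no) checkpoint error seen
    for line in lines:
        m = CHECKPOINT_ERR.search(line)
        if m:
            last_err = (m.group(2), int(m.group(1)))
            continue
        m = NOT_STARTING.search(line)
        if m and last_err is not None:
            host, line_no = last_err
            skipped.append((m.group(1), host, line_no))
            last_err = None  # the ALERT consumes the error it refers to
    return skipped


if __name__ == "__main__":
    sample = [
        "11/09,18:05:10 CHECKPOINT c:\\rush/var/jobs-checkpoint: Line 124: "
        "gethostbyname_NET(ren001): unknown host",
        "11/09,18:05:10 ALERT     *NOT* starting jobid 3dfx037.459 due to above error",
        "11/09,18:05:31 JOB START martinp@3dfx037.469(BBJB60_013F_RENDER_01a.scn),76 frames",
    ]
    for jobid, host, line_no in find_skipped_jobs(sample):
        print(f"{jobid}: unresolved host {host!r} at checkpoint line {line_no}")
```

Run against the full log above, every skipped job (459 through 468) points at the same host, `ren001`.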

   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Rush didn't start jobs after system reboot
   Date: Thu, 09 Nov 2006 20:21:01 -0500
Msg# 1427
Patrick Boucher wrote:
[posted to rush.general]

Any help would be appreciated.

    This is a bug that was fixed in a July 2005 release (102.42).

    From the release notes for 102.42, which describe the problem:

        o Fixed problem with loading checkpoint files.
	  On reboot, job was not loading if it contained a hostname that was no longer
	  in the rush hosts file. Example:
	
	      1) Job requests hosts a,b,c
	      2) Sysadmin removes host 'b' from hosts file
	      3) Daemon reboots
	      4) On reloading job requesting hosts "a,b,c", job fails to load
		 because 'b' is no longer a valid host.
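To make the release-note example concrete, here is a small Python sketch contrasting the two behaviors. Everything in it is hypothetical: `load_requested_hosts` and its `strict` flag are illustrative names, not Rush internals, and the idea that the fixed daemon simply drops the missing host is an assumption; the release note only says the job now loads.

```python
# Hypothetical illustration of the failure mode in the 102.42 release note.
# This is NOT Rush's source code; names and behavior are assumptions.


def load_requested_hosts(requested, known_hosts, strict):
    """Check a job's requested hosts against the current hosts list.

    strict=True models the old behavior: one unknown host makes the
    whole job fail to load.
    strict=False models a tolerant loader (an assumption about the fix):
    unknown hosts are reported and dropped, and the job still loads.
    Returns (kept_hosts, dropped_hosts).
    """
    kept, dropped = [], []
    for host in requested:
        if host in known_hosts:
            kept.append(host)
        else:
            dropped.append(host)
    if dropped and strict:
        raise ValueError(f"unknown host(s) {', '.join(dropped)}: job not loaded")
    return kept, dropped


if __name__ == "__main__":
    hosts_file = {"a", "c"}      # step 2: sysadmin removed 'b'
    job_hosts = ["a", "b", "c"]  # step 1: job requested a,b,c
    try:
        load_requested_hosts(job_hosts, hosts_file, strict=True)
    except ValueError as err:
        print("old behavior:", err)
    print("tolerant behavior:", load_requested_hosts(job_hosts, hosts_file, strict=False))
```

In the log above, `ren001` plays the role of host 'b'.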

    Looks like you're currently running 102.41, which is very old:

11/09,18:05:08 START     3dfx037 RUSHD 102.41 PID=..
                                       ^^^^^^
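The version check Greg does by eye can also be scripted across many daemon logs. A minimal sketch, assuming the `START ... RUSHD <version>` line format shown above, and assuming a letter suffix such as `a7` marks a later revision of the same release, so only the numeric major.minor part is compared against 102.42. `daemon_needs_upgrade` is a hypothetical helper.

```python
import re

FIXED_IN = (102, 42)  # release that, per this thread, contains the checkpoint fix

# Matches e.g. "START     3dfx037 RUSHD 102.41 PID=284" (format assumed from the log).
START_RE = re.compile(r"\bSTART\s+(\S+)\s+RUSHD\s+(\d+)\.(\d+)(\S*)")


def daemon_needs_upgrade(log_line):
    """Return (hostname, version, needs_upgrade) for a rushd START line, else None."""
    m = START_RE.search(log_line)
    if not m:
        return None
    host = m.group(1)
    major, minor, suffix = int(m.group(2)), int(m.group(3)), m.group(4)
    # Assumption: suffixes like "a7" are post-release revisions, so they are
    # ignored when comparing against the fixed release.
    return host, f"{major}.{minor}{suffix}", (major, minor) < FIXED_IN


if __name__ == "__main__":
    print(daemon_needs_upgrade("11/09,18:05:08 START     3dfx037 RUSHD 102.41 PID=284"))
```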

    When production allows, upgrade to the current version, 102.42a7;
    the upgrade is free.

    Contact me directly via email, and I'll send you the upgrade
    instructions.

--
Greg Ercolano, erco@(email suppressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)

   From: Patrick Boucher <patrickb@(email suppressed)>
Subject: Re: Rush didn't start jobs after system reboot
   Date: Fri, 10 Nov 2006 10:42:36 -0500
Msg# 1428
Doh!
Do you think Buzz could get their mitts on 102.42a7?

Thanks,

   From: Patrick Boucher <patrickb@(email suppressed)>
Subject: Re: Rush didn't start jobs after system reboot
   Date: Fri, 10 Nov 2006 10:44:42 -0500
Msg# 1429
Should have gone to Greg directly.
Sorry for the noise.


   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Rush didn't start jobs after system reboot
   Date: Fri, 10 Nov 2006 14:07:46 -0500
Msg# 1430
Patrick Boucher wrote:
[posted to rush.general]

Should have gone to Greg directly.
Sorry for the noise.

	No, that's alright. Others running old releases may have run into
	this too, so it's good that it appears here.
