From: Patrick Boucher <patrickb@(email suppressed)>
Subject: Rush didn't start jobs after system reboot
Date: Thu, 09 Nov 2006 19:07:32 -0500
Msg# 1426
I've never ever had this happen to me and I've been using Rush for a
while now.
Jobs were submitted from a workstation (3DFX037), the checkpoint was written, and then the workstation rebooted. On reboot, the workstation read its checkpoint back but didn't restart a bunch of the jobs. Here is an excerpt of the log on the system.

Any help would be appreciated.

--
Patrick Boucher
TD - Coder - Resident geek
Buzz Image Group
Tel 514.848.0579
Fax 514.848.6371
www.buzzimage.com
www.xsi-blog.com

    11/09,18:05:08 START 3dfx037 RUSHD 102.41 PID=284 Boot=11/09/06,18:05:08 Offline
    11/09,18:05:08 INFO TCP listening on port 696, service 'rushd', sockfd=196
    11/09,18:05:08 INFO UDP listening on port 696, service 'rushd', sockfd=176
    11/09,18:05:08 CHECKPOINT START: Loading c:\rush/var/jobs-checkpoint
    11/09,18:05:10 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 124: gethostbyname_NET(ren001): unknown host
    11/09,18:05:10 ALERT *NOT* starting jobid 3dfx037.459 due to above error
    11/09,18:05:10 JOB START martinp@3dfx037.459(BBJB60_006F_RENDER_01a.scn),48 frames
    11/09,18:05:12 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 397: gethostbyname_NET(ren001): unknown host
    11/09,18:05:12 ALERT *NOT* starting jobid 3dfx037.460 due to above error
    11/09,18:05:12 JOB START martinp@3dfx037.460(BBJB60_007F_RENDER_01a.scn),56 frames
    11/09,18:05:15 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 670: gethostbyname_NET(ren001): unknown host
    11/09,18:05:15 ALERT *NOT* starting jobid 3dfx037.461 due to above error
    11/09,18:05:15 JOB START martinp@3dfx037.461(BBJB60_007F_RENDER_01a.scn),56 frames
    11/09,18:05:17 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 957: gethostbyname_NET(ren001): unknown host
    11/09,18:05:17 ALERT *NOT* starting jobid 3dfx037.462 due to above error
    11/09,18:05:17 JOB START martinp@3dfx037.462(BBJB60_009F_RENDER_01a.scn),70 frames
    11/09,18:05:19 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 1266: gethostbyname_NET(ren001): unknown host
    11/09,18:05:19 ALERT *NOT* starting jobid 3dfx037.463 due to above error
    11/09,18:05:19 JOB START martinp@3dfx037.463(BBJB60_010F_RENDER_01a.scn),92 frames
    11/09,18:05:21 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 1575: gethostbyname_NET(ren001): unknown host
    11/09,18:05:21 ALERT *NOT* starting jobid 3dfx037.464 due to above error
    11/09,18:05:21 JOB START martinp@3dfx037.464(BBJB60_010F_RENDER_01a.scn),92 frames
    11/09,18:05:24 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 1836: gethostbyname_NET(ren001): unknown host
    11/09,18:05:24 ALERT *NOT* starting jobid 3dfx037.465 due to above error
    11/09,18:05:24 JOB START martinp@3dfx037.465(BBJB60_011F_RENDER_01a.scn),44 frames
    11/09,18:05:26 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 2131: gethostbyname_NET(ren001): unknown host
    11/09,18:05:26 ALERT *NOT* starting jobid 3dfx037.466 due to above error
    11/09,18:05:26 JOB START martinp@3dfx037.466(BBJB60_012F_RENDER_01a...GORDON_Props_OCC,GORDON_Props_RGB.scn),78 frames
    11/09,18:05:28 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 2504: gethostbyname_NET(ren001): unknown host
    11/09,18:05:28 ALERT *NOT* starting jobid 3dfx037.467 due to above error
    11/09,18:05:28 JOB START martinp@3dfx037.467(BBJB60_012F_RENDER_01a.scn),156 frames
    11/09,18:05:30 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 2877: gethostbyname_NET(ren001): unknown host
    11/09,18:05:30 ALERT *NOT* starting jobid 3dfx037.468 due to above error
    11/09,18:05:30 JOB START martinp@3dfx037.468(BBJB60_012F_RENDER_01a.scn),156 frames
    11/09,18:05:31 JOB START martinp@3dfx037.469(BBJB60_013F_RENDER_01a.scn),76 frames
    11/09,18:05:31 JOB START martinp@3dfx037.470(BBRD_006A_Doors_LIGHTING_01b.scn),55 frames
    11/09,18:05:31 JOB START martinp@3dfx037.471(BBJB60_014F_RENDER_01a.scn),61 frames
    11/09,18:05:31 JOB START martinp@3dfx037.472(BBJB60_014F_RENDER_01a.scn),61 frames
    11/09,18:05:31 CHECKPOINT DONE
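If you suspect you've hit this on an older release, a short script can list which jobs the daemon refused to reload and which hostname tripped it. Below is a minimal Python sketch. The two regular expressions are inferred solely from the log excerpt above; they are not a documented rushd log format, and other releases may word these messages differently. `find_skipped_jobs` and `SkippedJob` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Patterns inferred ONLY from the log excerpt in this thread (assumption).
UNKNOWN_HOST_RE = re.compile(
    r"CHECKPOINT (?P<file>\S+): Line (?P<line>\d+): "
    r"gethostbyname_NET\((?P<host>[^)]+)\): unknown host"
)
NOT_STARTING_RE = re.compile(r"ALERT \*NOT\* starting jobid (?P<jobid>\S+)")


class SkippedJob(NamedTuple):
    jobid: str            # e.g. "3dfx037.459"
    checkpoint_line: int  # line in the checkpoint file that failed to load
    host: str             # hostname that could not be resolved


def find_skipped_jobs(log_lines: List[str]) -> List[SkippedJob]:
    """Pair each 'unknown host' checkpoint error with the ALERT that follows it."""
    skipped: List[SkippedJob] = []
    pending: Optional[Tuple[int, str]] = None  # (checkpoint line, host) awaiting its ALERT
    for text in log_lines:
        m = UNKNOWN_HOST_RE.search(text)
        if m:
            pending = (int(m.group("line")), m.group("host"))
            continue
        m = NOT_STARTING_RE.search(text)
        if m and pending is not None:
            skipped.append(SkippedJob(m.group("jobid"), pending[0], pending[1]))
            pending = None
    return skipped


if __name__ == "__main__":
    # Raw strings keep the backslash in c:\rush intact.
    sample = [
        r"11/09,18:05:10 CHECKPOINT c:\rush/var/jobs-checkpoint: Line 124: gethostbyname_NET(ren001): unknown host",
        "11/09,18:05:10 ALERT *NOT* starting jobid 3dfx037.459 due to above error",
        "11/09,18:05:31 JOB START martinp@3dfx037.469(BBJB60_013F_RENDER_01a.scn),76 frames",
    ]
    for job in find_skipped_jobs(sample):
        print(job.jobid, job.checkpoint_line, job.host)
```

To scan a real log, pass it the file's lines, e.g. `find_skipped_jobs(open(path).read().splitlines())`; the log file's location isn't given in this thread, so `path` is up to you.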
From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Rush didn't start jobs after system reboot
Date: Thu, 09 Nov 2006 20:21:01 -0500
Msg# 1427
This is a bug that was fixed in a July 2005 release (102.42). The release notes for 102.42 describe the problem:

    o Fixed problem with loading checkpoint files. On reboot, job was
      not loading if it contained a hostname that was no longer in the
      rush hosts file. Example:

        1) Job requests hosts a,b,c
        2) Sysadmin removes host 'b' from hosts file
        3) Daemon reboots
        4) On reloading job requesting hosts "a,b,c", job fails to load
           because 'b' is no longer a valid host.

It looks like you're currently running 102.41, which is very old:

    11/09,18:05:08 START 3dfx037 RUSHD 102.41 PID=..
                                       ^^^^^^

When production allows, upgrade to the current version, 102.42a7; the upgrade is free. Contact me directly via email, and I'll send you the upgrade instructions.

--
Greg Ercolano, erco@(email suppressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)
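The release note's a,b,c example can be modeled in a few lines to show why one stale hostname costs the whole job rather than just that host. This is a hypothetical Python sketch, not Rush's actual checkpoint code or file format: a job is reduced to a list of requested hostnames, and the hosts file to a set of names (the real daemon resolves names through `gethostbyname_NET`, per the log). `load_job_strict`, `load_job_lenient`, and `reload_checkpoint` are illustrative names. The release notes only say the fixed version loads the job; dropping the stale host with a warning is this sketch's own assumption.

```python
from typing import Dict, List, Set, Tuple


class UnknownHostError(Exception):
    """Raised when a job names a host that is not in the hosts file."""


def load_job_strict(jobid: str, requested: List[str], known_hosts: Set[str]) -> List[str]:
    """Models the pre-102.42 behaviour described in the release note:
    any requested host missing from the hosts file aborts the whole job."""
    for host in requested:
        if host not in known_hosts:
            raise UnknownHostError(f"{jobid}: {host}: unknown host")
    return list(requested)


def load_job_lenient(jobid: str, requested: List[str], known_hosts: Set[str],
                     warnings: List[str]) -> List[str]:
    """Hypothetical fixed behaviour (assumption): keep the job and
    drop only the hosts that are no longer known, with a warning."""
    for host in requested:
        if host not in known_hosts:
            warnings.append(f"{jobid}: ignoring unknown host {host}")
    return [h for h in requested if h in known_hosts]


def reload_checkpoint(jobs: Dict[str, List[str]], known_hosts: Set[str],
                      strict: bool) -> Tuple[Dict[str, List[str]], List[str], List[str]]:
    """Reload every job; returns (loaded jobs, skipped-job errors, warnings)."""
    loaded: Dict[str, List[str]] = {}
    skipped: List[str] = []
    warnings: List[str] = []
    for jobid, requested in jobs.items():
        try:
            if strict:
                loaded[jobid] = load_job_strict(jobid, requested, known_hosts)
            else:
                loaded[jobid] = load_job_lenient(jobid, requested, known_hosts, warnings)
        except UnknownHostError as err:
            skipped.append(str(err))  # analogous to "*NOT* starting jobid ..."
    return loaded, skipped, warnings


if __name__ == "__main__":
    jobs = {"job1": ["a", "b", "c"]}  # 1) job requests hosts a,b,c
    hosts_file = {"a", "c"}           # 2) sysadmin removed 'b'
    for strict in (True, False):      # 3-4) daemon reboots and reloads
        loaded, skipped, warnings = reload_checkpoint(jobs, hosts_file, strict)
        print("strict" if strict else "lenient", loaded, skipped, warnings)
```

With `strict=True` the job is skipped outright, which matches the 102.41 symptom; with `strict=False` it survives and runs on hosts a and c.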
From: Patrick Boucher <patrickb@(email suppressed)>
Subject: Re: Rush didn't start jobs after system reboot
Date: Fri, 10 Nov 2006 10:42:36 -0500
Msg# 1428
Doh! Do you think Buzz could get their mitts on 102.42a7?

Thanks,
From: Patrick Boucher <patrickb@(email suppressed)>
Subject: Re: Rush didn't start jobs after system reboot
Date: Fri, 10 Nov 2006 10:44:42 -0500
Msg# 1429
Should have gone to Greg directly. Sorry for the noise.
From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Rush didn't start jobs after system reboot
Date: Fri, 10 Nov 2006 14:07:46 -0500
Msg# 1430
Patrick Boucher wrote:
> [posted to rush.general]
> Should have gone to Greg directly. Sorry for the noise.

No, that's alright. Others running old releases may have run into the same thing, so it's good that it appears here.