From: Craig Allison <craig@(email suppressed)>
Subject: Rendered frame stuck on status "Run"
   Date: Mon, 17 Mar 2008 06:49:45 -0400
Msg# 1706
Hello!

I've just upgraded a box to Debian 4 and have installed Shake 4.1, Rush 102.42a.

The box goes online and picks up jobs fine. It renders the first allocated frame with no problem, but when it picks up the next one, it renders the frame and gets Shake exit code 0, yet Rush doesn't release the frame and move on.  The box will sit all night with status "Run" even though the log shows an exit code of 0.  When you log in to the box, it shows 2 Shake processes (1 for each CPU) taking 0% CPU.

Any ideas?

Thanks

Craig



Craig Allison

Digital Systems & Data Manager

The Senate Visual Effects

Twickenham Film Studios

St.Margarets

Twickenham

TW1 2AW


t: 0208 607 8866 

www.senatevfx.com





   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Rendered frame stuck on status "Run"
   Date: Mon, 17 Mar 2008 09:20:20 -0400
Msg# 1707
Craig Allison wrote:
> Hello!
> 
> I've just upgraded a box to Debian 4 and have installed Shake 4.1,  
> Rush 102.42a.
> 
> The box goes online and picks up jobs fine, it will render the first  
> allocated frame no problem and then when it picks up the next, it  
> renders the frame and gets Shake exit code 0
> but Rush doesn't release the frame and move on.  The box will sit all  
> night with status "Run" even though the log shows an exit code of 0.   
> When you login to the box it will show 2 Shake processes (1 for each  
> cpu) taking 0% CPU.
> 
> Any ideas?

Hi Craig,

	Hit "All Jobs" to make sure you don't have two jobs running
	the same scene and frame range.

	It's possible the log you're looking at that shows the frame
	as 'exit 0' is not the same log as the one for the frame that
	is still running. This can happen if there are *two* jobs both
	running the same shake scene and frame range, with the stuck
	job's log getting overwritten by the other job, which ran
	successfully on a different machine.

	If this were the case, the "Jobid" and "Hostname" fields of the
	'Frames' report wouldn't match the same fields in the header
	of the frame log.
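
The duplicate-job situation described above can be checked mechanically. What follows is only a minimal Python sketch, assuming you have already collected each job's ID, scene file, and frame range by some other means; the `Job` record, the `overlapping_jobs` helper, the scene paths, and the extra job IDs are all hypothetical and are not part of Rush.

```python
from itertools import combinations
from typing import NamedTuple


class Job(NamedTuple):
    """Hypothetical job record; this is not Rush's own data model."""
    jobid: str
    scene: str
    first_frame: int
    last_frame: int


def overlapping_jobs(jobs):
    """Return pairs of job IDs that render the same scene over overlapping frames."""
    clashes = []
    for a, b in combinations(jobs, 2):
        same_scene = a.scene == b.scene
        # Two inclusive ranges overlap when each one starts before the other ends.
        ranges_overlap = (a.first_frame <= b.last_frame
                          and b.first_frame <= a.last_frame)
        if same_scene and ranges_overlap:
            clashes.append((a.jobid, b.jobid))
    return clashes


# Made-up sample data for illustration only.
jobs = [
    Job("render48.62", "/jobs/shot01/comp.shk", 1, 100),
    Job("render48.63", "/jobs/shot01/comp.shk", 50, 150),
    Job("render48.64", "/jobs/shot02/comp.shk", 1, 100),
]
print(overlapping_jobs(jobs))
```

Any pair it reports is a candidate for two jobs writing to the same frame logs, which is where the "Jobid" and "Hostname" comparison becomes useful.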

	Can you include the "Frames" report from irush showing the
	frame in the Run state, and the *complete* frame log (including
	the headers at the top, and the shake exit message at the bottom)?


-- 
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)

   From: Craig Allison <craig@(email suppressed)>
Subject: Re: Rendered frame stuck on status "Run"
   Date: Thu, 20 Mar 2008 09:26:45 -0400
Msg# 1708
Sorry about the delay getting back on this one, I've been very busy!

The problem was intermittent but a change of kernel seems to have eradicated the problem altogether...

Thanks for your time

Regards


Craig Allison


   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Rendered frame stuck on status "Run"
   Date: Thu, 20 Mar 2008 15:12:35 -0400
Msg# 1709
Craig Allison wrote:
> Sorry about the delay getting back on this one, I've been very busy!
> 
> The problem was intermittent but a change of kernel seems to have  
> eradicated the problem altogether...

	Thanks for the follow up, Craig..

	So the problem was with the default Debian 4 + Shake 4.1 combo.
	Interesting.

	Just curious: was this a known issue with a kernel upgrade as
	the documented fix, or was it just a happy coincidence?

	If it was a known issue, I'd be curious to see any links
	that helped you, in case they cover details of the cause
	and it's something I should watch out for across the
	board with Linux.


   From: Craig Allison <craig@(email suppressed)>
Subject: Re: Rendered frame stuck on status "Run"
   Date: Tue, 25 Mar 2008 14:15:54 -0400
Msg# 1711
Once again apologies, I've been incredibly busy!

Looks like the kernel change was a "happy coincidence". It certainly solved my multi-CPU issue (only one CPU was seen), and it seemed to also solve my status "Run" issue, but it looks like that was more Shake-script-specific than an actual solution.

I'd noticed that the /tmp partition was filling to 100% because it was relatively small, so I've reconfigured it today so that there's plenty of headroom. This has fixed the "Run" status issue, but now I'm getting wildly differing render times for consecutive frames. See below:

Done 0002     3   render43    4773     render48.62     03/25,16:58:19 00:23:43 
Done 0003     3   render43    4775     render48.62     03/25,16:58:19 00:23:42 
Done 0004     3   render43    4828     render48.62     03/25,17:22:02 00:00:18 
Done 0005     3   render43    4836     render48.62     03/25,17:22:03 00:00:17 
Done 0006     3   render43    4846     render48.62     03/25,17:22:21 00:00:22 
Done 0007     3   render43    4848     render48.62     03/25,17:22:21 00:00:22 
Done 0008     3   render43    4864     render48.62     03/25,17:22:44 00:00:34 
Done 0009     3   render43    4866     render48.62     03/25,17:22:44 00:00:34 
Done 0010     3   render43    4882     render48.62     03/25,17:23:19 00:30:35 
Done 0011     3   render43    4884     render48.62     03/25,17:23:19 00:30:35 
Done 0012     3   render43    4907     render48.62     03/25,17:53:54 00:00:17 
Done 0013     3   render43    4909     render48.62     03/25,17:53:54 00:00:15 

At least the status is not sticking on "Run" but I don't understand why the frames are differing so much - I will let it run on a few more scenes before I hit the panic button!
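
To make the spread in the report above easier to see, here is a rough Python sketch that flags frames whose elapsed time is far above the median. It assumes the second whitespace-separated field is the frame number and the last field is the elapsed time in HH:MM:SS, which is how the rows above read; the function names and the 10x cutoff are arbitrary choices made for illustration.

```python
from statistics import median

# The twelve rows from the report above, pasted verbatim.
REPORT = """\
Done 0002     3   render43    4773     render48.62     03/25,16:58:19 00:23:43
Done 0003     3   render43    4775     render48.62     03/25,16:58:19 00:23:42
Done 0004     3   render43    4828     render48.62     03/25,17:22:02 00:00:18
Done 0005     3   render43    4836     render48.62     03/25,17:22:03 00:00:17
Done 0006     3   render43    4846     render48.62     03/25,17:22:21 00:00:22
Done 0007     3   render43    4848     render48.62     03/25,17:22:21 00:00:22
Done 0008     3   render43    4864     render48.62     03/25,17:22:44 00:00:34
Done 0009     3   render43    4866     render48.62     03/25,17:22:44 00:00:34
Done 0010     3   render43    4882     render48.62     03/25,17:23:19 00:30:35
Done 0011     3   render43    4884     render48.62     03/25,17:23:19 00:30:35
Done 0012     3   render43    4907     render48.62     03/25,17:53:54 00:00:17
Done 0013     3   render43    4909     render48.62     03/25,17:53:54 00:00:15
"""


def elapsed_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def slow_frames(report, factor=10):
    """Return frame numbers whose elapsed time exceeds factor x the median."""
    rows = [line.split() for line in report.splitlines() if line.strip()]
    # Assumption: field 1 is the frame number, the last field is elapsed time.
    times = {row[1]: elapsed_seconds(row[-1]) for row in rows}
    cutoff = factor * median(times.values())
    return [frame for frame, secs in times.items() if secs > cutoff]


print(slow_frames(REPORT))
```

On these rows the slow frames come in pairs (0002/0003 and 0010/0011), one per CPU, which suggests looking at what both Shake processes were waiting on during those windows.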


Just thought I'd leave an update!


Cheers


Craig



   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Rendered frame stuck on status "Run"
   Date: Tue, 25 Mar 2008 23:04:41 -0400
Msg# 1714
Craig Allison wrote:
> Done 0010     3   render43    4882     render48.62     03/25,17:23:19  00:30:35
> Done 0011     3   render43    4884     render48.62     03/25,17:23:19  00:30:35
> Done 0012     3   render43    4907     render48.62     03/25,17:53:54  00:00:17
> Done 0013     3   render43    4909     render48.62     03/25,17:53:54  00:00:15
> 
> At least the status is not sticking on "Run" but I don't understand  
> why the frames are differing so much - I will let it run on a few  
> more scenes before I hit the panic button!
> 
> Just thought I'd leave an update!

	It does seem odd that 4 renders kick in on the box, and two
	take 30 mins to complete, and the other two take 20 seconds.

	I'd suggest jumping over to the machine while the frames are
	taking a while, and using 'strace' to see what shake is doing
	while the frames are 'stuck' running, eg:

rsh render43
ps fax				# determine the PID of the stuck shake process
strace -p <shake_PID>		# see what shake is doing..

	Also look in the logs while the frames are stuck to see
	if they show what shake is getting stuck on.
	(Turn on verbose mode if not already..)
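
Finding the stuck PIDs by eye gets tedious on a busy box, so here is a small Python sketch that picks idle processes out of `ps -eo pid,pcpu,comm` output. It assumes the render process shows up with the command name `shake`, which may not match your install; the helper names and the sample output are made up for illustration. Note that the %CPU figure from `ps` is averaged over the process lifetime, so the threshold may need loosening for a process that was busy before it stalled.

```python
import subprocess


def idle_pids(ps_output, name="shake", max_cpu=0.0):
    """Return PIDs of processes called `name` whose %CPU is at or below max_cpu."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        pid, pcpu, comm = fields
        if comm.strip() == name and float(pcpu) <= max_cpu:
            pids.append(int(pid))
    return pids


def current_ps_output():
    """Run ps on the local machine and return its output as text."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Made-up sample output in the layout that `ps -eo pid,pcpu,comm` prints.
SAMPLE = """\
  PID %CPU COMMAND
 4882  0.0 shake
 4884  0.0 shake
 4907 97.3 shake
 5001  0.0 bash
"""
print(idle_pids(SAMPLE))
```

On the render node itself, `idle_pids(current_ps_output())` would give the candidate PIDs to hand to `strace -p`.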


   From: Craig Allison <craig@(email suppressed)>
Subject: Re: Rendered frame stuck on status "Run"
   Date: Wed, 30 Apr 2008 08:22:13 -0400
Msg# 1728
After a very, very thorough test of lots of different scenes/scripts I can say that this issue is resolved. The frames were definitely sticking on "Run" due to the /tmp partition being filled.

The other issue (wildly differing render times for consecutive frames) seems to be script-specific, as the box has not behaved like this since!
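
Since a full /tmp turned out to be the cause of the stuck "Run" status, a quick headroom check on each render node may be worth having. This is a minimal Python sketch using the standard library's `shutil.disk_usage`; the function names and the 90% threshold are assumptions made for illustration, and the figure can differ slightly from what `df` reports because of reserved blocks.

```python
import shutil


def percent_used(path="/tmp"):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def has_headroom(path="/tmp", max_percent=90.0):
    """True if the filesystem holding `path` is below max_percent full."""
    return percent_used(path) < max_percent


print(f"/tmp is {percent_used('/tmp'):.1f}% full")
if not has_headroom("/tmp"):
    print("warning: /tmp is nearly full; renders may hang")
```

Run from cron or a pre-render hook, a check like this would flag the node before frames start sticking.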

Thanks for all your help Greg

Craig

