From: Andrew Kingston <andrew@peerless.co.uk>
Subject: jobs not picking up spare cpus
Date: Fri, 25 May 2007 06:02:25 -0400
Msg# 1549
Hi

We came across a bit of a strange problem yesterday where we had spare cpus available for jobs, but there were certain ones not picking up frames. We'd seen it before, but were never really sure whether it was down to the way people had set their jobs up or something else. However, this time I was able to check that all the jobs were set up correctly, & I also found these kinds of messages in the log on the job server for these jobs:

    ALERT Ignoring frmarb 'Run': task in unexpected state 'Idle' (expected Start|Busy) msg from ?@lafarm23:33523
    * Lots of these showing up for different farm machines

    FAIL/LISTCPUS Fputs[2]: write failed: _SureWrite(): Broken pipe

    ALERT Ignoring 'Idle': task in non-applicable state 'Start' for jobid lin2.928 from ?@lafarm19:

lafarm19 & lafarm23 were two of the machines not picking up frames.

Also, I've just checked through the logs this morning & found quite a few of these types of messages on that job server:

    ALERT Task 'CpuPass1' ignored for non-existant frame -99999 from ?@lafarm23:33566
    Prev=lin2 0 lin2.892,091_070_tiles_v04 -99999 100 2048 JobPass Job state is 'Done'
    New=lin2 0 lin2.892,091_070_tiles_v04 -99999 100 2048 CpuPass2 Ram unavailable on lafarm23 (2048>0)

These only appeared between 4 & 4:15 am, and I'm fairly sure no one was here rendering then...

Any ideas?

Cheers
Andrew
From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: jobs not picking up spare cpus
Date: Fri, 25 May 2007 12:40:41 -0400
Msg# 1550
Andrew Kingston wrote:
> ALERT Ignoring frmarb 'Run': task in unexpected state 'Idle'
> (expected Start|Busy) msg from ?@lafarm23:33523 * Lots of these showing
> up for different farm machines
>
> FAIL/LISTCPUS Fputs[2]: write failed: _SureWrite(): Broken pipe
>
> ALERT Ignoring 'Idle': task in non-applicable state 'Start' for jobid
> lin2.928 from ?@lafarm19:

Can you send me some complete logs directly via email (i.e. not on the group)? I'm not sure whether these are really something to worry about or not.

Regarding jobs not picking up spare cpus, focus on the 'Cpus' report for the job (check the STATE and NOTES columns) and compare it to the 'All Cpus' report to see what's idle vs. in use. Send me those two reports if need be.

> lafarm19 & lafarm23 were two of the machines not picking up frames.
>
> Also I've just checked through the logs this morning & found quite a few
> of these types of messages on that job server:-
>
> ALERT Task 'CpuPass1' ignored for non-existant frame -99999 from ?@lafarm23:33566
> Prev=lin2 0 lin2.892,091_070_tiles_v04 -99999 100 2048 JobPass Job state is 'Done'
> New=lin2 0 lin2.892,091_070_tiles_v04 -99999 100 2048 CpuPass2 Ram unavailable on lafarm23 (2048>0)
>
> These only appeared between 4 & 4:15 am, and I'm fairly sure no one was
> here rendering then...

At 5am rush runs a cleanup operation (see 'taskcleanuphours' in rush.conf), but I'm not sure why it would show as 4am instead of 5am. I would need to see some logs to tell what's up there.

Is there possibly a mix of different rush versions on the network? When sending the above (in a separate email), include the output of:

    rush -ping +any -t 3

-- 
Greg Ercolano, erco@(email surpressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)
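The log excerpts in this thread name the affected farm machines in a trailing "from ?@host:port" field, and Andrew identified lafarm19 and lafarm23 by reading them by hand. A minimal sketch for tallying those ALERT lines per host follows, assuming a saved plain-text copy of the job server log with one message per line. The regular expression is an assumption inferred only from the excerpts quoted above, not from any documented rush log format, and `alerts_by_host` is a hypothetical helper name.

```python
import re
import sys
from collections import Counter

# ASSUMPTION: host names appear as "from ?@host:" or "from ?@host:port",
# as in the excerpts quoted in this thread. Not a documented rush format.
HOST_RE = re.compile(r"from \?@([\w.-]+):")


def alerts_by_host(lines):
    """Return a Counter of ALERT lines per farm host (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        if "ALERT" not in line:
            continue  # skip FAIL/... and any other non-ALERT messages
        match = HOST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the excerpts above.
sample = [
    "ALERT Ignoring frmarb 'Run': task in unexpected state 'Idle' "
    "(expected Start|Busy) msg from ?@lafarm23:33523",
    "FAIL/LISTCPUS Fputs[2]: write failed: _SureWrite(): Broken pipe",
    "ALERT Ignoring 'Idle': task in non-applicable state 'Start' "
    "for jobid lin2.928 from ?@lafarm19:",
    "ALERT Task 'CpuPass1' ignored for non-existant frame -99999 "
    "from ?@lafarm23:33566",
]

if __name__ == "__main__":
    # Pass the path of a saved log copy, or run with no argument to use the sample.
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as f:
            counts = alerts_by_host(f)
    else:
        counts = alerts_by_host(sample)
    for host, n in counts.most_common():
        print(f"{host}: {n}")
```

On the sample lines this prints `lafarm23: 2` and then `lafarm19: 1`. The per-host counts only suggest which machines to look at first. The 'Cpus' and 'All Cpus' reports and the complete logs are still what show whether a host is actually idle.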