Hi Greg,
Need some more info: regarding the unresponsive machines 'loaner5'
and 'loaner16', do you think rush is slow because the rush daemon
is busy, or because the machine is thrashing due to rendering?
It's important to determine if the rushd is busy, or if the machine
is busy due to rendering.
Ah yes. It would be getting hammered due to rendering - of course -
if the machine is maxed out rendering, it will be slow to respond to
rush! :)
I checked loaner16 - it is reporting 99% of the CPU as being utilised
by mayabatch.exe, so it's definitely running slowly because of that.
When a machine is not responding to 'rush -ping', try ssh/rsh'ing
over to that machine and look at 'top' and/or the output of e.g.
'vmstat 3'.
Is rushd using up all the cpu, or is a render? Is the machine
swapping due to unavailable ram? Does rsh/ssh not even respond when
trying to connect to the machine? If so, the renders may be using
too much ram, swapping the machine to death.
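
As a rough illustration of the check described above, here is a minimal
Python sketch that reads captured 'vmstat 3' output and guesses whether
the host is swapping or simply CPU-bound. It assumes a Linux host whose
vmstat prints the usual procps column names (si, so, id); the sample
numbers, the function names and the thresholds are all made up for
illustration and are not part of rush.

```python
def parse_vmstat(text):
    """Parse Linux-style vmstat output into a list of dicts keyed by column name."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # Column-name header line (vmstat repeats it on long runs).
            columns = fields
        elif columns and fields[0].isdigit() and len(fields) == len(columns):
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


def diagnose(rows, swap_threshold=0, busy_idle_pct=5):
    """Classify the samples as 'swapping', 'cpu-bound' or 'ok'.

    Both thresholds are illustrative guesses, not tuned values.
    """
    swapping = sum(1 for r in rows
                   if r["si"] > swap_threshold or r["so"] > swap_threshold)
    if swapping > len(rows) / 2:
        return "swapping"
    cpu_bound = sum(1 for r in rows if r["id"] <= busy_idle_pct)
    if cpu_bound > len(rows) / 2:
        return "cpu-bound"
    return "ok"


# Hypothetical capture from a host that is paging heavily.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  1 812340  10234   1200  40321  950 1210  4000  5200  900 1500 20 15  2 63  0
 4  2 823100   9800   1180  39800 1020 1340  4100  5600  950 1600 18 14  1 67  0
 2  1 830020   9900   1175  39500  880 1100  3900  5000  870 1450 22 13  3 62  0
"""

print(diagnose(parse_vmstat(SAMPLE)))
```

This only says what state the machine is in; 'top' is still needed to
see whether rushd or a render is the process responsible.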
I didn't think to check machine performance as it hasn't really been
a problem before - most of our dedicated render machines are dual-cpu
hosts though, so it could be why they are a bit more responsive
(loaner16 is a rental single-cpu host)...thanks for pointing that out
as an issue as I didn't think to check load on the troublesome hosts
themselves.
Or, possibly rush is being kept busy; what is the successful output of
'rush -tasklist loaner16'? If the list is huge, possibly users are
submitting with too many +any specifications. For instance if there
are 250 jobs each asking for:
+any=3@200 +any=5@150 +any=10@100 +any=20@50
..that will make four entries on each host, multiplying the work for
rush by 4 (4 specs per job * 250 jobs = 1000 active tasks)
..consider instead using just a two-tier submission:
+any=3@200 +any=20@50
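
The arithmetic above can be sketched in a few lines of Python. This
assumes the simple model given in the message (one task entry per +any
specification, per job, on each host); the function name is hypothetical
and this is not a rush API.

```python
def host_task_entries(num_jobs, specs_per_job):
    """Task entries per host, assuming one entry per +any spec per job."""
    return num_jobs * specs_per_job


four_tier = "+any=3@200 +any=5@150 +any=10@100 +any=20@50".split()
two_tier = "+any=3@200 +any=20@50".split()

print(host_task_entries(250, len(four_tier)))  # 1000
print(host_task_entries(250, len(two_tier)))   # 500
```

Under that model, dropping from four tiers to two halves the number of
entries each host has to track for the same 250 jobs.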
When I run the rush -tasklist command on loaner16 I get a list with
almost 420 entries. Does that sound reasonable?
Our rush license server is often very heavily loaded up (it also
serves files) - could this be a factor?
Not likely, as the rushd daemons only communicate with the license
server on boot.
Unless, that is, your license server is also acting as a job server
(i.e. jobs are submitted to the license server, such that their jobids
have the license server's hostname in them).
That should be OK then - the server is a license host only - we host
the jobs on some other machines.
Thank you for your help!
---
Luke Cole
Systems Administrator / TD
FUEL International
65 King St., Newtown, Sydney NSW, Australia 2042