Jon Herman wrote:
[posted to rush.general]
I would like to be able to create a rush group that would limit the
number of CPUs to be used in a specific job.
Here's my problem: I need to be able to use all of the CPUs on a host
for rendering with XSI, but I also need to be able to use those same
multi-cpu hosts with an application that works using only one CPU.
So, I'd like to retain the multiple processors for XSI jobs, but use a
single processor for another type of job, on the same set of hosts. I
know I can describe the host's CPUs in the rush.conf file, but is there
a way to make rush think the host only has one processor when it's using
a specific group?
Hi Jon,
It sounds like you want a single XSI frame to take up the whole
machine when it runs, blocking other jobs from using the other cpus,
and causing rush to only run one instance of XSI.
And when single-threaded non-XSI frames are running, they take up
one cpu each, so that several frames can run on a machine,
one per cpu.
There are a few techniques defined here:
http://www.seriss.com/rush-current/rush/rush-techniques.html#Threading
None of these approaches are 'pretty'; the issue is that for rush
to do it properly, when a cpu becomes available, rush would have to
hold it available until the OTHER cpu also freed up, so that both
cpus would be free when the XSI job runs. Currently rush doesn't
hold a cpu free to wait for other cpu(s) to free up to run a job.
But I see no way to do it without doing that.
The 'using ram to reserve cpus' approach (#1 in the above) comes close,
but the job will only take a cpu if both cpus are available. You can
submit the XSI job with a higher priority than others using the 'k' flag,
to ensure it first bumps other jobs out of the way, so that it can secure
both processors.
For instance, say all the machines on your farm are configured with
4096 of ram in rush (i.e. the 'RAM' field in the rush/etc/hosts file
is set to 4096 for every host). Then submitting an XSI job with:
# SUBMIT XSI JOB
rush -submit << EOF
:
ram 4096
cpus +any=5@10k
:
EOF
..will cause the job to request all the ram on each machine,
and to ask for 5 cpus at 10k priority.
This way, if two processors on a machine are each rendering single
threaded maya jobs at a lower priority, the above XSI job will bump
those two maya jobs out of the way, because:
> the 10k (the k=kill) ensures other jobs are cleared off:
  this job will kill off lower priority jobs to free up
  enough ram to run this one
> the "ram 4096" guarantees all ram will be reserved to this job's
  frame, preventing other jobs from jumping in, and also preventing
  this job from using more than one cpu on each machine
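The arithmetic behind the second point can be sketched in shell. This is only a rough model of the reasoning above, not rush's actual scheduler: it assumes a host runs at most one frame per cpu, and that each running frame reserves the job's full 'ram' value out of the host's configured RAM. The function name frames_per_host is made up for illustration.

```shell
#!/bin/sh
# Rough model only (an assumption, not rush's real scheduling code):
# a host can start a frame only if it has a free cpu AND enough
# unreserved ram for the job's 'ram' value.
# frames_per_host is a hypothetical helper name.
frames_per_host() {
    host_cpus=$1    # cpus configured for the host
    host_ram=$2     # 'RAM' value configured for the host
    job_ram=$3      # 'ram' value the job was submitted with (must be > 0)

    by_ram=$(( host_ram / job_ram ))    # how many frames fit by ram alone
    if [ "$by_ram" -lt "$host_cpus" ]; then
        echo "$by_ram"                  # ram is the limiting resource
    else
        echo "$host_cpus"               # cpus are the limiting resource
    fi
}

frames_per_host 2 4096 4096    # job asking for all the ram
frames_per_host 2 4096 2048    # job asking for half the ram
```

Under those assumptions the first call prints 1 (one frame owns the whole dual-proc machine) and the second prints 2 (one frame per cpu).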
For instance, here's a maya job using both processors of all machines
on a small network of 4 machines, each with dual procs, running at a
priority of 5:
[erco@ontario] : rush -lac
HOST     OWNER  JOBID       TITLE     FRAME  PRI  PID    ELAPSED   REMARKS
ontario  erco   ontario.56  MAYA_JOB  0007   5    7392   00:05:02
ontario  erco   ontario.56  MAYA_JOB  0008   5    7394   00:05:02
rotwang  erco   ontario.56  MAYA_JOB  0001   5    29699  00:05:03
rotwang  erco   ontario.56  MAYA_JOB  0004   5    29701  00:05:03
meade    erco   ontario.56  MAYA_JOB  0002   5    32204  00:05:03
meade    erco   ontario.56  MAYA_JOB  0003   5    32206  00:05:03
tower    erco   ontario.56  MAYA_JOB  0005   5    5062   00:05:03
tower    erco   ontario.56  MAYA_JOB  0006   5    5063   00:05:03
Now I submit an XSI job asking for all the ram on each machine (4096)
and for 3 cpus at 10k priority (+any=3@10k):
# SUBMIT XSI JOB
rush -submit << EOF
:
ram 4096
cpus +any=3@10k
:
EOF
As soon as the job is submitted, 3 of the 4 machines will get their maya
frames bumped (and requeued), putting the XSI job in their place, one XSI
frame per machine, leaving the other cpu on each machine unavailable:
[erco@ontario] : rush -lac
HOST     OWNER  JOBID       TITLE     FRAME  PRI  PID    ELAPSED   REMARKS
ontario  erco   ontario.58  XSI       0002   10k  7461   00:00:09
ontario  -      -           -         -      -    -      Online
rotwang  erco   ontario.58  XSI       0001   10k  29712  00:00:10
rotwang  -      -           -         -      -    -      Online
meade    erco   ontario.58  XSI       0003   10k  32214  00:00:09
meade    -      -           -         -      -    -      Online
tower    erco   ontario.56  MAYA_JOB  0005   5    5062   00:14:36
tower    erco   ontario.56  MAYA_JOB  0006   5    5063   00:14:36
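If you want to check the "one XSI frame per machine" result without eyeballing the report, something like the following sketch could tally busy cpus per host and job title. It assumes the whitespace-separated column order shown above (HOST OWNER JOBID TITLE ...) and that idle cpus show '-' in the JOBID column; count_frames is a made-up helper name, not a rush command. The sample text is copied from the report above so the snippet runs without rush.

```shell
#!/bin/sh
# count_frames is a hypothetical helper (not part of rush): it reads
# 'rush -lac' style text on stdin and prints "host title count" for
# each host/title pair that has busy cpus.
# Assumes line 1 is the header and idle cpus have '-' as their JOBID.
count_frames() {
    awk 'NR > 1 && $3 != "-" { n[$1 " " $4]++ }
         END { for (k in n) print k, n[k] }' | sort
}

# On a real farm you would run:  rush -lac | count_frames
count_frames << 'EOF'
HOST OWNER JOBID TITLE FRAME PRI PID ELAPSED REMARKS
ontario erco ontario.58 XSI 0002 10k 7461 00:00:09
ontario - - - - - - Online
rotwang erco ontario.58 XSI 0001 10k 29712 00:00:10
rotwang - - - - - - Online
meade erco ontario.58 XSI 0003 10k 32214 00:00:09
meade - - - - - - Online
tower erco ontario.56 MAYA_JOB 0005 5 5062 00:14:36
tower erco ontario.56 MAYA_JOB 0006 5 5063 00:14:36
EOF
```

A count of 1 next to XSI on a dual-proc host means the frame has the machine to itself; a count of 2 next to MAYA_JOB means both cpus are busy with single threaded frames.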
When you look at the ram available on the machines running XSI,
you'll see the XSI job is taking all the ram, leaving none for
other jobs, preventing other jobs from sneaking in:
[erco@ontario] : rush -ramlist rotwang
STATE  JOBID/TITLE     PRI  RAMUSE  NOTES
Busy   ontario.58,XSI  10k  4096    <-- asking for all the ram
                            ------
                            4096
Total ram on rotwang:     4096
Available ram on rotwang: 0     <-- no ram available for other jobs to use the other cpu
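The same check can be scripted. The sketch below pulls the "Available ram" figure out of ramlist-style text; it assumes the report ends with an "Available ram on <host>: N" line as in the listing above, and avail_ram is a made-up helper name rather than anything rush provides. The sample text is copied from that listing so the snippet runs without rush.

```shell
#!/bin/sh
# avail_ram is a hypothetical helper (not part of rush): it reads
# 'rush -ramlist' style text on stdin and prints the number that
# follows "Available ram on <host>:". Assumes the layout shown above.
avail_ram() {
    awk -F: '/Available ram on/ { split($2, f, " "); print f[1] }'
}

# On a real farm you would run:  rush -ramlist rotwang | avail_ram
sample='Total ram on rotwang: 4096
Available ram on rotwang: 0'

free=$(printf '%s\n' "$sample" | avail_ram)
if [ "$free" -eq 0 ]; then
    echo "rotwang: all ram reserved, no other frame can start"
else
    echo "rotwang: $free ram still free"
fi
```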
Note how only "tower" has two MAYA jobs running; the other 3 machines
are taken over by the XSI job, with only one cpu busy each.
This is not a perfect solution, but it does get you what you want.
Or you can use the 'reserve' approach (#3 in the above link) where
you might make a +xsi group, and then reserve the extra processors
on each machine with a 'sleep' job, and submit the XSI frames to just
that +xsi group.
--
Greg Ercolano, erco@(email suppressed)
Rush Render Queue, http://seriss.com/rush/