From: "Abraham Schneider" <aschneider@(email suppressed)>
Subject: Multicore workstations and clever farm use
   Date: Tue, 29 Sep 2009 10:54:03 -0400
Msg# 1898
Hi there!

With all the 'new' multicore workstations we have now with 4, 8 or 16 cores, I'm thinking about using them in the renderfarm as well. But I can't find a good solution to do this at the moment. Problem is:

I defined all the machines in the hosts file to be a 'cluster' of 4 cores. So a machine with 4 real cores is listed as 1 Cpu in the hosts file, a machine with 8 cores is listed as 2 Cpus. And in Shake I start my jobs with '-cpus 4' to use 4 cores to render on one frame.

That way of working is fine when I don't use the workstation at all and want to use it completely in the farm. But for machines with 8 or more cores there would be another nice scenario:

Sometimes it would be really nice to use 4 of the cores to work in the GUI and the other 4 cores to render with rush.

Is there any clever way to handle this in rush? At the moment the only way I see is to edit the hosts file and change the 'cpus' value. But I have to do this every time an operator wants to change from 'all cores to the farm' to 'only some cores to the farm' and back.

Really perfect would be an advanced 'onrush' where I have more than two options: not online, partially online, fully online. Or something similar.

How do you all handle this?

Thanks, Abraham

--
Abraham Schneider
Senior VFX Compositor, 2 D Artist
ARRI Film & TV Services GmbH   Tuerkenstr. 89   D-80799 Muenchen / Germany
Phone (Tel# suppressed) EMail aschneider@(email suppressed)
www.arri.de/filmtv
ARRI Film & TV Services GmbH
Sitz: München  -  Registergericht: Amtsgericht München  -  Handelsregisternummer: HRB 69396
Geschäftsführer: Franz Kraus; Dr. Martin Prillmann; Thomas Till

   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Multicore workstations and clever farm use
   Date: Tue, 29 Sep 2009 22:21:07 -0400
Msg# 1899
Abraham Schneider wrote:
> Sometimes it would be really nice to use 4 of the cores to work in the
> GUI and the other 4 cores to render in with rush.

	'rush -reserve' is made for this purpose:
	http://www.seriss.com/rush-current/rush/rush-command-line-options.html#-reserve

USING RESERVE JOBS
------------------
	The CPUS number in the rush/etc/hosts file is really to be thought of
	as the number of instances of the renderer you want to run, so if you
	have '4' in the rush/etc/hosts file, and want to reserve 2 for GUI use,
	you could say:

		rush -reserve localhost=2@200

	..which says reserve 2 of rush's 'cpus' at 200 priority on the local machine.
	This means no job under 200 priority can use those two procs, even if it's
	a 'killer' job. If you want to make it so no job can run, use 999.

	'rush -reserve' creates a 'dummy sleep job' that holds the processors
	to prevent other renders from using them. Just dump the job to get rid
	of the reservation.
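
	If operators switch between 'all cores to the farm' and 'some cores
	to the farm' often, the reserve command can be wrapped in a small
	helper. A minimal sketch in Python, assuming only the
	'host=cpus@priority' syntax shown above; the function name
	reserve_command is hypothetical and not part of rush:

```python
def reserve_command(host, cpus, priority):
    """Build the argument list for 'rush -reserve host=cpus@priority'.

    Hypothetical helper: only the host=cpus@priority syntax is taken
    from the example above.
    """
    if cpus < 1 or priority < 1:
        raise ValueError("cpus and priority must be positive")
    return ["rush", "-reserve", f"{host}={cpus}@{priority}"]


# Reserve 2 of rush's 'cpus' on the local machine at priority 200.
print(" ".join(reserve_command("localhost", 2, 200)))
```

	The list form can be handed to subprocess.run() as-is on a machine
	where the rush command is in the PATH.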

PROCESSOR AFFINITY
------------------
	However, if you've got multithreaded jobs and are specifying fixed
	numbers on the command line (eg. "shake -cpus 4") then threads might
	still leak into using processors you want "reserved". In such cases
	it can get tricky if you're trying to split hairs.

	The desire here is to really lock physical processors to a particular
	user. eg. if 'fred' is logged in, he wants to make sure that of the
	8 processors on his machine, 4 can ONLY be used by him, so that even
	if a multithreaded shake job jumps on the machine, there'd be no
	chance that his 4 procs get used by the renders, no matter how many
	threads are running.

	Under Windows there's "Processor Affinity" which lets one ensure
	a program only uses eg. 1 processor. One way to do this is with
	the Windows 'start' command, eg. "start /AFFINITY <value>",
	where <value> indicates which processors to lock a command to,
	for instance:

		START /AFFINITY 3 shake -exec /path/to/your.shk -t 1-69

	..which will 'lock' the shake process to cpus #0 and #1. The
	/AFFINITY value is a "bit field" which breaks down this way:

		Affinity
		Value		Cpu#	Binary Value
		--------	----	------------
		1 		0	0001
		2		1	0010
		3		0,1	0011
		4		2	0100
		5		0,2	0101
		6		1,2	0110
		7		0,1,2	0111
		:
		16		3	1000
		17		3,0	1001
		:
		etc.

	There's a chart here:
	http://technet.microsoft.com/en-us/library/cc778499%28WS.10%29.aspx

	There are some other techniques you can do too.

	To simplify things, usually all people want to do is use their
	own machine for rendering, and not allow others to use it.

WORKSTATION SPECIFIC RENDERING
------------------------------
	To do this, a common technique is to use 'rush -reserve' to
	reserve the entire machine at some high priority, say @899.
	Then when you want to render on your machine, submit a job
	that asks for your machine (eg. 'tahoe') at a priority of 900k, eg:

		Cpus: tahoe=2@900k

	..and set shake's -cpus (for example) to the number of threads
	you want, so that you don't use the entire machine.

	This will let your render bump the reservation job so that it
	can run, and when the render's done, the reserve job will maintain
	a hold on the processor again.

	If you're trying to set things up so that jobs can run on the
	farm AND on your workstation, but on the workstation, jobs never
	use more threads than you want, you'd have to modify the logic
	of the submit scripts to clamp eg. shake's "-cpus #" so that
	if the render is running on that machine, the -cpus number gets
	limited down to the value you want.
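
	The clamping logic might look something like this. It is only a
	sketch, written in Python for illustration: the per-host limit table
	and the function names are hypothetical, and a real submit script
	would have its own way of building the shake command line.

```python
import socket

# Hypothetical per-workstation thread limits; hosts not listed are unrestricted.
WORKSTATION_THREAD_LIMITS = {"tahoe": 2}


def clamp_cpus(requested, hostname, limits):
    """Return the thread count to pass to shake's -cpus on this host."""
    limit = limits.get(hostname)
    if limit is None:
        return requested              # farm machine: use what the job asked for
    return max(1, min(requested, limit))


def shake_args(script, frames, requested_cpus, hostname=None,
               limits=WORKSTATION_THREAD_LIMITS):
    """Build a shake command line with -cpus clamped for the current host."""
    if hostname is None:
        hostname = socket.gethostname()   # may need trimming if it returns an FQDN
    cpus = clamp_cpus(requested_cpus, hostname, limits)
    return ["shake", "-exec", script, "-t", frames, "-cpus", str(cpus)]


# A job submitted with '-cpus 4' that lands on 'tahoe' runs with 2 threads.
print(shake_args("/path/to/your.shk", "1-69", 4, hostname="tahoe"))
```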

	Also, the MINPRI column of the hosts file is something you can use
	to put a 'wall' on the jobs the workstations accept. The 'MINPRI'
	value sets a minimum priority a job needs
	to render on that machine. This way the machine's processors will
	remain idle unless a job asks for this machine with a high enough
	priority. You can establish policies that allow users to only submit
	to their own machines with a priority sufficient to render on their
	own workstation, while keeping other renders away.
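
	To illustrate the MINPRI idea, here is a toy model in Python. It is
	not rush's scheduler; it assumes 'minimum priority' means the job's
	priority must be at least the host's MINPRI, and the host names and
	values are made up:

```python
def host_accepts(job_priority, host_minpri):
    """Assumed rule: a job may render on a host only if its priority >= MINPRI."""
    return job_priority >= host_minpri


# Hypothetical MINPRI values: 'tahoe' is a walled-off workstation,
# 'farm01' is a farm machine that accepts any job.
HOSTS_MINPRI = {"tahoe": 800, "farm01": 0}


def eligible_hosts(job_priority, hosts_minpri):
    """Return the sorted host names that would accept a job at this priority."""
    return sorted(host for host, minpri in hosts_minpri.items()
                  if host_accepts(job_priority, minpri))


print(eligible_hosts(500, HOSTS_MINPRI))   # ['farm01']
print(eligible_hosts(900, HOSTS_MINPRI))   # ['farm01', 'tahoe']
```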

	If you need more specifics on any of these, let me know.

-- 
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)

   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Multicore workstations and clever farm use
   Date: Wed, 30 Sep 2009 10:23:25 -0400
Msg# 1900
	BTW, I added some screenshots below that show how assigning
	affinity really does force renders to use specific processors.

        ** NOTE ** 
        ** Do not casually use this feature. Locking renders to particular      **
        ** physical processors usually does NOT get you what you want, and      **
        ** it can really hobble the OS from making smart scheduling decisions.  **
        **                                                                      **
        ** Just because the background renders are prevented from using         **
        ** certain procs to keep them available to the interactive user,        **
        ** it doesn't mean the machine won't still 'feel sluggish'.             **
        ** Even if cpus are 'free', they might not be able to do much if the    **
        ** machine's I/O bus is jammed with render activity.                    ** 
        **                                                                      **
        ** So in other words, the brain might be free, but the hands are busy.  **

	See "SCREENSHOTS" below.

> PROCESSOR AFFINITY
> ------------------
>       [..]
>
> 	Under Windows there's "Processor Affinity" which lets one ensure
> 	a program only uses eg. 1 processor.

	You can also force a program/render to use only 2 processors,
	or even 4. And, you can specify precisely /which/ processors
	to use.

	"Processor Affinity" is an attribute of the process, and is
	also passed down to child processes. So an affinity assigned
	to a process will affect all child processes it creates.

	This means if you tell rushd to have an affinity, then all
	renders rushd starts will also have that same affinity.
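
	The inheritance is easy to check for yourself. The sketch below is
	Python and uses os.sched_setaffinity(), which only exists on Linux,
	so it demonstrates the same parent-to-child behavior on Linux rather
	than the Windows mechanism discussed here. The function name is made
	up for the example.

```python
import json
import os
import subprocess
import sys

# Code the child runs: report which CPUs it is allowed to use.
CHILD_CODE = "import json, os; print(json.dumps(sorted(os.sched_getaffinity(0))))"


def child_affinity_when_parent_pinned(cpu):
    """Pin this process to one CPU, start a child, return the child's CPU list."""
    original = os.sched_getaffinity(0)       # remember the current mask
    os.sched_setaffinity(0, {cpu})           # restrict the parent (pid 0 = self)
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            capture_output=True, text=True, check=True,
        )
    finally:
        os.sched_setaffinity(0, original)    # restore the parent's mask
    return json.loads(result.stdout)


# Linux only: the child reports the single CPU the parent was pinned to.
if __name__ == "__main__" and hasattr(os, "sched_setaffinity"):
    first_cpu = min(os.sched_getaffinity(0))
    print(child_affinity_when_parent_pinned(first_cpu))
```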

	So say you have a 4 processor machine, and you want to force
	rush jobs on that machine to only use physical processors
	0 and 1. This would ensure processors 2 and 3 would be completely
	available for interactive use.

	To do this, you can go into the Task Manager, right click on the
	'rushd.exe' process, and assign it an affinity for cpus #0 and #1.

	Then all renders rush starts will be forced to /only/ use those
	two processors; the other two processors will be idle, and
	available for interactive desktop use.

	CAVEAT: Keep in mind that if the render makes heavy use of the
	hard disk, ram, or network, your "interactive use" may suffer
	at the limitations of those resources. So in other words, you may
	still 'feel' slowness if your interactive use involves any of
	those resources. Remember that a program's responsiveness is not
	/only/ due to cpu availability. I/O bandwidth is also at play.

>	One way to do this is with
> 	the Windows 'start' command, eg. "start /AFFINITY <value>",
> 	where <value> indicates which processors to lock a command to..

	The DOS 'START' command is nice if you want the render script
	to control processor use.

	Note that the START command's 'AFFINITY' flag is only available
	on machines with multiple processors.

SCREENSHOTS
-----------
	Here are some screenshots that show how affinity really forces
	a process to use the processors you specify, no matter how many
	threads that process starts.

	In this case I'm running a program called 'createthread' which
	expects two parameters; a number of threads to start, and a number
	of seconds to run.

	In the following three screenshots, I'm running 'createthread 2 60'
	which starts two threads running for 60 seconds. In these 3 examples,
	I'm changing /only/ the affinity. Each screenshot includes the
	Task Manager's cpu graph, so you can see how the program makes use
	of the processors.

	Here are the three examples:

	1) No affinity (use any cpus):
	   http://seriss.com/rush/misc-docs/affinity-windows/affinity-none-threads-2.png

	2) Affinity of 1 (use cpu #0):
	   http://seriss.com/rush/misc-docs/affinity-windows/affinity-1-threads-2.png

	3) Affinity of 2 (use cpu #1):
	   http://seriss.com/rush/misc-docs/affinity-windows/affinity-2-threads-2.png

	Note that for "no affinity" both cpus are pegged at 100%, so total cpu usage is 100%.
	For "affinity 1", only cpu #0 is pegged, so total cpu usage is only 50%.
	For "affinity 2", only cpu #1 is pegged, so total cpu usage is only 50%.

	This makes it clear we can control precisely which processors
	get assigned to the process, and that regardless of the number
	of threads the process starts, only the processors with an
	affinity assigned are actually used.

> 	for instance:
> 
> 		START /AFFINITY 3 shake -exec /path/to/your.shk -t 1-69
> 
> 	..which will 'lock' the shake process to cpus #0 and #1.

	And I should add that to prevent START from opening another window,
	you can use:

		START /AFFINITY 3 /B /WAIT shake -exec /path/to/your.shk -t 1-69
		                  --------

	CAVEAT: one general problem with 'START' is that it has a bug where it does
	not pass the exit code of the process back to the shell. The exit code
	returned is always 0, even if the process (in this case, shake) returns
	a non-zero exit code. So if you tried to use this in a script, it would
	not be able to tell if shake failed or succeeded. Kinda annoying.

>       The /AFFINITY value is a "bit field" which breaks down this way:
> 
> 		Affinity
> 		Value		Cpu#	Binary Value
> 		--------	----	------------
> 		1 		0	0001
> 		2		1	0010
> 		3		0,1	0011
> 		4		2	0100
> 		5		0,2	0101
> 		6		1,2	0110
> 		7		0,1,2	0111
> 		:
> 		16		3	1000
> 		17		3,0	1001
> 		:
> 		etc.

	Correction: there are a few mistakes in the above table.
	The entries for 16 and 17 are wrong in two ways:

	First, my binary digits for 16 and 17 are wrong,
	and second, the START command's /AFFINITY parameter
	apparently expects its value in *hex*.

	So a corrected table would be:

                Affinity
                Value (hex)     Cpu#            Binary Value
                -----------     ----            ------------
                1               0               00001
                2               1               00010
                3               0,1             00011
                4               2               00100
                5               0,2             00101
                6               1,2             00110
                7               0,1,2           00111
                8               3               01000
                :               :                 :
                f               0,1,2,3         01111
                10              4               10000
                11              4,0             10001
                :
                etc.

	So if you have an 8 processor machine, and want a render
	to only use the first 4 processors (0,1,2,3), then use an
	affinity of 'f'. (Hex counts 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,10,11,12..)
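
	If you'd rather compute the value than read it off a table, the
	arithmetic is just 'set bit N for cpu #N, then print the result in
	hex'. A small Python sketch (the function names are mine, for
	illustration only):

```python
def affinity_mask(cpus):
    """Return the integer bit mask with bit N set for each cpu number N."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("cpu numbers start at 0")
        mask |= 1 << cpu
    return mask


def affinity_hex(cpus):
    """Return the mask as a lowercase hex string, as used in the table above."""
    return format(affinity_mask(cpus), "x")


print(affinity_hex([0, 1, 2, 3]))   # f
print(affinity_hex([4]))            # 10
print(affinity_hex([0, 4]))         # 11
```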

   [[EDIT: ADDED THE ABOVE 'NOTE' WARNING - 11/16/2010 -erco]]

   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Multicore workstations and clever farm use
   Date: Thu, 01 Oct 2009 17:15:27 -0400
Msg# 1905
Greg Ercolano wrote:
> 	[..] you can go into the Task Manager, right click on the
> 	'rushd.exe' process, and assign it an affinity for cpus #0 and #1.
> 
> 	Then all renders rush starts will be forced to /only/ use those
> 	two processors; the other two processors will be idle, and
> 	available for interactive desktop use.

   I played around with this a bit yesterday, and it seems to work
   quite well for restricting processors, and this is something you
   all can do with the current release of rush you have.

   In fact, the above should be able to work for all render managing
   software, or any processes, including interactive renders.

   I decided to add an option into the current alpha release of Rush
   (that will go into beta very soon [RELEASED AS 102.42a9c -ed])
   where you can include an optional "processor affinity" setting 
   in the far right column of the Rush hosts file, eg:

#Host              Cpus Ram     MinPri Criteria/Hostgroups                                   Affinity
#-------------     ---- -----   ------ -------------------                                   --------
ta                 4    100     0      +any,+erco,+linux,linux,linux6.0,intel,dante,+farm    -
geneva             1    100     0      +any,+w2k,+work                                       affinity=1
meade              8    100     0      +any,+erco,+linux,linux                               -
superior           2    100     0      +any,+erco,+winxp,winxp                               affinity=3
[..]

   ..the new 'Affinity' field at the far right being the new option.

   Here it shows host 'geneva' that has 2 physical processors, but rush
   is set up to start only one instance of a render, and will restrict
   rendering only to cpu#0 (due to affinity=1), leaving cpu#1 available
   for interactive use.

   Also, 'superior' which has 4 physical processors, but will only start
   two instances of renders, and those renders will be restricted to only
   using physical cpus #0 and #1 (due to affinity=3).

   This way the new release of rush will be able to let the admin force
   it to only use certain physical processors.

   Note the affinity value is a *bit field* that represents each
   physical processor. So for instance 'affinity=4' does NOT mean 
   "use 4 procs", it means "use only processor #2". (See table below).

        Affinity                                             -- CPU# --   TOTAL
        Value (hex)     Cpu#            Binary Value         0 1 2 3 4 5  #CPUS
        -----------     ----            ------------         - - - - - -  -----
        1               0               00001                X - - - - -    1
        2               1               00010                - X - - - -    1
        3               0,1             00011                X X - - - -    2
        4               2               00100                - - X - - -    1
        5               0,2             00101                X - X - - -    2
        6               1,2             00110                - X X - - -    2
        7               0,1,2           00111                X X X - - -    3
        8               3               01000                - - - X - -    1
        :               :                 :
        f               0,1,2,3         01111                X X X X - -    4
        10              4               10000                - - - - X -    1
        11              4,0             10001                X - - - X -    2
        :
        etc.

   Also note the affinity value is in *HEX*.
   This is to be consistent with the syntax of Microsoft's 
   'START /AFFINITY' command.
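
   To double check a hosts file entry, the hex value can be decoded back
   into a list of cpu numbers. A Python sketch, assuming the field looks
   like the 'affinity=3' and '-' entries in the sample above; the function
   names are hypothetical and this is not how rushd itself parses the file:

```python
def cpus_from_affinity(hex_value):
    """Decode a hex affinity value into the list of cpu numbers it selects."""
    mask = int(hex_value, 16)
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


def parse_affinity_field(field):
    """Return the cpu list for 'affinity=<hex>', or None for '-' (no affinity)."""
    if field == "-":
        return None
    key, _, value = field.partition("=")
    if key != "affinity" or not value:
        raise ValueError(f"unrecognized affinity field: {field!r}")
    return cpus_from_affinity(value)


print(parse_affinity_field("affinity=3"))    # [0, 1]
print(parse_affinity_field("affinity=4"))    # [2]
print(parse_affinity_field("affinity=11"))   # [0, 4]
print(parse_affinity_field("-"))             # None
```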

   So if you set 'affinity=3', "rushtop" will show all rush rendering activity
   locked to the first and second cpu bars (cpu #0 and #1).  You will see this
   same 'locked' behavior in the "Task Manager" as well.

   In my tests with the above alpha release, the task manager showed
   multi-threaded renders started by rushd were indeed restricted
   precisely to the cpus the 'affinity=' setting defined.

   ** NOTE                                                                   **
   **    However, practically, I'm not sure how useful this is.              **
   **    Despite the restriction leaving cpus completely idle,               **
   **    while renders were running, I still could "feel" slowness from the  **
   **    machine during interactive use, even though the 'render' was only   **
   **    doing a sin() computation in a tight loop, not using any I/O, ram,  **
   **    or networking.                                                      **
   **                                                                        **
   **    So even when you get the control you want, it may not get you the   **
   **    results you want. So forcing cpus to be idle won't necessarily      **
   **    get you responsiveness. It probably depends on the motherboard      **
   **    I/O config and physical cpus (cores vs cpus) configurations.        **

   [[EDIT: MADE THE ABOVE NOTE IN BOLD          - 11/16/2010 -erco]]
   [[EDIT: ADDED RELEASE VERSION CLARIFICATIONS - 11/16/2010 -erco]]
   [[EDIT: ADDED TABLE OF AFFINITY VALUES       - 11/17/2010 -erco]]

-- 
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)