From: Dylan Penhale <dylan@(email surpressed).au>
Subject: QOS 
   Date: Fri, 24 Feb 2006 00:50:29 -0500
Msg# 1248
View Complete Thread (4 articles) | All Threads
We have the odd one or two machines that are sometimes slow to respond to anything (including rush -pings) when they are under a very heavy render load. I know one way to solve this is to set the renders at a low run level, probably by starting them from a START /BELOWNORMAL wrapper, but I have also been thinking of QOS on the interface.

How does rush deal with QOS Packet Scheduler under Windows XP, if at all? My understanding is that it doesn't. By default I disable QOS Packet Scheduler in the interface on the Windows machines, thinking it's mainly for streaming services, but I wonder if there are any QOS-aware applications that use it on render machines?

Anyone know of any rendering based info on this?


Cheers


_________________________________________

Dylan Penhale
Systems Administrator
Fuel International



   From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: QOS
   Date: Fri, 24 Feb 2006 01:57:10 -0500
Msg# 1249
Dylan Penhale wrote:
[posted to rush.general]

We have the odd one or two machines that are sometimes slow to respond to anything (including rush -pings) when they are under a very heavy render load. I know one way to solve this is to set the renders at a low run level, probably by starting them from a START /BELOWNORMAL wrapper, but I have also been thinking of QOS on the interface.

	Traffic shaping, network throttling, and QOS are all used
	to manage network bandwidth issues.

	But I believe in your case it's not a network bandwidth issue
	at hand; it sounds to me like the box is thrashing, and not
	giving any cpu to rushd.

	Rushd overusing network bandwidth (or even cpu) on a render node
	is the /last/ thing I'd expect you to see. Rushd processes on render nodes
	don't do very much.. esp when cpus are busy rendering. Rushd is usually
	just waiting for renders to finish. It has a little activity when
	a cpu becomes idle, because it wants to get a new job running on that cpu
	asap.

	Whether QOS will help or not depends on what's going on
	with the box:

		1) Does the task manager show the cpus pegged due to render
		   activity or swapping? Is memory use pegged?

		2) Is the desktop temporarily frozen or unresponsive to moving
		   windows around?

	If any or all of these are true, network QOS won't help, because
	the machine is thrashing and not giving rushd any cpu. Rushd then
	appears unresponsive because it has no cpu time to respond with.

	I believe you indicated in a previous email that the render
	process(es) were swapping the box, overusing memory, causing
	the machine to thrash.

	When a box is thrashing, it won't yield the cpu to processes like
	rushd, because it gives priority to swapping. This is why swapping
	is such a bad thing, and why machines usually act pretty badly
	when eg. a render is overusing memory.

	The only situation I can think of where QOS would help is if the
	box's network interface is completely saturated with I/O from some
	other process (eg. rendering I/O), and you want to use QOS to give
	rush's packets a higher priority than the renderer's I/O traffic,
	so that rush is more responsive.

	But that's a fairly unlikely scenario for renders.. comps maybe,
	or realtime video. I'd only expect network bottlenecking on really
	slow network interfaces (eg. a 10Mbit ethernet on a 1GHz machine,
	or a 100Mbit ethernet for a dual proc 2GHz box doing high speed I/O)

	A machine that's network bound usually has cpus that are /not/ pegged,
	because they're all waiting on I/O for the network bottleneck to clear.
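
	The triage described above can be sketched as a small function.
	This is only an illustration: the metric names and every threshold
	below are assumptions chosen for the example, not values that Rush
	or Windows define.

```python
def classify_bottleneck(cpu_pct, mem_pct, pageins_per_sec, net_util_pct):
    """Rough triage of a slow render node from four sampled metrics.

    All thresholds are illustrative assumptions, not values from Rush.
    """
    cpu_pegged = cpu_pct >= 95
    mem_pegged = mem_pct >= 95
    swapping = pageins_per_sec >= 1000
    net_saturated = net_util_pct >= 90

    if mem_pegged and swapping:
        # Memory exhausted and paging hard: the box is thrashing,
        # so network QOS will not make rushd more responsive.
        return "thrashing"
    if net_saturated and not cpu_pegged:
        # CPUs mostly idle while waiting on the wire: the one case
        # where prioritizing rush traffic with QOS might help.
        return "network-bound"
    if cpu_pegged:
        return "cpu-bound"
    return "ok"
```

	For example, a node sampled at 40% cpu with its interface 97% busy
	would come back as "network-bound", while one with memory pegged and
	heavy paging would come back as "thrashing" regardless of the network.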

How does rush deal with QOS Packet Scheduler under windows XP if at all?

	It deals with QOS the same way it deals with a network that drops
	packets. (It's the same thing really)

	Traffic shaping etc is the applied logic of dropping or delaying
	packet delivery to implement network bandwidth control. Kind of like
	how a traffic light delays traffic to allow cross traffic through
	(packet delay), the QOS can delay packet delivery. And when traffic
	gets *really* snarled, the QOS steam shovel appears, plowing the
	snarled traffic off a cliff (ie. drops packets in favor of allowing
	cross traffic to flow more smoothly)
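
	The delay-or-drop logic above can be illustrated with a token
	bucket, one common shaping technique. This is a minimal sketch and
	not how the Windows QOS Packet Scheduler is implemented; the class
	name, parameters, and numbers are all hypothetical.

```python
class TokenBucket:
    """Minimal token-bucket shaper: send, delay, or drop each packet."""

    def __init__(self, rate_bytes_per_sec, burst_bytes, max_delay_sec):
        self.rate = float(rate_bytes_per_sec)
        self.burst = float(burst_bytes)
        self.max_delay = float(max_delay_sec)
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = 0.0                   # time of the previous packet

    def offer(self, now, size):
        """Return ("send", 0.0), ("delay", seconds), or ("drop", None)."""
        # Refill tokens for the time elapsed since the previous packet.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

        if self.tokens >= size:
            self.tokens -= size
            return ("send", 0.0)

        wait = (size - self.tokens) / self.rate
        if wait <= self.max_delay:
            # The traffic light: hold the packet until tokens accrue.
            # The token count goes negative to record the debt.
            self.tokens -= size
            return ("delay", wait)

        # The steam shovel: the queue is too deep, so drop the packet.
        return ("drop", None)
```

	With a 1000 byte/sec rate, a 1000 byte burst and a 0.5 sec delay
	limit, a 1000 byte packet goes straight out, a 400 byte packet right
	behind it is delayed, and a third one arriving at the same instant
	is dropped.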

My understanding is that it doesn't. By default I disable QOS Packet Scheduler in the interface on the Windows machines, thinking it's mainly for streaming services, but I wonder if there are any QOS-aware applications that use it on render machines?

	If a packet rush sends doesn't reach the remote, it tries again
	later (on the order of a few seconds), same as other network
	applications.

	But if the rushd application isn't getting any cpu, the remote
	will just keep trying to contact it until it times out (rush -ping),
	or until rushd eventually responds.

	Ethernet technology is inherently error prone; dropped packets are
	'life as usual' on any ethernet. Packet loss is part of the ethernet
	design. All applications (including rush) have to deal with it
	seamlessly.

	Rush throttles back when a network appears to be lossy; this is
	what the <backoff_rate> and backoff_min/max values control, to prevent
	further saturating an already saturated network. When the network
	becomes responsive again, the backoff rates drop back to normal.
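
	The general shape of such a backoff can be sketched as follows.
	The parameter names are borrowed from the text above, but their
	meaning here (a multiplier and bounds in seconds, with a reset on
	any reply) is an assumption made for illustration; this is not
	Rush's actual algorithm or its default values.

```python
def next_retry_delay(current, got_reply, backoff_rate=2.0,
                     backoff_min=1.0, backoff_max=60.0):
    """Return the delay (seconds) to wait before the next send.

    Hypothetical semantics: each lost packet multiplies the delay by
    backoff_rate, clamped to [backoff_min, backoff_max]; any reply
    drops the delay back to backoff_min.
    """
    if got_reply:
        # Network is responsive again: return to the normal rate.
        return backoff_min
    # Network looks lossy: slow down to avoid saturating it further.
    return min(backoff_max, max(backoff_min, current * backoff_rate))
```

	Starting from 1 second, repeated losses would give 2, 4, 8 ... up to
	the 60 second ceiling, and the first successful reply resets the
	delay to 1 second.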

--
Greg Ercolano, erco@(email surpressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Cel: (Tel# suppressed)
Fax: (Tel# suppressed)

   From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: QOS
   Date: Fri, 24 Feb 2006 02:01:33 -0500
Msg# 1250
I know one way to solve this is to set the renders at a low run level, probably by starting them from a START /BELOWNORMAL wrapper, but I have also been thinking of QOS on the interface.

	BTW, changing [cpu scheduling] priority probably won't change
	anything either if the box is thrashing, because at that point
	it's not the render process that's using the cpu, it's the swapper.

	When a box is swapping, [most] process scheduling is put on hold
	while the box tries to stabilize memory.

	If a render process keeps requesting more memory resources,
	the kernel has to swap stuff out to make room, and if the
	renderer keeps doing this, the box just spends all its time
	doing high priority swapping.
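
	For reference, the kind of low-priority wrapper mentioned in the
	quoted text might look like the sketch below. The function name is
	hypothetical, and as noted above it is unlikely to help while the
	box is swapping; it only matters when the cpus themselves are the
	contended resource.

```python
import os
import subprocess
import sys

def run_below_normal(cmd):
    """Run cmd (a list of arguments) at reduced CPU priority.

    Returns the exit code of the child process.
    """
    if sys.platform == "win32":
        # Roughly what START /BELOWNORMAL /WAIT does from a batch file.
        proc = subprocess.Popen(
            cmd, creationflags=subprocess.BELOW_NORMAL_PRIORITY_CLASS)
    else:
        # POSIX fallback: raise the child's niceness before it execs.
        proc = subprocess.Popen(cmd, preexec_fn=lambda: os.nice(10))
    return proc.wait()
```

	Usage would be something like run_below_normal(["render", "scene"]),
	where the command line is whatever the job normally runs.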



   From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: QOS
   Date: Sat, 18 Mar 2006 15:18:41 -0500
Msg# 1256
I have also been thinking of QOS on the interface.

     Followup: since QoS is part of linux kernel routing, there's a great
     "HOW-TO" doc on "Linux Advanced Routing & Traffic Control" here:

     HTML: http://lartc.org/lartc.html
      PDF: http://lartc.org/lartc.pdf

     ..good deep stuff. For bandwidth control, skip to Chap. 9 on "Queueing Disciplines".
     Of course, such docs will be moot after Apple spends 10 man years wrapping all this up
     in a pretty little GUI that a 5 year old could navigate, replete with interactively
     resizable network valves, drop shadows and shiny buttons. :/

     QoS is useful not so much for Rush as it is for file system traffic management,
     so that workstation traffic gets immediate attention by the file server over
     farm traffic, to decrease perceived 'file server slowness' to interactive users
     when the farm is rendering full bore.

     Most of you trying to do bandwidth control likely have *large* networks in the
     150 - 700 machine range, have NetApps, and are doing traffic control either
     at your file server's interfaces, or on your master switch.

     It's unlikely anyone needing traffic control will be using a linux file server,
     as that box probably melted long ago when you got above 30 machines on your net ;)

     But the above docs are, if nothing else, a good intro to the concepts.

     I would think the best scenario is where you split the farm and workstation
     traffic to separate subnets, and assign them separate interfaces on the
     file server, giving priority to the workstation interface when the farm
     interface reaches some high watermark, or when the workstation requests
     are taking too long to resolve.

     Or if not that, QoS management at the switch would probably be the second choice,
     such that any workstation traffic triggers a choke algorithm if contention
     for the file server is greater than some percentage of overall bandwidth.
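
     The watermark idea above could be expressed as a simple policy function.
     This is only a sketch of the decision logic: the function name, the
     thresholds, and the 50% throttled share are all made-up values, and a real
     setup would apply the resulting cap through the file server's or switch's
     own traffic control.

```python
def farm_rate_limit(link_mbps, farm_mbps, ws_latency_ms,
                    high_watermark=0.8, max_ws_latency_ms=50.0,
                    throttled_share=0.5):
    """Return the bandwidth cap (Mbit/s) to apply to farm traffic.

    The farm is throttled when it crosses the high watermark on the
    link, or when workstation requests are taking too long.
    """
    farm_busy = farm_mbps >= high_watermark * link_mbps
    ws_slow = ws_latency_ms > max_ws_latency_ms
    if farm_busy or ws_slow:
        # Choke the farm so workstation traffic gets through first.
        return link_mbps * throttled_share
    # No contention: let the farm use the whole link.
    return float(link_mbps)
```

     On a 1000 Mbit link, a farm pushing 900 Mbit/s would be capped to 500,
     as would a quiet farm whenever workstation latency climbs past the limit.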