From: Dylan Penhale <dylan@(email suppressed).au>
Subject: QOS
Date: Fri, 24 Feb 2006 00:50:29 -0500
Msg# 1248 (thread of 4 articles)

We have the odd one or two machines that are sometimes slow to
respond to anything (including rush -pings) when they are under a
very heavy render load. I know one way to solve this is to set the
renders at a low run level, probably by starting them from a START
/BELOWNORMAL wrapper, but I have also been thinking of QOS on the
interface.

How does rush deal with QOS Packet Scheduler under Windows XP, if at
all? My understanding is that it doesn't. By default I disable QOS
Packet Scheduler on the interface on the Windows machines, thinking
it's mainly for streaming services, but I wonder if there are any
QOS-aware applications that use it on render machines?

Anyone know of any rendering based info on this?

Cheers
_________________________________________
Dylan Penhale
Systems Administrator
Fuel International

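For anyone wanting to try the low-priority wrapper idea, here is a minimal Python sketch of the same thing START /BELOWNORMAL does. It is not part of Rush. The function names `low_priority_kwargs` and `run_low_priority` are made up for illustration, and it assumes Python 3.7 or later (where `subprocess.BELOW_NORMAL_PRIORITY_CLASS` exists on Windows). On Unix it falls back to raising the child's niceness by 10, which is an arbitrary choice.

```python
import os
import subprocess
import sys


def low_priority_kwargs(platform=sys.platform):
    """Return Popen keyword arguments that lower the child's CPU priority."""
    if platform == "win32":
        # Same priority class that START /BELOWNORMAL requests.
        # The constant only exists on Windows builds of Python, so fall
        # back to its documented numeric value (0x00004000) elsewhere.
        flag = getattr(subprocess, "BELOW_NORMAL_PRIORITY_CLASS", 0x00004000)
        return {"creationflags": flag}
    # Unix fallback (assumption): raise niceness in the child before exec.
    return {"preexec_fn": lambda: os.nice(10)}


def run_low_priority(cmd):
    """Run cmd (a list of arguments) at reduced priority; return its exit code."""
    return subprocess.call(cmd, **low_priority_kwargs())
```

A render script would then call something like `run_low_priority(["render.exe", "-scene", "foo"])`, where the command line is of course hypothetical.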
From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: QOS
Date: Fri, 24 Feb 2006 01:57:10 -0500
Msg# 1249

Dylan Penhale wrote:
> [posted to rush.general]
> We have the odd one or two machines that are sometimes slow to
> respond to anything (including rush -pings) when they are under a
> very heavy render load. I know one way to solve this is to set the
> renders at a low run level, probably by starting them from a
> START /BELOWNORMAL wrapper, but I have also been thinking of QOS
> on the interface.

Traffic shaping, network throttling, and QOS are all used to manage
network bandwidth issues.

But I believe in your case it's not a network bandwidth issue at hand;
it sounds to me like the box is thrashing, and not giving any cpu to
rushd.

Rushd overusing network bandwidth (or even cpu) on a render node is
the /last/ thing I'd expect you to see. Rushd processes on render
nodes don't do very much.. esp when cpus are busy rendering. Rushd is
usually just waiting for renders to finish. It has a little activity
when a cpu becomes idle, because it wants to get a new job running on
that cpu asap.

Whether QOS will help or not depends on what's going on with the box:

    1) Does the task manager show the cpus pegged due to render
       activity or swapping? Is memory use pegged?

    2) Is the desktop temporarily frozen or unresponsive to moving
       windows around?

If any or all, network QOS won't help, because the machine is
thrashing, not giving rushd any cpu. This causes rushd to appear
unresponsive because it's not getting any cpu to be responsive.

I believe you indicated in a previous email that the render
process(es) were swapping the box, overusing memory, causing the
machine to thrash.

When a box is thrashing, it won't yield the cpu to processes like
rushd, because it gives priority to swapping. This is why swapping is
such a bad thing, and machines usually act pretty badly, eg. when a
render is overusing memory.

The only situation I can think of where QOS would help is if the box's
network interface is completely saturated with I/O from some other
process (eg. rendering I/O), and you want to use QOS to give rush's
traffic a higher priority than the renderer's I/O traffic, so that
rush is more responsive.

But that's a fairly unlikely scenario for renders.. comps maybe, or
realtime video. I'd only expect network bottlenecking on really slow
network interfaces (eg. a 10Mbit ethernet on a 1GHz machine, or a
100Mbit ethernet for a dual proc 2GHz box doing high speed I/O).

A machine that's network bound usually has cpus that are /not/ pegged,
because they're all waiting on I/O for the network bottleneck to
clear.

> How does rush deal with QOS Packet Scheduler under windows XP if at
> all?

It deals with QOS the same way it deals with a network that drops
packets. (It's the same thing really)

Traffic shaping etc is the applied logic of dropping or delaying
packet delivery to implement network bandwidth control.

Kind of like how a traffic light delays traffic to allow cross traffic
through (packet delay), QOS can delay packet delivery. And when
traffic gets *really* snarled, the QOS steam shovel appears, plowing
the snarled traffic off a cliff (ie. drops packets in favor of
allowing cross traffic to flow more smoothly).

> My understanding is that it doesn't. By default I disable QOS Packet
> Scheduler in the interface on the windows machines thinking it's
> mainly for streaming services but I wonder if there are any
> QOS-aware applications that use it on render machines?

If a packet rush sends doesn't reach the remote, it tries again later
(on the order of a few seconds), same as other network applications.

But if the rushd application isn't getting any cpu, the remote will
just keep trying to contact it until it times out (rush -ping), or
until rushd eventually responds.

Ethernet technology is inherently error prone; dropped packets are
'life as usual' on any ethernet. Packet loss is part of the ethernet
design. All applications (including rush) have to deal with it
seamlessly.

Rush throttles back when a network appears to be lossy; this is what
the <backoff_rate> and backoff_min/max values control, to prevent
further saturating an already saturated network. When the network
becomes responsive again, the backoff rates drop back to normal.

--
Greg Ercolano, erco@(email suppressed)
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed) Cel: (Tel# suppressed) Fax: (Tel# suppressed)

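To make the "throttle back, then recover" behavior concrete, here is a generic exponential backoff sketch in Python. This is NOT Rush's actual algorithm, and the exact meaning of Rush's backoff settings isn't spelled out in this thread. The class name `Backoff` and the parameters `min_delay`, `max_delay` and `rate` are assumptions chosen only to loosely echo the idea of a rate bounded by a min and a max.

```python
class Backoff:
    """Retry-delay tracker: grows on failure, resets on success.

    Generic illustration only; not taken from the Rush source.
    """

    def __init__(self, min_delay=1.0, max_delay=60.0, rate=2.0):
        self.min_delay = min_delay   # shortest wait between retries (secs)
        self.max_delay = max_delay   # longest wait between retries (secs)
        self.rate = rate             # multiplier applied after each failure
        self.delay = min_delay

    def on_failure(self):
        """Return how long to wait before the next retry, then lengthen it."""
        wait = self.delay
        self.delay = min(self.delay * self.rate, self.max_delay)
        return wait

    def on_success(self):
        """Network looks responsive again: drop back to the minimum delay."""
        self.delay = self.min_delay


b = Backoff(min_delay=1.0, max_delay=8.0, rate=2.0)
delays = [b.on_failure() for _ in range(5)]
print(delays)   # [1.0, 2.0, 4.0, 8.0, 8.0]
```

The point of the cap is that a sender on a lossy net keeps retrying, but at a bounded, slower pace, so it doesn't pile more traffic onto a network that's already saturated.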
From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: QOS
Date: Fri, 24 Feb 2006 02:01:33 -0500
Msg# 1250

> I know one way to solve this is to set the renders at a low run
> level, probably by starting them from a START /BELOWNORMAL wrapper,
> but I have also been thinking of QOS on the interface.

BTW, changing [cpu scheduling] priority probably won't change anything
either if the box is thrashing, because at that point it's not the
render process that's using the cpu, it's the swapper.

When a box is swapping, [most] process scheduling is put on hold while
the box tries to stabilize memory.

If a render process keeps requesting more memory resources, the kernel
has to swap stuff out to make room, and if the renderer keeps doing
this, the box just spends all its time doing high priority swapping.

--
Greg Ercolano, erco@(email suppressed)
Rush Render Queue, http://seriss.com/rush/

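The diagnostic reasoning in this thread can be condensed into a small decision function. This is only a sketch of the rule of thumb, not a monitoring tool: the function name `diagnose`, its arguments and the returned labels are all hypothetical, and the observations themselves (task manager, desktop responsiveness, interface load) still have to be gathered by hand.

```python
def diagnose(cpu_pegged, memory_pegged, swapping, desktop_frozen, nic_saturated):
    """Map hand-gathered observations to a likely cause.

    Labels returned (hypothetical names):
      "thrashing"     - memory is the problem; QOS and cpu priority
                        changes are unlikely to help
      "network_bound" - interface saturated while cpus wait; QOS may help
      "cpu_bound"     - cpus busy rendering, no swapping; lowering the
                        render's priority may help
      "inconclusive"  - none of the patterns match
    """
    if swapping or (memory_pegged and desktop_frozen):
        return "thrashing"
    if nic_saturated and not cpu_pegged:
        return "network_bound"
    if cpu_pegged:
        return "cpu_bound"
    return "inconclusive"


print(diagnose(cpu_pegged=True, memory_pegged=True, swapping=True,
               desktop_frozen=True, nic_saturated=False))   # thrashing
```

Thrashing is checked first on purpose: once the box is swapping, the other symptoms are side effects and neither of the other remedies addresses the cause.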
From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: QOS
Date: Sat, 18 Mar 2006 15:18:41 -0500
Msg# 1256

> I have also been thinking of QOS on the interface.

Followup: since QoS is part of linux kernel routing, there's a great
"HOW-TO" doc on "Linux Advanced Routing & Traffic Control" here:

    HTML: http://lartc.org/lartc.html
    PDF:  http://lartc.org/lartc.pdf

..good deep stuff. For bandwidth control, skip to Chap. 9 on
"Queueing Disciplines".

Of course, such docs will be moot after Apple spends 10 man years
wrapping all this up in a pretty little GUI that a 5 year old could
navigate, replete with interactively resizable network valves, drop
shadows and shiny buttons. :/

QoS is useful not so much for Rush as it is for file system traffic
management, so that workstation traffic gets immediate attention by
the file server over farm traffic, to decrease perceived 'file server
slowness' to interactive users when the farm is rendering full bore.

Most of you trying to do bandwidth control likely have *large*
networks in the 150 - 700 machine range, have NetApps, and are doing
traffic control either at your file server's interfaces, or on your
master switch.

It's unlikely anyone needing traffic control will be using a linux
file server, as that box probably melted long ago when you got above
30 machines on your net ;) But the above docs are, if nothing else, a
good intro to the concepts.

I would think the best scenario is where you split the farm and
workstation traffic to separate subnets, and assign them separate
interfaces on the file server, giving priority to the workstation
interface when the farm interface reaches some high watermark, or when
the workstation requests are taking too long to resolve.

Or if not that, QoS management at the switch would probably be the
second choice, such that any workstation traffic triggers a choke
algorithm if contention for the file server is greater than some
percentage of overall bandwidth.
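As a toy illustration of "workstation traffic gets attention before farm traffic", here is a two-band strict-priority queue in Python. It is similar in spirit to the priority queueing disciplines covered in the LARTC doc, but it's only a model, not how any particular file server, switch or kernel implements it. The class name `PriorityScheduler` and the `farm_limit` parameter are invented for this sketch.

```python
from collections import deque


class PriorityScheduler:
    """Two-band strict-priority queue: workstation traffic always goes first."""

    def __init__(self, farm_limit=100):
        self.workstation = deque()
        self.farm = deque()
        self.farm_limit = farm_limit  # hypothetical cap on queued farm packets
        self.dropped = 0              # farm packets discarded when over the cap

    def enqueue(self, packet, from_workstation):
        if from_workstation:
            self.workstation.append(packet)
        elif len(self.farm) < self.farm_limit:
            self.farm.append(packet)  # delayed behind any workstation traffic
        else:
            self.dropped += 1         # queue full: drop, the sender must retry

    def dequeue(self):
        """Return the next packet to send, or None if both queues are empty."""
        if self.workstation:
            return self.workstation.popleft()
        if self.farm:
            return self.farm.popleft()
        return None


s = PriorityScheduler(farm_limit=2)
for p in ["farm1", "farm2", "farm3"]:
    s.enqueue(p, from_workstation=False)
s.enqueue("ws1", from_workstation=True)
order = [s.dequeue() for _ in range(3)]
print(order, s.dropped)   # ['ws1', 'farm1', 'farm2'] 1
```

Note how it shows both behaviors at once: farm packets are delayed behind the workstation packet, and the one that overflowed the cap is dropped outright.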