From: Marco Recuay <marco@(email suppressed)>
Subject: First In-First Out Tip
Date: Sun, 19 Jul 2009 21:08:19 -0400
Msg# 1871
Just wanted to pass on my experience with the new FIFO scheduling,
since it wasn't working as I expected when first enabled.

It appears that FIFO only schedules the jobs on the same job server. If you want all renders across the network to schedule FIFO, you need to make sure that jobs are sent to one submit host only, which then orders them based on the job age.

One trick would also be to have more than one designated submit host, so you could have separate FIFO queues for different tasks.

That's it... something that might have been obvious to other admins, but I think it's worth documenting here.

-Marco
From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: First In-First Out Tip
Date: Mon, 20 Jul 2009 11:18:56 -0400
Msg# 1872
Marco Recuay wrote:
> Just wanted to pass on my experience with the new FIFO scheduling,
> since it wasn't working as I expected when first enabled.
>
> It appears that FIFO only schedules the jobs on the same job server. If
> you want all renders across the network to schedule FIFO, you need to
> make sure that jobs are sent to one submit host only, which then orders
> them based on the job age.

    Hmm, an interesting theory, but no -- FIFO works off time stamps
    from when the job was submitted, regardless of which machine is the
    job server.

    Assuming the machines acting as job servers are time synched,
    it shouldn't matter which machine is the job server.

    You can see the raw FIFO numbers used for scheduling in the
    "Jobs Full" report:

        "FifoSchedOrder"     -- the raw unix time() value (eg. 3542120794)
        "FifoSchedOrderDate" -- the same value as a date/time (eg. 06/26/09,09:28:23.41)

    The 'Elapsed' time in the normal "Jobs" or "All Jobs" report can
    usually be used as the indicator of the FIFO scheduling value,
    assuming the clocks are synched well.

    The way FIFO works: when a proc becomes available, and the priorities
    of the jobs asking to use it are equal, the unix time value is used
    to figure out which job gets to run; the oldest job wins the proc.

    So if you have 100 machines and everyone's submitting with +any=100@10,
    then no matter which machine is the job server, the oldest job gets
    the next available proc.

    If you have 100 machines and people are asking for fewer cpus than are
    on the farm, eg. +any=5@10, then the oldest job will get the first
    5 procs, and the next oldest job(s) will get the rest.

    Note that if people are submitting with different priorities, then
    FIFO breaks up into tiers of priority, ie. priority takes precedence
    over FIFO.

    For instance, if everyone is in the FIFO queue @10 priority (examples
    above), and NEW GUY submits @11 priority, his job will be in front of
    all the @10's. And if other people submit @11 (same as him), then FIFO
    rules will determine which of the @11 jobs gets the next cpu. Anything
    "leftover" goes to the other jobs in the @10 priority.

    So if these jobs are submitted, the next cpu to come available will
    try to give itself to one of the jobs in this order, starting at the top:

        OWNER   PRIORITY      SUBMIT TIME
        -----   ----------    -----------    _
        Fred    +any=100@11   14:00 Today     |  11 priority "tier"
        Gina    +any=100@11   15:00 Today    _|
                                             _
        Jane    +any=100@10    9:00 Today     |
        Tim     +any=100@10   13:00 Today     |  10 priority "tier"
        Sandy   +any=100@10   14:40 Today    _|

    So the next available cpu will go to Fred first, and if his job
    is maxed out on cpus or has no more frames to render, Gina's job
    is next, then Jane, Tim, and Sandy last.

    Note that even though Jane submitted before everyone (9am), she was
    still @10 priority, so she still comes "after" all the higher @11
    priority jobs.

> One trick would also be to have more than one designated submit host,
> so you could have separate FIFO queues for different tasks.

    You can get different FIFO queues via priority, as shown above
    where there are two FIFO queues: @11 and @10.

    You can also get different FIFO queues by requesting different
    hostgroups, ie. splitting the farm into two or more groups.

    So if there's a job that needs to be in front of everything in
    the FIFO queue, but only needs 5 procs, then that job should submit
    at a higher priority (eg. @11 in the examples above) but should
    just ask for =5, eg. +any=5@11

    Let me know if anyone has questions about the above, or if you'd
    like more details.

    Pretty much the easiest way to make use of FIFO is to just have
    everyone submit at the SAME PRIORITY, and FIFO will be very predictable.
    It's only when you introduce priority differences that you get the
    multiple FIFO queues happening.
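The ordering rule described in this post (higher priority tier first, then oldest submit time within a tier) can be sketched with a plain sort. This is only an illustration of the tie-breaking rule, not how Rush implements it internally. The `fifo_order` function name and the three-column input format are made up for the example, and submit times are written as zero-padded HH:MM so they compare correctly as text.

```shell
#!/bin/sh
# Illustrative only: order jobs the way the post describes, i.e.
# highest priority tier first, oldest submit time first within a tier.
# Input lines: OWNER PRIORITY SUBMIT_TIME (HH:MM, zero-padded)
fifo_order() {
    # field 2 (priority) numeric, descending; field 3 (time) ascending
    sort -k2,2nr -k3,3
}

# Same five jobs as the table in the post, deliberately shuffled.
job_list='Jane 10 09:00
Fred 11 14:00
Sandy 10 14:40
Gina 11 15:00
Tim 10 13:00'

printf '%s\n' "$job_list" | fifo_order
```

Run as-is, this lists Fred, Gina, Jane, Tim, Sandy, which matches the order given in the table: both @11 jobs come first, and within each tier the older submit wins.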
From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: First In-First Out Tip
Date: Mon, 20 Jul 2009 11:40:37 -0400
Msg# 1873
    BTW, Marco, if you're having strange behavior with different
    job servers, I'll be happy to help.

    Make sure you've got 'sched fifo' in ALL the rush.conf files,
    and are running the same version of rush that supports FIFO
    (102.42a9 or higher) on all the machines, since the older releases
    won't understand FIFO scheduling.

    There are some easy tests you can do from the command line
    to submit a couple of jobs and watch them compete.

    For instance, from a Unix machine (linux or OSX), you can submit
    two 100 frame jobs as a test:

(echo title AAA; echo frames 1-100; echo cpus +any=500@10; echo command rush -sleep 10) | rush -submit
(echo title BBB; echo frames 1-100; echo cpus +any=500@10; echo command rush -sleep 10) | rush -submit

    Each line will submit a job that does nothing but sleep 10 seconds
    per frame.

    Those should be two separate lines; make sure your newsreader is wide
    enough to show those as complete lines before copy/pasting.

    Run the lines one at a time, with at least 1 or 2 seconds between
    each submit, so the FIFO system can tell which job was first.

    When running, the AAA job should get all the cpus, and the BBB job
    should wait around until the AAA job starts finishing.

    You should also be able to run similar command lines on other
    job servers, or tack a hostname on the end, after the 'rush -submit',
    to tell it to use some other machine as the job server, eg. to use
    'tahoe' as the job server:

(echo title CCC; echo frames 1-100; echo cpus +any=500@10; echo command rush -sleep 10) | rush -submit tahoe

--
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
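For repeated testing, the one-liners in this post could be wrapped in a small helper so only the title changes between submits. This is just a convenience sketch: `make_job` is a made-up name, not part of Rush, and it prints exactly the four submit lines used in the post. The block only previews the text; the actual submits are left as comments because they need a working Rush install.

```shell
#!/bin/sh
# Illustrative helper: print the same four submit lines used in the
# test commands from the post, with only the job title changing.
# (make_job is a hypothetical name, not a Rush command.)
make_job() {
    echo "title $1"
    echo "frames 1-100"
    echo "cpus +any=500@10"
    echo "command rush -sleep 10"
}

# Preview what would be piped to 'rush -submit' for the first test job.
make_job AAA

# To actually run the two-job test, pipe each job into 'rush -submit',
# pausing between submits so the FIFO timestamps differ:
#   make_job AAA | rush -submit
#   sleep 2
#   make_job BBB | rush -submit
# Or name another job server, as in the post:
#   make_job CCC | rush -submit tahoe
```

Keeping the submit text in one place makes it harder to accidentally give the two competing jobs different cpus or priority values, which would change the result of the test.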
From: Marco Recuay <marco@(email suppressed)>
Subject: Re: First In-First Out Tip
Date: Mon, 20 Jul 2009 19:37:49 -0400
Msg# 1874
Thanks for the correction, Greg. I guess I'll have to do some more testing on this. The clocks all appear synchronized on all the machines, and they all have the same rush version and updated rush.conf file.

Using the same job server is a workaround for now, but when we have a free moment I'll see how they react to the tests you outlined to pinpoint where the problem lies.

On 2009-07-20 08:40:37 -0700, Greg Ercolano <erco@(email suppressed)> said:

> BTW, Marco, if you're having strange behavior with different
> job servers, I'll be happy to help.
>
> Make sure you've got 'sched fifo' in ALL the rush.conf files,
> and are running the same version of rush that supports FIFO
> (102.42a9 or higher) on all the machines, since the older releases
> won't understand FIFO scheduling.
>
> There are some easy tests you can do from the command line
> to submit a couple of jobs and watch them compete.
>
> For instance, from a Unix machine (linux or OSX), you can submit
> two 100 frame jobs as a test:
>
> (echo title AAA; echo frames 1-100; echo cpus +any=500@10; echo command rush -sleep 10) | rush -submit
> (echo title BBB; echo frames 1-100; echo cpus +any=500@10; echo command rush -sleep 10) | rush -submit
>
> Each line will submit a job that does nothing but sleep 10 seconds
> per frame.
>
> Those should be two separate lines; make sure your newsreader is wide
> enough to show those as complete lines before copy/pasting.
>
> Run the lines one at a time, with at least 1 or 2 seconds between
> each submit, so the FIFO system can tell which job was first.
>
> When running, the AAA job should get all the cpus, and the BBB job
> should wait around until the AAA job starts finishing.
>
> You should also be able to run similar command lines on other
> job servers, or tack a hostname on the end, after the 'rush -submit',
> to tell it to use some other machine as the job server, eg. to use
> 'tahoe' as the job server:
>
> (echo title CCC; echo frames 1-100; echo cpus +any=500@10; echo command rush -sleep 10) | rush -submit tahoe
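Since the thread leans on every machine having 'sched fifo' in its rush.conf, a quick per-file check might look like the sketch below. The `has_sched_fifo` name is hypothetical, and the pattern assumes the setting appears as the first thing on its own line; the demo runs against a throwaway file, so substitute whatever rush.conf path your install uses.

```shell
#!/bin/sh
# Illustrative check: succeed if the given file has a line that begins
# with 'sched fifo' (leading whitespace allowed). A line such as
# '# sched fifo' does not match, since it does not begin with 'sched'.
# (has_sched_fifo is a hypothetical name, not a Rush command.)
has_sched_fifo() {
    grep -Eq '^[[:space:]]*sched[[:space:]]+fifo([[:space:]]|$)' "$1"
}

# Demo on a temporary file rather than a real rush.conf.
tmp=$(mktemp)
printf '# sched fifo\nsched fifo\n' > "$tmp"
if has_sched_fifo "$tmp"; then
    echo "FIFO enabled"
else
    echo "FIFO not enabled"
fi
rm -f "$tmp"
```

Running the same check on each machine that can act as a job server would rule out one of the causes Greg lists before moving on to the submit tests.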