From: "Abraham Schneider" <aschneider@(email surpressed)>
Subject: how to deal with missing Nuke plugin licenses
Date: Wed, 26 Oct 2011 09:49:10 -0400
Msg# 2138
Hi there!

For Nuke itself, we have enough render licenses to use the whole farm for rendering. But for some of the plugins (Furnace, Ocula, ...) we have only a limited number of licenses. I'm wondering how to deal with this on a Rush render farm. I see three possibilities:

1. Limit the cpus used by the job to the number of available licenses. This seems fine, but has two disadvantages. First, it only works if there is just one job on the farm using this plugin; if you start a second job, it will try to render on the other free machines and fail. Second, license servers sometimes don't release licenses as fast as jobs jump from machine to machine. So even if I limit the job to the correct number of cpus, a license may be missing when one machine finishes a frame and a different machine wants to start a new one.

2. Use the hosts file to define groups that contain only as many machines as there are licenses. This should avoid the problems above, but handling it is painful. If a machine or two is down, you have to change the hosts file again. And if you have slower and faster machines, how do you distribute them among the groups? It's a possible solution, but it doesn't feel like THE solution :)

3. Use something like Rush's licpause function. Problems with licpause: it pauses the whole JOB, not just the frame (or batch of frames) on the machine that had the license problem. Also, the normal license pause function in submit-nuke.pl won't work, because some plugins don't return an error exit code: there is a license error, but the exit code is 0, so Rush assumes the render went well.

Because my Perl knowledge is very limited, I have two questions:

- How can I check for license problems inside submit-nuke.pl (for example by grepping the log file) and return a different exit code, so the normal licpause function works too?
- What would be a good way to do something like the license pause on a per-frame basis instead of per job?

Any suggestions?

Thanks,
Abraham

PS: Here is an example of a log file showing a license problem with the Furnace plugin:

    Executing: logtrim -s 0 -c nuke -V -m 4 -c 1600M -x /mnt/frozone/projects/filmschulfestivaltrailer_49554/002_010/nuke/002_010_degrain_v01_as.nk 4,4
    Nuke 6.3v4, 64 bit, built Sep 22 2011.
    Copyright (c) 2011 The Foundry Visionmongers Ltd. All Rights Reserved.
    Loading /usr/local/Nuke6.3v4-64/plugins/init.tcl
    Loading /usr/local/Nuke6.3v4-64/plugins/init.py
    Loading /usr/local/Nuke6.3v4-64/plugins/setenv.tcl
    Loading /usr/local/Nuke6.3v4-64/plugins/Tracker3.so
    Loading /usr/local/Nuke6.3v4-64/plugins/formats.tcl
    Loading /mnt/libs/nukelib/plugins/init.py
    Loading /mnt/homes/aschneid/.nuke/init.py
    Loading /usr/local/Nuke6.3v4-64/plugins/getenv.tcl
    Loading /usr/local/Nuke6.3v4-64/plugins/dpxReader.so
    Loading /mnt/libs/nukelib/plugins/gizmos/HandleMarker.gizmo
    Loading /usr/local/Nuke6.3v4-64/plugins/Constant.so
    Loading /usr/local/Nuke6.3v4-64/plugins/Mirror.so
    Loading /usr/local/Nuke6.3v4-64/plugins/Merge2.so
    Loading /usr/local/Nuke6.3v4-64/plugins/ColorCorrect.so
    Loading /usr/local/Nuke6.3v4-64/plugins/Reformat.so
    Loading /usr/local/Nuke6.3v4-64/plugins/ShuffleViews.so
    Loading /mnt/libs/nukelib/plugins/gizmos/MainFileOut.gizmo
    Loading /usr/local/Nuke6.3v4-64/plugins/movReader.tcl
    Loading /usr/local/Nuke6.3v4-64/plugins/ffmpegReader.so
    Loading /usr/local/Nuke6.3v4-64/plugins/Crop.so
    Loading /mnt/libs/nukelib/plugins/gizmos/GlobalVars.gizmo
    Loading /usr/local/Nuke6.3v4-64/plugins/dpxWriter.so
    Writing /mnt/frozone/projects/filmschulfestivaltrailer_49554/002_010/precomp/002_010_degrain_v01_as/002_010_degrain_v01_as.0004.dpx
    DDImage message: FOUNDRY LICENSE ERROR REPORT
    ----------------------------
    Timestamp: Wed Oct 26 09:29:17 2011
    License Requested: furnace 4.2 for ofx render only with options all f_degrain
    Extended Info: F_DeGrain on uk.co.thefoundry.nuke (Render) 4.0
    Environment Info: /mnt/libs/nukelib/licenses

    FLEXlm LICENSE DIAGNOSTICS
    ---------------------------
    Licensed number of users already reached.
    Feature: furnace_ofx_r
    License path: /mnt/libs/nukelib/licenses/license.lic:/mnt/libs/nukelib/licenes:/usr/local/foundry/FLEXlm:
    FLEXnet Licensing error:-4,132. System Error: 115 "Operation now in progress"
    For further information, refer to the FLEXnet Licensing documentation,
    available at "www.acresso.com".

    FOUNDRY LICENSE DIAGNOSTICS
    ---------------------------
    Error : Maximum user counted exceeded. 6.9
    Writing /mnt/frozone/projects/filmschulfestivaltrailer_49554/002_010/precomp/002_010_degrain_v01_as/002_010_degrain_v01_as.0004.dpx took 1.70 seconds
    Frame 4 (1 of 1)
    Total render time: 1.70 seconds
    Allocated 88.2MiB, 6% of usage limit of 1.56GiB, sbrk = 232MiB.
    free_*() calls: 0, new_handler() cleanups: 0.
    Tile Cache: Cache Report
    ===========================================
    Current Size: 000 B
    Max Size: 1.00663 GB
    Percentage Used: 0
    HandleMarker1.Merge1: 12.8MB 2048x1556 rgb 100% w 31120
    F_DeGrain in F_DeGrain1: output image :6.49MB
    HandleMarker1.RotoPaint1: 28.7MB 2048x1168 rgb 100% w 27
    F_DeGrain in F_DeGrain1: output image :6.36MB
    F_DeGrain in F_DeGrain1: output image :6.36MB
    F_DeGrain in F_DeGrain1: output image :6.36MB
    F_DeGrain in F_DeGrain1: output image :6.36MB
    F_DeGrain in F_DeGrain1: output image :6.36MB
    F_DeGrain in F_DeGrain1: output image :6.36MB
    F_DeGrain in F_DeGrain1: output image :6.36MB
    --- NUKE SUCCEEDS

--
Abraham Schneider
Senior VFX Compositor
ARRI Film & TV Services GmbH
Tuerkenstr. 89
D-80799 Muenchen / Germany
Phone (Tel# suppressed)
EMail aschneider@(email surpressed)
www.arri.de/filmtv

________________________________
ARRI Film & TV Services GmbH
Sitz: München
Registergericht: Amtsgericht München
Handelsregisternummer: HRB 69396
Geschäftsführer: Franz Kraus, Dr. Martin Prillmann, Josef Reidinger
From: Greg Ercolano <erco@(email surpressed)>
Subject: Re: how to deal with missing Nuke plugin licenses
Date: Thu, 27 Oct 2011 11:47:36 -0400
Msg# 2139
On 10/26/11 06:49, Abraham Schneider wrote:
> For Nuke itself, we have enough render licenses to use the whole farm
> for rendering. But for some of the plugins (Furnace, Ocula, ...) we
> have only a limited number of licenses. I'm wondering now how to deal
> with this on a rush renderfarm. I see three possibilities there:
>
> 1. limit the cpus used by the job to the amount of available licenses.
> Seems to be fine, but has two disadvantages: it only works if you have
> 1 job on the farm that uses this plugin. If you start a second job, it
> will try to render on the other free machines and will fail. Second
> problem is that sometimes license servers will not release the
> licenses as fast as the jobs jump from machine to machine. So even if
> I limit the job to the correct amount of cpus, there may be a missing
> license when one machine finishes a frame and a different machine
> wants to start a new frame.
>
> 2. use the hosts file to define groups of machines which only contain
> the correct amount of machines. This should avoid the problems above,
> but handling this is painful. A machine or two may be down, then you
> have to change the hosts file again.

Yes; defining a hostgroup such as +furnace would be one way to go.

And yes, if one of the machines in the group is taken down, you'd have to modify that hostgroup's membership, but I'd think that would be part of regular network administration: enabling/disabling machines when they're taken down (as opposed to a machine that just needs a reboot).

> You have slower and faster
> machines, how do you distribute them to the different groups?

You can make two sub-groups if you want control over machine speed.
For example:

    +furnace       -- all the 'furnace' machines
    +furnace_fast  -- just the fast ones in the furnace group
    +furnace_slow  -- just the slow ones in the furnace group

So if you have a job that needs to keep at least 2 cpus busy on the fast machines, have that job ask for the +furnace_fast machines at a higher priority, e.g.:

    +furnace=10@100 +furnace_fast=2@900

That is, 10 cpus from +furnace at priority 100, plus 2 cpus from +furnace_fast at the higher priority of 900.

> It's a possible solution but doesn't feel like THE solution :)

A centralized 'license counter' is perhaps what you're wanting, but it has its own issues: random interactive use counts against licenses, a single machine would have to be responsible for keeping track of license counts, etc.

> 3. Use something like the licpause function of Rush. Problems with
> licpause: it pauses the JOB, not the frame/batch frames of the machine
> that has the license problem.

Yes; this is because the job really shouldn't try to pick up more machines if the software it's running is out of licenses. It doesn't make sense to tie up newly available cpus with a job that won't be able to run, so licpause gives newly available cpus a shot at other jobs when a job can't get more licenses.

> And the normal license pause function of
> the submit-nuke.pl will not work, because some of the plugins will not
> raise an error exit code, so there is a license error,

That should be OK; if you can identify all the license error messages, the script can check for them (even if the exit code is zero) to detect the license error and handle it accordingly. If you send me the complete frame log showing the license error messages, I can tell you how to add those checks to the script. Or send me both the error messages and the script, and I can make the change for you so you can see how to add your own.
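A minimal sketch of that kind of message check, written in Python rather than the Perl that submit-nuke.pl uses, so the logic is visible. It assumes the message strings from the Furnace log in the previous message; the pattern list, the LICENSE_EXIT value and the check_license.py usage line are all hypothetical. Substitute the messages your plugins actually print and whatever exit code your submit script already maps to licpause.

```python
"""License-error check for a frame log (Python sketch, not submit-nuke.pl).

Assumptions:
  - LICENSE_ERROR_PATTERNS is a hypothetical list copied from the Furnace log
    earlier in this thread; add the messages your other plugins print.
  - LICENSE_EXIT is a hypothetical stand-in for whatever exit code your submit
    script already treats as "out of licenses" (the one that triggers licpause).
"""
import re
import sys

LICENSE_ERROR_PATTERNS = [
    re.compile(r"FOUNDRY LICENSE ERROR REPORT"),
    re.compile(r"Licensed number of users already reached"),
    re.compile(r"Maximum user count(ed)? exceeded"),
    re.compile(r"FLEXnet Licensing error"),
]

LICENSE_EXIT = 99  # hypothetical value; use your script's own convention


def has_license_error(log_text):
    """True if any known license error message appears in log_text."""
    return any(p.search(log_text) for p in LICENSE_ERROR_PATTERNS)


def effective_exit_code(render_exit, log_text):
    """Keep the render's exit code unless the log shows a license error.
    Plugins like Furnace can print a license error yet let Nuke exit 0."""
    if has_license_error(log_text):
        return LICENSE_EXIT
    return render_exit


if __name__ == "__main__":
    # Hypothetical usage: python check_license.py FRAME_LOG NUKE_EXIT_CODE
    if len(sys.argv) == 3:
        with open(sys.argv[1], errors="replace") as f:
            sys.exit(effective_exit_code(int(sys.argv[2]), f.read()))
```

Applied to the Furnace log above, effective_exit_code(0, log_text) returns LICENSE_EXIT: the license report is present even though Nuke ends with "--- NUKE SUCCEEDS" and exits 0.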
> So because of my very limited Perl knowledge I have two questions:
> - How can I check (for example by doing something like a grep of the
> logfile) for license problems inside of the submit-nuke.pl and raise a
> different exitcode, so the normal licpause function will also work?

There is a global LogCheck() function built into .common.pl (which all the scripts load for 'common' functions) that can be called to 'grep' the log file for certain messages. It takes retries into account, so that error messages aren't retriggered by older messages left in the same log by earlier retries. With the complete frame logs showing the license errors, I can show you what to change.

> - what would be a good way to do something like the license pause on a
> per-frame base instead of doing it per job? Any suggestions?

You can do things like sleep() and retry the command repeatedly until it works; that's not hard. But that ties up the cpu until a license becomes available. It might be better to make the cpu available to other jobs, in which case you can just sleep and exit(2) so that rush requeues the frame, allowing the scheduler to 'round robin' select some other job. (The sleep prevents the scheduler from 'spinning' the requeued frame too quickly, in case there are no other jobs.)

--
Greg Ercolano, erco@(email surpressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)ext.23
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)
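The two per-frame approaches from the last answer, plus a retry-aware log check, can be sketched as follows. This is Python, not the actual .common.pl code: how LogCheck() really handles retries isn't shown in the thread, so scanning only the bytes each attempt appends to the log is an assumed way to get the same effect. The function names, sleep values and regex are hypothetical; only the exit(2) requeue behavior comes from the text above.

```python
"""Per-frame license handling sketch (Python, not the real .common.pl code).

Assumptions, all hypothetical except the requeue exit code:
  - the render command's output is appended to one log file per frame, and
    retries of the same frame append to that same log;
  - LICENSE_RE holds messages copied from the Furnace log in this thread;
  - exit code 2 asks Rush to requeue the frame, as described above.
"""
import os
import re
import subprocess
import sys
import tempfile
import time

LICENSE_RE = re.compile(
    r"FOUNDRY LICENSE ERROR REPORT"
    r"|Licensed number of users already reached"
    r"|Maximum user count(ed)? exceeded"
)
REQUEUE_EXIT = 2          # exit(2): Rush requeues the frame
REQUEUE_SLEEP_SECS = 30   # hypothetical pause so a lone job doesn't 'spin'


def text_since(log_path, offset):
    """Return only the text appended to log_path after byte offset, so that
    messages left in the log by earlier retries are not matched again."""
    with open(log_path, "rb") as f:
        f.seek(offset)
        return f.read().decode("utf-8", errors="replace")


def run_attempt(cmd, log_path):
    """Run cmd once, appending its output to log_path.
    Returns (exit_code, text written by this attempt only)."""
    start = os.path.getsize(log_path) if os.path.exists(log_path) else 0
    with open(log_path, "ab") as log:
        rc = subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)
    return rc, text_since(log_path, start)


def render_frame(cmd, log_path, hold_cpu=False,
                 retry_sleep=60, requeue_sleep=REQUEUE_SLEEP_SECS):
    """Return the exit code the submit script should exit with."""
    while True:
        rc, text = run_attempt(cmd, log_path)
        if not LICENSE_RE.search(text):
            return rc  # no license error: pass the render's exit code through
        if not hold_cpu:
            # Give the cpu back: pause, then ask Rush to requeue this frame
            # so the scheduler can pick some other job in the meantime.
            time.sleep(requeue_sleep)
            return REQUEUE_EXIT
        # Alternative: keep the cpu and retry until a license frees up.
        time.sleep(retry_sleep)


if __name__ == "__main__":
    # Demo with stand-in commands instead of a real Nuke render.
    log_path = os.path.join(tempfile.mkdtemp(), "frame.log")
    no_license = [sys.executable, "-c",
                  "print('DDImage message: FOUNDRY LICENSE ERROR REPORT')"]
    clean = [sys.executable, "-c", "print('--- NUKE SUCCEEDS')"]
    print(render_frame(no_license, log_path, requeue_sleep=0))  # 2
    print(render_frame(clean, log_path, requeue_sleep=0))       # 0
```

In the demo, the second call returns 0 even though the first attempt's license report is still in the same log, because only the newly appended text is checked. Setting hold_cpu=True gives the sleep-and-retry variant, which keeps the cpu busy until a license frees up.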