From: Craig Allison <craig@(email suppressed)>
Subject: Job Done Command Not Working On Some Boxes
Date: Mon, 13 Jul 2009 12:17:23 -0400
Msg# 1865
From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Job Done Command Not Working On Some Boxes
Date: Mon, 13 Jul 2009 15:19:35 -0400
Msg# 1866
Craig Allison wrote:
> I'm having a problem with certain boxes not running the JobDone Command
> on complete of a render. I'm not getting any logs from the faulty boxes
> to even suggest the command is being initiated.

Hi Craig,

I'd suggest looking in the rushd.log first for the machine acting as
the job server for the job in question. I.e., log in to the machine whose
hostname is in the jobid, and look at the rush/var/rushd.log file for
errors about the jobdonecommand.

> Running Rush 102.42a9 on Leopard and Tiger, running a perl script on
> JobDone.
>
> perl /mnt/vfxxserve3/Scripts/Senate_Donemail.pl
>
> Any ideas?

Try sending me via private email:

    1) the 'rush -ljf' info for this job
    2) the contents of your rushd.log

..Be sure the job got 'done' since midnight, so that any errors
would likely be in "today's" rushd.log.

I'll follow up to the group on what we determine is the problem.

Certainly you should be seeing a 'jobdonecommand.log' in the job's
log directory. The only reason you wouldn't is if:

    a) logs are disabled,
    b) the -nolog flag was specified for the jobdonecommand,
    c) the log directory was unwritable for some reason by the job server, or
    d) there's a bug I don't know about, as 102.42a9 is very recent.

-- 
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)
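Checking the job server's rushd.log for jobdonecommand errors, as suggested above, can be scripted. The following is a minimal Python sketch, assuming rushd.log is plain text with one message per line and that failures mention "jobdonecommand" along with a "jobid=" tag. The helper find_jobdone_errors and the script name jobdone_scan.py are hypothetical; neither is part of Rush.

```python
import re
import sys


def find_jobdone_errors(lines, jobid=None):
    """Return the log lines that mention the jobdonecommand.

    Hypothetical helper. If jobid is given (e.g. "senatemac89.326"), keep
    only lines whose "jobid=" tag matches it exactly, so "host.3" does not
    match "host.326".
    """
    tag = None
    if jobid is not None:
        # The lookahead stops a shorter jobid from matching a longer one.
        tag = re.compile(r"jobid=" + re.escape(jobid) + r"(?![\w.])")
    hits = []
    for line in lines:
        if "jobdonecommand" not in line:
            continue
        if tag is not None and not tag.search(line):
            continue
        hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 jobdone_scan.py <path-to-rushd.log> [jobid]
    path = sys.argv[1]
    jobid = sys.argv[2] if len(sys.argv) > 2 else None
    with open(path, errors="replace") as log:
        for hit in find_jobdone_errors(log, jobid):
            print(hit)
```

For example, run it on the job server as `python3 jobdone_scan.py /usr/local/rush/var/rushd.log senatemac89.326`. The /usr/local/rush prefix is an assumption; the post only says rush/var/rushd.log.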
From: Craig Allison <craig@(email suppressed)>
Subject: Re: Job Done Command Not Working On Some Boxes
Date: Mon, 13 Jul 2009 16:22:52 -0400
Msg# 1867
From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Job Done Command Not Working On Some Boxes
Date: Mon, 13 Jul 2009 16:31:44 -0400
Msg# 1868
Craig Allison wrote:
> Hey Greg
>
> I'm working late tonight so we're kind of on the same time for now : )
>
> Checked the log on the host machine and for the fail jobs I'm getting
> the following:
>
> jobdonecommand 'perl /mnt/vfxxserve3/Scripts/Senate_DoneMail.pl' failed
> for jobid=senatemac89.326: RUSH: uid '0' outside configured range 100-65000
>
> I've got forceuid/gid set to 501 on the user's machine in rush.conf,
> where is it picking this one up from?

Hi Craig,

Hmm.. apparently it thinks it should be trying to run the command
as root on that machine (senatemac89).

From that machine, can you send me (via private email):

    a) The contents of: /usr/local/rush/etc/rush.conf
    b) The output of: rush -ljf senatemac89.326

It sounds like maybe there's a forceuid of zero on that machine,
or the job's owner is root, or there's a bug I need to look into..
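The error text suggests that the job server checks the uid it is about to run the jobdonecommand as against a configured range (100-65000 here), and root's uid 0 falls below it. The sketch below only mirrors that message to show why uid 0 is refused while the forceuid of 501 that Craig set would pass. It is not Rush's actual code. The helper uid_error, its inclusive bounds, and its defaults are assumptions read off the log message.

```python
def uid_error(uid, lo=100, hi=65000):
    """Return an error string if uid falls outside [lo, hi], else None.

    Hypothetical helper: the inclusive bounds and the 100-65000 default
    are assumptions taken from the log message, not from Rush's source.
    """
    if lo <= uid <= hi:
        return None
    return f"RUSH: uid '{uid}' outside configured range {lo}-{hi}"


# root (uid 0) is refused; a forceuid of 501 would pass the check.
for uid in (0, 501):
    print(uid, uid_error(uid) or "ok")
# 0 RUSH: uid '0' outside configured range 100-65000
# 501 ok
```

This framing supports Greg's diagnosis. The range check itself is doing its job; the real question is why the job server resolved the user to uid 0 at all.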
From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Job Done Command Not Working On Some Boxes
Date: Fri, 17 Jul 2009 23:20:37 -0400
Msg# 1870
Newsgroup follow-up.

OK, after working with Craig earlier this week, I was able to determine
the exact combination of events it took to replicate this. It turned out
these specifics were needed to make this problem show up:

    o Rush version 102.42a9 on both machines
    o Submit from a Tiger workstation with "Submit Host:" pointing at a Leopard jobserver
    o Submitting user has no account on the jobserver, just their workstation
    o Rush configured to force all renders to run as "render" user
    o "render" user has a valid account on all machines (workstation and jobserver)
    o Job submitted with a jobdonecommand

With this combo, the renders run OK, but the /jobdonecommand/ fails to run
on the leopard machine with this series of errors in the log:

    >07/17,20:05:03 SECURITY RUSH: uid '0' outside configured range 100-65000
    >07/17,20:05:03 INFO jobdonecommand 'ls' failed for jobid=leopard.2: RUSH: uid '0' outside configured range 100-65000

Now that I've got the problem canned and replicated here at my office,
I should have it solved soon; I'm pretty sure now it's a bug in Rush.
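The log lines above appear to follow a layout of date, time, severity level, and then the message. The following is a minimal Python sketch that extracts the command, jobid, and reason from a jobdonecommand failure line. The layout is inferred from just these two samples and has not been checked against Rush documentation. JobDoneFailure and parse_failure are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from two sample lines: "MM/DD,HH:MM:SS LEVEL message"
LINE_RE = re.compile(r"^(\d\d/\d\d),(\d\d:\d\d:\d\d)\s+(\S+)\s+(.*)$")
# Assumed wording of a jobdonecommand failure message.
JOBDONE_RE = re.compile(r"jobdonecommand '(.*)' failed for jobid=(\S+?): (.*)$")


class JobDoneFailure(NamedTuple):
    date: str
    time: str
    command: str
    jobid: str
    reason: str


def parse_failure(line: str) -> Optional[JobDoneFailure]:
    """Return the parsed failure, or None if the line is not one."""
    # Drop surrounding whitespace and any leading '>' quote marker.
    m = LINE_RE.match(line.strip().lstrip(">"))
    if not m:
        return None
    date, time, _level, message = m.groups()
    j = JOBDONE_RE.search(message)
    if not j:
        return None
    command, jobid, reason = j.groups()
    return JobDoneFailure(date, time, command, jobid, reason)


sample = [
    "07/17,20:05:03 SECURITY RUSH: uid '0' outside configured range 100-65000",
    "07/17,20:05:03 INFO jobdonecommand 'ls' failed for jobid=leopard.2: "
    "RUSH: uid '0' outside configured range 100-65000",
]
for line in sample:
    failure = parse_failure(line)
    if failure:
        print(failure.jobid, "->", failure.reason)
# leopard.2 -> RUSH: uid '0' outside configured range 100-65000
```

The SECURITY line is skipped because it carries no jobid. In this thread, that line is simply the underlying cause of the INFO line that follows it.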