From: "Mr. Daniel Browne" <d.list@(email suppressed)>
Subject: Batch Copy Script
Date: Tue, 15 Dec 2009 21:07:31 -0500
Msg# 1917 (thread of 4 articles)
Hi Greg,

I'm writing a basic submit script to perform a unix cp copy operation on
the farm. Is there a routine already available in Rush to perform shell
substitution of the digits in a frame range, or would I have to do that
myself? The alternative, of course, is to do a cp operation for each
frame, but I was hoping to avoid the additional overhead.

Thanks,
-Dan

----------
Dan "Doc" Browne
System Administrator

Evil Eye Pictures
d.list at evileyepictures.com
Office: (415) 777-0666
From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Batch Copy Script
Date: Wed, 16 Dec 2009 00:22:30 -0500
Msg# 1918
Mr. Daniel Browne wrote:
> I'm writing a basic submit script to perform a unix cp copy
> operation on the farm.

It sounds like you're submitting a job that copies frames, but want it
done with one command, as opposed to one copy command per frame..?

Since copying is pure I/O, I would think the best place to run a large
frame copy command is on the file server itself, to avoid network
in/network out, so that the copy operation is entirely local.

For this reason I'm not sure I understand the question; normally one
wants to run things through the render queue to harness the power of
parallel cpu use, where each machine works on a frame or batch of
frames. Either that, or you want to throw a single operation at some
available machine and have it do the entire operation as a 'single
frame rush job'.

> Is there a routine already available in Rush to perform shell
> substitution of the digits in a frame range, or would I have to
> do that myself?

Hmm, I might need a few more details. If you're making a submit script
to do a copy, the scripting language you're using should work well for
shell substitution. Rather than submitting a bare 'cp' command to rush,
have the script submit itself, so that the script is what runs the cp
command on each machine; that way you'll have all the facilities of the
scripting language at your fingertips (ie. shell wildcard expansion,
sed/perl/awk access, etc.)

For some simple examples of writing scripts that "submit themselves",
see:

    http://www.seriss.com/rush-current/rush/rush-submit.html

For digits in a frame range, there are two variables rush supplies at
execution time on all the machines: RUSH_FRAME and RUSH_PADFRAME, the
latter being 0000 format, and the former having no padding at all.
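[As a minimal sketch of the frame-execution side of such a self-submitting
script: RUSH_FRAME / RUSH_PADFRAME come from rush at execution time, but the
foo.NNNN.tif naming and the src/dst paths below are hypothetical examples.]

```shell
#!/bin/sh
# Sketch: what the render side of a self-submitting copy script might do.
# RUSH_FRAME is set by rush on each execution host; the fallback value,
# the foo.NNNN.tif naming, and the paths are hypothetical.
: "${RUSH_FRAME:=42}"                  # fallback so this runs outside rush too

PAD4=$(printf "%04d" "$RUSH_FRAME")    # 42 -> 0042 (same as RUSH_PADFRAME)
PAD6=$(printf "%06d" "$RUSH_FRAME")    # 42 -> 000042, if different padding needed

# The actual per-frame work; echoed here since the paths are made up:
echo cp "/some/src/foo.$PAD4.tif" "/some/dst/foo.$PAD4.tif"
```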
If you need different padding, you can use printf() to change RUSH_FRAME
into whatever padding you want, eg:

    perl -e 'printf("%06d", $ENV{RUSH_FRAME});'

If you're trying to do shell substitution, I'm not sure; do you mean
wildcard expansions, like:

    cp /some/directory/foo.[0-9]*.tif /some/other/directory/

..or do you really mean substitution, like a sed/perl regex to turn
e.g. foo.0001.tif into foo.0002.tif?

> The alternative of course is to do a cp operation for each frame,
> but I was hoping to avoid the additional overhead.

I wouldn't think the cp command would be much overhead, but I might be
missing something.

If you're trying to distribute the cp commands to a bunch of machines
to parallelize the I/O, I'd think you'd want to do them either as
single frames or in batches. Or better yet, batch several commands on
the file server itself, eg:

    cp foo.0*.tif /some/other/dir &   # copies 1000 frames (0000 - 0999) in bg
    cp foo.1*.tif /some/other/dir &   # copies 1000 frames (1000 - 1999) in bg
    cp foo.2*.tif /some/other/dir &   # etc..

..which, if the box has multiple procs and the network is idle, should
go really fast.

If you can follow up with more specifics, I can probably help you
narrow down a specific technique.

--
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)
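[The batched background copies described above can be sketched with an
explicit wait added, so a script doesn't exit while copies are still
running; SRC/DST here are hypothetical stand-ins for real directories.]

```shell
#!/bin/sh
# Sketch: run several wildcard cp batches in the background on the file
# server, then block until all of them finish. The directories are
# hypothetical; substitute your real paths.
SRC=${SRC:-/some/directory}
DST=${DST:-/some/other/directory}

cp "$SRC"/foo.0*.tif "$DST"/ &   # frames 0000-0999 in background
cp "$SRC"/foo.1*.tif "$DST"/ &   # frames 1000-1999
cp "$SRC"/foo.2*.tif "$DST"/ &   # frames 2000-2999
wait                             # wait for all background cp's to complete
```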
From: "Mr. Daniel Browne" <d.list@(email suppressed)>
Subject: Re: Batch Copy Script
Date: Wed, 16 Dec 2009 13:24:03 -0500
Msg# 1919
On Dec 15, 2009, at 9:22 PM, Greg Ercolano wrote:
> [posted to rush.general]
>
> Mr. Daniel Browne wrote:
>> I'm writing a basic submit script to perform a unix cp copy
>> operation on the farm.
>
> It sounds like you're submitting a job that copies frames,
> but want it to be done with one command, as opposed to one
> copy command per frame..?

Correct.

> Since copying is pure I/O, I would think the best place to
> run a large frame copy command is only on the file server,
> to avoid network in/network out, so that the copy operation
> is entirely local.

Unfortunately the BlueArc architecture currently doesn't allow you to
do operations like that within the NAS head. In fact, most of the
head's shell commands do not support recursive operations.

> For this reason I'm not sure I understand the question;
> normally one wants to run things through the render queue
> to harness the power of parallel cpu use, where each machine
> works on a frame or batch of frames. Either that, or you want
> to throw a single operation at some available machine, and have
> it do the entire operation as a 'single frame rush job'.

We had thought that since we perform other operations on frames to and
from volumes on the NAS head, we could do the same with copy
operations. I admit, it sounds a bit zany. At the very least I do have
a script that would allow me to hand off the copy operation to a
single farm machine, as opposed to splitting it into pieces.
Theoretically this would be faster, as the head executes these
operations in parallel with the help of its cache memory.

> If you're trying to do shell substitution, I'm not sure; do you
> mean wildcard expansions, like:
>
>     cp /some/directory/foo.[0-9]*.tif /some/other/directory/
>
>> The alternative of course is to do a cp operation for each frame,
>> but I was hoping to avoid the additional overhead.
>
> I wouldn't think the cp command would be much overhead,
> but I might be missing something.
> If you're trying to distribute the cp commands to a bunch
> of machines to parallelize the I/O, I'd think you'd want
> to do them either as single frames or in batches.
>
> Or better yet, batch several commands on the file server
> itself, eg:
>
>     cp foo.0*.tif /some/other/dir &   # copies 1000 frames (0000 - 0999) in bg
>     cp foo.1*.tif /some/other/dir &   # copies 1000 frames (1000 - 1999) in bg
>     cp foo.2*.tif /some/other/dir &   # etc..
>
> ..which if the box has multiple procs and the network is idle,
> should go really fast.
>
> If you can follow up with more specifics, I can probably
> help you narrow down a specific technique.

Yes, this is what I wanted to go for. The problem, obviously, is that
you can't wildcard a frame range as a whole; you have to wildcard each
digit with its potential range of values. Tedious, but possible. I was
just hoping you might have a library routine already in place to do
this. I think I know what to do now; thanks Greg. Happy holidays.

-Dan

> --
> Greg Ercolano, erco@(email suppressed)
> Seriss Corporation
> Rush Render Queue, http://seriss.com/rush/
> Tel: 626-576-0010x23
> Fax: 626-576-0020
> Cel: 310-266-8906

----------
Dan "Doc" Browne
System Administrator

Evil Eye Pictures
d.list at evileyepictures.com
Office: (415) 777-0666
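[One way around the digit-by-digit wildcard problem above is to skip
wildcards entirely and generate each padded frame name with seq + printf;
a sketch, where the range, naming, and paths are all hypothetical.]

```shell
#!/bin/sh
# Sketch: copy an exact inclusive frame range (e.g. 0087-0112) without
# wildcarding each digit, by generating every padded filename directly.
# SFRM/EFRM, the foo.NNNN.tif naming, and the paths are hypothetical.
SFRM=${SFRM:-87}
EFRM=${EFRM:-112}
SRC=${SRC:-/some/src/path}
DST=${DST:-/some/dst/path}

for t in $(seq "$SFRM" "$EFRM"); do
    pad=$(printf "%04d" "$t")              # e.g. 87 -> 0087
    cp "$SRC/foo.$pad.tif" "$DST/foo.$pad.tif"
done
```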
From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Batch Copy Script
Date: Wed, 16 Dec 2009 14:27:09 -0500
Msg# 1920
Mr. Daniel Browne wrote:
>>> The alternative of course is to do a cp operation for each frame,
>>> but I was hoping to avoid the additional overhead.
>>
>> I wouldn't think the cp command would be much overhead,
>> but I might be missing something.
>>
>> If you're trying to distribute the cp commands to a bunch
>> of machines to parallelize the I/O, I'd think you'd want
>> to do them either as single frames or in batches.
>>
>> Or better yet, batch several commands on the file server
>> itself, eg:
>>
>>     cp foo.0*.tif /some/other/dir &   # copies 1000 frames (0000 - 0999) in bg
>>     cp foo.1*.tif /some/other/dir &   # copies 1000 frames (1000 - 1999) in bg
>>     cp foo.2*.tif /some/other/dir &   # etc..
>>
>> ..which if the box has multiple procs and the network is idle,
>> should go really fast.
>
> Yes, this is what I wanted to go for. The problem obviously is that you
> can't wildcard a frame range as a whole; you have to wildcard each digit
> with its potential range of values. Tedious, but possible. I was just
> hoping you might have a library routine already in place to do this.

If you're using perl, you can use perl's built-in copy function and
make a loop, eg:

    use File::Copy;
    [..]
    for ( $t=$sfrm; $t<=$efrm; $t++ ) {      # <= so the end frame is included
        my $src = sprintf("/some/src/path/foo.%04d.tif", $t);
        my $dst = sprintf("/some/dst/path/foo.%04d.tif", $t);
        copy($src,$dst);
    }

..however you might find the operating system's own copy command to be
faster (it might be optimized to take advantage of threading), in
which case:

    for ( $t=$sfrm; $t<=$efrm; $t++ ) {
        my $src = sprintf("/some/src/path/foo.%04d.tif", $t);
        my $dst = sprintf("/some/dst/path/foo.%04d.tif", $t);
        system("cp $src $dst");
    }

..and you could probably get 3 or 4 of those going at a time with use
of fork()/wait() or threads.
(I'd suggest throttling this to limit how many start at a time; using
wait(), you can start the next one as soon as one gets done, so that
there are always 3 or 4 going at a time until you work your way through
the entire loop.)

For instance, I have a script that I wrote a while ago:

    http://seriss.com/people/erco/unixtools/prun

..the blurb description:

*********************************************************************
Runs unix commands in parallel.

Supply a list of commands to run, one per line, and it runs them all
in parallel. You can limit the number of parallel processes with the
-M flag. Even keeps track of all the stdout/stderr logs, serializing
them for easy reading. Hit ^C to stop all processes.

Great for multiple rsh, rcp or rdists to run on a whole network
quickly. Very small, simple program no one has time to write, but
everyone always needs when managing multiple hosts.
*********************************************************************

So it takes a file of one-liner unix commands and runs them in
parallel with a throttle, backgrounding several at a time; each time
one command gets done, it moves on to the next in the list (like
rush's frame queue), keeping the throttle maxed. It was handy for
running many copy commands in parallel and keeping the output
synchronized. Might be helpful with or without Rush..!

I hope it still works.. it looks like I wrote it last century, as it
still uses pre-perl5 syntax.

--
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)
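[The throttled-parallel idea behind prun can also be sketched with xargs,
which avoids hand-rolled fork()/wait() bookkeeping; this assumes an xargs
that supports -P (GNU or BSD), and the range/paths are hypothetical.]

```shell
#!/bin/sh
# Sketch: throttled parallel frame copies via xargs. One zero-padded
# frame number is generated per line, and xargs runs up to 4 cp's at a
# time (-P 4), starting the next as each one finishes -- the same
# keep-the-throttle-maxed behavior described above.
# Requires xargs with -P; SFRM/EFRM, naming, and paths are hypothetical.
SFRM=${SFRM:-1}
EFRM=${EFRM:-100}
SRC=${SRC:-/some/src/path}
DST=${DST:-/some/dst/path}

seq "$SFRM" "$EFRM" \
    | while read t; do printf "%04d\n" "$t"; done \
    | xargs -P 4 -I{} cp "$SRC/foo.{}.tif" "$DST/foo.{}.tif"
```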