From: "Mr. Daniel Browne" <d.list@(email suppressed)>
Subject: Batch Copy Script
   Date: Tue, 15 Dec 2009 21:07:31 -0500
Msg# 1917
View Complete Thread (4 articles) | All Threads
Hi Greg,

	I'm writing a basic submit script to perform a unix cp copy operation on the farm. Is there a routine already available in Rush to perform shell substitution of the digits in a frame range, or would I have to do that myself? The alternative of course is to do a cp operation for each frame, but I was hoping to avoid the additional overhead.

Thanks,

-Dan


----------
Dan "Doc" Browne
System Administrator

Evil Eye Pictures
d.list at evileyepictures.com
Office: (415) 777-0666


   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Batch Copy Script
   Date: Wed, 16 Dec 2009 00:22:30 -0500
Msg# 1918
Mr. Daniel Browne wrote:
> 	I'm writing a basic submit script to perform a unix cp copy
> operation on the farm.

	It sounds like you're submitting a job that copies frames,
	but want it to be done with one command, as opposed to one
	copy command per frame..?

	Since copying is pure I/O, I would think the best place to
	run a large frame copy command is only on the file server,
	to avoid network in/network out, so that the copy operation
	is entirely local.

	For this reason I'm not sure I understand the question;
	normally one wants to run things through the render queue
	to harness the power of parallel cpu use, where each machine
	works on a frame or batch of frames. Either that, or you want
	to throw a single operation at some available machine, and have
	it do the entire operation as a 'single frame rush job'.

> Is there a routine already available in Rush to
> perform shell substitution of the digits in a frame range, or would I
> have to do that myself?

	Hmm, I might need a few more details.

	If you're making a submit script to do a copy, the scripting
	language you're using should work well for doing shell substitution.

	Rather than submitting a bare 'cp' command to rush, have the
	script submit itself, so that the script runs the cp command
	on each machine and you have all the facilities of the
	scripting language you're working with at your fingertips
	(i.e. shell wildcard expansion, sed/perl/awk access, etc).

	For some simple examples of writing scripts that
	"submit themselves", see:
	http://www.seriss.com/rush-current/rush/rush-submit.html
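	The render-side half of such a script might look like the sketch
	below. (Hedged: the paths, the foo.NNNN.tif filename pattern, and
	the -render flag convention are all hypothetical; the submit half,
	which feeds a jobfile to 'rush -submit', is elided -- see the docs
	above for that syntax.)

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;

# Copy one frame of a (hypothetical) image sequence.
# rush sets RUSH_FRAME in the environment at execution time.
sub copy_frame {
    my ($srcdir, $dstdir, $frame) = @_;
    my $pad = sprintf("%04d", $frame);          # RUSH_PADFRAME style padding
    copy("$srcdir/foo.$pad.tif", "$dstdir/foo.$pad.tif")
        or die "copy of frame $pad failed: $!\n";
}

if ( @ARGV && $ARGV[0] eq "-render" ) {
    # Render side: rush runs '<thisscript> -render' on each machine
    copy_frame("/some/src/path", "/some/dst/path", $ENV{RUSH_FRAME});
    exit(0);
}
# ..submit side would go here: it pipes a jobfile to 'rush -submit'
# whose 'command' line re-invokes this same script with -render.
```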

	For digits in a frame range, there are two variables rush supplies
	at execution time on all the machines; RUSH_FRAME and RUSH_PADFRAME,
	the latter being 0000 format, and the former having no padding at all.

	If you need some different padding, you can use printf() to change
	RUSH_FRAME into whatever padding you want, eg:

		perl -e 'printf("%06d", $ENV{RUSH_FRAME});'

	If you're trying to do shell substitution, I'm not sure: do you
	mean wildcard expansions, like:

		cp /some/directory/foo.[0-9]*.tif /some/other/directory/

	..or do you really mean substitution, like sed/perl regex
	to turn e.g. foo.0001.tif into foo.0002.tif?

> The alternative of course is to do a cp
> operation for each frame, but I was hoping to avoid the additional
> overhead.

	I wouldn't think the cp command would be much overhead,
	but I might be missing something.

	If you're trying to distribute the cp commands to a bunch
	of machines to parallelize the I/O, I'd think you'd want
	to do them either as single frames or in batches.

	Or better yet, batch several commands on the file server
	itself, eg:

		cp foo.0*.tif /some/other/dir &		# copies 1000 frames (0000 - 0999) in bg
		cp foo.1*.tif /some/other/dir &		# copies 1000 frames (1000 - 1999) in bg
		cp foo.2*.tif /some/other/dir &		# etc..

	..which if the box has multiple procs and the network is idle,
	should go really fast.
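	A runnable sketch of that pattern, with the shell's wait so a
	wrapper script only exits when every background cp has finished.
	(Demo files in temp directories stand in for the real frames and
	paths.)

```shell
#!/bin/sh
# Batch the copies by leading frame digit, background each batch,
# then wait for all of them. Temp dirs stand in for the real paths.
src=$(mktemp -d); dst=$(mktemp -d)
touch "$src/foo.0001.tif" "$src/foo.1000.tif" "$src/foo.2000.tif"

cp "$src"/foo.0*.tif "$dst" &    # frames 0000 - 0999 in the background
cp "$src"/foo.1*.tif "$dst" &    # frames 1000 - 1999
cp "$src"/foo.2*.tif "$dst" &    # etc..
wait                             # block until all the cp's are done
```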

	If you can follow up with more specifics, I can probably
	help you narrow down a specific technique.

-- 
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)

   From: "Mr. Daniel Browne" <d.list@(email suppressed)>
Subject: Re: Batch Copy Script
   Date: Wed, 16 Dec 2009 13:24:03 -0500
Msg# 1919
> 
> On Dec 15, 2009, at 9:22 PM, Greg Ercolano wrote:
> 
> [posted to rush.general]
> 
> Mr. Daniel Browne wrote:
>> 	I'm writing a basic submit script to perform a unix cp copy
>> operation on the farm.
> 
> 	It sounds like you're submitting a job that copies frames,
> 	but want it to be done with one command, as opposed to one
> 	copy command per frame..?
Correct

> 
> 	Since copying is pure I/O, I would think the best place to
> 	run a large frame copy command is only on the file server,
> 	to avoid network in/network out, so that the copy operation
> 	is entirely local.

Unfortunately the BlueArc architecture currently doesn't allow you to do operations like that within the NAS head. In fact, most of the head's shell commands do not support recursive operations.
> 
> 	For this reason I'm not sure I understand the question;
> 	normally one wants to run things through the render queue
> 	to harness the power of parallel cpu use, where each machine
> 	works on a frame or batch of frames. Either that, or you want
> 	to throw a single operation at some available machine, and have
> 	it do the entire operation as a 'single frame rush job'.
> 
We had thought that, since we perform other operations on frames to and from volumes on the NAS head, we could do the same with copy operations. I admit, it sounds a bit zany. At the very least I do have a script that would allow me to hand off the copy operation to a single farm machine, as opposed to splitting it into pieces. Theoretically this would be faster, as the head executes these operations in parallel with the help of its cache memory.

> 
> 	If you're trying to do shell substitution, I'm not sure, do you
> 	mean wild card expansions, like:
> 
> 		cp /some/directory/foo.[0-9]*.tif /some/other/directory/
> 
>> The alternative of course is to do a cp
>> operation for each frame, but I was hoping to avoid the additional
>> overhead.
> 
> 	I wouldn't think the cp command would be much overhead,
> 	but I might be missing something.
> 
> 	If you're trying to distribute the cp commands to a bunch
> 	of machines to parallelize the I/O, I'd think you'd want
> 	to do them either as single frames or in batches.
> 
> 	Or better yet, batch several commands on the file server
> 	itself, eg:
> 
> 		cp foo.0*.tif /some/other/dir &		# copies 1000 frames (0000 - 0999) in bg
> 		cp foo.1*.tif /some/other/dir &		# copies 1000 frames (1000 - 1999) in bg
> 		cp foo.2*.tif /some/other/dir &		# etc..
> 
> 	..which if the box has multiple procs and the network is idle,
> 	should go really fast.
> 
> 	If you can follow up with more specifics, I can probably
> 	help you narrow down a specific technique.
Yes, this is what I wanted to go for. The problem obviously is that you can't wildcard a frame range as a whole; you have to wildcard each digit with its potential range of values. Tedious, but possible. I was just hoping you might have a library routine already in place to do this.

I think I know what to do now; thanks Greg. Happy holidays.

-Dan

> 
> -- 
> Greg Ercolano, erco@(email suppressed)
> Seriss Corporation
> Rush Render Queue, http://seriss.com/rush/
> Tel: 626-576-0010x23
> Fax: 626-576-0020
> Cel: 310-266-8906
> 
> 
> ----------
> Dan "Doc" Browne
> System Administrator
> 
> Evil Eye Pictures
> d.list at evileyepictures.com
> Office: (415) 777-0666
> 

   From: Greg Ercolano <erco@(email suppressed)>
Subject: Re: Batch Copy Script
   Date: Wed, 16 Dec 2009 14:27:09 -0500
Msg# 1920
Mr. Daniel Browne wrote:
>>> The alternative of course is to do a cp operation for each frame,
>>> but I was hoping to avoid the additional overhead.
>> 
>> 	I wouldn't think the cp command would be much overhead,
>> 	but I might be missing something.
>>
>> 	If you're trying to distribute the cp commands to a bunch
>> 	of machines to parallelize the I/O, I'd think you'd want
>> 	to do them either as single frames or in batches.
>> 
>> 	Or better yet, batch several commands on the file server
>> 	itself, eg:
>> 
>> 		cp foo.0*.tif /some/other/dir &		# copies 1000 frames (0000 - 0999) in bg
>> 		cp foo.1*.tif /some/other/dir &		# copies 1000 frames (1000 - 1999) in bg
>> 		cp foo.2*.tif /some/other/dir &		# etc..
>>
>> 	..which if the box has multiple procs and the network is idle,
>> 	should go really fast.

> Yes, this is what I wanted to go for. The problem obviously is that you
> can't wildcard a frame range as a whole; you have to wildcard each digit
> with its potential range of values. Tedious, but possible. I was just
> hoping you might have a library routine already in place to do this.

	If you're using perl, you can use perl's built-in copy function
	and make a loop, eg:

use File::Copy;
[..]
    for ( $t = $sfrm; $t <= $efrm; $t++ ) {     # inclusive of the end frame
        my $src = sprintf("/some/src/path/foo.%04d.tif", $t);
        my $dst = sprintf("/some/dst/path/foo.%04d.tif", $t);
        copy($src, $dst) or warn "copy failed for $src: $!\n";
    }

	..however you might find the operating system's own copy command
	to be faster (it may be optimized to take advantage of threading),
	in which case:

    for ( $t = $sfrm; $t <= $efrm; $t++ ) {     # inclusive of the end frame
        my $src = sprintf("/some/src/path/foo.%04d.tif", $t);
        my $dst = sprintf("/some/dst/path/foo.%04d.tif", $t);
        system("cp $src $dst");
    }

	..and you could probably get 3 or 4 of those going at a time
	with use of fork()/wait() or threads. (I'd suggest throttling
	this so you don't start too many at once; using wait(), you
	can start the next one as soon as one gets done, so that there
	are always 3 or 4 going at a time until you work your way
	through the entire loop.)
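	That fork()/wait() throttle might be sketched like so (a
	hypothetical helper, not part of rush; the command strings are
	whatever 'cp src dst' one-liners you build):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a list of shell commands with at most $max in flight at once.
# As each child finishes, the next command in the list is started.
sub run_throttled {
    my ($max, @cmds) = @_;
    my $running = 0;
    for my $cmd (@cmds) {
        if ( $running >= $max ) { wait(); $running--; }  # block for one to finish
        my $pid = fork();
        die "fork failed: $!\n" unless defined $pid;
        if ( $pid == 0 ) { exec($cmd) or exit(1); }      # child: run the command
        $running++;
    }
    while ( $running > 0 ) { wait(); $running--; }       # drain remaining children
}
```

	Calling e.g. run_throttled(4, @copy_cmds) keeps four copies going
	until the list is exhausted.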

	For instance, I have a script that I wrote a while ago:
	http://seriss.com/people/erco/unixtools/prun

	..the blurb description:

*********************************************************************************
Runs unix commands in parallel

Supply a list of commands to run, one per line, and it runs them all in parallel.
You can limit the number of parallel processes with the -M flag. Even keeps track
of all the stdout/stderr logs, serializing them for easy reading. Hit ^C to stop
all processes.

Great for multiple rsh, rcp or rdists to run on a whole network quickly.
Very small, simple program no one has time to write, but everyone always needs
when managing multiple hosts.
*********************************************************************************

	So it takes a file of one-liner unix commands and runs them in
	parallel with a throttle, backgrounding several at a time;
	each time one command gets done it moves on to the next
	in the list (like rush's frame queue), keeping the throttle maxed.

	It was handy for running many copy commands in parallel
	and keeping the output synchronized. Might be helpful
	with or without Rush..!

	I hope it still works.. it looks like I wrote it last century,
	as it still uses pre-perl5 syntax.

-- 
Greg Ercolano, erco@(email suppressed)
Seriss Corporation
Rush Render Queue, http://seriss.com/rush/
Tel: (Tel# suppressed)
Fax: (Tel# suppressed)
Cel: (Tel# suppressed)