Sunday 15 May 2011

GNU Parallel: distribute files from one source to remote hosts while distributing destination files




Scenario: an S3 bucket has 1000 files. I have 2 machines, and each machine has 2 drives, /dev/sda and /dev/sdb. Constraints: no single drive can fit all 1000 files, and no single machine can fit all 1000 files. Desired outcome: distribute the 1000 files across the 4 drives on the 2 machines using GNU Parallel.

I tried things like:

parallel --xapply --joblog out.txt -S:,r echo {1} {2} ::: "/dev/sda" "/dev/sdb" ::: {0..10}

but I get:

seq  host  starttime       jobruntime  send  receive  exitval  signal  command
2    :     1414040436.607  0.037       0     0        0        0       echo /dev/sda 1
4    :     1414040436.615  0.030       0     0        0        0       echo /dev/sda 3
6    :     1414040436.623  0.024       0     0        0        0       echo /dev/sda 5
8    :     1414040436.632  0.015       0     0        0        0       echo /dev/sda 7
10   :     1414040436.640  0.006       0     0        0        0       echo /dev/sda 9
1    r     1414040436.603  0.088       0     0        0        0       echo /dev/sdb 0
3    r     1414040436.611  0.092       0     0        0        0       echo /dev/sdb 2
5    r     1414040436.619  0.095       0     0        0        0       echo /dev/sdb 4
7    r     1414040436.628  0.095       0     0        0        0       echo /dev/sdb 6
9    r     1414040436.636  0.096       0     0        0        0       echo /dev/sdb 8
11   r     1414040436.645  0.094       0     0        0        0       echo /dev/sdb 10

where 'r' is the remote host's IP. How can I distribute the files (I have their names in a file) from S3 across the 4 drives? Thank you.

GNU Parallel is good at starting a new job when an old one has finished: it divides the jobs among the servers on the fly, not beforehand.

What you are looking for is a way to do the division beforehand.

Your --xapply approach seems sound, but you need to force GNU Parallel to distribute the jobs evenly across the hosts. Your current approach depends on how fast each host finishes, and that will not work in general.

So something like:

parallel echo {1}//{2} ::: sda sdb ::: server1 server2 | parallel --colsep '//' --xapply echo copy {3} to {1} on {2} :::: - filenames.txt

or:

parallel --xapply echo copy {3} to {1} on {2} ::: sda sda sdb sdb ::: server1 server2 server1 server2 :::: filenames.txt
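
To see what a fixed, up-front assignment looks like without involving GNU Parallel at all, here is a minimal sketch in plain sh and awk. It is only an illustration: assign_slots is a hypothetical helper name, file1..file5 are placeholder names, and the slot order is assumed to match the explicit sda/sda/sdb/sdb and server1/server2/server1/server2 lists above. Each file name is given the next drive/server slot in turn, wrapping around after the fourth, so the mapping does not depend on how fast any host finishes.

```shell
#!/bin/sh
# Hypothetical helper (not part of GNU Parallel): read file names on stdin and
# print "name drive server", cycling through four fixed slots in order.
assign_slots() {
  awk '
    BEGIN {
      # Assumed slot order, mirroring the explicit lists in the command above.
      drive[0] = "sda"; server[0] = "server1"
      drive[1] = "sda"; server[1] = "server2"
      drive[2] = "sdb"; server[2] = "server1"
      drive[3] = "sdb"; server[3] = "server2"
    }
    # Skip blank lines; the n-th name (counting from 0) goes to slot n mod 4.
    NF { slot = n++ % 4; print $0, drive[slot], server[slot] }
  '
}

# Placeholder names; the fifth wraps around to the first slot.
printf '%s\n' file1 file2 file3 file4 file5 | assign_slots
```

With the real list you would run assign_slots < filenames.txt and check that every drive gets roughly a quarter of the files before doing any copying.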

