Sunday, 15 July 2012

matlab - Multiplication of corresponding 2d slices of two arrays and inversion of array slices




I have two arrays a and b of the same dimensions, 1000 x 3 x 20 x 20. I want to generate a third array c of dimensions 3 x 3 x 20 x 20 that is the outcome of matrix multiplication of the corresponding slices of a and b, i.e. c(:,:,i,j) = a(:,:,i,j)'*b(:,:,i,j). I then need to transform array c into a new array d by inverting the corresponding 3 x 3 matrices, i.e. d(:,:,i,j) = inv(c(:,:,i,j)). Again, it's clear how to do this with loops. Is there a way to avoid looping over the 400 items?
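For reference, the loop-based version that the question calls clear might look like the following. This is only a minimal sketch: random data stands in for the real arrays, and the names n1 and n2 are placeholders that do not appear in the question.

```matlab
% Assumed sizes matching the question: a and b are 1000 x 3 x 20 x 20.
n1 = 1000; n2 = 20;
a = rand(n1,3,n2,n2);   % stand-in data (assumption)
b = rand(n1,3,n2,n2);   % stand-in data (assumption)

c = zeros(3,3,n2,n2);   % one 3 x 3 product per (ii,jj) slice
d = zeros(3,3,n2,n2);   % one 3 x 3 inverse per (ii,jj) slice
for ii = 1:n2
    for jj = 1:n2
        c(:,:,ii,jj) = a(:,:,ii,jj)' * b(:,:,ii,jj);
        d(:,:,ii,jj) = inv(c(:,:,ii,jj));
    end
end
```

Any vectorized replacement should reproduce c and d from this loop up to floating-point rounding.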

Edit: benchmarking code to compare the performance of the different solutions -

%// Inputs
n1 = 50;
n2 = 200;
a = rand(n1,3,n2,n2);
b = rand(n1,3,n2,n2);

%// A. CPU loopy code
tic
c = zeros(3,3,n2,n2);
for ii = 1:n2
    for jj = 1:n2
        c(:,:,ii,jj) = a(:,:,ii,jj)'*b(:,:,ii,jj); %//'
    end
end
toc

%// B. Vectorized code (using squeeze)
tic
c1 = squeeze(sum(bsxfun(@times,permute(a,[2 1 5 3 4]),permute(b,[5 1 2 3 4])),2));
toc

%// C. Vectorized code (avoiding squeeze)
tic
c2 = sum(bsxfun(@times,permute(a,[2 5 3 4 1]),permute(b,[5 2 3 4 1])),5);
toc

%// D. GPU vectorized code (the transfer to and from the GPU is included in the timing)
tic
a = gpuArray(a);
b = gpuArray(b);
c3 = sum(bsxfun(@times,permute(a,[2 5 3 4 1]),permute(b,[5 2 3 4 1])),5);
c3 = gather(c3);
toc

Runtime results -

Elapsed time is 0.287511 seconds.
Elapsed time is 0.250663 seconds.
Elapsed time is 0.337628 seconds.
Elapsed time is 1.259207 seconds.
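A single tic/toc pair can be noisy, so the numbers above are best read as rough indications. One possible alternative is timeit, which calls the function several times and reports a typical time; it assumes MATLAB R2013b or later. The sketch below covers only the two CPU variants, and the handle names f_squeeze and f_nosqueeze are made up for illustration.

```matlab
% Same sizes as in the benchmark; random stand-in data.
n1 = 50; n2 = 200;
a = rand(n1,3,n2,n2);
b = rand(n1,3,n2,n2);

% Wrap each variant in a zero-argument function handle (hypothetical names).
f_squeeze   = @() squeeze(sum(bsxfun(@times,permute(a,[2 1 5 3 4]),permute(b,[5 1 2 3 4])),2));
f_nosqueeze = @() sum(bsxfun(@times,permute(a,[2 5 3 4 1]),permute(b,[5 2 3 4 1])),5);

% timeit returns a typical run time in seconds for each handle.
t_squeeze   = timeit(f_squeeze);
t_nosqueeze = timeit(f_nosqueeze);
fprintf('squeeze: %.4f s, no squeeze: %.4f s\n', t_squeeze, t_nosqueeze);
```

For GPU code, the Parallel Computing Toolbox offers gputimeit, which plays the same role for functions that run on the GPU.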

Code

%// Part - 1
c = sum(bsxfun(@times,permute(a,[2 5 3 4 1]),permute(b,[5 2 3 4 1])),5);

%// Part - 2: Utilize the MATLAB File Exchange tool multinv
d = multinv(c);

The code for the function multinv is available on the MATLAB File Exchange, and it claims to be pretty efficient.
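A loop-based inverse can serve as a reference for checking whatever batched inverse you end up using. The following is a minimal sketch under stated assumptions: it does not call multinv, the data is made up (diagonally dominant 3 x 3 slices so that every slice is safely invertible), and the name d_ref is a placeholder.

```matlab
% Stand-in data: well-conditioned 3 x 3 slices (assumption for illustration).
n2 = 20;
c = rand(3,3,n2,n2) + 3*repmat(eye(3),[1 1 n2 n2]);

% Reference inverse, one slice at a time.
d_ref = zeros(size(c));
for ii = 1:n2
    for jj = 1:n2
        d_ref(:,:,ii,jj) = c(:,:,ii,jj) \ eye(3);   % inverse computed via backslash
    end
end

% Largest deviation of c*d_ref from the identity over all slices.
% c(:,:,k) with a single trailing index walks through the (ii,jj) slices.
err = 0;
for k = 1:n2*n2
    err = max(err, norm(c(:,:,k)*d_ref(:,:,k) - eye(3)));
end
```

To compare against a batched result d, max(abs(d(:) - d_ref(:))) should come out close to zero.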

For the first part, you can also try out -

c = squeeze(sum(bsxfun(@times,permute(a,[2 1 5 3 4]),permute(b,[5 1 2 3 4])),2));

This one seems to rearrange the elements not as "disruptively" as the one in the code above, but the downside is that it needs squeeze, which might slow things down a bit. I leave it to you and encourage you to benchmark both and select the better one.
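To see why both permute + bsxfun variants compute the slice-wise product, it can help to follow the array sizes step by step on a small case. The sketch below uses small made-up sizes, and the intermediate names pa, pb and p5 are introduced only for illustration.

```matlab
% Small stand-in sizes so the check runs quickly (assumed values).
n1 = 7; n2 = 4;
a = rand(n1,3,n2,n2);
b = rand(n1,3,n2,n2);

% Variant without squeeze: move the summation dimension (n1) to position 5.
pa = permute(a,[2 5 3 4 1]);     % 3 x 1 x n2 x n2 x n1
pb = permute(b,[5 2 3 4 1]);     % 1 x 3 x n2 x n2 x n1
p5 = bsxfun(@times,pa,pb);       % 3 x 3 x n2 x n2 x n1, entry (p,q,i,j,k) = a(k,p,i,j)*b(k,q,i,j)
c2 = sum(p5,5);                  % summing over k gives a(:,:,i,j)'*b(:,:,i,j)

% Variant with squeeze: the summation dimension stays at position 2.
c1 = squeeze(sum(bsxfun(@times,permute(a,[2 1 5 3 4]),permute(b,[5 1 2 3 4])),2));

% Loop reference for comparison.
c = zeros(3,3,n2,n2);
for ii = 1:n2
    for jj = 1:n2
        c(:,:,ii,jj) = a(:,:,ii,jj)'*b(:,:,ii,jj);
    end
end

% Both differences should be at rounding level.
err1 = max(abs(c1(:) - c(:)));
err2 = max(abs(c2(:) - c(:)));
```

The intermediate p5 is n1 times larger than the result, so the vectorized forms trade memory for the removal of the loop.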

Why bsxfun + GPU?

I have increased the loop limits to make it a real test between the loopy code and the vectorized code. So, here is the modified code for part 1 -

%// Inputs
n1 = 50;
n2 = 200;
a = rand(n1,3,n2,n2);
b = rand(n1,3,n2,n2);

%// A. CPU loopy code
tic
c = zeros(3,3,n2,n2);
for ii = 1:n2
    for jj = 1:n2
        c(:,:,ii,jj) = a(:,:,ii,jj)'*b(:,:,ii,jj); %//'
    end
end
toc

%// B. GPU vectorized code (the transfer to and from the GPU is included in the timing)
tic
a = gpuArray(a);
b = gpuArray(b);
c1 = sum(bsxfun(@times,permute(a,[2 5 3 4 1]),permute(b,[5 2 3 4 1])),5);
c1 = gather(c1);
toc

The runtime results on my system -

Elapsed time is 0.310056 seconds.
Elapsed time is 0.172499 seconds.

So, you see!

arrays matlab vectorization matrix-multiplication
