c++ - Dynamic memory slowdown on Intel Xeon Phi
I am creating a simple matrix multiplication procedure that runs on the Intel Xeon Phi architecture. The procedure looks like this (parameters a, b, c); the timing doesn't include initialization:
    // start timing
    for (int i = 0; i < size; i++) {
        for (int k = 0; k < size; k++) {
            const type aik = a[i][k];  // loaded once per (i, k)
            for (int j = 0; j < size; j++) {
                c[i][j] += aik * b[k][j];
            }
        }
    }
    // end timing
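For reference, here is a self-contained version of the loop above. It is a sketch that assumes contiguous row-major storage indexed as `m[r * n + col]` instead of the `a[i][k]` arrays in the question; `matmul_ikj` is a hypothetical name. The i-k-j order keeps the inner loop at unit stride over both `b` and `c`.

```cpp
#include <cassert>
#include <cstddef>

// i-k-j matrix multiply, c += a * b, for n x n row-major matrices stored
// contiguously: element (r, col) of m lives at m[r * n + col].
// The inner j loop walks b and c with unit stride, which is what lets the
// compiler vectorize it.
template <typename T>
void matmul_ikj(const T* a, const T* b, T* c, std::size_t n) {
    assert(c != a && c != b);  // output must not alias the inputs
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const T aik = a[i * n + k];  // constant across the j loop
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```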
I am using restrict, alignment info, and so on. However, if the matrices are allocated with dynamic memory (posix_memalign), the computation incurs a severe slowdown: with type=float and 512x512 matrices it takes ~0.55 s in the dynamic case versus ~0.25 s in the other case. On a different architecture (Intel Xeon E5) there is also a slowdown, but it is barely noticeable (about 0.002 s).
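A minimal sketch for reproducing the dynamic-allocation measurement, assuming flat row-major storage and 64-byte alignment (the 512-bit vector width of the Xeon Phi). `alloc_aligned` and `time_dynamic_matmul` are hypothetical names, and the non-dynamic counterpart from the question is not shown. Note that at n = 512 each matrix is 512 * 512 * 4 bytes = 1,048,576 bytes, exactly 2^20.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>  // posix_memalign, std::free (POSIX)

// Allocates n*n floats with posix_memalign at 64-byte alignment.
// Returns nullptr on failure; release with std::free.
static float* alloc_aligned(std::size_t n) {
    void* p = nullptr;
    if (posix_memalign(&p, 64, n * n * sizeof(float)) != 0) return nullptr;
    return static_cast<float*>(p);
}

// Times one i-k-j multiply c += a * b of n x n heap-allocated matrices.
// Initialization is excluded from the timing, as in the question.
// Returns elapsed seconds, or -1.0 if an allocation fails.
double time_dynamic_matmul(std::size_t n) {
    assert(n > 0);
    float* a = alloc_aligned(n);
    float* b = alloc_aligned(n);
    float* c = alloc_aligned(n);
    double secs = -1.0;
    if (a && b && c) {
        for (std::size_t idx = 0; idx < n * n; ++idx) {
            a[idx] = 1.0f;
            b[idx] = 1.0f;
            c[idx] = 0.0f;
        }
        const auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t k = 0; k < n; ++k) {
                const float aik = a[i * n + k];
                for (std::size_t j = 0; j < n; ++j) {
                    c[i * n + j] += aik * b[k * n + j];
                }
            }
        }
        const auto t1 = std::chrono::steady_clock::now();
        secs = std::chrono::duration<double>(t1 - t0).count();
        // With all-ones inputs every element of c must equal n.
        assert(c[n * n - 1] == static_cast<float>(n));
    }
    std::free(a);  // std::free(nullptr) is a no-op
    std::free(b);
    std::free(c);
    return secs;
}
```

Calling it once with n = 512 and once with n = 513 runs the size experiment suggested in the answer below.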
Any help appreciated!
What happens to the timing difference if you create matrices of a different size (e.g. 513x513)?
The reason I ask is that I think you might be seeing an effect of exceeding the cache's way associativity: the loop over b inside the loop over k evicts elements of c[i][] from the L2. If b and c are aligned and their sizes are powers of 2, cache super-alignment might be causing the issue.
If b and c are on the stack or otherwise not aligned, you don't see the effect, because fewer of their addresses are aligned to powers of 2.
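The arithmetic behind the super-alignment argument can be sketched by counting how many cache sets a run of addresses at a fixed stride falls into. The cache geometry here is an assumption for illustration: 64-byte lines and 1024 sets, which corresponds to a 512 KiB, 8-way L2; `distinct_sets` is a hypothetical helper. A 512-float row is 2048 bytes, a 513-float row is 2052 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts the distinct cache sets touched by `count` addresses spaced
// `stride` bytes apart, starting at address 0, for a cache with `line`-byte
// lines and `sets` sets (set index = (address / line) % sets).
std::size_t distinct_sets(std::size_t stride, std::size_t count,
                          std::size_t line, std::size_t sets) {
    assert(line > 0 && sets > 0);
    std::set<std::size_t> used;
    for (std::size_t r = 0; r < count; ++r) {
        used.insert((r * stride / line) % sets);
    }
    return used.size();
}
```

With these assumed parameters, the starts of 512 rows of a 512x512 float matrix land in only 32 sets (stride 2048 advances 32 sets per row and wraps every 32 rows), so 16 row starts compete for each 8-way set. With 513-float rows the same 512 row starts spread over 512 different sets.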