Breedlove: cuda kernel for add(a,b,c) using texture objects for a & b - works correctly for 'increment operation' add(a,b,a)? -

Tuesday, 15 January 2013


I want to implement a CUDA function 'add(a,b,c)' that adds (component-wise) two one-channel floating-point images 'a' and 'b' and stores the result in the floating-point image 'c', so 'c = a + b'. The function will be implemented by first binding two texture objects 'atex' and 'btex' to the pitch-linear images 'a' and 'b', and then accessing the images 'a' and 'b' within the kernel only via the texture objects 'atex' and 'btex'. The sum is stored in 'c' via a simple write to global memory.

But what happens if I call the function to increment 'a' by 'b', that is, I call 'add(a,b,a)'? Now the image 'a' is used in the kernel in two places: 'a' is read via the texture object 'atex', and I also store values in 'a' via a write to global memory. Is it possible that such a usage of the 'add' function leads to wrong results?

The GPU's texture cache is not coherent. This means that a global memory write to a particular location of the global memory underlying a texture may or may not be reflected during a subsequent texture access to that same location. So there is a read-after-write hazard in such a scenario.

If, however, the code performs a global memory write to a particular location of the global memory underlying a texture, and that location is subsequently never read via the texture during the lifetime of the kernel, there is no read-after-write hazard, and the code will behave as expected. The updated data in global memory can be accessed by a subsequent kernel in any manner desired, including texture access, because the texture cache is cleared upon kernel launch.
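Here is a minimal sketch of the 'add(a,b,c)' from the question, written so that the condition above holds: each thread reads its own pixel of 'a' and 'b' once through the textures and then writes that same pixel of 'c', and no other thread ever reads that location. The names addKernel, makeTex and inPlaceAddMatches are hypothetical, the images are assumed to come from cudaMallocPitch (so the pitch meets the texture alignment requirement), and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread reads one pixel of a and b via the texture objects,
// then writes one pixel of c via a plain global memory store.
__global__ void addKernel(cudaTextureObject_t atex, cudaTextureObject_t btex,
                          float* c, size_t cPitch, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    // Unnormalized coordinates, point filtering: +0.5f addresses the texel center.
    float va = tex2D<float>(atex, x + 0.5f, y + 0.5f);
    float vb = tex2D<float>(btex, x + 0.5f, y + 0.5f);
    float* row = reinterpret_cast<float*>(reinterpret_cast<char*>(c) + y * cPitch);
    row[x] = va + vb;  // the only access to this location after the texture read
}

// Hypothetical helper: wrap a pitch-linear float image in a texture object.
static cudaTextureObject_t makeTex(float* img, size_t pitch, int width, int height)
{
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypePitch2D;
    res.res.pitch2D.devPtr = img;
    res.res.pitch2D.desc = cudaCreateChannelDesc<float>();
    res.res.pitch2D.width = width;
    res.res.pitch2D.height = height;
    res.res.pitch2D.pitchInBytes = pitch;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// Host wrapper in the spirit of the question's add(a,b,c); c may alias a.
void add(float* a, size_t aPitch, float* b, size_t bPitch,
         float* c, size_t cPitch, int width, int height)
{
    cudaTextureObject_t atex = makeTex(a, aPitch, width, height);
    cudaTextureObject_t btex = makeTex(b, bPitch, width, height);
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    addKernel<<<grid, block>>>(atex, btex, c, cPitch, width, height);
    cudaDeviceSynchronize();
    cudaDestroyTextureObject(atex);
    cudaDestroyTextureObject(btex);
}

// Hypothetical self-check: runs add(a,b,a) and compares against the CPU sum.
bool inPlaceAddMatches(int width, int height)
{
    const size_t rowBytes = width * sizeof(float);
    float *a = nullptr, *b = nullptr;
    size_t aPitch = 0, bPitch = 0;
    cudaMallocPitch(reinterpret_cast<void**>(&a), &aPitch, rowBytes, height);
    cudaMallocPitch(reinterpret_cast<void**>(&b), &bPitch, rowBytes, height);

    std::vector<float> ha(width * height), hb(width * height), out(width * height);
    for (int i = 0; i < width * height; ++i) { ha[i] = 0.5f * i; hb[i] = 1.0f; }
    cudaMemcpy2D(a, aPitch, ha.data(), rowBytes, rowBytes, height, cudaMemcpyHostToDevice);
    cudaMemcpy2D(b, bPitch, hb.data(), rowBytes, rowBytes, height, cudaMemcpyHostToDevice);

    add(a, aPitch, b, bPitch, a, aPitch, width, height);  // c aliases a

    cudaMemcpy2D(out.data(), rowBytes, a, aPitch, rowBytes, height, cudaMemcpyDeviceToHost);
    cudaFree(a);
    cudaFree(b);
    for (int i = 0; i < width * height; ++i)
        if (out[i] != ha[i] + hb[i]) return false;
    return true;
}
```

Calling inPlaceAddMatches(64, 32) from a main function exercises the 'add(a,b,a)' case. The guarantee would no longer hold if the kernel read neighbouring pixels of 'a' through 'atex' (a filter or stencil, say), because another thread might already have overwritten them.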

I have used this approach to speed up in-place operations with small strides, where the texture read path provided higher load performance. An example is the BLAS-1 operation [D|S|Z|C]SCAL in CUBLAS, which scales each array element by a scalar.
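To make the idea concrete, here is a rough single-precision sketch of an in-place scale that loads through a texture object bound to linear memory and stores with a plain global write. This is only an illustration of the pattern, not the actual CUBLAS code; scalKernel, scalInPlace and scalInPlaceMatches are made-up names, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// x[i*incx] = alpha * x[i*incx]; the load goes through the texture path,
// the store is an ordinary global memory write to the same element.
__global__ void scalKernel(cudaTextureObject_t xtex, float* x,
                           float alpha, int n, int incx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int idx = i * incx;
        x[idx] = alpha * tex1Dfetch<float>(xtex, idx);
    }
}

// Scales n elements of the device array x (stride incx >= 1) in place.
void scalInPlace(float* x, float alpha, int n, int incx)
{
    if (n <= 0) return;
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = x;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = (static_cast<size_t>(n - 1) * incx + 1) * sizeof(float);

    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t xtex = 0;
    cudaCreateTextureObject(&xtex, &res, &td, nullptr);

    int block = 256;
    int grid = (n + block - 1) / block;
    scalKernel<<<grid, block>>>(xtex, x, alpha, n, incx);
    cudaDeviceSynchronize();
    cudaDestroyTextureObject(xtex);
}

// Hypothetical self-check against a CPU reference.
bool scalInPlaceMatches(int n, int incx, float alpha)
{
    const int len = (n - 1) * incx + 1;
    std::vector<float> h(len), out(len);
    for (int i = 0; i < len; ++i) h[i] = static_cast<float>(i);

    float* d = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&d), len * sizeof(float));
    cudaMemcpy(d, h.data(), len * sizeof(float), cudaMemcpyHostToDevice);
    scalInPlace(d, alpha, n, incx);
    cudaMemcpy(out.data(), d, len * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    for (int i = 0; i < len; ++i) {
        // Only every incx-th element is scaled; the rest must be untouched.
        float expected = (i % incx == 0) ? alpha * h[i] : h[i];
        if (out[i] != expected) return false;
    }
    return true;
}
```

As in the image case, every element is read through the texture only by the thread that later overwrites it, so the stale-cache hazard does not come into play. Whether the texture path is actually faster than a plain load depends on the GPU generation, so it is worth measuring.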

cuda textures
