CUDA: how to determine the last block on an SM
In short: is it possible to determine whether a block is the last (or the first) one on a particular SM?
Details: I have a problem where each block performs a fairly complex calculation that results in an array of 2k elements, and I want to sum these elements. I have 3k blocks. If each block does an atomic add into a global memory array at the end, it slows down badly. So I would like to do the following:
Use a shared array to sum the values on each SM. If a block is the first on its SM (no other block is running on that SM yet), initialize the shared array (clear it to 0). Do the calculation and add the result to the shared array. If the block is the last one on its SM, atomically add the shared array values to the global array. Is this possible? Or is there another solution?
It's not possible.
Shared memory is allocated per block. Its lifetime begins when the block begins and ends when the block ends. The shared memory of other blocks on the same SM is separate, and it's not legal or valid to assume that two blocks' allocations happen to be in the same place.
Each block should do its own reduction and write its values to global memory. If you want to avoid atomics, have each block write its own values to separate locations in global memory (shared memory is not visible to other blocks), and have the last block in the grid perform the final calculation. This is possible by following the method outlined in the threadfence reduction sample code.
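As a rough illustration of that last-block pattern, here is a minimal sketch, not the sample code itself. It assumes the goal is an element-wise sum of the per-block arrays. The names `kResultLen`, `blocksDone`, `computeElement`, `sumBlockResults` and `runLastBlockReductionDemo` are made up for this example, and `computeElement` is only a placeholder for the real calculation.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int kResultLen = 2048;          // assumed: elements produced per block

__device__ unsigned int blocksDone = 0;   // how many blocks have published results

// Hypothetical placeholder for the real per-block calculation.
__device__ float computeElement(int block, int i) { return 1.0f; }

__global__ void sumBlockResults(float *partials, float *total)
{
    __shared__ bool isLastBlock;

    // 1. Each block writes its result into its own slice of global memory.
    for (int i = threadIdx.x; i < kResultLen; i += blockDim.x)
        partials[blockIdx.x * kResultLen + i] = computeElement(blockIdx.x, i);

    // 2. Every thread fences its own writes, then the block synchronizes,
    //    so the whole slice is visible before thread 0 takes a ticket.
    __threadfence();
    __syncthreads();

    // 3. One thread per block takes a ticket; the block that draws
    //    gridDim.x - 1 is the last one in the grid to finish.
    if (threadIdx.x == 0) {
        unsigned int ticket = atomicInc(&blocksDone, gridDim.x);
        isLastBlock = (ticket == gridDim.x - 1);
    }
    __syncthreads();

    // 4. Only the last block sums all slices. The volatile pointer keeps
    //    the reads from being served out of a stale cache line.
    if (isLastBlock) {
        const volatile float *p = partials;
        for (int i = threadIdx.x; i < kResultLen; i += blockDim.x) {
            float sum = 0.0f;
            for (unsigned int b = 0; b < gridDim.x; ++b)
                sum += p[b * kResultLen + i];
            total[i] = sum;
        }
        if (threadIdx.x == 0) blocksDone = 0;   // reset for the next launch
    }
}

// Host-side driver for the sketch: returns true if every element of the
// total equals the number of blocks (each block contributes 1.0f).
bool runLastBlockReductionDemo()
{
    const int numBlocks = 64, threads = 256;
    float *partials = nullptr, *total = nullptr;
    if (cudaMalloc(&partials, sizeof(float) * numBlocks * kResultLen) != cudaSuccess)
        return false;
    if (cudaMalloc(&total, sizeof(float) * kResultLen) != cudaSuccess) {
        cudaFree(partials);
        return false;
    }

    sumBlockResults<<<numBlocks, threads>>>(partials, total);

    std::vector<float> host(kResultLen);
    cudaError_t err = cudaMemcpy(host.data(), total, sizeof(float) * kResultLen,
                                 cudaMemcpyDeviceToHost);
    cudaFree(partials);
    cudaFree(total);
    if (err != cudaSuccess) return false;

    for (float v : host)
        if (v != static_cast<float>(numBlocks)) return false;
    return true;
}
```

Calling `runLastBlockReductionDemo()` from `main` runs the kernel once and checks the result. Note the cost of avoiding atomics on the data: with 3k blocks and 2k floats per block, the `partials` buffer would be roughly 24 MB, and the final summation is done by a single block.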
You may also be able to have each block loop over multiple data sets. In that case, each block may be able to accumulate the results from several data sets in shared memory before writing intermediate results to global memory.
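A minimal sketch of that idea follows, again assuming an element-wise sum. It launches far fewer blocks than there are data sets, lets each block stride over several of them, and issues only one atomic add per element per block at the end. The names `kLen`, `computeItem`, `accumulateDataSets` and `runAccumulateDemo` are hypothetical, and `computeItem` stands in for the real calculation.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int kLen = 2048;   // assumed: elements produced per data set

// Hypothetical placeholder for the real per-data-set calculation.
__device__ float computeItem(int dataSet, int i) { return 1.0f; }

__global__ void accumulateDataSets(float *total, int numDataSets)
{
    __shared__ float acc[kLen];   // 8 KB per-block accumulator

    // Clear this block's accumulator; shared memory starts uninitialized.
    for (int i = threadIdx.x; i < kLen; i += blockDim.x)
        acc[i] = 0.0f;
    __syncthreads();

    // Each block handles data sets blockIdx.x, blockIdx.x + gridDim.x, ...
    // and keeps the running sums in its own shared memory.
    for (int s = blockIdx.x; s < numDataSets; s += gridDim.x)
        for (int i = threadIdx.x; i < kLen; i += blockDim.x)
            acc[i] += computeItem(s, i);
    __syncthreads();

    // One atomic add per element per block, instead of one per data set.
    for (int i = threadIdx.x; i < kLen; i += blockDim.x)
        atomicAdd(&total[i], acc[i]);
}

// Host-side driver for the sketch: returns true if every element of the
// total equals the number of data sets (each contributes 1.0f).
bool runAccumulateDemo()
{
    const int numDataSets = 3000, numBlocks = 32, threads = 256;
    float *total = nullptr;
    if (cudaMalloc(&total, sizeof(float) * kLen) != cudaSuccess) return false;
    cudaMemset(total, 0, sizeof(float) * kLen);   // all-zero bytes are 0.0f

    accumulateDataSets<<<numBlocks, threads>>>(total, numDataSets);

    std::vector<float> host(kLen);
    cudaError_t err = cudaMemcpy(host.data(), total, sizeof(float) * kLen,
                                 cudaMemcpyDeviceToHost);
    cudaFree(total);
    if (err != cudaSuccess) return false;

    for (float v : host)
        if (v != static_cast<float>(numDataSets)) return false;
    return true;
}
```

With 32 blocks covering 3000 data sets, the number of atomic adds per output element drops from 3000 to 32. The block count here is an arbitrary choice for illustration; a reasonable starting point in practice is a small multiple of the number of SMs.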
cuda