parallel processing - Difference between MP and SP (or is it Core) in CUDA parallelism?
So, after reading some useful posts, I came to the conclusion that:
Each grid of blocks executes on a single device (e.g. a device has 9 MPs).
Each block of threads executes on a single multiprocessor (e.g. 1 MP has 8 SPs/cores).
Each group of threads (called a warp) executes on an SP/core.
Now, if I assume I have a total of 72 cores (9*8) and I call a kernel with 9 blocks of 8 threads, for a total of 72 threads, will all 72 threads run in parallel? And if I launch more than that, will they not run in parallel?
"Each grid of blocks executes on a single device (e.g. a device has 9 MPs)."
Yes!
"Each block of threads executes on a single multiprocessor (e.g. 1 MP has 8 SPs/cores)."
Yes!
"Each group of threads (called a warp) executes on an SP/core."
No! Typically a warp (32 threads on current hardware) is distributed across the cores of a multiprocessor rather than running on a single core.
"Now, if I assume I have a total of 72 cores (9*8) and I call a kernel with 9 blocks of 8 threads, for a total of 72 threads, will all 72 threads run in parallel?"
Yes! But it won't be fast...
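To make the 9 x 8 launch concrete, here is a minimal sketch of it. The kernel name writeGlobalIndex and the helper warpsPerBlock are hypothetical names chosen for illustration, and the code assumes device 0 is a usable CUDA device. It also shows one reason such a launch is slow: threads are scheduled in whole warps, so with a warp size of 32 a block of 8 threads still occupies a full warp and leaves 24 lanes idle.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: warps needed for one block, rounded up to whole warps.
int warpsPerBlock(int threadsPerBlock, int warpSize)
{
    return (threadsPerBlock + warpSize - 1) / warpSize;
}

// Each thread stores its own global index, so the host can check that all ran.
__global__ void writeGlobalIndex(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main()
{
    const int blocks = 9;           // one block per MP in the question's example
    const int threadsPerBlock = 8;  // one thread per core in the question's example
    const int total = blocks * threadsPerBlock;  // 72 threads

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }

    int *devOut = nullptr;
    if (cudaMalloc(&devOut, total * sizeof(int)) != cudaSuccess) return 1;

    writeGlobalIndex<<<blocks, threadsPerBlock>>>(devOut, total);

    std::vector<int> hostOut(total, -1);
    // cudaMemcpy waits for the kernel to finish before copying the results back.
    cudaError_t err = cudaMemcpy(hostOut.data(), devOut, total * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(devOut);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    bool ok = true;
    for (int i = 0; i < total; ++i) ok = ok && (hostOut[i] == i);

    int warps = warpsPerBlock(threadsPerBlock, prop.warpSize);
    printf("all %d threads ran: %s\n", total, ok ? "yes" : "no");
    printf("each block uses %d warp(s), %d of %d lanes active\n",
           warps, threadsPerBlock, warps * prop.warpSize);
    return ok ? 0 : 1;
}
```

Choosing a block size that is a multiple of the warp size avoids the idle lanes.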
"And if I launch more than that, will they not run in parallel?"
They will run in parallel. A GPU achieves its performance via over-subscription. A core cannot finish an instruction in one cycle; it takes many cycles to return a result (this is known as latency). By having more threads than cores, you can issue another instruction in the gap between one instruction being started and it finishing. This is the way to get peak performance out of a GPU, by having many threads per core, and it is fundamental to GPU programming. There are limits on how many threads an SM (streaming multiprocessor, the MP above) can have resident on it, but typically you want as many as possible, several times as many threads as you have cores.
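As a rough illustration of over-subscription, the sketch below launches many more threads than there are cores and lets each thread walk the array in a grid-stride loop. The names scale and blocksToFillDevice are hypothetical, the block size of 256 is an arbitrary choice, and the block count is only an upper-bound estimate taken from cudaDeviceProp: the number of threads actually resident on an SM also depends on the kernel's register and shared memory use.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Grid-stride loop: correct for any grid size, so the launch can be sized
// to keep the hardware busy rather than to match the array length.
__global__ void scale(float *data, int n, float factor)
{
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] *= factor;
}

// Hypothetical helper: block count that would fill every SM up to its
// thread limit (an estimate; other resource limits are ignored).
int blocksToFillDevice(int smCount, int maxThreadsPerSm, int threadsPerBlock)
{
    int perSm = maxThreadsPerSm / threadsPerBlock;
    if (perSm < 1) perSm = 1;
    return smCount * perSm;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }

    const int threadsPerBlock = 256;  // assumed block size, a multiple of 32
    const int blocks = blocksToFillDevice(prop.multiProcessorCount,
                                          prop.maxThreadsPerMultiProcessor,
                                          threadsPerBlock);
    const int n = 1 << 20;

    std::vector<float> host(n, 1.0f);
    float *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<blocks, threadsPerBlock>>>(dev, n, 2.0f);

    // The copy back waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(host.data(), dev, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    bool ok = true;
    for (int i = 0; i < n; ++i) ok = ok && (host[i] == 2.0f);

    printf("%d SMs, launched %d blocks of %d threads, result %s\n",
           prop.multiProcessorCount, blocks, threadsPerBlock,
           ok ? "correct" : "wrong");
    return ok ? 0 : 1;
}
```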
From the programmer's point of view, all threads within a grid run in parallel. From the hardware's point of view, every SM runs in parallel with every other SM. Each SM can have instructions from multiple warps executing on each cycle. Each instruction is executed in parallel across the cores of the SM. Each core can have many operations in its pipeline.
The subtle difference between the programmer's point of view and the hardware's point of view is that there are resource-based dependencies in the hardware. These dependencies aren't visible to the programmer.
parallel-processing cuda gpu