multithreading - Does cudaDeviceSynchronize() wait for kernels to finish only in the current CUDA context or in all contexts?
I use CUDA 6.5 and 4 x Kepler GPUs.
I use multithreading with the CUDA Runtime API and access the CUDA contexts from different CPU threads (using OpenMP, but that doesn't really matter).
When I call cudaDeviceSynchronize();
will it wait for kernel(s) to finish only in the current CUDA context, i.e. the one selected by the latest call to cudaSetDevice(), or in all CUDA contexts?
If it waits for kernel(s) to finish in all CUDA contexts, does it wait only in the CUDA contexts used by the current CPU thread (in the example below, CPU thread_0 would wait for GPUs 0 and 1) or in all CUDA contexts (CPU thread_0 would wait for GPUs 0, 1, 2 and 3)?
Here is the code:
    // Using OpenMP requires:
    //   MSVS option: -Xcompiler "/openmp"
    //   gcc option:  -Xcompiler -fopenmp
    #include <omp.h>

    int main() {
        // execute 2 threads with different omp_get_thread_num(): 0 and 1
        #pragma omp parallel num_threads(2)
        {
            int omp_threadId = omp_get_thread_num();

            // CPU thread 0
            if (omp_threadId == 0) {
                cudaSetDevice(0);
                kernel_0<<<...>>>(...);
                cudaSetDevice(1);
                kernel_1<<<...>>>(...);
                cudaDeviceSynchronize(); // which kernel(s) does it wait for?

            // CPU thread 1
            } else if (omp_threadId == 1) {
                cudaSetDevice(2);
                kernel_2<<<...>>>(...);
                cudaSetDevice(3);
                kernel_3<<<...>>>(...);
                cudaDeviceSynchronize(); // which kernel(s) does it wait for?
            }
        }
        return 0;
    }
cudaDeviceSynchronize() synchronizes the host with the currently set GPU only. If multiple GPUs are in use and all of them need to be synchronized, cudaDeviceSynchronize() has to be called separately for each one.
Here is a minimal example:

    cudaSetDevice(0);
    cudaDeviceSynchronize();
    cudaSetDevice(1);
    cudaDeviceSynchronize();
    ...
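The same pattern generalizes to any number of GPUs. A small sketch, assuming only the standard CUDA Runtime API (the helper name syncAllDevices is my own, and basic error checking is added):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Synchronize the host with every GPU visible to the process,
// one device at a time.
void syncAllDevices()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);                        // make 'dev' the current device
        cudaError_t err = cudaDeviceSynchronize(); // waits only for this device
        if (err != cudaSuccess)
            fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
    }
}
```

Note that cudaSetDevice() itself is cheap; the cost is in each cudaDeviceSynchronize(), which blocks until all previously issued work on that device has completed.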
So the answer is: cudaDeviceSynchronize() only syncs the streams in the current CUDA context.
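Applied to the code in the question, each OpenMP thread therefore has to synchronize both of its devices explicitly. A sketch of thread 0's branch (kernel names and launch placeholders as in the question; grid/block configuration omitted):

```cuda
// CPU thread 0: launch on GPUs 0 and 1, then sync each one separately
cudaSetDevice(0);
kernel_0<<<...>>>(...);
cudaSetDevice(1);
kernel_1<<<...>>>(...);

cudaSetDevice(0);
cudaDeviceSynchronize();   // waits for kernel_0 on GPU 0 only
cudaSetDevice(1);
cudaDeviceSynchronize();   // waits for kernel_1 on GPU 1 only
```

Because kernel launches are asynchronous, both kernels run concurrently on their devices; only the two synchronize calls at the end block the CPU thread.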
Source: Pawel Pomorski, slides of "CUDA on multiple GPUs", linked here.
multithreading cuda gpgpu nvidia