Wednesday, 15 April 2015

multithreading - Does cudaDeviceSynchronize() wait for completion only in the current CUDA context or in all contexts?




I use CUDA 6.5 and 4 x Kepler GPUs.

I use multithreading and the CUDA runtime API, and I access the CUDA contexts from different CPU threads (using OpenMP, but that does not really matter).

When I call cudaDeviceSynchronize(), will it wait for the kernel(s) to finish only in the current CUDA context, the one selected by the latest call to cudaSetDevice(), or in all CUDA contexts?

If it waits for the kernel(s) to finish in all CUDA contexts, will it wait in all CUDA contexts used by the current CPU thread (in the example, CPU thread_0 would wait for GPUs 0 and 1) or in all CUDA contexts whatsoever (CPU thread_0 would wait for GPUs 0, 1, 2 and 3)?

Here is the code:

    // Using OpenMP requires setting:
    // MSVS option: -Xcompiler "/openmp"
    // GCC option: -Xcompiler -fopenmp
    #include <omp.h>

    int main() {
        // Execute two threads with different omp_get_thread_num() = 0 and 1
        #pragma omp parallel num_threads(2)
        {
            int omp_threadid = omp_get_thread_num();

            // CPU thread 0
            if (omp_threadid == 0) {
                cudaSetDevice(0);
                kernel_0<<<...>>>(...);
                cudaSetDevice(1);
                kernel_1<<<...>>>(...);

                cudaDeviceSynchronize(); // which kernel<>() does this wait for?

            // CPU thread 1
            } else if (omp_threadid == 1) {
                cudaSetDevice(2);
                kernel_2<<<...>>>(...);
                cudaSetDevice(3);
                kernel_3<<<...>>>(...);

                cudaDeviceSynchronize(); // which kernel<>() does this wait for?
            }
        }
        return 0;
    }


cudaDeviceSynchronize() synchronizes the host with the currently set GPU. If multiple GPUs are in use and all of them need to be synchronized, cudaDeviceSynchronize() has to be called separately for each one.

Here is a minimal example:

    cudaSetDevice(0);
    cudaDeviceSynchronize();
    cudaSetDevice(1);
    cudaDeviceSynchronize();
    ...

So the answer is that cudaDeviceSynchronize() synchronizes the streams in the current CUDA context only.
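To make the per-device pattern concrete, here is a minimal, self-contained sketch that launches one kernel on every visible GPU and then synchronizes each device in turn. The kernel name `mark` and the value it writes are hypothetical and exist only for illustration. The sketch runs in a single host thread; since the current device set by cudaSetDevice() is per-host-thread state, the same loop could be run by each OpenMP thread over just the devices that thread uses.

    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    // Hypothetical kernel, for illustration only: writes one value to device memory.
    __global__ void mark(int *out, int value) {
        *out = value;
    }

    int main() {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
            std::printf("no CUDA devices found\n");
            return 0;
        }

        std::vector<int *> d_out(count, nullptr);

        // Launch one kernel per device. Kernel launches are asynchronous
        // with respect to the host, so this loop does not wait for them.
        for (int dev = 0; dev < count; ++dev) {
            cudaSetDevice(dev);
            cudaMalloc(&d_out[dev], sizeof(int));
            mark<<<1, 1>>>(d_out[dev], dev + 1);
        }

        // Synchronize every device explicitly: each call waits only for
        // the device that is current at the time of the call.
        for (int dev = 0; dev < count; ++dev) {
            cudaSetDevice(dev);
            cudaError_t err = cudaDeviceSynchronize();
            if (err != cudaSuccess) {
                std::printf("device %d: %s\n", dev, cudaGetErrorString(err));
                return 1;
            }
        }

        // Read the results back and check them. (cudaMemcpy is itself
        // blocking, so this step checks the kernels' output rather than
        // proving that the synchronization above was required.)
        int mismatches = 0;
        for (int dev = 0; dev < count; ++dev) {
            cudaSetDevice(dev);
            int h_out = 0;
            cudaMemcpy(&h_out, d_out[dev], sizeof(int), cudaMemcpyDeviceToHost);
            if (h_out != dev + 1) ++mismatches;
            cudaFree(d_out[dev]);
        }

        std::printf("%d device(s) synchronized, %d mismatch(es)\n", count, mismatches);
        return mismatches;
    }

Saved as a .cu file (the file name is up to you), this should build with plain nvcc and no extra options, since it does not use OpenMP.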

Source: Pawel Pomorski, slides of "CUDA on multiple GPUs". Linked here.

multithreading cuda gpgpu nvidia
