Wednesday 15 January 2014

concurrency - OpenCL multiple command queues for concurrent NDRange kernel launch

I'm trying to run a vector addition application for which I need to launch multiple kernels concurrently. In my last question I was advised to use multiple command queues for concurrent kernel launch, so I'm defining an array of them:

context = clCreateContext(NULL, 1, &device_id, NULL, NULL, &err);
for (i = 0; i < num_ker; ++i) {
    queue[i] = clCreateCommandQueue(context, device_id, 0, &err);
}

I'm getting the error "command terminated by signal 11" (a segmentation fault) around the above code.

I'm using a loop for launching the kernels and enqueueing the data too:

for (i = 0; i < num_ker; ++i) {
    err = clEnqueueNDRangeKernel(queue[i], kernel, 1, NULL,
                                 &globalSize, &localSize, 0, NULL, NULL);
}

The thing is, I'm not sure where I'm going wrong. I saw somewhere that you can create an array of command queues, and that's why I'm using an array. For information: when I don't use the loop and instead manually define multiple command queues, it works fine.

I read your last question, and I think you should first rethink what you actually want to do and whether OpenCL is the right way of doing it.

The OpenCL API is made for massively parallel processing and data crunching. Each kernel (or queued task) operates in parallel on many data values at the same time, hence outperforming serial CPU processing by many orders of magnitude.

The typical use case for OpenCL is one kernel running millions of work items. More advanced applications may need multiple sequences of different kernels, and special synchronization between CPU and GPU.

But concurrency is never a requirement. (Otherwise, a single-core CPU would not be able to perform the task, and that is never the case. It will be slower, OK, but it is still possible to run it.)

Even if two tasks need to run at the same time, the total time taken is the same whether they run concurrently or not:

Non-concurrent case:

kernel 1: *
kernel 2: -

GPU core 1: *****-----
GPU core 2: *****-----
GPU core 3: *****-----
GPU core 4: *****-----

Concurrent case:

kernel 1: *
kernel 2: -

GPU core 1: **********
GPU core 2: **********
GPU core 3: ----------
GPU core 4: ----------

In fact, the non-concurrent case is preferable, since at least the first task is already completed and further processing can continue.

What you want to do, as far as I understand, is run multiple kernels at the same time, such that the kernels run concurrently. For example, run 100 kernels (the same kernel or different ones) and run them all at the same time.

That does not fit the OpenCL model at all. In fact, it may well be slower than a single CPU thread.

If each kernel is independent of the others, a core (SIMD unit or CPU core) can only be allocated to one kernel at a time (because it has only one program counter), even though it can run 1k threads at the same time. In the ideal scenario, this converts your OpenCL device into a pool of a few cores (6-10) that consume the queued kernels serially. And that assumes the API supports it and the device does as well, which is not always the case. In the worst case you have a single device that runs a single kernel at a time, with 99% of it wasted.

Examples of things that can be done in OpenCL:

- Data crunching/processing: multiplying vectors, simulating particles, etc.
- Image processing: edge detection, filtering, etc.
- Video compression, editing, generation
- Raytracing, complex light math, etc.
- Sorting

Examples of things that are not suitable for OpenCL:

- Attending asynchronous requests (HTTP, traffic, interactive data)
- Processing low amounts of data
- Processing data that needs completely different processing for each piece of it

From my point of view, the only real use case for multiple kernels is the latter, and no matter what you do, performance will be horrible in that case. It is better to use a multithreaded CPU pool instead.

concurrency opencl gpu gpgpu multi-gpu
