Monday 15 September 2014

recursion - Recursive function using pthreads in C -



recursion - Recursive function using pthreads in C -

i have next piece of code

#include "stdio.h" #include "stdlib.h" #include <string.h> #define maxbins 8 void swap_long(unsigned long int **x, unsigned long int **y){ unsigned long int *tmp; tmp = x[0]; x[0] = y[0]; y[0] = tmp; } void swap(unsigned int **x, unsigned int **y){ unsigned int *tmp; tmp = x[0]; x[0] = y[0]; y[0] = tmp; } void truncated_radix_sort(unsigned long int *morton_codes, unsigned long int *sorted_morton_codes, unsigned int *permutation_vector, unsigned int *index, int *level_record, int n, int population_threshold, int sft, int lv){ int binsizes[maxbins] = {0}; unsigned int *tmp_ptr; unsigned long int *tmp_code; level_record[0] = lv; // record level of node if(n<=population_threshold || sft < 0) { // base of operations case. node leaf memcpy(permutation_vector, index, n*sizeof(unsigned int)); // re-create pernutation vector memcpy(sorted_morton_codes, morton_codes, n*sizeof(unsigned long int)); // re-create morton codes return; } else{ // find kid each point belongs int j = 0; for(j=0; j<n; j++){ unsigned int ii = (morton_codes[j]>>sft) & 0x07; binsizes[ii]++; } // scan prefix int offset = 0, = 0; for(i=0; i<maxbins; i++){ int ss = binsizes[i]; binsizes[i] = offset; offset += ss; } for(j=0; j<n; j++){ unsigned int ii = (morton_codes[j]>>sft) & 0x07; permutation_vector[binsizes[ii]] = index[j]; sorted_morton_codes[binsizes[ii]] = morton_codes[j]; binsizes[ii]++; } //swap index pointers swap(&index, &permutation_vector); //swap code pointers swap_long(&morton_codes, &sorted_morton_codes); /* phone call function recursively split lower levels */ offset = 0; for(i=0; i<maxbins; i++){ int size = binsizes[i] - offset; truncated_radix_sort(&morton_codes[offset], &sorted_morton_codes[offset], &permutation_vector[offset], &index[offset], &level_record[offset], size, population_threshold, sft-3, lv+1); offset += size; } } }

i tried create block

int j = 0; for(j=0; j<n; j++){ unsigned int ii = (morton_codes[j]>>sft) & 0x07; binsizes[ii]++; }

parallel substituting following

int rc,j; pthread_t *thread = (pthread_t *)malloc(nthreads*sizeof(pthread_t)); belong *belongs = (belong *)malloc(nthreads*sizeof(belong)); pthread_mutex_init(&bin_mtx, null); (j = 0; j < nthreads; j++){ belongs[j].n = nthreads; belongs[j].n = n; belongs[j].tid = j; belongs[j].sft = sft; belongs[j].binsizes = binsizes; belongs[j].mcodes = morton_codes; rc = pthread_create(&thread[j], null, belong_wrapper, (void *)&belongs[j]); } (j = 0; j < nthreads; j++){ rc = pthread_join(thread[j], null); }

and defining these outside recursive function

typedef struct{ int n, n, tid, sft; int *binsizes; unsigned long int *mcodes; }belong; pthread_mutex_t bin_mtx; void * belong_wrapper(void *arg){ int n, n, tid, sft, j; int *binsizes; unsigned int ii; unsigned long int *mcodes; n = ((belong *)arg)->n; n = ((belong *)arg)->n; tid = ((belong *)arg)->tid; sft = ((belong *)arg)->sft; binsizes = ((belong *)arg)->binsizes; mcodes = ((belong *)arg)->mcodes; (j = tid; j<n; j+=n){ ii = (mcodes[j] >> sft) & 0x07; pthread_mutex_lock(&bin_mtx); binsizes[ii]++; pthread_mutex_unlock(&bin_mtx); } }

however takes lot more time serial 1 execute... why happening? should change?

since you're using single mutex guard updates binsizes array, you're still doing updates array sequentially: 1 thread can phone call binsizes[ii]++ @ given time. you're still executing function in sequence incurring overhead of creating , destroying threads.

there several options can think of (there more):

do @chris suggests , create each thread update 1 portion of binsizes. might not viable depending on properties of calculation you're using compute ii.

create multiple mutexes representing different partitions of binsizes. example, if binsizes has 10 elements, create 1 mutex elements 0-4, , elements 5-9, utilize them in thread so:

if (ii < 5) { mtx_index = 0; } else { mtx_index = 1; } pthread_mutex_lock(&bin_mtx[mtx_index]); binsizes[ii]++; pthread_mutex_unlock(&bin_mtx[mtx_index]);

you generalize thought size of binsizes , range: potentially have different mutex each array element. of course of study you're opening overhead of creating each of these mutexes, , possibility of deadlock if tries lock several of them @ 1 time etc...

finally, abandon thought of parallelizing block altogether: other users have mentioned using threads way subject level of diminishing returns. unless binsizes array large, might not see huge benefit parallelization if "do right".

c recursion pthreads

No comments:

Post a Comment