Saturday 15 January 2011

performance - C++ speed comparison: iterator vs index


I am writing a linear algebra library in C++ for educational purposes and personal use. As part of it I implemented a custom matrix class with custom row and column iterators. While these provide the nice feature of working with the std::algorithm and std::numeric functions, I performed a speed comparison for matrix multiplication between an index-based and an iterator/std::inner_product approach. The results differ significantly:

// used later on for the custom iterator
template<class U>
struct EveryNth {
    bool operator()(const U&) { return m_count++ % n == 0; }
    EveryNth(std::size_t i) : m_count(0), n(i) {}
    EveryNth(const EveryNth& element) : m_count(0), n(element.n) {}
private:
    int m_count;
    std::size_t n;
};

template<class T, std::size_t rowsize, std::size_t colsize>
class Matrix {
private:
    // the data is stored in an MVector, a modified std::vector
    MVector<T> matrix;
    std::size_t row_dim;
    std::size_t column_dim;
public:
    // other constructors, this one is for the matrix used in the computation
    explicit Matrix(MVector<T>&& s)
        : matrix(std::move(s)), row_dim(rowsize), column_dim(colsize) {}

    // other code...

    typedef boost::filter_iterator<EveryNth<T>,
            typename std::vector<T>::iterator> FilterIter;

    // returns an iterator that skips elements in a range
    // if "to" is specified, it has to be set to a value
    // @param "j"    - the j'th column is requested
    // @param "from" - starts at the from'th element
    // @param "to"   - goes from the from'th element to the to'th element
    FilterIter begin_col(std::size_t j,
                         std::size_t from = 0,
                         std::size_t to = rowsize) {
        return boost::make_filter_iterator<EveryNth<T> >(
            EveryNth<T>(cols()),
            matrix.begin() + index(from, j),
            matrix.begin() + index(to, j));
    }

    // specifies the end of the iterator
    // the iterator can not "jump" past the last element, that would be
    // undefined behaviour
    FilterIter end_col(std::size_t j,
                       std::size_t to = rowsize) {
        return boost::make_filter_iterator<EveryNth<T> >(
            EveryNth<T>(cols()),
            matrix.begin() + index(to, j),
            matrix.begin() + index(to, j));
    }

    FilterIter begin_row(std::size_t i,
                         std::size_t from = 0,
                         std::size_t to = colsize) {
        return boost::make_filter_iterator<EveryNth<T> >(
            EveryNth<T>(1),
            matrix.begin() + index(i, from),
            matrix.begin() + index(i, to));
    }

    FilterIter end_row(std::size_t i,
                       std::size_t to = colsize) {
        return boost::make_filter_iterator<EveryNth<T> >(
            EveryNth<T>(1),
            matrix.begin() + index(i, to),
            matrix.begin() + index(i, to));
    }

    // other code...

    // allows access to an element of the matrix by an index expressed
    // in terms of rows and columns
    // @param "r" - the r'th row of the matrix
    // @param "c" - the c'th column of the matrix
    std::size_t index(std::size_t r, std::size_t c) const {
        return r * cols() + c;
    }

    // brackets operator
    // returns the element stored in the matrix
    // @param "r" - the r'th row in the matrix
    // @param "c" - the c'th column in the matrix
    T& operator()(std::size_t r, std::size_t c) {
        assert(r < rows() && c < matrix.size() / rows());
        return matrix[index(r, c)];
    }

    const T& operator()(std::size_t r, std::size_t c) const {
        assert(r < rows() && c < matrix.size() / rows());
        return matrix[index(r, c)];
    }

    // other code...
}; // end of class

Now in the main function I run the following:

int main(int argc, char *argv[]) {
    Matrix<int, 100, 100> a = Matrix<int, 100, 100>(range<int>(10000));

    std::clock_t begin = clock();

    double b = 0;
    for (std::size_t i = 0; i < a.rows(); i++) {
        for (std::size_t j = 0; j < a.cols(); j++) {
            b += std::inner_product(a.begin_row(i), a.end_row(i),
                                    a.begin_col(j), 0);
        }
    }

    // double b = 0;
    // for (std::size_t i = 0; i < a.rows(); i++) {
    //     for (std::size_t j = 0; j < a.cols(); j++) {
    //         for (std::size_t k = 0; k < a.rows(); k++) {
    //             b += a(i,k) * a(k,j);
    //         }
    //     }
    // }

    std::clock_t end = clock();
    double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
    std::cout << elapsed_secs << std::endl;
    std::cout << "--- end of test ---" << std::endl;
    std::cout << std::endl;

    return 0;
}

For the std::inner_product/iterator approach it takes:

bash-3.2$ ./main
3.78358
--- end of test ---

and for the index approach (commented out above):

bash-3.2$ ./main
0.106173
--- end of test ---

which is about 40 times faster than the iterator approach. Do you see anything in the code that would slow down the iterator computation so much? I should mention that I tried out both methods and they produce the same, correct results.

Thank you for any ideas.

What you have to understand is that matrix operations are extremely well-understood, and that compilers are very good at optimizing the things involved in matrix operations.

Consider c = ab, where c is m×n, a is m×q, and b is q×n.

double a[m][q], b[q][n], c[m][n];

for (unsigned i = 0; i < m; i++) {
    for (unsigned j = 0; j < n; j++) {
        double temp = 0.0;
        for (unsigned k = 0; k < q; k++) {
            temp += a[i][k] * b[k][j];
        }
        c[i][j] = temp;
    }
}

(You would not believe how tempted I was to write the above in FORTRAN IV.)

The compiler looks at this, and notices that what is really happening is that it is walking through a and c with a stride of 1, and through b with a stride of n. It eliminates the multiplications in the subscript calculations and does straight indexing.

At that point, the inner loop is of the form:

temp += a[r1] * b[r2];
r1 += 1;
r2 += n;

and you have some loops around that to (re)initialize r1 and r2 for each pass.
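Spelled out as real code, that reduced form looks something like this. This is only a sketch of what the compiler arrives at, written by hand with flat row-major arrays; the function name and the explicit r1/r2 bookkeeping are my own illustration:

```cpp
#include <cstddef>

// c = a * b, with a m x q, b q x n, c m x n, all stored row-major in
// flat arrays. The inner loop advances two linear indices: r1 walks a
// row of a with stride 1, r2 walks a column of b with stride n -- no
// multiplications left in the subscript calculations.
void multiply(const double* a, const double* b, double* c,
              std::size_t m, std::size_t n, std::size_t q) {
    for (std::size_t i = 0; i < m; i++) {
        for (std::size_t j = 0; j < n; j++) {
            double temp = 0.0;
            std::size_t r1 = i * q;  // start of row i of a
            std::size_t r2 = j;      // top of column j of b
            for (std::size_t k = 0; k < q; k++) {
                temp += a[r1] * b[r2];
                r1 += 1;
                r2 += n;
            }
            c[i * n + j] = temp;
        }
    }
}
```

The only arithmetic per innermost iteration is one multiply, two adds, and the index increments, which is exactly the floor described above.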

That is the absolute minimum computation you can do for a straightforward matrix multiplication. You cannot do less than that, because you have to do those multiplications and additions and index adjustments.
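For the benchmark in the question that floor is easy to put a number on. The helper below is purely illustrative; the sizes are taken from the 100×100 benchmark code:

```cpp
// Minimum work for c = a * b with a m x q and b q x n: one multiply and
// one add per innermost iteration, i.e. m * n * q multiply-adds in
// total, no matter whether you get there by indices or by iterators.
long min_multiply_adds(long m, long n, long q) {
    return m * n * q;
}
```

For the 100×100×100 case that is one million multiply-adds either way; the two versions can only differ in how much bookkeeping rides along with each one.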

All you can do is add overhead.

That is what the iterator and std::inner_product() approach does: it adds metric tonnes of overhead.
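To see where that overhead comes from in this particular case: a filter-style iterator such as the EveryNth/boost::filter_iterator combination in the question cannot jump by the column stride. To advance, it steps through the underlying range one element at a time and tests each with the predicate. The Boost-free sketch below (hypothetical names, with a counter standing in for the hidden predicate calls) makes that work visible:

```cpp
#include <cstddef>
#include <vector>

// Result of summing one column of a row-major matrix two ways.
struct ColSum {
    long sum_filtered;   // filter-style traversal
    long sum_indexed;    // direct index arithmetic
    std::size_t tests;   // elements the filter predicate had to examine
};

// Sums column j of a rows x cols row-major matrix, both the way a
// filter iterator must (scan everything, keep every cols-th element,
// as EveryNth(cols) does) and the way plain index arithmetic can.
ColSum sum_column(const std::vector<int>& m,
                  std::size_t rows, std::size_t cols, std::size_t j) {
    ColSum r{0, 0, 0};

    // "Iterator" way: visit every element from position j onward,
    // calling the predicate once per element and keeping the hits.
    std::size_t count = 0;
    for (std::size_t idx = j; idx < m.size(); ++idx) {
        ++r.tests;
        if (count++ % cols == 0)
            r.sum_filtered += m[idx];
    }

    // "Index" way: jump straight to the rows elements of column j.
    for (std::size_t k = 0; k < rows; ++k)
        r.sum_indexed += m[k * cols + j];

    return r;
}
```

Both traversals produce the same sum, but for a 100×100 matrix the filter-style one examines all 10,000 elements to deliver the 100 that belong to the column, and it does that for every one of the 10,000 inner products in the benchmark.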

c++ performance iterator
