Saturday 15 January 2011

performance - C++ speed comparison: iterator vs index


I am writing a linear algebra library in C++ for educational purposes and personal use. As part of it I implemented a custom matrix class with custom row and column iterators. While these provide the nice feature of working with the std::algorithm and std::numeric functions, I performed a speed comparison for matrix multiplication between an index-based and an iterator/std::inner_product approach. The results differ significantly:

// used later on for the custom iterator
template<class U>
struct EveryNth {
    bool operator()(const U&) { return m_count++ % n == 0; }
    EveryNth(std::size_t i) : m_count(0), n(i) {}
    EveryNth(const EveryNth& element) : m_count(0), n(element.n) {}
private:
    int m_count;
    std::size_t n;
};

template<class T, std::size_t rowsize, std::size_t colsize>
class Matrix {
private:
    // the data is stored in an MVector, a modified std::vector
    MVector<T> matrix;
    std::size_t row_dim;
    std::size_t column_dim;
public:
    // other constructors, this one is for the matrix used in the computation
    explicit Matrix(MVector<T>&& s)
        : matrix(std::move(s)), row_dim(rowsize), column_dim(colsize) {}

    // other code...

    typedef boost::filter_iterator<EveryNth<T>,
            typename std::vector<T>::iterator> FilterIter;

    // returns an iterator that skips elements in a range
    // if "to" is specified, it has to be set to a value
    // @param "j"    - the j'th column is requested
    // @param "from" - starts at the from'th element
    // @param "to"   - goes from the from'th element to the to'th element
    FilterIter begin_col(std::size_t j,
                         std::size_t from = 0,
                         std::size_t to = rowsize) {
        return boost::make_filter_iterator<EveryNth<T> >(
            EveryNth<T>(cols()),
            matrix.begin() + index(from, j),
            matrix.begin() + index(to, j));
    }

    // specifies the end of the iterator
    // the iterator can not "jump" past the last element, that would be
    // undefined behaviour
    FilterIter end_col(std::size_t j,
                       std::size_t to = rowsize) {
        return boost::make_filter_iterator<EveryNth<T> >(
            EveryNth<T>(cols()),
            matrix.begin() + index(to, j),
            matrix.begin() + index(to, j));
    }

    FilterIter begin_row(std::size_t i,
                         std::size_t from = 0,
                         std::size_t to = colsize) {
        return boost::make_filter_iterator<EveryNth<T> >(
            EveryNth<T>(1),
            matrix.begin() + index(i, from),
            matrix.begin() + index(i, to));
    }

    FilterIter end_row(std::size_t i,
                       std::size_t to = colsize) {
        return boost::make_filter_iterator<EveryNth<T> >(
            EveryNth<T>(1),
            matrix.begin() + index(i, to),
            matrix.begin() + index(i, to));
    }

    // other code...

    // allows access to an element of the matrix by an index expressed
    // in terms of rows and columns
    // @param "r" - the r'th row of the matrix
    // @param "c" - the c'th column of the matrix
    std::size_t index(std::size_t r, std::size_t c) const {
        return r * cols() + c;
    }

    // brackets operator
    // returns the element stored in the matrix
    // @param "r" - the r'th row in the matrix
    // @param "c" - the c'th column in the matrix
    T& operator()(std::size_t r, std::size_t c) {
        assert(r < rows() && c < matrix.size() / rows());
        return matrix[index(r, c)];
    }

    const T& operator()(std::size_t r, std::size_t c) const {
        assert(r < rows() && c < matrix.size() / rows());
        return matrix[index(r, c)];
    }

    // other code...
}; // end of class

Now in the main function I run the following:

int main(int argc, char *argv[]) {
    Matrix<int, 100, 100> a = Matrix<int, 100, 100>(range<int>(10000));

    std::clock_t begin = clock();

    double b = 0;
    for (std::size_t i = 0; i < a.rows(); i++) {
        for (std::size_t j = 0; j < a.cols(); j++) {
            b += std::inner_product(a.begin_row(i), a.end_row(i),
                                    a.begin_col(j), 0);
        }
    }

    // double b = 0;
    // for (std::size_t i = 0; i < a.rows(); i++) {
    //     for (std::size_t j = 0; j < a.cols(); j++) {
    //         for (std::size_t k = 0; k < a.rows(); k++) {
    //             b += a(i,k) * a(k,j);
    //         }
    //     }
    // }

    std::clock_t end = clock();
    double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
    std::cout << elapsed_secs << std::endl;
    std::cout << "--- end of test ---" << std::endl;
    std::cout << std::endl;

    return 0;
}

For the std::inner_product/iterator approach it takes:

bash-3.2$ ./main
3.78358
--- end of test ---

and for the index approach (commented out above):

bash-3.2$ ./main
0.106173
--- end of test ---

which is about 40 times faster than the iterator approach. Do you see anything in the code that would slow down the iterator computation so much? I should mention that I tried out both methods and they produce the same, correct results.

Thank you for any ideas.

What you have to understand is that matrix operations are extremely well-understood, and that compilers are very good at optimizing the things involved in matrix operations.

Consider c = ab, where c is m×n, a is m×q, and b is q×n.

double a[m][q], b[q][n], c[m][n];

for (unsigned i = 0; i < m; i++) {
    for (unsigned j = 0; j < n; j++) {
        double temp = 0.0;
        for (unsigned k = 0; k < q; k++) {
            temp += a[i][k] * b[k][j];
        }
        c[i][j] = temp;
    }
}

(You would not believe how tempted I was to write the above in FORTRAN IV.)

The compiler looks at this, and notices that what is really happening is that it is walking through a and c with a stride of 1, and through b with a stride of n. It eliminates the multiplications in the subscript calculations and does straight indexing.

At that point, the inner loop is of the form:

temp += a[r1] * b[r2];
r1 += 1;
r2 += n;

and you have some loops around that to (re)initialize r1 and r2 for each pass.
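Spelled out as real code, that reduced form looks something like this. This is only a sketch of what the compiler arrives at, written by hand with flat row-major arrays; the function name and the explicit r1/r2 bookkeeping are my own illustration:

```cpp
#include <cstddef>

// c = a * b, with a m x q, b q x n, c m x n, all stored row-major in
// flat arrays. The inner loop advances two linear indices: r1 walks a
// row of a with stride 1, r2 walks a column of b with stride n -- no
// multiplications left in the subscript calculations.
void multiply(const double* a, const double* b, double* c,
              std::size_t m, std::size_t n, std::size_t q) {
    for (std::size_t i = 0; i < m; i++) {
        for (std::size_t j = 0; j < n; j++) {
            double temp = 0.0;
            std::size_t r1 = i * q;  // start of row i of a
            std::size_t r2 = j;      // top of column j of b
            for (std::size_t k = 0; k < q; k++) {
                temp += a[r1] * b[r2];
                r1 += 1;
                r2 += n;
            }
            c[i * n + j] = temp;
        }
    }
}
```

The only arithmetic per innermost iteration is one multiply, two adds, and the index increments, which is exactly the floor described above.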

That is the absolute minimum computation you can do for a straightforward matrix multiplication. You cannot do less than that, because you have to do those multiplications and additions and index adjustments.
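For the benchmark in the question that floor is easy to put a number on. The helper below is purely illustrative; the sizes are taken from the 100×100 benchmark code:

```cpp
// Minimum work for c = a * b with a m x q and b q x n: one multiply and
// one add per innermost iteration, i.e. m * n * q multiply-adds in
// total, no matter whether you get there by indices or by iterators.
long min_multiply_adds(long m, long n, long q) {
    return m * n * q;
}
```

For the 100×100×100 case that is one million multiply-adds either way; the two versions can only differ in how much bookkeeping rides along with each one.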

All you can do is add overhead.

That is what the iterator and std::inner_product() approach does: it adds metric tonnes of overhead.
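To see where that overhead comes from in this particular case: a filter-style iterator such as the EveryNth/boost::filter_iterator combination in the question cannot jump by the column stride. To advance, it steps through the underlying range one element at a time and tests each with the predicate. The Boost-free sketch below (hypothetical names, with a counter standing in for the hidden predicate calls) makes that work visible:

```cpp
#include <cstddef>
#include <vector>

// Result of summing one column of a row-major matrix two ways.
struct ColSum {
    long sum_filtered;   // filter-style traversal
    long sum_indexed;    // direct index arithmetic
    std::size_t tests;   // elements the filter predicate had to examine
};

// Sums column j of a rows x cols row-major matrix, both the way a
// filter iterator must (scan everything, keep every cols-th element,
// as EveryNth(cols) does) and the way plain index arithmetic can.
ColSum sum_column(const std::vector<int>& m,
                  std::size_t rows, std::size_t cols, std::size_t j) {
    ColSum r{0, 0, 0};

    // "Iterator" way: visit every element from position j onward,
    // calling the predicate once per element and keeping the hits.
    std::size_t count = 0;
    for (std::size_t idx = j; idx < m.size(); ++idx) {
        ++r.tests;
        if (count++ % cols == 0)
            r.sum_filtered += m[idx];
    }

    // "Index" way: jump straight to the rows elements of column j.
    for (std::size_t k = 0; k < rows; ++k)
        r.sum_indexed += m[k * cols + j];

    return r;
}
```

Both traversals produce the same sum, but for a 100×100 matrix the filter-style one examines all 10,000 elements to deliver the 100 that belong to the column, and it does that for every one of the 10,000 inner products in the benchmark.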

c++ performance iterator
