Monday 15 February 2010

python - Faster alternatives to numpy.argmax/argmin which is slow




I am using a lot of argmin and argmax in Python.

Unfortunately, the function is very slow.

I have done some searching around, and the best I can find is here:

http://lemire.me/blog/archives/2008/12/17/fast-argmax-in-python/

def fastest_argmax(array):
    array = list(array)
    return array.index(max(array))

Unfortunately, this solution is still only half as fast as np.max, and I think I should be able to find something as fast as np.max.

x = np.random.randn(10)

%timeit np.argmax(x)
10000 loops, best of 3: 21.8 µs per loop

%timeit fastest_argmax(x)
10000 loops, best of 3: 20.8 µs per loop

As a note, I am applying this to a pandas DataFrame groupby,

e.g.

%timeit grp2['odds'].agg([fastest_argmax])
100 loops, best of 3: 8.8 ms per loop

%timeit grp2['odds'].agg([np.argmax])
100 loops, best of 3: 11.6 ms per loop

where the data looks like this:

grp2['odds'].head()
Out[60]:
event_id   selection_id
104601100  4367029   682508    3.05
                     682509    3.15
                     682510    3.25
                     682511    3.35
           5319660   682512    2.04
                     682513    2.08
                     682514    2.10
                     682515    2.12
                     682516    2.14
           5510310   682520    4.10
                     682521    4.40
                     682522    4.50
                     682523    4.80
                     682524    5.30
           5559264   682526    5.00
                     682527    5.30
                     682528    5.40
                     682529    5.50
                     682530    5.60
           5585869   682533    1.96
                     682534    1.97
                     682535    1.98
                     682536    2.02
                     682537    2.04
           6064546   682540    3.00
                     682541    2.74
                     682542    2.76
                     682543    2.96
                     682544    3.05
104601200  4916112   682548    2.64
                     682549    2.68
                     682550    2.70
                     682551    2.72
                     682552    2.74
           5315859   682557    2.90
                     682558    2.92
                     682559    3.05
                     682560    3.10
                     682561    3.15
           5356995   682564    2.42
                     682565    2.44
                     682566    2.48
                     682567    2.50
                     682568    2.52
           5465225   682573    1.85
                     682574    1.89
                     682575    1.91
                     682576    1.93
                     682577    1.94
           5773661   682588    5.00
                     682589    4.40
                     682590    4.90
                     682591    5.10
           6013187   682592    5.00
                     682593    4.20
                     682594    4.30
                     682595    4.40
                     682596    4.60
104606300  2489827   683438    4.00
                     683439    3.90
                     683440    3.95
                     683441    4.30
                     683442    4.40
           3602724   683446    2.16
                     683447    2.32
Name: odds, Length: 65, dtype: float64
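The per-group pattern can be reproduced on a miniature, made-up series (the event and selection IDs below are hypothetical, not the data above):

```python
import pandas as pd

# Hypothetical miniature of the data above: odds indexed by
# (event_id, selection_id); the real series has 65 rows.
idx = pd.MultiIndex.from_tuples(
    [(1, 10), (1, 11), (1, 12), (2, 20), (2, 21)],
    names=["event_id", "selection_id"],
)
odds = pd.Series([3.05, 3.15, 3.25, 2.04, 2.08], index=idx, name="odds")

def fastest_argmax(array):
    array = list(array)
    return array.index(max(array))

# Positional index of the maximum odds within each event:
# event 1 -> 2 (3.25 is the third value), event 2 -> 1
result = odds.groupby(level="event_id").agg(fastest_argmax)
print(result)
```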

It turns out that np.argmax is blazingly fast, but only with native NumPy arrays. With foreign data, almost all the time is spent on conversion:

In [194]: print platform.architecture()
('64bit', 'WindowsPE')

In [5]: x = np.random.rand(10000)
In [57]: l = list(x)

In [123]: timeit numpy.argmax(x)
100000 loops, best of 3: 6.55 µs per loop

In [122]: timeit numpy.argmax(l)
1000 loops, best of 3: 729 µs per loop

In [134]: timeit numpy.array(l)
1000 loops, best of 3: 716 µs per loop

I called your function "inefficient" because it first converts to a list, then iterates through it 2 times (effectively, 3 iterations + list construction).

I was going to suggest this, which iterates only once:

def imax(seq):
    it = iter(seq)
    im = 0
    try:
        m = next(it)
    except StopIteration:
        raise ValueError("the sequence is empty")
    for i, e in enumerate(it, start=1):
        if e > m:
            m = e
            im = i
    return im

But your version turns out to be faster, because it iterates many times in C rather than in Python code. C is just that much faster, even considering the fact that a great deal of the time is spent on conversion, too:

In [158]: timeit imax(x)
1000 loops, best of 3: 883 µs per loop

In [159]: timeit fastest_argmax(x)
1000 loops, best of 3: 575 µs per loop

In [174]: timeit list(x)
1000 loops, best of 3: 316 µs per loop

In [175]: timeit max(l)
1000 loops, best of 3: 256 µs per loop

In [181]: timeit l.index(0.99991619010758348)  # the greatest number, in this case at index 92
100000 loops, best of 3: 2.69 µs per loop
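To see where fastest_argmax spends its time, the components above can be timed directly with the timeit module (a sketch; absolute numbers will vary per machine):

```python
import timeit

import numpy as np

def fastest_argmax(array):
    array = list(array)
    return array.index(max(array))

x = np.random.rand(10000)
l = list(x)

# fastest_argmax == one list() conversion + one max() pass + one .index() pass,
# each running as a C loop rather than a Python-level loop
t_convert = timeit.timeit(lambda: list(x), number=1000)
t_max = timeit.timeit(lambda: max(l), number=1000)
t_full = timeit.timeit(lambda: fastest_argmax(x), number=1000)

print(t_convert, t_max, t_full)
```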

So, the key to speeding this up further is knowing the native format of the data in your sequence (e.g. whether you can omit the conversion step, or use/write functionality native to that format).
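One way to apply that advice when the data may arrive as either a plain list or an ndarray: convert once with np.asarray (essentially a no-op when the input is already a native NumPy array) and stay in NumPy afterwards. A minimal sketch (argmax_native is a made-up name):

```python
import numpy as np

def argmax_native(seq):
    # np.asarray is essentially free for an existing float64 ndarray;
    # for a list it pays the conversion cost exactly once.
    a = np.asarray(seq)
    return int(np.argmax(a))

x = np.random.rand(10000)
l = list(x)
assert argmax_native(l) == argmax_native(x) == int(np.argmax(x))
```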

Btw, you'll get a speedup by using aggregate(max_fn) instead of agg([max_fn]).
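The difference, roughly: agg([fn]) wraps the result in a DataFrame with one column per function, while aggregate(fn) returns a plain Series and skips that wrapping. A sketch on hypothetical data (grp and max_fn are made-up names):

```python
import pandas as pd

df = pd.DataFrame({"grp": [0, 0, 1, 1], "odds": [1.0, 3.0, 2.0, 5.0]})
g = df.groupby("grp")["odds"]

def max_fn(s):
    return s.max()

s = g.aggregate(max_fn)  # plain Series, one value per group
f = g.agg([max_fn])      # DataFrame with a single 'max_fn' column
print(type(s).__name__, type(f).__name__)
```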

python numpy
