python - Multiprocessing on pandas dataframe. Confusing behavior based on input size -
i trying implement df.apply function parallelized across chunks of dataframe. wrote next test code see how much gain (versus info copying etc):
from multiprocessing import pool functools import partial import pandas pd import numpy np import time def df_apply(df, f): homecoming df.apply(f, axis=1) def apply_in_parallel(df, f, n=5): pool = pool(n) df_chunks = np.array_split(df, n) apply_f = partial(df_apply, f=f) result_list = pool.map(apply_f, df_chunks) homecoming pd.concat(result_list, axis=0) def f(x): homecoming x+1 if __name__ == '__main__': n = 10^8 df = pd.dataframe({"a": np.zeros(n), "b": np.zeros(n)}) print "parallel" t0 = time.time() r = apply_in_parallel(df, f, n=5) print time.time() - t0 print "single" t0 = time.time() r = df.apply(f, axis=1) print time.time() - t0
weird behavior: n=10^7 works n=10^8 gives me error
traceback (most recent phone call last): file "parallel_apply.py", line 27, in <module> r = apply_in_parallel(df, f, n=5) file "parallel_apply.py", line 14, in apply_in_parallel result_list = pool.map(apply_f, df_chunks) file "/usr/lib64/python2.7/multiprocessing/pool.py", line 227, in map homecoming self.map_async(func, iterable, chunksize).get() file "/usr/lib64/python2.7/multiprocessing/pool.py", line 528, in raise self._value attributeerror: 'numpy.ndarray' object has no attribute 'apply'
does know going on here? i'd appreciate feedback on way of parallelizing. expecting functions take more time inc or sum per individual row , millions of rows.
thanks!
n = 10^8
results 2
, n = 10^7
results 13
, because operator ^
xor (not power). 2 row length df can not splitted 5 chunks. utilize instead: n = 10**4
, n = 10**5
. these values see difference in time. careful values greater n = 10**6
(at value parallel time @ 30 sec, single time @ 167 sec). , utilize pool.close()
@ end (before return
) in apply_in_parallel()
automatically close workers in pool.
python pandas multiprocessing python-multiprocessing
No comments:
Post a Comment