Wednesday 15 July 2015

python - random sampling with pandas dataframe




I'm relatively new to pandas (and Python... and programming), and I'm trying to run a Monte Carlo simulation, but I have not been able to find a solution that takes a reasonable amount of time.

The data, stored in a data frame called "ytdsales", has sales per day, per product:

    date        product_a  product_b  product_c  product_d  ...  product_xx
    01/01/2014       1000        300         70      34500  ...         780
    02/01/2014        400        400         70         20  ...          10
    03/01/2014       1110        400       1170         60  ...          50
    04/01/2014         20        320          0      71300  ...          10
    ...
    15/10/2014       1000        300         70      34500  ...        5000

I want to simulate different scenarios, using for the rest of the year (from Oct 15 to year end) the historical distribution each product had. For example, with the data presented, I would fill the rest of the year with sales between 20 and 1,100.

What I've done is the following:

    # create a range of "future" dates
    last_historical = ytdsales.index.max()
    year_end = dt.datetime(2014, 12, 30)
    dateseoy = pd.date_range(start=last_historical, end=year_end).shift(1)

    # function that obtains a random sales number per product, between min and max
    f = lambda x: np.random.randint(x.min(), x.max())

    # create the "future" dates and fill them with the output of f
    for i in dateseoy:
        ytdsales.loc[i] = ytdsales.apply(f)

The solution works, but it takes about 3 seconds, which is a lot if I plan to do 1,000 iterations... Is there a way to avoid iterating?

thanks

Use the size option for np.random.randint to get a sample of the needed size all at once. One approach you could consider briefly follows.

Allocate the space you'll need by making a new array that has the index values dateseoy, the columns from the original DataFrame, and all NaN values. Then concatenate it onto the original data.

Now that you know the length of each random sample you'll need, use the size keyword in numpy.random.randint to draw the whole sample at once, per column, instead of looping.

Overwrite the new NaN data with the batch samples.

Here's what it would look like:

    new_df = pandas.DataFrame(index=dateseoy, columns=ytdsales.columns)
    num_to_sample = len(new_df)

    # draw num_to_sample values per column, between that column's min and max
    f = lambda x: np.random.randint(x[1].min(), x[1].max(), num_to_sample)

    output = pandas.concat([ytdsales, new_df], axis=0)
    # list() keeps this working on Python 3, where map is lazy
    output[len(ytdsales):] = np.asarray(list(map(f, ytdsales.iteritems()))).T
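If the end goal is the 1,000-iteration Monte Carlo from the question, here is a minimal sketch of how the batch sampler above might be reused per iteration. It assumes you only need the simulated year-end total per product (not every simulated path), and it reuses f and num_to_sample from above; n_iter, totals, sim, and results are just illustrative names:

    n_iter = 1000
    totals = []
    for _ in range(n_iter):
        # one vectorized batch of "future" sales: shape (num_to_sample, n_products)
        sim = np.asarray(list(map(f, ytdsales.iteritems()))).T
        # historical total plus the simulated remainder of the year, per product
        totals.append(ytdsales.sum() + sim.sum(axis=0))

    results = pandas.DataFrame(totals)  # one row per iteration, one column per product

The outer loop is only over iterations; within each iteration the sampling is a single vectorized call per column, which is the part that made the original per-date loop slow. (On newer pandas versions, .iteritems() is spelled .items().)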

Along the way, this chooses to create a totally new DataFrame, concatenating the old one with the new "placeholder" one. That could be inefficient for big data.

Another way to approach it would be with setting-with-enlargement, as you've done in your for-loop solution.

I did not play around with that approach long enough to figure out how to "enlarge" by batches of indexes all at once. But if you figure that out, you can "enlarge" the original data frame with NaN values (at the index values dateseoy) and then apply the function to ytdsales directly, instead of creating output at all.
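For what it's worth, a minimal sketch of that idea, assuming ytdsales and dateseoy are defined as in the question, and using reindex (rather than true setting-with-enlargement) to add the NaN rows in one step; enlarged, lo, and hi are just illustrative names:

    # add all the "future" rows at once, filled with NaN
    enlarged = ytdsales.reindex(ytdsales.index.append(dateseoy))

    # batch-sample each column from its historical min/max and write into the new rows
    num_to_sample = len(dateseoy)
    for col in ytdsales.columns:
        lo, hi = ytdsales[col].min(), ytdsales[col].max()
        enlarged.loc[dateseoy, col] = np.random.randint(lo, hi, num_to_sample)

This still loops, but only over the (usually few) product columns rather than over every remaining date, and it avoids building a second full DataFrame just to concatenate.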

python performance pandas montecarlo
