Breedlove: python - How can I keep the intersection of a panel between dates using Pandas?

Thursday, 15 August 2013


I've got a panel of price data that has multiple ids for each date:

    date          id      price
    2012-06-08    1234    6.09
                  2345    5.08
                  3456    1.23
    2012-06-09    1234    6.10
                  3456    1.25

I need to keep only the rows whose ids appear on consecutive dates. I'm trying to calculate the returns of a portfolio that changes every month, and the most coherent way I can think of is to take the intersection of the securities held on consecutive dates and then take the difference of the sums of their prices. I tried to filter the DataFrame by iterating through the dates, but it wasn't fruitful. Here's my attempt ('hol' is the original DataFrame and 'dates' is a list of the unique dates in 'hol'):

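    newd = pd.DataFrame()
    for i in range(1, len(dates)):
        # note: DataFrame.append returns a new DataFrame instead of modifying
        # newd in place, so the result of this call is discarded
        newd.append(hol[hol['id'][dates[i-1]].isin(
            list(set(hol['id'][dates[i-1]]).intersection(
                set(hol['id'][dates[i]]))))])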

Please help!

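One thing you can exploit is the DataFrame.shift() method, in order to find the differences. If you combine it with groupby, grouping on the ids, you end up with the results you seem to want. The trick, though, is that you need a DataFrame that has a date/id pair for every unique date and every unique id in order for this to work: otherwise shift() pairs each price with that id's previous observation, even when the id was missing on the previous date.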
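A small sketch of why the balanced panel matters, using a trimmed version of the example data in which id 2345 is missing on 2012-06-09. It assumes the date, id and price columns used in the answer's code, and calls GroupBy.shift() directly rather than through apply. On the raw rows, the grouped shift compares id 2345's 2012-06-10 price with its 2012-06-08 price; on the balanced panel that pair becomes NaN and is dropped.

```python
import datetime

import pandas as pd

# Trimmed example data: id 2345 is missing on 2012-06-09
d8, d9, d10 = (datetime.date(2012, 6, day) for day in (8, 9, 10))
raw = pd.DataFrame({
    'date': [d8, d8, d9, d10, d10],
    'id': [1234, 2345, 1234, 1234, 2345],
    'price': [6.09, 5.08, 6.10, 6.00, 5.10],
}).set_index(['date', 'id']).sort_index()

# Grouped shift on the raw rows: each price is paired with the id's
# previous *observation*, wherever that happened to be
raw_diff = (raw['price'] - raw.groupby(level='id')['price'].shift()).dropna()

# Balanced panel: every (date, id) pair exists, with NaN where there is no price
full_index = pd.MultiIndex.from_product(
    [[d8, d9, d10], [1234, 2345]], names=['date', 'id'])
balanced = raw['price'].reindex(full_index)

# The same shift now pairs each price with the previous *date*, so a gap
# produces NaN and that pair is dropped
balanced_diff = (balanced - balanced.groupby(level='id').shift()).dropna()

print(raw_diff)
print(balanced_diff)
```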

The process is as follows:

1. Create a DataFrame of the data you have.
2. Create a "balanced panel" of that data: a DataFrame that contains every possible date/id combination from the DataFrame you have, with the price values where they exist and NA values where they don't.
3. Group the new DataFrame on id, and use the shift() method inside apply to take the differences in stock prices.
4. Drop the NA rows, which is akin to keeping only the observations present on consecutive days.
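The process above can also be sketched more compactly with pd.MultiIndex.from_product and reindex. This is an alternative formulation, not the answer's original code: consecutive_price_changes is a hypothetical helper name, the date, id and price column names are assumed, and GroupBy.diff() stands in for the x - x.shift() expression (it computes the same within-group difference).

```python
import datetime

import pandas as pd


def consecutive_price_changes(df):
    """Price change for each id present on two consecutive dates.

    Assumes df has 'date', 'id' and 'price' columns (one row per date/id).
    """
    prices = df.set_index(['date', 'id'])['price']

    # Balanced panel: every (date, id) combination, NaN where no price exists
    full_index = pd.MultiIndex.from_product(
        [sorted(df['date'].unique()), sorted(df['id'].unique())],
        names=['date', 'id'])
    balanced = prices.reindex(full_index)

    # Within each id, subtract the previous date's price
    changes = balanced.groupby(level='id').diff()

    # Drop NaN rows, i.e. keep only ids seen on both consecutive dates
    return changes.dropna()


dates = [datetime.date(2012, 6, day) for day in (8, 8, 8, 9, 9, 9, 10, 10, 10)]
df = pd.DataFrame({
    'date': dates,
    'id': [1234, 2345, 3456, 1234, 3456, 4567, 1234, 2345, 4567],
    'price': [6.09, 5.08, 1.23, 6.10, 1.25, 9.9, 6.0, 5.10, 10.0],
})
print(consecutive_price_changes(df))
```

Building the full index directly from the unique dates and ids avoids constructing the combinations by hand, and reindex fills the missing pairs with NaN in one step.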

So, with your data extended to the following:

    import datetime

    import pandas as pd
    from numpy import nan as NA

    d = [datetime.date(2012, 6, 8), datetime.date(2012, 6, 8), datetime.date(2012, 6, 8),
         datetime.date(2012, 6, 9), datetime.date(2012, 6, 9), datetime.date(2012, 6, 9),
         datetime.date(2012, 6, 10), datetime.date(2012, 6, 10), datetime.date(2012, 6, 10)]

    ids = [1234, 2345, 3456, 1234, 3456, 4567, 1234, 2345, 4567]

    price = [6.09, 5.08, 1.23, 6.10, 1.25, 9.9, 6.0, 5.10, 10.0]

    df = pd.DataFrame({'date': d, 'id': ids, 'price': price})

Then do the following:

    import itertools

    # Create a balanced panel of the data: one row for every (date, id)
    # combination. itertools.product pairs each unique date with each unique
    # id, so no combination is skipped however many dates and ids there are.
    df2 = pd.DataFrame(list(itertools.product(df.date.unique(), df.id.unique())),
                       columns=['date', 'id'])

    # Set a (date, id) index on both DataFrames
    df = df.set_index(['date', 'id'])
    df2 = df2.set_index(['date', 'id'])

    # Create a price column in df2, with NA wherever the observation is missing from df
    df2['price'] = pd.Series([df.loc[row, 'price'] if row in df.index else NA
                              for row in df2.index], index=df2.index)

    # Sort df2 by its index so each id's prices are in date order
    df2 = df2.sort_index()

    # Group the data on id (index level 1) and find the differences in price
    # by shifting each id's prices one place; dropna() removes the pairs
    # where the id is missing on either date
    df2.groupby(level=1, as_index=False).apply(lambda x: x.price - x.price.shift()).dropna()

This gives me the following output:

       date        id
    0  2012-06-09  1234    0.01
       2012-06-10  1234   -0.10
    2  2012-06-09  3456    0.02
    3  2012-06-10  4567    0.10

which seems to be what you want?
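To get from these per-id differences to the portfolio figure described in the question, they can be summed by date: over the same set of ids, the sum of the differences equals the difference of the sums. The sketch below assumes columns named date, id and price, recomputes the differences with GroupBy.diff() (equivalent to x - x.shift()), and cross-checks the result against a direct loop over consecutive dates that intersects the id sets; intersection_change is a hypothetical helper, not part of pandas.

```python
import datetime

import pandas as pd

# Example data from the answer (columns date, id, price)
d8, d9, d10 = (datetime.date(2012, 6, day) for day in (8, 9, 10))
df = pd.DataFrame({
    'date': [d8, d8, d8, d9, d9, d9, d10, d10, d10],
    'id': [1234, 2345, 3456, 1234, 3456, 4567, 1234, 2345, 4567],
    'price': [6.09, 5.08, 1.23, 6.10, 1.25, 9.9, 6.0, 5.10, 10.0],
})

# Per-id changes between consecutive dates, via a balanced panel
prices = df.set_index(['date', 'id'])['price']
full_index = pd.MultiIndex.from_product(
    [sorted(df['date'].unique()), sorted(df['id'].unique())],
    names=['date', 'id'])
changes = prices.reindex(full_index).groupby(level='id').diff().dropna()

# Summing the per-id changes on each date gives the change in the summed
# price of the ids held on both that date and the previous one
portfolio_change = changes.groupby(level='date').sum()


def intersection_change(df):
    """Direct version of the question's idea (hypothetical helper):
    intersect the ids on consecutive dates and difference the price sums."""
    by_date = {date: grp.set_index('id')['price'] for date, grp in df.groupby('date')}
    dates = sorted(by_date)
    result = {}
    for prev, curr in zip(dates[:-1], dates[1:]):
        common = sorted(set(by_date[prev].index) & set(by_date[curr].index))
        result[curr] = by_date[curr].loc[common].sum() - by_date[prev].loc[common].sum()
    return pd.Series(result)


print(portfolio_change)
print(intersection_change(df))
```

One difference between the two: a date that shares no ids with the previous date is absent from portfolio_change, while intersection_change reports 0.0 for it.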
