python - How can I keep the intersection of a panel between dates using Pandas?
I've got a panel of price data that has multiple IDs for each date.

    date        id    price
    2012-06-08  1234  6.09
                2345  5.08
                3456  1.23
    2012-06-09  1234  6.10
                3456  1.25
I need to keep only the rows whose IDs appear on both of two consecutive dates. I'm trying to calculate returns for a portfolio that changes every month, and the only coherent way I can see to do it is to take the intersection of the securities on consecutive dates and then take the difference of the sum of prices. I tried to filter the DataFrame by iterating through the dates, but that wasn't fruitful. Here's my attempt ('hol' is the original DataFrame and 'dates' is a list of the unique dates in 'hol'):
    newd = pd.DataFrame()
    for i in range(1, len(dates)+1):
        newd.append(hol[hol['id'][dates[i-1]].isin(
            list(set(hol['id'][dates[i-1]]).intersection(
                set(hol['id'][dates[i]]))))])

Please help!
One thing you can do is exploit the DataFrame.shift() method to find the differences. If you combine it with groupby, grouping on the IDs, you end up with the results the way you want them. The trick, though, is that you need a DataFrame that has a date/id pair for every unique date and every unique ID in order for this to work.
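To make the idea concrete, here is a minimal sketch on made-up toy data (the frame `toy` and its values are hypothetical, not the asker's data). It builds the full date/id grid with `pd.MultiIndex.from_product` and `reindex`, which is one possible way to get a balanced panel, and then shifts within each id:

```python
import pandas as pd

# Hypothetical toy data: id 2 has no price on the middle date.
toy = pd.DataFrame({
    "date": ["2012-06-08", "2012-06-08", "2012-06-09", "2012-06-10", "2012-06-10"],
    "id": [1, 2, 1, 1, 2],
    "price": [10.0, 20.0, 11.0, 12.0, 25.0],
}).set_index(["date", "id"])

# Every (date, id) combination; pairs absent from `toy` become NaN rows.
full_index = pd.MultiIndex.from_product(
    [toy.index.get_level_values("date").unique(),
     toy.index.get_level_values("id").unique()],
    names=["date", "id"],
)
balanced = toy.reindex(full_index).sort_index()

# Within each id, subtract the price on the previous date in the panel.
diff = balanced["price"] - balanced.groupby(level="id")["price"].shift()

# A difference survives only if the id has a price on both adjacent dates.
diff = diff.dropna()
print(diff)
```

Because id 2 is missing on the middle date, neither of its differences survives the `dropna()`, while id 1 keeps both. Note that "consecutive" here means adjacent dates in the panel, not adjacent calendar days.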
The process is as follows:

1. Create a df with the data you have.
2. Create a 'balanced panel' of the data: a DataFrame that contains every date/id combination possible from the df you have. It has price values where appropriate and NA values where not.
3. Group the new DataFrame on id, and use the shift() method to get the differences in stock prices via the apply method, then drop the NA rows, which is akin to keeping only the observations that have consecutive days.

So I extended your data to the following:
    import datetime

    import pandas as pd
    from numpy import nan as na

    # Three dates, three observations per date
    d = [datetime.datetime(2012, 6, 8).date(), datetime.datetime(2012, 6, 8).date(),
         datetime.datetime(2012, 6, 8).date(), datetime.datetime(2012, 6, 9).date(),
         datetime.datetime(2012, 6, 9).date(), datetime.datetime(2012, 6, 9).date(),
         datetime.datetime(2012, 6, 10).date(), datetime.datetime(2012, 6, 10).date(),
         datetime.datetime(2012, 6, 10).date()]
    id = [1234, 2345, 3456, 1234, 3456, 4567, 1234, 2345, 4567]
    price = [6.09, 5.08, 1.23, 6.10, 1.25, 9.9, 6.0, 5.10, 10.0]

    df = pd.DataFrame({'date': d, 'id': id, 'price': price})
Then proceed as follows:
    # Now create a balanced panel of data based on df. The dates and ids are
    # paired explicitly so that every combination appears exactly once.
    df2 = pd.DataFrame([(date, id) for date in df.date.unique() for id in df.id.unique()],
                       columns=['date', 'id'])

    # Set the index of both DataFrames
    df = df.set_index(['date', 'id'])
    df2 = df2.set_index(['date', 'id'])

    # Create the price column in df2, with NA for observations missing from df
    df2['price'] = pd.Series([df.loc[row, 'price'] if row in df.index else na
                              for row in df2.index], index=df2.index)

    # Sort df2 by its index
    df2 = df2.sort_index()

    # Group the data by id and find the price differences by shifting one place
    df2.groupby(level=1, as_index=False).apply(lambda x: x.price - x.price.shift()).dropna()
This gives me the following output:
       date        id
    0  2012-06-09  1234    0.01
       2012-06-10  1234   -0.10
    2  2012-06-09  3456    0.02
    3  2012-06-10  4567    0.10
Which seems to be what you want?
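Since the question also asks for the difference of the sum of prices over the intersection, here is a rough sketch of one other way to get there. It is not the approach from the answer: it swaps the balanced panel for a wide layout built with `pivot` and uses `diff` on the rows. It assumes each date/id pair occurs at most once, and the names `wide`, `changes`, `long_changes` and `portfolio_change` are only for illustration.

```python
import pandas as pd

# The same sample values as in the answer, in long format.
df = pd.DataFrame({
    "date": pd.to_datetime(["2012-06-08"] * 3 + ["2012-06-09"] * 3 + ["2012-06-10"] * 3),
    "id": [1234, 2345, 3456, 1234, 3456, 4567, 1234, 2345, 4567],
    "price": [6.09, 5.08, 1.23, 6.10, 1.25, 9.9, 6.0, 5.10, 10.0],
})

# Wide layout: one row per date, one column per id, NaN where an id is absent.
wide = df.pivot(index="date", columns="id", values="price").sort_index()

# Row-wise difference; NaN unless the id has a price on both adjacent dates.
changes = wide.diff()

# Back to long format, keeping only the ids present on both dates.
long_changes = changes.stack().dropna()

# Per-date total over the intersection; min_count=1 leaves the first date as NaN.
portfolio_change = changes.sum(axis=1, min_count=1)

print(long_changes)
print(portfolio_change)
```

Summing the per-id differences over the ids held on both dates is the same as taking the difference of the two sums restricted to that intersection, so `portfolio_change` should be the quantity the question describes, up to floating-point rounding.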
python pandas filter panel intersection