Thursday 15 April 2010

python - pandas reindex with data frame




I have a DataFrame with a three-level MultiIndex, for instance:

                   col1  col2  ...
chrom  pos  label
chr1   43   stra   ...   ...   ...
            strb   ...   ...   ...
       66   strc   ...   ...   ...
            strb   ...   ...   ...
chr2   29   strd   ...   ...   ...
...    ...  ...    ...   ...   ...

and a Series whose MultiIndex consists of the first two levels of the DataFrame index:

            val
chrom  pos
chr1   43   v1
       66   v2
chr2   29   v3
...    ...  ...

I would like to add the Series to the DataFrame as a new column, repeating the values v1, v2, ... for every row whose first two index levels match, like this:

                   col1  col2  new  ...
chrom  pos  label
chr1   43   stra   ...   ...   v1   ...
            strb   ...   ...   v1   ...
       66   strc   ...   ...   v2   ...
            strb   ...   ...   v2   ...
chr2   29   strd   ...   ...   v3   ...
...    ...  ...    ...   ...   ...  ...

Note that the Series has no missing rows; that is, every (chrom, pos) pair in the DataFrame is also in the Series. I have a working solution:

pandas.Series(variant_db.index.map(lambda i: cov_per_sample[sample].loc[i[:2]]), index=variant_db.index)

But because of the lambda, it is quite slow on big data (hundreds of thousands of rows). I tried something much faster:

df['new'] = s.reindex(df.index, method='ffill')

But this way there are many NaNs in df['new'], which should not happen. Using method='bfill' puts the NaNs in different positions, but some rows have NaNs in both cases, so combining the two does not work either.

I would like a way to do this using library functions only, for efficiency. Can anyone help?

You can try this simple solution, which should perform better on big data:

import pandas

df1 = pandas.DataFrame([
    {'chrom': 'chr1', 'pos': 43, 'label': 'stra'},
    {'chrom': 'chr1', 'pos': 43, 'label': 'strb'},
    {'chrom': 'chr1', 'pos': 66, 'label': 'strc'},
    {'chrom': 'chr1', 'pos': 66, 'label': 'strb'},
    {'chrom': 'chr2', 'pos': 29, 'label': 'strd'}])
df2 = pandas.DataFrame([
    {'chrom': 'chr1', 'pos': 43, 'val': 'v1'},
    {'chrom': 'chr1', 'pos': 66, 'val': 'v2'},
    {'chrom': 'chr2', 'pos': 29, 'val': 'v3'}])

# For each (chrom, pos) row of df2, write its value into the matching rows of df1.
# .loc is used here because the .ix indexer has been removed from pandas.
for i, r in df2.iterrows():
    df1.loc[(df1['chrom'] == r['chrom']) & (df1['pos'] == r['pos']), 'new'] = r['val']
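The same lookup can also be written as a left merge on the two key columns, which avoids the Python-level loop. This is only a sketch, not part of the original answer: it assumes each (chrom, pos) pair appears once in df2, and the name merged and the rename of val to new are illustrative choices.

```python
import pandas as pd

df1 = pd.DataFrame([
    {'chrom': 'chr1', 'pos': 43, 'label': 'stra'},
    {'chrom': 'chr1', 'pos': 43, 'label': 'strb'},
    {'chrom': 'chr1', 'pos': 66, 'label': 'strc'},
    {'chrom': 'chr1', 'pos': 66, 'label': 'strb'},
    {'chrom': 'chr2', 'pos': 29, 'label': 'strd'}])
df2 = pd.DataFrame([
    {'chrom': 'chr1', 'pos': 43, 'val': 'v1'},
    {'chrom': 'chr1', 'pos': 66, 'val': 'v2'},
    {'chrom': 'chr2', 'pos': 29, 'val': 'v3'}])

# A left merge keeps every row of df1, in its original order, and attaches
# the value from the df2 row with the same (chrom, pos).
merged = df1.merge(
    df2.rename(columns={'val': 'new'}),  # hypothetical output column name
    on=['chrom', 'pos'],
    how='left',
)
print(merged)
```

If a (chrom, pos) pair were missing from df2, the corresponding rows of merged would hold NaN in the new column rather than raising an error.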

Or, using indexes:

import pandas

df1 = pandas.DataFrame([
    {'chrom': 'chr1', 'pos': 43, 'label': 'stra', 'col': ''},
    {'chrom': 'chr1', 'pos': 43, 'label': 'strb', 'col': ''},
    {'chrom': 'chr1', 'pos': 66, 'label': 'strc', 'col': ''},
    {'chrom': 'chr1', 'pos': 66, 'label': 'strb', 'col': ''},
    {'chrom': 'chr2', 'pos': 29, 'label': 'strd', 'col': ''}]).set_index(['chrom', 'pos', 'label'])
df2 = pandas.DataFrame([
    {'chrom': 'chr1', 'pos': 43, 'val': 'v1'},
    {'chrom': 'chr1', 'pos': 66, 'val': 'v2'},
    {'chrom': 'chr2', 'pos': 29, 'val': 'v3'}]).set_index(['chrom', 'pos'])

# i is the (chrom, pos) index tuple of df2; it selects every df1 row
# whose first two index levels match.
# .loc is used here because the .ix indexer has been removed from pandas.
for i, r in df2.iterrows():
    df1.loc[(i[0], i[1]), 'new'] = r['val']
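If the loop over df2 itself becomes the bottleneck, a loop-free variant may be worth trying. The sketch below is not from the original answer. It assumes the Series index is unique and that its level names match the first two DataFrame levels (chrom, pos). It drops the label level from the DataFrame index, reindexes the Series against the resulting two-level index, and assigns the values by position.

```python
import pandas as pd

# Small stand-ins for the DataFrame and Series described in the question.
df = pd.DataFrame(
    {'col1': [1, 2, 3, 4, 5]},
    index=pd.MultiIndex.from_tuples(
        [('chr1', 43, 'stra'), ('chr1', 43, 'strb'),
         ('chr1', 66, 'strc'), ('chr1', 66, 'strb'),
         ('chr2', 29, 'strd')],
        names=['chrom', 'pos', 'label'],
    ),
)
s = pd.Series(
    ['v1', 'v2', 'v3'],
    index=pd.MultiIndex.from_tuples(
        [('chr1', 43), ('chr1', 66), ('chr2', 29)],
        names=['chrom', 'pos'],
    ),
    name='val',
)

# Two-level (chrom, pos) view of the DataFrame index; duplicates are expected.
two_level = df.index.droplevel('label')

# Reindex by exact (chrom, pos) match, then assign positionally so the
# differing index shapes do not trigger label alignment.
df['new'] = s.reindex(two_level).to_numpy()
print(df)
```

No fill method is needed here, because each (chrom, pos) pair is looked up exactly rather than matched against neighbouring keys.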

python pandas
