python - Check existence of column names
I have a data frame df that contains many field names for a series of years.

                         field
    year description
    1993 bar0         a01arb92
         bar1         a01svb92
         bar2         a01fam92
         bar3              a08
         bar4         a01bea93
Then, for every year, I have a Stata file that has an id column and additional columns that are some (or all) of the field names mentioned in df. For example, 1993.dta:

    id  a01arb92  a01svb92  a08  a01bea93
     0         1         1    1         1
     0         1         1    1         2
I need to check, for every year, whether the fields listed in df exist (as columns) in the corresponding file, and save the result in the original data frame. Is there a nice way to do this without iterating over every single field?
Expected output:

                         field  exists
    year description
    1993 bar0         a01arb92       1
         bar1         a01svb92       1
         bar2         a01fam92       0
         bar3              a08       1
         bar4         a01bea93       1
This is the result for the example above, where every field except a01fam92 exists as a column in the 1993 file.
Here is a way that relies on the fact that pandas aligns on the index when you assign a column and automatically fills NaN for missing index entries.
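As a small illustration of that alignment behaviour, here is a minimal sketch. The names wanted and found and the labels in them are made up for the example; they are not part of the question's data.

```python
import pandas as pd

# Hypothetical data: 'wanted' lists every label, 'found' only some of them.
wanted = pd.DataFrame(
    {"description": ["bar0", "bar1", "bar2"]},
    index=pd.Index(["a", "b", "c"], name="field"),
)
found = pd.Series(1, index=pd.Index(["a", "c"], name="field"))

# Column assignment aligns 'found' to the index of 'wanted';
# labels that are absent from 'found' ("b" here) end up as NaN.
wanted["exists"] = found
print(wanted)
```

The label missing from found comes back as NaN, which is the gap the fillna step later turns into 0.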
First, prepare the data. You may have already done this step.
    import pandas as pd

    df1 = pd.read_csv(r'c:\temp\test1.txt', sep=' ')
    df1
    Out[30]:
       year description     field
    0  1993        bar0  a01arb92
    1  1993        bar1  a01svb92
    2  1993        bar2  a01fam92
    3  1993        bar3       a08
    4  1993        bar4  a01bea93

    df1 = df1.set_index(['year', 'description', 'field'])

    df2 = pd.read_csv(r'c:\temp\test2.txt', sep=' ')
    df2
    Out[33]:
       year description     field
    0  1993        bar0  a01arb92
    1  1993        bar1  a01svb92
    2  1993        bar3       a08
    3  1993        bar4  a01bea93

    df2 = df2.set_index(['year', 'description', 'field'])
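Next, create a new column in df2 and have pandas copy it over to the previous data frame, df1. Because the assignment aligns on the index, rows of df1 that are missing from df2 get NaN. Then use fillna to assign those rows a value of 0. Note that fillna returns a new data frame, so assign the result back (df1 = df1.fillna(0)) if you want to keep it.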
    df2['exists'] = 1
    df1['exists'] = df2['exists']
    df1
    Out[37]:
                               exists
    year description field
    1993 bar0        a01arb92       1
         bar1        a01svb92       1
         bar2        a01fam92     NaN
         bar3        a08            1
         bar4        a01bea93       1

    df1.fillna(0)
    Out[38]:
                               exists
    year description field
    1993 bar0        a01arb92       1
         bar1        a01svb92       1
         bar2        a01fam92       0
         bar3        a08            1
         bar4        a01bea93       1
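The same idea can be sketched end to end without the intermediate text files. This is only an illustration under stated assumptions: the dict columns_by_year is a hypothetical stand-in for the column names of the yearly Stata files (in practice they might be collected with something like pd.read_stata('1993.dta').columns), and the names info, present and keys are made up for the example.

```python
import pandas as pd

# Field table from the question, indexed by (year, description, field).
info = pd.DataFrame({
    "year": [1993] * 5,
    "description": ["bar0", "bar1", "bar2", "bar3", "bar4"],
    "field": ["a01arb92", "a01svb92", "a01fam92", "a08", "a01bea93"],
}).set_index(["year", "description", "field"])

# Assumption: column names per yearly file, gathered beforehand.
columns_by_year = {1993: ["id", "a01arb92", "a01svb92", "a08", "a01bea93"]}

# One entry, with value 1, per (year, column) pair that actually exists.
present = pd.Series(
    1,
    index=pd.MultiIndex.from_tuples(
        [(year, col) for year, cols in columns_by_year.items() for col in cols],
        names=["year", "field"],
    ),
)

# Look up every (year, field) pair of the field table in 'present'.
# Pairs that are not found come back as NaN, which fillna turns into 0.
keys = info.index.droplevel("description")
info["exists"] = present.reindex(keys).fillna(0).astype(int).to_numpy()
print(info)
```

The loop here runs over files and their columns only to build the lookup; the check against the field table itself is a single reindex rather than a test per field.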
python pandas