Breedlove: python - Check existence of column names -

Saturday, 15 June 2013

python - Check existence of column names -

I have a data frame df that contains many field names for a series of years.

                  field
year description
1993 bar0         a01arb92
     bar1         a01svb92
     bar2         a01fam92
     bar3         a08
     bar4         a01bea93

Then, for every year, I have a Stata file that has an id column and, as additional columns, some (or all) of the field names mentioned in df. For example, 1993.dta:

id  a01arb92  a01svb92  a08  a01bea93
 0         1         1    1         1
 0         1         1    1         2

I need to check, for every year, whether the fields listed in df exist (as columns) in the corresponding file, and I would like to save the result in the original data frame. Is there a nice way to do this without iterating over every single field?

Expected output:

                  field     exists
year description
1993 bar0         a01arb92       1
     bar1         a01svb92       1
     bar2         a01fam92       0
     bar3         a08            1
     bar4         a01bea93       1

This is the result if, for example, every field except a01fam92 exists as a column in the 1993 file.

Here is a way that uses the fact that pandas automatically fills in NaN for missing indices.
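To see that alignment behaviour in isolation, here is a minimal sketch with made-up data. The names left, right and filled are only for illustration and are not part of the original question.

```python
import pandas as pd

# An empty frame whose index lists every field we care about.
left = pd.DataFrame(index=pd.Index(["a", "b", "c"], name="field"))

# A series that only knows about some of those fields.
right = pd.Series(1, index=pd.Index(["a", "c"], name="field"))

# Assignment aligns on the index: "b" has no match in right, so it becomes NaN.
left["exists"] = right

# Replace the NaN with 0 and convert back to integers.
filled = left["exists"].fillna(0).astype(int)
print(filled)
```

The labels that are missing from right end up as NaN after the assignment, and fillna(0) turns them into the 0 flag.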

First, prepare the data. You may have already done this step.

df1 = pd.read_csv(r'c:\temp\test1.txt', sep=' ')

df1
Out[30]:
   year description     field
0  1993        bar0  a01arb92
1  1993        bar1  a01svb92
2  1993        bar2  a01fam92
3  1993        bar3       a08
4  1993        bar4  a01bea93

df1 = df1.set_index(['year', 'description', 'field'])

df2 = pd.read_csv(r'c:\temp\test2.txt', sep=' ')

df2
Out[33]:
   year description     field
0  1993        bar0  a01arb92
1  1993        bar1  a01svb92
2  1993        bar3       a08
3  1993        bar4  a01bea93

df2 = df2.set_index(['year', 'description', 'field'])

Next, create a new column in df2 and let pandas copy it onto the previous dataframe. Because the assignment aligns on the index, rows that are missing from df2 get NaN. Then use fillna to assign those rows a value of 0.

df2['exists'] = 1

df1['exists'] = df2['exists']

df1
Out[37]:
                           exists
year description field
1993 bar0        a01arb92       1
     bar1        a01svb92       1
     bar2        a01fam92     NaN
     bar3        a08            1
     bar4        a01bea93       1

df1.fillna(0)
Out[38]:
                           exists
year description field
1993 bar0        a01arb92       1
     bar1        a01svb92       1
     bar2        a01fam92       0
     bar3        a08            1
     bar4        a01bea93       1
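The steps above assume the fields found in each file are already listed in a table like df2. If you only have the column names of each year's file, one possible alternative is to compare (year, field) pairs with isin. This is a rough sketch, not the method from the answer above: columns_by_year is a hypothetical dict, filled in by hand here, which in practice you would presumably build from the columns of each file you load.

```python
import pandas as pd

# Field list from the question, indexed by (year, description).
df = pd.DataFrame(
    {"field": ["a01arb92", "a01svb92", "a01fam92", "a08", "a01bea93"]},
    index=pd.MultiIndex.from_product(
        [[1993], ["bar0", "bar1", "bar2", "bar3", "bar4"]],
        names=["year", "description"],
    ),
)

# Hypothetical stand-in for the column names of each year's file,
# e.g. list(pd.read_stata("1993.dta").columns) for the 1993 entry.
columns_by_year = {
    1993: ["id", "a01arb92", "a01svb92", "a08", "a01bea93"],
}

# Every (year, column) pair that actually exists in some file.
available = [
    (year, column)
    for year, columns in columns_by_year.items()
    for column in columns
]

# Pair each row's year with its field and test all rows at once.
years = df.index.get_level_values("year")
wanted = pd.MultiIndex.from_arrays([years, df["field"]])
df["exists"] = wanted.isin(available).astype(int)
print(df)
```

The only explicit loop here runs over the years and their columns; the check for the individual fields is a single vectorised isin call.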

python pandas
