Friday 15 March 2013

python - Expand each row to multiple rows in pandas using DataFrame.apply (similar to MapReduce)




Here's a simplified version of my problem. I have a dataframe that has the start and end locations of trips. I want to end up with a dataframe that has, for each station, the number of arrivals and departures.

I am familiar with MapReduce-like workflows, where in the map phase you can take in one row and output multiple rows, and then aggregate over those rows in the reduce phase.
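As a rough illustration of that map-then-reduce idea, a plain-Python sketch (no pandas) might look like the following. The names `map_trip` and `reduce_records` and the sample trips are hypothetical, chosen only to mirror the station example.

```python
from collections import defaultdict

# Hypothetical sample data: (start_station, end_station) pairs.
trips = [('a', 'b'), ('c', 'a')]


def map_trip(trip):
    """Map phase: one trip in, two (station, departures, arrivals) records out."""
    start, end = trip
    yield start, 1, 0  # a departure from the start station
    yield end, 0, 1    # an arrival at the end station


def reduce_records(records):
    """Reduce phase: sum departures and arrivals per station."""
    totals = defaultdict(lambda: [0, 0])
    for station, departures, arrivals in records:
        totals[station][0] += departures
        totals[station][1] += arrivals
    return {station: tuple(t) for station, t in totals.items()}


# Flatten the per-trip records into one stream, then aggregate.
records = (rec for trip in trips for rec in map_trip(trip))
counts = reduce_records(records)
print(counts)
```

Each value in `counts` is a `(departures, arrivals)` pair for one station.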

Here's the code I have now, which does not work.

    import pandas as pd
    import numpy as np

    # Intended "map" step: emit two records (departure, arrival) per trip.
    def expand_row(row):
        return pd.Series(
            {
                'station': [row['start_station'], row['end_station']],
                'departures': [1, 0],
                'arrivals': [0, 1],
            },
        )

    trips = pd.DataFrame({
        'start_station': ['a', 'c'],
        'end_station': ['b', 'a'],
    })

    # Intended "reduce" step: sum the records per station.
    expanded = trips.apply(expand_row, axis=1)
    aggregated = expanded.groupby('station').aggregate(np.sum)

What I want the final dataframe to be is:

    desired_df = pd.DataFrame({
        'station': ['a', 'b', 'c'],
        'departures': [1, 0, 1],
        'arrivals': [1, 1, 0],
    })
    desired_df.index = desired_df.pop('station')

Many thanks.

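    import pandas as pd

    trips = pd.DataFrame({
        'start_station': ['a', 'c'],
        'end_station': ['b', 'a'],
    })

    # Count how often each station appears in each column; stations
    # missing from a column get 0 instead of NaN.
    # Note: pd.value_counts is deprecated in recent pandas releases;
    # pd.Series.value_counts is the equivalent there.
    trips.apply(pd.value_counts).fillna(0)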

The result is:

       end_station  start_station
    a            1              1
    b            1              0
    c            0              1
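For comparison, the expand-then-aggregate workflow described in the question can also be written fairly directly in pandas. The following is a minimal sketch, not the approach used in the answer above: it builds the "expanded" rows with `pd.concat` instead of `DataFrame.apply`, and the intermediate names `departures`, `arrivals` and `expanded` are chosen here only for illustration.

```python
import pandas as pd

trips = pd.DataFrame({
    'start_station': ['a', 'c'],
    'end_station': ['b', 'a'],
})

# "Map" step: every trip contributes one departure row and one arrival row.
departures = pd.DataFrame({
    'station': trips['start_station'],
    'departures': 1,
    'arrivals': 0,
})
arrivals = pd.DataFrame({
    'station': trips['end_station'],
    'departures': 0,
    'arrivals': 1,
})
expanded = pd.concat([departures, arrivals], ignore_index=True)

# "Reduce" step: sum the per-row counts for each station.
aggregated = expanded.groupby('station').sum()
print(aggregated)
```

Here `aggregated` is indexed by station with `departures` and `arrivals` columns, which is the layout of `desired_df` in the question.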

python pandas
