Python - Expand each row into multiple rows in pandas using DataFrame.apply (similar to MapReduce)
Here's a simplified version of the problem. I have a DataFrame that holds the start and end locations of trips. I want to end up with a DataFrame that has, for each station, the number of arrivals and departures.
I'm familiar with MapReduce-like workflows, where in the map phase I can take in one row and output multiple rows, and then aggregate over those rows in the reduce phase.
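To make the map/reduce framing concrete, here is a plain-Python sketch (no pandas) of the workflow described above, using the same two trips as the question. The function names `map_trip` and `reduce_station` are hypothetical, and encoding each emitted value as a `(departures, arrivals)` tuple is just one possible choice.

```python
from collections import defaultdict
from itertools import chain

# Same data as the question, as (start_station, end_station) pairs.
trips = [("a", "b"), ("c", "a")]

def map_trip(trip):
    """Map phase (hypothetical helper): one trip emits two (key, value) pairs."""
    start, end = trip
    yield start, (1, 0)  # one departure at the start station
    yield end, (0, 1)    # one arrival at the end station

def reduce_station(values):
    """Reduce phase (hypothetical helper): sum the (departures, arrivals) pairs for one key."""
    departures = sum(d for d, _ in values)
    arrivals = sum(a for _, a in values)
    return departures, arrivals

# Shuffle: collect every emitted value under its station key.
grouped = defaultdict(list)
for station, value in chain.from_iterable(map_trip(t) for t in trips):
    grouped[station].append(value)

counts = {station: reduce_station(vals) for station, vals in sorted(grouped.items())}
print(counts)  # {'a': (1, 1), 'b': (0, 1), 'c': (1, 0)}
```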
Here's the code I have now, but it does not work:
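```python
import pandas as pd
import numpy as np

def expand_row(row):
    # Try to emit two rows per trip: one for the start station, one for the end station
    return pd.Series(
        {
            'station': [row['start_station'], row['end_station']],
            'departures': [1, 0],
            'arrivals': [0, 1],
        },
    )

trips = pd.DataFrame({
    'start_station': ['a', 'c'],
    'end_station': ['b', 'a'],
})

# Each cell of `expanded` ends up holding a whole list rather than one value per row
expanded = trips.apply(expand_row, axis=1)
# So this groups on lists, which cannot be used as group keys
aggregated = expanded.groupby('station').aggregate(np.sum)
```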
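One way to keep the `apply`-based map phase is to let `expand_row` return lists as before and then split them into separate rows with `DataFrame.explode` before grouping. This is a sketch, assuming pandas 1.3 or later (needed to explode several columns at once); it is one possible fix, not the only one.

```python
import pandas as pd

def expand_row(row):
    # Map phase: each field holds a two-element list, one entry per emitted row.
    return pd.Series({
        'station': [row['start_station'], row['end_station']],
        'departures': [1, 0],
        'arrivals': [0, 1],
    })

trips = pd.DataFrame({
    'start_station': ['a', 'c'],
    'end_station': ['b', 'a'],
})

# apply(axis=1) yields one row per trip whose cells are lists...
expanded = trips.apply(expand_row, axis=1)
# ...and explode() turns each list position into its own row (lists in a row must match in length).
expanded = expanded.explode(['station', 'departures', 'arrivals'])

# Reduce phase: exploded columns are object dtype, so cast the counts to int before summing.
aggregated = (
    expanded.astype({'departures': int, 'arrivals': int})
    .groupby('station')[['departures', 'arrivals']]
    .sum()
)
print(aggregated)
```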
What I want to end up with is this final DataFrame:
```python
desired_df = pd.DataFrame({
    'station': ['a', 'b', 'c'],
    'departures': [1, 0, 1],
    'arrivals': [1, 1, 0],
})
# Use the station names as the row index
desired_df.index = desired_df.pop('station')
```
Many thanks.
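The same map/reduce shape can also be written without a row-wise `apply`: `DataFrame.melt` stacks the two station columns into one long column (the "map" step, one row per trip endpoint), and `pd.crosstab` counts how often each station appears in each role (the "reduce" step). A sketch, where `long` and `role` are illustrative names chosen here:

```python
import pandas as pd

trips = pd.DataFrame({
    'start_station': ['a', 'c'],
    'end_station': ['b', 'a'],
})

# "Map": stack both columns into long form, one row per (role, station) pair.
long = trips.melt(var_name='role', value_name='station')

# "Reduce": count each station per role; absent combinations become 0.
counts = pd.crosstab(long['station'], long['role'])

# Start stations are departures and end stations are arrivals.
counts = counts.rename(columns={'start_station': 'departures', 'end_station': 'arrivals'})
counts = counts[['departures', 'arrivals']]
counts.columns.name = None  # drop the leftover 'role' label on the columns
print(counts)
```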
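Answer:
```python
import pandas as pd

trips = pd.DataFrame({
    'start_station': ['a', 'c'],
    'end_station': ['b', 'a'],
})

# Count how often each station appears in each column; stations missing from a column become NaN, so fill them with 0
trips.apply(pd.Series.value_counts).fillna(0)
```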
The result is:
```
   end_station  start_station
a            1              1
b            1              0
c            0              1
```
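The answer's table keeps the counts under the original trip column names. A minimal sketch of reshaping it into the layout of `desired_df`, assuming start stations count as departures and end stations as arrivals:

```python
import pandas as pd

trips = pd.DataFrame({
    'start_station': ['a', 'c'],
    'end_station': ['b', 'a'],
})

# Count each column separately; pandas aligns the per-column counts on station name.
counts = trips.apply(pd.Series.value_counts).fillna(0).astype(int)

# Rename to the desired labels, fix the column order, and name the index.
result = (
    counts.rename(columns={'start_station': 'departures', 'end_station': 'arrivals'})
    [['departures', 'arrivals']]
    .rename_axis('station')
)
print(result)
```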