python - Fastest Count of Row Dependent Date Ranges
I have a data set that looks like this (end_time is always 7 hours after start_time):

       value  start_time           end_time
    1  a      2014-10-14 05:00:00  2014-10-14 12:00:00
    2  a      2014-10-14 08:00:00  2014-10-14 15:00:00
    3  a      2014-10-14 14:00:00  2014-10-14 21:00:00
    4  a      2014-10-14 06:00:00  2014-10-14 13:00:00
    5  b      2014-10-14 05:00:00  2014-10-14 12:00:00
    6  b      2014-10-14 06:00:00  2014-10-14 13:00:00

I want to add a new column that counts the number of other rows with the same value whose start_time falls within the start_time and end_time of the current row. The result would look like this:
       value  start_time           end_time             count
    1  a      2014-10-14 05:00:00  2014-10-14 12:00:00  2
    2  a      2014-10-14 08:00:00  2014-10-14 15:00:00  1
    3  a      2014-10-14 14:00:00  2014-10-14 21:00:00  0
    4  a      2014-10-14 06:00:00  2014-10-14 13:00:00  2
    5  b      2014-10-14 05:00:00  2014-10-14 12:00:00  1
    6  b      2014-10-14 06:00:00  2014-10-14 13:00:00  0

Currently I have:
    for i in range(len(df)):
        df.loc[i, 'count'] = df[(df['start_time'] >= df['start_time'][i]) &
                                (df['start_time'] <= df['end_time'][i]) &
                                (df['value'] == df['value'][i])].shape[0]

I have a large number of rows, and this turns out to be slow. It also includes the row itself in the count, so 1 needs to be subtracted from every row.
Is there a faster way to do this calculation?
Thanks!
One way to accomplish this is to rely on start_time being in increasing order. You can move the cost to insertion time by keeping the rows ordered as they are added. With a sorted list of rows, testing whether the following rows fall within [start_time, end_time] is easy: as soon as one element is out of bounds, you know the elements after it will be too.
If you can't maintain a sorted list at insertion, I think there is no more efficient way than to sort the list first.
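The sorting idea above can be sketched in pandas as follows. This is only one possible reading of the answer, not the answerer's own code: it sorts the start times once per value and then uses a binary search (np.searchsorted) in place of the linear scan with early exit. The function name count_starts_in_window is made up for illustration, the value "a" for the first four sample rows is an assumption, and the sketch assumes end_time is never earlier than start_time.

```python
import numpy as np
import pandas as pd


def count_starts_in_window(df):
    """Per row, count the *other* rows with the same value whose start_time
    falls inside [start_time, end_time] of that row (both ends inclusive)."""
    start = df["start_time"].to_numpy()
    end = df["end_time"].to_numpy()
    counts = np.zeros(len(df), dtype=np.int64)
    # groupby(...).indices maps each value to the positions of its rows.
    for pos in df.groupby("value").indices.values():
        sorted_starts = np.sort(start[pos])
        # Number of starts strictly before each row's own start_time ...
        lo = np.searchsorted(sorted_starts, start[pos], side="left")
        # ... and number of starts at or before each row's end_time.
        hi = np.searchsorted(sorted_starts, end[pos], side="right")
        # The difference is the window size; subtract 1 for the row itself.
        counts[pos] = hi - lo - 1
    return counts


# Sample rows from the question ("a" for the first four rows is assumed).
df = pd.DataFrame({
    "value": ["a", "a", "a", "a", "b", "b"],
    "start_time": pd.to_datetime([
        "2014-10-14 05:00:00", "2014-10-14 08:00:00", "2014-10-14 14:00:00",
        "2014-10-14 06:00:00", "2014-10-14 05:00:00", "2014-10-14 06:00:00",
    ]),
})
df["end_time"] = df["start_time"] + pd.Timedelta(hours=7)
df["count"] = count_starts_in_window(df)
```

Each group costs one sort plus two binary searches per row, instead of comparing every row against every other row. On the sample rows this gives 2, 1, 0, 1, 1, 0. The fourth row comes out as 1 because only the 08:00:00 start falls inside its 06:00:00 to 13:00:00 window.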
python pandas