Friday 15 August 2014

python - Fastest Count of Row Dependent Date Ranges



I have a data set that looks like this (end_time is 7 hours after start_time):

   value           start_time             end_time
1      a  2014-10-14 05:00:00  2014-10-14 12:00:00
2      a  2014-10-14 08:00:00  2014-10-14 15:00:00
3      a  2014-10-14 14:00:00  2014-10-14 21:00:00
4      a  2014-10-14 06:00:00  2014-10-14 13:00:00
5      b  2014-10-14 05:00:00  2014-10-14 12:00:00
6      b  2014-10-14 06:00:00  2014-10-14 13:00:00

I want to add a new column that counts the number of other rows with the same value whose start_time falls within the start_time and end_time of that row. The result would look like this:

   value           start_time             end_time  count
1      a  2014-10-14 05:00:00  2014-10-14 12:00:00      2
2      a  2014-10-14 08:00:00  2014-10-14 15:00:00      1
3      a  2014-10-14 14:00:00  2014-10-14 21:00:00      0
4      a  2014-10-14 06:00:00  2014-10-14 13:00:00      2
5      b  2014-10-14 05:00:00  2014-10-14 12:00:00      1
6      b  2014-10-14 06:00:00  2014-10-14 13:00:00      0

I currently have:

for i in range(len(df)):
    # rows with the same value whose start_time lies in row i's window (includes row i itself)
    mask = ((df['start_time'] >= df['start_time'][i])
            & (df['start_time'] <= df['end_time'][i])
            & (df['value'] == df['value'][i]))
    df.loc[i, 'count'] = df[mask].shape[0]

I have a large number of rows, and this turns out to be slow. It also includes each row in its own count, so 1 needs to be subtracted from every result.

Is there a faster way to do this calculation?

Thanks!

One way to do this faster is available if start_time is increasing. You can shift the cost to insertion time by keeping the rows ordered. With a sorted list of rows, testing whether the following ones fall within [start_time, end_time] is easy: once an element is out of bounds, you know the later elements will be too.
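The ordered-insertion idea could be sketched in plain Python roughly as follows. This is only an illustration under stated assumptions, not the answerer's code: the helper names insert_sorted and count_in_window are hypothetical, the 7-hour WINDOW is taken from the question, and the list holds the start times of a single value group (one such list per value would be needed).

```python
import bisect
from datetime import datetime, timedelta

# Assumption from the question: end_time is always start_time + 7 hours.
WINDOW = timedelta(hours=7)


def insert_sorted(starts, start_time):
    """Insert start_time into the list while keeping it in ascending order."""
    bisect.insort(starts, start_time)


def count_in_window(starts, i):
    """Count the other entries whose start lies in [starts[i], starts[i] + WINDOW]."""
    low = starts[i]
    high = low + WINDOW
    # Begin at the first entry equal to starts[i] so exact ties are not missed.
    first = bisect.bisect_left(starts, low)
    count = 0
    for j in range(first, len(starts)):
        if starts[j] > high:
            break  # the list is sorted, so every later entry is out of bounds too
        if j != i:
            count += 1  # skip the row itself
    return count


# Start times of one value group, inserted in arrival order.
starts = []
for hour in (5, 8, 14, 6):
    insert_sorted(starts, datetime(2014, 10, 14, hour))

counts = [count_in_window(starts, i) for i in range(len(starts))]
```

The counts come back in sorted order of start time, so they would still have to be mapped back to the original rows.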

If you can't maintain a sorted list at insertion, I think there is no more efficient way than to sort the list.
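For rows that arrive unsorted, one possible way to apply this is to sort each value group once and then use binary search in place of the row-by-row filter. The sketch below assumes pandas and NumPy and a unique index; count_starts_in_window is a hypothetical helper, np.searchsorted is a substitute for the manual forward scan, and the group labels "a" and "b" in the sample frame are illustrative.

```python
import numpy as np
import pandas as pd


def count_starts_in_window(df):
    """For each row, count the other rows with the same value whose
    start_time lies in [start_time, end_time] of that row."""
    counts = pd.Series(0, index=df.index, dtype="int64")
    for _, group in df.groupby("value", sort=False):
        sorted_starts = np.sort(group["start_time"].to_numpy())
        # Position of the first start >= this row's start_time.
        lo = np.searchsorted(sorted_starts, group["start_time"].to_numpy(), side="left")
        # Position just past the last start <= this row's end_time.
        hi = np.searchsorted(sorted_starts, group["end_time"].to_numpy(), side="right")
        # hi - lo includes the row itself, so subtract 1.
        counts.loc[group.index] = hi - lo - 1
    return counts


df = pd.DataFrame({
    "value": ["a", "a", "a", "a", "b", "b"],
    "start_time": pd.to_datetime([
        "2014-10-14 05:00:00", "2014-10-14 08:00:00", "2014-10-14 14:00:00",
        "2014-10-14 06:00:00", "2014-10-14 05:00:00", "2014-10-14 06:00:00",
    ]),
})
df["end_time"] = df["start_time"] + pd.Timedelta(hours=7)
df["count"] = count_starts_in_window(df)
```

For these sample rows the counts are 2, 1, 0, 1, 1, 0. The 06:00 row in the first group comes out as 1 because only the 08:00 start falls inside its 06:00 to 13:00 window. The cost is one sort plus two binary searches per group, rather than a full scan of the frame for every row.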

python pandas
