Breedlove: python - Fastest Count of Row Dependent Date Ranges

Friday 15 August 2014

I have a data set that looks like this (end_time is 7 hours after start_time):

       value           start_time             end_time
    1      a  2014-10-14 05:00:00  2014-10-14 12:00:00
    2      a  2014-10-14 08:00:00  2014-10-14 15:00:00
    3      a  2014-10-14 14:00:00  2014-10-14 21:00:00
    4      a  2014-10-14 06:00:00  2014-10-14 13:00:00
    5      b  2014-10-14 05:00:00  2014-10-14 12:00:00
    6      b  2014-10-14 06:00:00  2014-10-14 13:00:00

I want to add a new column that counts, for each row, the number of other rows with the same value whose start_time falls between that row's start_time and end_time. The result would look like this:

       value           start_time             end_time  count
    1      a  2014-10-14 05:00:00  2014-10-14 12:00:00      2
    2      a  2014-10-14 08:00:00  2014-10-14 15:00:00      1
    3      a  2014-10-14 14:00:00  2014-10-14 21:00:00      0
    4      a  2014-10-14 06:00:00  2014-10-14 13:00:00      2
    5      b  2014-10-14 05:00:00  2014-10-14 12:00:00      1
    6      b  2014-10-14 06:00:00  2014-10-14 13:00:00      0

Currently I have:

    # For every row, filter the whole frame again (quadratic in the number of rows).
    for i in range(0, len(df['value'])):
        df['count'][i] = df[(df['start_time'] >= df['start_time'][i])
                            & (df['start_time'] <= df['end_time'][i])
                            & (df['value'] == df['value'][i])].shape[0]

I have a large number of rows, so this turns out to be slow. It also includes each row itself in its own count, so 1 needs to be subtracted from every result.

Is there a faster way to do this calculation?

Thanks!

In a way, this can be done faster if start_time is always increasing. You can shift the complexity to insertion time by keeping the rows ordered. With a sorted list of rows, testing whether the following ones are within [start_time, end_time] is easy: as soon as one element is out of bounds, you know the elements after it will be too.

If you can't maintain a sorted list at insertion, I think there is no more efficient way than to sort the list.
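The sorted-list idea above can be sketched in plain Python. This is only an illustration, not the answerer's own code: the function name count_starts_in_range, the helper t, and the tuple layout (value, start_time, end_time) are assumptions made for the example, and the first four sample rows are assumed to share one value, written 'a' here. Start times are sorted once per value, and a binary search then finds how many of them lie inside each row's window, so the cost is roughly O(n log n) instead of one full scan per row.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import datetime


def count_starts_in_range(rows):
    """Return, for each (value, start, end) row, the number of OTHER rows
    with the same value whose start lies in [start, end]."""
    # Collect the start times of each value and sort each group once.
    starts_by_value = defaultdict(list)
    for value, start, _end in rows:
        starts_by_value[value].append(start)
    for starts in starts_by_value.values():
        starts.sort()

    counts = []
    for value, start, end in rows:
        starts = starts_by_value[value]
        lo = bisect_left(starts, start)   # first start >= this row's start
        hi = bisect_right(starts, end)    # one past the last start <= end
        counts.append(hi - lo - 1)        # minus 1: the row always matches itself
    return counts


def t(hour):
    # Hypothetical helper: a time on 2014-10-14 at the given hour.
    return datetime(2014, 10, 14, hour)


# Sample rows from the question ('a' for the first four rows is an assumption).
rows = [
    ("a", t(5), t(12)),
    ("a", t(8), t(15)),
    ("a", t(14), t(21)),
    ("a", t(6), t(13)),
    ("b", t(5), t(12)),
    ("b", t(6), t(13)),
]
counts = count_starts_in_range(rows)
```

For the sample rows this gives 2, 1, 0, 1, 1, 0. The fourth row comes out as 1 rather than the 2 listed in the question, since only the 08:00 start lies between 06:00 and 13:00. In pandas, the same approach should carry over by sorting start_time within each value group and using numpy.searchsorted in place of bisect, though that variant is not shown here.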

python pandas
