Sunday 15 March 2015

Calculating mean for class variables in a Python dataframe




I have a dataframe of session log-in data. Each entry is associated with a class (e, c, g, m). The rows look like this:

    1: [session_start_time, session_end_time, class_id, problems_completed, student_id, student_account_created, student_previous_logins_total, student_previous_class_logins, duration]
    2: [1/6/12 16:28, 1/6/12 16:55, e, 37, 91, 10/26/11 0:00, 76, 27, 1/1/04 0:27]
    3: [1/11/12 13:18, 1/11/12 13:58, m, 33, 172, 1/10/12 0:00, 5, 3, 1/1/04 0:40]

I am trying to calculate the average "duration" for each class (e, c, g, etc.). I am having a problem finding the right command to calculate the average per class, rather than the mean of the whole column.

I am not sure what format or structure your source data is in, since what you show is not an exact Python representation. Let's assume the rows are lists of strings (or can be converted to them):

    rows = [
        ['1/6/12 16:28', '1/6/12 16:55', 'e'],
        ['1/11/12 13:18', '1/11/12 13:58', 'm'],
        ['1/13/12 13:20', '1/13/12 13:24', 'm'],
    ]

Then, here's one way to compute the mean for each class:

    from collections import Counter
    from datetime import datetime

    def parse(s, format="%x %H:%M"):
        """
        Return a parsed datetime in the given format.
        Note that %x is the locale's date representation
        (month/day/year in the default C locale).
        """
        return datetime.strptime(s, format)

    # Per-kind running totals: number of sessions and seconds spent.
    total_items = Counter()
    total_duration = Counter()
    for start, end, kind in rows:
        duration = parse(end) - parse(start)
        total_items[kind] += 1
        total_duration[kind] += duration.total_seconds()

    # Mean duration in seconds for each kind.
    means = {k: total_duration[k] / total_items[k] for k in total_items}
    print(means)

This uses collections.Counter objects to track both the count of each class in the log and its total duration. The duration must be computed, first parsing the date/time string representation into the internal format, datetime.datetime. Once the counters are accumulated, a dictionary comprehension computes the mean per kind (what you call "class"; since that's a technical Python construct, I call it kind).

The resulting means dictionary stores the computed values, in seconds. means['m'] gives the mean of the 'm' entries, and so forth.
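As a quick check, the same approach can be run end to end on the three sample rows. This is only a sketch: the helper name mean_duration_by_kind and the conversion to minutes are my own additions for illustration, and the explicit format string is an assumption about how the dates are written.

```python
from collections import Counter
from datetime import datetime

# Sample rows as assumed earlier: (start, end, kind).
rows = [
    ['1/6/12 16:28', '1/6/12 16:55', 'e'],
    ['1/11/12 13:18', '1/11/12 13:58', 'm'],
    ['1/13/12 13:20', '1/13/12 13:24', 'm'],
]

def parse(s, fmt="%m/%d/%y %H:%M"):
    """Parse a month/day/two-digit-year timestamp (assumed format)."""
    return datetime.strptime(s, fmt)

def mean_duration_by_kind(rows):
    """Return {kind: mean session duration in seconds}."""
    total_items = Counter()
    total_seconds = Counter()
    for start, end, kind in rows:
        total_items[kind] += 1
        total_seconds[kind] += (parse(end) - parse(start)).total_seconds()
    return {k: total_seconds[k] / total_items[k] for k in total_items}

means = mean_duration_by_kind(rows)

# Seconds are awkward to read, so convert to minutes for display.
means_in_minutes = {k: v / 60.0 for k, v in means.items()}
for kind in sorted(means_in_minutes):
    print("%s: %.1f minutes" % (kind, means_in_minutes[kind]))
```

For the sample rows, 'e' has a single 27-minute session, while 'm' averages a 40-minute and a 4-minute session to 22 minutes.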

While the parse function will work for the few data samples you showed in the question, date/time parsing is pretty finicky. Instead of using the strptime method here, I recommend using a more expansive and inclusive parser, such as the one found in the dateutil module. If you wanted to use that, delete or rename the parse function found here, and substitute:

from dateutil.parser import parse

That provides a drop-in replacement with a much broader range of accepted formats.
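If installing dateutil is not an option, a rough standard-library alternative is to try several strptime formats in turn. The sketch below is far less flexible than dateutil; the name parse_any and the list of candidate formats are hypothetical and would need to match whatever the log really contains.

```python
from datetime import datetime

# Hypothetical candidate formats; extend to match the actual log data.
CANDIDATE_FORMATS = (
    "%m/%d/%y %H:%M",     # 1/6/12 16:28
    "%m/%d/%Y %H:%M",     # 1/6/2012 16:28
    "%Y-%m-%d %H:%M:%S",  # 2012-01-06 16:28:00
)

def parse_any(s, formats=CANDIDATE_FORMATS):
    """Return the datetime from the first format that matches s."""
    for fmt in formats:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError("no known format matches %r" % s)

# All three spellings of the same moment parse to the same value.
samples = ['1/6/12 16:28', '1/6/2012 16:28', '2012-01-06 16:28:00']
parsed = [parse_any(s) for s in samples]
print(parsed[0] == parsed[1] == parsed[2])
```

The order of the formats matters when a string could match more than one of them, so the most specific formats should come first.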

