Saturday 15 August 2015

Map a Ruby array of timeseries values to a weighted interval




I have a Ruby array of arrays that represents a series of observations of a metric recorded over time. Each inner array has two elements:

- a Time instance in UTC describing when the observation was recorded
- the integer value of the observation

For example, I might have something like:

    [
      [<time: 2014-01-15 @ 18:00>, 100],
      [<time: 2014-01-16 @ 06:00>, 200],
      [<time: 2014-01-16 @ 12:00>, 300],
      [<time: 2014-01-16 @ 23:00>, 400],
      [<time: 2014-01-17 @ 12:00>, 500],
      [<time: 2014-01-18 @ 03:00>, 600],
      [<time: 2014-01-18 @ 06:00>, 700],
    ]

The problem at hand is to turn this into an array of weighted values for each date:

    [
      [<date: 2014-01-15>, 100],
      [<date: 2014-01-16>, 229],
      ...
    ]

The value for each day in the above array is obtained by the following procedure:

Break the day into a series of intervals delimited by each observation and the boundaries of the day.

For example, since Jan 16th has observations at 06:00, 12:00, and 23:00, it is broken into intervals of 00:00-06:00, 06:00-12:00, 12:00-23:00, and 23:00-00:00.

The value of each interval is equal to the value of the observation at the origin of the interval, or the last observation made if it is the start of the day.

For example, the value of the 06:00-12:00 interval on Jan 16th is 200, since a value of 200 was recorded at 06:00.

The value of the 00:00-06:00 interval on Jan 16th is 100, since 100 was the last observation recorded at the point the day started.

The weighted value of each interval is equal to its value multiplied by the fraction of the day that the interval occupies.

For example, the weighted value of the 06:00-12:00 interval on Jan 16th is 50 (200 * 0.25).

The final weighted value of each day is the sum of the weighted values of its intervals, coerced to an integer.

For example, the weighted value for Jan 16th is 229, because:

    (100*(6/24.0) + 200*(6/24.0) + 300*(11/24.0) + 400*(1/24.0)).to_i = 229
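As a quick check of the arithmetic above, here is a minimal Ruby sketch. The `intervals` array is a hypothetical representation (hours, value) of the Jan 16th intervals, not something from the original data structure. Note that the divisor has to be a Float: with Integer operands, `6/24` evaluates to 0 in Ruby.

```ruby
# Intervals on Jan 16th as [hours, value] pairs, taken from the example above.
# (Hypothetical representation, used only for this check.)
intervals = [[6, 100], [6, 200], [11, 300], [1, 400]]

# Weight each value by the fraction of the day its interval occupies.
# 24.0 forces Float division; Integer division (6/24) would give 0.
weighted = intervals.sum { |hours, value| value * (hours / 24.0) }

# Coerce to an integer by truncation, as the procedure specifies.
daily_value = weighted.to_i  #=> 229
```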

The first point in the array is a special case: the day starts there, rather than at 00:00, so Jan 15th has one interval, 18:00-00:00, with a value of 100 and a weighted value of 100.

Any suggestions on how to get started tackling this?

I've assumed there are no days with no entries.

I found it convenient to first transform the array of time objects. The rules used for the transformation are as follows (arb refers to an arbitrary value, which may equal val):

- For the first day, replace the single element [dt, val] with three elements:
    [dt1, val], where dt1 is the same date at time 00:00:00
    [dt2, arb], where dt2 is the same date at time 23:59:59
    [dt3, val], where dt3 is one day later at time 00:00:00
- For the last day, if [dt, val] is the last element for the day, add the element [dt1, arb], where dt1 is the same date at time 23:59:59.
- For every day other than the first and last, if [dt, val] is the last element for the day, add two elements:
    [dt1, arb], where dt1 is the same date at time 23:59:59
    [dt2, val], where dt2 is one day later at time 00:00:00

Suppose the following is the initial array. For clarity, I've used strings (allowing me to replace "23:59:59" with "24:00"):

    arr = [
      ["2014-01-15 18:00", 100],
      ["2014-01-16 06:00", 200],
      ["2014-01-16 12:00", 300],
      ["2014-01-16 23:00", 400],
      ["2014-01-17 12:00", 500],
      ["2014-01-18 03:00", 600],
      ["2014-01-18 06:00", 700]
    ]

After applying the above rules, we obtain:

    arr1 = [
      ["2014-01-15 00:00", 100], ["2014-01-15 24:00", 100], ["2014-01-16 00:00", 100],
      ["2014-01-16 06:00", 200], ["2014-01-16 12:00", 300], ["2014-01-16 23:00", 400],
      ["2014-01-16 24:00", 400], ["2014-01-17 00:00", 400], ["2014-01-17 12:00", 500],
      ["2014-01-17 24:00", 500], ["2014-01-18 00:00", 500], ["2014-01-18 03:00", 600],
      ["2014-01-18 06:00", 700], ["2014-01-18 24:00", 700]
    ]

or, with the elements grouped by date,

    arr1 = [
      ["2014-01-15 00:00", 100], ["2014-01-15 24:00", 100],

      ["2014-01-16 00:00", 100], ["2014-01-16 06:00", 200], ["2014-01-16 12:00", 300],
      ["2014-01-16 23:00", 400], ["2014-01-16 24:00", 400],

      ["2014-01-17 00:00", 400], ["2014-01-17 12:00", 500], ["2014-01-17 24:00", 500],

      ["2014-01-18 00:00", 500], ["2014-01-18 03:00", 600], ["2014-01-18 06:00", 700],
      ["2014-01-18 24:00", 700]
    ]
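The transformation rules above could be sketched roughly as follows. This is only one possible reading, not code from the original answer: `add_boundaries` is a hypothetical helper name, the arbitrary value (arb) is simply set to val, and the input is assumed to be sorted, to span at least two days, to have no days without entries, and to have a single entry on the first day (as the first rule states).

```ruby
require 'date'

# Hypothetical helper: expands [["YYYY-MM-DD HH:MM", val], ...] with
# day-boundary entries. The arbitrary value (arb) is set to val.
def add_boundaries(arr)
  first_date = arr.first[0][0, 10]
  last_date  = arr.last[0][0, 10]

  arr.each_with_index.flat_map do |(stamp, val), i|
    date      = stamp[0, 10]
    next_date = (Date.parse(date) + 1).to_s
    # Is this the last entry recorded on its date?
    last_of_day = i == arr.size - 1 || arr[i + 1][0][0, 10] != date

    if date == first_date
      # First day: the single entry is replaced by three boundary entries.
      [["#{date} 00:00", val], ["#{date} 24:00", val], ["#{next_date} 00:00", val]]
    elsif !last_of_day
      [[stamp, val]]
    elsif date == last_date
      # Last day: close the day only.
      [[stamp, val], ["#{date} 24:00", val]]
    else
      # Middle days: close the day and carry the value into the next day.
      [[stamp, val], ["#{date} 24:00", val], ["#{next_date} 00:00", val]]
    end
  end
end

arr = [
  ["2014-01-15 18:00", 100], ["2014-01-16 06:00", 200], ["2014-01-16 12:00", 300],
  ["2014-01-16 23:00", 400], ["2014-01-17 12:00", 500], ["2014-01-18 03:00", 600],
  ["2014-01-18 06:00", 700]
]

arr1 = add_boundaries(arr)
```

With the sample input, this yields the same 14 elements as the arr1 shown above.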

Code to implement these rules should be straightforward. Once you have arr1, create an enumerator with Enumerable#chunk:

    enum = arr1.chunk { |a| a.first[0,10] }
      #=> #<Enumerator: #<Enumerator::Generator:0x000001010e30d8>:each>

Let's see the elements of enum:

    enum.to_a
      #=> [["2014-01-15", [["2014-01-15 00:00", 100], ["2014-01-15 24:00", 100]]],
      #    ["2014-01-16", [["2014-01-16 00:00", 100], ["2014-01-16 06:00", 200],
      #                    ["2014-01-16 12:00", 300], ["2014-01-16 23:00", 400],
      #                    ["2014-01-16 24:00", 400]]],
      #    ["2014-01-17", [["2014-01-17 00:00", 400], ["2014-01-17 12:00", 500],
      #                    ["2014-01-17 24:00", 500]]],
      #    ["2014-01-18", [["2014-01-18 00:00", 500], ["2014-01-18 03:00", 600],
      #                    ["2014-01-18 06:00", 700], ["2014-01-18 24:00", 700]]]]

Now we need only map each element (one per date) to the weighted average of the vals (noting that we don't use the first element of each element of enum):

    enum.map { |_,arr| (arr.each_cons(2)
                           .reduce(0.0) { |t,((d1,v1),(d2,_))| t + min_diff(d2,d1)*v1 }/1440.0).round(2) }
      #=> [100.0, 229.17, 450.0, 662.5]

using the helper:

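    # Minutes from str2 to str1, read from the trailing "HH:MM" of each string.
    # Both strings are assumed to fall on the same date.
    def min_diff(str1, str2)
      60*(str1[-5,2].to_i - str2[-5,2].to_i) + str1[-2,2].to_i - str2[-2,2].to_i
    end

Putting it all together: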

    arr1.chunk { |a| a.first[0,10] }
        .map { |_,arr| (arr.each_cons(2)
                           .reduce(0.0) { |t,((d1,v1),(d2,_))| t + min_diff(d2,d1)*v1 }/1440.0).round(2) }
      #=> [100.0, 229.17, 450.0, 662.5]

along with the helper min_diff.
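The result above is an array of floats rounded to two decimals, whereas the question asks for [date, integer] pairs. A small variation that keeps the date key from chunk and truncates with to_i might look like the sketch below. The variable name `daily` is an assumption for illustration, and the use of Enumerable#sum (Ruby 2.4+) in place of reduce is a substitution, not part of the original answer.

```ruby
# Minutes from str2 to str1, read from the trailing "HH:MM" of each string.
def min_diff(str1, str2)
  60 * (str1[-5, 2].to_i - str2[-5, 2].to_i) + str1[-2, 2].to_i - str2[-2, 2].to_i
end

arr1 = [
  ["2014-01-15 00:00", 100], ["2014-01-15 24:00", 100],
  ["2014-01-16 00:00", 100], ["2014-01-16 06:00", 200], ["2014-01-16 12:00", 300],
  ["2014-01-16 23:00", 400], ["2014-01-16 24:00", 400],
  ["2014-01-17 00:00", 400], ["2014-01-17 12:00", 500], ["2014-01-17 24:00", 500],
  ["2014-01-18 00:00", 500], ["2014-01-18 03:00", 600], ["2014-01-18 06:00", 700],
  ["2014-01-18 24:00", 700]
]

daily = arr1.chunk { |a| a.first[0, 10] }.map do |date, entries|
  # Sum of (interval length in minutes) * (value at the start of the interval).
  total = entries.each_cons(2).sum { |(d1, v1), (d2, _)| min_diff(d2, d1) * v1 }
  # 1440 minutes per day; to_i truncates, as the question specifies.
  [date, (total / 1440.0).to_i]
end
#=> [["2014-01-15", 100], ["2014-01-16", 229], ["2014-01-17", 450], ["2014-01-18", 662]]
```

Here the dates stay as "YYYY-MM-DD" strings; converting them to Date objects with Date.parse would be a further step if needed.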

ruby arrays time
