Map a Ruby array of timeseries values to a weighted interval
I have a Ruby array of arrays that represents a series of observations of a metric recorded over time. Each inner array has two elements:

- a Time instance in UTC describing when the observation was recorded
- the integer value of the observation

For example, I might have something like:

    [
      [<Time: 2014-01-15 @ 18:00>, 100],
      [<Time: 2014-01-16 @ 06:00>, 200],
      [<Time: 2014-01-16 @ 12:00>, 300],
      [<Time: 2014-01-16 @ 23:00>, 400],
      [<Time: 2014-01-17 @ 12:00>, 500],
      [<Time: 2014-01-18 @ 03:00>, 600],
      [<Time: 2014-01-18 @ 06:00>, 700],
    ]

The problem at hand is to turn this array into an array of weighted values for each date:

    [
      [<Date: 2014-01-15>, 100],
      [<Date: 2014-01-16>, 229],
      ...
    ]

The value for each day in the above array is obtained by the following procedure:
1. Break the day into a series of intervals delimited by each observation and the boundaries of the day.
   For example, since Jan 16th has observations at 06:00, 12:00, and 23:00, it is broken into intervals of 00:00-06:00, 06:00-12:00, 12:00-23:00, and 23:00-00:00.
2. The value of each interval is equal to the value of the observation at the origin of the interval, or the last observation made if it is the start of the day.
   For example, the value of the 06:00-12:00 interval on Jan 16th is 200, since a value of 200 was recorded at 06:00.
   The value of the 00:00-06:00 interval on Jan 16th is 100, since 100 was the last observation recorded at the point the day started.
3. The weighted value of each interval is equal to its value multiplied by the fraction of the day that the interval occupies.
   For example, the weighted value of the 06:00-12:00 interval on Jan 16th is 50 (200 * 0.25).
4. The final weighted value of each day is the sum of the weighted values of its intervals, coerced to an integer.
   For example, the weighted value for Jan 16th is 229, because:

       (100*(6/24.0) + 200*(6/24.0) + 300*(11/24.0) + 400*(1/24.0)).to_i = 229

The first point in the array is a special case: the day starts there, rather than at 00:00, so Jan 15th has just one interval, 18:00-00:00, with a value of 100 and a weighted value of 100.
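As a quick check of the Jan 16th arithmetic, the intervals can be written out by hand. This is only a sketch with hard-coded hours for that one day, not a general solution; the names `intervals`, `weighted`, and `total` are made up for the illustration.

```ruby
# Intervals for Jan 16th as [start_hour, end_hour, value].
# The 00:00-06:00 interval carries over the last observation of Jan 15th (100).
intervals = [[0, 6, 100], [6, 12, 200], [12, 23, 300], [23, 24, 400]]

# Weight each value by the fraction of the 24-hour day its interval occupies.
weighted = intervals.map { |from, to, value| value * (to - from) / 24.0 }

# Sum the weighted values and coerce to an integer, as in step 4.
total = weighted.sum.to_i
puts total
```

Dividing by `24.0` rather than `24` matters here: with integer operands Ruby would truncate the division.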
Any suggestions on how to get started tackling this?
I've assumed there are no days with no entries.

I found it convenient to first transform the array of Time objects. The rules I used for the transformation are as follows (arb refers to an arbitrary value, which may equal val):

- Replace the first element, [dt, val], with three elements:
  - [dt1, val], where dt1 is the same date at time 00:00:00
  - [dt2, arb], where dt2 is the same date at time 23:59:59
  - [dt3, val], where dt3 is one day later at time 00:00:00
- For the last day, if [dt, val] is the last element of the day, add the element [dt1, arb], where dt1 is the same date at time 23:59:59.
- For every day other than the first and last, if [dt, val] is the last element of the day, add two elements:
  - [dt1, arb], where dt1 is the same date at time 23:59:59
  - [dt2, val], where dt2 is one day later at time 00:00:00

Suppose the following is the initial array. For clarity, I've used strings (allowing me to replace "23:59:59" with "24:00"):
    arr = [
      ["2014-01-15 18:00", 100],
      ["2014-01-16 06:00", 200],
      ["2014-01-16 12:00", 300],
      ["2014-01-16 23:00", 400],
      ["2014-01-17 12:00", 500],
      ["2014-01-18 03:00", 600],
      ["2014-01-18 06:00", 700]
    ]

After applying the above rules, we obtain:

    arr1 = [
      ["2014-01-15 00:00", 100], ["2014-01-15 24:00", 100],
      ["2014-01-16 00:00", 100], ["2014-01-16 06:00", 200],
      ["2014-01-16 12:00", 300], ["2014-01-16 23:00", 400],
      ["2014-01-16 24:00", 400], ["2014-01-17 00:00", 400],
      ["2014-01-17 12:00", 500], ["2014-01-17 24:00", 500],
      ["2014-01-18 00:00", 500], ["2014-01-18 03:00", 600],
      ["2014-01-18 06:00", 700], ["2014-01-18 24:00", 700]
    ]

or, with the elements grouped by date,

    arr1 = [
      ["2014-01-15 00:00", 100], ["2014-01-15 24:00", 100],

      ["2014-01-16 00:00", 100], ["2014-01-16 06:00", 200], ["2014-01-16 12:00", 300],
      ["2014-01-16 23:00", 400], ["2014-01-16 24:00", 400],

      ["2014-01-17 00:00", 400], ["2014-01-17 12:00", 500], ["2014-01-17 24:00", 500],

      ["2014-01-18 00:00", 500], ["2014-01-18 03:00", 600], ["2014-01-18 06:00", 700],
      ["2014-01-18 24:00", 700]
    ]

Code to implement these rules should be straightforward. Once you have arr1, create an enumerator with Enumerable#chunk:
    # Group consecutive elements by their date prefix ("YYYY-MM-DD").
    enum = arr1.chunk { |a| a.first[0,10] }
      #=> #<Enumerator: #<Enumerator::Generator:0x000001010e30d8>:each>

Let's see the elements of enum:

    enum.to_a
      #=> [["2014-01-15", [["2014-01-15 00:00", 100], ["2014-01-15 24:00", 100]]],
      #    ["2014-01-16", [["2014-01-16 00:00", 100], ["2014-01-16 06:00", 200],
      #                    ["2014-01-16 12:00", 300], ["2014-01-16 23:00", 400],
      #                    ["2014-01-16 24:00", 400]]],
      #    ["2014-01-17", [["2014-01-17 00:00", 400], ["2014-01-17 12:00", 500],
      #                    ["2014-01-17 24:00", 500]]],
      #    ["2014-01-18", [["2014-01-18 00:00", 500], ["2014-01-18 03:00", 600],
      #                    ["2014-01-18 06:00", 700], ["2014-01-18 24:00", 700]]]]

Now we need to map each element (one per date) to the weighted average of the vals (noting that we don't use the first element of each element of enum):
    # For each day, weight every value by the minutes until the next entry,
    # then divide by the 1440 minutes in a day.
    enum.map { |_,arr| (arr.each_cons(2)
                           .reduce(0.0) { |t,((d1,v1),(d2,_))| t + min_diff(d2,d1)*v1 }/1440.0).round(2) }
      #=> [100.0, 229.17, 450.0, 662.5]

using the helper:

    # Minutes between two "YYYY-MM-DD HH:MM" strings on the same date (str1 - str2).
    def min_diff(str1, str2)
      60*(str1[-5,2].to_i - str2[-5,2].to_i) + str1[-2,2].to_i - str2[-2,2].to_i
    end

Putting it all together:

    arr1.chunk { |a| a.first[0,10] }
        .map { |_,arr| (arr.each_cons(2)
                           .reduce(0.0) { |t,((d1,v1),(d2,_))| t + min_diff(d2,d1)*v1 }/1440.0).round(2) }
      #=> [100.0, 229.17, 450.0, 662.5]

along with the helper min_diff.
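The step from arr to arr1 is described above only as a set of rules. One possible way to code it is sketched below. `pad_days` is a hypothetical helper name, not part of the answer, and the sketch assumes "YYYY-MM-DD HH:MM" strings sorted by time, no days without entries, and a first observation that is simply moved back to 00:00. The answer's `min_diff` is copied in so the block runs on its own.

```ruby
# Hypothetical helper: expands arr into the padded form called arr1 above.
def pad_days(arr)
  previous = nil # last value seen on the preceding day
  arr.chunk { |row| row[0][0, 10] }.flat_map do |date, rows|
    rows = if previous.nil?
             # First day: move the first observation back to 00:00.
             [["#{date} 00:00", rows.first[1]]] + rows.drop(1)
           else
             # Later days: start the day with the value carried over.
             [["#{date} 00:00", previous]] + rows
           end
    previous = rows.last[1]
    # Close the day at 24:00; this value never enters the weighted sum.
    rows + [["#{date} 24:00", previous]]
  end
end

# Helper from the answer: minutes between two timestamps on the same date.
def min_diff(str1, str2)
  60 * (str1[-5, 2].to_i - str2[-5, 2].to_i) + str1[-2, 2].to_i - str2[-2, 2].to_i
end

arr = [
  ["2014-01-15 18:00", 100], ["2014-01-16 06:00", 200],
  ["2014-01-16 12:00", 300], ["2014-01-16 23:00", 400],
  ["2014-01-17 12:00", 500], ["2014-01-18 03:00", 600],
  ["2014-01-18 06:00", 700]
]

padded = pad_days(arr)

# Same chunk/each_cons pipeline as above, applied to the padded array.
daily = padded.chunk { |row| row[0][0, 10] }.map do |_, rows|
  minutes = rows.each_cons(2).reduce(0.0) do |total, ((t1, v1), (t2, _))|
    total + min_diff(t2, t1) * v1
  end
  (minutes / 1440.0).round(2)
end

p daily #=> [100.0, 229.17, 450.0, 662.5]
```

If the integer result from the question is wanted instead of a rounded float, replacing `.round(2)` with `.to_i` should give 229 for Jan 16th.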