Appropriate scheduling algorithms for harvesting Facebook pages?
I want to schedule harvesting of Facebook pages at appropriate intervals. Some pages have more content (The Simpsons, with thousands of comments and likes per post), others have less content (Unsealed Files, with a few hundred comments and likes per post), and still other pages have to be harvested every few minutes because a real time event is going on (such as during a hockey match, on the Colorado Avalanche page).
I'm trying to find appropriate algorithms to schedule these different types of pages. At the moment, I use a simplistic algorithm: I have to harvest N pages over M hours, so I schedule a harvest every (M * 60 * 60) / N seconds. I schedule the real time pages using the same algorithm, except that it is time shifted to schedule a harvest at the start of the period, and then every X minutes until the end of the event.
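To make the arithmetic concrete, here is a minimal sketch of the two schedules described above. The function names are hypothetical, times are plain seconds, and treating the end of the event as inclusive is an assumption; the post does not say how the real harvester handles that boundary.

```python
def regular_schedule(n_pages, m_hours):
    """Start offsets (seconds) for n_pages spread evenly over m_hours."""
    interval = (m_hours * 60 * 60) / n_pages  # seconds between harvests
    return [i * interval for i in range(n_pages)]


def realtime_schedule(event_start, event_end, x_minutes):
    """Harvest times (seconds) from event_start through event_end, every x_minutes.

    Assumption: a harvest that falls exactly on event_end is still scheduled.
    """
    step = x_minutes * 60
    times = []
    t = event_start
    while t <= event_end:
        times.append(t)
        t += step
    return times


if __name__ == "__main__":
    print(regular_schedule(4, 1))
    print(realtime_schedule(0, 600, 5))
```

For example, 4 pages over 1 hour gives one harvest every 900 seconds.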
This worked until I started suffering from bufferbloat: the queue that holds requests to harvest pages only empties when a harvester is ready. I don't "drop packets", hence requests queue up behind other pages and prevent the latest requests from being harvested on time.
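The effect can be reproduced with a small simulation. This is only a sketch under simplifying assumptions (a single harvester, a strict first-in-first-out queue, and known harvest durations); `simulate_fifo` is a hypothetical helper, not part of the actual system.

```python
def simulate_fifo(arrivals, service_times):
    """Wait (seconds) of each request in a FIFO queue served by one harvester.

    arrivals: times at which requests enter the queue, in ascending order.
    service_times: how long each harvest takes once it starts.
    """
    waits = []
    free_at = 0.0  # time at which the harvester next becomes idle
    for arrival, service in zip(arrivals, service_times):
        start = max(arrival, free_at)  # wait if the harvester is still busy
        waits.append(start - arrival)
        free_at = start + service
    return waits


if __name__ == "__main__":
    # Requests arrive every 10 s but each harvest takes 15 s.
    print(simulate_fifo([0, 10, 20, 30], [15, 15, 15, 15]))
```

Because nothing is ever dropped, the wait grows by 5 seconds with every request in this example, so a real time page queued late would be harvested well after its scheduled time.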
The statistics I keep track of, and can use during scheduling decisions, are:
the time I scheduled each page to harvest; the actual time each page started harvesting; the volume of data harvested on each page; whether or not the page needed real time harvesting.
This problem seems similar to a network scheduler algorithm. Am I on the right track? What other algorithms should I investigate?
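As an illustration of how the statistics listed above could feed a scheduling decision, the sketch below computes the queueing delay (actual start minus scheduled time) per record and averages it separately for real time and regular pages. The record layout and field names are assumptions made for the example, not the actual schema.

```python
from dataclasses import dataclass


@dataclass
class HarvestRecord:
    page: str            # page name (hypothetical field)
    scheduled_at: float  # when the harvest was scheduled, in seconds
    started_at: float    # when the harvest actually started, in seconds
    volume: int          # amount of data harvested
    realtime: bool       # whether the page needed real time harvesting


def queueing_delay(record):
    """Seconds a request spent waiting in the queue."""
    return record.started_at - record.scheduled_at


def mean_delay(records, realtime):
    """Average queueing delay for real time (True) or regular (False) pages."""
    delays = [queueing_delay(r) for r in records if r.realtime == realtime]
    return sum(delays) / len(delays) if delays else 0.0


if __name__ == "__main__":
    records = [
        HarvestRecord("The Simpsons", 0, 30, 5000, False),
        HarvestRecord("Unsealed Files", 60, 150, 300, False),
        HarvestRecord("Colorado Avalanche", 120, 300, 800, True),
    ]
    print(mean_delay(records, realtime=True))
    print(mean_delay(records, realtime=False))
```

A growing mean delay for real time pages would be one measurable sign that requests are queueing behind other pages.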
algorithm scheduling