Tuesday 15 February 2011

twitter - Specify exact time to start and end the collection of tweets using Python Tweepy?




I am having quite a few technical issues. The Python script below works when the time is in 'yyyy-mm-dd' format. However, during extremely heavy tweet activity, for example when more than 500,000 tweets are collected in a day, my computer runs out of memory and I have to force the program to stop.

I can work around this by looking at the time of the last tweet in the CSV file from the stopped run; in this case it is 18:44:00. I have tried many time formats (for example the 'yyyy-mm-dd hh:mm:ss' format below), but none of them works.
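A minimal sketch of that workaround, written for Python 3 (the script in this post is Python 2.7): read the CSV left by the stopped run and return the created_at value of the last complete row. It assumes the column layout the script writes (name, screen name, created_at), that csv.writer stored created_at as str(datetime), e.g. 2014-09-18 18:44:00, and that search results arrive newest first, so the last row holds the oldest tweet collected so far. The function name last_saved_timestamp is hypothetical.

```python
import csv
from datetime import datetime

# Format produced when csv.writer writes a datetime object via str(),
# e.g. "2014-09-18 18:44:00" (assumption based on the script's writerow call).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def last_saved_timestamp(csv_path, column=2):
    """Return the created_at of the last complete row, or None if there is none.

    Hypothetical helper: column 2 is created_at in the script's row layout
    (name, screen_name, created_at).
    """
    last = None
    # The script encoded names as UTF-8, so read the file back as UTF-8.
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):  # streams row by row, nothing is accumulated
            if len(row) <= column:
                continue  # short row, e.g. a line cut off when the script was killed
            try:
                last = datetime.strptime(row[column], TIMESTAMP_FORMAT)
            except ValueError:
                continue  # partially written or non-timestamp value
    return last
```

The returned datetime marks where a restarted run should resume, i.e. the new end time of the collection window. Because the reader is consumed one row at a time, memory use stays flat even for a file with hundreds of thousands of rows.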

```python
import tweepy
import time
import csv

ckey = ""
csecret = ""
atoken = ""
asecret = ""

oauth_keys = {'consumer_key': ckey, 'consumer_secret': csecret,
              'access_token_key': atoken, 'access_token_secret': asecret}

auth = tweepy.OAuthHandler(oauth_keys['consumer_key'], oauth_keys['consumer_secret'])
# Use the access token keys defined above (otherwise they are never used)
auth.set_access_token(oauth_keys['access_token_key'], oauth_keys['access_token_secret'])
api = tweepy.API(auth)

# Stream the first "xxx" tweets related to "car", filtering out ones that are not geo-enabled
# Reference for search (q) operators: https://dev.twitter.com/rest/public/search

# Common parameters: change here
startsince = '2014-09-18 00:00:00'
enduntil = '2014-09-18 18:44:00'
suffix = '_18sep2014.csv'

############################
### Lung cancer starts #####
searchterms2 = '"lung cancer" OR "lung cancers" OR "lungcancer" OR "lungcancers" OR \
"lung tumor" OR "lungtumor" OR "lung tumors" OR "lungtumors" OR "lung neoplasm"'

# Items from 0 to 500,000 (which *should* cover all tweets)
# Increment by 4,000 each cycle (because the Twitter rate limit is 5,000-6,000)
# Wait 20 min before the next request (because the Twitter request wait time is 15 min)
counter2 = 0
for tweet in tweepy.Cursor(api.search, q=searchterms2, since=startsince, until=enduntil).items(999999999):  # change here
    try:
        '''
        print "name:", tweet.author.name.encode('utf8')
        print "screen-name:", tweet.author.screen_name.encode('utf8')
        print "tweet created:", tweet.created_at
        '''
        placeholder = []
        placeholder.append(tweet.author.name.encode('utf8'))
        placeholder.append(tweet.author.screen_name.encode('utf8'))
        placeholder.append(tweet.created_at)
        prefix = 'tweetdata_lungcancer'
        wholefilename = prefix + suffix
        with open(wholefilename, "ab") as f:  # change here
            writefile = csv.writer(f)
            writefile.writerow(placeholder)
        counter2 += 1
        if counter2 == 4000:
            time.sleep(60*20)  # wait 20 min every time 4,000 tweets are extracted
            counter2 = 0
            continue
    except tweepy.TweepError:
        time.sleep(60*20)
        continue
    except IOError:
        time.sleep(60*2.5)
        continue
    except StopIteration:
        break
```
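As far as I know, the Twitter search API documents `until` as a date in YYYY-MM-DD form, which would explain why the 'yyyy-mm-dd hh:mm:ss' values in the script above are not accepted. One possible workaround is to query at day granularity and apply the exact start and end times on the client side, using each tweet's created_at. This sketch assumes results come back newest first and that created_at is a naive UTC datetime, as in tweepy 3.x; tweets_in_window is a hypothetical helper, not part of tweepy.

```python
from collections import namedtuple
from datetime import datetime


def tweets_in_window(tweets, start, end):
    """Yield tweets whose created_at lies in [start, end).

    Assumes tweets arrive newest first, so iteration stops at the first
    tweet older than start instead of paging further back in time.
    """
    for tweet in tweets:
        if tweet.created_at >= end:
            continue  # too new: at or after the requested end time
        if tweet.created_at < start:
            break  # older than the window; every later result is older still
        yield tweet


if __name__ == "__main__":
    # Hypothetical sample data, ordered newest first like the search results are assumed to be
    Tweet = namedtuple("Tweet", "text created_at")
    sample = [
        Tweet("late", datetime(2014, 9, 18, 20, 0, 0)),
        Tweet("in window", datetime(2014, 9, 18, 12, 30, 0)),
        Tweet("early", datetime(2014, 9, 17, 23, 59, 0)),
    ]
    start = datetime(2014, 9, 18, 0, 0, 0)
    end = datetime(2014, 9, 18, 18, 44, 0)
    for t in tweets_in_window(sample, start, end):
        print(t.text)  # prints "in window"
```

In the script above, the loop header would become something like `for tweet in tweets_in_window(tweepy.Cursor(api.search, q=searchterms2, since='2014-09-18', until='2014-09-19').items(), start, end):`, with start and end built as datetime objects. Stopping at the first tweet older than start also avoids paging through results that would only be discarded.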

python-2.7 twitter tweepy
