Breedlove: twitter - Specify exact time to start and end the collection of tweets using Python Tweepy?

Tuesday, 15 February 2011

I am having quite a few technical issues. The Python script below works when the time is in 'yyyy-mm-dd' format. However, during extremely heavy tweet activity, for example when more than 500,000 tweets are collected in a day, my computer runs out of memory and I have to force-stop the program.

I can work around this by looking at the time of the last tweet in the CSV file from the stopped run; in this case it is 18:44:00. I have tried many time formats (for example the 'yyyy-mm-dd hh:mm:ss' format below), but none of them work.

    import tweepy
    import time
    import csv

    ckey = ""
    csecret = ""
    atoken = ""
    asecret = ""

    oauth_keys = {'consumer_key': ckey, 'consumer_secret': csecret,
                  'access_token_key': atoken, 'access_token_secret': asecret}
    auth = tweepy.OAuthHandler(oauth_keys['consumer_key'], oauth_keys['consumer_secret'])
    api = tweepy.API(auth)

    # Stream the first "xxx" tweets related to "car", then filter out the ones without geo-enabled
    # Reference for the search (q) operator: https://dev.twitter.com/rest/public/search

    # Common parameters: changeable only here
    startsince = '2014-09-18 00:00:00'
    enduntil = '2014-09-18 18:44:00'
    suffix = '_18sep2014.csv'

    ############################
    ### lung cancer starts #####
    searchterms2 = '"lung cancer" OR "lung cancers" OR "lungcancer" OR "lungcancers" OR \
        "lung tumor" OR "lungtumor" OR "lung tumors" OR "lungtumors" OR "lung neoplasm"'

    # Items from 0 to 500,000 (which *should* cover all tweets)
    # Increase by 4,000 for each cycle (because 5000-6000 is over the Twitter rate limit)
    # Then wait 20 min before the next request (because the Twitter request wait time is 15 min)

    counter2 = 0
    for tweet in tweepy.Cursor(api.search, q=searchterms2,
                               since=startsince, until=enduntil).items(999999999):  # changeable here

        try:
            '''
            print "name:", tweet.author.name.encode('utf8')
            print "screen-name:", tweet.author.screen_name.encode('utf8')
            print "tweet created:", tweet.created_at'''

            placeholder = []
            placeholder.append(tweet.author.name.encode('utf8'))
            placeholder.append(tweet.author.screen_name.encode('utf8'))
            placeholder.append(tweet.created_at)

            prefix = 'tweetdata_lungcancer'
            wholefilename = prefix + suffix

            with open(wholefilename, "ab") as f:  # changeable here
                writefile = csv.writer(f)
                writefile.writerow(placeholder)

            counter2 += 1

            if counter2 == 4000:
                time.sleep(60*20)  # wait 20 min every time 4,000 tweets are extracted
                counter2 = 0
                continue

        except tweepy.TweepError:
            time.sleep(60*20)
            continue

        except IOError:
            time.sleep(60*2.5)
            continue

        except StopIteration:
            break
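One possible way to get the time window described in the question is to keep the search bounds as plain dates and cut the results down to the exact times on the client side. The sketch below is not part of the original script and is only an illustration: `parse_window`, `within_window` and `FakeTweet` are hypothetical names introduced here, and it assumes each tweet exposes `created_at` as a `datetime` in the same timezone as the window strings.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for a tweet object, used only for illustration.
# The filter below relies on nothing but a datetime-valued created_at.
FakeTweet = namedtuple("FakeTweet", ["screen_name", "created_at"])


def parse_window(start_text, end_text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse two 'yyyy-mm-dd hh:mm:ss' strings into (start, end) datetimes."""
    return datetime.strptime(start_text, fmt), datetime.strptime(end_text, fmt)


def within_window(tweets, start, end):
    """Yield tweets with start <= created_at < end, one at a time.

    This is a generator, so it never holds the full result set in memory.
    """
    for tweet in tweets:
        if start <= tweet.created_at < end:
            yield tweet
```

Under these assumptions, the loop in the script could iterate over `within_window(cursor_items, start, end)`, where `cursor_items` is whatever the date-only search returns (for example `since='2014-09-18'` and `until='2014-09-19'`), and write each surviving tweet to the CSV file as before.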

python-2.7 twitter tweepy
