Visualising Twitter activity – a work in progress part II: Getting the tweets

In part I I covered the basic idea, and the hardware. Now I’m ready to turn my attention to actually getting my hands on some tweets.

Twitter has an API, an Application Programming Interface, that allows programs to talk directly with the Twitter servers. In order to gain access, I need to have an authenticated Twitter account. I already have that, though it was not easy, as Twitter requires a phone number to be associated with the account. That can only happen if the phone company supports receiving SMSes from Twitter. My phone company did not until recently.

That hurdle tackled, I go to the Twitter Apps site. I have to be logged in, and click “Create New App”. After I’ve filled out the required information, I’m provided with four long random strings. I’m not going to show them here. Let’s just say that they are long, and contain both digits and letters, both upper and lower case. I get a Consumer Key and a Consumer Secret, which identify the application I’m going to write. And I get an Access Token and an Access Token Secret, which let that application act on behalf of my Twitter account. Those four strings are my credentials, which I’m going to use to gain access.

Next step. I could write a program that sends a request to the Twitter API, and receives data. Or rather, I couldn’t, because I’m not that good a programmer. As luck would have it, I’m not the first person in this situation.

But I’m getting ahead of myself. The first question should actually be: Which language should I write this in?

The de facto preferred language on the Raspberry Pi is Python. Python is not the fastest language in the world – it’s an interpreted language rather than a compiled one. But it has a very large user base, and the fact that it is interpreted makes development quick. And it is very popular in data science circles, probably because it is pretty accessible to people who are not really programmers.

I could write a program – script actually – from the bottom up, that handles all the connectivity with Twitter. Or rather I can’t. I’m not that good a programmer. But someone else has already written a library, a collection of code, that handles all the really difficult parts. Python has a LOT of libraries available. One of them is called Twython, a portmanteau of Twitter and Python. The intelligent reader of this post can guess what Twython does.

I do not need to understand everything myself. Others have done that, and on this page, I find an example of how to get tweets with Twython in Python. The code is as follows:

from twython import TwythonStreamer

class TweetStreamer(TwythonStreamer):
    def on_success(self, data):
        if 'text' in data:
            print data['text'].encode('utf-8')

    def on_error(self, status_code, data):
        print status_code
        self.disconnect()

# replace these with the details from your Twitter Application
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

streamer = TweetStreamer(consumer_key, consumer_secret,
                         access_token, access_token_secret)

streamer.statuses.filter(track = 'python')

OK. I enter my credentials from Twitter, and run the code: sudo python collect.py.

And tweets containing the word “python” are streaming down my screen:

Actually it was a bit more difficult. The Twitter API requires a secure connection, and none of the requisite libraries handling secure connections were installed. That took quite a lot of googling. A pro tip: Read the error messages you get. Some of them contain links to solutions.
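One more practical note on the four credentials pasted into the script: they are effectively passwords, so one option is to keep them out of the file and read them from environment variables instead. The sketch below assumes that approach. The variable names (TWITTER_CONSUMER_KEY and so on) and the load_credentials helper are made up for this illustration and are not something Twitter or Twython prescribes. It uses Python 3 syntax, unlike the Python 2 script above.

```python
import os

# Hypothetical environment variable names, chosen for this example only.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [env for env in ENV_NAMES.values() if not environ.get(env)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(sorted(missing)))
    return {name: environ[env] for name, env in ENV_NAMES.items()}
```

The four values could then be handed to TweetStreamer in the same order as before. Bear in mind that running a script with sudo often resets the environment, so the variables may need to be passed through explicitly.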

I must admit that I’m not quite sure what is actually happening. Object-oriented programming is not my strong suit. Never mind, I just need to understand enough.
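Roughly, the script defines a class that inherits all the connection handling from TwythonStreamer and only overrides two methods, which the parent class calls whenever something arrives. The toy sketch below shows that pattern without any network access. MiniStreamer and its feed method are invented for this illustration and say nothing about how Twython is implemented internally. It uses Python 3 syntax.

```python
class MiniStreamer:
    """Stand-in for a streaming base class (hypothetical, not Twython)."""

    def feed(self, messages):
        # The base class owns the loop and decides which callback to call.
        for status_code, data in messages:
            if status_code == 200:
                self.on_success(data)
            else:
                self.on_error(status_code, data)

    def on_success(self, data):
        pass  # subclasses override this

    def on_error(self, status_code, data):
        pass  # subclasses override this


class CollectingStreamer(MiniStreamer):
    """Subclass that only fills in the two callbacks."""

    def __init__(self):
        self.texts = []
        self.errors = []

    def on_success(self, data):
        # Only keep messages that actually carry tweet text.
        if 'text' in data:
            self.texts.append(data['text'])

    def on_error(self, status_code, data):
        self.errors.append(status_code)


streamer = CollectingStreamer()
streamer.feed([(200, {'text': 'hello python'}), (200, {'delete': {}}), (420, None)])
print(streamer.texts)   # ['hello python']
print(streamer.errors)  # [420]
```

The second message has no 'text' field and is silently skipped, which is the job of the `if 'text' in data` check in the real script: a stream can deliver messages that are not tweets.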

The line

streamer.statuses.filter(track = 'python')

tells the API that I’m interested in tweets containing the word “python”.
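According to Twitter’s documentation for the streaming filter, track takes a comma-separated list of phrases, where a comma acts as OR and a space inside a phrase acts as AND. The sketch below builds such a string from a list of keywords. build_track is a hypothetical helper written for this post, not part of Twython, and the keywords are only examples.

```python
def build_track(phrases):
    """Join search phrases into one comma-separated track string."""
    # Drop surrounding whitespace and skip empty entries.
    cleaned = [p.strip() for p in phrases if p.strip()]
    return ','.join(cleaned)


track = build_track(['python', 'raspberry pi', ' twython '])
print(track)  # python,raspberry pi,twython
```

The resulting string could be passed in place of 'python' in the call to streamer.statuses.filter.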

And this part of the script:

    def on_success(self, data):
        if 'text' in data:
            print data['text'].encode('utf-8')

does something with each tweet I receive. The tweet is returned from Twitter in a format called JSON, which Twython decodes and hands to me in the variable called “data”. The format has different fields; one of them is “text”. I can access the content of that field with data['text']. I may need to do something about the encoding. People tweet in a lot of different languages, so tweets are encoded in UTF-8. As you can see in the screenshot, my computer does not really know what to do with exotic characters.
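To make the JSON and encoding steps concrete, here is a small sketch that needs no Twitter connection. The JSON string is a made-up, heavily trimmed stand-in for a real tweet, and the code uses Python 3 syntax, where print is a function and text is Unicode by default.

```python
import json

# Made-up, heavily trimmed example of the JSON for one tweet.
raw = '{"text": "Jeg elsker python \\u2764", "lang": "da"}'

data = json.loads(raw)          # JSON text -> Python dictionary
text = data['text']             # the content of the "text" field
encoded = text.encode('utf-8')  # Unicode string -> UTF-8 bytes

print(type(data).__name__)      # dict
print(len(text), len(encoded))  # 19 21: the heart takes three bytes in UTF-8
```

The difference between the two lengths is the point: one character on screen can be several bytes on the wire, and a terminal that does not expect UTF-8 will show those bytes as garbage.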

There are a lot of other fields containing data in the tweets. But the main objective is achieved – I can get tweets on my screen in real time. If you tweet something containing the word “python”, I will get it on my screen moments later.
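A few of those other fields, as found in Twitter’s tweet objects, are created_at, lang, and the nested user object with its screen_name. The sketch below pulls them out of a dictionary shaped like data. The sample values are invented, summarise is a hypothetical helper, and the code uses Python 3 syntax.

```python
def summarise(data):
    """Return a one-line summary of a tweet dictionary, tolerating missing fields."""
    user = data.get('user') or {}
    return '@{} [{}] {}: {}'.format(
        user.get('screen_name', '?'),
        data.get('lang', '?'),
        data.get('created_at', '?'),
        data.get('text', ''),
    )


# Invented sample values, shaped like the dictionary passed to on_success.
sample = {
    'created_at': 'Mon Jan 01 12:00:00 +0000 2018',
    'lang': 'en',
    'text': 'Learning python on a Raspberry Pi',
    'user': {'screen_name': 'example_user'},
}
print(summarise(sample))
```

Using get with a default rather than square brackets means a message that lacks one of the fields produces a placeholder instead of stopping the script.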

The second part of the mission is accomplished. On to the third. But first – we’re going to the mall for a burger.