The internet is really a cat-picture delivery system. One thing I have observed on Twitter is that images of cute kittens in a library setting receive a LOT of attention. Retweets, likes, all those endorphin-inducing things.
Images of cute puppies give the same results. But I have the distinct impression that people prefer cute kittens over puppies. That, however, is anecdotal evidence. Is there a way to measure it?
What we are interested in is the number of impressions. That is available from analytics.twitter.com. But-but-but. If the puppy tweet is from November 1st and the kitten tweet is from December 1st, and I am collecting the impressions on December 2nd, that won't give me a fair comparison. The puppy will have had far longer to collect impressions than the kitten. What I need is to follow the interactions with the two tweets over time.
A problem with this is that the Twitter API does not give me access to those numbers. I will need another way to get them. I have written a Python script that collects the CSV files from Twitter Analytics for a number of months back. It can be found on my GitHub page. Basically I use the Selenium package to let Python control a browser that automagically downloads the statistics. If I do that once every hour, I will be able to graph the number of interactions over time.
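The actual script lives on my GitHub page; below is only a minimal sketch of the idea, assuming Chrome and using placeholder values for the handle, the download folder and the export-button selector, all of which Twitter may change at any time.

```python
# A minimal sketch, not the real script: log in (manually or via saved
# cookies) first, then grab the export from the analytics page.
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
# Send downloads to a known folder instead of the browser default.
options.add_experimental_option(
    "prefs", {"download.default_directory": "/tmp/tweet_stats"}
)
driver = webdriver.Chrome(options=options)

# Placeholder handle; the page requires a logged-in session.
driver.get("https://analytics.twitter.com/user/YOUR_HANDLE/tweets")

# Placeholder selector for the "Export data" button.
driver.find_element(By.CSS_SELECTOR, "button.export").click()
driver.quit()
```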
I would like a rather comprehensive set of data. Not only kitties and puppies, but also cute hedgehogs, kangaroos etc. It would also be interesting to see if the time of day the tweet is made makes a difference. And a single tweet of a cat is not a lot; there should probably be several tweets of each species. We are talking quite a lot of tweets. And quite a lot of data.
I am going to use a Google spreadsheet as the backend. That will make it possible to graph the results as they come in.
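Something like the gspread package should make that straightforward. A minimal sketch, assuming a service account, a sheet named tweet-stats and a row layout of my own invention:

```python
# A minimal sketch of appending one hourly snapshot to the spreadsheet.
# The filename, sheet name and row layout are assumptions, not fixed choices.
import datetime
import gspread

gc = gspread.service_account(filename="credentials.json")
sheet = gc.open("tweet-stats").sheet1

# One row per tweet per hourly run: timestamp, tweet id, then some of
# the 14 metrics (impressions, retweets, likes, ...).
row = [datetime.datetime.now().isoformat(), "1234567890", 5021, 13, 42]
sheet.append_row(row)
```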
One thing to consider when working with Google Spreadsheets is the inherent limitations: a workbook has an upper limit of 2,000,000 cells. And I am going to gather rather a lot of data.
A quick back-of-the-envelope calculation:
Once every hour, I am going to download statistics covering the previous five months. I probably won't tweet more than 150 times per month. And I am going to do this for three months, with an average of 30 days in each month and 24 hours in every day. And for each tweet, I am going to collect 14 variables.
That results in: 14 × 24 × 30 × 3 × 150 × 5 = 22,680,000 cells.
A bit too much. How to reduce that? First of all, not every tweet I make will be relevant. I am still going to tweet about the benefits of reading instructions before deciding that you can't figure out how to do something. If I only collect the tweets relevant to this study, it will greatly reduce the amount of data. In the above calculation I collect information on a total of 450 tweets; I will probably only need data on 30 tweets per month. What also helps is that on the first day, I will get 24 data points on one tweet; on day two, 24 data points on two tweets, and so on. The total number of cells therefore grows gradually rather than starting at its maximum. That will probably save me, as the sketch below suggests.
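A quick sanity check of that intuition, assuming roughly one relevant tweet per day (30 per month) and that every earlier tweet is still tracked on each later day:

```python
# Back-of-the-envelope check of the reduced scenario.
VARIABLES = 14     # columns collected per tweet
RUNS_PER_DAY = 24  # one download every hour
DAYS = 3 * 30      # three months of 30 days

# On day d there are d tweets to record, so the total grows triangularly.
cells = sum(VARIABLES * RUNS_PER_DAY * day for day in range(1, DAYS + 1))
print(cells)  # 1375920 - comfortably below the 2,000,000-cell limit
```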
The steps will be as follows:
- Locate a suitable collection of images of cute animals in a library setting.
- Plan when to tweet them – taking the time of day into consideration.
- Collect the data regularly, as per the script mentioned above (see the scheduling sketch after this list).
- Analyse the data.
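For the regular collection in step three, a cron job would be the obvious choice; a plain Python loop works too. Here is a sketch with a hypothetical collect() standing in for the Selenium and gspread snippets above:

```python
# A simple hourly scheduler; a cron job would do the same more robustly.
import time

def collect():
    """Download the analytics CSVs and append the rows to the sheet,
    as sketched earlier. Hypothetical wrapper, details omitted."""

while True:
    collect()
    time.sleep(60 * 60)  # wait an hour between runs
```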