Wednesday, 18 February 2009

Tracking conferences (at Dev8D) with python, twitter and tags

There was so much going on at (#dev8d) that it might be foolish for me to attempt to write up what happened.

So, I'll focus on a small, but to my mind, crucial aspect of it - tag tracking with a focus on Twitter.

The Importance of Tags

First, the tag (#)dev8d was spread over a number of social sites - Flickr (dev8d tagged photos), Twitter (dev8d feed), blogs such as the JISCInvolve Dev8D site, and so on. This was done not just for publicity, but as a means to track and re-assemble the various inputs to and outputs from the event.

Flickr has some really nice photos on it, shared by people like Ian Ibbotson (who caught an urban fox on camera during the event!). While there was an 'official' dev8d flickr user, I expect the most unexpected and most interesting photos to be shared by other people who kindly add the dev8d tag so we can find them. For conference organisers, this means that there is a pool of images that we can choose from, each with its own provenance, so we can contact the owner if we want to re-use or re-publish one. Of course, if the owner puts a CC licence on them, it makes things easier :)

So, asserting a tag or label for an event is a useful thing to do in any case. But twinning this with a messaging system like Twitter means that you can coordinate, share, and bring together an event. There was a projector in the Basecamp room, which was either the bar or one of the large basement rooms at Birkbeck, depending on the day. Initially, this was used to run through the basic flow of events, which was primarily organised through a wiki, of which all of us and the attendees were members.

Projecting the bird's eye view of the event

I am not entirely sure whose idea it was to use the projector to follow the dev8d tag on twitter, auto-refreshing itself every minute, but it would be one or more of the following: Dave Flanders (@dfflanders), Andy McGregor (@andymcg) and Dave Tarrant (@davetaz). Dave Tarrant is also known as BitTarrant, due to his network wizardry in keeping the wifi going despite the best efforts of Birkbeck's network to stop any form of useful networking.

The funny thing about the feed being there, was that it felt perfectly natural from the start. Almost like a mix of notice board, event liveblog and facebook status updates, but the overall effect was like it was the bird's eye view of the entire event, which you could dip into and out of at will, follow up on talks you weren't even attending, catch interesting links that people posted, and just follow the whole event while doing your own thing.

Then things got interesting.

From what I heard, a conversation in the bar about developer happiness (involving @rgardler?) led Sam Easterby-Smith (@samscam) to create a script that dug through the dev8d tweets looking for n/m (like 7/10) and used that as a mark of happyness, e.g.
" @samscam #dev8d I am seriously 9/10 happy HOW HAPPY ARE YOU? " (Tue, 10 Feb 2009 11:17:15)

It then computed the average happyness and overall happyness of those who tweeted how they were doing!
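As a rough sketch of the idea (this is not Sam's actual script), the n/m pattern can be pulled out with a regular expression and rescaled to a mark out of 10. The function names, the rescaling to 10 and the reading of 'overall' as the sum of the scores are all assumptions made for illustration, and the code is written for Python 3.

```python
import re

# Matches "n/m" ratings such as 7/10 or 3/5; the score may be a decimal.
RATING = re.compile(r"(\d+(?:\.\d+)?)\s*/\s*(\d+)")

def happyness_score(tweet, scale=10.0):
    """Return the first n/m rating in the tweet rescaled to `scale`, or None."""
    match = RATING.search(tweet)
    if match is None:
        return None
    n, m = float(match.group(1)), float(match.group(2))
    if m == 0 or n > m:
        return None  # ignore input that cannot be a sensible rating
    return scale * n / m

def summarise(tweets):
    """Return (overall, average) over every tweet that carries a rating."""
    scores = [s for s in map(happyness_score, tweets) if s is not None]
    if not scores:
        return 0.0, 0.0
    overall = sum(scores)  # assumption: 'overall' is the sum of the scores
    return overall, overall / len(scores)

tweets = [
    "@samscam #dev8d I am seriously 9/10 happy HOW HAPPY ARE YOU?",
    "@samscam #dev8d based on instant discovery of bugs in the Happier Pipe am now only 3/5 happy",
    "#dev8d a tweet with no rating in it",  # hypothetical example
]
print(summarise(tweets))
```

Rescaling means a 3/5 counts the same as a 6/10, and tweets without a rating are simply left out of both figures.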

Of course, being friendly, constructive sorts, we knew the best way to help 'improve' his happyometer was to try to break it by sending it bad input... *ahem*.
" @samscam #dev8d based on instant discovery of bugs in the Happier Pipe am now only 3/5 happy " (Tue, 10 Feb 2009 23:05:05)
BUT things got fixed, and the community got involved and interested. It caused talk and debate, and got people wondering how it was done, how they could do the same thing and how to take it further.

At which point, I thought it might be fun to 'retweet' the happyness ratings as they change, to keep a running track of things. And so, a purpose for @randomdev8d was born:

How I did this was fairly simple: I grabbed his page every minute or so, used BeautifulSoup to parse the HTML, got the happyness numbers out and compared them to the last ones the script had seen. If there was a change, it tweeted it and, seconds later, the projected tweet feed updated to show the new values - a disproportionate feedback loop, the key to involvement in games; you do something small like press a button or add 4/10 to a message, and you can affect the stock-market ticker of happyness :)

If I had been able to give my talk on the python code day, the code to do this would contain zero surprises, because I covered 99% of this - so here's my 'slides'[pdf] - basically a snapshot screencast.

Here's the crufty code though that did this:
import time
import urllib
import simplejson, httplib2, BeautifulSoup

h = httplib2.Http()      # client used to post the tweet
happy = httplib2.Http()  # client used to fetch the happiness page
o = 130.9                # last overall happiness seen
a = 7.7                  # last average happiness seen

while True:
    print "Checking happiness...."
    (resp, content) = happy.request('')
    soup = BeautifulSoup.BeautifulSoup(content)
    # The two figures are read from the third and fifth <div> on the page
    overallHappyness = soup.findAll('div')[2].contents
    avergeHappyness = soup.findAll('div')[4].contents
    over = float(overallHappyness[0])
    ave = float(avergeHappyness[0])
    print "Overall %s - Average %s" % (over, ave)
    # Work out which way each figure has moved since the last check
    omess = "DOWN"
    if over > o:
        omess = "UP!"
    amess = "DOWN"
    if ave > a:
        amess= "UP!"
    if over == o:
        omess = "SAME"
    if ave == a:
        amess = "SAME"
    if not (o == over and a == ave):
        print "Change!"
        o = over
        a = ave
        tweet = "Overall happiness is now %s(%s), with an average=%s(%s) #dev8d (from" % (overallHappyness[0], omess, avergeHappyness[0], amess)
        data = {'status':tweet}
        body = urllib.urlencode(data)
        (rs,cont) = h.request('', "POST", body=body)
    else:
        print "No change"
    time.sleep(60)  # check again in a minute or so
(Available from with syntax highlighting - NB this was written beat-poet style, written from A to B with little concern for form. The fact that it works is a miracle, so comment on the code if you must.)
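For anyone wanting to reuse the comparison step on its own, here is a hedged sketch of it pulled out into two small functions, written for Python 3. The names direction and build_tweet are hypothetical, and the trailing '(from' part of the message is left off.

```python
def direction(new, old):
    """Label a change the way the tweets did: UP!, DOWN or SAME."""
    if new > old:
        return "UP!"
    if new < old:
        return "DOWN"
    return "SAME"

def build_tweet(over, ave, last_over, last_ave):
    """Return the status text, or None when neither figure has moved."""
    if over == last_over and ave == last_ave:
        return None
    return "Overall happiness is now %s(%s), with an average=%s(%s) #dev8d" % (
        over, direction(over, last_over), ave, direction(ave, last_ave))

# Hypothetical new reading compared against the starting values in the script.
print(build_tweet(131.5, 7.7, 130.9, 7.7))
```

Keeping the comparison free of any network calls makes it easy to check without fetching the page or posting anything.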

The grand, official #Dev8D survey!

... which was anything but official, or grand. The happyness-o-meter idea led BitTarrant and me to think "Wouldn't it be cool to find out what computers people have brought here?" Essentially, finding out what computer environment developers choose to use is a very valuable thing - developers choose things which make our lives easier, by and large, so finding out which setups they use by preference to develop or work with could guide later choices, such as being able to actually target the majority of environments for wifi, software, or talks.

So, on the Wednesday morning, Dave put out the call on @dev8d for people to post the operating systems and hardware they had brought to the event, in the form OS/HW. I then busied myself with writing a script that hit the twitter search api directly and parsed the results itself. As this was a more deliberate script, I made sure that it kept track of things properly, pickling its per-person tallies. (You could post multiple configurations in one or more tweets, and it kept track of them per person.) This script was a little bloated at 86 lines, so I won't post it inline - plus, it also showed that I should've gone to the regexp lesson, as I got stuck trying to do it with regexp, gave up, and then used whitespace-tokenising... but it worked fine ;)
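The whitespace-tokenising approach might look something like the following. This is a sketch written for Python 3, not the 86-line script itself: the function names and the sample tweets are made up for illustration, and fetching from the search api is left out entirely.

```python
import pickle
from collections import Counter, defaultdict

def parse_configs(text):
    """Return the (OS, HW) pairs found by whitespace-tokenising a tweet."""
    pairs = []
    for token in text.split():
        token = token.strip(".,;:!?()")
        # Skip tags, mentions and anything that is not a single OS/HW pair
        if token.startswith(("#", "@")) or token.count("/") != 1:
            continue
        os_name, hw = token.split("/")
        if os_name and hw:
            pairs.append((os_name.lower(), hw.lower()))
    return pairs

def tally(tweets):
    """Map each user to the set of configurations they reported."""
    per_person = defaultdict(set)
    for user, text in tweets:
        per_person[user].update(parse_configs(text))
    return per_person

def totals(per_person):
    """Count operating systems and hardware types across everybody."""
    os_counts, hw_counts = Counter(), Counter()
    for configs in per_person.values():
        for os_name, hw in configs:
            os_counts[os_name] += 1
            hw_counts[hw] += 1
    return os_counts, hw_counts

# Hypothetical sample tweets, not taken from the real survey data.
tweets = [
    ("alice", "#dev8d Linux/laptop and OSX/iphone"),
    ("bob", "#dev8d linux/netbook"),
    ("alice", "#dev8d Linux/laptop"),  # a repeat does not count twice
]
per_person = tally(tweets)
os_counts, hw_counts = totals(per_person)
print(os_counts, hw_counts)

# Pickle the per-person tallies so a restart does not lose them.
saved = pickle.dumps(dict(per_person))
```

Storing a set per person is what stops a repeated tweet from being counted twice. A happyness rating such as 9/10 would be misread as a configuration by this sketch, so those tweets would need filtering out first.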

Survey code:

Survey results:

Linux was the most common at 42%, closely followed by Apple at 37%, with MS-based OSes at 18% and a stellar showing of one user of OpenSolaris (4%)!

Hardware type:
66% were laptops, with 25% of the machines there being classed as netbooks. 8% of the hardware there were iPhones too, and one person claimed to have brought Amazon EC2 with them ;)

The post hoc analysis

Now then, having gotten back to normal life, I've spent a little time grabbing stuff from twitter and digging through it. Here is the list of the 1300+ tweets with the #dev8d tag in them, published via google docs, and here are some derived things posted by Tony Hirst (@psychemedia) and Chris Wilper (@cwilper) seconds after I posted this:

Tagcloud of twitterers: [java needed]

Tagcloud of tweeted words: [java needed]

And a column of all the tweeted links:

This led me to dig through them and republish the list of tweets, but try to unminimise the urls and try to grab the <title> tag of the html page each one goes to, which you can find here:
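Unminimising a url and grabbing the title can be sketched with nothing but the Python 3 standard library. This is an illustration rather than the code I actually ran: the names TitleParser, extract_title and expand are hypothetical, and it assumes the page is UTF-8 and that the title appears within the first 64KB.

```python
import urllib.request
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)

def extract_title(html):
    """Return the page title with whitespace collapsed, or None."""
    parser = TitleParser()
    parser.feed(html)
    return " ".join("".join(parser.parts).split()) or None

def expand(short_url, timeout=10):
    """Follow any redirects and return (final_url, title)."""
    with urllib.request.urlopen(short_url, timeout=timeout) as resp:
        final_url = resp.geturl()  # the url after redirects
        html = resp.read(65536).decode("utf-8", errors="replace")
    return final_url, extract_title(html)

page = "<html><head><title>YouTube - Rick Astley - Never Gonna Give You Up</title></head></html>"
print(extract_title(page))
```

Only extract_title is exercised here, so the example runs without touching the network; expand would be called once per shortened link.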

(Which incidentally led me to spot that there was one link to "YouTube - Rick Astley - Never Gonna Give You Up", which means the hacking was all worthwhile :))

Graphing Happyness

For one, I've re-analysed the happyness tweets and posted up the following:
It is easier to understand the averages as graphs over time of course! You could also use Tony Hirst's excellent write up here about creating graphs from google forms and spreadsheets. I'm having issues embedding the google timeline widget here, so you'll have to make do with static graphs.

Average happyness over the course of the event - all tweets counted towards the average.

Average happyness, but only the previous 10 tweets counted towards the average making it more reflective of the happyness at that time.
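The difference between the two graphs comes down to how much history each average remembers. A small Python 3 sketch, with made-up scores and hypothetical function names, shows the two calculations side by side:

```python
from collections import deque

def running_average(scores):
    """Average of every score seen so far, one value per tweet."""
    total, out = 0.0, []
    for count, score in enumerate(scores, start=1):
        total += score
        out.append(total / count)
    return out

def windowed_average(scores, window=10):
    """Average of the most recent `window` scores, one value per tweet."""
    recent, out = deque(maxlen=window), []
    for score in scores:
        recent.append(score)  # the oldest score drops out automatically
        out.append(sum(recent) / len(recent))
    return out

# Hypothetical scores: a happy start, a burst of zeros, then recovery.
scores = [8, 8, 8, 0, 0, 0, 8, 8, 8, 8]
print(running_average(scores)[-1])
print(windowed_average(scores, window=3)[-1])
```

The running average never fully forgets the burst of zeros, whereas the windowed one recovers as soon as they fall out of the window, which is why the second graph tracks the mood at the time more closely.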

If you are wondering about the first dip, that was when we all tried to break Sam's tracker by sending it bad data, so a lot of 0 happyness ratings were recorded :) As for the second dip, well, you can see the cause for yourselves in the log of happyness :)


Christopher Gutteridge said...

I ran into an interesting problem. For some reason, at some point, my twitter account has been flagged as a possible spam, maybe, it's not really clear to me.

The result of this is that I don't show up in searches of twitter so all my comments were arbitrarily absent, and my happiness had no effect on the average.

It's an interesting issue, that I was disenfranchised by a disinterested service provider, with no obvious means to undo it without creating a new account.

I've had google pull my adwords service from a site, a few years back, also with no adequate explanation. Such brutal decisions are the only cost-effective way to police a free service, but if Google pulled my account I would lose much data (calendar, some google docs) and my G1 phone uses my google account.

I'd be dead in a week...

Anonymous said...

potentially a very good reason why we should all jump to OSS <- what's position on this stuff? I mean google has more servers filled with spam than anything else right? Or maybe you really just are that dodgy Chris! 8D !!!

Mia Ridge said...

One side-effect of using the event tag for things like the happiness rating and equipment survey - the signal:noise ratio for the #dev8d tag changed. I found it a lot harder to find the content tweets and probably missed some good stuff - maybe another time variations could be used for the off-shoots?

Ben O'Steen said...

@mia one thing I was considering was an anti-tag - e.g. #!dev8d - so that searches for 'dev8d' would hit it, but '#dev8d' shouldn't.

The other tweak to mention is that booleans work on the twitter search:

'#dev8d -from:randomdev8d' would get all the #dev8d posts, but exclude those from randomdev8d.

Likewise, to get all the replies to a person, you can search for 'to:username', handy to track people responding to a person.