## Monday, December 06, 2010

### PageRank

In terms of getting some idea of a person's 'influence' on Twitter, working out (a variation on) PageRank is more effective than counting followers and followers of followers, but much less effective than using HP Labs' modified HITS algorithm.

But to count followers of followers I'd need to put in more effort than it's worth, and to do HITS I'd need to scrape an arse load of tweets to count who retweets who and how often.

#### PageRank

So PageRank basically measures how many people you're connected to, as well as who those people are (connectors and all that). It's most associated with Google, who use it as part of their search result ranking algorithm.

In this case we replace pages with users, and for links we say Alice following Bob is equivalent to page A linking to another page B.
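To make that mapping concrete, here's a minimal sketch of turning a list of follows into the link matrix that PageRank works on. The names and the `follows` list are made up for illustration, and `build_link_matrix` is a hypothetical helper rather than anything from the original calculation. It assumes the usual convention that someone who follows nobody is treated as linking to everyone equally.

```python
# Hypothetical follow list: (follower, followed). Names are made up for illustration.
follows = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "alice"),
]


def build_link_matrix(follows):
    """Return (users, M), where M[i][j] is the share of user j's follows that point at user i."""
    edges = set(follows)  # ignore duplicate follows
    users = sorted({user for pair in edges for user in pair})
    index = {user: i for i, user in enumerate(users)}
    n = len(users)

    # Count how many people each user follows (their "outgoing links").
    out_count = [0] * n
    for follower, _ in edges:
        out_count[index[follower]] += 1

    # Each follow passes on an equal share of the follower's weight.
    M = [[0.0] * n for _ in range(n)]
    for follower, followed in edges:
        j, i = index[follower], index[followed]
        M[i][j] = 1.0 / out_count[j]

    # Assumed convention: a user who follows nobody links to everyone equally.
    for j in range(n):
        if out_count[j] == 0:
            for i in range(n):
                M[i][j] = 1.0 / n

    return users, M


users, M = build_link_matrix(follows)
```

Each column of `M` sums to 1, so a column can be read as "where this user's attention goes".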

#### Another Updated Graph

Working out PageRank was easier than the other two for one simple reason - I already had a lot of the data I needed from previous network graphs. Obviously I had to update it first, and this is the graph as it stands (without me):

Which has become almost inexplicably more complex between the last one and this one. But there you go.

#### Results

Anyway, I followed the algebraic approach from the wiki article (it looked like the easiest for the data I had) and the results, in rank order, are as follows:
[When looking at the ranks, it might help to refer to the graph above.]

You have to bear in mind that this particular PageRank calculation is only for my friend network, and assumes it's isolated from the rest of Twitter for simplicity. To get everyone's proper PageRank you'd have to analyse the whole of Twitter.
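For anyone wanting to try this themselves, the algebraic approach amounts to solving one linear system, (I - dM)R = ((1 - d)/N)1, where M is the link matrix, N the number of users and d the damping factor. What follows is only a sketch under assumptions: the three-user matrix is made up, d = 0.85 is the commonly quoted default rather than a value taken from this calculation, and `solve` and `pagerank_algebraic` are hypothetical helper names.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest entry in this column, for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitute from the last row upwards.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def pagerank_algebraic(M, d=0.85):
    """Solve (I - d*M) R = ((1 - d) / N) * 1 for the rank vector R."""
    n = len(M)
    A = [[(1.0 if i == j else 0.0) - d * M[i][j] for j in range(n)]
         for i in range(n)]
    b = [(1.0 - d) / n] * n
    return solve(A, b)


# Made-up network: alice follows bob and carol, bob follows carol, carol follows alice.
# Column j holds the shares of user j's follows; order is alice, bob, carol.
M = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
ranks = pagerank_algebraic(M)
for name, rank in sorted(zip(["alice", "bob", "carol"], ranks), key=lambda p: -p[1]):
    print(f"{name}: {rank:.4f}")
```

Because every column of `M` sums to 1, the ranks sum to 1 as well, which only holds within the isolated network being analysed.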

#### WTF?

The only question now is, what do the results actually mean? I'm not entirely clear on that.

In the case of websites,

> PageRank is a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page

I think in the Twitter case it's more to do with how tweets spread and who in the network is more (or less) likely to see them.
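One way to poke at the 'probability distribution' reading is to compute the ranks iteratively instead. Under the mapping used here, the random clicker becomes someone hopping from a user to one of the people they follow, occasionally jumping to a random user. The sketch below is an illustration under assumptions, not the calculation from this post: the three-user matrix is made up, d = 0.85 is the commonly quoted damping factor, and `pagerank_iterative` is a hypothetical name.

```python
def pagerank_iterative(M, d=0.85, tol=1e-12, max_iter=1000):
    """Apply r <- (1 - d)/N + d * M r repeatedly until r stops changing."""
    n = len(M)
    r = [1.0 / n] * n  # start with the hopper equally likely to be anywhere
    for _ in range(max_iter):
        new = [
            (1.0 - d) / n + d * sum(M[i][j] * r[j] for j in range(n))
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(new, r)) < tol:
            return new
        r = new
    return r


# Made-up network: alice follows bob and carol, bob follows carol, carol follows alice.
# Column j holds the shares of user j's follows; order is alice, bob, carol.
M = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
r = pagerank_iterative(M)
print(r, sum(r))
```

The values stay non-negative and sum to 1 at every step, so each one can be read as the long-run chance of finding the hopper on that user's profile.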

If I ever work it out I'll let you know. Otherwise, feel free to offer your own explanation.

Oatzy.