Questions Nobody Asked..: September 2010

Wednesday, September 29, 2010

The Internet Vs. Anti-Piracy

Okay, so this started a few weeks back, when anti-piracy firm AiPlex Software was hired to have torrents of some Bollywood films taken down from indexing/tracking sites (including The Pirate Bay) - which don't host infringing files, but provide links to them.

They made their requests, and when the sites in question didn't comply, AiPlex went a step further than any other anti-piracy company had, and DDoSed the sites (temporarily).

Now setting aside for a second the fact that DDoSing is illegal, this was not the wisest move they could've made.

Anonymous

"This is unacceptable to Anonymous. The time has come to show these fuckers that we will not tolerate this."

"Operation Payback is a Bitch" was devised as a means of retaliation by Anonymous, in close association with the infamous image-board site 4chan (among others).

Anonymous are an interesting group. As internet activists, they may well stand for justice, but their definition of justice isn't always in line with, say, the law's - it's very vigilantist. The important thing to keep in mind though, is that Anonymous is akin to a flash mob, and therefore any given 'mob' is going to vary in membership, and even overall ideology, etc.

But in general, Anonymous will act in favour of whatever gives them lulz, or makes a lamentable person suffer.

This can be seen in their attacks on Scientology, on Sarah Palin, their tracking down of Dusty the cat's tormentor and cat-bin-lady or, perhaps most unsettling of all, their tormenting of an 11 year old girl (who, to be fair, probably had it coming). To name but a few.

Long story short, anyone who knows anything about 4chan and Anonymous will know that no sane person would ever risk incurring their wrath.

DDoS

Denial of Service occurs when a server is overloaded with requests. When this happens, the server in question will either slow to a crawl and eventually become unavailable to new requests, or reset itself. In cases where the site is hosted on the same servers as a company's email, backups, assorted storage, etc., this can be quite inconvenient.

NB/ Facebook's recent outage was an example of a self-induced denial of service.

A Distributed Denial of Service attack (DDoS) occurs when a group of individuals use a program - such as the Low Orbit Ion Cannon (LOIC) - to make hundreds, or even thousands of requests per second.
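As a rough illustration of why a flood of requests knocks a server over, here's a toy model in Python. The capacity, queue limit and request rates are made-up numbers for demonstration, not measurements from any of the sites mentioned.

```python
def simulate_load(requests_per_sec, capacity_per_sec=100, queue_limit=500, seconds=60):
    """Toy model: each second the server handles up to `capacity_per_sec`
    requests; the rest wait in a queue, and anything beyond `queue_limit`
    is dropped (i.e. the site looks 'down' to that visitor)."""
    queue = 0
    dropped = 0
    for _ in range(seconds):
        queue += requests_per_sec               # new requests arrive
        queue -= min(queue, capacity_per_sec)   # server works through what it can
        if queue > queue_limit:                 # backlog overflows
            dropped += queue - queue_limit
            queue = queue_limit
    return queue, dropped

normal = simulate_load(requests_per_sec=80)      # under capacity
flooded = simulate_load(requests_per_sec=2000)   # lots of people flooding at once
print("normal traffic (backlog, dropped):", normal)
print("flooded        (backlog, dropped):", flooded)
```

Under capacity, nothing queues up and nothing is dropped. Once arrivals outstrip capacity, the backlog hits its limit almost immediately and nearly every new request is turned away.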

In the case of the Australian anti-piracy group AFACT, their site was hosted on a cluster server, so when that site was taken out, all (supposedly) 8,000 other sites on the cluster went down as well, including small business and government websites.

DDoSing is generally more useful for making a statement, since the result is usually only temporary. But sometimes.. well, we'll get to that...

Again, this was simply overwhelming a server by flooding it with requests. This was done mostly by Script Kiddies running LOICs - which is as simple as entering the target IP and hitting "IMMA CHARGIN MAH LAZER".

To call this hacking is being far too generous; and from a legal point of view, inaccurate.

Operation Payback

A call to arms was posted on 4chan, and various other sites, demanding retaliation.

This started with revenge upon AiPlex, along with MPAA and RIAA. Then they went after notorious UK anti-piracy law firm and all-round bastards, ACS:Law, who have been guilty of sending out vast numbers of 'menacing' letters demanding money for copyright infringement.

The first attack on them took the site down for a couple of hours.

Head of the company, Andrew Crossley, made the remark:

"It was only down for a few hours. I have far more concern over the fact of my train turning up 10 minutes late or having to queue for a coffee than them wasting my time with this sort of rubbish."

Now I don't know about you, but antagonising your attackers seems like a poor move. Anonymous attacked again, this time knocking out the server and causing it to reset.

But here's what happened this time:

“Their site came back online [after the DDoS attack] – and on their frontpage was accidentally a backup file of the whole website (default directory listing, their site was empty), including emails and passwords,”

Basically, instead of showing the homepage (as it should have), it showed the file directory, on which was found a zip file containing all the emails, etc., all unencrypted.

Unsurprisingly, the backups were then put on The Pirate Bay, where they've been shared by lots and lots of people, out for revenge.

As of this time, there have been no reports of victims' sensitive data being used maliciously. Rather, downloaders are more interested in destroying ACS. And, in fact, some of the people who have downloaded the emails have tried to contact and alert the victims whose personal information has been exposed.

Oh, and attacks on various other anti-piracy websites are still ongoing. See here for more information or to participate.

Data Protection Breach

First of all, there's the personal information that's been leaked, including that of thousands of BSkyB customers. This is the part of the story being reported by various outlets.

And if anything, it's that which will get ACS in trouble, since this is a fairly major breach of the Data Protection Act - given that the data was unencrypted and was revealed by a clumsy server, rather than hacking (and PI acknowledge this wasn't hacking).

Word is, the punishment will be a fine of around £500,000, plus disciplinary action from the Solicitors Regulation Authority - and this wouldn't be the first time they'd faced disciplinary action from the SRA. So even if the company isn't completely destroyed by all this, they'll still have to pay up more than twice their takings from 'infringers' (~£220,000) - and Crossley might just have to give up that Jeep Compass 2.4CVT he bragged about.

Which is nice. But there was other interesting stuff in there.

Money

It is demonstrably true that ACS cares more about making money for itself than enforcing copyright or protecting artists.

The emails show that ACS were taking approximately 50% of the money retrieved from those accused. And in fact, only about 30% of the money retrieved went to the copyright owners. Which, to me, seems a bit off, in terms of fighting for the rights owners.

No. ACS:Law have jumped on copyright infringement as a way to make a quick buck for themselves, and frankly they deserve everything they get.

A lot of this has been classed as "legal blackmail". We see letters and emails from a vast number of people who, quite obviously, have been wrongfully accused - including old people who are very confused by the claims - but who are still paying up because they don't want to be taken to court. And in a lot of cases the victims have had to ask if they can pay in installments, since they can't afford to pay the lump sum.

NB/ Consumer group Which?, local councils and judges (amongst others) have reported receiving large numbers of complaints from people who have been harassed by ACS.

From what I understand, the victims are being accused of infringing individual (porn) movies or individual songs, and in each case ACS are offering a settlement payment of £495 - just below the "psychological barrier" of £500.

This being the 'claimed' damages resulting from sharing one movie - therefore implying that each infringer shared the movie with, on average, ~49 people. How do they justify this figure? They don't, and indeed can't (see below).

£495 is also low enough that it wouldn't be worth an accused individual disputing the claim in court, given the legal fees would be much higher than that amount - supposedly around £10,000.

IPs and ISPs

In terms of tracking down file-sharers, ACS pays 'monitoring companies', who will find, by various methods, which IP address is sharing a given file, and at what time. ACS then contacts ISPs asking for the physical addresses (supposedly) linked with the 'infringing' IPs.

The main ISP that's gotten very upset by the email leak is BSkyB, though there are others, including BT and PlusNet.

ISPs Virgin Media and TalkTalk, on the other hand, have both refused to give out such information.

To quote a commenter on Slashdot:

"ACS:Law were using Norwich Pharmacal civil orders against the ISPs; they basically demand information relevant to a future court case from a third party, in this case the ISP. Sky broadband chose not to contest these court orders, and just supinely handed over the data. Nor did they notify their subscribers that such an order was taking place, so they could fight it if they chose.

In fact, ACS:Law were combining these requests into huge tranches of data - one such recent was 25,000 BT Broadband IP addresses, expected to ID 15,000 subscribers.

Virgin and Talk Talk refused to go along with these orders without a fight - potentially forcing ACS:Law to do a Norwich Pharmacal order per individual IP, which would be ruinously expensive - so the leaked emails reveal that ACS:Law specifically did not target them."

If only other ISPs had the balls to say no as well...

The Revelation

If you'd seen the numbers, you'd be wondering why ACS don't pursue those who dispute the infringement claims, or just outright ignore them. In fact, only 30% of those sent letters settle and pay the £495.

The reason - ACS are aware of how flimsy their case would be, and how easy it would be to contest the claim of infringement.

So their problem is two-fold. First, an IP address isn't always very useful. For a start, IP addresses can be spoofed, or a smart user can hide behind a proxy - both meaning the IP address obtained will not match the one at the physical address of the infringer.

Or conversely, if the owner of a given IP address uses an unsecured wi-fi connection, for example, then again someone's going to get wrongfully accused.

But secondly - and this really is interesting - in one email, Crossley's own legal adviser says the following:

"establishing damages beyond the value of the gross profit of one copy of the work is problematic."

Basically, because of the way the monitoring works, they can only prove that the 'infringer' was downloading a copy of the infringing file to their computer.

In other words - while they can assume a user was 'sharing' the file, they can't prove it. And they certainly can't determine the number of people (if any) the user in question was 'sharing' with.

A law firm could (and they do) demand greater damages. But if an infringer could prove, for example, that they 'leeched' the file - that is, downloaded without uploading (sharing) to other users - then the only damages that can be claimed are for one copy of the file.

To Quote Ars Technica*:

"under UK law, damages are fixed at "economic loss, either realised or potential." When it comes to music tracks, the loss equals "the approximate market value of the track as a single download—79p.""

They would have been fools to take anyone to court, knowing that a defendant could potentially pull that defense, considering how the legal fees would far outweigh that pay-off. And on top of that, if they lost one case, they'd no longer be able to demand £495 - the end of their little extortion business.
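To put the gap between the demand and the provable loss in numbers, here's a quick back-of-the-envelope calculation in Python. It only uses the two figures already quoted (the £495 settlement and the 79p track value); the variable names are just for illustration.

```python
settlement = 495.00   # pounds demanded in each letter
track_value = 0.79    # pounds, market value of one track per the Ars Technica quote

# How many single-track downloads would add up to the amount demanded?
implied_copies = settlement / track_value
print(f"GBP {settlement:.0f} is the value of about {implied_copies:.0f} tracks "
      f"at GBP {track_value:.2f} each")
```

So for a single song, the demand is several hundred times what the quoted rule would allow for one copy.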

Warning

Don't think this renders pirating essentially risk-free. It doesn't. Especially not if you're in America.

Since, in America, copyright holders don't need to prove actual economic harm to demand (outrageously large) statutory damages, the RIAA and MPAA are still milking (alleged) infringers for every last penny they have - and then some.

NB/ The RIAA even tried to sue a deceased woman, who didn't even own a computer while alive(!).

But if you are ever accused of infringement and believe it to be wrongful, tell them so. If you need more advice, visit Being Threatened.

If, on the other hand, the accusations are true, use your best judgement. If you think the settlement is just, pay it. If not, you can either try ignoring it, or seek legal advice in so far as disputing the amount demanded.

And in fact, if you're going to pay up the value of the files you downloaded, you may as well do so by buying physical copies, rather than giving the likes of ACS ~50% of that money.

But if you are summoned to court, you'd better show up AND get yourself a good solicitor, or you could find yourself paying thousands.

Or, you know, you could just not do it in the first place..

Oatzy.

* This quote and lots more details about what bastards ACS:Law are can be found here.

Tuesday, September 21, 2010

"Your Album"

Here's how the game works:

1. Go to Flickr and sort by last seven days. Pick the fifth image that shows up. This is your album art.

2. Go to Wikipedia and click random article. This is your band name.

3. Go to Wikiquote and click random page. Use the last 3-5 words of the first quote. This is your album title.

4. Assemble your album.

Here's mine :

[Flickr image source]

Give it a go!

Oatzy.

[edit] One courtesy of Annie here.

The Twitter Virus

Background

So here's what's happened - Twitter has been 'hacked'. Now personally, I don't like such ambiguous use of the word hacked, since it tends to imply that some sort of infiltration or breaking in has taken place. But that's just me.

No. What actually happened is that someone discovered that URLs can be posted that include JavaScript (and in particular the "onMouseOver" function), which was executed when you hovered over said link. Then the Script Kiddies got their hands on it, and all hell broke loose.

This is an example of 'code injection' or 'cross-site scripting' - that is, code can be posted to a website - be it by a comment, a status update or whatever - and the site will execute it as if it were part of the site's own code.

For most sites, it's not possible, because comments, etc. are 'sanitised' so that code like this is removed or just displayed as plain-text, so it can't be executed. For example, on Den of Geek they strip away all HTML tags from comments, which has the drawback of disallowing formatting, but gets the job done. But these things can slip through the net from time to time.

One other recent example was on YouTube. YouTube would normally sanitise comments by stripping away the "<script>" tag, which should have prevented the problem. Except, someone realised that if you start the comment with "<script><script>", only the first tag is stripped away, so the code is still executed.
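Here's a minimal sketch, in Python, of why stripping a tag once isn't enough. The naive_strip function is a hypothetical stand-in for the kind of filter described above (I don't know how YouTube's actual filter was written); html.escape is the standard library's way of turning markup into harmless text.

```python
import html

def naive_strip(comment):
    # Hypothetical filter: removes only the FIRST "<script>" it finds.
    return comment.replace("<script>", "", 1)

def escape(comment):
    # Safer: turn every <, >, & and quote into an HTML entity,
    # so the browser displays the text instead of running it.
    return html.escape(comment, quote=True)

comment = "<script><script>alert('hi')</script>"
print(naive_strip(comment))   # one <script> tag survives the filter
print(escape(comment))        # no angle brackets left, so nothing to execute
```

The design point is that escaping everything is simpler and harder to dodge than trying to spot and remove the 'bad' bits.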

And this was used to cause all sorts of havoc - redirecting people to porn sites, or adding banners and pop-ups to videos (mostly on Justin Bieber videos), and so on.

In YouTube's case it took about an hour to spot the problem and two more to fix it (apparently). And apparently Twitter has been fixed now (approx. 2 hours later). So kudos to them.

The Code

What's interesting about this is how involuntarily it can be activated. All you have to do is hover over the link to execute it. Which, to be fair, is both simple and elegant (regardless of how inelegant the code itself looks).

The general form looks like this:

[some URL]/@"onmouseover="[some javascript]
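To see why that stray double quote matters, here's a rough Python sketch of a page template that drops a URL into a link without escaping it. Both render functions and the example.com URL are hypothetical; this is my guess at the general mechanism, not Twitter's actual code.

```python
import html

def render_link_unsafe(url):
    # Hypothetical template: puts the URL straight into the href attribute.
    return '<a href="' + url + '">' + url + '</a>'

def render_link_safe(url):
    # Escaping the double quote stops the URL from closing the attribute early.
    safe = html.escape(url, quote=True)
    return '<a href="' + safe + '">' + safe + '</a>'

payload = 'http://example.com/@"onmouseover="alert(1)'
print(render_link_unsafe(payload))   # the link tag gains an onmouseover attribute
print(render_link_safe(payload))     # the quote becomes &quot; and stays inside href
```

In the unsafe version, the quote in the URL ends the href early, so everything after it is read by the browser as a new attribute on the link.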

In its simplest form, the code might be something like this (via Sophos.com):

Which only uses the 'alert' function to create an annoying pop-up with some random message.

As for code that will redirect you to some other site, I can't find an example, but one way of doing it might take the general form but add something like:

window.location.href = [redirect URL]

Which is all very straightforward, very annoying and fairly boring. There's probably other havoc you can wreak that will temporarily redesign a person's homepage, graffiti it, or whatever.

The Virus

Now these are personal favourites - the self-retweeting tweets. Literally a self-replicating Twitter status virus.

Here's some example code:

They both basically do the same thing, though the latter does it more elegantly.

The first one finds the 'first text area' - i.e. the status input box - fills it with its own URL - this.innerHTML - hits the update button for you - ('.status-update-form').submit() - and then darkens the screen with modal-overlay, which means that the page itself is unreachable (without reloading) and clicking anywhere reactivates the exploit.

[Yeah, I wanted to see what it did :p]

For the second one, it gets the element with the tag Id="status". Here's a section of Twitter source code:

So that would be the status input box as with the first example. It's just a nicer way of going about it, in my opinion. But anyway.

And in this case, rather than just straight copying itself, it actually RTs the named user's last tweet (usually the code itself), which again is a nicer way of doing it. And it gives the original poster the credit they deserve (albeit with the potential risk of having their account suspended).

Video of the exploit in action here.

Another variant blacks out the actual status, like so:

And what that means is (a) you don't know what's going on under there, and (b) your curiosity is more likely to get the better of you.

The code is the same as the first of the above, except instead of the modal-overlay you have "style="color:#000;background:#000; - basically, 'make the status black text on black background'.

One Last Thing

Obviously, the exploit wasn't limited to "onMouseOver". Any JavaScript could've been used (so long as it was 140 characters or less). But nonetheless, the code would need a trigger, and mouseover was one of the best ways of doing it. Others apparently managed to make it activate by moving the cursor anywhere on screen. So yeah.

Personally I'd be interested to see how far this spread. And indeed, given the 'contagiousness' of it, it'd be quite useful for modelling how this - and indeed everything else - spread through Twitter (as I've previously talked about).

Again, a great part of this was the celebs and the connectors; including an early victim, Sarah Brown (Gordon's wife), who has over 1mil followers. And as I said, passing it on isn't as voluntary as RTing.

Why should we have been worried? Imagine the blackout version of the above, but with an added redirect to some malicious site. Yeah. Not that that should be a problem anymore.

I was going to say, if you want to have a play, learn a little JavaScript and have a go. But I guess you can't now. Shame.

Wonder if anyone's tried XSS on Facebook yet...

[Update]

The exploit has definitely been nullified. Twitter explain what happened here.

One of the discoverers of the exploit seems to be @Zzap [source], who had no malicious intent for it. More curiosity. And he, himself, was inspired by the now suspended "RainbowTwtr" - screenshot of how they used it here.

This guy from Kaspersky gives a brief analysis, including a graph of the exploit's growth over time (reaching 93 tweets per second at its peak). ThreatPost also has some analysis.

The exploitative tweets themselves don't seem to have been deleted by Twitter, but mouseover now does nothing (other than what a link is meant to do).

Anyway. As you were..

Oatzy.

[I'll be honest, I'm not an expert of any sort on JavaScript. But I know enough to get by.]

Wednesday, September 08, 2010

The Exact Change Problem

[Reblog: Originally titled 'Pimp My Change']

There's an old problem in discrete mathematics, known as the Subset Sum Problem which goes as follows:

Given a set of integers, does the sum of some non-empty subset equal exactly zero? For example, given the set { −7, −3, −2, 5, 8}, the answer is YES because the subset { −3, −2, 5} sums to zero. [wiki]

Or in another context, imagine you're at a restaurant and for some unknown and perverse reason, you want the total cost of your meal to add up to, say, £10. The problem is to pick items from the menu so as to satisfy this condition.

In terms of computer science, this problem is NP-complete; that is, no known algorithm solves every case in polynomial time, and the obvious approach of checking every subset takes time that grows exponentially as the set gets bigger.

In other words, if you wanna solve a big enough instance exactly (even with a computer), you've gotta be willing to wait several thousand years for an answer.

So let's now make the problem slightly different:
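For the curious, here's what the brute-force approach looks like in Python, using the example set from above. The function name is mine, and this is just the simplest possible method (try every subset), not a clever one.

```python
from itertools import combinations

def zero_subset(numbers):
    """Return the first non-empty subset that sums to zero, or None."""
    for size in range(1, len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == 0:
                return subset
    return None

print(zero_subset([-7, -3, -2, 5, 8]))   # (-3, -2, 5)

# The number of non-empty subsets doubles with every extra item:
print(2 ** 5 - 1, "subsets to check for 5 numbers")
print(2 ** 100 - 1, "subsets to check for 100 numbers")
```

Five numbers means 31 subsets, which is nothing. A hundred numbers means a 31-digit count of subsets, which is where the 'thousands of years' comes from.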

Given a certain monetary note [£5/10/20], find a subset of items in a shop such that the change given contains, in exact change, £1.30.

But, there does hide in this seemingly equivalent problem a few interesting caveats:

1) The change given needn't be exactly £1.30

2) If the change isn't exactly £1.30, you have to account for the fact change will usually be given in the smallest number of coins - e.g. £2 is more likely to be given as a £2 coin rather than two £1 coins (or any other arrangement).

3) By taking into account change you already have, the problem is slightly altered. But the solution is found in essentially the same way.

So why is any of this important?

While I'm waiting for my awesome new all access special pass to be delivered I have to pay for the bus, meaning £1.30 both ways on a bus that accepts exact change only.

Of course by making certain assumptions, as those above, it's fairly easy to work out, given a starting amount (a), how much you need to spend (s) to get £1.30 exactly. The hard part is finding items such that (a - s) = £1.30.

This is the essence of the Subset Sum Problem.

But in this instance, there is that loophole that the change needn't be exactly £1.30 - it just has to contain that amount.

So, for example, let's say we start with a £5 note. And let's say, since change will likely be given in the smallest number of coins, our change needs to contain at least one of each of 10p, 20p and £1 coins.

To get 10p & 20p - we need to buy something with a pence value in 11-20p or 61-70p

To get £1 - given the 30p will be taken from one of the five pounds, we have £4 to play with. So we need to buy something with a pound value of £1 or £3.

For other values of the target change (c) and the starting amount (a), the solutions to the above are relatively similar.

So from the above - if we start with £5 - we have a range of 40 (=(10+10) x 2) possible sums to add up to, cutting the problem down to a more manageable size.
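The counting above can be checked by brute force. This Python sketch assumes, as in the text, that the cashier gives change in the fewest coins possible (which for UK coins is the same as always handing over the biggest coin that fits), and that 'containing £1.30' means getting a £1, a 20p and a 10p coin. The function names are made up for the example.

```python
UK_COINS = [200, 100, 50, 20, 10, 5, 2, 1]   # coin values in pence

def greedy_change(amount):
    """Change for `amount` pence, always taking the biggest coin that fits."""
    coins = []
    for coin in UK_COINS:
        while amount >= coin:
            coins.append(coin)
            amount -= coin
    return coins

def useful_prices(note=500, needed=(100, 20, 10)):
    """Prices (in pence) whose change from `note` includes every needed coin."""
    return [price for price in range(1, note)
            if all(coin in greedy_change(note - price) for coin in needed)]

prices = useful_prices()
print(len(prices))   # 40
print(prices)
```

The 40 prices that come out are exactly the ones with a pound value of £1 or £3 and a pence value of 11-20p or 61-70p. Changing `note` to 1000 or 2000 gives the equivalent lists for a £10 or £20 note.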

So we don't have an exact solution, but a simplification to the problem instead. And in most cases that's good enough.

But one other thing I should perhaps mention, is that the change needn't all be collected in one transaction - i.e. it can be collected in smaller parts, such as 5p s or 50p s, etc.

Of course, I could just give the smallest amount I have exceeding £1.30. But that would mean giving away money unnecessarily. Or, you know, I could ask a cashier kindly to change the money for me... I dunno.

[Follow Up]

Firstly, if you're on campus, a quick and dirty solution to the change problem is to go to CostCutter and buy a Ginsters Tortilla Wrap [your choice of filling] at a cost of £3.19. A total rip-off, I know.

But the change will be £1.81, most likely given as £1, 50p, 20p, 10p and 1p. And this price lies within the range outlined in the previous blog. Simple.

Another thing to note, whilst playing this 'game', is that you're restricted in what you buy by what you like, what you're willing to spend to get the change and what it would be insane to buy just for the change - i.e. vast numbers of really small items like 1p sweets (if such things still exist).

And conversely, what you choose to buy may also affect what else you buy, if anything. So for example, if you bought the tortilla wrap, you may be wary about getting anything else, in case it ruined the change.

And one final thing to note is that if you plan to use the bus two or more times in one day, it may in fact be easier to get a day-rider, at £2.60, given that £2 is easier to get than single £1s and 60p is generally easier to get than two lots of 30p.

So yeah, just a little something to ponder. Hope it was easy enough to follow.

Till next time,

Oatzy.

Tuesday, September 07, 2010

Some Random Infographics

This is just a collection, as the title suggests, of random 'infographics' that I've made at some point or other, that don't warrant blogs on their own. Enjoy.

The X-Factor Volcano

Using TwitScoop I got this graph of tweet volume for last Saturday (times are approximate)

The show starts at 7:45, around that dip before the main 'mountain'. And probably ends around the back end of that

Would you..?

This was mostly just playing with a type of graph on ManyEyes - graphing (some of) the lyrics of Hero by Enrique Iglesias

Student Loans (So Far)

This basically graphs what I owe already, from two years of university study

The big dark blue are tuition fees; the light blue, maintenance loans. And the little, medium blue circles - interest!

Total? Nearly £15,000! Joy.

Write Her a Song

I got a list from some website, of songs about girls (where a first name is explicitly given). I counted occurrences of names and made this word cloud of popular names

So Mary, Jane, Maria, Jenny and Sally are the most popular names.

Obviously, this isn't comprehensive, since I only used a relatively small sample of songs. But you get the idea.

The Big Bang

My favourite. And this is the Doctor Who episode we're talking about (season 5 finale).

Basically, I plotted out the characters' timelines (a la XKCD), using a website I can't seem to find, in an attempt to make the plot clearer. It didn't turn out exactly as I wanted because the web-app had a mind of its own. But I'm still happy with it.

(click to enlarge)

And that's your lot. Til next time,

Oatzy.

Spread Through a Network

As a sort of quick follow-up to yesterday's post, I've re-edited the graph from last time to demonstrate how a 'virus' might spread through that network.

For this, we assume a 0.5 contact rate - that is, a person in contact with an infective has a 50:50 chance of becoming infected in the next 'turn'. An Infective will recover after one turn.

In the diagrams, then, red circle means infected; orange circle means exposed and able to catch the infection; blue circle means recovered and not able to catch the infection again. White circles are susceptible to infection, but not exposed to it.

So in this case, that all important central connector gets infected (eventually) and as a result, 73% of the total network gets infected.

If it hadn't, only 40% of the network would've been infected - 53% if all the nodes on that side were infected.

Of course, this is only one possible course for the infection. And if the contact rate were lower, it probably wouldn't have spread through as much of the network.
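If you want to play with this yourself, here's a rough Python sketch of the same rules: a 0.5 contact rate, and recovery after one turn. The little six-node network is made up for the example (it isn't the graph from my diagrams), and I'm assuming each infected neighbour gets its own 50:50 attempt.

```python
import random

def simulate(graph, patient_zero, contact_rate=0.5, seed=None):
    """Each turn, every infected node infects each susceptible neighbour
    with probability `contact_rate`, then recovers (and becomes immune).
    Returns the set of everyone who was ever infected."""
    rng = random.Random(seed)
    infected = {patient_zero}
    recovered = set()
    while infected:
        newly_infected = set()
        for node in infected:
            for neighbour in graph[node]:
                if (neighbour not in infected and neighbour not in recovered
                        and rng.random() < contact_rate):
                    newly_infected.add(neighbour)
        recovered |= infected
        infected = newly_infected
    return recovered

# Hypothetical network: two small clusters joined through the connector "C".
graph = {
    "A": ["B", "C"], "B": ["A", "C"],
    "C": ["A", "B", "D", "E", "F"],
    "D": ["C", "E"], "E": ["C", "D", "F"], "F": ["C", "E"],
}
ever_infected = simulate(graph, patient_zero="A", seed=1)
print(sorted(ever_infected), len(ever_infected) / len(graph))
```

Run it with different seeds (or a lower contact_rate) and you'll see how much the final total depends on whether the connector happens to catch it.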

Anyway. It's just a nice way of showing how modelling epidemics on an asymmetrical network works.

Good.

Oatzy.

Monday, September 06, 2010

Social Viruses

In online social networks, ideas can spread between users like viruses. Literally. And this means that mathematical models of epidemics can be applied directly to the spread of memes, tweets/RTs, links, news, viral marketing, etc.

25 Random Things

The first example is the "25 things" meme that was going around Facebook a short while back. How it worked was, a user would write "25 random things" about themselves, then 'tag' 25 friends.

In the wake of this phenomenon, Slate.com took a survey of a number of users, asking them when they first encountered the meme and if and when they participated.

This information was then sent to a biology professor, who used it to model the spread of the meme through Facebook, using the (previously mentioned) SIR model.

Now what's particularly special and nice about the "25 things" is that, because of the tagging, the contact rate is 25 - i.e. each 'infected' individual tried to pass the 'infection' on to 25 other people.

And what the modelling found was that on average 1.27 of those 25 people would become 'infected' themselves. Which is a nice example and a nice demonstration of how the spread of memes resembles the spread of viruses.
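Those two numbers are enough for a bit of quick arithmetic, sketched here in Python. This is only the naive version: it assumes every participant keeps recruiting 1.27 others on average, and ignores the fact that you eventually run out of friends who haven't already done it.

```python
contacts_per_person = 25      # friends tagged by each participant
new_cases_per_person = 1.27   # average found by the modelling

# Chance that any one tagged friend goes on to take part.
per_tag_probability = new_cases_per_person / contacts_per_person
print(f"chance a tagged friend joins in: {per_tag_probability:.1%}")

# Expected number of new participants in each 'generation' of tagging,
# starting from a single person.
for generation in (1, 5, 10, 20):
    print(generation, round(new_cases_per_person ** generation, 1))
```

So each tag only 'takes' about one time in twenty, but because the average is above 1, the meme still grows from one generation to the next.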

Brighten Up Your Day..

..by moving at least one of Tony Blair's books to the crime section in your local bookshop

In particular here, I'm looking at the way tweets and memes spread through Twitter - such as the "Crime-Section Blair" and the "Status Copy".

One of the main differences between these two is that one was spread (primarily) by RTing, while the other was spread by direct copy/pasting. But for all intents and purposes, that difference doesn't have a significant effect on spreading.

The key difference between Twitter and Facebook is to do with contact rate and clustering.

In the "25 things" meme, the contact rate was a constant 25. But even in general Facebook users are, on average, in contact with about 150 people (friends). And generally, there's not that much variation from this average.

With Twitter, on the other hand, the people you might come into contact with varies WIDELY.

I tried looking up figures for average number of followers per user, but couldn't find one definitive answer. For demonstrative purposes, I'm going with the one from the Guardian - 126.

Now I currently have about 100 followers, which is close to the 'average'. Stephen Fry, on the other hand, has about 1.75 million followers - roughly 4 orders of magnitude larger, and a significant deviation from the 'average'. And this degree of variation among users has a massive effect when trying to model spread on Twitter.

An Analogy

So take, for example, some air-borne virus - say, flu.

If 'patient zero' stays relatively contained - equivalent of staying at home - the only people likely to catch the virus are those immediately around them.

If on the other hand 'patient zero' goes to a mass-populated area - say a shopping centre, or a hospital - the virus will find more victims. And those victims will spread out and pass it on to more people, and the virus will spread more throughout the general population - an epidemic.

In this analogy then, the 'house' is equivalent to a relatively small and contained network of friends - for example, me and my small group of followers. A tweet is unlikely to spread much outside of my network.

Celebrities, on the other hand are the man in the shopping centre, or a 'care-giver' at a hospital - they expose the 'virus' to a large number of people. And anyone who's then 'infected' will carry that on to their own network. And then it may even spread out more from there.

So for a 'tweet epidemic', a tweet/RT has to come into contact with one of these mass connectors (as previously discussed).

Sometimes I Just Want To Copy Someone Else's Status,..

..Word for Word, and See If They Notice.

The major difference between the "Blair" meme and the "Copy" meme, is that in the former, 'patient zero' was a 'connector' - highly influential with lots of followers. So it's not that surprising that it spread.

[And it was also funny, which made it more 'contagious'.]

The problem with the "Copy" meme is that, by its nature, it's harder to trace back to its original source. Know Your Meme cites the first appearance* as being on August 26th by @Tim_Waters - a Leeds man with 507 followers. Which isn't a significant number.

But not long after his initial tweet, it reached @elspethjane (3,100 followers) and a bit later Dolly Parton (722,000 followers) - so by this point it had reached epidemic, and was spreading like wildfire.

If you're a frequent Twitter (or even Facebook) user, no doubt you saw it at least once.

Anyway, all this complicates things - trying to model if and how a tweet will reach a connector. One could assume that in the case of the "Copy" meme, this came down to its high 'contagiousness' - that it survived long enough to spread to one or more connectors.

Visual Demonstration

These are based on a previous graph of my Twitter follows. For this, we pretend that this small network is completely isolated.

In this first one, the red circle in the middle posts a tweet. The purple circles are the people who see that tweet:

So the tweet is already reaching 7 people. Now if the node labelled 2 ReTweets, then only one more person will see it (total audience 8).

If node 1 RTs, then the tweet reaches an extra 3 people (10 people total). So in this case, 1 is more of a connector than 2.

For a better example of the importance of connectors, we have the below:

Again, red tweets, purple are audience. So in this case, there's only an audience of 2. And if the node at the bottom RTs, it won't matter.

If the circled node RTs, though, the tweet will reach 2 more people - one of whom is a mass-connector, who could spread it to an additional 5 people, which could lead to it spreading even further.

So ultimately, the number of people following you is only as important as who those people are. The red circle may only have 2 'followers', but via the circled node, they could still reach all or most of the entire population, with little difficulty.
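To make the counting in those diagrams a bit more concrete, here's a rough Python sketch. The network in it is made up - the node names and who-follows-whom are my assumptions, not the actual graph in the pictures - but it's arranged to give the same numbers as the first example: 7 people see the original tweet, a RT by node 2 adds 1 more, and a RT by node 1 adds 3.

```python
# Hypothetical follower lists: followers[x] = the set of people who see x's tweets.
followers = {
    "me": {"1", "2", "a", "b", "c", "d", "e"},
    "1": {"me", "a", "x", "y", "z"},
    "2": {"me", "b", "w"},
}

def audience(author, retweeters=()):
    """Everyone who sees the tweet, given who retweets it (author excluded)."""
    seen = set(followers.get(author, set()))
    for r in retweeters:
        seen |= followers.get(r, set())
    seen.discard(author)
    return seen

def extra_reach(author, retweeter):
    """How many new people a single retweet adds to the audience."""
    return len(audience(author, [retweeter])) - len(audience(author))

print(len(audience("me")))      # 7
print(extra_reach("me", "2"))   # 1
print(extra_reach("me", "1"))   # 3
```

The point being that a retweeter's value isn't their follower count, it's the number of their followers who haven't already seen the tweet.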

Modelling Spread

So it comes down to this - on Facebook, information or memes will spread fairly uniformly, following the SIR model.

And at its most basic level, the spread of tweets can be sufficiently modelled by the SIR approach as well. Especially if the tweet doesn't encounter a mass-connector, and instead spreads through a population whose follower counts are sufficiently close to that ~150 average.

For example, if I post something worth RTing, at best it's going to go maybe 2 generations out. And this could be described by the SIR equations. Though given the degree of spread, it'd hardly be worth it.

The limitation of the SIR equations, though, is that they assume homogeneous mixing of susceptible and infected - i.e. that everyone in the population has equal odds of being infected. But as with the analogy, on Twitter you get more clustering, so that's not necessarily the case.

Instead, you have to use this approach, where you plot out a situation-specific network graph of the system - like the above diagrams - and work from that. And given that Twitter has in excess of 4 million users, that's a little beyond my resources and inclination**.
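As a minimal sketch of what 'working from the graph' might look like, the Python below pushes a tweet through a toy network one generation of retweets at a time. Everything in it is an assumption for illustration: the network, the names, and the idea that everyone who sees the tweet retweets it with the same fixed probability (p_retweet).

```python
import random

# Hypothetical toy network: followers[x] = people who see x's tweets.
followers = {
    "author": ["a", "b"],
    "a": ["c"],
    "b": ["hub"],
    "hub": ["d", "e", "f", "g", "h"],
}

def spread(followers, source, p_retweet, seed=0):
    """Return the set of people who end up seeing the tweet (source excluded)."""
    rng = random.Random(seed)   # seeded so runs are repeatable
    seen = set()
    frontier = [source]         # people (re)tweeting in the current generation
    while frontier:
        next_frontier = []
        for poster in frontier:
            for person in followers.get(poster, []):
                if person == source or person in seen:
                    continue
                seen.add(person)
                # Each new viewer gets one chance to retweet.
                if rng.random() < p_retweet:
                    next_frontier.append(person)
        frontier = next_frontier
    return seen

print(len(spread(followers, "author", 0.0)))  # nobody RTs: just the 2 followers
print(len(spread(followers, "author", 1.0)))  # everybody RTs: all 9 reachable people
```

For in-between probabilities, the outcome hinges almost entirely on whether "b" and then "hub" happen to retweet - which is the clustering problem that the SIR equations gloss over.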

But basically, if a tweet hits a connector, it will spread like swine-flu hysteria.

Oatzy.

* TechCrunch found an earlier example on the 19th of August - a Bieber fan with 809 followers. It's hard to say if Tim wrote the (almost identical) tweet independent of her, or if she was the true 'patient zero'.

** This guy crawled Twitter and got connections for most of the users (at the time of crawling). You can find this data, and his analysis, here.

Sunday, September 05, 2010

Six Degrees, and How I Met You

In his book, "The Tipping Point", Malcolm Gladwell outlines the three key factors that will create the tipping point for an epidemic - the mass spread of something; whether it be virulent, informational or whatever.

The first of these factors is the Law of the Few - the idea that a small proportion of a population will have a disproportionately large effect on it.

In discussing this, Gladwell brings up Stanley Milgram's "Small World Experiment" - i.e. the Six Degrees of Separation phenomenon.

Intuition tends to lead us to believe that everyone can't possibly be linked by so few connections. But in fact, what makes the six degrees possible is the existence of 'connectors' - a (relatively) small collection of people who are connected to a disproportionately large number of people.

And it's because of these connectors that we can find short routes (six degrees or less) between any two individuals - despite how disconnected they may seem.

In the case of Twitter, if we assume that a connection between a pair of individuals only has to be one way, the degree of separation between users is surprisingly small.

"the average path length is 4.12 with 93.5% of people within 5 or fewer hops of everyone else" [source]

This is almost entirely on account of celebrities. I mean, over 5 million people have a maximum separation of degree 2, thanks to being followers of Bieber. But the less said about him the better.
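For the curious, figures like these come from counting hops between every pair of users. Below is a rough Python sketch on a made-up follow list (the names and links are assumptions, and a real crawl would need something far more efficient). It treats a one-way follow as a link in both directions, as above, and reports the average path length plus the share of pairs within a given number of hops - a simplified version of the quoted statistic.

```python
from collections import deque

# Hypothetical follow list: (follower, followed).
follows = [("ann", "celeb"), ("bob", "celeb"), ("cat", "celeb"),
           ("cat", "dan"), ("dan", "eve")]

def neighbours(follows):
    """Treat each one-way follow as a two-way link."""
    links = {}
    for a, b in follows:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links

def hops_from(links, start):
    """Breadth-first search: fewest hops from start to everyone reachable."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def path_stats(follows, max_hops):
    """Average hops over all connected pairs, and the share within max_hops."""
    links = neighbours(follows)
    lengths = []
    for start in links:
        dist = hops_from(links, start)
        lengths.extend(d for node, d in dist.items() if node != start)
    average = sum(lengths) / len(lengths)
    within = sum(1 for d in lengths if d <= max_hops) / len(lengths)
    return average, within

average, within = path_stats(follows, max_hops=2)
print(round(average, 2), round(within, 2))
```

Note that ann, bob and cat are all within 2 hops of each other purely because they follow the same "celeb".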

The point is, celebrities ruin everything.

To demonstrate the "Law of the Few", Gladwell offers the following exercise - list 40 of your closest friends. Work backwards through them to determine through whom you met each of them.

What you will tend to find is that a majority of those links pass through 1 (or a very small number) of those friends - the connectors.

Obviously, I had to try this for myself. I decided to go with my Twitter follows: it follows on from previous posts, I've already done a lot of the leg work in those posts, and you people are likely to care more if you're involved.

So I tried to recall through whom I met everyone (some before or outside of Twitter). I'll admit, it may not be perfect - my memory is only so good.

[Edit] - Interactive ManyEyes version here

So for example: I met Alex in primary school. Through him I met Sally, and through Sally I met Amy. Through Amy I met Joe (PkmnTrainerJ), and through Joe I met Aerliss, CatfoodJackson and ZeRootOfAllEvil.
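If you'd rather count than draw, the exercise boils down to a 'met through' table. Here's a small Python sketch using only the handful of people named in the example above, so the counts it gives are for that one chain and not for the full graph. The layout of the table is just my choice.

```python
from collections import Counter

# met_through[person] = whoever introduced me to them (None = met directly).
met_through = {
    "Alex": None,
    "Sally": "Alex",
    "Amy": "Sally",
    "Joe": "Amy",
    "Aerliss": "Joe",
    "CatfoodJackson": "Joe",
    "ZeRootOfAllEvil": "Joe",
}

# How many people did I meet through each person?
introductions = Counter(v for v in met_through.values() if v is not None)

def chain(person):
    """Trace back from a person to someone I met directly."""
    path = [person]
    while met_through[path[-1]] is not None:
        path.append(met_through[path[-1]])
    return path

print(introductions.most_common(1))   # [('Joe', 3)]
print(chain("Aerliss"))               # ['Aerliss', 'Joe', 'Amy', 'Sally', 'Alex']
```

Whoever tops the count is your connector. With the full list of follows filled in, the ranking comes out differently to this partial one.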

It's also important to note that this ignores how other people in my network were introduced to each other, and ignores anyone that I may have introduced to someone else.

For an example of the first point: to the best of my knowledge Alex met Sally through his then girlfriend Lottie (though I don't know how they each met Lottie). Sally met Amy through George (and again, I'm not entirely sure how they each met George). Amy met Joe on Fanpop. And I have no idea how Joe met everyone else. But feel free to fill in any gaps.

And for an example of the second point: I met Craig (shinelikestars6) randomly on Twitter (to the best of my recollection), and introduced him to Sally.

So it's a complex network overall. But all that matters for the above is how I met everyone.

So you should be able to see from the graph that the major connector is Amy - being responsible for me meeting 9 of the people in this network. Joe would get second place, and joint third would go to Andrew (abooth202) and Craig.

And for all I know, other people on the graph may be connectors for other people - this would possibly be reflected in their follows/followers numbers. They're just not represented as such on my graph.

And while I may appear to be a connector myself, that's more to do with the fact this graph is centred on me. At most, I've introduced maybe two pairs of people, if that.

But going back to the six degrees, everyone in my network is connected to each other by at most two degrees of separation (by way of me) - despite the fact that some of them may not be aware of some of the others' existence.

And a good demonstration of this within my network is that missgiggly in Australia is only 2 degrees away from Aerliss in Scotland. And as a side note, I came to know both of them mostly by virtue of us all being Doctor Who fans.

So as Gladwell summarizes:

"These people who link us up with the world, who introduce us to our social circles - these people on whom we rely more heavily than we realize - are Connectors, people with a special gift for bringing the world together."

For more on the subject, read his book. I'd recommend it.

And if you want to make your own graph and would like help, feel free to ask.

Oatzy.

Saturday, September 04, 2010

Forced Trending Topics

Case Study: #MyChemicalRomance

Origin

So I tried to work out why this was trending. The reasons were most likely to be related to them releasing two new songs and one of the band members having a birthday recently.

But there's very little talk of these, specifically. Instead, what you see a lot of is people tweeting along the lines of "get #MyChemicalRomance trending", "#MyChemicalRomance is trending!", and so on.

I traced this type of tweet back as far as I could and found this posted by @MCR_FANS:

@DantyGeewayMCR has 53 followers, so realistically, she was unlikely to get it trending by just tweeting "lets get #MyChemicalRomance trending"

Instead @DantyGeewayMCR gave @MCR_FANS the idea through direct contact with them. And as you would expect, they supported the cause and tweeted the above.

Spread

With 2,483 followers, all of whom you'd expect to be My Chemical Romance fans, @MCR_FANS acts as a sort of central 'connector'* for the spread of the idea. And if you look through their tweets you see how and why they were such a mass propagator of this 'attack' - determined to get it trending to an obsessive and almost troubling degree.

And on top of that, while those two and a half thousand followers would possibly be enough on their own, it's likely that the followers have followers (who don't also follow @MCR_FANS), that are also MCR fans - so the idea basically spreads outwards from this connector until it reaches a significant number of fans.

And then, as you would expect from fans, they did their very best to get #MyChemicalRomance trending around the world. And as further proof you should never under-estimate the determination of fans, they did it!

Incidentally, it's probably similar occurrences that got 'Justin Bieber' almost perpetually trending. Or at least, until Twitter tweaked the 'TT' algorithm to stop trends persisting beyond a certain period of time (to keep the TTs 'fresh'). This, unfortunately, led to the 'Bustin Jieber' approach, and other derivatives - but you've got to hand it to these kids, they're not completely daft. Sadly.

Twitter's Rules

Now, what's also interesting about that original tweet is the "Just ONCE IN A TWEET!" part. This is because Twitter has put in place certain methods for improving quality/filtering out crap in search results.

In the case of using the same tag multiple times in the same tweet, these kinds of tweets are treated as spam, so are filtered and not shown in search results or counted towards TTs.
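Twitter's actual filter isn't public, so the following is only an illustration of the rule as described here - a guess at the sort of check involved, not their implementation. It flags any tweet that uses the same hashtag more than once, ignoring case.

```python
import re

def hashtag_counts(tweet):
    """Count how often each hashtag appears in a tweet (case-insensitive)."""
    counts = {}
    for tag in re.findall(r"#\w+", tweet):
        key = tag.lower()
        counts[key] = counts.get(key, 0) + 1
    return counts

def repeats_a_tag(tweet):
    """True if any single hashtag is used more than once."""
    return any(n > 1 for n in hashtag_counts(tweet).values())

print(repeats_a_tag("lets get #MyChemicalRomance trending"))               # False
print(repeats_a_tag("#MyChemicalRomance #MyChemicalRomance #mychemicalromance"))  # True
```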

More on this and other things that will get you filtered (or even suspended!) can be found here, here and here. TL;DR Twitter don't tolerate spamming, abusing TTs, or using TTs to beg for follows.

It's also further proof that these people know what they're doing, or need to do to get something trending.

Incidental Tweets

What all this demonstrates is that there may be no (real) reason for something to be trending, other than because people want it to be. The songs and the birthday were likely just the sparks that gave birth to the idea, but not the actual source.

So when something like this - or indeed, most things - starts trending, you start to get unforced uses of the tag and general remarks about the topic - "OMG #MyChemicalRomance trending!! <3", "lol as if I used to be a #MyChemicalRomance listening emo kid" et al. Which adds to the popularity of the tag.

It also, sadly, leads to a lot of "why is [x] trending?" type tweets. And this only adds fuel to the fire and further obscures the origin of the trend - leading to more 'WTF', and so on and so forth until people lose interest and the trend starts to die away.

Top Tweets

Another thing to look out for are "Top Tweets". These are tweets with a significant number of RTs - therefore considered popular. I don't know what the lower limit is though.

In this case, one such top tweet was from @AmyLovesMCR. And what's interesting about this is that she got to being a top tweet, despite her relatively modest 280 followers. What was her Top Tweet?

A fairly uninformative tweet, and one that leads you to wonder why it got so many RTs. Or maybe I just don't understand that level of fandom. But like with the 'trend origin' tweet, it most likely comes down to a RT through a connector - possibly even @MCR_FANS again.

In fact, none of the top tweets are particularly informative as to why #MyChemicalRomance was trending. This isn't always the case, but more often than not, it is. And again, this sort of thing isn't exactly helpful and leads to more confusion and 'WTF tweets'. But that's not really Twitter's fault.

A Useless Graph

Finally, here's the graph of #MyChemicalRomance [courtesy of Trendistic]

Notice, though, that while September 2nd is the date of the 'origin' tweet, the trend doesn't seem to appear until the 4th.

And the fact that it's basically non-existent until after midnight on the 4th leads me to believe that that's just because the data the graph's based on only goes back that far, which would make the graph fairly useless. But I could be wrong.

Summary, and How To Force a Trend

So if all of this has taught me anything, it's that when aggressive fandom meets social media, the results can be potentially dangerous.

And more so, it's a troubling forewarning for what a determined, politically motivated group could achieve - an example of this is already happening on Digg. And frankly, I think we should all fear it greatly. But I'm probably just paranoid. Maybe.

So if you want to get something trending, the most important step is to get it to a connector that will support your cause and expose it to a wider audience of supporters.

But when choosing your connector, don't aim too high - Stephen Fry has a massive reach, but is unlikely to help/RT unless it's a really good cause. And the bigger celebrities are unlikely to notice you anyway, since they get a lot of @replies. Ideally, you want to do like the MCR fans - aim for a relevant fan club with at least 1,000 followers.

You have to be very determined, as do all of the participants. So unless what you're trying to trend is something people are likely to get passionate about, your odds are poor. And while you're trying to force the trend, remember to play by Twitter's rules, or you'll be filtered or blocked.

And finally, there's no harm in being just a tiny bit inexplicable to generate a 'WTF noise' boost.

Oatzy.

* inspired by "The Tipping Point" by Malcolm Gladwell

Friday, September 03, 2010

Modelling the Zombie Apocalypse

..Or why, when the zombie apocalypse comes, we're all screwed.

Maths of Epidemics

The spread of an infection through a population can be modelled with variations on the Kermack-McKendrick Model. In this approach the population is divided into three groups:

1) Susceptible - not yet infected, but able to catch the infection
2) Infective - infected individuals, capable of passing on the infection to susceptibles
3) Removed - after some period of time, infectives will recover (or die). These are usually considered immune to catching the infection again.

You then have a system of differential equations to describe how these groups vary over time, and hence how the infection spreads.
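For reference, the textbook form of that system is below. The symbols are my choice of notation: S, I and R are the sizes of the three groups, β is the infection rate and γ is the removal rate.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta S I \\
\frac{dI}{dt} &= \beta S I - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}
```

The three right-hand sides sum to zero, so the total population S + I + R stays constant - people only ever move from one group to the next.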

How do you model a Zombie Apocalypse?

In the paper "WHEN ZOMBIES ATTACK!: MATHEMATICAL MODELLING OF AN OUTBREAK OF ZOMBIE INFECTION", mathematicians at Carleton University modified this model in various ways. In the simplest form, they replaced 'infective' with 'zombie' and allowed a certain proportion of the dead to be resurrected (as zombies).

They found that the ultimate outcome of that model, for all conditions, is "Doomsday" - i.e. the entire population zombified or dead.
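You can see the Doomsday result for yourself with a few lines of Python. This is a rough sketch of the basic model as I understand it from the paper, ignoring births: susceptibles (S) get bitten at a rate beta, humans destroy zombies (Z) at a rate alpha, people die of other causes at a rate delta, and the dead (R) rise again at a rate zeta. The parameter values, starting population and the simple fixed-step method are my assumptions, picked for illustration.

```python
def simulate_zombies(S=500.0, Z=0.0, R=0.0, alpha=0.005, beta=0.0095,
                     zeta=0.0001, delta=0.0001, days=30.0, dt=0.001):
    """Step the basic zombie model forward in time (simple Euler method)."""
    steps = int(days / dt)
    for _ in range(steps):
        bites = beta * S * Z    # susceptibles turned into zombies
        kills = alpha * S * Z   # zombies destroyed by humans
        deaths = delta * S      # non-zombie-related deaths
        risen = zeta * R        # the dead coming back as zombies
        S += (-bites - deaths) * dt
        Z += (bites + risen - kills) * dt
        R += (deaths + kills - risen) * dt
    return S, Z, R

S, Z, R = simulate_zombies()
print(f"humans: {S:.2f}, zombies: {Z:.2f}, dead: {R:.2f}")
```

With these numbers there isn't a single zombie to begin with - a few natural deaths rising again is all it takes. And since zombies bite faster than they're destroyed (beta > alpha), the humans are the ones who run out.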

They then modified it for a scenario where the infected are quarantined. What they found is that unless there's sufficient quarantining in the early stages of infection, it's Doomsday again; albeit a slightly delayed one. And since that sort of quarantining would be unfeasible anyway, it's not even worth trying in most cases.

The next approach was to replace quarantining with 'cure' - assuming the cure didn't also provide immunity. The result of this model suggests that a state of stable co-existence between humans and zombies could be reached.

But again, it's not ideal. You can only make so much of the cure. So the zombies would get you in the end.

The final alteration was replacing cure with strategic mass-eradication of zombies at frequent intervals. In this case, the zombie population could (eventually) be completely destroyed. But that is only if the resources for such attacks are available, and if the military (or whoever) act fast enough and often enough. Otherwise... you know... Doomsday..

To quote the article:

"In summary, a zombie outbreak is likely to lead to the collapse of civilisation [...] As seen in the movies, it is imperative that zombies are dealt with quickly, or else we are all in a great deal of trouble."

So basically, unless we can wipe out the zombies quickly, trying to fight and survive is just delaying the inevitable.

You might as well just accept your fate and get to work munching on those delicious BBRRRAAAAAIIIIINNSS!!!

Oatzy.