Sunday, July 31, 2011

Social Posting - Part Two: We Post Together

Where were we..

So the main limitation of the stripped-down model in the previous post is that it only puts people into one of two states - posting, or not posting.

A more realistic model would take into account how much people are posting. The nice thing is, the change to the model is only minor - T no longer carries the condition forcing its values to be either one or zero.

The values T_i now represent the post rate of each user - that is, the number of posts made during some arbitrary time period, n.

Here is the full, modified equation for the model
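In rough terms - reconstructing it from the variables defined below - it looks something like

$$T(n) = \alpha \, W \, T(n-1) + \beta$$

where W is the normalised weighting matrix from the previous post, and alpha and beta are the two new variables introduced next.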


Two More Things

As you've probably noticed, we've introduced two new variables. They are:

1) Beta

Beta is a column matrix, whose values represent each user's baseline post rate - that is, the rate at which a user would post if no-one else was online - no audience, no-one to talk to.


2) Alpha

Alpha is a little more complicated. But loosely speaking, its values relate to the upper limit of each user's post rate (ignoring external effects).

It works like this: first, imagine everyone has some maximum post rate. Tbar (T with a line over the top) is the column matrix of these values. Now, suppose everyone is posting at their maximum rate. Alpha values are calculated by rearranging the model equation, thus
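Roughly speaking - with W+ standing for W with its negative entries zeroed, as described next - that rearrangement gives something like

$$\bar{T} = \alpha \, W^{+} \bar{T} + \beta \quad \Rightarrow \quad \alpha_{ii} = \frac{\bar{T}_i - \beta_i}{\big(W^{+}\bar{T}\big)_i}$$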
Any negative values in W are set to 0, since (in theory) a user posts at their highest rate when their antagonists aren't online.

What this means is that post rates are capped by their respective upper limits; when all members of the population reach their maximum post rates, we find that T(n) = T(n-1) for all n - i.e. the post rates all remain constant, unless externally affected.

For technical reasons alpha is a diagonal matrix.

Alpha and Beta look like this
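That is, for a four-user network, something along the lines of

$$\alpha = \begin{pmatrix} \alpha_{1} & 0 & 0 & 0 \\ 0 & \alpha_{2} & 0 & 0 \\ 0 & 0 & \alpha_{3} & 0 \\ 0 & 0 & 0 & \alpha_{4} \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \beta_{4} \end{pmatrix}$$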
They won't always be size 4, of course. Their size will depend on the size of each particular network. The values of alpha and beta could vary over time, but for all intents and purposes, it will suffice to hold them constant.


Rainbow Graphs

Going back to the graph from last time, we might get a system equation like this
In this full model, it's not just a matter of online or offline. So instead of a series of network graphs, we represent the progression of a network as a set of line graphs.

What we find then is that the system described by this equation seems to tend towards some constant, stable point - regardless of initial conditions.

For example, with the equation described above, and starting at T(0) = [1,0,0,0] we get
And even if we start from T(0) = [100, 100, 100, 100], we get
In each case, the system converges to [15, 10, 5, 10] - Tbar - the maximum rates set when defining the variables for this system.
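For anyone wanting to play along at home, here's a minimal sketch of the sort of iteration behind these graphs. The update rule (with the cap at Tbar) is my reading of the model above, and the weighting matrix and base-rates are made-up stand-ins rather than the actual values used for the graphs.

```python
import numpy as np

# Hypothetical normalised weighting matrix for a 4-user network (A, B, C, D);
# rows sum to 1. Not the actual values behind the graphs above.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],   # A follows B
    [0.4, 0.0, 0.3, 0.3],   # B follows A, C, D
    [0.5, 0.0, 0.0, 0.5],   # C follows A, D
    [0.3, 0.4, 0.3, 0.0],   # D follows A, B, C
])

Tbar = np.array([15.0, 10.0, 5.0, 10.0])   # maximum post rates (as above)
beta = np.array([3.0, 2.0, 1.0, 2.0])      # made-up baseline post rates

# Calibrate the diagonal alpha so that Tbar is a fixed point:
#   Tbar = alpha . W+ . Tbar + beta  =>  alpha_i = (Tbar_i - beta_i) / (W+ . Tbar)_i
W_pos = np.clip(W, 0, None)                # antagonist (negative) weights zeroed
alpha = np.diag((Tbar - beta) / (W_pos @ Tbar))

def step(T):
    """One turn of the model, capped at the maximum rates."""
    return np.minimum(alpha @ W @ T + beta, Tbar)

T = np.array([1.0, 0.0, 0.0, 0.0])         # initial condition T(0)
for n in range(50):
    T = step(T)

print(np.round(T, 2))                      # approaches Tbar for this setup
```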


Interestingly, here's what happens when you set beta = 0 for all users - alpha values recalculated as [1.5, 1.11, 0.5, 1.11]
You get this weird bouncing about. The rates do seem to be converging towards some fixed points, roughly in proportion to Tbar, though much lower.

In fact, in online/offline form, the system looks like this
It's this early jumping online and offline that causes the system to oscillate at first, before starting to settle down as time goes on.

We also see that, because the system starts much lower than its defined stationary point (Tbar), and because it doesn't have the boost of the base-rates, the actual stationary point the system is converging to is much lower.

If, however, T(0) were set to Tbar, then the system would stay constant at Tbar. Similarly, if T(0) were set to [0, 0, 0, 0], then the system would stay there, since it wouldn't have the base-rates to get anyone off the ground.


Antagonistic Altercations

If we set up an antagonist system like that in the last post - with beta and Tbar as above, and the corresponding alpha - we get a line graph that looks like this
In this case, the base-rates act to temper A's dislike of B, so that B stays online; though the result is that the stable point is lower than the maximum for everyone. The stable point for A is lowered the most.

But if we remove the base-rates (and recalculate alpha accordingly), the system progresses much like it did last time
That is, as soon as B shows up, A leaves - eventually followed by everyone else.


Upping The Tension

Now, what if we change the setup so that A and B both want to avoid each other?

One might question why two people who dislike each other would be 'friends' (in the social networking sense). I dunno. This is a hypothetical situation meant to demonstrate a point, god damn it!

Here's the system equation
The matrix of maximums, Tbar, stays the same as above, at [15, 10, 5, 10].

So if we set T(0) = [0, 10, 5, 10], here's what happens
A comes online because of their base-rate, and once they do, B disappears. Here's this system's initial and stable points
In antagonistic systems such as this and the one before, Tbar isn't a stable point.

In this case, for example, the peak rate for A occurs when B is offline and C is at their peak rate. But C is at their peak rate when both A and B are online. So if B is offline, C can't be at their peak rate, and as a result neither can A. And vice versa. So Tbar can't be achieved, since the tension between A and B pushes down the rates of everyone, themselves included.

The system does still stabilise, however. It just stabilises with B offline, and everyone else at a slightly lower rate in this case.


Now, if we do as we did in the previous blog, and take A offline (indefinitely), here's how the system evolves
In this case, the stable rates of C and D are lower than they were when B was 'forced' offline by A. And even though A isn't online, B isn't posting at their maximum rate.


Exogenous Effects

So as before, we also have to consider external effects.

People can suddenly and unexpectedly disappear - maybe they fall ill, get a job, or just don't post for a while. Conversely, they could find themselves in a situation where they're posting more - maybe because of a major news event, or because they're at work (procrastinating).

We can quite easily create these effects artificially, then observe how the model responds to that change. For example, we could gauge a person's importance (to a network) by removing them and seeing how dramatic an effect that has.

So in the antagonist examples above, we could say that A is more important to that network than B, since removing A had a greater (negative) effect on C and D than losing B did.


For this, we define some arbitrary function, E(n). We then multiply our model equation by this function
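My reading of this is an element-wise multiplication - each user's rate gets scaled by their own external factor - i.e. something like

$$T(n) = E(n) \circ \big( \alpha \, W \, T(n-1) + \beta \big)$$

where the circle denotes element-wise multiplication.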
So for the example last time, where we removed two users C and D, then later reintroduced C, the function(s), E(n), might look like this
Or, if some major news story broke between n=2 and 5, then the function might look like this
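As a quick sketch (with made-up timings), such a function might look like this in code - a per-user multiplier applied each turn, where 1 means no effect, 0 means forced offline, and 2 means doubled:

```python
import numpy as np

def E(n):
    e = np.ones(4)          # users A, B, C, D; 1 = no external effect
    if 2 <= n <= 5:
        e[:] = 2.0          # e.g. a news story boosting everyone between n=2 and 5
    return e
```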

So, if we were to, say, remove A (indefinitely) from the top system above, then the result would look like this
The important thing to note here is that even though B is the only person directly linked to A in the network, removing A from the graph lowers the rates of everyone.

Alternatively, we could double A
In this case, multiplying A by two brings everyone else's stable points up. It's also worth noting that A's stable point isn't just doubled, but multiplied by 2.7. This is a good example of the feedback effect mentioned at the start of the previous blog - increasing A's rate increases everyone else's rates, which in turn increases A's rate.

Or if we halve A
In this case, everyone else's rates drop too, and A's rate stabilises at less than half its maximum.


Agitated Antagonist

So what if we go back to our antagonist system, and start poking it...

First of all, if we halve A
Then B is able to stay online, but with a stable point much lower than their maximum.

But, here's where things get interesting - if instead we double B, here's what happens
This time, they each scare each other off. But, once the other goes offline, they each come back in the next turn, only to scare each other off again. And so on. And all this jumping between extremes makes C and D jump about as well, but half a cycle out of sync.

At least, that's what happens for initial conditions [0, 10, 5, 10]. If we start with all at 0, the behaviour is nearer to that with no external effects.


In General

So it seems to be the case that externally changing one user affects (almost) everyone in the graph.

That being said, I would expect that when the model is generalised to much larger networks, we would find that the greater the distance between two users, the less effect they have on each other.

It probably stands to reason then that, for a given user, it's only worth considering a graph as wide as friends of friends. Maybe friends of friends of friends for (a collection of) dramatic changes at that distance.

More generally, when you apply the model to a system of more than four people (which is all we've looked at here), you would expect things to get more complicated.

But the overall results should be similar - that is, in most cases the system will tend towards some stable state. It may just take longer to reach that point, depending on the size of the graph.


In Conclusion


So the long and short of it is this - in most cases, a group's post rates will typically converge towards some stable point.

If there aren't any antagonists, and if the base-rates aren't zero for everyone in the network, then the stable state will typically be the common maximum, Tbar. Otherwise, the system may oscillate, or else become stable at a much lower set of rates.

If we set all base-rates to 0, we find that the systems behave much like they did in the simplified online/offline model discussed in the previous blog; with some quirks of their own.

In the absence of base-rates, or in the presence of antagonists, initial conditions become important, affecting whether and where the system stabilises.

When we consider external effects, the same will tend to be true - but with the stable points altered, according to what that effect is.

Modifying one user affects everyone else in the graph (to some degree). Modifying different users will have different effects on the rest of the graph, depending on the 'importance' of that user.

Of course, in the real world, external effects are unpredictable, and will tend to keep the system off-balance (unstable). But we do what we can.


Oh, and if you were wondering how I got the numbers for the graphs, here's the code I used. But be forewarned, it's very sloppy.


Oatzy.


[..and the fundamental interconnectedness of all users.]

Saturday, July 30, 2011

Social Posting - Part One: Who's Online?

Coherence of Absence

So imagine it's Sunday, there's nothing to do, nothing's open, nowhere to go, nothing on TV. So why aren't people spending all this free time online - tweeting, Facebooking, whatever?

Sunday is an online dead-zone.

A random hypothesis I thought up to explain this posits that it may be the result of large numbers of people thinking "why post, if there's going to be no-one around to see it?". After all, no-one's around on Sunday. If they were, they'd be posting stuff.

I called it Coherence of Absence, because a fancy name can make a daft idea sound respectable. The alternative was The Sunday Effect.

Of course, it could be that people mostly social network when they're at work (procrastinating). Or maybe they're asleep/hungover on Sunday. Or have nothing to talk about, since Sundays are so boring. I dunno. I'm unemployed, so Sundays aren't much different to any other day of the week.


In Other Words

So this can be boiled down to two principles:
1. People are likely to post more when there are more people online
2. The number of people perceived to be online at any given time is based on who is posting
It's fairly intuitive that when there are more people online (posting), you'll post more - not just because there are more people to talk to - tweets to reply to, posts to comment on - but also because there's a bigger audience; more people to see what you have to say, regardless of how interesting or mundane that might be.

In essence, these two together create a feedback effect, where your posting habits are based on your friends' posting habits, which are in turn based (to some extent) on your posting habits. And so on.

For point two, we can't generally say for sure whether or not someone is online, except for when they post something. Though there are exceptions. Of course, the irony of this model is that it assumes that everyone is online all the time - just waiting for someone else to indicate their presence with a post.

Don't get me wrong, I'm not saying this is definitely how the world works. This is more of a, let's suppose this is how it works - what then?

NB/ the people a user follows are referred to as 'friends', since 'the people a user follows' is such a mouthful. These imaginary people may not, strictly speaking, consider each other friends.


Technically

Right, let's dive in: for some network of users, define a matrix M with entries
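Judging by how M gets used below, the entries work out as something like

$$M_{ij} = \begin{cases} 1 & \text{if user } i \text{ follows user } j \\ 0 & \text{otherwise} \end{cases}$$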
For example
NB/ connections between users needn't necessarily be two-way (except in the case of Facebook).

Next, we define a column matrix, T(n), with entries
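Similarly, something along the lines of

$$T_i(n) = \begin{cases} 1 & \text{if user } i \text{ is posting at turn } n \\ 0 & \text{otherwise} \end{cases}$$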
Now, we say that if at least one of a user's friends is posting, then the user will 'come online' and post in the next turn. Conversely, if none of their friends are posting, then the user will 'go offline' and not post in the next turn.

To do this mathematically, we just multiply the matrix M by the matrix T. For example, if we start with only B online (blue),
Then, as expected, all the people who follow B come online. B goes offline since none of the people they follow were online in the previous turn.

You'd then multiply M by T(1) to get T(2), M by T(2) to get T(3), and so on. Here's another example, starting with only A online,
Easy.
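As a rough sketch of that iteration in code - using a made-up follow matrix, rather than the exact graph pictured above:

```python
import numpy as np

# M[i][j] = 1 if user i follows user j; T[i] = 1 if user i is posting this turn.
# The graph here is a made-up 4-user example (A, B, C, D).
M = np.array([
    [0, 1, 0, 0],   # A follows B
    [1, 0, 1, 1],   # B follows A, C, D
    [1, 0, 0, 1],   # C follows A, D
    [1, 1, 1, 0],   # D follows A, B, C
])

def step(T):
    """Post next turn iff at least one of your friends posted this turn."""
    return (M @ T >= 1).astype(int)

T = np.array([0, 1, 0, 0])   # start with only B online
for n in range(5):
    print(n, T)
    T = step(T)
```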


That's Not Normal

The problem with the model so far is that there only needs to be one other person online for a user to join them. But one person online will be more significant to a person following 5 people, than to a person following 100.

So we might instead say, if at least half of a user's friends are online, then the user will join them in the next turn. First of all, we define a matrix, W, with entries,
So for the example above, we get
That is, the matrix M normalised so that the sum along each row equals 1. Our definition of T then becomes
Which is just a fancy way of saying, if at least half of a user's friends are posting, then the user will post in the next turn.
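Or, spelled out, something like

$$W_{ij} = \frac{M_{ij}}{\sum_k M_{ik}}, \qquad T_i(n) = \begin{cases} 1 & \text{if } \big(W \, T(n-1)\big)_i \geq \tfrac{1}{2} \\ 0 & \text{otherwise} \end{cases}$$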

So, for example, if we start with A and D online,
As before, everyone ends up online. But in this case, if we had started with only A online, then only a third of B's friends would have been online, so B wouldn't come online in the next turn. Nor would anyone else, for that matter - everyone ends up offline.

Of course, you could change it so that a user will come online if a third, or a fifth, or whatever of their friends are online. If you really wanted to.


Weightlessness

Of course, this supposes that a user cares about all their friends equally. In reality, we might care more about one person being around than another. There may even be people we don't care about at all (in terms of what we post), or people that we want to avoid.

The trick is this - instead of assigning everyone a 1 in the matrix M, let user i assign to each of their friends, j, some numerical weighting, based on how much i cares about j. If i prefers to avoid j, then this value may be negative.

Call this matrix W'. This matrix is, again, normalised to generate W, as before; making the sum along each row equal 1. The only difference is that with antagonists, some rows may add up to 0. So to avoid division by zero we have this
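As a rough sketch of that normalisation in code - the fallback for zero-sum rows is a guess on my part, since the exact fix isn't spelled out here:

```python
import numpy as np

def normalise(W_prime):
    """Row-normalise the raw weighting matrix W' to get W."""
    W = W_prime.astype(float)
    for i, row in enumerate(W):
        total = row.sum()
        if total == 0:                  # antagonist weights cancelling the row out
            total = np.abs(row).sum()   # fallback - a guess at avoiding the division by zero
        if total != 0:
            W[i] = row / total
    return W
```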
At this point, I'll introduce the equation for the model so far


Avoid the Antagonist

So let's say this is our network now, with accompanying normalised-weighting matrix
B has no qualms with A, but for whatever reason A harbours some secret dislike of B, and will typically want to avoid them. Why does A still follow B? Keeping up appearances? Stalking? I dunno. Let's just pretend there's a good reason.

So, for example, if we start with A, C, and D online, here's what happens
B comes online because both of their 'friends' are online. But B coming online effectively scares everyone else off, one by one. Admittedly, that's partly because I specifically chose weightings that would cause this to happen. But you get the idea.

Of course, once B goes offline, A might decide to come back online again, with D and then C following shortly after. At which point, we end up back at the start (1), where the cycle repeats.


Can't Talk

For this we go back to the first graph above, weightings all the same. Say one or more of the users go offline - maybe they're out on the town, or asleep, or something of that sort. What then?

If we take just one person offline, then nothing will happen to any of the other users - except if you remove B, in which case only A goes offline. So in the below example we take C and D offline (red)
By (4) everyone is offline, so that when D comes back online at (5) there's no-one else online to keep them around. And D alone isn't sufficient to bring anyone else back online.


Wait For It

So this is a very simplified model. For one thing, it only considers people as either online or offline - or if you prefer, posting or not posting. It also assumes that, in the absence of an 'audience', a person won't post. And that leads to the implication that a social network can be destroyed by strategically removing users. It can't. I don't think.

Anyway, til next time..


Oatzy.

Sunday, July 24, 2011

Tumbling, Appendix II: Getting Formal

Pre-Ramble

The problem with what I've described previously is that it's all a bit fuzzy - it's based on flow diagrams and vague descriptions; it's informal and lacks precision. Mathematics, however, likes to have ideas clarified, by transforming them into precise, but somewhat impenetrable, collections of symbols. And who am I to argue with maths?

Having found PlosOne and arXiv, which offer open access to academic papers, I've come across a few papers - some related to subjects in my blogs - that have grabbed my interest. Usually I'll skim over them, then download them for reading 'at a later date'. But even just skimming, I've started to pick up a few tricks.

Being as this is just an informal blog, there's nothing wrong with being informal. Nonetheless, there's no harm in a little practice. So this is an attempt to reintroduce the model (derived previously) as a formal set of equations. It should be noted, though, that none of this changes or nullifies what was previously written. It simply sets it out in a more precise - albeit cryptic looking - way.

Consider this fair warning of what's to follow.


But First...

For completeness, I wanted to redefine the model as a compartmental one. For what it's worth. It would look and work something like this
In this case, alpha roughly reflects the average number of followers a user has, and beta the approximate average reblog rate. (gamma = 1 - beta). As discussed previously, this model isn't really sufficient. But it's nice to look at for comparison.

In fact, the exposed state (E) isn't strictly necessary in this case (more on that below). It's more of a placeholder, included for clarity. You can actually rework the model to get rid of it; in which case, it looks like this
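As a rough guess at what that reduced form looks like - reading the three compartments as susceptible (haven't seen the post), infected (reblogged), and removed (seen but not reblogged):

$$\frac{dS}{dn} = -\alpha S I, \qquad \frac{dI}{dn} = \beta \alpha S I, \qquad \frac{dR}{dn} = \gamma \alpha S I$$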
Here's an example of what the output might look like (N=10,000, alpha=0.001, beta=0.14),
With the compartmental form of the model, because the variables - such as the reblog rate - are constant, the behaviour across multiple simulations is uniform.

In fact, for the same initial conditions, the results will always be the same. By comparison, the behaviour of the model previously outlined can vary wildly between multiple simulations, run with the same initial conditions.

You could modify it - maybe make the variables, literally, variable - but since we already have a perfectly good model, it's not really worth bothering.


Preamble

First of all, these are difference equations - as opposed to the differential forms used above. This is because we're working in discrete steps (rather than continuous), and because the equations aren't, strictly speaking, time dependent.

Second, for this formal approach, we're working in matrices. It's just easier that way.

Now. For the simulated model, I created the network at random, as I went. This was on the assumption that it would be computationally faster. But for formality, we suppose that the entire network is either known, or else generated prior to simulation.

At any rate, it works like this - define a matrix, M, with entries,
This is, essentially, a table of ones and zeros that represents some (arbitrary) network of connections - i.e. who follows who.

As an example, in the below, the matrix on the right represents the graph on the left,
From this, we can calculate things like: the number of people a user, i, follows, and the number of people user i is followed by (respectively) as
That is, in the left case, sum along the i-th row, and in the right, sum down the i-th column.
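That is, something like

$$\text{follows}(i) = \sum_{j} M_{ij}, \qquad \text{followers}(i) = \sum_{j} M_{ji}$$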

Next, we define column matrices for each of the possible states. So, for example, the susceptibility matrix, S, has entries
That is, if user i is susceptible at time n, then this will be indicated by a one in the i-th row of that state matrix. Otherwise its value is zero.
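In other words, something like

$$S_i(n) = \begin{cases} 1 & \text{if user } i \text{ is susceptible at time } n \\ 0 & \text{otherwise} \end{cases}$$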

We can then define an operator to calculate the number of people in that given state as
Essentially, this counts the number of ones in whichever state matrix. Which leads to the equation
Which basically says that the sum of the number of people in each state equals the total population size.  Or in other words, that everyone in the population has to be in one of the three states.
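Using # as shorthand for that counting operator (my notation), that's something like

$$\#X(n) = \sum_i X_i(n), \qquad \#S(n) + \#I(n) + \#R(n) = N$$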

Finally, we define a function, sigma, that looks like this,
This is the function for threshold testing, described previously. In essence, it takes a matrix of the users' thresholds, t, and their assessments of a given post, A(p), and returns a matrix of ones and zeros representing whether or not each user will 'reblog' that post.
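That is, something along the lines of

$$\sigma\big(t, A(p)\big)_i = \begin{cases} 1 & \text{if } A(p)_i \geq t_i \\ 0 & \text{otherwise} \end{cases}$$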


The Model

So now, we can define our set of equations. Brace yourself,
The typical initial conditions will be,
In theory, this model shouldn't be significantly different from the one previously outlined - though, admittedly, I haven't tested this assumption.

Ultimately, the behaviour of the system depends on the matrices M, t, and A(p). And if these are predefined and fixed, then the behaviour will be the same over multiple simulations. But if these are newly generated (at random) for each simulation, then the behaviour will be variable, as before.

In this case, E(n) is really more a function upon which the other equations depend, rather than a state in and of itself. In particular, no-one exists in that state for any longer than it takes for them to be redistributed into their proper state (for that iteration). This is similar to the redundancy of the E state in the compartmental model above, except that this time it's not so easily written out.

And that's basically it.


Oatzy.


[Hoping I didn't make any mistakes in my own model.]

Tuesday, July 12, 2011

With Enough Tries..?

Probability is tricky. It isn't always intuitive. Coincidences aren't necessarily as rare or as unusual as they might seem.

I can't remember how I got to it, but the other day I came across the wiki article on the Law of Truly Large Numbers. An interesting idea to say the least.

Then a couple of days later I was looking through one of my books for blog ideas, and came across an essay with an example strikingly similar to the one in the wiki article (though the essay never gave the law a name).

Coincidence?


So what is the Law of Truly Large Numbers?

The Wiki page gives this description:
[The law] states that with a sample size large enough, any outrageous thing is likely to happen.
The example given on the page is a little inelegant, so I'll go with the (abridged) similar example from the book,
Suppose that a really memorable, once in a lifetime coincidence is one which has a one in a million chance of happening today, and that during any particular day there are 100 opportunities... [T]he chance that one of these coincidences will happen to you tomorrow is 1 in 10,000. Still very unlikely...
[But] the chance that every one of the next twenty years will have no one-in-a-million coincidences for you is.. 0.48, or a 48 per cent chance.
According to this extremely rough and ready calculation, there is actually more than a fifty-fifty chance that in the next twenty years you will experience a memorable one-in-a-million coincidence. This also means that for every twenty people you know, there is a greater than 50% [chance] that one of them will have an amazing story to tell during the course of a year.
Now this is an interesting thought.

And it raises an interesting question - If you play the lottery enough times, does winning eventually become significantly more likely? Inevitable?

It's an often quoted 'fact' that you're more likely to be struck by lightning on your way to buy your ticket than you are to win. But what does 'the law' have to say on the subject?


Preamble

For this we're assuming a good old-fashioned, six-balls-from-a-pool-of-49 lottery.

Probability of winning the jackpot (matching all six balls) with one ticket is 1/13,983,816, or about 7 in 100 million

If you play two lotteries, then your odds of winning are (odds of winning the first) + (odds of winning the second) - (odds of winning both).

OR, and this is easier to work out,

Let 'Odds of not winning', q = 1-p(winning)

'Odds of winning at least once in two games' = 1 - (odds of winning neither) = 1 - (q*q)

This can be generalised to 'Odds of winning jackpot playing n games', p = 1 - (q^n)


Round One: Will I hit the Jackpot in My Lifetime?

First of all, odds of winning the jackpot by playing every week for a year

p = 1 - [1-p(winning)]^52 = about 3.7 in a million

So not great. How about if you play every week, starting on your 16th birthday and giving up (dying) on your 86th? Or basically, playing for 70 years. Probability of hitting that jackpot?

About 1 in 4,000 chance. So still not great.

Of course, if you buy 40 tickets a week, then that gives you a 1 in 100 chance of winning the jackpot at some point in your life. But by that point you're spending £2,080 a year on lottery tickets. The average jackpot would have to be more than £14.6 million for the expected return (prize*chance of winning) to make it worth playing.
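If you want to check the arithmetic, it only takes a few lines (using the formula from above):

```python
p = 1 / 13983816             # probability of hitting the jackpot with one ticket
q = 1 - p                    # probability of not winning a given game

print(1 - q**52)             # one ticket a week for a year   ~ 3.7 in a million
print(1 - q**(52 * 70))      # one ticket a week for 70 years ~ 1 in 4,000
print(1 - q**(40 * 52 * 70)) # 40 tickets a week for 70 years ~ 1 in 100
```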


Round Two: What About Immortality?

So we've got the equation p = 1 - (q^n)

The question is, can we find n - i.e. the number of games you'd have to play - such that the probability of winning (p) is 50:50

The trick is logarithms, and the formula is

n = log(1-p)/log(q)

So for p = 0.5, n = 9,692,842 games, or about 186,400 years.

For a 1 in 4 chance of winning? 77,363 years

A 1 in 100 chance?! 2,703 years

Alternatively, to have a 50:50 chance of winning in your lifetime (70 years) you'd need to buy 2,663 tickets a week. Yeah.
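And again, for anyone checking the numbers, the inversion looks something like this:

```python
from math import log

p_win = 1 / 13983816         # jackpot probability for a single ticket
q = 1 - p_win

def games_needed(p):
    """Number of games n such that 1 - q**n = p."""
    return log(1 - p) / log(q)

print(games_needed(0.5))              # ~9,692,842 games (~186,400 years at one a week)
print(games_needed(0.5) / (52 * 70))  # ~2,663 tickets a week for a 50:50 chance in a lifetime
```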

Basically, even by the Law of Truly Large Numbers, and immortality, you'd be waiting a ridiculously long time and you'd still be lucky to win.


Round Three: I'll Take Anything!

Now wait a minute, I hear you say, I could still win something by matching 5 numbers, or even 3. Okay, that's a fair point.

So you need to match 3 or more numbers to win something. Probability of winning anything in any given game? ~6 in 100,000

So once again, chance of winning something if you play every week for 70 years? 195 in 1,000

Now that's interesting. That's just short of a 1 in 5 chance. But to be worth playing, the average prize value would have to be ~£18,666. Worth it? I'll let you decide*.

And finally, how long would you have to play to have a 50:50 chance of winning something? ~223 years. Or 45 years if you buy 5 tickets a week.

Which is going to be a real kick in the balls if that something turns out to be £5.


Or To Put it Another Way

* Imagine a game you only get to play once. You pay me £3,640 to play, then you pick a number between 1 and 5. I then generate a random number between 1 and 5.

If the number that's generated is the number you chose then you will win some randomly chosen prize between £5 and £5million; you're more likely to win a smaller prize than a larger one, and you can't know in advance what the prize will be.

Want to play?

If you play the lottery, but answered no to the above, you should probably reconsider.


tl;dr As has been said many times before, your odds of winning the lottery jackpot are catastrophically minute. Even if you were to play every week of your life.


Oatzy.

Friday, July 01, 2011

T-Shirt Calendar: Revisited

If you'll recall, in early February I created a T-shirt calendar - that is, a 'calendar' where each square is coloured according to the colour of the T-shirt I wore on that day. Previous post here.

So, with half the year more or less done, here's what the calendar is looking like now,
There's not really much else to say on the subject, other than to note that there's very little pattern to it, and that I wear a lot of green T-shirts.

So ultimately we're left gazing upon its random patchwork glory, and wondering what its purpose or meaning might be.


Oatzy.


[For what it's worth.]