Questions Nobody Asked..: gender


Saturday, March 24, 2012

Some Twitter Infographics

I did some stuff like this before. And I figured, while I was updating my network graphs, why not update some of the other graphics?

And it helps that I worked out how to easily extract data from Twitter (see previous blog). The code is here. Again, rate limits apply.

Who Do I Follow?

This is one of the ones I did before - collect together the bios of the people I follow, then make a word cloud (using Wordle)

Basically, I follow a bunch of geeks and writers. Who like 'things'. So really, same as a year and a half ago.

I would point out though that 6 of the people I follow don't have bios, and about 7 just have lyrics.

Data here.

Who Tweets the Most?

These rates are worked out as (total tweets posted)/(total days online). Obviously, the actual post rate will vary over different time scales..

Bubble chart (made with ManyEyes) - bubbles sized by tweet rate (the numbers on some of the bubbles).

The graph below gives a better idea of relative rates, and 'rankings' (click to embiggen)

The blue line is actual values.

The orange is a logarithmic trend-line. It's a pretty good fit (R²=0.95); and, loosely speaking, it means ~70% of the tweets in my timeline come from ~30% of the people I follow. [cf: Pareto Principle]
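For anyone who wants to check a 'top 30% post 70%' style claim directly from the rates, rather than reading it off the trend-line, here is a minimal sketch. The function name and the example rates are made up for illustration; they are not the data linked from this post.

```python
def top_share(rates, top_fraction):
    """Fraction of all tweets contributed by the most active accounts.

    rates        -- tweets/day for each account
    top_fraction -- e.g. 0.3 for the most active 30% of accounts
    """
    ordered = sorted(rates, reverse=True)
    # Always count at least one account, even for tiny fractions.
    k = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical tweets/day figures, NOT the post's dataset.
example_rates = [40.0, 25.0, 15.0, 8.0, 5.0, 3.0, 2.0, 1.0, 0.5, 0.5]
share = top_share(example_rates, 0.3)
print(f"Top 30% of accounts post {share:.0%} of the tweets")
```

Since everyone's rate is already in tweets/day, the share of total rate is the same thing as the share of tweets in the timeline over any fixed period.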

You get similar log-shaped graphs when you split up the genders.

Full data here.

Chattiest Gender?

You can read all the explanation, caveats, etc. in the previous posts (here and here). I'm just going to go straight into the data.

I follow 27 men and 21 women (excluding celebrities, etc.). The stats are as follows:

Men:
Average = 6.21 tweets/day
Standard Deviation = 6.47

Women:
Average = 13.79 tweets/day
Standard Deviation = 14.27

For clarity, here's a boxplot (made in R)

Basically, the women tweet more on average, and their rates are more spread out than for the men. In fact, roughly three quarters of the men tweet at a lower rate than half of the women do. Also, there's one outlier in the female group.

This is similar to what we found last time; although the women's average and spread aren't quite as high (average: 13.79 vs 19.21), and the men's average has increased slightly (6.21 vs 5.29).

If you take the ratio of the averages, the women tweet 2.15 times as much as the men. But maybe I just follow particularly chatty women..

Here's a treemap (ManyEyes), which should give you a better idea of the gender balance (boxes sized by tweet rate)

Specifically, the graphic above is 62.5% purple (female).

Data here.

Where in the World Are My Followers?

The site I used last time doesn't seem to exist anymore. So I'm using MapMyFollowers instead. As the name suggests, these are my followers, rather than just the people I follow. Nonetheless..

Mostly in the UK and the US. As you'd probably expect.

I will point out though, some of the locations are a little suspect. Some people haven't made their location available so aren't included, and others seem to be in countries they couldn't possibly be in. But it's the best we can do.

Here's a zoom in on the UK

What Do I Tweet?

Made with Wordle, with data from TweetStats.

Words are sized by how often I tweet them; and by extension, @usernames are sized by how often I tweet those people.

In fact, here are the people I 'mention' the most (TweetStats)

Couldn't get a good source on who @replies me. That was one of the things Twoolr used to do..

When Do I Tweet?

Twoolr used to be awesome for Twitter statistics. But sadly, when they left beta, they started charging. And their free service went to shit. Luckily, I found TweetStats. Weirdly, it doesn't need you to log-in or anything, but somehow it can pull data on (nearly) all your tweets - beyond the 3,200 limit. Strange.

Here are some more graphs

Basically, I tweet most on a Friday and Saturday, and at around 1-2pm.

And I've never tweeted at 5am. But that's probably because I'm always asleep at 5am

Except that one time I got really drunk. (SleepBot)

How Much Do I Tweet?

This is another one I used to go to Twoolr for. And, to be fair, I still could. But that only goes as far back as April '10, and its graphics aren't as clear. Here's TweetStats again

Like I said before, I didn't tweet much in my first year. In fact, I only posted 36 tweets in all of 2009.

Now, the one problem with TweetStats is that 5-month gap in 2010. Why is this significant? Well, I was definitely tweeting during that time. In fact, by my estimates, over those 5 months I posted 5,724 tweets (~37 tweets/day). So those 5 months account for 43% of all my tweets.

See, the thing is, in 2010, I was out of university, single, and unemployed. I posted a total of 8,823 tweets - 24 tweets/day. Since I've been back at university, that number's dropped to 11 tweets/day.

That lull in Summer 2011 was when I was spending all my time on Tumblr and watching classic Doctor Who. Incidentally, I haven't posted on Tumblr since the start of September '11. It's terribly addictive, you see. I wouldn't recommend it; unless you're addicted to Doctor Who and Sherlock, and have lots of time on your hands..

So yeah.

Oatzy.

[Self-indulgent statistics, and pretty illustrations.]

Wednesday, August 03, 2011

The Toilet Seat Conundrum

Gentlemen, do you leave the toilet seat up, or courteously put it down after use?

I read a (somewhat tongue-in-cheek) article a while back, I can't remember where, that explained the toilet seat conundrum in terms of game theory. As best I can remember, it was quite clever. I was recently reminded of it when reading a Cracked article, and thought I'd try to recreate it, and - as is my wont - take it a step further.

Game theory, for those unfamiliar, is an area of maths/economics that studies 'competitive' interactions, "in which an individual's success in making choices depends on the choices of others".

Preamble

The (average) probability of the gentleman needing to 'sit down' when visiting the bathroom, we call p*. The probability of not is (1-p).

If the toilet seat is in the 'wrong position' for a given visit, we call the cost of this c1, and we assume that this cost is the same for both genders. This may not be strictly true.

The simplest cost would be in having to move the seat, typically in the form of mild inconvenience, and the potentially unpleasant experience of having to touch the underside of the seat. I'm also told that there are certain perils in visiting the toilet at night, if the seat is in the upright position and is required to be otherwise. I can't say this is a cost I've ever experienced.

One might argue that the 'costs' are inconsequential; but for the sake of arguing, they aren't.

For the sheer hell of it, we'll call the woman Alice and the man Bob. Alice and Bob have been in a relationship/living together just long enough to quarrel over such matters. I suspect, for most people, this is a non-issue; but that's not the point of the post.

There is a third possible game, not discussed below, in which Bob can just leave the seat down at all times. In this case, we have c3, the cost of clean up if Bob's aim isn't quite up to scratch.

Oh, and there is a fourth game, where the default position of the seat is upright. This is the worst possible game for Alice, and is only the best possible game for Bob if p<0.28. I can only imagine this game working in an all male household, and even then (a re-adjusted) Game One works out better.

Game One - Leave It As Is

Probability of the seat being down is the probability of Alice being the last to visit the lav plus the probability Bob was the last and left the seat down. Probability the seat is up is 1 minus the above.

Here's the cost matrix for this game

Where cost is c1 multiplied by the probability that the seat is in the wrong position

To get the total costs to Alice and Bob for this game, we work out

(Probability the seat is down x the cost if seat is down) +

(probability seat is up x cost if seat is up)

And we can work out the ratio of costs

B:A => 2p+1 : 1

In the extreme case, where p=0 (Bob never poops), their costs are equal. But in all other cases, Bob's cost is greater than Alice's.
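The cost matrix itself doesn't survive as text here, so the following is a sketch that rebuilds Game One from the description above. It assumes Alice and Bob visit equally often, so either is equally likely to have been the last visitor; the function name is made up for illustration.

```python
def game_one_costs(p, c1=1.0):
    """Expected per-visit costs (alice, bob) when the seat is left as is.

    p  -- probability that Bob needs to sit on a given visit
    c1 -- cost of finding the seat in the wrong position
    Assumes Alice and Bob are equally likely to have been the last visitor.
    """
    prob_down = 0.5 + 0.5 * p   # Alice was last, or Bob was last and sat
    prob_up = 1.0 - prob_down   # Bob was last and stood
    alice = c1 * prob_up        # Alice always needs the seat down
    # Bob pays when he needs to sit and it's up, or to stand and it's down.
    bob = c1 * (p * prob_up + (1.0 - p) * prob_down)
    return alice, bob


for p in (0.0, 0.25, 0.5):
    alice, bob = game_one_costs(p)
    print(f"p={p}: B:A = {bob / alice:.2f} : 1  (2p+1 = {2 * p + 1:.2f})")
```

Under those assumptions, Alice's cost is c1(1-p)/2 and Bob's is c1(1-p)(2p+1)/2, which gives the 2p+1 : 1 ratio quoted above.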

Game Two - Return to Default

Default meaning the seat is always returned to the downright position after use. The probability of the seat being up is always 0.

Here's what the matrix looks like

In this case, Alice incurs no cost. Bob, on the other hand, incurs double cost - when he needs to urinate, he has to move the toilet seat twice: up before use, and down afterwards.

It's obvious that Alice, once again, fares better than Bob.

Game Three, mentioned in the preamble, works the same as this, but with 2c1 replaced with c3. Alice still comes out better though. Unless she doesn't like the thought of sitting on a toilet seat that's (potentially) been peed on - even if it is cleaned - in which case, there's some abstract cost to her.

If she doesn't mind, then which of games Two and Three Bob would prefer depends on which is smaller: 2c1 or c3.

Lowest Costs

First of all, we note that in both games Alice comes out better than Bob - incurring a lower cost in both cases. That said, Alice does better in the latter game, incurring no cost at all in that one. So Game Two is preferable to Alice.

But what about Bob?

If we take the cost ratio of game one to game two for Bob, this is what we get

B1:B2 => 2p+1 : 4

In the extreme case of p=1 (Bob never urinates), Game Two incurs a greater cost for Bob (3:4) - and by extension, Game Two always incurs a greater cost to Bob.

THIS is where and why the conflict arises.

Alice prefers Game Two, Bob prefers Game One.

Tipping the Scales

So Alice would prefer to play Game Two, but she has to encourage Bob towards it. So Alice introduces a new penalty - c2 - for Bob leaving the toilet seat up.

The cost will typically be something along the lines of a bollocking, silent treatment, arguments, or whatever.

So what we do is this - the odds of Bob leaving the toilet seat up, and Alice being the next to use the bathroom -> (1-p)/2

Multiplied by the cost, c2, and added to the pre-existing total cost for game one

Now, we - or rather, Alice - wants the cost to be such that Game Two is preferable, i.e. B'1 > B2

Rearranging and simplifying, we get

c2 > 4c1 - c1(2p+1), i.e. c2 > c1(3-2p)
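As a quick sanity check on the penalty, here is a sketch that works from the costs described in the text: Bob's Game One cost taken as c1(1-p)(2p+1)/2, his Game Two cost as 2c1(1-p), and the penalty added as c2(1-p)/2. Those expressions are reconstructed from the ratios quoted in this post, and the function names are made up for illustration. Solving B'1 = B2 under those assumptions gives a break-even penalty of c1(3-2p).

```python
def bob_game_one(p, c1):
    """Bob's expected cost per visit when the seat is left as is."""
    return c1 * (1 - p) * (2 * p + 1) / 2


def bob_game_two(p, c1):
    """Bob's expected cost when he always returns the seat to down."""
    return 2 * c1 * (1 - p)


def bob_game_one_with_penalty(p, c1, c2):
    """Game One plus Alice's penalty c2, incurred with probability (1-p)/2."""
    return bob_game_one(p, c1) + c2 * (1 - p) / 2


def break_even_penalty(p, c1):
    """Penalty at which Bob is indifferent between the two games."""
    return c1 * (3 - 2 * p)


p, c1 = 0.25, 1.0
c2 = break_even_penalty(p, c1)
print(bob_game_one_with_penalty(p, c1, c2), bob_game_two(p, c1))
```

Any penalty above that value makes Game One the more expensive option for Bob; the rarer his sit-down visits (smaller p), the bigger the penalty has to be.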

However, if Alice were feeling kind, she could introduce a 'reward' for putting the toilet seat down, instead. It works effectively the same - barring psychological, carrot/stick considerations.

For this, Alice would have to offer a reward, R, with

Arguably, Bob could introduce a new cost - or enticement - himself, to 'persuade' Alice towards Game One. But TV leads me to believe that this is seldom thought of, or executed.

This might be because Alice has more to gain/lose - in as much as, Alice can avoid any cost by 'playing' Game Two. Bob, on the other hand, incurs some cost in both games.

You can draw your own conclusions on that one.

In terms of a co-operative solution, if we add together Alice and Bob's costs in each game and compare, we find that Game One has a lower total cost than Game Two.

So one could argue that Game One is better overall. The challenge, though, is convincing Alice that that is the best solution for both of them, given that, from Alice's point of view, she does worse in Game One.

Casino Bathrooms

So this is all well and good, but it's kind of a specialised case - the situation of a house with one male, one female, and one toilet. In our house, for example, we have two males, two females and three toilets. What then?

There are a few other problems with the probability-based approach, as well. For one thing, it uses an average poop-probability for the gentleman. It also assumes both Alice and Bob use the bathroom about the same number of times during a given time period - whereas some people have more robust insides than others.

So for this, we create a Monte Carlo simulation.

[This is what we call excessive commitment to an idea.]

In the simulation, we create a 'person' object, and assign to them a gender, an average number of bathroom visits, and, for males, an average bladder to bowel movement ratio. To capture the day to day variability in number of visits to the bathroom, we use Poisson distributions.

We also create a 'toilets' set, representing however many toilets there are, and their current states -> 1 = toilet seat up, 0 = toilet seat down. Each toilet has an equal chance of being chosen for use by any given person at any given time.

Each person has a counter, which is incremented when the person in question has to move the seat. At the end, these counters are grouped by gender for comparison.
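The simulation described above might look something like the sketch below. This is not the linked code, just a minimal standard-library version of the same idea. The class and function names, the 6 visits/day average, and the 0.2 sit-down probability are all made-up assumptions, so don't expect it to reproduce the exact figures in this post.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson-distributed integer (Knuth's multiplication method)."""
    limit, k, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


class Person:
    def __init__(self, is_male, mean_visits, sit_probability=1.0):
        self.is_male = is_male
        self.mean_visits = mean_visits          # average bathroom visits/day
        self.sit_probability = sit_probability  # chance a visit needs seat down
        self.seat_moves = 0                     # counter of seat adjustments


def make_house():
    """Two males, two females; the rates are hypothetical."""
    return ([Person(True, 6.0, 0.2) for _ in range(2)]
            + [Person(False, 6.0) for _ in range(2)])


def simulate(people, n_toilets, days, return_to_default, seed=0):
    """Return average seat moves per person per day, grouped by gender."""
    rng = random.Random(seed)
    toilets = [0] * n_toilets  # 0 = seat down, 1 = seat up
    for _ in range(days):
        # Poisson number of visits each, shuffled so people interleave.
        visits = [person for person in people
                  for _ in range(poisson(rng, person.mean_visits))]
        rng.shuffle(visits)
        for person in visits:
            t = rng.randrange(n_toilets)  # every toilet equally likely
            stands = person.is_male and rng.random() >= person.sit_probability
            needed = 1 if stands else 0
            if toilets[t] != needed:      # seat in the wrong position
                person.seat_moves += 1
                toilets[t] = needed
            if return_to_default and toilets[t] == 1:
                person.seat_moves += 1    # Game Two: put it back down
                toilets[t] = 0
    result = {}
    for label, is_male in (("male", True), ("female", False)):
        group = [p for p in people if p.is_male == is_male]
        if group:
            total = sum(p.seat_moves for p in group)
            result[label] = total / (len(group) * days)
    return result


game_one = simulate(make_house(), n_toilets=3, days=2000, return_to_default=False)
game_two = simulate(make_house(), n_toilets=3, days=2000, return_to_default=True)
print("Game One:", game_one)
print("Game Two:", game_two)
```

Note that each run gets a fresh house from make_house(), since the counters live on the Person objects and would otherwise carry over between games.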

In the Middle of Our Street

So I created a 'house' of two males, two females, and three toilets (variables chosen arbitrarily). Then ran the simulation for 10,000 hypothetical days.

The Game One simulation gives a result of ~ 3.28 seat moves per male per day, and 2.27 seat moves per female per day. That's a male:female seat move ratio of 1.44.

The Game Two simulation gives a result of 9.11 seat moves per male per day - bearing in mind, men have to move the seat twice per standing visit, in this version - and women never have to move the seat.

Code here.

Fun fact: Without additional costs and rewards, Game One is always preferable to men, Game Two to women. Regardless of the balance of men and women in a house.

So now you know!

Of course, in some cultures the conflict never really arises, since it is 'the norm' for men to sit for all visits to the lavatory.

Oatzy.

*[inb4 shouldn't p be the probability of needing to urinate lol]

Thursday, February 03, 2011

Revisited: Tweets by Gender

So I first looked at this here, with a follow up here. The conclusion was that among the people I follow, the women do tweet more than the men.

So it's now about 4 months since that last post, so I thought I'd have a quick look at how things have changed.

As a quick recap for those too lazy to click the link above, here are the results from last time (now in graph form)

If you want to know about the technical details, you will have to click the link.

And here are the new numbers

It's pretty much the same deal, but it's there for those interested. And just to further clarify whatever point it is I'm trying to make, here's a boxplot comparison of the numbers (made in R)

The three circles over the females' numbers are outliers. And once you factor out the outliers, you find the numbers are actually quite similar. But the middle quartiles (the box parts) are still slightly lower for the males.

The other thing I did, for the sheer hell of it, was work out the rates for the last 4 months; the other graphs are 'lifetime' rates.

You can get the dataset for all these numbers - as well as totals, days online, changes in rates, etc. - here.

What About Everyone Else?

This isn't something I'm going to try and work out myself, because frankly it's not worth the amount of effort it would require.

So I googled it instead, and found this

Apparently, on average women tweet 12% more than men. That number is based on total number of tweets posted though, where mine are based on rate.

But an article I linked in the first post suggested that men and women tweet at about the same rate.

So make of all this what you will.

Oatzy.

[If you see anything that looks like an error, leave me a comment and I'll look into it.]

Saturday, October 02, 2010

Follow Up: The Women Really Do Tweet More!

As Aerliss quite rightly pointed out, taking an average over the last 2 weeks isn't that accurate for anyone who has been uncharacteristically quiet or talkative over that period. Problem was, I couldn't get numbers for further back than that.

In fact, it turns out the numbers I would've gotten with the Twitter API, I can get in browser by going to, for example http://twitter.com/statuses/friends/oatz.xml.

And what you can get in there is the date a user joined and their total number of tweets, which means you can work out a 'lifetime average'.

For those interested, here is the code that turns date joined into number of days online.
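In case that link is no longer available, here is a minimal sketch of the same calculation. It isn't the original code. It assumes the join date has already been parsed into a Python date (the format of the date field in the XML isn't covered here), and the function names and example numbers are made up for illustration.

```python
from datetime import date


def days_online(joined, today):
    """Number of whole days between joining and 'today'."""
    return (today - joined).days


def lifetime_rate(total_tweets, joined, today):
    """Lifetime average in tweets/day = total tweets / days online."""
    # Guard against dividing by zero for an account created today.
    return total_tweets / max(days_online(joined, today), 1)


# Hypothetical account, NOT one from the dataset.
joined = date(2009, 5, 1)
today = date(2010, 10, 2)
print(days_online(joined, today), "days online")
print(round(lifetime_rate(5000, joined, today), 2), "tweets/day")
```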

Anyway. I ran the numbers. Some were surprisingly close to those I got yesterday. Others were wildly out.

The Numbers

What I found was that there are 3 particular outliers - 3 women who are WILDLY more talkative than anyone else I follow.

So for the sake of naming and shaming, and so you can all feel vaguely competitive with each other, here are the numbers:

Sorted by rate, coloured by gender. Click to enlarge.

Needless Complaining

Now, I do still have some qualms with this approach. For example, I joined last May. But there was a period of around 6 months when I never went on Twitter. So if we could exclude that period my average would be more representative of my tweet rate (and higher). That said, my average is still one of the higher ones.

I don't know if and how this affects other people, but there are some cases where the lifetime-averages are dramatically different to the 2week-averages.

Still, it does give me some of the numbers that were missing from yesterday's. So pros and cons. But in all honesty, drawbacks aside, I do trust these numbers more than yesterday's.

The Results

Enough waffling, here are the revised results:

Men:
Average = 5.29 tweets/day
Standard Deviation = 4.93

Women:
Average = 19.21 tweets/day
Standard Deviation = 24.94

So yeah. I wasn't imagining it. The women I follow really do tweet (on average) a lot more than the men!

A Graph

To show the spread - but mostly just because I can - here are the rates plotted on a graph (natural-logarithmic scale). Blue squares are women, orange diamonds are men.

So with the women you have a big clump in the middle and a handful at the extremes, whereas the men are sort of more evenly spread.

Even if you exclude the 3 female 'outliers', the average - 7.28 - is still higher than the men's, although the standard deviation becomes lower than the men's.

If you include me in the men's average, theirs still only goes up to 5.79.

So there you have it.

Random facts

- Of the people I follow, abooth202 has been on Twitter the longest - 966 days - with 5th November 2010 being his 1,000th day.

- miss_popcouture has posted the most tweets in her time on Twitter, with a heroic 49,922.

Oatzy.

* All figures correct as of 2nd October 2010. Subject to variation over time.

Friday, October 01, 2010

More Twitter Insights

Introductions

First of all, following a few recent new follows, here's the revised introductions graph.

[Click to enlarge. Interactive version here.]

Main new thing going on is that some of the chains are getting longer. Also I put in some of the people I missed out last time (for clarity), just so it's more complete.

Gender

Recently, I thought I noticed that a lot of my timeline was filled with tweets by women-folk. Obvious first conclusion being that I must now be following more women than men.

Having looked into this matter further, I actually found that I'm following almost equal numbers of men (15) and women (14) - excluding celebrities and dead-accounts.

Next question then is - What the hell?

Okay, this next theory may sound a little sexist but bear with me - what if it's just that the stereotypes are true and women really do talk more (on average)?

Given that a recent update to the Twitter API borked my previously mentioned programs and I don't (yet) know how to fix them, I had to get numbers and such by hand. Which made life a little harder.

Individual tweet rates are only approximate, and based on an average over the last ~14days. And I'll be honest, I don't entirely trust the source numbers. Also, I couldn't get numbers for some people, which isn't ideal.

NB/ For more applicable results, it would've made sense to only count tweets that would have shown up in my timeline - i.e. excluding tweets @ people I don't follow. But working that out would require a lot more effort, and frankly I don't care enough about that degree of correctness to bother.

Anyway, here are the results:

Men:
Average = 5.2 tweets/day
Standard Deviation = 4.29

Women:
Average = 6.6 tweets/day
Standard Deviation = 4.81

So yeah. The women tweet about 27% more than men. They also have a slightly greater spread of rates.

Basically, the women that tweet the most, tweet more than the men who tweet the most. Which pushes the average and spread up.

Again, these results are only approximate and only apply to my personal network, and could vary (significantly) over time. Also if we had the missing data, it could turn out the opposite is actually true.

In fact, someone already did research on this matter for a random sample of 300,000 Twitter users. What they found was that, while there are slightly more women (55%) on twitter than men,

We found that an average man is almost twice more likely to follow another man than a woman. Similarly, an average woman is 25% more likely to follow a man than a woman.

And that, on average, men and women tweet at about the same rate.

Maybe I'm just following particularly talkative women/quiet men...

Location

Just because I can.

These are maps of people who follow me, rather than just people I follow (which I would've preferred). But I couldn't find something that could do that, so gave up and settled on this. Interactive version and make your own here.

World View

UK View

[Click to enlarge.]

Bios

For the people in the graph at the top of the page, I took their bios and made this word cloud [click to enlarge. Interactive here.]:

This is the company I keep - geeks and writers.

And One More Thing

Previously mentioned Malcolm Gladwell wrote an article for The New Yorker recently, about activism in social networks - Twitter in particular - called "Why the Revolution Will Not Be Tweeted".

In it, he explains how and why social-media based 'activism', is quite different from and less effective than real-world activism - the sort of activism that brings about genuine change. Rather, social-media activism is good at getting lots of people to participate, but they (mostly) only do so with the least amount of effort.

So, for example, they might join a FB group, sign an online petition, or even do the sort of crap 4chan pulls. But they tend not to actually go out and protest, where genuine commitment is needed and where there's the risk of say physical harm. And the resulting pay-off is much less significant as a result.

"This is because, Gladwell says, online networks are all about weak ties — a weak tie is a friend of a friend, or a casual acquaintance — whereas real activism depends on strong ties, or those people you know and trust"

In response, Jonah Lehrer - writing for Wired - argues that Gladwell's dismissal of weak ties in social activism may be a little short-sighted. And in particular it can be necessary for a leader of a cause to have lots of weak ties, so as to have greater reach - or at least, this seems to be the case in real world situations.

Gladwell also talks about how a hierarchically structured group is more effective in activism than a decentralized-network structured group - as is often the form online groups take. And again, this is required for greater levels of discipline, control and commitment.

Both make good arguments, and both articles are worth reading.

Oatzy.