Friday, December 09, 2011

Best Day to Go Shopping

The Question

As the title suggests - which is the best (least busy) day to go shopping on?

Or more generally, how do crowds at shopping centres vary over time? Which days are busiest, or least busy? Are the shops getting busier as we get closer to Christmas? Less busy?!

It an interesting question, and one that's probably been looked into before. But still, I had an idea and I'm running with it.

[Feel free to skip straight to the results if you're not interested in statistics and the likes..]


Data Collecting

Data collection on this is tricky. Especially if you don't have legions of people to go out and actually count people. What I want to do is extract numbers with the minimum of effort.

So here's the game - Foursquare.

If you're unfamiliar, Foursquare is a 'social game', for which you 'check-in' to locations and earn points and badges accordingly. It's also good if you're an obsessive types who likes to keep track of where they've been.

So the idea is this - some subset of shoppers will be Foursquare users, who will check-in when they visit any given shopping centre. If we can extract check-in counts over a given time period, hopefully that can be used as an indicator of a place's 'busyness'. Obviously, this is flawed - but more on that below.

Right. So on the Foursquare page for a given venue, there isn't a total historical record of check-ins over time. Instead, what we have is the total number of check-ins at that location up to the time when you loaded the webpage.

What we do, then, is record that number at some fixed time every day (say, midnight). Then the number of check-ins on a given day is the difference between the total at the end of the day and the total for the end of the previous day. Easy.

In fact, to make life a little easier, I wrote this bit of code [python]. All I have to do is remember to run that every evening, and we have our data.


Sampling

There are two possible sources of sampling errors:

1) Location

If we only track one location, we have a very small sample size. That means we're subject to perturbations - for example, a major event like the Christmas lights being switched on - or just general statistical noise. Also, shopping patterns may vary across the country, or depending on how close to a city centre the centre is located, and so on.

It's actually fairly easy to overcome this. First of all, we have this list of the largest shopping centres in the UK. From that list I picked 20 locations to sample. This data can then be normalised and averaged to look for any general patterns that are (relatively) store independent.

Oh, and I should probably mention, since this data is being collected from places in the UK only, patterns may vary for different countries.


2) Users

Using Foursquare data, we're working on the assumption that as the number of shoppers increases (or decreases), the number of Foursquare check-ins will increase in proportion.

This is not necessarily the case.

First of all, we look up the Foursquare user demographics. There is no one source of definitive data on this (that I could find). But to get a general idea, there is this, based on a survey of BART travelers.

Obviously, this demographic source is for users of an American transport service, but I'm assuming it's representative of Foursquare users in general.

From this we see that the typical user is most likely male, age 25-34. Or to put it another way, women, young people, and old people are under-represented. And from my experience, it seems like women and old people are the most common shoppers on weekdays.

So this may introduce a disparity between the data and reality. But it's not one we can really do anything about (without seeking an alternative source of data). So, as long as there is a general size proportionality between shoppers and check-ins, we'll consider the data acceptable.


Results

By far, Saturday is the worst day to go shopping (in terms of crowds). But you already knew that.

So I have my data for the last 3 weeks, for 20 shopping centres across Britain. Here's the raw data, for if you're into that sort of thing.

I worked out the check-ins for each place on each day, then normalised by shopping centre - so that the total number of check-ins for each shopping centre over the three week period now adds up to 100. I then averaged these 'norms' across all shopping centres.

Here's what those results look like.
In fact, I went back and 'tidied up' the data, removing venues with less than 100 check-ins total during the recording period - since their sample sizes were maybe too small for any patterns to be statistically significant - and removed a couple of anomalies (one place ended up with negative check-ins).

This is what the tidy plot looks like
[Updated since original post]

Pretty similar, but some of the bars are now closer together (removes the anomalies from Sunday and Wednesday).

So from the results above, the order of days, from least to most busy, seems to be:

1) Monday
2) Thursday
3) Tuesday
4) Wednesday
5) Sunday
6) Friday
7) Saturday

But note, it's pretty close amongst the top 3 least busy days.

It shouldn't be too surprising that weekdays are less busy than weekends - what with people working.

As for Sunday being so low compared to Friday and Saturday - well, that might be the result of Sunday opening hours.

For example, Meadowhall has typical opening hours of 9am-8pm, but on Sundays it's 11am-5pm -> 11hrs vs 6hrs. So maybe it would make sense to re-adjust accordingly. But deciding how, exactly, to re-adjust is tricky. So we'll just leave it be.

NB/ Wednesday, week 2 maybe distorted due to public sector strikes - there certainly appeared to be more people on the train. But but a lot of that increase was from children (see sample bias above).


Of course, each shopping centre is unique, and there will be variation as to which days are best and worse for each. As an example, here's what the (non-normilised) plot looks like for Westfield London (the most checked-in to shopping centre by far)
Again, pretty similar to the average. But in this case, Wednesday is less busy than the average, and Thursday more. And Friday has that weird dip in week 2, bringing its average down.


One last thing I'd like to point out - notice there is no particular week-on-week trend. That is, the number of check-ins isn't (on average) increasing as we get closer to Christmas. Or decreasing for that matter. Which is, perhaps, not what you'd expect.

But maybe that will change within the next couple of weeks. And certainly after Christmas, when the January sales kick off. Maybe.


This is an on-going project - bear in mind, this is only 3 weeks worth of data, so it may be too soon to draw any solid conclusions - but I will keep you posted. Maybe I'll do another post just after Christmas, or after New Year's. At any rate, I'll tweet it when I do.


Oatzy.


[There's always online shopping..]

No comments: