Project: Facebook Data Liberation — Part 1

One year ago my wife, Elizabeth, started a Facebook group to help encourage people with their personal exercise programs. She didn’t know whether her friends would be interested in participating at all. It turns out that not only were they interested, but they asked their friends to join, too. A year later, the group is still going strong, with several posts every day from dedicated runners, bikers, and exercisers.

With the group’s first anniversary coming up, Elizabeth wanted to see how many miles biked, kilometers run, pushups completed, situps performed, and races finished had been posted to the group. The problem is that Facebook doesn’t give you a good way to see every post made to a group in an easily searchable or exportable form. She also wanted some way to automatically total the miles of running and biking that people log to the group. As we talked about it, I thought it must be doable with the Facebook API.

So, with a new mini-project starting up that uses some interesting technology, I decided to blog through the process. You can follow along as I build a proof-of-concept HTML5 application that does what she is looking for. If everything goes well and it turns out to be a useful tool, this mini-project will end up as an HTML5 mobile web app that lets you interact with an active Facebook group.

This post will talk about the Facebook API in general and my personal techniques for gaining an understanding of how this kind of API works.

My first stop was the Facebook Graph API page. It gives a great overview of the API and links to the details I’m interested in. After a quick browse, I wanted more detail on the Group object. You can retrieve a Facebook group object with a simple HTTP GET request that you can type right into your browser’s URL bar, like this, for example:

Even better, Facebook provides a very nice developer tool called the Graph API Explorer, where you can run test queries and get back the results along with links to the relevant documentation.

This is actually great for me. When I see a new web API like this, I like to plug URLs into a browser and see what comes back. It’s a quick way to get a handle on what data you need to supply and what data you get back from the various calls. If the API you are working with requires POST-formatted data, you can use tools like curl or wget from a Unix command line.
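The same browser-and-curl style of exploration can also be scripted with Python’s standard library. This is a minimal sketch, not anything Facebook provides: `fetch_json` and `build_post_request` are hypothetical helper names, and the opener is a parameter so the code can be tried without a network connection.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def fetch_json(url, opener=urlopen):
    """GET a URL and decode its JSON body, like pasting it into a browser.

    `opener` defaults to urllib's urlopen; pass a stub to test offline.
    """
    with opener(url) as response:
        return json.load(response)


def build_post_request(url, fields):
    """Build (but do not send) a form-encoded POST request.

    This is roughly what `curl -d 'a=1&b=2' URL` sends; hand the result
    to urllib.request.urlopen() to actually issue it.
    """
    body = urlencode(fields).encode("ascii")
    return Request(url, data=body, method="POST")
```

For example, `fetch_json("https://graph.facebook.com/<group-id>")` would request a group object, assuming the Graph API is served from `graph.facebook.com` (an assumption here, since the exact URLs in this post are not shown).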

To get information on my wife’s group, I needed to put that group’s ID in the URL. I found the ID by opening the group in Facebook and looking at the page’s URL: the long string of digits at the end is the group ID. What I’m really interested in is the group’s “feed” (its posts), so I plugged the group ID into the following URL: (If you want to do this for your group, use your ID where I have # marks!)

What I got back was the following:

{
  "error": {
    "message": "An access token is required to request this resource.",
    "type": "OAuthException",
    "code": 104
  }
}

Ahh, because this group is private, you also need a valid access token that authenticates your connection and authorizes it to read the private data. This is another nice feature of the Graph API Explorer: it has an interactive tool for generating a token with the required privileges. I just clicked the button labeled “Get Access Token” to generate a token I could use while exploring the API interactively. When I package this process into a standalone application, I will have to use the Facebook API so the user can get a token interactively, but that is a later step.
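Once requests are scripted rather than typed into a browser, an error body like the one above is easy to mistake for data. A small sketch that turns it into a Python exception instead; `GraphAPIError` and `check_response` are hypothetical names, and the only fields relied on are `message`, `type`, and `code`, the ones in the response shown above.

```python
class GraphAPIError(Exception):
    """Raised when a decoded Graph API response carries an "error" object."""

    def __init__(self, message, error_type, code):
        super().__init__(f"{error_type} ({code}): {message}")
        self.error_type = error_type
        self.code = code


def check_response(payload):
    """Return payload unchanged, or raise GraphAPIError if it reports an error."""
    error = payload.get("error")
    if error is not None:
        raise GraphAPIError(error.get("message", ""),
                            error.get("type", ""),
                            error.get("code"))
    return payload
```

If Facebook sends the error with an HTTP error status, `urllib.request.urlopen` raises `urllib.error.HTTPError` first; the body can still be read from that exception and passed through `json.loads` before calling `check_response`.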

Taking my newly created token, I put together a URL like this:….

That URL gets me a very nice JSON result with the 25 most recent posts to the group.

Poking around in the documentation, I found that I could add a limit argument to the URL to receive more results from one call:….
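Put together, the steps so far (find the group ID, add the token, raise the limit) amount to building one URL. A sketch under stated assumptions: the host `graph.facebook.com` and the `/{id}/feed` path shape stand in for the URLs elided above, `group_id_from_url` and `feed_url` are hypothetical names, and `access_token` is the query argument name the Graph API uses for tokens.

```python
from urllib.parse import urlencode, urlsplit

# Assumption: the Graph API host; the exact URLs are not shown in the text.
GRAPH_BASE = "https://graph.facebook.com"


def group_id_from_url(page_url):
    """Return the numeric ID at the end of a group page URL, or None.

    Mirrors the manual step above: the long run of digits at the end of
    the group's URL is its ID. Groups shown under a custom name instead of
    a number return None and would need another way to look up the ID.
    """
    last = urlsplit(page_url).path.rstrip("/").rsplit("/", 1)[-1]
    return last if last.isdigit() else None


def feed_url(group_id, access_token, limit=None, base=GRAPH_BASE):
    """Build a group feed URL carrying the access token and optional limit."""
    params = {"access_token": access_token}
    if limit is not None:
        params["limit"] = limit
    return f"{base}/{group_id}/feed?{urlencode(params)}"
```

Letting `urlencode` assemble the query string means any characters in the token that are not URL-safe get escaped rather than pasted in raw.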

The response that I get is JSON formatted and has a structure that looks like this:

  • data
    • id
    • from
    • to
    • message
    • actions
    • type
    • application
    • created_time
    • updated_time
    • comments
      • data
        • id
        • from
        • message
        • created_time
    • is_published
  • paging
    • previous
    • next
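To sanity-check that structure, here is a small sketch that walks one response and pulls out each post and its comments. It assumes exactly the nesting listed above, plus a `name` field inside `from`; `flatten_posts` is a hypothetical name, and missing fields come back as `None` (or an empty message) rather than raising.

```python
def flatten_posts(response):
    """Yield (created_time, author_name, message) for every post and comment.

    Assumes posts live under "data", each with an optional "comments"
    object whose own "data" list holds the comments.
    """
    for post in response.get("data", []):
        yield (post.get("created_time"),
               (post.get("from") or {}).get("name"),
               post.get("message", ""))
        for comment in (post.get("comments") or {}).get("data", []):
            yield (comment.get("created_time"),
                   (comment.get("from") or {}).get("name"),
                   comment.get("message", ""))
```

Flattening posts and comments into the same tuples keeps later counting (miles, pushups, races) to a single pass over one list.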

Since this group has a lot of posts, but I didn’t know how many, I kept pushing up the limit to see what would happen. Yes, a more studious developer would consult the documentation to find the maximum value for limit, but I am more inclined to experiment. I found that with a limit much over 500, my request would time out. A quick Google search confirmed that other people also saw many API requests time out before the maximum value of limit was reached.

So, since I can’t grab them all in one go, I will either have to request them in blocks of time using the “since” and “until” arguments, or page through the results. If you look at the end of the JSON response format (shown above), you will see that at the top level the response is made up of a “data” element and a “paging” element. The paging element has two URLs, “previous” and “next”. By cutting and pasting the “next” URL into my browser, I was able to verify that it really does take me to the next set of 500 posts.

Doing this a few times, I was able to retrieve all the posts from the group in a nice JSON format. This is exactly what I needed for my minimum goal: giving Elizabeth a list of every post to date.
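The cut-and-paste loop is easy to automate: keep following `paging.next` until it runs out, then save everything to one file. A minimal sketch; `fetch_all_posts` and `save_posts` are hypothetical names, `fetch` is any function that maps a URL to decoded JSON, and the page cap and empty-page check are my own safeguards rather than documented Graph API behavior.

```python
import json
from urllib.request import urlopen


def _get_json(url):
    """Default fetcher: plain HTTP GET, decoded as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def fetch_all_posts(first_url, fetch=_get_json, max_pages=1000):
    """Follow paging.next links from first_url and collect every post."""
    posts, url, pages = [], first_url, 0
    while url and pages < max_pages:
        page = fetch(url)
        data = page.get("data", [])
        if not data:
            break  # safeguard: treat an empty page as the end of the feed
        posts.extend(data)
        url = (page.get("paging") or {}).get("next")
        pages += 1
    return posts


def save_posts(posts, path):
    """Write the collected posts to a UTF-8 JSON file; return the count."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(posts, fh, ensure_ascii=False, indent=2)
    return len(posts)
```

A run might look like `save_posts(fetch_all_posts(first_feed_url), "group_posts.json")`, where `first_feed_url` (hypothetical) is the token-bearing feed URL with the limit set to 500. Because each `next` URL already carries the query arguments of the first request, the loop never has to rebuild them.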

So at this point I have all the data I need for my first task of analyzing the group postings. Next time I will show you what I’m doing on the HTML side to analyze and visualize the data. Once that is done, I will automate the data collection process so other people can use it.