14. How to get text from Facebook

Facebook is another massive trove of text that it would be fun to peek into. Facebook has published two APIs for developers, one in REST and another in their own format. Unfortunately, the REST API has been deprecated, so we focus exclusively on the other, called GraphAPI.

14.1. How to get an access token

You cannot interact with GraphAPI without authenticating yourself. Authentication is a simple process, but it helps to start by logging on to your Facebook profile. Then point your browser at the Graph API Explorer.

14.2. How to use the Graph API Explorer: GraphAPI vs. FQL Query

Oddly enough, GraphAPI supports two protocols for sending queries to its database. One is native to it and is known simply as GraphAPI. The other is modeled on the standard syntax of the Structured Query Language (SQL) used by databases such as MySQL; it is called FQL (for Facebook Query Language) and is labeled FQL Query in the Explorer. Each protocol has its advantages and disadvantages, which you will discover in the following examples. [1]

Since you already have the Graph API Explorer open, let’s look at a quick example. Select Graph API and GET and delete whatever is in the entry box to the right of GET until it is reduced to /me. Click on Submit. For me, what appears is the following dictionary:

{
  "id": "2811272",
  "birthday": "11/01/1957",
  "first_name": "Harry",
  "gender": "male",
  "languages": [
    {
      "id": "113301478683221",
      "name": "American English"
    },
    {
      "id": "312525296370",
      "name": "Spanish"
    }
  ],
  "last_name": "Howard",
  "link": "https://www.facebook.com/harry.howard.581",
  "locale": "en_US",
  "name": "Harry Howard",
  "quotes": "\"Richard Hamming’s three questions for new hires at Bell Labs:\r\n\r\n1- What are you working on?\r\n2- What’s the most important open problem in your area?\r\n3- Why aren’t 1 & 2 the same? (Ouch!)\r\n\r\n“You and Your Research” --- Richard Hamming (1986)",
  "timezone": -5,
  "updated_time": "2013-04-24T15:46:22+0000",
  "username": "harry.howard.581",
  "verified": true
}

This is all the personal information in my profile.

Now click on FQL Query. The information from the previous query is retained, though not for long. All FQL queries have the format:

SELECT [fields] FROM [table] WHERE [conditions]

To limit the request to the logged-in user, WHERE takes the condition uid = me(). uid stands for “user id” and me() stands for the logged-in user. The database table that you are going to select fields FROM is “user”. Here is a list of all the other FQL Tables. So the only item that remains to construct a well-formed query is the list of fields. In the example above of my profile, there are fourteen (first level) fields, but FQL has nothing like ALL. You must name each field. You could enter all of them, but that would be time consuming, and besides, you probably have fields in your profile that aren’t in mine (see user). So we are stuck for the time being.
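
The SELECT [fields] FROM [table] WHERE [conditions] pattern can be sketched as a small string-building helper. This is only an illustration: the function name build_fql is hypothetical and not part of any Facebook library, and it does nothing more than assemble the text of a query.

```python
def build_fql(fields, table, condition):
    """Assemble an FQL string of the form SELECT [fields] FROM [table] WHERE [conditions]."""
    if not fields:
        # FQL has nothing like ALL, so an empty field list cannot make a valid query.
        raise ValueError("FQL needs at least one field")
    return "SELECT {} FROM {} WHERE {}".format(", ".join(fields), table, condition)


# The condition uid = me() limits the request to the logged-in user.
query = build_fql(["name"], "user", "uid = me()")
print(query)
```

Raising an error on an empty field list mirrors the point made above: without at least one named field there is nothing to send.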

To filter personal information in GraphAPI, extend the line in the entry box to /me?fields=name and click on Submit. The result is:

{
  "name": "Harry Howard",
  "id": "2811272"
}

FQL can do this, too: SELECT name FROM user WHERE uid = me(). The response carries the same name, though FQL wraps it in a “data” list and leaves out the id.

You can query multiple fields in GraphAPI by separating their tags with commas. /me?fields=gender,languages,timezone returns:

{
  "gender": "male",
  "languages": [
    {
      "id": "113301478683221",
      "name": "American English"
    },
    {
      "id": "312525296370",
      "name": "Spanish"
    }
  ],
  "timezone": -5,
  "id": "2811272"
}

In FQL, do the same, except that the gender field is “sex”: SELECT sex,languages,timezone FROM user WHERE uid = me().
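
If the field names live in a Python list, the comma-separated form that GraphAPI expects can be produced with a join. A minimal sketch, in which graph_path is a hypothetical helper name used only for illustration:

```python
def graph_path(node, fields=()):
    """Build the text typed into the Graph API Explorer entry box."""
    path = "/" + node
    if fields:
        # GraphAPI separates multiple field tags with commas.
        path += "?fields=" + ",".join(fields)
    return path


print(graph_path("me", ["gender", "languages", "timezone"]))
```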

Fields can be filtered by sub-field, but my personal information does not have enough depth to make this interesting, so let us look at status updates in GraphAPI with /me?fields=statuses. One of mine starts out like this:

{
  "statuses": {
    "data": [
      {
        "id": "867537645469",
        "from": {
          "name": "Harry Howard",
          "id": "2811272"
        },
        "message": "So, does anyone know whether KISS is going to play in the rain?",
        "updated_time": "2012-03-31T01:11:42+0000",
        "comments": {
          "data": [
            {
              "id": "867537645469_2740816",
              "from": {
                "name": "Wright McFarland",
                "id": "100000054518509"
              },
              "message": "They  got the fireworks. My first grownup big boy concert. Municipal Auditorium 1976.",
              "can_remove": true,
              "created_time": "2012-03-31T01:37:13+0000",
              "like_count": 0,
              "user_likes": false
            },

In FQL, the table is “status”, but once again you cannot select all of the fields: SELECT ??? FROM status WHERE uid = me().

Notice in the GraphAPI response that the text is introduced with the message tag. To keep just the messages, add .fields and then the tag of the sub-field in parentheses, as in /me?fields=statuses.fields(message). Mine begin with:

{
  "statuses": {
    "data": [
      {
        "message": "So, does anyone know whether KISS is going to play in the rain?",
        "id": "867537645469",
        "updated_time": "2012-03-31T01:11:42+0000"
      },
      {
        "message": "Triskaidekaphobia: fear of the number 13. \nFriggatriskaidekaphobia: fear of Friday the 13th.\nAny (frigga)triskaidekaphobics out there?",
        "id": "813350252439",
        "updated_time": "2012-01-13T17:47:27+0000"
      },
      {
        "message": "62 to 7. What the hell kind of score is that?",
        "id": "763840919539",
        "updated_time": "2011-10-24T03:58:23+0000"
      },
      {
        "message": "It was great to see so many posts from long lost FB friends yesterday.",
        "id": "744976184619",
        "updated_time": "2011-09-05T15:44:41+0000"
      },
      {
        "message": "Update on TS Lee: the four sewers around our house are clear and draining without any problem. There is hardly any standing water in the street and just a few small broken branches. Our rain gauge maxed out at 6\" sometime in the night. We have some water in the basement, so the ground is now saturated.",
        "id": "744537603539",
        "updated_time": "2011-09-03T14:09:27+0000"
      },

GraphAPI does not permit any further filtering, so you would be stuck with the unwanted “id” and “updated_time” values. FQL, in contrast, returns just the messages: SELECT message FROM status WHERE uid = me(). Mine begin like so:

{
  "data": [
    {
      "message": "So, does anyone know whether KISS is going to play in the rain?"
    },
    {
      "message": "Triskaidekaphobia: fear of the number 13. \nFriggatriskaidekaphobia: fear of Friday the 13th.\nAny (frigga)triskaidekaphobics out there?"
    },
    {
      "message": "62 to 7. What the hell kind of score is that?"
    },
    {
      "message": "It was great to see so many posts from long lost FB friends yesterday."
    },
    {
      "message": "Update on TS Lee: the four sewers around our house are clear and draining without any problem. There is hardly any standing water in the street and just a few small broken branches. Our rain gauge maxed out at 6\" sometime in the night. We have some water in the basement, so the ground is now saturated."
    },
    {
      "message": "“I went to a bookstore and asked the saleswoman, “Where’s the self-help section?” She said if she told me, it would defeat the purpose.” –  George Carlin"
    },
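
If you stay with GraphAPI, the unwanted “id” and “updated_time” values can be dropped after the response arrives. The sketch below assumes that the response has already been turned into a Python dictionary and copies two of the statuses shown earlier as sample data:

```python
# Sample data shaped like the /me?fields=statuses.fields(message) response.
response = {
    "statuses": {
        "data": [
            {
                "message": "62 to 7. What the hell kind of score is that?",
                "id": "763840919539",
                "updated_time": "2011-10-24T03:58:23+0000",
            },
            {
                "message": "It was great to see so many posts from long lost FB friends yesterday.",
                "id": "744976184619",
                "updated_time": "2011-09-05T15:44:41+0000",
            },
        ]
    }
}

# Keep only the text; skip any status that happens to lack a message.
messages = [
    status["message"]
    for status in response["statuses"]["data"]
    if "message" in status
]

for message in messages:
    print(message)
```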

You can also look at your Facebook friends. In GraphAPI, just extend /me as /me/friends. The first few of mine come out like:

{
  "data": [
    {
      "name": "Rachel Andersen",
      "id": "2800366"
    },
    {
      "name": "Ernesto Kufoy",
      "id": "2800781"
    },
    {
      "name": "Katie Single",
      "id": "2801206"
    },
    {
      "name": "Eric Wilder",
      "id": "2802995"
    },
    {
      "name": "Matthew Crossland",
      "id": "2803931"
    },
    {
      "name": "Chase Faucheux",
      "id": "2804521"
    },

This is considerably more complex in FQL. You must use the IN operator with a subquery to extend the uid to your friends. This is the syntax that the GraphAPI developers recommend, though as before, it will not work without a field being specified: SELECT ??? FROM user WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = me()).

In GraphAPI, the fields that you can use are found under + Search for a field. Selecting “gender” to produce the query me/friends?fields=gender returns a list like this one:

{
  "data": [
    {
      "gender": "female",
      "id": "2800366"
    },
    {
      "gender": "male",
      "id": "2800781"
    },
    {
      "gender": "female",
      "id": "2801206"
    },
    {
      "gender": "male",
      "id": "2802995"
    },
    {
      "gender": "male",
      "id": "2803931"
    },
    {
      "gender": "male",
      "id": "2804521"
    },

Fleshing out the FQL query to SELECT sex FROM user WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = me()) returns the “sex:value” pairs without the id number, so despite its complexity, FQL may give a more easily processed response.
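
Either response is easy to summarize once it is a Python dictionary. As an illustration, the sketch below tallies the gender values from the GraphAPI listing above; the sample data is copied from that listing, and the label "unspecified" for a missing value is my own assumption.

```python
from collections import Counter

# Sample data copied from the /me/friends?fields=gender response above.
friends = {
    "data": [
        {"gender": "female", "id": "2800366"},
        {"gender": "male", "id": "2800781"},
        {"gender": "female", "id": "2801206"},
        {"gender": "male", "id": "2802995"},
        {"gender": "male", "id": "2803931"},
        {"gender": "male", "id": "2804521"},
    ]
}

# Count each value; friends without a gender field are grouped under a placeholder.
counts = Counter(friend.get("gender", "unspecified") for friend in friends["data"])
print(counts)
```

The extra id in each GraphAPI record is simply ignored here, so the difference between the two protocols matters less once the data is in Python.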

But enough about me. As long as you have an active access token, you can also peek into other people’s Facebook profiles. If you know someone’s user id or user name, you can get their public information by plugging the identifier into GraphAPI as in /2800366 or /Rachel.Andersen.Niaffi:

{
  "id": "2800366",
  "first_name": "Rachel",
  "gender": "female",
  "last_name": "Andersen",
  "link": "https://www.facebook.com/Rachel.Andersen.Niaffi",
  "locale": "en_US",
  "name": "Rachel Andersen",
  "updated_time": "2014-03-04T08:52:41+0000",
  "username": "Rachel.Andersen.Niaffi"
}

FQL requires a number (and, for the sake of illustration, a field), as in SELECT name FROM user WHERE uid = 2800366:

{
  "data": [
    {
      "name": "Rachel Andersen"
    }
  ]
}

For a change of pace, the New York Times of Spain is called El País. It has a Facebook profile, which you can find by searching for “El Pais” in Facebook. The user name of the profile is – you guessed it – “elpais”. You can plug it into GraphAPI to get:

{
  "id": "8585811569",
  "about": "EL PAÍS ofrece noticias de última hora y toda la actualidad nacional, internacional, economía, deportes, sociedad, viajes. Y mucho más. http://www.elpais.com/",
  "can_post": true,
  "category": "News/media website",
  "checkins": 0,
  "company_overview": "El País, el periódico global en español",
  "cover": {
    "cover_id": "10151994667946570",
    "offset_x": 0,
    "offset_y": 26,
    "source": "https://fbcdn-sphotos-h-a.akamaihd.net/hphotos-ak-frc3/t1.0-9/s720x720/10153020_10151994667946570_7064500975385090380_n.jpg"
  },
  "founded": "4 de mayo de 1976  http://www.elpais.com http://eskup.elpais.com http://www.twitter.com/el_pais http://www.tuenti.com/elpais http://www.netvibes.com/elpais http://www.youtube.com/elpaiscom",
  "has_added_app": false,
  "is_community_page": false,
  "is_published": true,
  "likes": 995007,
  "link": "https://www.facebook.com/elpais",
  "name": "El País",
  "products": "Eskup: http://eskup.elpais.com\nTwitter: http://www.twitter.com/el_pais\nTuenti: http://www.tuenti.com/elpais\nGoogle+: https://plus.google.com/103019117518606328359/\nNetvibes: http://www.netvibes.com/elpais\nYoutube: http://www.youtube.com/elpaiscom\n",
  "talking_about_count": 105413,
  "username": "elpais",
  "website": "www.elpais.com",
  "were_here_count": 0
}

These are different from the user fields, because this is actually a Facebook page, though there is no explicit indication thereof. FQL has no way to produce all of these fields at once, since SELECT ??? FROM page WHERE page_id = 8585811569 needs explicit field names.

Narrowing the query to a single field is done in GraphAPI via /elpais?fields=about and in FQL via SELECT about FROM page WHERE page_id = 8585811569. The FQL response is:

{
  "data": [
    {
      "about": "EL PAÍS ofrece noticias de última hora y toda la actualidad nacional, internacional, economía, deportes, sociedad, viajes. Y mucho más. http://www.elpais.com/"
    }
  ]
}

14.2.1. Summary of the dueling protocols

Graph API                               FQL Query
--------------------------------------  ----------------------------------------
/me                                     no equivalent
/me?fields=name                         SELECT name FROM user WHERE uid = me()
/me?fields=gender,languages,timezone    SELECT sex,languages,timezone FROM user WHERE uid = me()
/me?fields=statuses                     no equivalent
/me?fields=statuses.fields(message)     SELECT message FROM status WHERE uid = me()
/me/friends                             no equivalent
/me/friends?fields=gender               SELECT sex FROM user WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = me())
/elpais                                 no equivalent
/elpais?fields=about                    SELECT about FROM page WHERE page_id = 8585811569

14.3. How to send GraphAPI queries with your web browser

GraphAPI was designed to accept queries over the Internet. By “over the Internet”, I mean using the Hypertext Transfer Protocol, or HTTP, or its secure version, HTTPS, to send a query and return its response. The particular way in which GraphAPI does this is called representational state transfer or REST. Representational state transfer involves the construction of a uniform resource locator (URL) from a uniform resource identifier (URI) and some additional information. The uniform resource identifier is the Internet name for a web service. The URI of GraphAPI is https://graph.facebook.com/.

For the specifics of a query, GraphAPI needs three more pieces of information. The first is an identifier, a number or a name, for the object to be queried. For your first try, you can use me, which is an abbreviation for the id number of the user currently logged onto Facebook. This is why I asked you to log on as the first thing that you do. The second is a field to be queried, introduced by ?fields=. If it is left empty, the default is the “status” field, which includes all of your public personal information. It can be made explicit with ?field=status. The final and crucial bit is your access token, introduced by &access_token=.

As a RESTful web service, GraphAPI also expects a query to include one of the HTTP methods, such as GET, PUT, POST, or DELETE, to tell it what to do with the information. GET asks for a response and is implicit in a query typed into a browser. The outcome is considered a uniform resource locator, even though it does not end with an abbreviation for a media type, like .gif or .html. By default, GraphAPI returns an object of the JSON type.
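
The way these pieces fit together can be sketched with Python's standard library. The helper name graph_url is hypothetical, and the token is a placeholder. Note that urlencode percent-encodes special characters, so a comma between field names comes out as %2C, which is a legal way of writing the same query.

```python
from urllib.parse import urlencode

BASE = "https://graph.facebook.com/"  # the URI of GraphAPI


def graph_url(node, fields, token):
    """Join the URI, the object identifier, the fields and the access token into one URL."""
    query = urlencode({"fields": fields, "access_token": token})
    return BASE + node + "?" + query


print(graph_url("me", "", "YOUR_TOKEN_HERE"))
```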

You can put all of this together to give it a try by typing the following into the address bar of your web browser: https://graph.facebook.com/me?fields=&access_token=YOUR_TOKEN_HERE. You should immediately get a response, which fills the window of your web browser. Mine is:

{
 "username": "harry.howard.581",
 "first_name": "Harry",
 "last_name": "Howard",
 "verified": true,
 "name": "Harry Howard",
 "locale": "en_US",
 "gender": "male",
 "updated_time": "2013-04-24T15:46:22+0000",
 "languages": [
  {
   "id": "113301478683221",
   "name": "American English"
  },
  {
   "id": "312525296370",
   "name": "Spanish"
  }
 ],
 "quotes": "\"Richard Hamming\u2019s three questions for new hires at Bell Labs:\r\n\r\n1- What are you working on?\r\n2- What\u2019s the most important open problem in your area?\r\n3- Why aren\u2019t 1 & 2 the same? (Ouch!)\r\n\r\n\u201cYou and Your Research\u201d --- Richard Hamming (1986)",
 "birthday": "11/01/1957",
 "link": "https://www.facebook.com/harry.howard.581",
 "timezone": -5,
 "id": "2811272"
}

The response is a dictionary of items (key:value pairs), though a value can be a list of dictionaries.
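
To see what “a list of dictionaries” means in practice, here is a minimal sketch that parses an abbreviated copy of the response with the standard json module and walks into the nested languages value:

```python
import json

# An abbreviated copy of the response above, as the raw text a browser would show.
raw = """
{
 "first_name": "Harry",
 "languages": [
  {"id": "113301478683221", "name": "American English"},
  {"id": "312525296370", "name": "Spanish"}
 ],
 "timezone": -5
}
"""

profile = json.loads(raw)  # JSON object -> Python dictionary

# The value of "languages" is a list, and each member is itself a dictionary.
language_names = [language["name"] for language in profile["languages"]]

print(profile["first_name"])
print(language_names)
```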

14.4. How to send GraphAPI queries with requests

You will soon grow tired of typing complex queries into your web browser, and you can’t really do anything with the response. The Python package “requests” was designed to let Python do most of this tedious work for you. It is part of the Canopy and Anaconda distributions, so go ahead and import it in Spyder with import requests.

14.4.1. How to send GraphAPI queries with the GraphAPI protocol

The first step is to break down the different parts of the GraphAPI URL into variables for easy access, and then join them together into a string that looks like what you typed into your web browser. Go ahead and put these together either in a Python script or Spyder’s interactive console as below. I will explain “json” in just a moment:

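import requests, json
fb = 'https://graph.facebook.com/'   # GraphAPI's address
id = 'me'                            # object to query (this name shadows Python's built-in id)
campo = ''                           # field(s) to request
accessToken = 'your token here'      # paste your access token here
url = fb+id+'?fields='+campo+'&access_token='+accessToken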

By leaving campo empty, you request your public profile information.

Requests sends the url to GraphAPI and collects the response. The Python package json helps to make sense of it:

respuesta = requests.get(url).json()
print(json.dumps(respuesta, indent=1))

Running this code should give the same response as above.
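
As an alternative to joining strings by hand, requests can build the query string itself from a dictionary passed as params, and raise_for_status() stops the script if GraphAPI answers with an HTTP error code. The wrapper below is only a sketch: fetch_graph and its get argument are hypothetical names, and get exists just so that the function can be tried out with a stand-in instead of a live connection.

```python
def fetch_graph(node, fields, token, get=None):
    """Send one GraphAPI query and return the decoded JSON response."""
    if get is None:
        import requests  # imported here so the sketch loads even without the package
        get = requests.get
    response = get(
        "https://graph.facebook.com/" + node,
        params={"fields": fields, "access_token": token},
    )
    response.raise_for_status()  # raise an exception on a 4xx or 5xx reply
    return response.json()


# Typical use, with a real token in place of the placeholder:
# respuesta = fetch_graph('me', 'name', 'your token here')
```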

The rest of the GraphAPI queries in the left column of the table above are encoded as so:

# /me?fields=name
id = 'me'
campo = 'name'
# /me?fields=gender,languages,timezone
id = 'me'
campo = 'gender,languages,timezone'
# /me?fields=statuses
id = 'me'
campo = 'statuses'
# /me?fields=statuses.fields(message)
id = 'me'
campo = 'statuses.fields(message)'
# /me/friends
id = 'me/friends'
campo = ''
# /me/friends?fields=gender
id = 'me/friends'
campo = 'gender'
# /elpais
id = 'elpais'
campo = ''
# /elpais?fields=about
id = 'elpais'
campo = 'about'
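
Rather than reassigning id and campo by hand for each query, the pairs can be kept in a list and turned into URLs in one pass. A minimal sketch with a few of the queries above; the name queries is my own, and the token is a placeholder:

```python
fb = 'https://graph.facebook.com/'
accessToken = 'YOUR_TOKEN_HERE'  # placeholder, replace with your own token

# Each pair is (object to query, field or fields to request).
queries = [
    ('me', 'gender,languages,timezone'),
    ('me/friends', 'gender'),
    ('elpais', 'about'),
]

# Same concatenation as before, applied to every pair.
urls = [
    fb + node + '?fields=' + campo + '&access_token=' + accessToken
    for node, campo in queries
]

for url in urls:
    print(url)
```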

14.4.2. How to send GraphAPI queries with the FQL protocol

An HTTP query to GraphAPI using the FQL protocol consists of only three pieces of information, since the user is incorporated into the query: GraphAPI’s address, the query in FQL, and your access token, joined together to make a string with the same format as what you input into the Graph API Explorer. Go ahead and concatenate these either in a Python script or Spyder’s interactive console as below:

import requests, json
fb = 'https://graph.facebook.com/'
consulta = 'SELECT name FROM user WHERE uid = me()'
accessToken = 'your token here'
url = fb+'fql?q='+consulta+'&access_token='+accessToken
respuesta = requests.get(url).json()
print(json.dumps(respuesta, indent=1))

The response should look just like what you got back from the Graph API Explorer. For me, it is:

{
 "data": [
  {
   "name": "Harry Howard"
  }
 ]
}

The rest of the queries from the right column of the table above are sent merely by substituting them into consulta as so:

consulta = 'SELECT sex,languages,timezone FROM user WHERE uid = me()'
consulta = 'SELECT message FROM status WHERE uid = me()'
consulta = 'SELECT sex FROM user WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = me())'
consulta = 'SELECT about FROM page WHERE page_id = 8585811569'

14.5. How to send GraphAPI queries with facepy

To interact with GraphAPI in Python, it currently appears that the best choice is the package called facepy. Go ahead and download it to your installation in Terminal with:

$ pip install facepy

To recap the queries introduced above, start by typing the following code into a Python interpreter or, even better, make it into a script:

from facepy import GraphAPI
accessToken = 'your token'
graph = GraphAPI(accessToken)

To submit a GraphAPI query, such as our basic /me, use graph.get() and put the query in the parentheses, like so:

graph.get('me')
graph.get('me?fields=first_name')
graph.get('me?fields=gender,languages,timezone')
graph.get('me?fields=statuses')
graph.get('me?fields=statuses.fields(message)')
graph.get('me/friends')
graph.get('me/friends?fields=gender')
graph.get('elpais')
graph.get('elpais?fields=about')
graph.get('elpais/feed')
graph.get('elpais/feed?fields=link')
graph.get('elpais/feed?fields=comments.fields(message)')

Facepy has a parameter that turns on (first level) pagination and returns a “generator object”:

graph.get('elpais/feed?fields=comments.fields(message)', page=True)
<generator object paginate at 0x107f5daa0>

But you can’t do anything with it directly; it has to be unpacked with a loop, as if it were a list. And here things get a little complex. Recall that what we want is the text in the message item, but it is at the bottom of a hierarchy. At the top of the hierarchy is the page, which is a dictionary with paging and data items. Data is a list of dictionaries with created_time, id, and comments items. Comments is a dictionary with paging and data items. Data is, once again, a list of dictionaries, but here with message and id items. This arrangement can be displayed hierarchically as the first object below; after it comes the linear version that is closer to how Python looks at it:

page = {paging, data}
        data = [{created_time, id, comments}]
                comments = {paging, data}
                        data = [{message, id}]

page {paging, data[{created_time, id, comments{paging, data[{message, id}]}}]}

The generator object adds one more layer on top, essentially a list of pages. Augmenting the hierarchy above with this layer and rearranging to reveal the fundamentals produces the hierarchy below, plus its linearized version, where “#” stands for an index into a list:

[generator
        {page
                [data
                        {comments
                                [data
                                        {message}
                                ]
                        }
                ]
        }
]

[page#['data'][comment#]['comments']['data'][message#]['message']]
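
The linearized form can be tried out on a small mock-up. Everything in the structure below is invented purely for illustration; only the shape follows the hierarchy above, with an ordinary list standing in for the generator.

```python
# A mock-up with one page, one post and two comments (all values invented).
pages = [
    {
        "paging": {},
        "data": [
            {
                "created_time": "2014-04-30T00:00:00+0000",
                "id": "mock-post-1",
                "comments": {
                    "paging": {},
                    "data": [
                        {"message": "mock comment A", "id": "mock-comment-1"},
                        {"message": "mock comment B", "id": "mock-comment-2"},
                    ],
                },
            }
        ],
    }
]

# page index, then post index, then comment index, as in the linearized form.
first = pages[0]["data"][0]["comments"]["data"][0]["message"]
second = pages[0]["data"][0]["comments"]["data"][1]["message"]

print(first)
print(second)
```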

So to drill down to the message text entails running through three for loops, plus accessing the intermediate dictionaries. The script below accomplishes this, but be careful if you run it. Since it has no limit, it will try to get every message in El País’s comment feed. This could be a lot, and it could end in an error. It is better to save the messages to a text file, but I will leave it up to you to add the relevant code:

from facepy import GraphAPI
import json

accessToken = 'YourTokenHere'
graph = GraphAPI(accessToken)

id = 'elpais/feed'
consulta = 'comments.fields(message)'
url = id+'?fields='+consulta
pages = graph.get(url, page=True)

messages = []
for page in pages:
    for commDict in page['data']:
        if 'comments' in commDict:
            for messDict in commDict['comments']['data']:
                messages.append(messDict['message'])

print('Number of messages is '+str(len(messages)))
print(json.dumps(messages[:20], indent=1))
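
One way to address both concerns, the missing limit and the saving to a file, is sketched below. The function names collect_messages and save_messages and the default of five pages are my own assumptions; pages can be any iterable of page dictionaries, such as the generator returned by facepy.

```python
import itertools


def collect_messages(pages, max_pages=5):
    """Gather comment messages from at most max_pages pages."""
    messages = []
    # islice stops asking the generator for more pages once the limit is reached.
    for page in itertools.islice(pages, max_pages):
        for post in page.get("data", []):
            # Posts without comments simply contribute nothing.
            for comment in post.get("comments", {}).get("data", []):
                if "message" in comment:
                    messages.append(comment["message"])
    return messages


def save_messages(messages, path):
    """Write one message per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for message in messages:
            # Flatten internal line breaks so each message stays on one line.
            handle.write(message.replace("\n", " ") + "\n")


# Typical use with the generator from the script above:
# save_messages(collect_messages(pages, max_pages=3), 'elpais_comments.txt')
```

Capping the number of pages keeps the run short while you are still experimenting; raise the cap once the rest of your script works.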

Finally, an example of an FQL query and its response rounds out this chapter:

graph.fql('SELECT name, languages, birthday FROM user WHERE uid = me()')
{u'data': [{u'languages': [{u'id': 113301478683221, u'name': u'American English'}, {u'id': 312525296370, u'name': u'Spanish'}], u'birthday': u'November 1, 1957', u'name': u'Harry Howard'}]}

14.6. Summary

14.7. Further practice

14.8. Further reading

14.9. Appendix

14.9.1. FQL queries

Click Submit to receive the response:

{
  "data": [
    {
      "name": "Harry Howard",
      "languages": [
        {
          "id": 113301478683221,
          "name": "American English"
        },
        {
          "id": 312525296370,
          "name": "Spanish"
        }
      ],
      "birthday": "November 1, 1957"
    }
  ]
}

Footnotes

[1] See this comparison between GraphAPI and FQL Query in answer to the question FQL vs Graph API - Which is better for basic filtering? at StackOverflow.

Last edited: April 30, 2014