Saturday, 3 November 2012

Monitoring Facebook and Twitter Data

Update on SocialSamplr

Been a while since my last post, but I've been very busy developing our social media scoring platform so it's been hard to find the time to write much - but here's a new post on using memcache and BigQuery on App Engine.  Things have been progressing very well: SocialSamplr is now on-boarded to the Creative HQ business incubator here in Wellington NZ, and we're looking forward to accelerating things a lot from here on in.
We've been following the "lean startup" process and think we've now got the information to build a pretty cool and unique minimum viable product, so stay tuned on how it progresses.  If anyone is interested in discussing what the product is going to offer, or even in being a beta tester, please contact me.  Also, being a cash-strapped startup, if anyone's interested in the source code for the Google Apps Script stuff I was working on last year, I'd be happy to provide it for a small fee along with some training data to create your own sentiment engine (kind of a "build your own sentiment engine starter pack").  Again, just drop me a line and we can work something out.

Monitoring Facebook and Twitter Data from Google Apps Script

This week I'll be covering how to set up automated monitoring of Facebook and Twitter data from Google Apps Script.  Then, using Google Charts, BigQuery and Google Sites, you can easily set up a web application to report on the data.

Set up your Twitter and Facebook Applications

First thing to do is set up your Twitter and Facebook applications (for Facebook see an earlier blog post).  This is so you can get the client ID and client secret you'll need to get OAuth tokens to read the data you want.  A couple of key points to note:
  • To set up Facebook monitoring of a user's wall or news feed you'll need to request the additional read_stream permission (passed in under "scope" in the query string).  Also, in my case I'm sending email notifications as part of the functionality I'm offering through my site, so I request the email address as well.  So the request URL winds up carrying scope=read_stream,email in its query string, with fbmonitoring being the page we're redirecting back to.
  • Make sure you read carefully and stick to the rules of the road for both Twitter and Facebook.  In particular, if you're using the REST API for Twitter you need to be aware of the quota limits on the number of requests you make and of the rules on how you process results.  Break the rules too much and you risk being blocked from using the API any further.
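
To make the Facebook point above concrete, here's a minimal sketch of putting that request URL together.  The function name, client ID and redirect address are placeholders of mine (not the real values from my app), and the endpoint is the Facebook OAuth dialog as it was documented at the time, so check it against the current docs before relying on it.

```javascript
// Sketch only: builds the URL that sends a user to Facebook's OAuth dialog.
// buildFacebookAuthUrl is a hypothetical helper, not part of any library.
function buildFacebookAuthUrl(clientId, redirectUri, permissions) {
  var params = {
    client_id: clientId,
    redirect_uri: redirectUri,      // the page Facebook redirects back to
    scope: permissions.join(',')    // extra permissions, comma separated
  };
  var parts = [];
  for (var key in params) {
    // Encode each value so it is safe inside a query string
    parts.push(encodeURIComponent(key) + '=' + encodeURIComponent(params[key]));
  }
  return 'https://www.facebook.com/dialog/oauth?' + parts.join('&');
}

// Example with placeholder values
var exampleAuthUrl = buildFacebookAuthUrl(
  '12345',
  'https://example.com/fbmonitoring',
  ['read_stream', 'email']
);
```

Note the comma in the scope list gets percent-encoded (as %2C) by encodeURIComponent, which is fine since it's decoded again on the receiving end.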

Running the Automated Monitoring

The automated monitoring for Facebook and Twitter shares some similarities but, depending on the type of monitoring you're doing, also has some differences.  There are also some key restrictions on running "server-side" processes in Google Apps Script that you need to be aware of.  Mainly these are the 5 minute execution timeout, the inability to create a "hanging get" connection when connecting to another web server via UrlFetch, and the odd unexplained error you receive (my personal favourite being "backend error").
In order to overcome some of these limitations I recommend following these guidelines.

  • Make use of ScriptDb to track exactly where you are in your monitoring process, so the next time the script runs it can just pick up from where it left off.  I created a function which is run at the beginning of each script and moves to the last row processed by the previous run.

  • Utilise parallel processing for your Apps Script processes.  Because of the 5 minute execution time limit, a single process (for example Twitter monitoring) can only cover a very limited amount of data per run.  By splitting my monitoring across a series of parallel processes (what I've called streams) I can use multiple Apps Scripts to do the monitoring.  Obviously this does result in a somewhat more complex structure to the scripts, but it does build in a level of scalability.
  • Build redundancy into your processes to ensure data is not missed when the delightful "backend error" is encountered.  By this I mean make each individual task you're performing atomic and log it to ScriptDb, so that if there's a failure on the next task the script knows to re-run that task but not the earlier ones that executed successfully.
  • If you make heavy use of ScriptDb for automated processes, make sure you have daily or weekly "clean-up" processes that run to ensure the database doesn't get too full.  Any real data you're storing should be in some other data store (in my case BigQuery - again this is another batch process run through Google Apps Script - keeping it in the family).  A generic function that removes data from your ScriptDb as a batch process works well here - in my case it removes old Facebook sessions from the database.
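
The guidelines above can be sketched roughly as follows.  This is not my production code: MemoryStore is a made-up in-memory stand-in for ScriptDb so the sketch runs anywhere, and the names streamFor, runStream and cleanUp plus the record fields (type, stream, lastRow, created) are all assumptions for illustration.  In a real script the records would live in ScriptDb and the clock would be the actual time.

```javascript
// Hypothetical in-memory stand-in for ScriptDb so the sketch runs anywhere.
function MemoryStore() {
  this.records = [];
}
// Store the record unless this same object is already stored.
MemoryStore.prototype.save = function (record) {
  if (this.records.indexOf(record) === -1) {
    this.records.push(record);
  }
  return record;
};
// Return every record for which test(record) is true.
MemoryStore.prototype.find = function (test) {
  return this.records.filter(test);
};
// Delete up to `limit` matching records and return how many were deleted.
MemoryStore.prototype.removeWhere = function (test, limit) {
  var removed = 0;
  this.records = this.records.filter(function (record) {
    if (removed < limit && test(record)) {
      removed++;
      return false;
    }
    return true;
  });
  return removed;
};

// Spread work items across parallel "streams" (one script per stream).
function streamFor(itemIndex, streamCount) {
  return itemIndex % streamCount;
}

// Resume after the last checkpointed row and stop once the time budget is used.
// A row only counts as done once its checkpoint is saved, so a failed row is retried.
function runStream(store, streamId, rows, processRow, budgetMs, now) {
  var found = store.find(function (r) {
    return r.type === 'checkpoint' && r.stream === streamId;
  });
  var checkpoint = found[0] || { type: 'checkpoint', stream: streamId, lastRow: -1 };
  var started = now();
  for (var i = checkpoint.lastRow + 1; i < rows.length; i++) {
    if (now() - started >= budgetMs) {
      break; // out of time: the next run picks up from the checkpoint
    }
    try {
      processRow(rows[i]);
    } catch (err) {
      break; // e.g. a "backend error": keep the checkpoint so this row is retried
    }
    checkpoint.lastRow = i;
    store.save(checkpoint);
  }
  return checkpoint.lastRow;
}

// Batch clean-up: remove records of one type created before a cutoff timestamp.
function cleanUp(store, type, cutoff, batchSize) {
  return store.removeWhere(function (r) {
    return r.type === type && r.created < cutoff;
  }, batchSize);
}
```

The budget passed to runStream should sit comfortably under the 5 minute limit so the script has time to finish the row it's on, and the batch size on cleanUp keeps each clean-up run short for the same reason.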

Sample Code for Monitoring Twitter and Facebook

This section just shows some straightforward code to process tweets and Facebook messages.  In my case I then use the content for sentiment analysis, but it can obviously be used for any other downstream process.  Note the content comes back as a JSON string which can be parsed to get all the components returned (such as geo location etc.).  For Twitter the code uses the built-in OAuth 1 support, passed in the "options" of the UrlFetchApp.fetch method.  For Facebook we're using the token received after authenticating through OAuth 2.  I use Facebook Query Language (FQL) to get data from Facebook (see Facebook's documentation for a full breakdown), and the URLs sent to Facebook have the access token tacked on the end.
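
As a rough sketch of the parsing and URL-building steps described above (not the original code): the function names are mine, the tweet fields assume the Twitter v1.1 search response shape with a "statuses" array, and the FQL endpoint is the Graph API one as documented at the time, so treat all of these as assumptions to check against the current API docs.

```javascript
// Pull the fields needed for sentiment analysis out of a search response.
// Assumes the Twitter v1.1 shape: { statuses: [{ id_str, text, geo, user }] }.
function extractTweets(jsonText) {
  var payload = JSON.parse(jsonText);
  var statuses = payload.statuses || [];
  return statuses.map(function (status) {
    return {
      id: status.id_str,
      text: status.text,
      user: status.user ? status.user.screen_name : null,
      geo: status.geo || null   // null when the tweet has no location
    };
  });
}

// Build an FQL request URL with the OAuth 2 access token tacked on the end.
function buildFqlUrl(query, accessToken) {
  return 'https://graph.facebook.com/fql?q=' + encodeURIComponent(query) +
    '&access_token=' + encodeURIComponent(accessToken);
}
```

In Apps Script the JSON text would come from something like UrlFetchApp.fetch(url, options).getContentText(), with the OAuth 1 settings for Twitter carried in options; those are left out here so the sketch runs without any credentials.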



Next post..

So that's it for now. The next post may be a bit delayed as I'm moving my application to Python and Google App Engine in preparation for a fully "commercial release" of my sentiment analysis engine - so there should be lots of interesting material to write about once that's done.  In the meantime all of the above can be seen working at  

Over and out.
Footnote:  Nice to see the sentiment engine tracking our "foot-in-mouth" PM this week...
