Hello world, we're two aspiring hackers from UC Berkeley: Sabrina Atienza and George Ramonov. We made up Team Healthify at xHack 2012, held this past June. From the start, we were prepared to fulfill the stereotype of Berkeley crazy. Our vision for the hackathon appeared overly ambitious, borderline insane for only twenty-four hours, a likely dysfunctional mess, doomed to elicit a few chuckles at best.
But that wasn't the case, and here's how it went:
The grand vision was to create a social media surveillance system that discovers meaningful Twitter content in the midst of noisy data. In our vision and our final version of Healthify, the HTML5 mobile/web application pulls thousands of tweets from Twitter, applies machine learning algorithms and natural language processing to analyze the relevance of tweets to keywords in a customizable dictionary, geolocates those tweets deemed relevant, plots them on a Google Map with distinct markers for distinct keywords, and then displays an information window for each marker, revealing the text of each tweet for audiences to read, gawk at, gasp at.
As an added bonus, spurred in part by hack prizes, we developed a generalized Greasemonkey plug-in that lets users embed interactive Google Maps into via.me posts, which in our case displayed the Healthify map of infectious disease tweets.
We spent the majority of our time at the outset writing a single Python script to aggregate Twitter data, analyze the relevance of tweets to our chosen dictionary of infectious disease terms, and geolocate relevant tweets via the Google Maps API.
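To give a feel for the relevance step, here is a minimal sketch in Python. It is not our actual script: it swaps the machine learning and natural language processing for a plain whole-word dictionary match, and the terms in `DISEASE_TERMS` as well as the helper names `tokenize` and `matched_terms` are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical dictionary; the real list of infectious disease terms is not shown here.
DISEASE_TERMS = {"flu", "measles", "whooping cough"}


def tokenize(text):
    """Lowercase the tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def matched_terms(text, terms=DISEASE_TERMS):
    """Return the dictionary terms that appear as whole words or phrases in the tweet."""
    # Pad with spaces so "flu" does not match inside "influenza".
    padded = " " + " ".join(tokenize(text)) + " "
    return sorted(term for term in terms if f" {term} " in padded)


if __name__ == "__main__":
    for tweet in ["I think I have the FLU, ugh", "Reading about influenza history"]:
        print(tweet, "->", matched_terms(tweet))
```

A tweet with at least one matched term would then be passed on to geolocation; anything smarter, such as weighting terms or filtering out jokes and metaphors, would replace `matched_terms`.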
After finishing this milestone in our project, we discussed the suite of company-sponsored prizes and their corresponding company APIs, arriving at a sound conclusion: create a Heroku app using the MongoDB add-on for storage as well as Iron.io's IronWorker service for massively parallel aggregation and processing. While all three deployment decisions were quick and easy to execute, the decision to use IronWorker reaped by far the greatest benefits.
At its core, a social media surveillance system necessarily entails crunching tons of social data in pursuit of meaningful content. To this end, IronWorker proved indispensable. We followed the Python tutorial to set up and schedule workers to run our single, multipurpose script at regular intervals.
Within an hour, we deployed a set of five workers, all processing Twitter data in parallel, each assigned a unique subset of terms from our dictionary of infectious diseases. Since the uploaded code is available for download from the Iron.io dashboard, we easily verified that we distributed the dictionary of terms correctly.
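The split of the dictionary across workers can be sketched as follows. This is an illustration under assumptions, not the code we uploaded: it deals the terms out round-robin, the function names `partition_terms` and `is_valid_partition` are hypothetical, and the sample terms are made up. The step that hands each subset to an IronWorker task is left out on purpose.

```python
def partition_terms(terms, num_workers=5):
    """Deal the dictionary terms out round-robin into one subset per worker."""
    ordered = sorted(set(terms))  # stable order, duplicates removed
    return [ordered[i::num_workers] for i in range(num_workers)]


def is_valid_partition(terms, parts):
    """Check that every term is assigned to exactly one worker."""
    flat = [term for part in parts for term in part]
    return len(flat) == len(set(flat)) and set(flat) == set(terms)


if __name__ == "__main__":
    # Hypothetical sample terms for illustration only.
    sample = ["flu", "measles", "mumps", "pertussis",
              "norovirus", "west nile", "tuberculosis"]
    subsets = partition_terms(sample)
    for worker_id, subset in enumerate(subsets):
        print(f"worker {worker_id}: {subset}")
    print("valid:", is_valid_partition(sample, subsets))
```

Round-robin keeps the subsets within one term of each other in size, which is a reasonable default when no term is known to be much noisier than the others.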
In real time, we witnessed workers running our Python script, aggregating thousands of tweets and analyzing their relevance, consequently populating our MongoDB storage with recent tweets from across the US, all highly relevant to infectious diseases.
The critical advantage of using Iron.io is that we were able to distribute the crawling and acquire significantly more data for our analytics. In under ten hours, Healthify processed over 100,000 tweets and determined that nearly 10,000 were relevant to bio-surveillance!
In very little time, we used the Iron.io distributed platform to empower Healthify with tremendous processing power, consuming large amounts of Twitter data with ease. Without Iron.io, we could not have gathered such a large data set within the timeframe of the hackathon.
In greater perspective, IronWorker augmented our team's efficiency by eliminating hours of painful installations and configurations, replacing them with less than an hour of work to help us meet and even exceed our grand vision.
Sabrina Atienza and George Ramonov
(after 24 hours of programming)
Altogether, IronWorker let Healthify deliver fresh updates on disease predictions. To our utmost satisfaction, we presented a fully functional prototype of our social media bio-surveillance system at the hackathon.
Team Healthify thanks you, Iron.io!
Healthify, Iron.io, and overly ambitious ideas FTW!