Wednesday, July 11, 2012

Serverless PhantomJS with IronWorker



PhantomJS is a headless WebKit browser with a JavaScript API. From its website: "It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG." In other words, PhantomJS is a great fit for web crawling and scraping, headless website testing, and similar jobs, which makes it a perfect match for IronWorker.

Q: So why do I need IronWorker? 
A: Fast, easy, scalable, parallel, "serverless" implementation.

I'll expand. IronWorker is a worker system, also known as a task queue. It's designed to run scheduled tasks, asynchronous jobs passed from your apps, and oftentimes heavily parallelized work. The power of IronWorker is easily seen after you've used it, so here's a tutorial to get up and running with PhantomJS in just a few minutes.


What's about to happen...

You'll be using the IronWorker command line interface to build and ship two workers to the IronWorker platform.

crawler loads the PhantomJS library and crawls Google Maps for pizza places in San Francisco. For each match it finds, it kicks off a second worker, processor.

processor follows the link, grabs a screenshot of the page, and posts it to the imgur anonymous API.

All without ever dealing with a single server or installing any software or packages yourself. Pretty cool, huh?

When you upload a worker whose .worker file calls the build() command, the system detects it and automatically kicks off a builder, which is a separate worker that runs that build for you.
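The crawler-to-processor fan-out described above comes down to a loop: one crawler run queues one processor task per match. Here is a toy local model of that pattern in shell. It does not talk to IronWorker at all. queue_processor is a hypothetical stand-in that only prints what the real crawler would queue, and the "matches" are simply whatever URLs you pass in.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for queuing a processor task on IronWorker.
# The real crawler queues tasks on the platform; this one only prints.
queue_processor() {
  printf 'queue processor url=%s\n' "$1"
}

# Toy crawler: treats each argument as a link "found" by the crawl
# and queues exactly one processor task per link (the fan-out).
crawl() {
  local url
  for url in "$@"; do
    queue_processor "$url"
  done
}
```

For example, `crawl "http://example.com/a" "http://example.com/b"` prints two queue lines, one per processor task. Because each processor task is independent, IronWorker is free to run them in parallel.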

Let's get started!



Step 1: The Basics
  1. Sign up for a free account at Iron.io
  2. Create your first project. This is your workspace and can be used for all Iron.io services. We’ll be using IronWorker.
  3. Install our iron_worker_ng gem (gem install iron_worker_ng). At this time, the recommended way to manage your workers is our command line interface (CLI) tool, which ships as a Ruby gem. That doesn't mean you'll be writing any Ruby, though.
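Before moving on, a quick sanity check can confirm that the gem put the iron_worker command (the one used in the upload and queue steps) on your PATH. This is a minimal sketch. check_cli is a hypothetical helper name and is not part of iron_worker_ng.

```shell
#!/usr/bin/env bash

# Hypothetical helper: confirm the iron_worker command installed by the
# iron_worker_ng gem can be found before going any further.
check_cli() {
  if command -v iron_worker >/dev/null 2>&1; then
    echo "iron_worker found"
  else
    echo "iron_worker not found; try: gem install iron_worker_ng" >&2
    return 1
  fi
}
```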




Step 2: Configure
  1. Grab the files from this folder: https://github.com/iron-io/iron_worker_examples/tree/master/binary/phantom-nodejs

  2. Add your credentials (your token and project ID) to the iron.json file in that folder.


You can find your credentials by clicking the credentials icon for your project in the Iron.io user interface (HUD).
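If you'd rather create iron.json from the shell, here is a minimal sketch. It assumes the standard Iron.io config keys token and project_id. write_iron_json is a hypothetical helper, and YOUR_TOKEN and YOUR_PROJECT_ID are placeholders for the values you copy from the HUD.

```shell
#!/usr/bin/env bash

# Hypothetical helper: write iron.json (token + project_id) into the
# current directory, next to the .worker files.
write_iron_json() {
  local token="$1" project_id="$2"
  cat > iron.json <<EOF
{
  "token": "${token}",
  "project_id": "${project_id}"
}
EOF
}

# Usage (placeholders, not real credentials):
# write_iron_json "YOUR_TOKEN" "YOUR_PROJECT_ID"
```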



Step 3: Upload

From the directory containing your .worker files, run the following commands:

$ iron_worker upload crawler
$ iron_worker upload processor

Our command line tool will package up and deploy both the crawler worker and processor worker to the Iron platform.
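Once you have more than a couple of workers, typing one upload command per worker gets tedious. The sketch below loops over the .worker files instead. It assumes each worker is defined in a NAME.worker file (for example crawler.worker and processor.worker) and that iron_worker upload takes that NAME, as in the commands above. upload_all is a hypothetical helper.

```shell
#!/usr/bin/env bash

# Hypothetical helper: upload every worker defined in the current directory.
# Assumes NAME.worker files, so the upload name is the file name minus ".worker".
upload_all() {
  local f
  for f in *.worker; do
    [ -e "$f" ] || continue                       # no matches: glob stays literal, skip it
    iron_worker upload "${f%.worker}" || return 1 # stop at the first failed upload
  done
}
```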


Step 4: Queue

And for the final step, simply call the queue command on the crawler worker.

$ iron_worker queue crawler

That's it!

Now let's verify that everything worked. Open up the user interface (HUD) and you should now see all of your completed tasks.

Click into the "processor" worker, then click the "Log" link on one of the completed tasks to view the processor's output.



Now what?

The sky's the limit with PhantomJS and IronWorker. Feel free to grab all the code from our iron_worker_examples repo and start building yourself!


Jump into our chat room, which we like to say is monitored by engineers 23/6.8. All questions and feedback are welcome.

Visit the PhantomJS website and get involved in the community.

Learn more at our new Developer Center.

Thanks, and let me know what you thought. Was this useful?