Serverless PhantomJS with IronWorker
PhantomJS is a headless WebKit with a JavaScript API. From its website: “It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.” In other words, PhantomJS is a great solution for things like web crawling and scraping, headless website testing, and so on. That makes it a perfect match for IronWorker.
I’ll expand. IronWorker is a worker system, also known as a task queue. It’s designed to run scheduled tasks, asynchronous jobs passed from your apps, and oftentimes heavily parallelized work. The power of IronWorker is easily seen after you’ve used it, so here’s a tutorial to get up and running with PhantomJS in just a few minutes.
Step 1: Setup
- Sign up for a free account at Iron.io
- Create your first project. This is your workspace and can be used for all Iron.io services. We’ll be using IronWorker.
- Install our iron_worker_ng gem (gem install iron_worker_ng). At this time, the recommended way to manage your workers is using our command line interface tool, which is offered as a Ruby gem. This doesn’t mean you’ll be writing any Ruby though.
Step 2: Configure
- Grab the files from this folder: https://github.com/iron-io/iron_worker_examples/tree/master/binary/phantom-nodejs
- Add your credentials to the iron.json file like so:
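For reference, here is a minimal sketch of what iron.json usually contains, assuming the standard `token` and `project_id` keys that Iron.io's client tools read. Both values are placeholders to swap for your own:

```json
{
  "token": "YOUR_TOKEN",
  "project_id": "YOUR_PROJECT_ID"
}
```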
You can find your credentials (your project ID and token) in your project's dashboard by clicking the credentials icon.
And for the final step, simply call the queue command on the crawler worker.
$ iron_worker queue crawler
The sky’s the limit with PhantomJS and IronWorker. Feel free to grab all the code from our iron_worker_examples repo and start building yourself!
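If you want a starting point of your own, a PhantomJS script can be very small. The sketch below is not the crawler from the examples repo. It is a hypothetical illustration that opens one page and prints its title. The names `TARGET_URL` and `formatResult` are made up for this example, and the URL is a placeholder.

```javascript
// Hypothetical starting point, not the crawler from the examples repo.
// TARGET_URL and formatResult are illustrative names.
var TARGET_URL = 'http://example.com/';

// Pure helper so the output format is easy to check in any JavaScript runtime.
function formatResult(url, status, title) {
  if (status !== 'success') {
    return 'FAILED ' + url;
  }
  return 'OK ' + url + ' "' + title + '"';
}

// The `phantom` global only exists when the script runs under PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();

  // page.open calls back with 'success' or 'fail'.
  page.open(TARGET_URL, function (status) {
    var title = '';
    if (status === 'success') {
      // page.evaluate runs the function inside the loaded page.
      title = page.evaluate(function () {
        return document.title;
      });
    }
    console.log(formatResult(TARGET_URL, status, title));
    // Exit explicitly, otherwise PhantomJS keeps running.
    phantom.exit(status === 'success' ? 0 : 1);
  });
}
```

Saved as, say, crawl.js (an assumed file name), this can be tried locally with `phantomjs crawl.js` before packaging it as a worker.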
Jump into our chat room, which we like to say is generally monitored by engineers 23/6.8. All questions and feedback are welcome.
Visit the PhantomJS website and get involved in the community.
Learn more at our new Developer Center.
Thanks and let me know what you thought… was this useful?