How Cine.io Uses Node.js and IronWorker to Handle Their Background Processing
The following is a guest blog post by Thomas Shafer describing how cine.io deploys their workers within Iron.io to handle all of their background processing.
Cine.io is the only true developer-focused live streaming service. We offer APIs and SDKs for all mobile platforms and language frameworks and let developers build and ship live-streaming capabilities with just a few simple steps.
We have many background tasks to run and use Iron.io for all of our long-running and resource-intensive processing. These types of tasks include video stream archival processing, customer emails and log processing, and bandwidth calculations.
Over the course of working with the IronWorker service, we developed a few approaches to make it easy for us to integrate distributed processing into our Node.js application framework and maintain consistency with our main service across the full set of workers. The primary strategy is to use a single shared worker space.
Single Shared Worker Space
As an example of the approach we use, when we process logs from our edge servers, we need to gather log file data and attach each bandwidth entry to a user's account. To do this, we need credentials to access multiple remote data sources and follow some of the same logic that our main cine.io dashboard application uses.
To maintain logical consistency and a DRY architecture, we upload many shared components of our dashboard application to IronWorker. Our dashboard application also shares code with the central API application, so that each piece of logic has, as Wikipedia puts it, "a single, unambiguous, authoritative representation within a system".
Because some of our background tasks require code shared between our dashboard application and our API, we decided to structure our IronWorker integration around a single worker, defined in one .worker file and named MainWorker.
MainWorker Serves as the Primary Worker
We use one worker to perform a number of tasks and so it needs to be flexible and robust. It needs to be able to introspect on the possible jobs it can run and safely reject the tasks it cannot handle. One way to make it flexible is to unify the payload and reject any attempts to schedule a MainWorker that does not follow the expected payload format.
A good way to enforce a predictable format is to, once again, share code. Whether it's the dashboard, API, or another MainWorker, they all use the same code to schedule a job.
Our MainWorker payload uses the following format:
{
  configuration: {
    // configuration holds sensitive variables
    // such as redis credentials, cdn access codes, etc.
  },
  jobName: "",
  // The name of the job we want to run
  // MainWorker understands which jobs are acceptable
  // and can reject jobs and notify us immediately on inadequate jobNames
  source: "",
  // source is the originator of the request.
  // This helps prevent unwanted scheduling situations.
  // An example is preventing our API application
  // from scheduling the job that sends out invoices at the end of the month.
  // That job is reserved for IronWorker's internal scheduler.
  jobPayload: {
    // the payload to be handled by the job, such as model ids and other values.
  }
}
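To make the "reject anything that does not follow the expected format" idea concrete, here is a minimal sketch of a payload check. It is not cine.io's actual code: the function name validatePayload and the exact rules are assumptions, and only the four field names come from the format above.

```javascript
'use strict';

// Field names taken from the payload format described in this post.
const REQUIRED_FIELDS = ['configuration', 'jobName', 'source', 'jobPayload'];

// Hypothetical helper: returns a list of problems found in the payload.
// An empty list means the payload has the expected shape.
function validatePayload(payload) {
  if (payload === null || typeof payload !== 'object') {
    return ['payload must be an object'];
  }
  const errors = [];
  REQUIRED_FIELDS.forEach(function (field) {
    if (!(field in payload)) {
      errors.push('missing field: ' + field);
    }
  });
  // jobName and source must be non-empty strings when present.
  ['jobName', 'source'].forEach(function (field) {
    if (field in payload &&
        (typeof payload[field] !== 'string' || payload[field] === '')) {
      errors.push(field + ' must be a non-empty string');
    }
  });
  return errors;
}
```

A worker could run a check like this first and refuse to start any job when the returned list is not empty.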
The jobs folder we uploaded contains the code for every specific job and is included by the MainWorker, which is written in Node.js. Here's a look at the .worker file for MainWorker:
runtime 'node'
stack 'node-0.10'
exec 'main_worker.js'
dir '../models'
dir '../config'
dir '../node_modules'
dir '../lib'
dir '../jobs'
dir '../main_worker'
name 'MainWorker'
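The dispatch step inside main_worker.js could look roughly like the sketch below. This is an illustration under assumptions, not the real worker: the job names (processLogs, sendInvoices), the source names, the allowedSources field, and the runJob function are all hypothetical, and loading jobs from the jobs folder and reading the payload from IronWorker are left out.

```javascript
'use strict';

// Hypothetical job registry. In a real worker each entry would
// probably be loaded from a file in the uploaded jobs folder.
const JOBS = {
  processLogs: {
    allowedSources: ['api', 'dashboard', 'scheduler'],
    run: function (configuration, jobPayload) {
      return 'processed ' + jobPayload.logFile;
    }
  },
  sendInvoices: {
    // Only the scheduler may trigger invoices.
    allowedSources: ['scheduler'],
    run: function (configuration, jobPayload) {
      return 'invoiced ' + jobPayload.month;
    }
  }
};

// Looks up the requested job, rejects unknown job names and
// disallowed sources, then runs the job with its payload.
function runJob(payload, jobs) {
  const known = Object.prototype.hasOwnProperty.call(jobs, payload.jobName);
  if (!known) {
    throw new Error('unknown jobName: ' + payload.jobName);
  }
  const job = jobs[payload.jobName];
  if (job.allowedSources.indexOf(payload.source) === -1) {
    throw new Error(payload.source + ' may not schedule ' + payload.jobName);
  }
  return job.run(payload.configuration, payload.jobPayload);
}
```

Keeping the list of allowed sources next to each job means the rule "only the scheduler sends invoices" lives in one place instead of being re-checked by every caller.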
Benefits of Our Approach
After working with this setup for a while, I'm convinced that a single shared worker space is the way to go.
Continuous Iron.io Deployment
By throwing our IronWorker jobs into the same codebase as our API and dashboard application, I know our logic will be consistent across multiple platforms. This allows us to integrate IronWorker with our continuous integration server. We can update every platform simultaneously with the most up-to-date code. With this approach, there is no way that one-off untested scripts can make their way into the environment. We update code on Iron.io through our CI suite and it's up to the developer, code reviewers, and our continuous integration server to validate our code. Everyone has visibility into what is on the Iron.io platform.
Consolidated Reporting
Flexible Scheduling
The job payload has a rigid structure, but because we share the library for scheduling jobs, any part of our system can schedule one. That library is responsible for sending the appropriate structure with the necessary configuration variables, jobName, source, and jobPayload.
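A shared scheduling library along these lines might be sketched as follows. The names createScheduler and enqueue are assumptions made for illustration: enqueue stands in for whatever call actually queues a task on IronWorker, and is passed in so the sketch does not depend on any particular client library.

```javascript
'use strict';

// Hypothetical factory: each application (API, dashboard, another
// MainWorker) creates its own scheduler with a fixed source and
// configuration, so every payload has the same four fields.
function createScheduler(source, configuration, enqueue) {
  return function schedule(jobName, jobPayload) {
    const payload = {
      configuration: configuration,
      jobName: jobName,
      source: source,
      jobPayload: jobPayload || {}
    };
    // enqueue is a stand-in for the real call that queues a task.
    return enqueue('MainWorker', payload);
  };
}

// Usage with a fake enqueue that only records what was queued.
const queuedTasks = [];
const scheduleFromApi = createScheduler(
  'api',
  { redisUrl: 'redis://example.invalid' },
  function (workerName, payload) {
    queuedTasks.push({ workerName: workerName, payload: payload });
    return payload;
  }
);
scheduleFromApi('processLogs', { logFile: 'edge-1.log' });
```

Because the source is fixed when the scheduler is created, individual call sites cannot accidentally claim to be a different originator.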
One Drawback to the Approach
There is a drawback to using a single shared space for our workers. When we look at jobs, whether running or queued, all we see is "MainWorker, MainWorker, MainWorker". We cannot use the dashboard to tell which jobs are taking a long time, and therefore lose some of the usefulness of the Iron.io dashboard. (Note: If IronWorker were to allow tags or additional names, that would go a long way towards giving us visibility. I hear it's on the roadmap, so let's hope it makes it in sometime soon.)
Conclusion
Deploying a shared environment to Iron.io has enabled our development team to focus on delivering customer value quickly and at high quality. We can easily test our job code, ensure Iron.io has the most up-to-date code, and fix any production errors promptly.
About the Author
Thomas Shafer is a co-founder of cine.io, the only developer-focused live streaming service available today. He is also a founder of Giving Stage, a virtual venue that raises money for social and environmental change. (@cine_io)
To see other approaches to designing worker architectures, take a look at how Untappd uses a full set of task-specific workers to scale out their background processing in this post. Also, be sure to check out this article on top uses of a worker system.