A worker system is an essential part of any production-scale cloud application. The ability to run tasks asynchronously in the background, process tasks concurrently at scale, or schedule jobs to run on regular schedules is crucial for handling the types of workloads and processing demands common in a distributed application.
At Iron.io, we’re all about scaling workloads and performing work asynchronously, and we hear from our customers on a daily basis. Almost every customer has a story about how they use the IronWorker platform to gain greater agility, eliminate complexity, or just get things done. We wanted to share a number of these examples so that other developers have answers to the simple questions “How do I use a worker system?” and “What can I do with a task queue?”
The following list is a pretty powerful set of examples. We’re confident there are uses here that every developer can benefit from. If you see any common ones that are missing, though, be sure to let us know and we’ll add to the list.
1. Image Processing
Pictures are a critical piece of consumer applications. If you’re not making use of them in your app, you’re missing out on ways to capture users and increase engagement. Nearly every use of photos requires some element of image processing, whether that’s resizing, rotating, sharpening, watermarking, or generating thumbnails. Image processing is, more often than not, compute-heavy, asynchronous in nature, and linearly scaling (more users mean more processing). These aspects all make it a great fit for the flexible and elastic nature of IronWorker.
The most common libraries for image processing we see in IronWorker are ImageMagick, GraphicsMagick, and LibGD. These packages are easy to use and provide some incredible capabilities. It’s easy to include them within a worker and then upload it to IronWorker. The beauty of this use case is that image processing is typically an atomic operation. An image is uploaded, processed, and then stored in S3 or another datastore. There may be callbacks to the originating client, or another event might be triggered, but the processing is isolated and perfect for running within a distributed and virtual environment. Scaling something like this in IronWorker is as simple as sending it more tasks – very little additional work for developers and, in return, almost limitless scale.
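As a minimal sketch, here’s the shape an image-resizing worker might take, shelling out to ImageMagick’s `convert` tool (assumed to be installed in the worker environment). The download/upload steps and file names are placeholders, not a real IronWorker API.

```python
# Sketch of an image-resizing worker that shells out to ImageMagick's
# `convert` command-line tool (assumed available in the worker image).
import subprocess

def resize_command(src, dst, width, height):
    """Build the ImageMagick command to fit an image inside width x height."""
    return ["convert", src, "-resize", f"{width}x{height}", dst]

def process_image(src, dst, width=200, height=200):
    # In a real worker, the image would be downloaded first (e.g. from S3),
    # processed, and the result uploaded back to the datastore.
    subprocess.run(resize_command(src, dst, width, height), check=True)
```

Because the operation is atomic, scaling out is just a matter of queuing one such task per image.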
2. Web Crawling | Data Extraction
The web is full of data — from social, to weather, to real estate, to bitcoin transactions, data is available to access, extract, share, create derivatives, and transform in any number of ways. But crawling and extracting data from the web requires lots of concurrent processes that run on a continual or frequent basis. Another great fit for background processing and IronWorker.
Several great code libraries exist to help with web crawling, including PhantomJS, CasperJS, Nutch, and Nokogiri – all of which run seamlessly on the IronWorker platform. As with image processing, web crawling is essentially a matter of including these packages within your worker, uploading it to IronWorker, and then crawling at will.
There might be a sequence of steps – grab a page, extract links, get various page entities, and then process the most important ones – in which case, additional workers can be created and chained together. To give you a good idea of what’s possible here, we’ve written several examples and blog posts that you can find here and here.
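To make the “grab a page, extract links” step concrete, here’s a minimal link-extraction sketch using only the Python standard library. A production crawler would lean on a package like Nokogiri or PhantomJS as mentioned above; this just shows the kind of unit of work a single worker might own.

```python
# Minimal link-extraction step of a crawl pipeline, stdlib only.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Collect the href of every anchor tag encountered.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

Each extracted link can then be queued as a new task, which is exactly how the worker-chaining described above fans out across a site.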
3. Sending Push Notifications
A push notification is a message sent from a central server (publisher) to an end device (subscriber). The two most common platforms for sending push notifications are the Apple Push Notification service (APNS) for iOS and Google Cloud Messaging (GCM) for Android.
Push notifications tend to go out in batches. For example, a breaking news alert might be sent to millions of subscribers. Notice of a flight delay might be sent to thousands of flyers. Sending these notifications out in serial batches takes way too long. A better architecture is to use IronWorker to deliver these push notifications through APNS and GCM in parallel. This approach also lends itself to processing the lists on the fly to either dedup lists or offer customized messages.
With a news alert, for example, you could spin up 1,000 workers in parallel, each sending 1,000 notifications serially. This would reach over a million subscribers in the time a single worker would take to process one batch. That’s a huge advantage in delivery speed and a capability that would be hard to create and manage on your own. With IronWorker, it’s a relatively simple matter to get this type of concurrency and throughput.
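The fan-out itself can be sketched in a few lines: a master task splits the subscriber list into fixed-size batches and queues one worker task per batch. `queue_worker_task` below is a placeholder for the IronWorker client call, not a real API.

```python
# Sketch of the push-notification fan-out: one queued task per batch.

def batches(subscribers, size):
    """Yield successive batches of `size` subscribers."""
    for i in range(0, len(subscribers), size):
        yield subscribers[i:i + size]

def queue_worker_task(payload):
    # Placeholder: a real master would enqueue an IronWorker task here,
    # with the batch of device tokens as its payload.
    pass

def fan_out(subscribers, batch_size=1000):
    """Queue one worker per batch; return the number of workers queued."""
    count = 0
    for batch in batches(subscribers, batch_size):
        queue_worker_task({"subscribers": batch})
        count += 1
    return count
```

Each queued worker then delivers its batch through APNS or GCM independently, so total delivery time is bounded by one batch, not the whole list.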
4. Mobile Compute Cloud
Mobile applications push a lot of the processing off the device and into the background. Services and frameworks like Parse and Firebase allow for rapid mobile app development by providing backend services such as user management and mobile app-centric datastores.
But these frameworks don’t work so well when it comes to providing processing capabilities. (Parse Cloud Code, as an example, provides a number of capabilities but falls short in many ways). Processing lots of evented data is where IronWorker shines.
Data can be put on a message queue directly from mobile apps and then workers in IronWorker can be running to continually process events from the queue. The processing that’s performed is entirely dependent on the needs of the app.
Alternatively, the mobile frameworks mentioned above also allow connections to HTTP webhooks. You can point these endpoints at workers, which are then kicked off to perform actions. Using IronWorker as an asynchronous (and almost serverless) processing engine makes building powerful mobile applications a breeze.
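The queue-draining pattern described above can be sketched as follows. `make_queue` here fakes a message queue with an in-memory list purely for illustration; a real worker would use a queue client such as IronMQ’s.

```python
# Sketch of a worker loop draining events that mobile clients pushed
# onto a message queue. The queue itself is faked with a list.
import json

def make_queue(raw_events):
    events = list(raw_events)
    def get_messages(n=10):
        # Stand-in for a queue client's "get N messages" call.
        taken, events[:] = events[:n], events[n:]
        return taken
    return get_messages

def handle_event(event):
    # App-specific processing goes here; we just tag the event as handled.
    event["handled"] = True
    return event

def drain(get_messages):
    """Pull messages until the queue is empty, processing each one."""
    processed = []
    while True:
        msgs = get_messages()
        if not msgs:
            break
        processed.extend(handle_event(json.loads(m)) for m in msgs)
    return processed
```

The worker stays dumb about where events come from; the queue decouples it from the mobile clients entirely.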
5. Data Processing
“Big data” is certainly a hot topic these days and Hadoop is a common answer. But all data is not “big” and even when it is, many “big-data” problems don’t work well with a map-reduce model. A couple of supporting articles on this theme can be found here and here.
In the end, a large number of “big data” use cases boil down to large-scale data processing, and IronWorker is made for this. Let’s say you have a big list of zip codes and need to pull weather data from a weather API as well as population data from a different API that times out after 10 concurrent connections. Traditional “big data” solutions are simply too complex to manage situations like this. IronWorker provides a flexible but still massively parallel way to accomplish it.
You can run tasks in parallel as well as perform complex workflows. High concurrency can be brought to bear so that 1000s of processes can run at a single time. Alternatively, you can put constraints on the processing so that only a limited number of workers run at a single time. In the case above, setting a max concurrency would ensure that you don’t exceed the 10 connection limit on the population API.
As with web crawling, tasks can be chained together and results stored in a cache or other datastore or placed on a queue for additional processing or aggregating results. The Iron.io platform is flexible and powerful enough to process almost any type of data – big, small, hot, cold, or anywhere in between.
6. Sending and Receiving SMS | Making Phone Calls
Many applications take advantage of SMS and voice calls at the programmatic level, thanks to SaaS telephony providers such as Nexmo and Twilio. For example, an app might allow customers to sign up for SMS alerts on product pricing changes.
It’s a snap for developers to use SMS services to build these capabilities into an app, but the process of checking for and sending a price-change alert is inherently something that runs in the background – which makes it another use case almost purpose-built for IronWorker.
In this example, workers can be scheduled in IronWorker to run on regular schedules to query for price checks. The entire set of checks can be distributed over a number of concurrent tasks. When a price changes, the worker detects this, and sends the SMS using the Twilio API. Simple and easy.
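The core of such a scheduled worker is just the comparison-and-notify loop. In this sketch, `send_sms` is a placeholder for a Twilio or Nexmo API call, and the watch/price data shapes are illustrative assumptions.

```python
# Sketch of the scheduled price-check worker's core logic.

def send_sms(phone, message):
    # Placeholder: a real worker would call the Twilio REST API here.
    return {"to": phone, "body": message}

def check_prices(watches, current_prices):
    """watches: list of {product, phone, last_price} dicts.
    Returns the list of alerts sent for changed prices."""
    alerts = []
    for w in watches:
        price = current_prices.get(w["product"])
        if price is not None and price != w["last_price"]:
            alerts.append(send_sms(
                w["phone"],
                f"Price change for {w['product']}: now {price}"))
    return alerts
```

Scheduled in IronWorker, this runs on its own cadence with no always-on server to maintain.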
Another example here is to use IronWorker on the receiving end of texts. Instead of having an application continually running to receive and process them (and then scaling the app to handle the load), you can use IronWorker on the receiving end of a Twilio webhook. The code needed to process that event can be the sole component of the worker, and IronWorker will scale out to handle the workloads coming in. Here’s a blog post that provides some details on using Iron.io and Twilio.
7. Sending Emails (SendGrid, Mailgun, Mandrill)
Sending email from within an application is easy given the SMTP services available. But any emails that are sent almost always require some element of pre-processing and coordination. If this processing is done serially, it can take a long time to deliver an even modest number of emails.
A common pattern, for example, is to send a nightly report to customers. Say you have 1M customers. Generating each report might access your database, make use of 3-4 different APIs, and then finish by sending an email using SendGrid, Mailgun, or Mandrill. If each customer took 10 seconds to process, serial execution would take roughly 115 days to send this report. Obviously, not something that can be done without some element of scaling. Enter IronWorker.
One approach to addressing this need for throughput is to have a single “master” task that determines the amount of work to perform and then kicks off a number of “slave” workers that each take a slice of the work. By taking advantage of concurrently running processes on a large scale, you can bring the runtime down from hours to minutes. Here’s a blog post on the base approach, as well as another blog post with more details on using Iron.io with SendGrid to do some powerful things around sending emails.
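The master’s slicing step is simple arithmetic: compute one (offset, limit) pair per slave worker so each processes an equal share of the customer list. A minimal sketch (queuing the slave tasks themselves is left out):

```python
# Sketch of the master task's work-splitting step for the email fan-out.
import math

def slices(total, workers):
    """Return (offset, limit) pairs covering `total` items across
    at most `workers` slave tasks."""
    per = math.ceil(total / workers)
    return [(i, min(per, total - i)) for i in range(0, total, per)]
```

Each slave receives its (offset, limit) as a task payload, queries just that range of customers, and sends its emails independently of its siblings.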
8. Replacing CRON with a More Reliable Cloud Scheduler
Another popular use case for IronWorker is scheduling tasks, whether it’s a one-time task or something that needs to run on a continual basis. The traditional way of handling scheduled jobs has been to use CRON. This approach, however, doesn’t translate well to cloud applications. CRON jobs running on a lone server (virtual or not) are difficult to maintain and represent a single point of failure.
The scheduling capability within IronWorker is simple to use and offers greater reliability than CRON jobs. You can incorporate all of the processing within the scheduled task, or you can have this task kick off other workers (a more common use). All of the processing in this article can be done as a result of direct user events, or it can be triggered on a particular schedule. Using IronWorker, you can eliminate the need to monitor or manage your own scheduling infrastructure.
9. Processing Webhook Events
Webhooks are by definition “user-defined HTTP callbacks”. Another way of putting it is that they are HTTP calls triggered by an event that hit a specific URI to alter or create some action. Webhooks are extremely useful when designing modern distributed applications as they allow event-based patterns and more real-time actions and flows.
IronWorker has native support for webhooks which means you can upload a task and then have it be an endpoint for a webhook. The task will run when invoked via a call and then it will terminate. The beauty here is that there’s no need to have an underlying application – just the code for the event uploaded in IronWorker. You provide the event logic and we provide almost unlimited processing and scale.
Here are a few examples:
- GitHub events trigger a webhook to an IronWorker that checks out the repo and performs some action
- Stripe transaction webhook fires off an IronWorker to process that transaction
- Twilio can send a webhook to trigger an IronWorker when an SMS is received
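A webhook-triggered worker typically just parses the HTTP body it was handed and dispatches on the event type. Here’s a sketch; the event names and fields below are illustrative, not a real GitHub, Stripe, or Twilio payload schema.

```python
# Sketch of a webhook-triggered worker: parse the payload, dispatch
# on event type. Field names are hypothetical.
import json

def handle_webhook(raw_body):
    event = json.loads(raw_body)
    kind = event.get("type", "unknown")
    if kind == "push":
        # e.g. a GitHub-style event: check out the repo and act on it.
        return f"checkout repo {event['repo']}"
    if kind == "charge.succeeded":
        # e.g. a Stripe-style event: record the payment.
        return f"record payment {event['id']}"
    return f"ignored event: {kind}"
```

Because the worker only exists for the duration of the event, there’s no app server to provision; scale comes from IronWorker running as many of these in parallel as webhooks arrive.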
10. Adopting a Service Architecture
Application frameworks like Ruby on Rails and Python/Django have undeniably increased development speed. But cloud apps are becoming much more complex – there are more interfaces, more pressure to reduce response times, and more processing around each user action and event.
This means there is a need to shift from tightly coupled monolithic app structures to more scalable evented models. One where a large amount of processing is done in the background and where workers and message queues are used to orchestrate this asynchronous processing. Rather than thinking of the request and response as the lifecycle of your application, many developers are thinking of each request loop as just another set of input/output opportunities.
The idea is to reduce core processes to reusable pieces but have them operate independent of any other pieces. You connect these “services” via message queues and worker systems. The benefit is each service will be independent of any other processes and so they can more easily change and adapt without adversely impacting other parts of the system. Read our blog post on the end of the monolithic app and the rise of cloud-based service architectures.
Using Workers Will Change the Way You Think – and Program
A worker system is a key part of any scalable architecture – it provides flexibility and scale and reduces complexity. But getting beyond the technical merits, using workers just makes developing much more productive and way more fun.
The ability to push task-specific code in almost any language to the cloud and then run it reliably at scale opens up a new world of options.
We guarantee, once you realize the power of a scalable worker system, it’ll be hard to go back to the old way of doing things.
To learn more about how IronWorker can help your app effortlessly scale to thousands of concurrent workers, visit Iron.io today.
And if you’re doing something cool with IronWorker, let us know. We’ll add it to the list, tell others (and send a t-shirt).
Top 10 Uses For A Message Queue
We have a related article on the uses of a message queue titled Top 10 Uses of a Message Queue. A message queue will typically backend a worker service (as our IronMQ does in the case of IronWorker) but has additional uses/benefits including asynchronicity, load buffering, database offloading, decoupling, and more.