The writers at HighScalability did us one better with their description of a message queue in their repost of the article. Here’s how they described it:
"[Y]ou can find a message queue in nearly every major architecture profile on HighScalability. Historically they may have been introduced after a first generation architecture needed to scale up from their two tier system into something a little more capable (asynchronicity, work dispatch, load buffering, database offloading, etc). If there's anything like a standard structural component, like an arch or beam in architecture for software, it's the message queue."
In this piece, we wanted to take a different route and talk about some of the concrete things you can do with a queue. It’s helpful to talk about separating components, creating non-blocking flows, and arriving at an independent, loosely-coupled architecture, but that still leaves things a bit too abstract. Talking about message and event flow will hopefully provide a bit more grounding.
Processing Requests In the BackgroundOne of the more fundamental actions of a message queue in a modern cloud app is triggering backend processing tasks. For tasks that either don’t need to happen within the user response loop or that can take place concurrently with additional user actions, it makes a lot of sense to send it to the background and process it asynchronously. Examples might be uploading documents, converting photos, processing credit cards, or collecting analytics.
Rather than address actions serially as part of the user response loop, each action can be sent to some sort of worker that takes care of the processing. Common worker frameworks include Celery, Resque, and our own IronWorker. Message queues serve as the core of these worker systems by buffering the task load and providing a mechanism for distributing the work across multiple cores and servers.
Processing Big DataBig data processing is far more than just Hadoop. The map-reduce pattern fits only a portion of large-scale processing needs, and Hadoop can sometimes be more complicated and with a steeper learning curve than is absolutely necessary for the task at hand.
Developers need easy ways to run their code in a highly parallel ways. Message queues in combination with scalable worker systems provide a rich platform to do massive processing without having to master new languages or complicated frameworks. A master or control task can split up data segments into manageable slices and queue up the hundreds or thousands of tasks that need to operate on those slices. These workers can put results in cache or database storage. Additional workers can be queued to consolidate results or do other actions on the derivative data.
Every application with large numbers of users or that is doing any processing in the background needs this type of large-scale parallel processing solution. Hadoop isn’t going away, but using robust message queues and high-throughput task queues to do large-scale parallel processing will address the vast number of big data problems that don’t fit a Hadoop model.
Delaying ProcessingOften actions need to be delayed to allow other, parallel tasks to finish their work first. Message queues (and worker systems) are often a good way to do this. Most queue systems provide a feature to put a message on the queue with a delay. The delays are, more often than not, short-term delays where it makes sense as part of the processing flow to create a distinct action. Any long term delays should probably be handled outside of a queue structure using scheduled jobs.
Buffering for Databases
Many times you need to persist data, but it doesn’t necessarily need to be (or shouldn’t be) persisted as part of the request loop. Etsy’s Statsd is a good example of this scenario: it’s important that stats get persisted, but there are big advantages to bundling a lot of these stats before persisting them. By using a queue, you can ensure that data will get persisted, while still getting the benefits of bundling the data. This lowers the number of database requests, open file handlers, and database load required to persist your data.
Collecting User and Log Events
Another use we’re seeing is the self-collection of user events and log data. While most use services like MixPanel, KissMetrics, Google Analytics, and a host of others to capture user actions and events, they are also looking for even more granular input or to post process the raw input according to their own needs.
Given the ease and affordability of storage and processing, this is not out of reach for many companies, even small ones. The process is pretty straight-forward – put events on one or more queues, use workers to continually process the data, store results in cloud storage, and then use a charting application to display the results or a query interface to create the desired reports.
Collecting Webhook Events
The growing availability of webhooks within many services – GitHub, Twilio, SendGrid, Stripe, and more – makes for a natural use for a message queue. Rather than connect services to an endpoint within an application you’re managing, it’s much more flexible and scalable to connect web events to a message queue you maintain and then process the events from the message queue at your convenience. An application can then be used and scaled up or down as needed--or better yet, a worker system.
Using a message queue eliminates any blocking that might happen when you use an application as the receiver. You can also put routing in place, should the service sending the webhook not provide finer grain events. Webhooks, JSON messages, message queues, and simple async processing are a powerful combination that is likely to create a modern version of the service-oriented architecture – in this case, open, massively distributed, and Internet wide.
Collecting Data from Connected Devices
Sensors, devices and other hardware that is deployed outside of your datacenter needs to be resilient; each device collects important data for your connected system, but is painful and expensive to do maintenance on, and it’s deployed in an uncontrolled environment. By using a queue as a buffer layer, you reduce the device’s functions to the bare minimum: it simply collects the data, then puts it on the queue. Once the data is on the queue, it can be manipulated, persisted in a database, and processed by hardware that operates in a controlled environment, protecting against lost data. (There are other issues to address with connected devices including device registration, auth, and access rights, but those are outside the scope of this article.)
Orchestrating Process Flows
Orchestrating process flows is not just about decoupling processes but about providing a framework to chain work together, handle exceptions, and escalate issues. Task-level processing has become virtualized, which means tasks can execute across any number of servers and even in multiple zones or clouds. Message queues are the key way to not just buffer and scale the background processing, but also coordinate the different tasks and the exceptions that might take place.
Chaining Work Together
Using message queues makes it easy to chain processes together and make complex processes much easier to manage. For example, crawling web sites is core to many web-based businesses – whether it’s monitoring prices on retail sites, analyzing sentiment in posts and reports, or vetting SEO metrics for client sites.
Creating a distributed system with independent tasks handling specific actions allows this processing to scale more easily. One type of task can crawl pages and store them in a cache, another can process the pages and pull out the text, images, links, and other items. Still other processes can process these items, cascading the work along as needed.
Separating the work and using message queues as the glue to connect these discrete tasks also provides greater agility. Adaptations can be made to adapt crawling and page processing approaches quickly. Message queues are ideal for gluing this work together (using workers to perform the work and caches to store temporary results, links, counters, and other interim data).
Message queues are useful for making exception handling easier to address. For example, say that you’re hitting an API to grab data or update a record, and the API is either busy or no record is available. Rather than address the issue in the task itself, a message can be put on the original queue with a delay to see if the issue goes away or after a certain number of times, it can be placed on another queue for another task to address. This means the original task can be kept lean while the exception handling can be addressed independently and over time grow in sophistication and robustness.
An important part of exception handling is escalation. What happens when an integration or a processing task fails even after multiple retries? Message queues can be used as easy ways to bump up the processing or notifications or serve as backstops by storing items in a database for later processing or manual intervention. Instead of having to hardcode escalation as part of main task or a single exception handler, the process can be decoupled so that steps can be easily added or modified.
Integrating Independent Systems
Last in this set of categories is one of the more canonical uses within the enterprise world: an integration layer from one enterprise system to another. Passing data from an ordering system (such as SAP) to a CRM system (like Salesforce) is an example. Two separate systems that need to pass data from one to another. Putting a message queue in the middle means that each system remains independent, so neither system needs to know about the other.
Message queues also offer an advantage of reliability and durability, meaning that messages can be made to persist and operations made to be failsafe. Doing a direct connection from system to system might work most of the time, but when it doesn’t, there’s no real safety net.
When moving data between two systems, they may not always communicate using the same schema. In these cases, the data needs to be transformed from one schema to another as part of integrating two systems. Sometimes a single transformation can be invoked for every type of message--for example, changing a property or schema slightly--but sometimes each message type needs its own transformation--for example, when one of the systems uses its own unique schema.
In the cloud world, these transformations are usually done in a worker system, in which a discrete process exists for the sole purpose of transforming between a single pair of systems. These workers can either read messages off the queue and directly modify the second system, but we’ve found the best practice is to have the worker read off a queue, modify the message, then put it on a second queue, which the second system will read off of.
Message queues are useful for triggering secondary actions. Sometimes the data doesn’t need to be transformed and no new processing really needs to take place, but notifications need to be sent to dashboards or social streams. Hipchat, Chatter, or Yammer are some good examples of internal streams.
The nice thing about using a message queue is that the events can be placed on a queue using a systems webhooks and then the messaging framework can be used to trigger the specific action or set of actions. At Iron.io, we post a continual stream of events to Hipchat, including GitHub commits, monitoring test results, plan upgrades, and other events and results. Each event is placed on a queue and then routed to the destination.
Buffering Data Updates
Along with keeping systems independent and providing a data transformation capability, another reason why message queues make sense is that they provide a easy mechanism to manage the pace of data integration. Given almost assuredly different rates of production and consumption by two or more systems, a mechanism is needed to either scale out data updating or to throttle it back. Here’s where a message queue comes in handy. Rather putting the burden of handling high levels of throughput onto each application, it makes sense to put a message queue in between and take advantage of the buffering nature of a queue.
As uses of message queues within an application expand, a common part of the action becomes the need to route the messages to the proper destination. A common situation is to have a single message queue, but either push or pull data off the queue and pass it to different systems based on the content of the messages. Rather than use different queues (which is always a possibility, but requires every process writing to or reading from the queue to be updated), the routing logic can live in a message broker or single, specialised set of routing processes.
Message Queues as Core Elements
Message queues are fundamental tools for any modern application. Even in the course of writing this post, the list kept growing, which really shows how ubiquitous and essential message queues are.
Messaging, along with compute and storage, make up three core blocks in almost every system diagram. Having the right architecture is the first step, but then you still need the right solutions – solutions that have the combination of agility, availability, reliability, scale, and affordability to fit your needs. Getting to these solutions takes work. For compute and storage, clear leaders are emerging. When it comes to a cloud messaging layer, we certainly have our view.
If you'd like to get started with an efficient, reliable, and hosted message queue today, check out IronMQ. If you’d like to connect with our engineers about how queues could fit into your application, they’re available at get.iron.io/chat. And, if you have any uses that we left off here, please let us know. We’d love to add to the list.