

Friday, December 19, 2014

Iron.io Launches IronWorker within the Azure Marketplace

Iron.io is pleased to announce it is offering its IronWorker platform as an Application Service within the Azure Marketplace, providing a key infrastructure component that gives developers immediate access to highly scalable, event-driven computing services.

Every application in the cloud needs to process workloads on a continuous basis, at scale, on a schedule, or in the background. IronWorker is a modern application development platform for processing at a task level, isolating code packages and dependencies in a containerized compute environment managed by Iron.io.

IronWorker on Microsoft Azure

Developers can use IronWorker to build scalable, distributed cloud architectures from the start, turn custom client-server applications into cloud-based microservices, use it as a mobile compute cloud, or incorporate highly concurrent workload processing into their applications without the need for complex orchestration, overhead, and maintenance.

By collaborating with Iron.io, Microsoft gives its customers flexibility in their application architectures and cloud migrations. By making use of IronWorker’s multi-language support, enterprise organizations can move individual components to the cloud while maintaining safe and secure application environments. IronWorker can also act as a key processing gateway to Azure component services including storage, queues, mobile services, and more, making it easy to create hybrid solutions of existing client-server applications and cloud-based microservices.

Through our expansive ecosystem, we are providing customers with the solutions they need to deploy their critical applications seamlessly in the cloud and create hybrid connections with on-premises systems. Building on our great Docker support in Azure, we are excited IronWorker is now available in the Azure Marketplace, to simplify deployment of the fantastic event-driven processing technology created by Iron.io.
– Corey Sanders, Director of Program Management, Microsoft Azure 

Building Modern Cloud Solutions

IronWorker provides developers a friendly environment for handling a variety of event-driven, asynchronous processing use cases, including streamlined ETL pipelines, processing large files and big data sets, sending email and notifications in bulk, and more. By extending these capabilities to Azure, Microsoft customers can move many of their scale-out tasks to the cloud in an efficient, easy-to-use manner. Iron.io leverages Docker as its core task-processing environment and has launched over 500 million containers since moving the capability into production earlier in the year. The IronWorker platform currently offers over 15 different Docker-based environments for specific language versions and essential libraries, with additional capabilities coming soon.

Microsoft has built a world-class computing platform in Azure. Iron.io brings to Azure a proven technology that gives Microsoft customers an immediate way to migrate applications and services into the cloud, as well as revolutionary component technology that lets them extend their use of the cloud even further.
                                – Chad Arimura, CEO, Iron.io

Getting Started With IronWorker on Azure

Azure users can now add IronWorker as a service by visiting the Azure Marketplace. We have written instructions for adding the IronWorker service in our documentation. Developers can then write and package task code for deployment to IronWorker’s processing environment within Azure. The dashboard built into Azure provides detailed insight into the state of your tasks, letting you monitor your complete application activity and performance.

IronWorker is currently available in the West US region of Azure, and supports multiple languages including Go, Java, Ruby, PHP, Python, Node.js, and .NET.

To try IronWorker for free, sign up for an account at Iron.io. We’ll even give you a trial of some of the advanced features so that you can see how processing at scale will change the way you view modern application development.

Wednesday, November 26, 2014

Reed Allman Speaking on the RocksDB Meetup Dec 4th

Reed Allman, a systems-level engineer at Iron.io, will be talking at the RocksDB meetup on Thursday, December 4th, 2014. The meeting will be at the Facebook headquarters in Menlo Park, CA.

RocksDB is an embeddable open-source key/value database that is a fork of LevelDB. It is designed to be scalable to run on servers with many CPU cores, to efficiently use fast storage, to support IO-bound, in-memory and write-once workloads, and to be flexible to allow for innovation.

For more background on RocksDB, see Dhruba Borthakur’s talk from the Data@Scale 2013 conference as well as this story on the history of RocksDB.

Here's the description of Reed's talk:

Building queues that are Rocks solid (and other marginal puns)

Iron.io started out experimenting with LevelDB and we ended up using RocksDB. We'll walk through a naive queue implementation – one that would have minimal use in practice, but seeing as we're programmers, we're into that kind of thing. We'll take a queue on LevelDB and punish it to see how it performs. We'll then take it to RocksDB, run the same performance tests, compare results, and show why RocksDB makes great sense as a persistence layer.
About the Speaker

Reed Allman is a systems-level engineer for Iron.io, working in Go to solve hard problems in high-scale, fault-tolerant distributed systems. Prior to Iron.io, he worked on a research project with Google building refactoring tools for the Go language. By his estimation, he's read the language spec more times than is healthy and has gained a somewhat irrational view of programming in anything that doesn’t have channels.

Here's the full agenda for the evening.

For information on RocksDB at Iron.io, here's a post on IronMQ v3 and the on-premises version built for enterprise and carrier private-cloud deployments. If you need high-availability message queuing in public or private clouds, feel free to reach out to us.

Wednesday, November 19, 2014

Iron.io Adds Named Schedules

Iron.io is pleased to announce named schedules as a feature in its IronWorker service. Giving names or labels to schedules may seem a small feature, but it’s been a common request from a number of users managing large workloads.

Users can now give scheduled tasks labels when uploading the tasks to IronWorker or add them later via the Dashboard. The labels will appear in the Dashboard alongside the schedule, making it easier to keep track of what’s happening in the background of an application.

Making Use of Named Schedules

Named schedules are available for all plans – including the Lite/Free plan. To make use of named schedules, all users need to do is include a name or label when uploading a scheduled task to IronWorker. 

Simply use the '--label' param along with the name of the schedule:

$ iron_worker schedule import_worker --label "Critical Task" --start-at "2015-01-01T00:00:00-00:00" --run-every 3600

You can also add or amend a label within the Dashboard using the 'Label' field.

Scheduled Tasks now contain a "Label" field

The labels will then appear in the list of Scheduled Tasks.

Named Schedules

Getting Started 

To try IronWorker for free, sign up for an account at Iron.io.

We’ll provide a trial of some of the advanced features so that you can see how running code in the cloud at scale will change the way you think about application development.

On-demand processing awaits.

Tuesday, October 21, 2014

Docker in Production — What We’ve Learned Launching Over 300 Million Containers

Docker in Production at Iron.io
Earlier this year, we made a decision to run every task on IronWorker inside its own Docker container. Since then, we've run over 300,000,000 programs inside of their own private Docker containers on cloud infrastructure.

Now that we’ve been in production for several months, we wanted to take the opportunity to share with the community some of the challenges we faced in running a Docker-based infrastructure, how we overcame them, and why it was worth it.

IronWorker is a task queue service that lets developers schedule and process jobs at scale without having to set up or manage any infrastructure. When we launched the service 3+ years ago, we used a single LXC container holding all the languages and code packages needed to run workers. Docker allowed us to easily upgrade and manage a set of containers, letting us offer our customers a much greater range of language environments and installed code packages.

We first started working with Docker v0.7.4, so there have been some glitches along the way (containers not shutting down properly was a big one, but that has since been fixed). We’ve successfully worked through almost all of them, though, and we’re finding that Docker is not only meeting our needs but surpassing our expectations. So much so that we’ve been increasing our use of Docker across our infrastructure. Given our experience to date, it just makes sense.

The Good

Here is a list of just a few of the benefits we’ve seen:

Large Numbers at Work

Easy to Update and Maintain Images

Docker’s git-like approach is extremely powerful and makes it simple to manage a large variety of constantly evolving environments, and its image-layering system gives us much finer granularity over images while saving disk space. We’re now able to keep pace with rapidly updating languages, and we can offer specialty images like a new ffmpeg stack designed specifically for media processing. We’re up to 15 different stacks and expanding quickly.

Resource Allocation and Analysis

LXC-based containers are an operating-system-level virtualization method that lets containers share the same kernel while constraining each container to a defined amount of resources such as CPU, memory, and I/O. Docker provides these capabilities and more, including a REST API, environment version control, pushing and pulling of images, and easier access to metric data. Docker also isolates data files more securely using a copy-on-write (CoW) filesystem: all changes made to files within a task are stored separately and can be cleaned out with one command. LXC is not able to track such changes.

Easy Integration With Dockerfiles

We have teams located around the world. Being able to post a simple Dockerfile and rest easy, knowing that somebody else will be able to build the exact same image as you did when they wake up is a huge win for each of us and our sleep schedules. Having clean images also makes it much faster to deploy and test. Our iteration cycles are much faster and everybody on the team is much happier.
Custom Environments Powered by Docker

A Growing Community

Docker is getting updates at an extremely fast rate (faster than Chrome even). Better yet, the amount of community involvement in adding new features and eliminating bugs is exploding. Whether it’s supporting images, supporting Docker itself, or even adding tooling around Docker, there are a wealth of smart people working on these problems so that we don’t have to. We’ve found the Docker community to be extremely positive and helpful and we’re happy to be a part of it.

Docker + CoreOS

We’re still tinkering here but the combination of Docker and CoreOS looks like it will have a solid future within our stack. Docker provides stable image management and containerization. CoreOS provides a stripped-down cloud OS and machine-level distributed orchestration and virtual state management. This combination translates into a more logical separation of concerns and a more streamlined infrastructure stack than presently available.

The Challenges

Every server-side technology takes fine-tuning and customization, especially when running at scale, and Docker is no exception. (To give you some perspective: we run just under 50 million tasks and 500,000 compute hours a month, and we’re rapidly updating the images we make available.)

Here are a few challenges we’ve come across in using Docker at heavy volume:

Docker Errors – Limited and Recoverable

Limited Backwards Compatibility

The quick pace of innovation in the space is certainly a benefit, but it has its downsides. One of these has been limited backwards compatibility. In most cases, what we run into are changes in command-line syntax and API methods, so it's not a critical issue from a production standpoint.

In other cases, though, it has affected operational performance. By way of example, in the event of any Docker errors after launching containers, we'll parse STDERR and respond based on the type of error (by retrying a task, for example). Unfortunately the output format for the errors has changed on occasion from version to version and so we've ended up having to debug on the fly as a result.

Issues here are relatively easy to get through, but they do mean that every update needs to be validated several times over, and you’re still exposed until the update is released into the land of large numbers. We should note that we started months back with v0.7.4 and recently updated our system to v1.2.0, and we have seen great progress in this area.

Limited Tools and Libraries

While Docker had its first production-stable release 4 months ago, a lot of the tooling built around it is still unstable. Adopting most of the tools in the Docker ecosystem means adopting a fair amount of overhead as well: somebody on your team will have to stay up to date and tinker with things fairly often to keep up with new features and bug fixes. That said, we’re excited about some of the tools being built around Docker and can’t wait to see what wins out in a few of the battles (looking at you, orchestration). Of particular interest to us are etcd, fleet, and Kubernetes.

Triumphing Over Difficulty

To go in a bit more depth on our experiences, here are some of the issues we ran into and how we resolved them.

An Excerpt from a Debugging Session
This list comes mostly from Roman Kononov, our lead IronWorker developer and Director of Engineering Operations, and from Sam Ward, who has also been instrumental in debugging and rationalizing our Docker operations.

We should note that when it comes to errors related to Docker or other system issues, we’re able to automatically re-process tasks without any impact to the user (retries are a built-in feature of the platform).

Long Deletion Times

The Fix For Faster Container Delete 
At the onset, deleting containers took far too long and required too many disk I/O operations, causing significant slowdowns and bottlenecks in our systems. We were having to scale the number of available cores much higher than we should have needed to.

After researching and playing with devicemapper (a Docker filesystem driver), we found an option that did the trick: `--storage-opt dm.blkdiscard=false`. It tells Docker to skip an expensive disk operation when containers are deleted, which greatly speeds up container shutdown. Once our delete script was modified, the problem went away.

Volumes Not Unmounting

Containers wouldn’t stop correctly because Docker was not unmounting volumes reliably, so containers kept running even after the task completed. The workaround was to unmount volumes and delete folders explicitly using an elaborate set of custom scripts. Fortunately, this was in the early days, when we were using Docker v0.7.6; we removed this lengthy scripting once the unmount problem was fixed in Docker v0.9.0.
Breakdown of Stack Usage

Memory Limit Switch

One of the Docker releases suddenly added memory-limit options and discarded the LXC options we had been passing. As a result, some worker processes hit memory limits, which caused entire boxes to become unresponsive. This caught us off guard because Docker did not fail even when unsupported options were used. The fix itself was simple – apply the memory limits within Docker – but the change arrived unannounced.

Future Plans

As you can see, we’re pretty heavily invested in Docker and get more invested in it every day. In addition to using it to contain the user code that runs within IronWorker, we’re in the process of rolling it out across a number of other areas in our technical stack.

These areas include:

IronWorker Backend

In addition to using Docker for task containers, we’re in the process of using it to manage the processes that run on each server that manages and runs worker tasks. (The master task on each runner takes jobs from the queue, loads the right Docker environment, runs the job, monitors it, and then tears down the environment after the run.) The interesting thing here is that we’ll have containerized code managing other containers on the same machines. Putting all of our worker infrastructure inside Docker containers also lets us run it on CoreOS pretty easily.
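The runner lifecycle described in that parenthetical can be sketched as a simple loop with pluggable steps. This is an illustrative Ruby sketch, not Iron.io's actual internals; every name here is made up, and the real system would swap in queue reads and Docker calls for the injected lambdas:

```ruby
# Sketch of a worker runner: pull a job, set up its environment,
# run and monitor it, then tear the environment down. Each step is
# injected so the loop itself stays trivial to test.
class Runner
  def initialize(queue:, setup:, execute:, teardown:)
    @queue, @setup, @execute, @teardown = queue, setup, execute, teardown
  end

  # Process jobs until the queue is empty. Teardown always runs,
  # even when a job raises, so no environment is leaked.
  def run
    results = []
    while (job = @queue.shift)
      env = @setup.call(job)   # e.g. start a Docker container
      begin
        results << @execute.call(env, job)
      ensure
        @teardown.call(env)    # e.g. stop and remove the container
      end
    end
    results
  end
end
```

The `ensure` block is the important design point: it mirrors why unreliable container teardown (see the volume-unmounting issue above) hurt so much in practice.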

IronWorker, IronMQ, and IronCache APIs

We’re no different from other ops teams in that nobody really likes doing deployments. And so we’re excited about wrapping all of our services in Docker containers for easy, deterministic deployment environments. No more configuring servers: all we need are servers that can run Docker containers and, boom, our services are loaded. We should also note that we’re replacing our build servers – the servers that build our product releases for specific environments – with Docker containers. The gain is greater agility and a simpler, more robust stack. Stay tuned.

Building and Loading Workers

We’re also experimenting with using Docker containers as a way to build and load workers into IronWorker. A big advantage here is that this provides a streamlined way for users to create task-specific workloads and workflows, upload them, and then run them in production at scale. Another win here is that users can test workers locally in the same environment as our production service.

Enterprise On-Prem Distribution

Using Docker as the primary distribution method for our IronMQ Enterprise on-premises version simplifies distribution on our side and provides a simple, universal way to deploy within almost any cloud environment. Much like the services we run on public clouds, all customers need are servers that can run Docker containers, and they can get multi-server cloud services running in a test or production environment with relative ease.

From Production To Beyond

The Evolution of IT
Docker has come a long way in the year and a half since we saw Solomon Hykes launch it and give a demo on the same day at a GoSF meetup. With the release of v1.0, Docker is quite stable and has proven to be truly production-ready.

The growth of Docker has also been impressive to see. As you can see from the list above, we’re looking forward to future possibilities but we're also grateful that the backwards view has been as smooth as it’s been.

Now if only we could get this orchestration thing resolved.

The Story Behind Our Use of Docker 
UPDATE: For additional background on our use of Docker, take a look at our earlier post, How Docker Helped Us Achieve the (Near) Impossible. In it, we discuss the decisions behind using Docker, the requirements we had going in, and more details on what it enables us to do.

For more insights on Docker, as well as our emerging impressions of CoreOS, you can watch this space or sign up for our newsletter. Also, feel free to email us or ping us on Twitter if you have any questions or want to share insights.

To try IronWorker for free, sign up for an account at Iron.io. We’ll even give you a trial of some of the advanced features so that you can see how processing at scale will change the way you view modern application development.

About the Authors

Travis Reeder is co-founder and CTO of Iron.io, heading up the architecture and engineering efforts. He is a systems architect and hands-on technologist with 15 years of experience developing high-traffic web applications, including 5+ years building elastic services on virtual infrastructures. He is an expert in Go, a leading speaker, writer, and proponent of the language, and an organizer of GoSF (1,450 members).

Roman Kononov is Director of Engineering Operations at Iron.io and has been a key part of integrating Docker into the technology stack. He has been with Iron.io since the beginning and has built and contributed to every bit of Iron.io's infrastructure and operations framework. He lives in Kyrgyzstan and runs Iron.io's remote operations team.

Additional Contributors – Reed Allman, Sam Ward

About Iron.io

Iron.io is the maker of IronMQ, an industrial-strength message queue, and IronWorker, a high-concurrency task-processing/worker service. Every production system uses queues and worker systems to connect systems, power background processing, process transactions, and scale out workloads. Iron.io's products are easy to use and highly available, and they are essential components for building distributed applications and operating at scale. Learn more at iron.io.

About Docker

Docker is an open platform for distributed applications, for developers and sysadmins. Learn more at docker.com.

Monday, October 20, 2014

Iron.io CEO Chad Arimura Speaking at Data 360 Conference on Real-time Data

Data 360° Conference (Oct 22-23, 2014)
Chad Arimura, CEO and Co-Founder of Iron.io, will be speaking at the Data 360° Conference in Santa Clara this week.

The conference brings together leading figures in data processing and analysis to discuss trends in big data, cloud infrastructure, real-time data analysis, and distributed computing. Specific emphasis is on these topics in the world of healthcare, retail, finance, and IT services but the principles apply in any industry.

Here's the panel Chad will be speaking on:
Wed, 3:00 PM (Oct 22nd)
Resources for Real-time Results
Big data tools are now widely used, as resources like storage, compute, and analytics have become broadly available. The panel discusses how IT decision makers are deciding where to invest to achieve real-time results using proprietary resources.
James Collom (Aisloc) - Moderator
Mark Theissen (Cirro Inc.)
Sundeep Sanghavi (DataRPM)
Chad Arimura (Iron.io)
Chad Arimura
The conference runs Wed/Thurs, October 22-23, 2014 at the Santa Clara Marriott Hotel. Other speakers are from companies that include EMC, Cloudera, Twitter, Google, Cisco, Splunk, GE, AT&T, TIBCO, CSC, Verizon, and more.  If you're at the conference, be sure to come up and say hello.

A Few of the Conference Speakers

Tuesday, October 14, 2014

Iron.io Adds Longer-Running Workers

Long-Running Workers are Now Available in IronWorker

Iron.io is happy to announce that long-running workers are now available within IronWorker. Until now, workers running on the platform have been limited to 60 minutes in duration.

Users on Production and Enterprise plans or using Dedicated Clusters can now have workers that run for hours at a time. This gives users greater flexibility to handle even more extensive asynchronous workflows.

Worker systems are essential for doing transaction and event processing, background processing, and other types of distributed processing. (GitHub once estimated that over 40% of their processing takes place in the background.)

Short-Running Tasks + Longer Running Tasks

A 60-minute limit fits most use cases for a worker system and strikes the right balance between processing power, time-in-queue latency, system flexibility, and responsiveness. Great benefits can result when you distribute work across a set of task-specific workers, but that’s not always feasible while keeping task duration under 60 minutes.

Longer-running workers are the answer in situations where workloads can’t be broken into discrete units or where monitoring and scheduling a complicated process might extend across a few hours.

Users on Production plans or Dedicated Clusters can make use of this feature right away. The default limit for longer-running workers is 2 hours, but this can be extended by talking with one of our account teams. (Users on Dedicated Clusters just need to let us know what they need, and we can make the change quickly.)

Maximum Duration of Workers in IronWorker

Support for More Advanced Workloads

The release of this new capability joins several others we’ve added to the platform over the past few months. These advanced features are in response to a number of conversations with users running heavy workloads and more complicated workflows. Here are a few examples where a longer-running worker comes into play:

• One example is a large unbroken iteration. Many of our users use IronWorker to process large CSV files, some of which span millions of rows. We generally suggest breaking the files down and parsing them in chunks, but in some cases this can be difficult or even problematic, as when line items need to be processed in order. Inventory changes and other transaction processing are examples here.
• A second example is a large crawling and compilation operation. Using the PhantomJS stack within IronWorker, users can take thousands of images and snapshots of websites and build PDFs, GIFs, and even video files from the images. In most cases, the translation and transcoding period grows linearly with the size of the required output, which means the amount of processing can quickly go beyond the standard 1-hour limit.
• A third example is using a master worker to monitor a process that might take longer than 60 minutes. In general, we recommend having tasks queue up other work (a master-slave pattern) and scheduled workers to monitor progress, but when users want a persistent worker to run continually, longer-running workers now provide that capability.
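When a CSV can be split safely, the usual alternative to one long task is to slice the file and queue one short task per chunk. A minimal Ruby sketch of the slicing side (the enqueue callback is a stand-in; with IronWorker you'd queue each chunk via the client library, and the worker name in the comment is hypothetical):

```ruby
require "csv"

# Split CSV rows into fixed-size chunks, preserving row order within
# and across chunks, and hand each chunk to a queueing callback.
def queue_csv_in_chunks(csv_text, chunk_size:, &enqueue)
  rows = CSV.parse(csv_text)
  rows.each_slice(chunk_size).with_index do |chunk, i|
    # In a real pipeline this would queue a task, e.g.:
    #   client.tasks.create("csv_chunk_worker", chunk.to_json)
    enqueue.call(i, chunk)
  end
end
```

The chunk index lets downstream workers reassemble or order results; when strict in-order processing is required, this pattern breaks down, which is exactly when a single longer-running worker is the better fit.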
To see more use cases for a worker system and get more details on the examples above, check out this article on top uses of a worker system as well as some of the success stories on our site. Most of the examples center on short-duration workers, but longer-running workers can slot in to give you that extra processing element you might need.

Making Use of Long-Running Workers

Longer running workers are available to users on Production and Enterprise plans or operating on Dedicated Clusters.

To make use of long-running workers, we first need to enable your account – contact one of our account teams to get provisioned. Once that’s done, all you have to do is include a new timeout value when you queue a task. The maximum limit for long-running workers is initially set to 2 hours (7200 seconds), but it can be extended upon request to up to 24 hours for Enterprise accounts and Dedicated Clusters.

Setting a timeout with a curl command (in seconds):
$ curl -H "Content-Type: application/json" -d '{"tasks": [{"code_name": "ExampleWorker", "payload": "", "timeout": 7200}]}' <PROJECT ID>/tasks?oauth=<TOKEN>

Setting a timeout with the CLI tool in IronWorker (in seconds)
$ iron_worker queue <WORKER_NAME> --timeout 7200

Getting Started 

To try IronWorker for free, sign up for an account at Iron.io. We’ll even give you a trial of some of the advanced features so that you can see how processing at scale will change the way you view modern application development.

You can also connect with one of our account teams to dive into solutions in more depth, or pair program with one of our developer evangelists to get up and running in minutes.

What are you waiting for? Simple, scalable, long-running processing awaits.

Thursday, October 2, 2014

How to Build an ETL Pipeline for ElasticSearch Using Segment and Iron.io

ETL is a common pattern in the big data world for collecting and consolidating data for storage and/or analysis. Here's the basic process:
  • Extract data from a variety of sources
  • Transform data through a set of custom processes
  • Load data to external databases or data warehouses

Segment + Iron.io + Elasticsearch = A Modern ETL Platform

While it may seem unnecessary to follow this many steps as the tools around Hadoop continue to evolve, forming a cohesive pipeline is still the most reliable way to handle the sheer volume of data.

The extract process deals with the different systems and formats, the transform process allows you to break up the work and run tasks in parallel, and the load process ensures delivery is successful. Given the challenges and potential points of failure that could happen in any one of these steps, trying to shorten the effort and consolidate into one process or toolkit can lead to a big data mess.
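Keeping the three stages separate makes each one independently testable and retryable. A minimal Ruby sketch of the pattern, with sources and the sink stubbed as in-memory objects (all function names and the record shape are illustrative):

```ruby
# Three-stage ETL over in-memory stubs: extract pulls raw records
# from each source and tags them, transform normalizes them, and
# load_records appends them to a sink.
def extract(sources)
  sources.flat_map { |name, records| records.map { |r| r.merge("source" => name) } }
end

def transform(records)
  records.map { |r| { "event" => r["event"].to_s.downcase, "source" => r["source"] } }
end

def load_records(records, sink)
  records.each { |r| sink << r }
  sink
end

sources = { "web" => [{ "event" => "SignUp" }], "mobile" => [{ "event" => "Login" }] }
sink = load_records(transform(extract(sources)), [])
```

In the real pipeline each stage boundary is a queue: IronMQ buffers between extract and transform, and transform workers run in parallel before the load step.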

IronMQ + IronWorker 
This ETL pattern is a common use case with many of our customers, who will first use IronMQ to reliably extract data from a variety of sources, and then use IronWorker to perform custom data transformations in parallel before loading the events to other locations. This combination of IronMQ and IronWorker not only makes the process straightforward to configure and transparent to operate, it also moves the whole pipeline to the background so as not to interfere with any user-facing systems.

Leveraging the scaling power of Iron.io allows you to break the data and tasks into manageable chunks, cutting down overall time and resource allocation. Here’s one example of how HotelTonight uses Iron.io to populate Amazon Redshift, giving them real-time access to critical information.

In this post, we thought we'd walk through a use case pattern that provides a real-world solution for many – creating a customized pipeline from Segment to ElasticSearch using Iron.io.

Segment and Iron.io: An Integration That Does Much More

With the growing number of tools available to developers and marketers alike for monitoring, analytics, testing, and more, Segment is quickly becoming a force in the industry, serving as the unifying gateway for web-based tracking applications. In fact, Segment has become one of our favorite internal tools here at Iron.io, thanks to its ability to consolidate delivery to various APIs with just one script. Whether it's Google Analytics, Salesforce, AdRoll, or Mixpanel, to name a few, Segment eliminates the pain of keeping all of our tracking scripts and tags in order across our website, docs, and customer dashboard for monitoring user activity. How nice.

We're not alone in our appreciation of Segment, and we've built an IronMQ integration of our own that you can read about here. Our integration follows a unique pattern in that it's not just a single-point connection: connecting IronMQ to Segment as an endpoint creates a reliable data stream that can then be used for a wide range of use cases. The benefits of doing so include:

  • Data buffering – IronMQ provides a systematic buffer in the case that endpoints may not be able to handle the loads that Segment may stream.
  • Data resiliency – IronMQ is persistent with FIFO and one-time delivery guarantees, ensuring that data will never be lost.
  • Data processing – Adding IronWorker to the end of the queue can provide you with scalable real-time event processing capabilities.

A Data Pipeline into ElasticSearch

ElasticSearch is an open source, distributed, real-time search and analytics engine. It provides you with the ability to easily move beyond simple full-text search to performing sophisticated data access, collection, indexing, and filtering operations. ElasticSearch is being used by some of the largest businesses in the world and is growing at a rapid pace. You can read about the many customer use cases with ElasticSearch here.

Segment itself does a great job of collecting data and sending to individual services. Many users, however, will want to perform additional processing on the data before delivering it to a unified database or data warehouse such as ElasticSearch. Some example uses could be for building out customer dashboards or internal analytics tools. With ElasticSearch at the core, you can translate events into actionable data about your customer base. Business intelligence in today's environment is driven by real-time searchable data, and the more you can collect and translate, the better.

ETL Pipeline Instructions: Step-by-Step

The following tutorial assumes that you've installed Segment into your website. From here we'll walk through switching on the IronMQ integration and then running IronWorker to transform the data before loading into ElasticSearch.
Copy your Credentials from the Dashboard 

1. Connecting IronMQ to Segment

With Segment and Iron.io, building a reliable ETL pipeline into ElasticSearch is simple. The initial extract process, often the origin of many headaches, is already handled for you by piping the data from Segment to IronMQ.

Flipping on the IronMQ integration within the Segment dashboard automatically sends all of your Segment data to a queue named "segment". All you need to do to initiate the process is create a project within the Iron.io HUD and enter the credentials within Segment.

Enter Your Credentials into Segment

2. Transforming the Data

Now that we have our Segment data automatically sending to IronMQ, it's time to transform it prior to loading into ElasticSearch.

Let's say we want to filter out identified users based on their plan type so only paid user data gets sent to ElasticSearch. In the case of building customer dashboards, this allows us to maintain a collection of clean, relevant data, making our indexing and searching more efficient for the eventual end use case.

We're going to create a worker to pull messages from the queue in batches, filter the data, and then load into an ElasticSearch instance. For simplicity's sake, we're going to create a Heroku app with the Bonsai add-on, a hosted ElasticSearch service. Leveraging IronWorker's scheduling capabilities, we can check for messages on the queue at regular intervals.

With IronMQ and IronWorker, we can also ensure that we're not losing any data in this process, and that we're not overloading our ElasticSearch cluster with too much incoming data.

Segment Data
Before we get to our worker, let's examine the data from Segment that gets sent to the queue. Segment is vendor agnostic, making it very simple to interact with the exported data. The Tracking API that we'll be working with consists of several methods: identify, track, page, screen, group, and alias. You can dive into the docs here. We use the Ruby client within the website to see where our users are going. For any page we want to track, we just place this line in the controller.

Analytics.track( user_id: cookies[:identity], event: "Viewed Jobs Page" )

Here is a typical basic identify record in JSON format... parsed, filtered, and prettied. This is plenty for us to make our transformation before loading into ElasticSearch.

{
  "action"  : "Identify",
  "user_id" : "123",
  "traits"  : {
    "email"        : "",
    "name"         : "Ivan Dwyer",
    "account_type" : "paid",
    "plan"         : "Enterprise"
  },
  "timestamp" : "2014-09-10-02T00:30:00.276Z"
}

Worker Setup
Now let's look at creating the task that performs the business logic for the transformation. With IronWorker, you create task-specific code within a "worker" and upload it to our environment for highly scalable background processing. Note that this example uses Ruby, but IronWorker supports almost every common language, including PHP, Python, Java, Node.js, Go, .NET, and more.

For this worker, the code dependencies will be the iron_mq, elasticsearch, and json gems. It's a quick step to create the .worker file that contains the worker config, and we'll put the Bonsai credentials in a config.yml file.
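A minimal .worker file for this setup might look like the following (the worker script name is our own assumption for illustration):

runtime "ruby"
name "SegmentWorker"
exec "segment_worker.rb"
gem "iron_mq"
gem "elasticsearch"
gem "json"
file "config.yml"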

Our worker in this example will be very simple as a reference point. With the flexible capabilities of IronWorker, you are free to build out any transformations you'd like, and load to any external service you prefer. The Segment queue can grow rapidly (because traffic), so we'll want to schedule this worker to run every hour and clear the queue each time. If your queue growth gets out of hand, you can create a master worker that splits the processing among slave workers running concurrently, significantly cutting down the total processing time. Just another reason keeping the ETL process within Iron.io is the way to go.

Once we initiate both the IronMQ and ElasticSearch clients, we can make our connections and start working through the queue data before loading to Bonsai. We know from our Segment integration that the queue is named "segment", and that our paid users are tagged with an "account_type" trait. This allows us to easily loop through all the messages and check whether or not each record meets our requirements. Post the data to Bonsai and then delete the message from the queue. Pretty simple.
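The filtering step can be sketched as follows (the record shape follows the Segment identify example above; the helper name is our own):

```ruby
require "json"

# Keep only identify records for paid accounts -- the transform step
# our worker applies to each message pulled from the "segment" queue.
def paid_identify?(record)
  record["action"] == "Identify" &&
    record.dig("traits", "account_type") == "paid"
end

# In the worker itself (sketch; requires the iron_mq and elasticsearch gems):
#   queue = IronMQ::Client.new.queue("segment")
#   pull messages in batches with queue.get(:n => 100), parse each message
#   body with JSON.parse, index records that pass paid_identify? into
#   ElasticSearch, then delete the message from the queue.

sample = JSON.parse('{"action":"Identify","user_id":"123",
  "traits":{"account_type":"paid","plan":"Enterprise"}}')
puts paid_identify?(sample)  # => true
```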

3. Upload and Schedule our Worker

Now we can use the IronWorker CLI tool to upload our code package.

$ iron_worker upload segment

Add a New Scheduled Task
Once the code package is uploaded and built, we can go to the HUD and schedule it to run. Select the "SegmentWorker" code package and set the timing. Every hour at regular priority seems fine, as this is a background job with no time sensitivity. What's that mem1 cluster, you ask? Why, that's our new high-memory worker environment meant for heavy processing needs.

Now our worker is scheduled to run every hour with the first one being queued up right away. We can watch its progress in the Tasks view.

View Task Status in Realtime

Once it's complete, we can check the task log to see our output. Looks like our posting to ElasticSearch was successful.

Detailed log output

Just to be sure, let's check our Bonsai add-on in Heroku. Looks like we successfully created an index and populated it with documents. Now we can do what we like within ElasticSearch.
Bonsai add-on within Heroku
There you have it. With the Iron.io integration on Segment, you can build your own ETL pipeline any way you'd like.

Get Running in Minutes

To use Iron.io for your own ETL pipeline, sign up for a free account (along with a trial of advanced features).

Once you have an account, head over to Segment and flip the switch to start sending your data to IronMQ.

With the Segment and integration in place, you can branch the data to other locations as well as tie in IronWorker to transform the data. Let us know what you come up with... we'll write about it and send t-shirts your way!

Monday, September 22, 2014

New FFmpeg IronWorker Stack For Easy Video Processing

FFmpeg is the leading cross-platform solution to record, convert and stream audio and video. Dealing with audio and video can eat up resources, making the activity a great fit for IronWorker by moving the heavy lifting to the background.

In the past, using FFmpeg with IronWorker required that our users include and install the dependency within each worker environment. To streamline that process for developers, we've included FFmpeg in an IronWorker stack as a custom runtime environment specifically meant for video processing.

The possibilities are endless with the flexibility of FFmpeg and the processing power of IronWorker. Here are a few examples we've come across in working with our users, which will give you a baseline for the capabilities.

  1. Format Encoding
  2. Audio Normalization
  3. Audio/Video Optimization
  4. Metadata Analysis
  5. DRM Encoding/Decoding
  6. Screencapture Production
  7. Resize, Reformat and Crop Video
  8. Change Aspect Ratio

As a simple example, converting a video from one format to another is a one-line command:

$ ffmpeg -i input.mp4 output.avi


pre-built libraries: ffmpeg-2.3, GPAC-0.5.1, x264-0.142.x
supported runtimes: php-5.3, node-0.10, ruby-1.9.3p0, python-2.7
Find out more detailed info about the FFmpeg stack here.

To use, simply include "ffmpeg-2.3" in your .worker file using the stack option:

stack "ffmpeg-2.3" 
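A complete .worker file using the stack might look like this (the Ruby runtime and script name are just one possible combination):

runtime "ruby"
stack "ffmpeg-2.3"
exec "ffmpeg_worker.rb"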

Here are a few examples of how to use the FFmpeg stack in the supported languages.

Using Ruby runtime + FFmpeg stack
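As a minimal sketch of the Ruby case (file names are assumptions; the worker simply shells out to the ffmpeg binary the stack provides):

```ruby
# ffmpeg_worker.rb -- minimal sketch of a worker that transcodes a file
# using the ffmpeg binary provided by the ffmpeg-2.3 stack.
require "shellwords"

def transcode_command(input, output)
  # -y overwrites the output file if it already exists
  ["ffmpeg", "-y", "-i", input, output].shelljoin
end

cmd = transcode_command("input.mp4", "output.avi")
puts cmd
# Only run the command where an ffmpeg binary is actually available.
system(cmd) if system("command -v ffmpeg > /dev/null 2>&1")
```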

Using Node.js runtime + FFmpeg stack

Using Python runtime + FFmpeg stack

Using PHP runtime + FFmpeg stack

More flexibility for our developers
Making deployment and dependency management painless is a top priority for our team. Supporting a diverse range of languages, frameworks, and packages provides our users what they need to make their implementation successful.

We'd love to hear feedback, or even to feature tutorials written by you! Send me a message at:

Friday, September 19, 2014

Orchestrating PHP Dependencies with Composer and IronWorker

Package your dependencies on IronWorker using composer
This is a tutorial describing how to include and use the PHP package management tool Composer with IronWorker.

Composer is a tool for dependency management in PHP. It allows you to declare the dependent libraries your project needs, and it will install them in your project for you. Packagist is the main Composer repository. It aggregates all sorts of PHP packages that are installable with Composer.

Installing and Using Composer Locally

1. run the Composer installer (this downloads the composer.phar file locally)
$ curl -s | php
2. define packages and versions in a composer.json file
{
    "require": {
        "vendor/package": "1.3.2",
        "vendor/package2": "1.*",
        "vendor/package3": ">=2.0.3"
    }
}
3. run the installation command
$ php composer.phar install
4. your packages will be installed in a /vendor folder locally and can be loaded in your PHP script via
require 'vendor/autoload.php';

Using Composer on IronWorker

It's nearly as simple to do the same in an IronWorker.

1. include both the local composer.phar and composer.json in your .worker manifest. Also include a build command for us to run on our servers.
runtime "php"

file "composer.phar"
file "composer.json"
build "php composer.phar install"

exec "my_script.php"

2. upload via our command line tool
$ iron_worker upload <name_of_workerfile>
This is what you will see when running "iron_worker upload test.worker".

BOOM! It's that simple! Your packages will be built on our servers and accessible to your script.

Composer growth!

With 38,464 packages registered and close to 35,000,000 package installs each month, Composer is quickly becoming the standard in PHP package management, adopted by popular frameworks such as Laravel, Symfony, Yii, Zend, and more. We also heard recently that Engine Yard, a leader in application management, is sponsoring Composer with a $15,000 community grant!

We are excited to see the growth that Composer has seen thus far, and we look forward to seeing our own users take advantage of this wonderful tool and the IronWorker platform together.


Props go out to Van der Stock, an awesome fan of IronWorker, who is actually using IronWorker to notify users when their Composer dependencies are out of date! Send him your regards at @dietervds


Wednesday, September 10, 2014

How cine.io Uses Node.js and IronWorker to Handle Their Background Processing

The following is a guest blog post by Thomas Shafer describing how cine.io deploys their workers within Iron.io to handle all of their background processing. cine.io is the only true developer-focused live streaming service. We offer APIs and SDKs for all mobile platforms and language frameworks and let developers build and ship live-streaming capabilities in just a few simple steps.

We have many background tasks to run and use IronWorker for all of our long-running and resource-intensive processing. These types of tasks include video stream archival processing, customer emails and log processing, and bandwidth calculations.

Over the course of working with the IronWorker service, we developed a few approaches to make it easy for us to integrate distributed processing into our Node.js application framework and maintain consistency with our main service across the full set of workers. The primary strategy is to use a single shared worker space.

Single Shared Worker Space

As an example of the approach we use, when we process logs from our edge servers, we need to gather log file data and attach each bandwidth entry to a user's account. To do this, we need credentials to access multiple remote data sources and follow some of the same logic that our main dashboard application uses.

To maintain logical consistency and a DRY architecture, we upload many shared components of our dashboard application to IronWorker. Our dashboard application shares code with the central API application to ensure logical consistency – as Wikipedia puts it "a single, unambiguous, authoritative representation within a system".

Because some of our background tasks require shared code between our dashboard application and our API, we decided to structure our IronWorker integration with a single .worker class, titled MainWorker.

MainWorker Serves as the Primary Worker

We use one worker to perform a number of tasks and so it needs to be flexible and robust. It needs to be able to introspect on the possible jobs it can run and safely reject the tasks it cannot handle. One way to make it flexible is to unify the payload and reject any attempts to schedule a MainWorker that does not follow the expected payload format.

A good way to enforce a predictable format is to, once again, share code. Whether it's the dashboard, API, or another MainWorker, they all use the same code to schedule a job.

Our MainWorker payload follows the following format:

{
  configuration: {
    // configuration holds sensitive variables
    // such as redis credentials, cdn access codes, etc.
  },
  jobName: "",
    // The name of the job we want to run.
    // MainWorker understands which jobs are acceptable
    // and can reject jobs and notify us immediately on inadequate jobNames.
  source: "",
    // source is the originator of the request.
    // This helps prevent unwanted scheduling situations.
    // An example is preventing our API application
    // from scheduling the job that sends out invoices at the end of the month.
    // That job is reserved for IronWorker's internal scheduler.
  jobPayload: {
    // the payload to be handled by the job, such as model ids and other values.
  }
}
The jobs folder we upload contains the code for every specific job and is properly included by the MainWorker, which is written in Node. Here's a look at the .worker file for MainWorker.

Example of's MainWorker.worker file 

runtime 'node'
stack 'node-0.10'
exec 'main_worker.js'
dir '../models'
dir '../config'
dir '../node_modules'
dir '../lib'
dir '../jobs'
dir '../main_worker'
name 'MainWorker'
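The payload validation described above can be sketched as follows (cine.io's actual MainWorker is written in Node; Ruby is used here only to illustrate the pattern, and the job and source names are made up):

```ruby
# Sketch of MainWorker-style payload validation: introspect the jobName,
# reject anything the worker doesn't know how to run, and check the source.
KNOWN_JOBS    = ["process_edge_logs", "send_invoices"]
VALID_SOURCES = ["dashboard", "api", "scheduler"]

def validate_payload(payload)
  errors = []
  errors << "unknown jobName" unless KNOWN_JOBS.include?(payload["jobName"])
  errors << "invalid source"  unless VALID_SOURCES.include?(payload["source"])
  errors << "missing jobPayload" if payload["jobPayload"].nil?
  errors
end

payload = {
  "jobName"    => "process_edge_logs",
  "source"     => "scheduler",
  "jobPayload" => { "log_file" => "edge-01.log" }
}
puts validate_payload(payload).empty? ? "accepted" : "rejected"  # => accepted
```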

Benefits of Our Approach

After working with this setup for a while, I'm convinced that a single shared space is the way to go.

Continuous Deployment

By putting our IronWorker jobs in the same codebase as our API and dashboard application, I know our logic will be consistent across multiple platforms. This allows us to integrate IronWorker with our continuous integration server. We can update every platform simultaneously with the most up-to-date code. With this approach, there is no way that one-off, untested scripts can make their way into the environment. We update code on Iron.io through our CI suite, and it's up to the developer, code reviewers, and our continuous integration server to validate our code. Everyone has visibility into what is on the platform.

Consolidated Reporting

By running all of our jobs through the MainWorker, we know each new worker will gather metrics and handle error reporting out of the box. We don't need to figure out how each new worker will handle errors, what the payload will look like, etc. Enforcing a single convention leads to us focusing on the internal logic of the jobs and getting things shipped.

Flexible Scheduling

The job payload has a rigid structure but we can share the library for scheduling jobs. That library will be responsible for sending the appropriate structure with the necessary configuration variables, jobName, source, and jobPayload.

One Drawback to the Approach

There is a drawback to using a single shared space for our workers. When we look at jobs, whether running or queued, all we see is "MainWorker, MainWorker, MainWorker". We cannot use the dashboard to tell which jobs are taking a long time, and therefore lose some of the usefulness of the dashboard. (Note: If IronWorker were to allow tags or additional names, that would go a long way toward giving us visibility. I hear it's on the roadmap, so let's hope it makes it in sometime soon.)


Deploying a shared environment to Iron.io has enabled our development team to focus on delivering customer value in a rapid and high-quality manner. We can easily test our job code, ensure Iron.io has the most up-to-date code, and fix any production errors promptly.

About the Author
Thomas Shafer is a co-founder of cine.io, the only developer-focused live streaming service available today. He is also a founder of Giving Stage, a virtual venue that raises money for social and environmental change. (@cine_io)

To see other approaches to designing worker architectures, take a look at how Untappd uses a full set of task-specific workers to scale out their background processing in this post. Also, be sure to check out this article on top uses of a worker system.