Blog

Thursday, January 22, 2015

The Ephemeral Life of Dockerized Microservices



When using the word 'ephemeral', it's hard not to think of Snapchat these days. However, the concept also applies to the on-demand computing pattern we promote here at Iron.io with our task processing service, IronWorker. Each Docker container in which a task runs is alive only for the duration of the process itself, providing a highly effective environment for powering applications that follow the microservices architectural style.

Long Live the Container

As Docker continues to spread through the industry by promising a standardized, encapsulated runtime across any environment, an entire ecosystem has emerged around containers from their orchestration to their hosting. We were early adopters with our initial use case, and continue to further leverage the technology through multi-cloud deployments and integrations.

While deploying distributed applications within a Dockerized framework is on the fast track to becoming the model of the future, approaching it with a production-ready mindset raises a number of concerns around security, discovery, and failure. Without digging too deeply into those topics, let's look at where Docker makes sense today, and why we've been so successful with it as a core component of our platform.

People have been surprised by our heavy use of Docker in production. However, the nature of IronWorker lends itself well to the current state of Docker, with less cause to worry about the drawbacks. That's certainly not to say we haven't had our own set of challenges, but we treat each task container as an ephemeral computing resource. Persistence, redundancy, availability – all the things we care so much about when building out our products at the service level – do not necessarily apply at the individual task container level. Our concern in that regard is essentially limited to ensuring that each task runs when it’s supposed to, which allows us to be confident in our heavy use of Docker today.

To give a peek under the hood of IronWorker, we have a number of base Docker images stored in block storage (EBS) that provide language/library environments for running code (15 stacks and counting). Users write and package their code with only the dependent libraries for the task and then upload it to our file storage (S3). The IronWorker API allows users to run any task at a set concurrency level, either on demand or on a schedule. Tasks are placed in an internal queue (IronMQ) and then pulled by one of our many task execution servers.

These task execution servers, or "runners" as we like to call them, merge the selected base Docker image with the user's code package in a fresh container, run the process, and then destroy the container. Rinse and repeat at massive scale. This streamlined process is very clean and fast, and we are continually working hard to tighten up even further by optimizing the task queue and improving the container startup time.
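To make the runner lifecycle concrete, here is a minimal Python sketch of the create, run, destroy cycle. It is not IronWorker's actual runner code: a temporary directory stands in for the fresh container, plain dictionaries of file contents stand in for the base Docker image and the user's code package, and names such as `run_ephemeral_task`, `helper.py`, and `task.py` are hypothetical.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_ephemeral_task(base_image, code_package, payload):
    """Run one task in a fresh, throwaway workspace.

    base_image and code_package are hypothetical stand-ins: dicts that map
    file names to file contents. A temporary directory plays the role of
    the container.
    """
    workdir = Path(tempfile.mkdtemp(prefix="task-"))  # fresh "container"
    try:
        # Merge the base environment with the user's code package.
        # On a name clash, the user's file wins.
        for name, content in {**base_image, **code_package}.items():
            (workdir / name).write_text(content)
        # Run the task process and wait for it to finish.
        result = subprocess.run(
            [sys.executable, "task.py", payload],
            cwd=workdir,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip(), workdir
    finally:
        # Destroy the workspace whether the task succeeded or failed.
        shutil.rmtree(workdir, ignore_errors=True)


base_image = {"helper.py": "GREETING = 'hello'\n"}
code_package = {
    "task.py": "import sys\nimport helper\nprint(helper.GREETING, sys.argv[1])\n"
}
output, workdir = run_ephemeral_task(base_image, code_package, "world")
print(output)            # the task's stdout
print(workdir.exists())  # False: nothing outlives the task
```

The point of the `finally` block is that cleanup does not depend on the task succeeding, which is what lets each task be treated as disposable.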

Dockerized Microservices

Wikipedia defines microservices as "a software architecture design pattern, in which complex applications are composed of small, independent processes communicating with each other using language-agnostic APIs. These services are small, highly decoupled and focus on doing a small task." This is in contrast to the monolithic approach where every component is embodied in a single and often cumbersome application framework.

While decoupling app components is not a new concept, microservices provide a more modern approach. What's often missing from the discussion, though, is the computing environment. Where do these individual processes actually live and run? One of the key benefits of the microservices style is more streamlined orchestration at the individual service level. However, if you’re not extra careful, scaling and orchestrating infrastructure can get expensive and complex as you separate more and more components.

The ephemeral use of Docker described here applies to microservices because the concept is to have independently developed and deployed services that each follow a single responsibility. Whether it be sending emails and notifications, processing images, placing an order, or posting to social media, these processes should run asynchronously outside of the immediate user response loop. This means they really don’t need to be hosted in the traditional sense; they only need to be triggered by an event and run on demand.
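As a rough illustration of keeping work outside the user response loop, the Python sketch below has a request handler record an event on a queue and return immediately, while a background worker picks the task up. It uses an in-process queue and thread purely for demonstration; the names `place_order`, `background_worker`, and `send_confirmation_email` are hypothetical and not part of any Iron.io API.

```python
import queue
import threading

task_queue = queue.Queue()
completed = []


def place_order(order_id):
    """User-facing handler: record the event and return right away."""
    task_queue.put(("send_confirmation_email", order_id))
    return {"status": "accepted", "order_id": order_id}


def background_worker():
    """Pull tasks until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        try:
            if task is None:
                return
            name, order_id = task
            # A real worker would do the slow work here (send the email).
            completed.append(f"{name} finished for order {order_id}")
        finally:
            task_queue.task_done()


worker = threading.Thread(target=background_worker, daemon=True)
worker.start()

response = place_order(42)  # returns without waiting for the email work
task_queue.join()           # wait here only so the demo can inspect results
task_queue.put(None)        # tell the worker to stop
worker.join()
```

In a hosted setup the queue and the worker would live outside the web application, so the handler's only job is to fire the event.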

This is where IronWorker comes into play – aside from providing a workload-aware computing environment fit for any task, Iron.io handles all of the operations, provisioning, and processing of your microservices for you in a highly efficient and effective manner. This means that you can keep your focus on developing code, without having to worry about how to deploy, manage, and scale. As microservices evolve into the pattern for building modern cloud applications, having a dynamic platform like IronWorker to handle the bulk of the work will be crucial throughout the entire development lifecycle.

The Next Word

Not every service is a microservice, and there's still the topic of handling requests, state, and inter-service communication. At the end of the day, a microservices application is meant to be a single application, and it must all come together in a unified manner. Stay tuned for the next post where we talk about those smart pipes.

About the Author
Ivan Dwyer is the Director of Channels and Integrations at Iron.io, working with various partners across the entire cloud technology and developer services ecosystem to form strategic alliances around real world business solutions.


To get started with IronWorker for free, sign up for an account today. Our containers may be ephemeral, but our service and support are lasting!

Find this interesting? Discuss on Hacker News

Thursday, January 8, 2015

AWS Lambda vs IronWorker

If imitation is the sincerest form of flattery, then consider us flattered by Amazon. The new AWS Lambda service is nearly the same thing as Iron.io's IronWorker service, solving the same problem with a slightly different API.

I took it on as a project recently to compare and contrast how each service works, how they are similar, and how they differ.

This post presents the results of that comparison.


Example - A Lambda Function vs a Worker in IronWorker


We'll start with the first example from the AWS Lambda documentation, Walkthrough 1: Handling User Application Events.

Here are a few things we need to do first to get things set up for the test:

1. Install Node.js if you don’t have it already.
2. Create a project directory called 'lambda-vs-ironworker'.
3. You must have an AWS account and an Iron.io account set up.
4. Install the AWS CLI tool and the IronWorker CLI tool.

Lambda Hello World

These are the steps to create your Lambda function, upload it, and run it. This is a fast-track version of the walkthrough in the AWS docs.

1. Create a file called 'helloworld.js' and paste this code into it:


2. Zip up the files in the folder (just the helloworld.js for this example). From the cli, this would look like:

3. Create an IAM role for Lambda.

4. Upload the Lambda function:


5. Create a test input file called 'helloworld-input.txt' with:


6. Invoke the function:

7. Check the AWS console to ensure it ran and see the output.


IronWorker Hello World

Now let's try the same thing on IronWorker.

1. IronWorker will use almost the exact same code, with some Lambda-specific parts removed. Create a file called 'helloworld-iron.js' and paste this code into it:


2. Create a file 'helloworld.worker' and paste this into it:

3. Upload the worker:

4. Invoke/run the worker:


5. Check HUD (Iron.io's console) to ensure it ran and see the output. The queue command will return a link on the command line that will take you directly to the task too.

Results

You should see the same output on both systems and as you can see, the operations and code are very similar. All the code used in this example can be found here as well as code for Walkthrough #2.

Invoking Lambda Functions vs Queuing IronWorker Tasks

Invoking or queuing these functions and workers from your code, instead of from the CLI as we've done here, is also very similar.


Comparison Chart

This is a comparison of the main features of the two products.


| Feature | Lambda | IronWorker |
| --- | --- | --- |
| Event-Based Triggers | Yes (from an AWS-supported event only) | Yes (from an AWS-supported event as well as events from any services that supports webhooks) |
| Timeout | max 60 seconds | max 1 hour (long-running workers available) |
| Startup Time | Sub-second | Few seconds |
| Language/runtime | Node.js only | All major languages – PHP, Python, Ruby, Node.js, Java, .Net, Go, Scala, binary executables. |
| Memory size | 64 - 1024 MB, default 128MB | 320 - 2048MB, default 320MB |
| Clouds | AWS | AWS, Rackspace, Microsoft Azure, private clouds, and more |
| Dedicated Clusters (Private) | No | Yes |
| On-Premise | No | Yes (see here) |
| Scheduling | No | Yes (see here) |
| Make simple functions via console | Yes (only useful when using aws-sdk node module only since that’s imported by default) | No |



Conclusion 


As you can see, Lambda is very similar to IronWorker. In fact, it's almost the same service with a different spin on it.

A few important things to note, though: IronWorker has been operating in production for over 4 years, runs on multiple clouds (including your own on-premise cloud), supports all major programming languages, and has more features (see above). And did I mention our customer support rocks?

Lambda, on the other hand, is brand new, only supports Node.js, only runs on AWS, and has customer support that leaves a lot to be desired (unless you pay up). Don't get me wrong: Lambda is solving a big problem and is probably a great service, as are all the AWS services. We just think IronWorker is better. But you can come to your own conclusions on that after trying them out.

There are some areas we'll be working on improving in IronWorker to address some of the noteworthy aspects of Lambda, most notably the process startup time. Stay tuned!





Try IronWorker for Free

To try IronWorker for free, sign up for an account at Iron.io. On-demand processing awaits!

Wednesday, December 24, 2014

Microservices for Good : How Angel Flight West Uses Technology to Help People in Need

With the holiday season upon us, we thought it would be a fitting time to highlight a wonderful non-profit organization and how they use technology for a great purpose.



Angel Flight West : A Mission for Good

Angel Flight West and its Mission


Angel Flight West is a nonprofit charitable organization established in 1983 for the sole purpose of providing flights for those in need. Their network of 1,800+ volunteer pilots fly their own planes and pay for the costs out of their own pockets for these critical journeys.

Angel Flight West coordinates with members to fly organ transplant candidates, chemotherapy patients, clinical trial participants, abuse victims seeking relocation, and disabled or sick children headed to Make-A-Wish programs or summer camp, as well as for many other humanitarian reasons. The beneficiaries of the flight – passengers and their families, healthcare organizations, and others – never pay for anything, ever.

Angel Flight West serves the western part of the US including Alaska and Hawaii and is a member of the Air Charity Network which covers all 50 states. It is also one of the driving technology forces in the network, beginning with an early initiative with web technologies, and extending to present day with its increasing use of the cloud, microservices, and distributed technologies.




The Technology Stack Behind Angel Flight West


Back in 1999, Angel Flight West developed a web-based flight coordination system to manage the matching of passengers who need transportation with the pilots who can provide it. With this technology, Angel Flight West grew over a three-year period from approximately 1,000 missions per year to over 4,000. As they’ve grown, web technologies have enabled Angel Flight West to use a small staff to coordinate flights and keep organizational costs at a minimum. To date, this small team, along with the assistance of host and flight-day volunteers, has flown over 60,000 flights for people in need.
A Screen from FlightPlan.com

Their system runs on a LAMP stack with PHP as the primary language and Symfony as the application framework. They use MySQL as their database and IronWorker for the increasing amount of asynchronous processing taking place in the background. They also use several aviation-specific websites including FlightAware for live flight tracking and FlightPlan.com for flight planning.

  • Language - PHP
  • Framework - Symfony
  • Database - MySQL
  • Async Processing - IronWorker
  • Other
    • Twilio (early stages) 
    • Google Maps 
    • FlightAware
    • FlightPlan.com


Releasing as Open Source to the Air Charity Network 

Angel Flight West regularly shares the software with other organizations in the Air Charity Network. They recently released the software as an open source application and set of services.

As part of this release, they have established a user community to fund enhancements, maintenance, and outreach. You can find more information at www.vpoids.org. With this move, they look to increase access to the software as well as accelerate development.





Using Microservices to Expand Their System Capabilities


The Angel Flight West system makes use of an application layer built in PHP and Symfony for the user-event response loop for website events as well as for access and storage of backend data which includes membership data, passengers, requesters, flight information, and other essential application data. The system also makes use of workers running within IronWorker to run processes asynchronously in the background.

This dual structure allows them to better segment real-time responses to user events from the processing of events and workloads in the background. An example of this latter type of processing is generating itinerary-form PDFs that get emailed to the pilots and passengers for each flight. This process is scheduled and generated using workers running within IronWorker. They also use asynchronous processing to download geolocation data from the Google Mapping APIs to provide distance information and driving directions for ground volunteers.

This type of distributed processing pattern – isolating task-specific actions within workers and then calling them directly or through webhooks – allows them to write and scale these background processes separately from their application. This approach was born out of the need to keep adding functionality to their system, but to do so in a simpler and more isolated structure. One future project they have outlined using this microservices approach is a capability that matches available flights with volunteer pilots and generates targeted alerts. They’re also looking at integrating Twilio into their stack for sending text alerts and updates for travel days, in which case they would look to expose these actions as thin microservices.

Definition of a Microservice
The general definition of a microservice is that of a small, RESTful HTTP API that provides a simple interface for a single processing event. Generating and sending a form is but one example. Other examples include anything that might take a workload and generate a result. A microservice is a logical progression of an embedded function in a program or an action or method call in an object-oriented framework. The difference is that the interface is via an HTTP request and that the processing is handled independently by the microservice, ideally in a distributed, asynchronous manner.
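A small Python sketch of that shape, using only the standard library, might look like the following. It is an illustration rather than Angel Flight West's code (their workers are written in PHP): the `handle_webhook` function, the `jobs` queue, the `flight_id` field, and port 8080 are all assumptions made for the example. The HTTP layer only accepts the request and queues it; the actual processing would be done later by a separate worker.

```python
import json
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer

jobs = queue.Queue()  # stand-in for a real task queue


def handle_webhook(raw_body):
    """Validate the request body, queue the job, and return (status, reply)."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        return 400, {"error": "body must be valid JSON"}
    jobs.put(payload)
    # 202 Accepted: the work is queued, not yet done.
    return 202, {"status": "queued"}


class WebhookHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle_webhook."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, reply = handle_webhook(self.rfile.read(length))
        body = json.dumps(reply).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the endpoint; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()
```

Calling `handle_webhook(b'{"flight_id": 7}')` queues the payload and returns a 202 status, which mirrors the "accept now, process in the background" behavior described above.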

The beauty of building small microservices instead of continually embedding functionality into a monolithic app is that Angel Flight West can have engineers donate time to build very specific functions.

This development approach lets them better compartmentalize their engineering work – allowing volunteer developers to take projects from idea to completion independently while reducing any overhead integrating the capabilities into a major new app release. One or more developers can write a worker to perform a task, wrap it with a webhook interface and produce an output in whatever language might be appropriate to the task. Angel Flight West uses PHP but it could be any language given their worker platform supports every major language.

This structure increases development speed by letting them use a small team with interchangeable parts to innovate at a faster pace. It also reduces risk because changes in a microservice do not extend into the other parts of the system. And it scales seamlessly because the services use a framework that gives them the concurrency and workload processing they need without having to provision it directly.



A Cause for Good


From its days of implementing early web technologies to its sharing of its solution and its release as open source to its growing reliance on microservices to power new capabilities, Angel Flight West and its small team of technologists have shown they are innovators when it comes to delivering tools to aid in its mission.

But the best technology and technical stack in the world doesn’t do much good unless it solves a problem or is in service of a cause.

In the case of Angel Flight West and the thousands of flights they provide to those in need, that cause is unquestionably a good one.


Godspeed to Angel Flight West, its aviators, staff, and volunteers on this December morn. 






What Stephan Fopeano, Chief Technologist for Angel Flight West, says about their evolving technical stack and how Iron.io helps power their asynchronous processing.


What does the combination of PHP and Symfony give to you?
Mason Flying Above Arizona
Stephan Fopeano: PHP is of course a very common language, so there are tons of resources available for developers that speed work and promote best practices. Being open source is very important for us as a non-profit, for cost reasons and also to make sure our application will be maintainable into the future.

Symfony is a great framework which speeds development time. It provides a set of reusable PHP components designed for web projects like ours. It doesn’t hurt that it’s a solid standard and trusted by some of the best PHP applications around.



How big is the main application and how have you distributed the workloads? 
The application itself is reasonably complex, since it manages flight coordination, membership, a customer database of passengers and requesters, as well as metrics and other miscellaneous functions. 

Where distributed workload processing is key is in functions that are triggered by user actions. Iron.io lets us get all the work done based on these triggers but doing so in the background without making the user wait.


Take us through the development of a worker / microservice? 
Matthew Checking if the Pattern is Full
For our first workers, we hired a consultant on Elance. He helped us figure out the basics of the architecture and the environment, and he helped us establish some best practices. Based on this work, we’re off and running. The basic structure will be the same for each new capability – one or more task-specific workers running on a worker queue operated through a webhook.


How does a microservices platform help you?
Non profits are always looking to economize and are short on resources for infrastructure. Developing microservices and creating workers gives Angel Flight West the ability to use resources as needed, and more importantly, to try new ideas without upfront costs.

One way we do this is to use the distributed processing capabilities inherent in the Iron.io platform to see if an idea is going to work, figure out the metrics, and then raise funds to support expansion and growth if and when it proves successful. Having the ability to scale a new feature without us having to manage any infrastructure is key to increasing our speed and pace of development.


Priscilla w/Pilot Bryan Painter
What do you think the impact of microservices and distributed cloud development is for large organizations?
While each organization in the Air Charity Network is pretty small, a good deal of complexity emerges from the fact that 10+ organizations use instances of the same application, each with some slightly different needs and expectations for what the system should do.

Using a microservices pattern helps us build in new functionality that’s tailored to specific needs of individual organizations, in a flexible, agile manner without compromising current capabilities or putting an unnecessary burden or block on future development.


What other functions does technology serve for the organization?
In addition to helping us run the operations, Angel Flight West has become increasingly dependent on marketing and outreach. We estimate that we are only serving about 10% of the people who could use our services.

Kenasyn and Mom w/Pilot Jamie Griffin
We’re making good use of digital marketing channels and services but we have big plans to get more of the word out, especially where it counts – to those whose lives are in turmoil and where financial, physical, and emotional difficulties can make ordinary trips extraordinarily difficult.


Any recommendations for organizations and companies large and small who want to get things done fast using small nimble teams?
Development of an application of any merit is a team effort that’s largely based on passion for a cause. In our case, we have our eyes on the prize of using aviation skills and resources to benefit those in need. It’s our cause and our mission and that singular focus helps eliminate ego and creates a great environment of cooperation, which is also critical for working effectively together. I don’t know if it maps to all organizations but it certainly works here.


How can readers find out more about Angel Flight West  and how can developers get involved?
You can find out more about Angel Flight West on our website, as well as our recent stats in the enclosed table. If you’re interested in a deeper view, you can see our transparency report at GuideStar. If you’re a developer and want to get involved – even building out a small component – please connect with us via the website.

We’re always looking out for donations of time and money. But we also look for help in spreading the word, whether that’s via a private mention to a friend or a tweet or a Facebook post. We love it when the efforts of our pilots and volunteers receive public mention and praise.

We’re just as happy though when people simply read about what our team members do and the many silent thank you's, smiles, and prayers that result.




About Stephan Fopeano

Stephan Fopeano
Stephan Fopeano is no longer a current pilot himself but has found a way to be a part of and give back to the aviation community by bringing his technical skillset to evolve Angel Flight West's technology. He has been working with Angel Flight West since 1989, bringing 20+ years of software development and data analysis experience.

Stephan is founder and owner of Meliorist Technology, a product development and product strategy consultancy for big data applications, predictive analytics, and dashboard development. Clients include Angel Flight West plus ten other charitable aviation groups across the country, Disney/ABC, Sony Pictures Television Networks, CNN, PBS, Scholastic Media, as well as many small non-profits and start-ups. You can reach him at webmaster (at) angelflightwest.org.


About Angel Flight West

For those whose lives are in turmoil, the financial, physical, and emotional burdens can make ordinary trips extraordinarily difficult. That’s where Angel Flight West comes in. Their network of 1,800+ volunteer pilots fly their own planes and pay for all costs out of their own pockets, in order to make these critical journeys.

Their pilots are engineers, scientists, and teachers. Doctors, lawyers, and corporate executives. Retired commercial pilots and young entrepreneurs. But as different as these men and women might be, they all have two things in common: The love of flying, and the desire to touch people’s lives.







About Iron.io

To try the Iron.io platform for free, sign up for an account at Iron.io. Included with the free plan is a 30-day trial of advanced features so that you can see how running code in the cloud at scale can change the way you think about application development.



Friday, December 19, 2014

Iron.io Launches IronWorker within the Azure Marketplace


Iron.io is pleased to announce it is offering its IronWorker platform as an Application Service within the Azure Marketplace, providing a key infrastructure component that gives developers immediate access to highly scalable event-driven computing services.

Every application in the cloud needs to process workloads on a continuous basis, at scale, on a schedule, or in the background. IronWorker is a modern application development platform for processing at a task level by isolating code packages and dependencies in a containerized compute environment managed by Iron.io.


IronWorker on Microsoft Azure

Developers can use IronWorker to develop scalable distributed cloud architectures from the start, turn custom client-server applications into cloud-based microservices, serve as a mobile compute cloud, or incorporate highly concurrent workload processing into their applications without the need for complex orchestration, overhead, and maintenance.

By collaborating with Iron.io, Microsoft gives its customers flexibility in their application architectures and cloud migrations. By making use of IronWorker’s multi-language support, enterprise organizations can move individual components to the cloud while maintaining safe and secure application environments. IronWorker can also act as a key processing gateway to Azure component services including storage, queues, mobile services, and more, making it easy to create hybrid solutions of existing client-server applications and cloud-based microservices.


Through our expansive ecosystem, we are providing customers with the solutions they need to deploy their critical applications seamlessly in the cloud and create hybrid connections with on-premises. Building on our great Docker support in Azure, we are excited IronWorker is now available in the Azure Marketplace, to simplify deployment of the fantastic event-driven and processing technology created by Iron.io.
– Corey Sanders, Director of Program Management, Microsoft Azure 


Building Modern Cloud Solutions

IronWorker provides developers a friendly environment for handling a variety of event-driven asynchronous processing use cases including streamlined ETL pipelines, processing large files and big data sets, sending email and notifications in bulk, and more. By extending these capabilities to Azure, Microsoft customers can move much of their scale-out tasks to the cloud in an efficient and easy to use manner.

Iron.io leverages Docker as its core task processing environment, and has launched over 500 million containers since moving its capabilities into production earlier in the year. The IronWorker platform currently offers over 15 different Docker-based environments for specific language versions and essential libraries with additional capabilities coming soon.


Microsoft has built a world-class computing platform in Azure. Iron.io brings to Azure a proven technology that provides an immediate way for Microsoft customers to migrate applications and services into the cloud as well as provide revolutionary component technology that lets them extend their use of the cloud even further.
– Chad Arimura, CEO, Iron.io





Getting Started With IronWorker on Azure

Users of Azure are today able to add IronWorker as a service by visiting the Azure Marketplace. We have written instructions for adding the IronWorker service in our Documentation. Developers can then write and package task code for deployment to IronWorker’s processing environment within Azure. The Iron.io dashboard built into Azure provides detailed insight into the state of your tasks for monitoring your complete application activity and performance.

IronWorker is currently available in the West US region of Azure, and supports multiple languages including Go, Java, Ruby, PHP, Python, Node.js, and .NET.



To try IronWorker for free, sign up for an account at Iron.io. We’ll even give you a trial of some of the advanced features so that you can see how processing at scale will change the way you view modern application development.


Wednesday, November 26, 2014

Reed Allman Speaking at the RocksDB Meetup Dec 4th

Reed Allman, a systems-level engineer at Iron.io, will be talking at the RocksDB meetup on Thursday, December 4th, 2014. The meeting will be at the Facebook headquarters in Menlo Park, CA.

RocksDB is an embeddable open-source key/value database that is a fork of LevelDB. It is designed to be scalable to run on servers with many CPU cores, to efficiently use fast storage, to support IO-bound, in-memory and write-once workloads, and to be flexible to allow for innovation.

For more background on RocksDB, see Dhruba Borthakur’s talk from the Data@Scale 2013 conference as well as this story on the history of RocksDB.


Here's the description of Reed's talk:
Building queues that are Rocks solid (and other marginal puns)
Iron.io started out experimenting with LevelDB and we ended up using RocksDB. We'll walk through a naive queue implementation – one that would have minimal use in practice, but seeing as we're programmers we're into that kind of thing. We'll take a queue on LevelDB and punish it to see how it performs. We'll then take it to RocksDB and run the same performance tests, then compare results and show why RocksDB makes great sense as a persistence layer.

About the Speaker
Reed Allman is a system-level engineer for Iron.io working in Go to solve hard problems within high-scale fault-tolerant distributed systems. Prior to Iron.io, he worked on a research project with Google to build refactoring tools for the Go language. By his estimation, he's read the language spec more times than is healthy and has gained a somewhat irrational view of programming in anything that doesn’t have channels.
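For readers wondering what a naive queue on top of an ordered key/value store could look like, here is one possible Python sketch. It is not the implementation from the talk, and it does not use LevelDB or RocksDB: `OrderedKV` is a hypothetical in-memory stand-in that only mimics the property such stores provide, namely iteration in sorted key order. The queue assigns each message a fixed-width, big-endian counter as its key, so the smallest key is always the oldest message.

```python
import bisect


class OrderedKV:
    """Hypothetical in-memory stand-in for an ordered key/value store."""

    def __init__(self):
        self._keys = []   # kept sorted
        self._data = {}

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def first(self):
        """Return the (key, value) pair with the smallest key, or None."""
        if not self._keys:
            return None
        key = self._keys[0]
        return key, self._data[key]


class NaiveQueue:
    """FIFO queue: push writes at the tail key, pop deletes the head key."""

    def __init__(self, store):
        self.store = store
        self.next_id = 0

    def push(self, body):
        # 8-byte big-endian keys sort in the same order as the integers.
        key = self.next_id.to_bytes(8, "big")
        self.store.put(key, body)
        self.next_id += 1

    def pop(self):
        entry = self.store.first()
        if entry is None:
            return None
        key, body = entry
        self.store.delete(key)
        return body


q = NaiveQueue(OrderedKV())
for message in (b"a", b"b", b"c"):
    q.push(message)
print(q.pop(), q.pop(), q.pop(), q.pop())
```

This version ignores everything that makes a production queue hard (persistence across restarts, concurrent consumers, reservations and timeouts), which is roughly what "naive" means here.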


Here's the full agenda for the evening.



For information on RocksDB at Iron.io, here's a post on IronMQ v3 and the on-premise version built for enterprise and carrier private cloud deployments. If you need high-availability message queuing in public or private clouds, feel free to reach out to us.

Wednesday, November 19, 2014

Iron.io Adds Named Schedules

Iron.io is pleased to announce named schedules as a feature in its IronWorker service. Giving names or labels to schedules may seem a small feature but it’s been a common request from a number of users managing large workloads.

Users can now give scheduled tasks labels when uploading the tasks to IronWorker or add them later via the Dashboard. The labels will appear in the Dashboard alongside the schedule to make it easier to keep track of what’s happening in the background of an application.


Making Use of Named Schedules

Named schedules are available for all plans – including the Lite/Free plan. To make use of named schedules, all users need to do is include a name or label when uploading a scheduled task to IronWorker. 

Simply use the '--label' param along with the name of the schedule:

$ iron_worker schedule import_worker --label "Critical Task" --start-at "2015-01-01T00:00:00-00:00" --run-every 3600
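To see what the timing flags in that command amount to, the short Python sketch below lists the first few run times. It assumes that '--run-every' is an interval in seconds and that the first run happens at the '--start-at' time; the `first_runs` helper is hypothetical and is not part of the IronWorker CLI.

```python
from datetime import datetime, timedelta


def first_runs(start_at, run_every, count):
    """Return the first `count` run times for a repeating schedule.

    Assumes run_every is in seconds and the first run is at start_at.
    """
    start = datetime.fromisoformat(start_at)
    return [start + timedelta(seconds=run_every * i) for i in range(count)]


for run in first_runs("2015-01-01T00:00:00-00:00", 3600, 3):
    print(run.isoformat())
```

Under those assumptions, the schedule labeled "Critical Task" would run at midnight UTC on January 1, 2015 and then once an hour after that.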

You can also add or amend a label within the Dashboard using the 'Label' field.

Scheduled Tasks now contain a "Label" field


The labels will then appear in the list of Scheduled Tasks.

Named Schedules

Getting Started 

To try IronWorker for free, sign up for an account at Iron.io.

We’ll provide a trial of some of the advanced features so that you can see how running code in the cloud at scale will change the way you think about application development.

On-demand processing awaits.


Tuesday, October 21, 2014

Docker in Production — What We’ve Learned Launching Over 300 Million Containers

Docker in Production at Iron.io
Earlier this year, we made a decision to run every task on IronWorker inside its own Docker container. Since then, we've run over 300,000,000 programs inside of their own private Docker containers on cloud infrastructure.

Now that we’ve been in production for several months, we wanted to take the opportunity to share with the community some of the challenges we faced in running a Docker-based infrastructure, how we overcame them, and why it was worth it.

IronWorker is a task queue service that lets developers schedule and process jobs at scale without having to set up or manage any infrastructure. When we launched the service 3+ years ago, we were using a single LXC container to hold all the languages and code packages that workers run in. Docker lets us easily upgrade and manage a set of containers, which in turn lets us offer our customers a much greater range of language environments and installed code packages.

We first started working with Docker v0.7.4, so there have been some glitches along the way (containers not shutting down properly was a big one, but it has since been fixed). We’ve successively worked through almost all of them, though, and are finding that Docker is not only meeting our needs but also surpassing our expectations. So much so that we’ve been increasing our use of Docker across our infrastructure. Given our experience to date, it just makes sense.


The Good

Here is a list of just a few of the benefits we’ve seen:

Large Numbers at Work

Easy to Update and Maintain Images

Docker’s Git-like approach is extremely powerful and makes it simple to manage a large variety of constantly evolving environments, and its image layering system gives us much finer granularity around the images while saving disk space. Now we’re able to keep pace with rapidly updating languages, and we’re able to offer specialty images like a new ffmpeg stack designed specifically for media processing. We’re up to 15 different stacks now and are expanding quickly.

Resource Allocation and Analysis


LXC is an operating system–level virtualization method that lets containers share the same kernel while constraining each container to a defined amount of resources such as CPU, memory, and I/O. Docker provides these capabilities and more, including a REST API, version control for environments, pushing and pulling of images, and easier access to metric data. Docker also supports a more secure way to isolate data files by using a copy-on-write (CoW) filesystem. This means that all changes made to files within a task are stored separately and can be cleaned out with one command. LXC is not able to track such changes.
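As a rough illustration of per-container resource limits, the sketch below builds a `docker run` command line with a memory cap and a CPU weight. The `--rm`, `--memory`, and `--cpu-shares` flags are real Docker CLI flags; the image name, the task command, and the limit values are placeholders chosen for the example, not the settings IronWorker uses.

```python
def docker_run_args(image, command, memory="320m", cpu_shares=512):
    """Build a `docker run` argument list with per-container resource limits."""
    return [
        "docker", "run",
        "--rm",                            # remove the container and its writable layer on exit
        "--memory", memory,                # hard memory cap for the container
        "--cpu-shares", str(cpu_shares),   # relative CPU weight, not a hard cap
        image,
    ] + list(command)


# Placeholder image and command, for illustration only.
args = docker_run_args("example/ruby-stack", ["ruby", "task.rb"], memory="256m")
print(" ".join(args))
```

Because each container writes to its own copy-on-write layer, `--rm` (or an explicit `docker rm` afterwards) discards everything the task changed without touching the underlying image.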

Easy Integration With Dockerfiles

We have teams located around the world. Being able to post a simple Dockerfile and rest easy, knowing that somebody else will be able to build the exact same image as you did when they wake up, is a huge win for each of us and our sleep schedules. Having clean images also makes it much faster to deploy and test. Our iteration cycles are much faster and everybody on the team is much happier.
Custom Environments Powered by Docker

A Growing Community

Docker is getting updates at an extremely fast rate (faster than Chrome even). Better yet, the amount of community involvement in adding new features and eliminating bugs is exploding. Whether it’s supporting images, supporting Docker itself, or even adding tooling around Docker, there is a wealth of smart people working on these problems so that we don’t have to. We’ve found the Docker community to be extremely positive and helpful and we’re happy to be a part of it.

Docker + CoreOS

We’re still tinkering here but the combination of Docker and CoreOS looks like it will have a solid future within our stack. Docker provides stable image management and containerization. CoreOS provides a stripped-down cloud OS and machine-level distributed orchestration and virtual state management. This combination translates into a more logical separation of concerns and a more streamlined infrastructure stack than presently available.


The Challenges

Every server-side technology takes fine-tuning and customization, especially when running at scale, and Docker is no exception. (To give you some perspective, we run just under 50 million tasks and 500,000 compute hours a month and are rapidly updating the images we make available.)

Here are a few challenges we’ve come across in using Docker at heavy volume:

Docker Errors – Limited and Recoverable

Limited Backwards Compatibility

The quick pace of innovation in the space is certainly a benefit but it does have its downsides. One of these has been limited backwards compatibility. In most cases, what we run into are primarily changes in command line syntax and API methods and so it's not as critical an issue from a production standpoint.

In other cases, though, it has affected operational performance. By way of example, in the event of any Docker errors after launching containers, we'll parse STDERR and respond based on the type of error (by retrying a task, for example). Unfortunately the output format for the errors has changed on occasion from version to version and so we've ended up having to debug on the fly as a result.
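The approach of parsing STDERR and choosing a response can be sketched as follows. This is a minimal illustration, and the patterns are hypothetical examples: real Docker messages vary from version to version, which is exactly the fragility described above.

```python
import re

# Hypothetical patterns, for illustration only; real messages differ between Docker versions.
RETRYABLE_PATTERNS = [
    re.compile(r"cannot connect to the docker daemon", re.IGNORECASE),
    re.compile(r"device or resource busy", re.IGNORECASE),
]
FATAL_PATTERNS = [
    re.compile(r"no such image", re.IGNORECASE),
]


def classify_stderr(stderr):
    """Return 'retry', 'fail', or 'unknown' for a block of Docker stderr output."""
    for pattern in RETRYABLE_PATTERNS:
        if pattern.search(stderr):
            return "retry"
    for pattern in FATAL_PATTERNS:
        if pattern.search(stderr):
            return "fail"
    # Anything unrecognized is reported separately so that format changes get noticed.
    return "unknown"
```

Keeping an explicit "unknown" bucket is one way to spot a changed error format after an upgrade: a sudden rise in unknown classifications suggests the patterns need updating.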

Issues here are relatively easy to get through but it does mean every update needs to be validated several times over and you’re still left open until you get it released into the land of large numbers. We should note that we started months back with v0.7.4 and recently updated our system to use v1.2.0 and so we have seen great progress in this area.

Limited Tools and Libraries

While Docker had a production-stable release 4 months ago, a lot of the tooling built around it is still unstable. Adopting most of the tools in the Docker ecosystem means adopting a fair amount of overhead as well. Somebody on your team is going to have to stay up to date and tinker with things fairly often in order to address new features and bug fixes. That said, we’re excited about some of the tools being built around Docker and can’t wait to see what wins out in a few of the battles (looking at you, orchestration). Of particular interest to us are etcd, fleet, and Kubernetes.


Triumphing Over Difficulty

To go in a bit more depth on our experiences, here are some of the issues we ran into and how we resolved them.

An Excerpt from a Debugging Session
This list comes mostly from Roman Kononov, our lead developer of IronWorker and Director of Engineering Operations, and Sam Ward, who has also been instrumental in debugging and rationalizing our Docker operations.

We should note that when it comes to errors related to Docker or other system issues, we’re able to automatically re-process tasks without any impact to the user (retries are a built-in feature of the platform).
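A bounded retry of this kind might look like the sketch below. It assumes a hypothetical `InfrastructureError` that marks platform-side failures; errors raised by the task's own code are not retried. The names and the attempt limit are illustrative, not taken from IronWorker.

```python
class InfrastructureError(Exception):
    """Raised when the platform, not the user's code, caused the failure (hypothetical)."""


def run_with_retries(run_once, max_attempts=3):
    """Call run_once() until it succeeds, retrying only on InfrastructureError.

    Returns a (result, attempts_used) tuple. Any other exception propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_once(), attempt
        except InfrastructureError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
```

Separating the two error types keeps a genuine bug in user code from being retried pointlessly, while a container that failed to start gets another chance without the user noticing.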

Long Deletion Times

The Fix For Faster Container Delete 
Deleting containers at the onset took way too long and required too many disk I/O operations. This caused significant slowdowns and bottlenecks in our systems. We were having to scale the number of cores available to a much higher number than we should have needed to.

After researching and playing with devicemapper (a Docker filesystem driver), we found an option that did the trick: `--storage-opt dm.blkdiscard=false`. This option tells Docker to skip an expensive disk operation when containers are deleted, which greatly speeds up the container shutdown process. Once the delete script was modified, the problem went away.
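For context, `--storage-opt` is passed to the Docker daemon rather than to individual containers. The sketch below assembles a daemon command line in the style of Docker 1.x, where `docker -d` started the daemon (later releases changed how the daemon is launched). The helper name is made up for the example, and this is not the script referred to above.

```python
def daemon_args(storage_driver="devicemapper", storage_opts=None):
    """Build a Docker 1.x-style daemon command line with storage driver options."""
    args = ["docker", "-d", "--storage-driver", storage_driver]
    # Sort the options so the resulting command line is deterministic.
    for key, value in sorted((storage_opts or {}).items()):
        args += ["--storage-opt", "%s=%s" % (key, value)]
    return args


args = daemon_args(storage_opts={"dm.blkdiscard": "false"})
print(" ".join(args))
```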

Volumes Not Unmounting

Containers wouldn’t stop correctly because Docker was not unmounting volumes reliably. This caused containers to run forever, even after the task completed. The workaround was unmounting volumes and deleting folders explicitly using an elaborate set of custom scripts. Fortunately this was in the early days when we were using Docker v0.7.6. We removed this lengthy scripting once the unmount problem was fixed in Docker v0.9.0.
Breakdown of Stack Usage

Memory Limit Switch

One of the Docker releases suddenly added memory limit options and discarded the LXC options. As a result, some of the worker processes were hitting memory limits, which then caused the entire box to become unresponsive. This caught us off guard because Docker did not fail even though unsupported options were being used. The fix was simple (apply the memory limits within Docker), but the change was still a surprise.



Future Plans

As you can see, we’re pretty heavily invested in Docker and continue to get more invested in it every day. In addition to using it for containment when running user code within IronWorker, we’re in the process of using it for a number of other areas in our technical stack.

These areas include:

IronWorker Backend

In addition to using Docker for task containers, we’re in the process of using it to manage the processing that takes place within each server that manages and runs worker tasks. (The master task on each runner takes jobs from the queue, loads the right Docker environment, runs the job, monitors it, and then tears down the environment after it runs.) The interesting thing here is that we’ll have containerized code managing other containers on the same machines. Putting all of our worker infrastructure within Docker containers also allows us to run it on CoreOS pretty easily.
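The runner cycle described in the parenthetical (take a job, load the environment, run, monitor, tear down) can be sketched as a simple loop. Everything here is a stand-in: `FakeContainer` makes no Docker calls, and the job fields `stack` and `payload` are hypothetical names chosen for the example.

```python
from collections import deque


class FakeContainer:
    """Stand-in for a task container (hypothetical; no Docker calls are made)."""

    def __init__(self, image, payload):
        self.image = image
        self.payload = payload
        self.removed = False

    def run(self):
        return "ran %s on %s" % (self.payload, self.image)

    def remove(self):
        self.removed = True


def runner_loop(queue, create_container):
    """Drain the queue: one fresh container per job, always torn down afterwards."""
    results = []
    while queue:
        job = queue.popleft()                                        # take a job from the queue
        container = create_container(job["stack"], job["payload"])   # load the right environment
        try:
            results.append(container.run())                          # run the job and collect its result
        finally:
            container.remove()                                       # tear down, even if the job failed
    return results
```

The `try`/`finally` is the part that matters for ephemeral containers: teardown happens whether the job succeeds or raises, so no environment outlives its task.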

IronWorker, IronMQ, and IronCache APIs

We’re no different from other ops teams in that nobody really likes doing deployments. And so we’re excited about wrapping all of our services in Docker containers for easy, deterministic environments for deployments. No more configuring servers. All we need are servers that can run Docker containers and, boom, our services are loaded. We should also note that we’re replacing our build servers – the servers that build our product releases for certain environments – with Docker containers. The gain here is greater agility and a simpler, more robust stack. Stay tuned.

Building and Loading Workers

We’re also experimenting with using Docker containers as a way to build and load workers into IronWorker. A big advantage here is that this provides a streamlined way for users to create task-specific workloads and workflows, upload them, and then run them in production at scale. Another win here is that users can test workers locally in the same environment as our production service.

Enterprise On-Prem Distribution

Using Docker as the primary distribution method for our IronMQ Enterprise on-premises version simplifies our side of the distribution and provides a simple and universal way to deploy within almost any cloud environment. Much like the services we run on public clouds, all customers need are servers that can run Docker containers and they can get multi-server cloud services running in a test or production environment with relative ease.


From Production To Beyond

The Evolution of IT
(excerpted from docker.com)
Docker has come a long way in the year and a half since we saw Solomon Hykes launch it and give a demo on the same day at a GoSF meetup. With the release of v1.0, Docker is quite stable and has proven to be truly production ready.

The growth of Docker has also been impressive to see. As you can see from the list above, we’re looking forward to future possibilities but we're also grateful that the backwards view has been as smooth as it’s been.

Now if only we could get this orchestration thing resolved.






The Story Behind Our Use of Docker 
UPDATE: For additional background on our use of Docker, take a look at the earlier post that we wrote called How Docker Helped Us Achieve the (Near) Impossible. In it, we discuss the decisions behind using Docker,  the requirements we had going in, and more details on what it enables us to do.






For more insights on Docker as well as our emerging impressions of CoreOS, you can watch this space or sign up for our newsletter. Also, feel free to email us or ping us on Twitter if you have any questions or want to share insights.


To try IronWorker for free, sign up for an account at Iron.io. We’ll even give you a trial of some of the advanced features so that you can see how processing at scale will change the way you view modern application development. 



About the Authors

Travis Reeder is co-founder and CTO of Iron.io, heading up the architecture and engineering efforts. He is a systems architect and hands-on technologist with 15 years of experience developing high-traffic web applications including 5+ years building elastic services on virtual infrastructures. He is an expert in Go and is a leading speaker, writer, and proponent of the language. He is an organizer of GoSF (1450 members) and author of the following posts:



Roman Kononov is Director of Engineering Operations at Iron.io and has been a key part of integrating Docker into the Iron.io technology stack. He has been with Iron.io since the beginning and has built and contributed to every bit of Iron.io’s infrastructure and operations framework. He lives in Kyrgyzstan and operates Iron.io’s remote operations team.


Additional Contributors – Reed Allman, Sam Ward


About Iron.io

Iron.io is the maker of IronMQ, an industrial-strength message queue, and IronWorker, a high-concurrency task processing/worker service. Every production system uses queues and worker systems to connect systems, power background processing, process transactions, and scale out workloads. Iron.io's products are easy to use and highly available and are essential components for building distributed applications and operating at scale. Learn more at Iron.io.


About Docker

Docker is an open platform for distributed applications for developers and sysadmins. Learn more at docker.com.

Monday, October 20, 2014

CEO Chad Arimura Speaking at Data 360 Conference on Real-time Data

Data 360° Conference (Oct 22-23, 2014)
Chad Arimura, CEO and Co-Founder of Iron.io, will be speaking at the Data 360° Conference in Santa Clara this week.

The conference brings together leading figures in data processing and analysis to discuss trends in big data, cloud infrastructure, real-time data analysis, and distributed computing. Specific emphasis is on these topics in the world of healthcare, retail, finance, and IT services but the principles apply in any industry.

Here's the panel Chad will be speaking on:
Wed, 3:00 PM (Oct 22nd)
Resources for Real-time Results
Big data tools are now widely used because resources like storage, compute, and analytics are largely available. The panel discusses how IT decision makers are considering where to invest to achieve real-time results using proprietary resources.
Speakers
James Collom (Aisloc) - Moderator
Mark Theissen (Cirro Inc.)
Sundeep Sanghavi (DataRPM)
Chad Arimura (Iron.io)
The conference runs Wed/Thurs, October 22-23, 2014 at the Santa Clara Marriott Hotel. Other speakers are from companies that include EMC, Cloudera, Twitter, Google, Cisco, Splunk, GE, AT&T, TIBCO, CSC, Verizon, and more.  If you're at the conference, be sure to come up and say hello.

A Few of the Conference Speakers