Shepherding Containers

Thanks to Yahoo for the base image CC BY 2.0

As one of the earliest users of Docker, I’ve had the pleasure of creating and working with several platforms built on containers. Each platform has evolved in step with the ecosystem around it, and I’ve had the chance to really put Docker’s “batteries included, but removable” philosophy to the test.

Here at Iron.io, we have launched over 1 billion user containers in production, not to mention the containers we launch to keep our services running. The massive volume of containers we launch is enough to place great demands upon any platform that we use.

In our search for the right direction for the evolution of our platform, we’ve explored as many tools as possible. The release of Docker 1.9, combined with production-ready Docker Swarm and Docker Networking, brings a lot of value to those wanting to roll their own platforms.

Monoliths, Megaliths, and Microservices

It’s easy for monoliths to be opinionated. When running a single codebase, on a single platform, in a single language, opinions make it easy to focus on your real-world problem. In my personal projects, when it’s appropriate, I’ll still happily use Ruby on Rails, deployed with Capistrano, in a Docker container, to a small VPS.

Life becomes more interesting with microservices. Whether or not microservices are the right approach, they are an integral part of this generation of computing. Iron.io has enabled companies to successfully eliminate the bias toward a single programming language and a single monolithic codebase. Engineering teams can now productively explore writing software using the right tool for the job rather than the company standard. We provide our users with an elastic compute cloud capable of running any code in any language. I believe there is a chaotic beauty in this for any company using microservices.

In this microservice-oriented world, convention over configuration becomes more difficult at the software layer. I have new problems to solve, like federated authentication and authorization, and can’t simply drop Devise and CanCan into my application, write the DSL, and go. I’m far more likely to write a tiny service in Go or Ruby, and deploy an API that handles one small component of my overall system.

The same holds for infrastructure. Our infrastructure at Iron.io is guaranteed to look very different from anybody else’s. Kubernetes provides one-line installers for multiple infrastructure providers. By default, this is a very opinionated, dare I say megalithic, stack. Leader nodes include the Kubernetes master APIs, an Elasticsearch-Logstash-Kibana stack, and InfluxDB-Grafana with metrics being received by Heapster. Follower nodes send container statistics through cAdvisor, while running a local agent for working with the Kubernetes API.

I really enjoy working with this stack. Kubernetes is an incredible cluster manager and has worked well for our general use case. However, in our environment, we couldn’t simply spin up a conventional Kubernetes stack, migrate everything over, and walk away. We have security requirements for our customers, networking requirements around dedicated clusters, as well as a task-based platform that has its own lifecycle management. In short, we needed to come to a deep understanding of how Kubernetes worked, and found we needed to quickly make major investments to make it work for us.

For Iron.io, this became a large task for a three-person operations team.

Batteries Included, but Removable

The Docker ecosystem has grown, though not without controversy around the project. The company has taken a “batteries included, but removable” approach.

Even with everything in the Docker 1.9 release, it’s not a platform, but rather a set of primitives that can be used independently or in conjunction with one another. Even with the full stack, software still needs to be written to manage all of these pieces in a form specific to each and every use case. I think this is where Docker and friends shine for us.

As of today, we can use the multitude of Docker Volume and Networking plugins without an abstraction making decisions for us. We have begun the creation of our own docker-machine driver specific to our needs, while taking advantage of the community’s efforts there. Not only that, we can automate this process thanks to libmachine’s abstraction from the CLI app.

Beyond that, we avoid the impedance mismatch between a cluster manager and the container engine. Providing the best service we can for our customers requires our tools to be incredibly pluggable, and Docker is more plugin-based than ever. Cluster managers have done a fantastic job of allowing parts to be replaced, but that’s the crux of it: the only option is replacing the whole part. My hope is that this will change in the future, and I look forward to tools like Deis Helm solving this problem.

There are still things we would like to see in Docker as well. Right now, Swarm scheduling strategies aren’t pluggable, and can be somewhat simplistic. I look forward to this being solved.
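To make "pluggable scheduling strategies" concrete, here is a minimal sketch in Python. It is not Swarm's API or implementation. The names Node, register_strategy, and schedule are hypothetical and exist only for illustration, and the "slots" capacity unit is an assumption. The spread and binpack functions are loosely modelled on the built-in Swarm strategies of the same names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical cluster node; 'slots' is an assumed unit of capacity."""
    name: str
    total_slots: int
    used_slots: int = 0

    @property
    def free_slots(self) -> int:
        return self.total_slots - self.used_slots


# A strategy is any callable that picks one node from the candidates.
Strategy = Callable[[List[Node]], Node]


def spread(candidates: List[Node]) -> Node:
    """Prefer the least-loaded node, spreading containers across the cluster."""
    return min(candidates, key=lambda n: n.used_slots)


def binpack(candidates: List[Node]) -> Node:
    """Prefer the most-loaded node that still has room, filling nodes up first."""
    return max(candidates, key=lambda n: n.used_slots)


# Registry of strategies by name; this is the "plug" point of the sketch.
STRATEGIES: Dict[str, Strategy] = {"spread": spread, "binpack": binpack}


def register_strategy(name: str, strategy: Strategy) -> None:
    """Add (or replace) a strategy without touching the scheduler itself."""
    STRATEGIES[name] = strategy


def schedule(nodes: List[Node], strategy_name: str) -> Node:
    """Place one container: filter out full nodes, then ask the strategy."""
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        raise RuntimeError("no node has free capacity")
    chosen = STRATEGIES[strategy_name](candidates)
    chosen.used_slots += 1
    return chosen
```

With this shape, a use-case-specific rule becomes one more registered function. For example, `register_strategy("first", lambda candidates: candidates[0])` adds a trivial strategy that `schedule(nodes, "first")` can then use, with no change to the scheduler.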

For now, Iron.io has decided to take a purely Docker-based approach to its infrastructure. This isn’t to say we won’t add Kubernetes to our infrastructure in the future; it solves the majority of use cases for the majority of our stack. However, as a core abstraction we rely on, I simply see it as another battery we put into the system.