
Tuesday, October 21, 2014

Docker in Production — What We’ve Learned Launching Over 300 Million Containers

Docker in Production at Iron.io
Earlier this year, we made a decision to run every task on IronWorker inside its own Docker container. Since then, we've run over 300,000,000 programs inside of their own private Docker containers on AWS infrastructure.

Now that we’ve been in production for several months, we wanted to take the opportunity to share with the community some of the challenges we faced in running a Docker-based infrastructure, how we overcame them, and why it was worth it.

IronWorker is a task queue service that lets developers schedule and process jobs at scale without having to set up or manage any infrastructure. When we launched the service 3+ years ago, we were using LXC containers to isolate the language environments and code packages that workers run in. Docker lets us easily upgrade and manage our containers, and it lets us offer our customers a much greater range of language environments and installed code packages.

We first started working with Docker v0.7.4, so there have been some glitches along the way (containers not cleaning up or shutting down properly was a big one, but it has since been fixed). We've successfully worked through almost all of them, though, and we're finding that Docker is not only meeting our needs but also surpassing our expectations – so much so that we've been increasing our use of Docker across our infrastructure. Given our experience to date, it just makes sense.


The Good

Here is a list of just a few of the benefits we’ve seen:

Large Numbers at Work

Easy to Update and Maintain Images

Docker's git-like approach is extremely powerful and makes it simple to manage a large variety of constantly evolving environments, and its image layering system gives us much finer granularity around images while saving disk space. We're now able to keep pace with rapidly updating languages, plus we're able to offer specialty images like a new ffmpeg stack designed specifically for media processing. We're up to 15 different stacks now and are expanding quickly.

Resource Allocation and Analysis


LXC-based containers are an operating-system-level virtualization method that lets containers share the same kernel while constraining each container to a defined amount of resources such as CPU, memory, and I/O. Docker provides these capabilities and more, including a REST API, environment version control, pushing/pulling of images, and easier access to metric data. Docker also supports a more secure way to isolate data files using a copy-on-write (CoW) filesystem. This means that all changes made to files within a task are stored separately and can be cleaned out with one command. LXC is not able to track such changes.
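For example, since each task's file changes live in the container's own CoW layer, removing the container cleans out everything the task wrote in one step:

$ docker rm $CONTAINER_ID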

Easy Integration With Dockerfiles

We have teams located around the world. Being able to post a simple Dockerfile and rest easy, knowing that somebody else will be able to build the exact same image as you did when they wake up is a huge win for each of us and our sleep schedules. Having clean images also makes it much faster to deploy and test. Our iteration cycles are much faster and everybody on the team is much happier.
Custom Environments Powered by Docker
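To illustrate the kind of thing we pass around – this is a sketch, not one of our actual stack definitions – a Dockerfile for a simple worker image might look like:

FROM ubuntu:14.04

# install a language runtime for workers to use
RUN apt-get update && apt-get install -y ruby curl

# bake shared worker scripts into the image
ADD ./scripts /opt/worker/scripts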

A Growing Community

Docker is getting updates at an extremely fast rate (faster than Chrome, even). Better yet, the amount of community involvement in adding new features and eliminating bugs is exploding. Whether it's supporting images, supporting Docker itself, or adding tooling around Docker, there is a wealth of smart people working on these problems so that we don't have to. We've found the Docker community to be extremely positive and helpful, and we're happy to be a part of it.

Docker + CoreOS

We're still tinkering here, but the combination of Docker and CoreOS looks like it will have a solid future within our stack. Docker provides stable image management and containerization. CoreOS provides a stripped-down cloud OS along with machine-level distributed orchestration and virtual state management. The combination translates into a more logical separation of concerns and a more streamlined infrastructure stack than is presently available.


The Challenges

Every server-side technology takes fine-tuning and customization, especially when running at scale, and Docker is no exception. (To give you some perspective, we run just under 50 million tasks and 500,000 compute hours a month, and we're rapidly updating the images we make available.)

Here are a few challenges we’ve come across in using Docker at heavy volume:

Docker Errors – Limited and Recoverable

Limited Backwards Compatibility

The quick pace of innovation in the space is certainly a benefit, but it does have its downsides. One of these has been limited backwards compatibility. In most cases, what we run into are changes in command-line syntax and API methods, so it's not a critical issue from a production standpoint.

In other cases, though, it has affected operational performance. For example, in the event of any Docker errors after launching containers, we parse STDERR and respond based on the type of error (by retrying a task, for example). Unfortunately, the output format for the errors changes from version to version, so we end up having to debug on the fly.
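To give a flavor of this, here's a minimal Ruby sketch of the kind of error handling involved. The error patterns and the requeue/flag helpers are hypothetical stand-ins, since the real strings change between Docker versions (which is exactly the problem):

# respond to a failed `docker run` based on what it printed to STDERR
RETRYABLE_PATTERNS = [
  /device or resource busy/i,  # hypothetical: transient mount contention
  /no such container/i         # hypothetical: racy cleanup
]

def handle_docker_error(task, stderr)
  if RETRYABLE_PATTERNS.any? { |pattern| stderr =~ pattern }
    requeue(task)              # retries are built into the platform
  else
    flag_for_review(task, stderr)
  end
end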

Issues here are relatively easy to get through, but it does mean every update needs to be validated several times over, and you're still exposed until the update has been released into the land of large numbers. We should note that we started months back with v0.7.4 and recently updated our system to use v1.2.0, so we have seen great progress in this area.

Limited Tools and Libraries

While Docker had a production-stable release 4 months ago, a lot of the tooling built around it is still unstable. Adopting most of the tools in the Docker ecosystem means adopting a fair amount of overhead as well: somebody on your team is going to have to stay up to date and tinker with things fairly often in order to address new features and bug fixes. That said, we're excited about some of the tools being built around Docker and can't wait to see what wins out in a few of the battles (looking at you, orchestration). Of particular interest to us are etcd, fleet, and Kubernetes.


Triumphing Over Adversity

To go in a bit more depth on our experiences, here are some of the issues we ran into and how we resolved them.

An Excerpt from a Debugging Session
This list comes mostly from Roman Kononov, our lead developer of IronWorker and Director of Engineering Operations, and from Sam Ward, who has also been instrumental in debugging and rationalizing our Docker operations.

We should note that when it comes to errors related to Docker or other system issues, we’re able to automatically re-process tasks without any impact to the user (retries are a built-in feature of the platform).

Long Deletion Times

The Fix for Faster Unmounting
Deleting containers initially took way too long and required too many disk I/O operations. This caused significant slowdowns and bottlenecks in our systems, and we were having to scale the number of available cores much higher than we should have needed to.

After researching and playing with devicemapper (a Docker storage driver), we found an option that did the trick: `--storage-opt dm.blkdiscard=false`. This option tells Docker to skip an expensive disk operation when containers are deleted, which greatly speeds up the container shutdown process. Once the delete script was modified, the problem went away.
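For reference, on the Docker versions we were running, the option is passed when starting the daemon (daemon syntax has changed in later releases):

$ docker -d --storage-opt dm.blkdiscard=false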

Volumes Not Unmounting

Containers wouldn't stop correctly because Docker was not unmounting volumes reliably. This caused containers to run forever, even after the task completed. The workaround was unmounting volumes and deleting folders explicitly using an elaborate set of custom scripts. Fortunately, this was in the early days, when we were using Docker v0.7.6; we removed this lengthy scripting once the unmount problem was fixed in Docker v0.9.0.
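The workaround scripts amounted to something along these lines – a rough sketch only, since the exact paths depend on the storage driver and Docker version:

$ umount /var/lib/docker/devicemapper/mnt/$CONTAINER_ID  # force the stuck volume unmount
$ rm -rf /var/lib/docker/containers/$CONTAINER_ID        # then remove the container's folders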
Breakdown of Stack Usage

Memory Limit Switch

One of the Docker releases suddenly added native memory limit options and discarded the LXC options. As a result, some of the worker processes were hitting memory limits, which then caused the entire box to become unresponsive. This caught us off guard because Docker was not failing even with unsupported options being used. The fix itself was simple – apply the memory limits within Docker – but the change took us by surprise.
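In concrete terms (the values here are illustrative), the fix was moving from the old LXC passthrough to Docker's native flag:

$ docker run --lxc-conf="lxc.cgroup.memory.limit_in_bytes=536870912" ...  # old way; silently ignored after the switch
$ docker run -m 512m ...                                                  # new way; enforced by Docker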



Future Plans

As you can see, we're pretty heavily invested in Docker and continue to get more invested in it every day. In addition to using it to contain user code running within IronWorker, we're in the process of applying it to a number of other areas in our technical stack.

These include:

IronWorker Backend

In addition to using Docker for task containers, we're in the process of using it to manage the processes that run within each server that manages and runs worker tasks. (The master task on each runner takes jobs from the queue, loads the right Docker environment, runs the job, monitors it, and then tears down the environment after the job completes.) The interesting thing here is that we'll have containerized code managing other containers on the same machines. Putting our entire worker infrastructure environment within Docker containers also allows us to run it on CoreOS pretty easily.

IronWorker, IronMQ, and IronCache APIs

We're no different from other ops teams in that nobody really likes doing deployments. And so we're excited about wrapping all of our services in Docker containers for easy, deterministic deployment environments. No more configuring servers: all we need are servers that can run Docker containers and, boom, our services are loaded. We should also note that we're replacing our build servers – the servers that build our product releases for certain environments – with Docker containers. The gain here is greater agility and a simpler, more robust stack. Stay tuned.

Building and Loading Workers

We’re also experimenting with using Docker containers as a way to build and load workers into IronWorker. A big advantage here is that this provides a streamlined way for users to create task-specific workloads and workflows, upload them, and then run them in production at scale. Another win here is that users can test workers locally in the same environment as our production service.

Enterprise On-Prem Distribution

Using Docker as the primary distribution method for our IronMQ Enterprise on-premises version simplifies our side of the distribution and provides a simple, universal way to deploy within almost any cloud environment. Much like the services we run on public clouds, all customers need are servers that can run Docker containers, and they can get multi-server cloud services running in a test or production environment with relative ease.


From Production To Beyond

The Evolution of IT
(excerpted from docker.com)
Docker has come a long way in the year and a half since we saw Solomon Hykes launch it and give a demo the same day at a GoSF meetup. With the release of v1.0, Docker is quite stable and has proven to be truly production-ready.

The growth of Docker has also been impressive to see. As you can tell from the list above, we're looking forward to future possibilities, but we're also grateful that the road so far has been as smooth as it has.

Keep an eye out for more insights on Docker and news from Iron.io – including our impressions of CoreOS. You can watch this space or sign up for our newsletter. Also, feel free to email us or ping us on Twitter if you have any questions or just want to share your insights. We'd love to get your input.

Now if only we could get that orchestration thing resolved.





About the Authors

Travis Reeder is co-founder and CTO of Iron.io, heading up the architecture and engineering efforts. He is a systems architect and hands-on technologist with 15 years of experience developing high-traffic web applications, including 5+ years building elastic services on virtual infrastructures. He is an expert in Go and is a leading speaker, writer, and proponent of the language. He is also an organizer of GoSF (1,450 members).



Roman Kononov is Director of Engineering Operations at Iron.io and has been a key part of integrating Docker into the Iron.io technology stack. He has been with Iron.io since the beginning and has built and contributed to every bit of Iron.io’s infrastructure and operations framework. He lives in Kyrgyzstan and operates Iron.io’s remote operations team.


Additional Contributors – Reed Allman, Sam Ward


About Iron.io

Iron.io is the maker of IronMQ, an industrial-strength message queue, and IronWorker, a high-concurrency task processing/worker service. Every production system uses queues and worker systems to connect systems, power background processing, process transactions, and scale out workloads. Iron.io's products are easy to use and highly available and are essential components for building distributed applications and operating at scale. Learn more at Iron.io.


About Docker

Docker is an open platform for distributed applications for developers and sysadmins. Learn more at docker.com.

Monday, October 20, 2014

CEO Chad Arimura Speaking at Data 360 Conference on Real-time Data

Data 360° Conference (Oct 22-23, 2014)
Chad Arimura, CEO and Co-Founder of Iron.io, will be speaking at the Data 360° Conference in Santa Clara this week.

The conference brings together leading figures in data analysis to discuss trends in big data, cloud infrastructure, real-time data analysis, and distributed computing. Specific emphasis is on these topics in the worlds of healthcare, retail, finance, and IT services, but the principles apply in any industry.

Here's the panel Chad will be speaking on:
Wed, 3:00 PM (Oct 22nd)
Resources for Real-time Results
Big data tools are now widely used because resources like storage, compute, and analytics are broadly available. The panel discusses how IT decision makers are considering where to invest to achieve real-time results using proprietary resources.
Speakers
James Collom (Aisloc) - Moderator
Mark Theissen (Cirro Inc.)
Sundeep Sanghavi (DataRPM)
Chad Arimura (Iron.io)
Chad Arimura
The conference runs Wed/Thurs, October 22-23, 2014 at the Santa Clara Marriott Hotel. Other speakers are from companies that include EMC, Cloudera, Twitter, Google, Cisco, Splunk, GE, AT&T, TIBCO, CSC, Verizon, and more. If you're at the conference, be sure to come up and say hello.

A Few of the Conference Speakers

Tuesday, October 14, 2014

Iron.io Adds Longer Running Workers

Long-Running Workers are Now Available in IronWorker
Iron.io is happy to announce long-running workers are now available within IronWorker. Up until now, workers running on the platform have been limited to 60 minutes in duration.

Users on Production and Enterprise plans or using Dedicated Clusters can now have workers that run for hours at a time. This gives users greater flexibility to handle even more extensive asynchronous workflows.

Worker systems are essential for doing transaction and event processing, background processing, and other types of distributed processing. (GitHub once estimated that over 40% of their processing takes place in the background.)


Short-Running Tasks + Longer Running Tasks

A 60-minute limit fits most use cases for a worker system and provides the right balance between processing power, time-in-queue latency, system flexibility, and responsiveness. Great benefits can result when you distribute work across a set of task-specific workers, as we've shown in several posts, but it's not always feasible to break work up that way and keep task duration under 60 minutes.

Longer-running workers are the answer for situations where workloads can't be broken into discrete units, or where monitoring and scheduling a complicated process might extend across a few hours.

Users on Production plans and Dedicated Clusters can make use of this feature right away. The default limit for longer-running workers is 2 hours, but this can be extended by talking with one of our account teams. (Users on Dedicated Clusters just need to let us know what they need and we can make the change quickly.)

Maximum Duration of Workers in IronWorker

Support for More Advanced Workloads

This new capability joins several other features we've released on the Iron.io platform over the past few months.


These advanced features are in response to a number of conversations with users with heavy workloads and more complicated workflows. A few examples where longer-running workers come into play:

• A large unbroken iteration. Many of our users use IronWorker to process large CSV files, some of which span millions of rows. We generally suggest breaking the files down and parsing them in chunks, but in some cases that can be difficult or even problematic, as when line items need to be processed in order – inventory changes or other transaction processing, for example.
• A large crawling and compilation operation. Using the PhantomJS stack within IronWorker, users can take thousands of images and snapshots of websites and build PDFs, GIFs, and even video files from the images. In most cases, the translation and transcoding period grows linearly with the size of the required output, which means the amount of processing can quickly go above and beyond the standard 1-hour limit.
• A master worker monitoring a process that takes longer than 60 minutes. In general, we recommend using tasks to queue up other work (a master-slave pattern) and scheduled workers to monitor progress, but where users want a persistent worker to run continually, longer-running workers now provide that capability.

To see more use cases for a worker system and get more details on the examples above, check out this article on top uses of IronWorker as well as some of the success stories on our site. Most center around shorter-duration workers, but longer-running workers can slot in to give you that extra processing element you might need.


Making Use of Long-Running Workers

Longer running workers are available to users on Production and Enterprise plans or operating on Dedicated Clusters.

To make use of long-running workers, we first need to enable your account – contact one of our account teams to get provisioned. Once that’s done, all you have to do is include a new timeout value when you queue a task. The maximum limit for long running workers is initially set to 2 hours (7200 seconds) but this can be extended upon request to up to 24 hours for Enterprise accounts and Dedicated Clusters.

Setting a timeout with a curl command (in seconds)
$ curl -H "Content-Type: application/json" -d '{"tasks": [{"code_name": "ExampleWorker", "payload": "", "timeout": 7200}]}' https://worker-aws-us-east-1.iron.io/1/projects/<PROJECT ID>/tasks?oauth=<TOKEN>

Setting a timeout with the CLI tool in IronWorker (in seconds)
$ iron_worker queue <WORKER_NAME> --timeout 7200
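Setting a timeout from Ruby (in seconds) – a sketch using the iron_worker_ng gem; it assumes your token and project ID are already configured (e.g., in iron.json)

require 'iron_worker_ng'

client = IronWorkerNG::Client.new
# queue ExampleWorker with an empty payload and a 2-hour timeout
client.tasks.create('ExampleWorker', {}, timeout: 7200)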



Getting Started 

To try IronWorker for free, sign up for an account at Iron.io. We’ll even give you a trial of some of the advanced features so that you can see how processing at scale will change the way you view modern application development.

You can also connect with one of our account teams to dive into solutions in more depth, or pair program with one of our developer evangelists to get up and running in minutes.

What are you waiting for? Simple, scalable, long-running processing awaits.

Thursday, October 2, 2014

How to Build an ETL Pipeline for ElasticSearch Using Segment and Iron.io

ETL is a common pattern in the big data world for collecting and consolidating data for storage and/or analysis. Here's the basic process:
  • Extract data from a variety of sources
  • Transform data through a set of custom processes
  • Load data to external databases or data warehouses

Segment + Iron.io + Elasticsearch = A Modern ETL Platform

While it may seem unnecessary to follow this many steps as the tools around Hadoop continue to evolve, forming a cohesive pipeline is still the most reliable way to handle the sheer volume of data.

The extract process deals with the different systems and formats, the transform process allows you to break up the work and run tasks in parallel, and the load process ensures delivery is successful. Given the challenges and potential points of failure that could happen in any one of these steps, trying to shorten the effort and consolidate into one process or toolkit can lead to a big data mess.

IronMQ + IronWorker 
This ETL pattern is a common use case with many of our customers, who will first use IronMQ to reliably extract data from a variety of sources, and then use IronWorker to perform custom data transformations in parallel before loading the events to other locations. This combination of IronMQ and IronWorker not only makes the process straightforward to configure and transparent to operate, it also moves the whole pipeline to the background so as not to interfere with any user-facing systems.

Leveraging the scaling power of Iron.io allows you to break up the data and tasks into manageable chunks, cutting down the overall time and resource allocation. Here's one example of how HotelTonight uses Iron.io to populate AWS's Redshift system to give them real-time access to critical information.

In this post, we thought we'd walk through a use case pattern that provides a real world solution for many – creating a customized pipeline from Segment to ElasticSearch using Iron.io.



Segment and Iron.io: An Integration That Does Much More

With the growing number of tools available to developers and marketers alike for monitoring, analytics, testing, and more, Segment is quickly becoming a force in the industry, serving as the unifying gateway for web-based tracking applications. In fact, Segment has become one of our favorite internal tools here at Iron.io thanks to its ability to consolidate the delivery to various APIs with just one script. Whether it's Google Analytics, Salesforce, AdRoll, or Mixpanel just to name a few, Segment eliminates the pain of keeping all of our tracking scripts and tags in order within our website, docs, and customer dashboard for monitoring user activity. How nice.

We're not alone in our appreciation of Segment, and we've included an IronMQ integration of our own that you can read about here. Our integration follows a unique pattern, though, in that it's not just a single-point connection. Instead, connecting IronMQ to Segment as an endpoint creates a reliable data stream that can then be used for a wide range of use cases. The benefits of doing so include:

  • Data buffering – IronMQ provides a systematic buffer in the case that endpoints may not be able to handle the loads that Segment may stream.
  • Data resiliency – IronMQ is persistent with FIFO and one-time delivery guarantees, ensuring that data will never be lost.
  • Data processing – Adding IronWorker to the end of the queue can provide you with scalable real-time event processing capabilities.

A Data Pipeline into ElasticSearch

ElasticSearch is an open source, distributed, real-time search and analytics engine. It provides you with the ability to easily move beyond simple full-text search to performing sophisticated data access, collection, indexing, and filtering operations. ElasticSearch is being used by some of the largest businesses in the world and is growing at a rapid pace. You can read about the many customer use cases with ElasticSearch here.

Segment itself does a great job of collecting data and sending to individual services. Many users, however, will want to perform additional processing on the data before delivering it to a unified database or data warehouse such as ElasticSearch. Some example uses could be for building out customer dashboards or internal analytics tools. With ElasticSearch at the core, you can translate events into actionable data about your customer base. Business intelligence in today's environment is driven by real-time searchable data, and the more you can collect and translate, the better.



ETL Pipeline Instructions : Step-By-Step

The following tutorial assumes that you've installed Segment into your website. From here we'll walk through switching on the IronMQ integration and then running IronWorker to transform the data before loading into ElasticSearch.
Copy your Iron.io Credentials from the Dashboard 

1. Connecting IronMQ to Segment

With Segment and Iron.io, building a reliable ETL pipeline into ElasticSearch is simple. The initial extract process, often the origin of many headaches, is already handled for you by piping the data from Segment to IronMQ.

Flipping on the IronMQ integration within the Segment dashboard automatically sends all of your Segment data to a queue named "segment". All you need to do to initiate the process is create a project within the Iron.io HUD and enter the credentials within Segment.

Enter Your Iron.io Credentials into Segment

2. Transforming the Data

Now that we have our Segment data automatically sending to IronMQ, it's time to transform it prior to loading into ElasticSearch.

Let's say we want to filter out identified users based on their plan type so only paid user data gets sent to ElasticSearch. In the case of building customer dashboards, this allows us to maintain a collection of purely usable data, making our indexing and searching more efficient for the eventual end use case.

We're going to create a worker to pull messages from the queue in batches, filter the data, and then load into an ElasticSearch instance. For simplicity's sake, we're going to create a Heroku app with the Bonsai add-on, a hosted ElasticSearch service. Leveraging IronWorker's scheduling capabilities, we can check for messages on the queue at regular intervals.

With IronMQ and IronWorker, we can also ensure that we're not losing any data in this process and that we're not overloading our ElasticSearch cluster with too much incoming data.

Segment Data
Before we get to our worker, let's examine the data from Segment that gets sent to the queue. Segment is vendor-agnostic, making it very simple to interact with the exported data. The Tracking API that we'll be working with consists of several methods: identify, track, page, screen, group, and alias. You can dive into the docs here. We use the Ruby client within the Iron.io website to see where our users are going. For any page we want to track, we just place this line in the controller.

Analytics.track( user_id: cookies[:identity], event: "Viewed Jobs Page" )

Here is a typical basic identify record in JSON format... parsed, filtered, and prettified. This is plenty for us to make our transformation before loading into ElasticSearch.

{
 "action"  : "Identify",
 "user_id" : "123",
 "traits"  : {
  "email"        : "ivan@iron.io",
  "name"         : "Ivan Dwyer",
  "account_type" : "paid",
  "plan"         : "Enterprise"
 },
 "timestamp" : "2014-09-10T00:30:00.276Z"
}


Worker Setup
Now let's look at creating the task that performs the business logic for the transformation. With IronWorker, you create task-specific code within a "worker" and upload it to our environment for highly scalable background processing. Note that this example uses Ruby, but IronWorker has support for almost every common language, including PHP, Python, Java, Node.js, Go, .NET, and more.

For this worker, the code dependencies will be the iron_mq, elasticsearch, and json gems. It's a quick step to create the .worker file that contains the worker config, and we'll put the Bonsai credentials in a config.yml file.
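Here's a sketch of what that .worker file could look like (the file names are illustrative):

runtime "ruby"
exec "segment_worker.rb"
file "config.yml"
gem "iron_mq"
gem "elasticsearch"
gem "json"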


Our worker in this example will be very simple as a reference point. With the flexible capabilities of IronWorker, you are free to build out any transformations you'd like, and load to any external service you prefer. The Segment queue can grow rapidly (because traffic), so we'll want to schedule this worker to run every hour and clear the queue each time. If your queue growth gets out of hand, you can create a master worker that splits up the processing to slave workers that can run concurrently, significantly cutting down the total processing time. Just another reason keeping the ETL process within Iron.io is the way to go.

Once we initiate both the IronMQ and ElasticSearch clients, we can make our connections and start working through the queue data before loading to Bonsai. We know from our Segment integration that the queue is named "segment", and that our paid users are tagged with an "account_type" trait. This allows us to easily loop through all the messages and check whether or not each record meets our requirements. Post the data to Bonsai and then delete the message from the queue. Pretty simple.
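The embedded code isn't shown here, but a minimal sketch of such a worker might look like the following. The queue name and trait follow the post; the index name and config keys are illustrative, and error handling is omitted:

require 'iron_mq'
require 'elasticsearch'
require 'json'
require 'yaml'

config = YAML.load_file('config.yml')

mq = IronMQ::Client.new(token: config['iron_token'], project_id: config['iron_project_id'])
queue = mq.queue('segment')
es = Elasticsearch::Client.new(url: config['bonsai_url'])

# work through the queue in batches until it's drained
loop do
  messages = queue.get(n: 100)
  break if messages.nil? || messages.empty?

  messages.each do |msg|
    record = JSON.parse(msg.body)
    traits = record['traits'] || {}

    # only load identified, paid users into ElasticSearch
    if record['action'] == 'Identify' && traits['account_type'] == 'paid'
      es.index(index: 'segment', type: 'identify', body: record)
    end

    msg.delete  # remove the message whether or not it passed the filter
  end
end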


3. Uploading and Scheduling the Worker

Now we can use the IronWorker CLI tool to upload our code package.

$ iron_worker upload segment

Add a New Scheduled Task
Once the code package is uploaded and built, we can go to the HUD and schedule it to run. Select the "SegmentWorker" code package and set the timing. Every hour at regular priority seems fine for this task as this is a background job with no time sensitivity. What's that mem1 cluster you ask? Why that’s our new high memory worker environment meant for heavy processing needs.

Now our worker is scheduled to run every hour with the first one being queued up right away. We can watch its progress in the Tasks view.

View Task Status in Realtime

Once it's complete, we can check the task log to see our output. Looks like our posting to ElasticSearch was successful.

Detailed log output

Just to be sure, let's check our Bonsai add-on in Heroku. Looks like we successfully created an index and populated it with documents. Now we can do what we like within ElasticSearch.
Bonsai add-on within Heroku
There you have it. With the Iron.io integration on Segment, you can build your own ETL pipeline any way you'd like.




Get Running in Minutes

To use Iron.io for your own ETL pipeline, sign up for a free account (along with a trial of advanced features).

Once you have an Iron.io account, head over to Segment and flip the switch to start sending your data to IronMQ.

With the Segment and Iron.io integration in place, you can branch the data to other locations as well as tie in IronWorker to transform the data. Let us know what you come up with... we'll write about it and send t-shirts your way!

Monday, September 22, 2014

New FFmpeg IronWorker Stack For Easy Video Processing



FFmpeg is the leading cross-platform solution to record, convert and stream audio and video. Dealing with audio and video can eat up resources, making the activity a great fit for IronWorker by moving the heavy lifting to the background.

In the past, usage of FFmpeg with IronWorker would require that our users include and install the dependency within each worker environment. In order to streamline that process for developers, we've included FFmpeg in an IronWorker stack as a custom runtime environment specifically meant for video processing.

The possibilities are endless with the flexibility of FFmpeg and the processing power of IronWorker. Here are a few examples we've come across in working with our users, which will give you a baseline for the capabilities.

  1. Format Encoding
  2. Audio Normalization
  3. Audio/Video Optimization
  4. Metadata Analysis
  5. DRM Encoding/Decoding
  6. Screencapture Production
  7. Resize, Reformat and Crop Video
  8. Change Aspect Ratio

$ ffmpeg -i input.mp4 output.avi

Specifications:

pre-built libraries: ffmpeg-2.3, GPAC-0.5.1, x264-0.142.x
supported runtimes: php-5.3, node-0.10, ruby-1.9.3p0, python-2.7
Find out more detailed info about the FFmpeg stack here.

To use, simply include "ffmpeg-2.3" in your .worker file using the stack option:

stack "ffmpeg-2.3" 

Here are a few examples of how to use the FFmpeg stack in the supported languages.

Using Ruby runtime + FFmpeg stack
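(The original embedded example isn't shown here; below is a rough sketch, with hypothetical payload fields.)

require 'json'

# IronWorker passes the task payload as a file referenced by the -payload flag
payload_file = ARGV[ARGV.index('-payload') + 1]
payload = JSON.parse(File.read(payload_file))

# fetch the source video, then let the stack's ffmpeg binary do the heavy lifting
system("curl -s -o input.mp4 '#{payload['video_url']}'")
system('ffmpeg -i input.mp4 output.avi')

# upload output.avi to the storage of your choice here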


Using Node.js runtime + FFmpeg stack


Using Python runtime + FFmpeg stack


Using PHP runtime + FFmpeg stack


More flexibility for our developers
Making deployment and dependency management painless is a top priority for our team. Supporting a diverse range of languages, frameworks, and packages provides our users what they need to make their Iron.io implementation successful.

We'd love to hear feedback or even feature tutorials written by you! Send me a message at stephen@iron.io.

Friday, September 19, 2014

Orchestrating PHP Dependencies with Composer and IronWorker



Package your dependencies on IronWorker using composer
This is a tutorial describing how to include and use the PHP package management tool Composer with IronWorker.

Composer is a tool for dependency management in PHP. It allows you to declare the dependent libraries your project needs, and it will install them in your project for you. Packagist is the main Composer repository. It aggregates all sorts of PHP packages that are installable with Composer.

Installing and Using Composer Locally

1. Run the Composer installer (this downloads the composer.phar file locally)
$ curl -s https://getcomposer.org/installer | php
2. Define packages and versions in a composer.json file
{
    "require": {
        "vendor/package": "1.3.2",
        "vendor/package2": "1.*",
        "vendor/package3": ">=2.0.3"
    }
}
3. Run the installation command
$ php composer.phar install
4. Your packages will be installed locally in a /vendor folder and can be loaded in your PHP script via
require 'vendor/autoload.php';

Using Composer on IronWorker

It's nearly as simple to do the same in an IronWorker.

1. Include both the local composer.phar and composer.json in your .worker manifest, along with a build command for us to run on our servers.
runtime "php"

file "composer.phar"
file "composer.json"
build "php composer.phar install"

exec "my_script.php

2. Upload via our command-line tool
$ iron_worker upload <name_of_workerfile>
This is what you will see when running "iron_worker upload test.worker":

BOOM! It's that simple! Your packages will be built on our servers and accessible to your script via the autoloader.

Composer growth!

With 38,464 packages registered and close to 35,000,000 packages installed each month, Composer is quickly becoming the standard in PHP package management, used by popular frameworks such as Laravel, Symfony, Yii, Zend, and more. We also heard recently that Engine Yard, a leader in application management, is sponsoring Composer with a $15,000 community grant!





We are excited to see the growth that Composer has achieved thus far, and we look forward to seeing our own users take advantage of this wonderful tool and the IronWorker platform together.

Bonus: VersionScout.co

Props go out to Dieter Van der Stock, an awesome fan of IronWorker who is actually using IronWorker to notify users when their Composer dependencies are out of date. Send him regards at @dietervds.







Wednesday, September 10, 2014

How Cine.io Uses Node.js and IronWorker to Handle Their Background Processing


The following is a guest blog post by Thomas Shafer describing how cine.io deploys their workers within Iron.io to handle all of their background processing.

Cine.io is the only true developer-focused live streaming service. We offer APIs and SDKs for all mobile platforms and language frameworks and let developers build and ship live-streaming capabilities with just a few simple steps.

We have many background tasks to run and use Iron.io for all of our long-running and resource-intensive processing. These types of tasks include video stream archival processing, customer emails and log processing, and bandwidth calculations.

Over the course of working with the IronWorker service, we developed a few approaches to make it easy for us to integrate distributed processing into our Node.js application framework and maintain consistency with our main service across the full set of workers. The primary strategy is to use a single shared worker space.

Single Shared Worker Space

As an example of the approach we use, when we process logs from our edge servers, we need to gather log file data and attach each bandwidth entry to a user's account. To do this, we need credentials to access multiple remote data sources and follow some of the same logic that our main cine.io dashboard application uses.

To maintain logical consistency and a DRY architecture, we upload many shared components of our dashboard application to IronWorker. Our dashboard application shares code with the central API application to ensure logical consistency – as Wikipedia puts it "a single, unambiguous, authoritative representation within a system".

Because some of our background tasks require shared code between our dashboard application and our API, we decided to structure our IronWorker integration with a single .worker class, titled MainWorker.

MainWorker Serves as the Primary Worker

We use one worker to perform a number of tasks and so it needs to be flexible and robust. It needs to be able to introspect on the possible jobs it can run and safely reject the tasks it cannot handle. One way to make it flexible is to unify the payload and reject any attempts to schedule a MainWorker that does not follow the expected payload format.

A good way to enforce a predictable format is to, once again, share code. Whether it's the dashboard, API, or another MainWorker, they all use the same code to schedule a job.

Our MainWorker payload follows the following format:

 {
      configuration: {
        // configuration holds sensitive variables
        // such as redis credentials, cdn access codes, etc.
      },
      jobName: "",
        // The name of the job we want to run
        // MainWorker understands which jobs are acceptable
        // and can reject jobs and notify us immediately on inadequate jobNames
      source: "",
        // source is the originator of the request.
        // This helps prevent unwanted scheduling situations.
        // An example is preventing our API application
        // from scheduling the job that sends out invoices at the end of the month.
        // That job is reserved for IronWorker's internal scheduler.
      jobPayload: {
        // the payload to be handled by the job, such as model ids and other values.
      }
    }

The jobs folder we uploaded contains the code for every specific job and is properly included by the MainWorker, which is written in Node. Here's a look at the .worker file for MainWorker:

Example of cine.io's MainWorker.worker file 


runtime 'node'
stack 'node-0.10'
exec 'main_worker.js'
dir '../models'
dir '../config'
dir '../node_modules'
dir '../lib'
dir '../jobs'
dir '../main_worker'
name 'MainWorker'

Benefits of Our Approach

After working with this setup for a while, I'm convinced that a single shared space is the way to go.

Continuous Iron.io Deployment

By throwing our IronWorker jobs into the same codebase as our API and dashboard application, I know our logic will be consistent across multiple platforms. This allows us to integrate IronWorker with our continuous integration server. We can update every platform simultaneously with the most up-to-date code. With this approach, there is no way that one-off untested scripts can make their way into the environment. We update code on Iron.io through our CI suite and it's up to the developer, code reviewers, and our continuous integration server to validate our code. Everyone has visibility into what is on the Iron.io platform.

Consolidated Reporting

By running all of our jobs through the MainWorker, we know each new worker will gather metrics and handle error reporting out of the box. We don't need to figure out how each new Iron.io worker will handle errors, what the payload will look like, etc. Enforcing a single convention leads to us focusing on the internal logic of the jobs and getting things shipped.

Flexible Scheduling

The job payload has a rigid structure but we can share the library for scheduling jobs. That library will be responsible for sending the appropriate structure with the necessary configuration variables, jobName, source, and jobPayload.

One Drawback to the Approach

There is a drawback to using a single shared space for our workers. When we look at jobs, whether running or queued, all we see is "MainWorker, MainWorker, MainWorker". We cannot use the dashboard to tell which jobs are taking a long time, and we therefore lose some of the usefulness of the Iron.io dashboard. (Note: if IronWorker were to allow tags or additional names, that would go a long way towards giving us visibility. I hear it's on the roadmap, so let's hope it makes it in sometime soon.)

Conclusion

Deploying a shared environment to Iron.io has enabled our development team to focus on delivering customer value in a rapid, high-quality manner. We can easily test our job code, ensure Iron.io has the most up-to-date code, and fix any production errors promptly.


About the Author
Thomas Shafer is a co-founder of cine.io, the only developer-focused live streaming service available today. He is also a founder of Giving Stage, a virtual venue that raises money for social and environmental change. (@cine_io)




To see other approaches to designing worker architectures, take a look at how Untappd uses a full set of task-specific workers to scale out their background processing in this post. Also, be sure to check out this article on top uses of IronWorker.

Tuesday, September 9, 2014

Message Queues for Buffering : An IronMQ and Python Case Study

Using IronMQ and Python as a Buffer between Systems
Connecting systems and moderating data flows between them is not an easy task. Message queues are designed for just this purpose – buffering data between systems so that each can operate independently and asynchronously.

Here's a short article on using IronMQ to buffer a CMS from real estate listing data coming from a listing service. The app developer uses Python for the bridge code between the listing service and the CMS.

Here's an excerpt from the post:
Building a System with IronMQ and Python 
One of my most recent projects was writing a system to deliver real estate listing data to a content management system. Since the listing data source was bursty and I wasn’t sure how the CMS would handle the load, I decided to use a message queue, where the messages would have a JSON payload. Message queues are great at decoupling components of a system. 
For the queue, I used IronMQ. The company already was using it, it has a free tier (up to 24 messages a second), the service has been stable and reliable, it has great language SDKs, and setting up a durable message queue is something I’d rather outsource...
I wrote the bridge code from the listing database to the message queue in python. The shop was mostly Java and some Python, and Python seemed a better fit for a small ‘pull from here, push to there’ application... 
[F]or this kind of problem, Python was a great solution. And I’ll reach for IronMQ any time I need a message queue. This pair of technologies was quick to implement, easy to deploy, and high performance wasn’t really a requirement, since the frequency of the listing delivery was the real bottleneck.
          Read the full post >> 

About the Author

Dan Moore is a developer of web applications. Since 1999, Dan has created a variety of web sites and web applications, from ecommerce to portals to user self-service. He is competent in PHP, Java, Perl, SQL, GWT, and object-oriented design.




For other ways that you can use message queues, check out the following post from our archives.

Top 10 Uses For A Message Queue
We’ve been working with, building, and evangelising message queues for the last year, and it’s no secret that we think they’re awesome. We believe message queues are a vital component to any architecture or application, and here are ten reasons why:

See more >>

Monday, September 8, 2014

A Better Mobile Compute Cloud : NodeJS + Iron.io (repost from ShoppinPal)

There are a number of tools for creating mobile apps, but one area that can be challenging is handling the background processing that takes place within mobile applications.

A popular mobile app, ShoppinPal, is using Iron.io to handle its background processing needs with great results. They recently wrote a post on their success in moving from Parse to Iron.io.

Here's an excerpt:
We faced a challenge where incoming inventory updates for products weren’t processed in real time via parse triggers anymore... It was clear we weren’t going to grow if we stuck with Parse (background jobs) for our next set of retailers. 
ShoppinPal + Iron.io : Better Background Processing
That’s when we ran into Iron and what a lucky coincidence that was! 
They had queuing, they had workers, they had a default queue attached to their workers, they had public webhooks that would allow posting directly into a worker’s own queue. 
We haven’t looked back since and if you’re finding worker based queuing and execution becomes a beast for your project then slay it with Iron.  
          Read more >>

About ShoppinPal

ShoppinPal provides mobile commerce capabilities that let local retailers and online web sites offer state-of-the-art mobile storefronts.

About the Original Author


Pulkit Singhal is a co-founder and CTO of ShoppinPal. He wears many hats, ranging from UI design and development to optimizing back-end architecture. He is an avid blogger on technical subjects and is active in a number of open source communities and forums. (#Pulkit)





For another story on using Iron.io as a mobile compute cloud, check out our post on the widely popular Untappd mobile app.


How One Developer Serves Millions of Beers: Untappd + Iron.io
Untappd provides a mobile check-in application for beer lovers. Their application has been downloaded by over a million users and on any given night, they can register over 300,000 individual checkins and social sharing events...

Read more >>

Thursday, September 4, 2014

Iron.io hosting CoreOS meetup – Speakers include Brandon Philips from CoreOS and Sam Ward from Iron.io

Iron.io will be hosting a CoreOS meetup this Monday, Sept. 8th. Brandon Philips, CTO of CoreOS, will be a speaker, as will representatives from DigitalOcean and Citrix.

Sam Ward, Senior Ops Engineer at Iron.io, will also be giving a talk. We're at the early stages of using CoreOS but we're liking what we're seeing. Here is a description of what he'll cover:

8:00 - 8:30 pm
Iron.io + CoreOS  
Sam Ward will give an overview of Iron.io's evolving operations environment and how CoreOS fits its requirements. He'll comment on the technical merits of CoreOS and discuss Iron.io's use of Docker to ship and manage their cloud service apps and worker environments. He will also discuss consistency, high availability, and fault tolerance as factors of application design, and how CoreOS makes it easy to bake these properties into your application.




CoreOS Meetup

Date:  Monday, September 8, 2014
Time: 6-9pm
Location: Heavybit Industries, 325 9th Street, San Francisco, CA

More Details: CoreOS + DigitalOcean Meetup