Best Practices: Scalable Image Processing

They say a picture is worth a thousand words. Coincidentally, processing an image is about a thousand times more demanding than processing text. As more and more developers look to image processing solutions like ImageMagick to power social, retail, and mobile apps, they will run into many challenges as they try to get it to run efficiently and to scale.

We recently launched a Solutions section on ways to use Iron.io services to do some amazing things. This is the first of a series of posts on the use cases shown there.  


Static/Serial Processing
Developers doing image processing can take a few routes. The first is to set up ImageMagick on the main app servers and run the processing within the request thread. This isn’t an efficient way of doing things (and in fact it’s making us cringe as we write this). The next evolution is to move the processing to a standalone environment, which means binaries to install, servers to launch, and infrastructure to manage.

While it isn’t difficult to set up on a one-time basis, a static environment has limitations on the workload it can handle. As the amount of image processing increases, so will the need for additional infrastructure—otherwise the stream of images that normally took minutes could take hours to complete. This may be OK for a side project, but it can be brutal for a startup or company with revenue or reputation on the line.

Continuing to add ops time to scale an image processing setup is only a short-term fix. It’ll be costly in terms of development time—not to mention the opportunity cost of not being able to do more important things. The best way to overcome this scaling challenge is to leverage a hosted solution that distributes the workload across an elastic set of servers. Servers that don’t need to be set up, managed, scaled, or taken down.

Switching to a bigger, faster system to process your images is only a short-term fix and can be quite costly. 

Parallel Processing
Iron.io’s services are designed specifically to get you to scale fast. IronWorker can release an army of workers that distribute the workload across a massive set of servers. A set of images that might take hours to process can now be done in minutes because they’re running in parallel.

Projects that would normally take 10 hours to complete might now take only 10 minutes.
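To make the fan-out idea concrete, here is a minimal local sketch that uses Ruby threads from the standard library as a stand-in for hosted workers. It is not IronWorker's API: `process_in_parallel`, the worker count, and the simulated `sleep` are all assumptions made up for illustration.

```ruby
# Local stand-in for a pool of hosted workers: threads pull image IDs from a
# shared queue. Hypothetical helper, not part of any Iron.io library.
def process_in_parallel(image_ids, worker_count, &work)
  jobs = Queue.new
  image_ids.each { |id| jobs << id }
  jobs.close                      # once drained, pop returns nil instead of blocking

  results = Queue.new
  workers = Array.new(worker_count) do
    Thread.new do
      while (id = jobs.pop)
        results << work.call(id)  # the block does the per-image work
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

ids = (1..20).map { |n| "img-#{n}" }
done = process_in_parallel(ids, 5) do |id|
  sleep(0.01)                     # simulated processing time (assumption)
  "#{id}-thumb"
end
puts "#{done.size} images processed"
```

The arithmetic behind the headline number is simple: under ideal conditions (independent images, no queueing overhead), 600 minutes of serial work spread evenly across 60 workers finishes in roughly 10 minutes. Real speedups will be lower once transfer time and startup costs are counted.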

Here are the steps an image processing worker might take:

[Diagram: steps of an image processing worker]

As you can see, the worker isn’t complicated. You use an object store like S3 to store the raw images and send the image ID to the worker. The worker fetches the image, manipulates it using the ImageMagick libraries, and then places it back in object storage or posts it wherever you need it.
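The fetch, manipulate, and store flow described above can be sketched roughly as follows. Everything here is hypothetical: `MemoryStore` is an in-memory stand-in for an object store like S3, the payload shape and output key are invented, and the `transform` callable marks the spot where a real worker would call ImageMagick.

```ruby
# Hypothetical in-memory stand-in for an object store such as S3.
class MemoryStore
  def initialize
    @objects = {}
  end

  def get(key)
    @objects.fetch(key)  # raises KeyError if the image ID is unknown
  end

  def put(key, data)
    @objects[key] = data
  end
end

# One worker run: fetch the image by ID, transform it, and store the result
# under a new key. `transform` is any callable that takes and returns bytes.
def run_worker(store, payload, transform)
  image_id = payload.fetch('image_id')
  raw = store.get(image_id)
  processed = transform.call(raw)
  output_key = "#{image_id}/thumbnail"  # assumed naming scheme
  store.put(output_key, processed)
  output_key
end

store = MemoryStore.new
store.put('photo-42', 'RAW-BYTES')
key = run_worker(store, { 'image_id' => 'photo-42' }, ->(bytes) { bytes.downcase })
puts key
```

Passing only the image ID in the payload keeps each task small; the worker pulls the actual bytes from storage itself.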

Generating a Thumbnail
As an example, this code segment might be included in a worker to generate a thumbnail. (You can see the full example on GitHub here.)

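require 'mini_magick'

# Generate a thumbnail padded with white to exactly width x height pixels.
def generate_thumb(filename, width = nil, height = nil, format = 'jpg')
  output_filename = "#{filename}_thumbnail_#{width}_#{height}.#{format}"
  image = MiniMagick::Image.open(filename)
  image.combine_options do |c|
    c.thumbnail "#{width}x#{height}"  # shrink to fit, keeping the aspect ratio
    c.background 'white'              # fill color for the padding
    c.gravity 'center'                # set before extent so the image is centered
    c.extent "#{width}x#{height}"     # pad the canvas out to the exact size
  end
  image.format format
  image.write output_filename
  output_filename
end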
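To see what the thumbnail and extent options do to an image's dimensions, here is a small pure-Ruby sketch of the arithmetic. `thumbnail_geometry` is a hypothetical helper, not part of MiniMagick, and its rounding is an assumption, so ImageMagick's actual output may differ by a pixel.

```ruby
# Approximate the geometry produced by a WxH thumbnail followed by a centered
# WxH extent. Rounding here is an assumption, not ImageMagick's exact rule.
def thumbnail_geometry(src_w, src_h, box_w, box_h)
  # Fit inside the box while preserving the aspect ratio.
  scale = [box_w.to_f / src_w, box_h.to_f / src_h].min
  w = [(src_w * scale).round, 1].max
  h = [(src_h * scale).round, 1].max
  # Centered extent splits the leftover space between the two sides.
  { width: w, height: h, pad_x: (box_w - w) / 2, pad_y: (box_h - h) / 2 }
end

p thumbnail_geometry(1600, 1200, 100, 100)
```

For a 1600x1200 source and a 100x100 box, the image scales to 100x75, and the extent step adds about 12 pixels of white above and below to fill the square.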

Serverless/Scalable Image Processing
The magic isn't necessarily in the worker (although the ImageMagick libraries are pretty amazing) but in being able to do this elastically and, even better, without servers to manage. With a hosted service to scale out processing, you can process images whenever you want, use a scheduled job to run through a database on a regular basis, or even use a message queue like IronMQ to process them continually.
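As a rough illustration of the message queue approach, the sketch below drains a queue of image IDs and hands each one to a handler. Ruby's standard `Queue` stands in for a hosted queue here; this is not the IronMQ client API, and `drain_queue` and the message format are assumptions.

```ruby
# Drain a queue of image IDs, passing each one to a handler. The stdlib Queue
# is a local stand-in for a hosted message queue (assumption).
def drain_queue(queue, handler)
  processed = 0
  until queue.empty?
    image_id = queue.pop
    begin
      handler.call(image_id)
      processed += 1
    rescue StandardError => e
      # A hosted queue would typically redeliver the message; here we only log.
      warn "failed #{image_id}: #{e.message}"
    end
  end
  processed
end

queue = Queue.new
%w[img-1 img-2 img-3].each { |id| queue << id }
count = drain_queue(queue, ->(id) { puts "processing #{id}" })
puts "#{count} images processed"
```

Catching errors per message means one bad image does not stop the rest of the batch.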

We’re dedicated to helping startups and rapidly growing companies do big things without a lot of work. By using a hosted, massively parallel solution for your image processing, you set yourself up to scale from day one. Not a bad way to go, especially because you never know when you might have to go big.

To learn more about using IronWorker and ImageMagick, check out the examples on GitHub.

2 Comments

  1. mxx on June 8, 2012 at 9:12 pm

    I’d like to point out that S3 is not a block store service. EBS is a block store. S3 is an object store, commonly used for files, but doesn’t have to be.

  2. Ken Fromm on June 8, 2012 at 10:11 pm

    @mxx Thanks. That was my bad. Not sure how that crept in. Change made. Much appreciated.
