How to Reduce a 9-Hour Job to a 9-Minute Job

A common problem for developers is how to run through a large number of tasks in a short amount of time: for instance, sending a nightly notification to all your users, calculating your users' bills every month, or updating their Facebook graph data. Almost every application needs to do something like this, and it becomes more of a problem as your user base grows.

I was just helping a customer today who was using Heroku workers at full worker capacity (24), and it was taking ~9 hours per day to go through his database of 200,000 users and send out notifications. A quick calculation explains why:

((200,000 users * 3.5 seconds per task) / 3600 seconds/hour) / 24 worker processes = 8.1 hours

We took it down to less than 9 minutes. Here's how:

The easy (naive?) way would be to queue up 200,000 tasks on IronWorker. That would have worked just fine, but it would have taken a while just to queue up the tasks, and the setup/teardown time of each task would be wasteful when the task itself only takes 3.5 seconds to run. Instead, we had each task process 100 users, which brings it down to only 2000 tasks. Each task should take approximately 3.5 * 100 = 350 seconds = ~6 minutes to run, and we can run them all at pretty much the same time, but I'll be conservative and adjust a bit for IronWorker capacity.

((200,000 users * 3.5 seconds) / 3600 seconds/hour) / 2000 worker processes * 1.5 capacity adjustment = 0.146 hours = 8.75 minutes
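Both estimates above follow the same formula, so they can be checked with a few lines of Ruby. This is only a sketch of the arithmetic: `estimated_hours` and `capacity_factor` are names made up for illustration, and the model assumes the work divides evenly across workers.

```ruby
# Estimated wall-clock hours when the work is spread evenly across workers.
# capacity_factor pads the estimate for shared capacity (1.0 = no padding).
def estimated_hours(users, seconds_per_user, workers, capacity_factor = 1.0)
  total_hours = users * seconds_per_user / 3600.0
  total_hours / workers * capacity_factor
end

# 24 Heroku workers, no adjustment.
heroku_hours = estimated_hours(200_000, 3.5, 24)

# 2000 tasks running at once, with the 1.5 capacity adjustment.
ironworker_minutes = estimated_hours(200_000, 3.5, 2000, 1.5) * 60

puts format("24 workers: %.1f hours", heroku_hours)
puts format("2000 tasks: %.2f minutes", ironworker_minutes)
```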

Batching up the tasks into batches of 100 was easy. Here's sample code:
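A minimal sketch of that batching in plain Ruby, not the customer's actual code. `enqueue_in_batches` and the `user_ids` payload key are hypothetical names, and the block stands in for whatever call queues one task; no real IronWorker API is used here.

```ruby
BATCH_SIZE = 100

# Splits user IDs into batches and yields one payload per task.
# Returns the number of tasks that were handed to the block.
def enqueue_in_batches(user_ids, batch_size = BATCH_SIZE)
  queued = 0
  # each_slice yields arrays of at most batch_size items; the final slice
  # holds whatever is left over, so no users are dropped.
  user_ids.each_slice(batch_size) do |batch|
    payload = { user_ids: batch }
    yield payload
    queued += 1
  end
  queued
end

# Usage: collect the payloads in memory instead of sending them to a queue.
payloads = []
task_count = enqueue_in_batches((1..200_001).to_a) { |payload| payloads << payload }
puts "#{task_count} tasks, last one has #{payloads.last[:user_ids].size} user(s)"
```

With 200,001 users, the last payload carries the single leftover user, so the user count does not have to be a multiple of the batch size.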

IronWorker gives you super simple access to huge compute power and you don't even need to think about a server. This customer will never have to worry about scaling this part of his application again.

9 Comments

  1. ismael on March 26, 2012 at 9:24 pm

    In my calculations this way saves them about $340 in workers, is it true?

  2. Travis Reeder on May 2, 2012 at 5:27 pm

    Hi Ismael, could you share your calculation?

    While it’s running, the cost would be the same, but with IronWorker, you would only pay for the time the workers are actually running, no more, no less. With Heroku you’d have to pay for the 9 hours of time and then be sure to turn off all your workers every day after they were done or pay $827 per month to keep them running.

    IronWorker would be ~$10 per nightly run (2000 * 6 / 60 * $0.05)

  3. Sunny Gleason on June 4, 2012 at 9:09 pm

    Are there always a multiple of 100 users? If not, it seems like this code might be missing the part to enqueue the final tail remainder…

  4. Travis Reeder on June 4, 2012 at 9:32 pm

    Hi Sunny,

    100 users in the example above is arbitrary. It’s your code and your choice to do it however you want; you could do 1 user per task or 1000 per task if you wanted.

    The final task would be whatever is left over. If you had 200,001 users, then the final task would just do the work for 1 user. Again, it’s your code and you create the payload for each task so it would be totally up to you.

  5. IPDb Developers on November 10, 2012 at 8:23 pm

    Can you provide some insight as to how to choose batch size for each IronWorker? That is – if my task has a set up time of X and a run time of Y, how many tasks should I send to each IronWorker?

  6. Travis Reeder on November 12, 2012 at 10:05 pm

    We recommend a run time of at least 30 seconds for each task, so that would be a good starting point. Really, the more you can do in a single task, the more efficient it is, because you amortize the setup/teardown time of your worker (loading, making db connections, etc). But at the same time, there is value in having short, quick tasks: for instance, if a task errors out, it’s easier to debug and retry it. Also, you don’t have to wait forever to ensure that it worked (or didn’t).

    So as a rule of thumb, greater than 30 seconds, but less than 5 minutes is a good place to be.

  7. Trevor on December 4, 2012 at 4:14 pm

    You could use `find_in_batches` and avoid your custom batching:
    https://apidock.com/rails/ActiveRecord/Batches/find_in_batches

  8. Unknown on January 1, 2013 at 6:49 pm

    Wait a minute, that is 8.75 minutes per worker process. So if you have 2000 worker processes in parallel, IronWorker will charge you for 8.75*2000*0.075/60 = $21.88

    true?

  9. Travis Reeder on January 1, 2013 at 9:31 pm

    Hi Unknown, worker hours are based on $0.05 per hour, not $0.075 (that’s the overage price if you exceed your plan limits), and the actual running time of each task was ~6 minutes (“Each task should take approximately 3.5 * 100 = 350 seconds = ~6 minutes to run”), so it would be:

    2000 * 6 / 60 * $0.05 = $10
