Tuesday, August 27, 2013

Go After 2 Years in Production

After running Go for two years in production at Iron.io, I wanted to share our experience/feelings about it. We were one of the first companies to use Go (golang) in production and we didn't know what to expect in the long run, but so far, so great.

I talked a little about this in a previous post about switching to Go from Ruby, but this post goes into the specific things that we love about the language and the things we learned along the way. In no specific order, here they are:

  • Performance
  • Memory
  • Concurrency
  • Reliability
  • Deployment
  • Talent


Performance


When we were first deciding on what language we were going to use, we did some research and created some mock tests for our use case, a message queue. I wrote a partial beanstalkd clone in Go that spoke the beanstalkd protocol so I could use existing clients to test it. Go performed very well, almost the same as the official C version (and it was surprisingly easy to write).

You can find benchmarks comparing Go to a bunch of other languages at the Computer Language Benchmarks Game site. The graph below compares it to Java (which would probably be the language we'd be using if Go didn't exist):




That benchmark shows that Go is a bit faster in a few cases, but slower in the rest. Still, not too bad for a language that is only a few years old. I have no doubt it will catch up given some time. You can also see that the memory footprint is substantially better though. 

Just for fun, check out the differences between Go and Ruby below: Go is dramatically better in both performance and memory usage.





Two years in, Go has never been our bottleneck; the bottleneck has always been the database.



Memory


Go doesn't have a virtual machine or interpreter to load up, so it starts fast and small. IronMQ starts running in ~6444 KB of resident memory including loading configs, making connections, etc. After it’s been running for a while, it increases due to caching and what not. Right now, our production servers are running at ~400 MB (which I realize is somewhat irrelevant as this will depend on your application). 

Two years in, we've never had a memory leak or problems related to memory.


Concurrency


Concurrency is a huge part of Go, with some high-level constructs that make it a breeze to use. I used Java for many, many years and got very comfortable with the java.util.concurrent package, which is a pretty nice set of concurrency tools, but it's just not the same as Go in terms of simplicity or the underlying implementations. Go has goroutines for concurrent operations and channels for communicating between them. Goroutines are particularly interesting:

"Goroutines are part of making concurrency easy to use. The idea, which has been around for a while, is to multiplex independently executing functions—coroutines—onto a set of threads. When a coroutine blocks, such as by calling a blocking system call, the run-time automatically moves other coroutines on the same operating system thread to a different, runnable thread so they won't be blocked. The programmer sees none of this, which is the point. The result, which we call goroutines, can be very cheap: unless they spend a lot of time in long-running system calls, they cost little more than the memory for the stack, which is just a few kilobytes. To make the stacks small, Go's run-time uses segmented stacks. A newly minted goroutine is given a few kilobytes, which is almost always enough. When it isn't, the run-time allocates (and frees) extension segments automatically. The overhead averages about three cheap instructions per function call. It is practical to create hundreds of thousands of goroutines in the same address space. If goroutines were just threads, system resources would run out at a much smaller number." Source


One thing we had to do here was limit concurrency to make sure we didn't overload our database or other services during spikes. We did this with a simple semaphore using Go channels.



Reliability


Reliability in a language is kind of hard to quantify, but we've found our Go applications to be very robust. I don't think we've had a failure/crash that wasn't related to some external problem (read: database or some bad library). In general though, there are very high quality open source libraries out there for Go. We've found no memory leaks and no major core library bugs.

I even find that our code is higher quality just due to the fact that it's written in Go. I'm not entirely sure why this is, but I get a warm and fuzzy feeling about our stuff written in Go. Perhaps it's the very strict compiler that even forces us to remove imports and variables that aren't in use. Perhaps it's the small amount of code you have to write to get a lot done. Maybe I'll figure this out and write more about it some day.


Deployment


Go compiles into a single, static binary file so deployment is simply putting the file on a server and starting it up. No dependencies required. No runtime required (you don't need to install Go on the server). And it's small; the IronMQ binary is ~6MB.


Rolling Back

If something goes wrong after deploying and you need to roll back, you can just stop the bad process then start the previous binary. You don't need to worry about a dependency being upgraded since the entire program is compiled into a single binary.


Talent


We took a big risk choosing Go when there weren't a lot of people who knew the language, or had even heard of it. We were the first company to post a Go job on the golang-nuts mailing list and we were taken aback by the quality of people that applied. We received applications from developers at some of the top tech companies with tons of experience, some with PhDs, working on some hardcore projects. Most weren't programming full-time in Go, but had worked with it enough to be proficient and could transfer their experience and knowledge over. I'm not sure it even mattered what we were trying to build; they just wanted to work with Go.

Our first Go hire was one of the core Golang developers, +Evan Shaw, who has been with us ever since.


Conclusion


I can confidently say that after two years working with Go, we made the right choice. If we had started Iron.io today, it would have been a no-brainer to choose it. A lot of other companies are using it now too, including Heroku and Google, and the people I talk to about it all have similar opinions. +Rob Pike, one of the creators of Go, said:
“We realized that the kind of software we build at Google is not always served well by the languages we had available,” Pike said in 2011. “Robert Griesemer, Ken Thompson, and myself decided to make a language that would be very good for writing the kinds of programs we write at Google.”

+Derek Collison, the founder of Apcera, said recently in a Wired article:
“The management layers and infrastructure layers of the newer technologies that provide this cloud delivery model?” he tells Wired. “Within two years, a majority will be written in Go.”
Is Go the next-gen language we've been waiting for? It's a bit too early to say, but it's certainly off to a good start.




-------

If you're in the Bay Area and interested in learning about Go, come join us at the next GoSF meetup.

Feel free to contact/follow me on g+: +Travis Reeder.

UPDATE: Lots of discussion at Hacker News and Reddit.

Friday, August 23, 2013

Event Handling with .NET, RavenDB, and IronMQ

The other day we blogged about a map-reduce contribution from the Iron.io community. Here’s another great contribution from an Iron.io user, which shows how to commit transactions in a database while coordinating downstream events through a message queue.


Jef Claes is a developer in Belgium who has written a few things about IronMQ in the past (here and here). His latest blog post addresses maintaining eventually consistent domain events using RavenDB and IronMQ. It’s a short post but a good rundown on how to maintain consistency when transactions involve multiple distributed systems.

About the Technologies

RavenDB is a transactional, open-source Document Database written in .NET, offering a flexible data model designed to address requirements coming from real-world systems.


IronMQ is a scalable cloud-based message queuing service, designed for building distributed cloud applications quickly and operating at scale. It provides an easy mechanism to control work dispatch, load buffering, synchronicity, database offloading, and many other core needs.


–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––









One project I'm working on right now makes use of domain events. As an example, I'll use the usual suspect: the BookingConfirmed event. When a booking has been confirmed, I want to notify my customer by sending him an email.

I don't want persisting a booking to fail because an eventhandler throws, for example when the mail server is unavailable. I also don't want an eventhandler to execute an operation that can't be rolled back, such as sending out an email, without first making sure the booking was persisted successfully. If an eventhandler fails, I want to give it the opportunity to fix what's wrong and retry.

Get In Line
The idea is, instead of dealing with the domain events in memory, to push them out to a queue so that eventhandlers can deal with them asynchronously. If we're trusting IronMQ with our queues, we get in trouble guaranteeing that the events aren't sent out unless the booking is persisted successfully; you can't make IronMQ enlist in a transaction.

Avoiding False Events
To avoid pushing out events, and alerting our customer, without having successfully persisted the booking, I want to commit my events in the same transaction. Since IronMQ can't be enlisted in a transaction, we have to take a detour; instead of publishing the event directly, we're going to persist it as a RavenDB document. This guarantees the event is committed in the same transaction as the booking.

Getting the Events Out
Now we still need to get the events out of RavenDB. Looking into this, I found this to be a very good use of the Changes API. Using the Changes API, you can subscribe to all changes made to a certain document. If you're familiar with relational databases, the Changes API might remind you of triggers, except that the Changes API doesn't live in the database, nor does it run in the same transaction. In this scenario, I use it to listen for changes to the domain events collection. On every change, I'll load the document, push the content out to IronMQ, and mark it as published.

A Back-up Plan
If the subscriber goes down, events won't be pushed out, so you need to have a back-up plan. I planned for missing events by scheduling a Quartz job that periodically queries for old unpublished domain events and publishes them.

In Conclusion
You don't need expensive infrastructure or a framework to enable handling domain events in an eventually consistent fashion. Using RavenDB as an event store, the Changes API as an event listener, and IronMQ for queuing, we landed a rather light-weight solution. It won't scale endlessly, but it doesn't have to either.

–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––

More on Patterns in Distributed Processing

The post does a great job of highlighting issues that arise building transaction-based systems using distributed cloud technologies. While you get tremendous ease of use and agility, developers do need to take into account the sequence and order of committing events and the ability to make sure that downstream events are only processed after the primary events have been recorded.

Message queues can be essential components for this but any production-ready solution also means having checks in place along with solid exception handling and good back-up plans. Stay tuned for more posts on distributed processing patterns in the future.


Notes on the Post

  • The latency hit mentioned in the article can be attributed almost entirely to the cross-Atlantic message transit. Doing the processing in the same or a nearby datacenter as the message queue brings message transit down to single-digit milliseconds. IronMQ is currently in AWS-East and Rackspace-ORD (and coming to a datacenter near you).
  • And as for scaling the solution, the gating factor is more with database inputs than with IronMQ. IronMQ is designed to handle very large throughputs and so it should handle whatever is thrown at it.


Thursday, August 15, 2013

Map-Reduce Capabilities and Super Easy Concurrency (via Alan deLevie and IronResponse)

We came across a great contribution the other day from Alan deLevie that makes using IronWorker for a map-reduce pattern even easier than it already is. (Love seeing tweets announcing additions to the growing list of Iron.io community addons.)
IronWorker is a cloud-based on-demand service that out-of-the-box lets you do massively concurrent processing across slices of data – which is essentially the core of the map reduce pattern. (Here's a good visual explanation of map reduce in action.)

IronResponse adds a veneer (in the form of a simple Ruby gem) that more closely mirrors the map-reduce interface. It lets you abstract away the actual queuing of tasks and management of the data. All you have to do is pass service credentials and the data and then IronResponse will manage the tasking and data storage within IronWorker.
IronResponse + IronWorker + S3 = Simple/Powerful Map Reduce
Alan does a great job explaining how IronResponse works and how to use it. Rather than try to replicate it, we want to include a portion here and then refer you to the GitHub repo for the full details. (Note that it’s super easy to get up and running.)

–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
IronResponse on GitHub

IronResponse

IronResponse glues together IronWorker and AWS S3 to provide a response object to remote worker scripts. This allows you to write massively concurrent Ruby programs without worrying about threads.

Rationale

Iron.io's IronWorker is a great product that provides a lot of powerful concurrency options. With IronWorker, you can scale tasks to hundreds and even thousands of workers. However, IronWorker was missing one useful feature for me: responses.

What do I mean by that? In the typical IronWorker setup, worker files are just one-off scripts that run independently of the client that queues them up.

For example:
require "iron_worker_ng"

# Queue 100 independent tasks; each one runs the "do_something" worker.
client = IronWorkerNG::Client.new
100.times do |i|
  client.tasks.create("do_something", number: i)
end

For many use cases, this is fine. But what if I want to know the result of do_something?

A simple way to get the result would be for your worker to POST the final result somewhere, then have the client retrieve it. This gem simply abstracts that process away, allowing the developer to avoid boilerplate and to keep worker code elegant.

Here's how you would interface with IronResponse for running map-reduce across an is_prime function:
require "iron_response"

config = {...}
batch = IronResponse::Batch.new

batch.auto_update_worker = true
batch.config[:iron_io]   = config[:iron_io]
batch.config[:aws_s3]    = config[:aws_s3]
batch.worker             = "test/workers/is_prime.rb"
batch.params_array       = Array(1..10).map {|i| {number: i}}

results                  = batch.run!

p results
#=> [{"result"=>false}, {"result"=>true}, {"result"=>true}...]
IronResponse adds a simple interface to IronWorker to make map-reduce patterns even simpler. 

Under the hood, iron_response uses some functional and meta-programming to capture the final expression of a worker file, convert it to JSON, and then POST it to Amazon S3. When all the workers in an IronResponse::Batch have finished, the gem retrieves the file and converts the JSON string back to Ruby.

...

Contributing to IronResponse

If you would like to add to these capabilities, here’s how:

  1. Fork the repo
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Added some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––

Alan is a big contributor to the Iron.io community and we're grateful for his work here and on other projects that help bring super easy scaling and concurrency to the development community.

If you're a Ruby developer and want to give this a try, let us and him know what you think. And if you’d like to replicate this for other languages, feel free to model this approach. It's pretty sound in our opinion. Be sure to get in touch with us if you do (so we can acknowledge you and pass the word).