Iron Shout-out: Scout APM


At Iron we love programming languages. We started off with Ruby way back in the day, and eventually moved most of our latency-critical services to Golang. Internally, our team includes some who love TypeScript, some who speak fluent Rust, and myself… I’m a big Erlang nerd at heart, so I’m obviously a big Elixir fan.

While the aesthetics and “pleasantness” of a language are important to its users, each language is a tool, and the right tool should be used for the right job. Often that isn’t what happens, so we rely on tooling to give us more insight into the repercussions of our language choices and implementations.

Enter Scout APM. Out of all the SaaS applications we use, it’s (by far) the one that has saved us the most money. To be clear, it’s not a product you install and then hope will magically solve all your performance issues and optimize your infrastructure for you. Scout APM is more like a map that shows you where all the treasure chests are. It’s up to you to go dig them up, however deep they may be buried.

< image from dashboard showing performance improvement after a fix here >

After installing Scout APM for the first time, we looked for the lowest-hanging fruit. These were easy fixes like missing indexes in our database, N+1 queries, or slow external network requests. We threw them into our pipeline and resolved them quickly, as they’re mostly one-line fixes. The next step for us was to identify the bigger-picture issues.
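If you haven’t run into an N+1 query before, here is a minimal sketch of the pattern in plain Ruby. It is not our production code: `FakeDB`, `build_db`, and the post/author data are hypothetical stand-ins that simply count how many “queries” get issued, so the example runs without a database.

```ruby
# Hypothetical in-memory stand-in for a database. Every method call
# counts as one query so the two access patterns can be compared.
class FakeDB
  attr_reader :query_count

  def initialize(posts, authors)
    @posts = posts
    @authors = authors
    @query_count = 0
  end

  def all_posts
    @query_count += 1
    @posts
  end

  def author_by_id(id)
    @query_count += 1
    @authors.find { |author| author[:id] == id }
  end

  def authors_by_ids(ids)
    @query_count += 1
    @authors.select { |author| ids.include?(author[:id]) }
  end
end

# Illustrative data set: 10 posts spread across 3 authors.
def build_db
  authors = (1..3).map { |i| { id: i, name: "author#{i}" } }
  posts = (1..10).map { |i| { id: i, author_id: (i % 3) + 1 } }
  FakeDB.new(posts, authors)
end

# N+1 pattern: one query for the posts, then one more query per post.
def author_names_n_plus_one(db)
  db.all_posts.map { |post| db.author_by_id(post[:author_id])[:name] }
end

# Batched pattern: one query for the posts, one query for all authors.
def author_names_batched(db)
  posts = db.all_posts
  author_ids = posts.map { |post| post[:author_id] }.uniq
  names_by_id = db.authors_by_ids(author_ids).to_h { |a| [a[:id], a[:name]] }
  posts.map { |post| names_by_id[post[:author_id]] }
end
```

With this data the first version issues 11 queries and the second issues 2, and both return the same names. In a Rails app the batched version is usually the one-line change of adding eager loading, for example `Post.includes(:author)`, which is why fixes like this tend to be so cheap once something points you at them.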

Scout APM does a great job of giving you not just the fine details of a particular issue, but also the ability to take a step back and look at things from a bird’s-eye view. In our case, we found ourselves using an ActiveRecord construct in “many” places in our platform that caused huge spikes in memory usage, which resulted in extreme process bloat. This bloat led to churn, and… it definitely snowballed from there.

Our platform used to run on way too many machines, and they ran hot. After fixing most of our performance issues, we were able to scale down our instance fleet significantly. This came on top of the drop from 30 servers to 2 that we had already made by moving a critical piece of our infrastructure from Ruby to Golang.

At the end of the day, the cost of Scout APM ended up being an insignificant percentage of what we were saving each month. It took engineering hours to fix the actual issues themselves, but these performance enhancements flowed into our pipeline like normal technical debt items. The benefit of these items was that they were directly tied to decreasing operational costs.

We ended up choosing Scout APM for many reasons. One of the biggest was their fantastic customer support. They went above and beyond to help answer our constant questions when we first started using their platform (and we asked A LOT of questions). If you aren’t using an APM tool, or aren’t 100% happy with what you’ve got, the engineering team here at Iron highly recommends running with Scout APM.