At @SCALE 2015

Yesterday I swung down to San Jose to listen to Facebook, Box, Google, Twitter, Microsoft and other leaders speak at the overflowing San Jose Convention Center. The @SCALE 2015 conference was a great opportunity to hear Tech titans speak.

It was a scene of hungry minds and egalitarian ideals. If there’s a core takeaway from @SCALE, it’s this: we’re all in it together. Why not share?

Thanks to JD Hancock for the base image (CC BY 2.0).

For those unfamiliar, the @SCALE conf is a place for the largest of the large Tech companies to share strategies for serving billions of requests per day. Despite the talks mostly being angled at Tech behemoths, there’s a lot of great content for the common Tech org here, too.

Below is a list of the talks I attended, along with my thoughts!

Keynote

@SCALE kicked off with a keynote from Facebook’s VP of Infrastructure and Engineering, Jay Parikh.

Jay started off with some whizbang graphs about Facebook’s ascent into both the mobile and the video space. One of the more impressive was a world map showing Facebook’s video playback success rate. At a glance, it seemed Facebook covered the world.

Jay zoomed in closer, revealing the lie that “averages” were telling us. At the city level (as opposed to the global average), some areas showed gaps. He carried similar insights into the world of UX and performance.

Mr. Parikh is on a search for small tweaks that together add up to a big impact. He called this the search for 1% gains, a phrase later echoed by other speakers. I think it’s one we’ll hear more of in the future.

Scaling Flexibility panel

Following the keynote was a panel of engineering higher-ups, interviewed by Cade Metz from Wired. The panel included Sam Schillace, SVP of Engineering at Box; Kevin Scott, SVP of Engineering and Operations at LinkedIn; and Jamshid Mahdavi, Software Engineer at WhatsApp.

A big focus of this panel was on choice of language. How do you hire engineers for obscure languages like Erlang? And does the choice of language matter that much anyway?

In WhatsApp’s case, Box’s Sam Schillace gave props to Erlang for being one of the few languages to actually live up to its hype. In general, though, all the speakers agreed that the choice of language means a lot less than most think.

Hire smart engineers, and they’ll adapt. If your company is considering a new language for a new problem, the language should be evaluated by how well it utilizes resources. That could be hardware, but more often than not it’s human capital.

Keynote Finale

Angelique Reisch wrapped up the keynote with a lovely demo from Pixar. She laid out the challenges faced by her and her peers in the lighting department. Then she showed off some steeply climbing graphs of Pixar’s renderfarm and its growth over time. Scary, steep cliffs here.

The unit for growth was “computer hours per x” which is definitely a new favorite, as metrics go.

Sadly, Angelique’s talk is one that didn’t make it to YouTube. Presumably, the bits of Pixar’s stack that she showcased weren’t meant for public (or DreamWorks’) eyes.

The Talk Tracks

From here, the conference split into three main tracks: data, dev tools, and mobile. I hopped around between tracks, mostly settling on data and dev tools.

Twitter: Heron Stream Processing at Scale

Karthik Ramasamy, an Engineering Manager at Twitter, kicked off this talk on Heron.

Twitter does a lot of streaming text processing. Unsurprising! What was surprising is that their weapon of choice was failing them. They had chosen Apache Storm as a way to process large swaths of data at once.

You can think of Storm like IronWorker, but with way more complexity on top. It turned out that complexity was biting Twitter in the ass. Uneven node power led to one group of issues; processing guarantees and an architecture that over-communicates led to another.

The shared pain behind all of Storm’s issues was a complex mental model. This led to difficulty debugging, and a lot of pain when it came to tuning. All of this drove Twitter to develop a less safe, but API-compatible, solution in the form of Heron.
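To make that mental model concrete, here’s a toy word-count topology in plain Python. The spout/bolt names mirror Storm’s terminology, but everything else (single process, no distribution, no delivery guarantees) is deliberately stripped away; this is a sketch of the programming model, not of Storm itself.

```python
from collections import Counter

def spout(tweets):
    """A 'spout' is the stream's source: it emits raw tuples."""
    for tweet in tweets:
        yield tweet

def split_bolt(stream):
    """A 'bolt' transforms the stream: here, tweets become words."""
    for tweet in stream:
        for word in tweet.lower().split():
            yield word

def count_bolt(stream):
    """A terminal bolt aggregating a running count per word."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
    return counts

# Wire the topology together: spout -> split -> count.
counts = count_bolt(split_bolt(spout(["storm at scale", "heron at scale"])))
print(counts["scale"])  # 2
```

In real Storm, each of those stages runs on many nodes at once, and tuples are acknowledged and replayed on failure; that machinery is exactly where the complexity Karthik described lives.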

Karthik’s talk shows some of the ugliness of stream processing. This talk made me really happy to be sidled up with IronWorker. We’re in the same general space, minus the space shuttle-like complexity.

Data Analytics Monitoring at 250Gbit per second with Facebook

Next up was a talk by Ostap Korkuna on how Facebook keeps its analytics real-time.

It’s worth repeating, the title of this talk is “Monitoring at 250Gbit per second.” Ouch. Ostap shared some graphs about Facebook’s data consumption, and went on to explain the cause of many of the peaks and valleys.

One fun insight is that Jevons Paradox applies to analytics. The faster queries become, the more they get run. Meaning, speeding up queries will lead to more desire to run new queries, paradoxically leading you back to slow queries.

Mr. Korkuna had some clever tricks to share. Particularly, that 90% of queries just needed the past day’s worth of data. And, beyond that, most only required the past two weeks. Optimizing for sets this size was definitely attainable.

Open, still unsolved problems were posed at the end of his talk. Definitely a good one for folks building their own BI / analytics stack.

The Motivation for a Monolithic Codebase: Why Google Stores Billions of Lines of Code in a Single Repository

Next up was a talk on monolithic repos by Google’s Rachel Potvin.

So, what’s the point of a mono-repo? I mean, aside from using it as a Bogeyman to scare children to sleep at night.

Rachel has a background as a game dev. While in the games biz, she noticed a lot of fragmentation. Games built on the same engine would diverge, and eventually envious product managers would ask for features. The other team built it, why can’t we have it, too?

Cue painful merge process.

Thus, the mono-repo was born. According to Rachel, this is also the first time Google’s allowed the name of the repo to be shared openly: Piper.

One of the big lessons from this talk is that version control is hard. The move to a mono-repo like Piper requires a massive amount of tooling and testing. That said, the payoff (simple workflows) is absolutely worth it.

Phrased another way, do a mono-repo right, and you’ll be rewarded with immediately usable and highly collaborative version control.

Engineering culture needs to adapt a bit to make this work, as well. The best practices for Piper sound like good lessons for everyone in the distributed VCS world to pick up on.

The first is that Piper demands explicit ownership of code. Trees and folders have owners. Owners are a team of folks who have a good understanding of that section of code. When commits are made, owners are responsible for accepting the merge, which happens after the usual code review / test / etc. Ownership is powerful!
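As a sketch of what explicit ownership can look like in practice: the open-source Chromium project checks OWNERS files into each directory, roughly along these lines. Piper’s internal format wasn’t shown in the talk, so treat this as illustrative, and the usernames as made up.

```
# OWNERS file for this subtree (illustrative)
alice
bob
# carol must also approve changes to SQL files in this directory
per-file *.sql=carol
```

Review tooling then refuses to land a change until someone listed in the nearest OWNERS file has approved it, which is what makes the ownership explicit rather than tribal.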

The second is a very Spider-Man-y “with great power comes great responsibility” style word of caution. For a monolithic codebase to work, code health must be a top priority. As a sub-point, in an “everything is merged by an owner somewhere else” world, insignificant commits are murder. It’s death by 1,000 paper cuts.

Prioritize code health, and only commit changes that really matter.

Engineering Effectiveness at Twitter: Let 1,000 Flowers Bloom. Then Tear 999 of them out by the Roots

Twitter’s Peter Seibel gave an elegant approach to engineering effectiveness.

A lot of words have been written about the mythical 10x Engineer, much of it in the category of how to hire them. Peter, on the other hand, focuses on enabling them.

To find meaningful gains, Peter suggests looking at the size of your company first. Then, do a bit of napkin math to determine how much saving each engineer 5 minutes a day would net. What about 10 minutes?
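Here’s what that napkin math might look like. Every number below is my own assumption for illustration, not a figure from the talk.

```python
# Napkin math: what is 5 minutes per engineer per day worth?
engineers = 700          # assumed developer headcount
minutes_saved = 5        # minutes saved per engineer per day
work_days = 240          # assumed working days per year

hours_per_year = engineers * minutes_saved * work_days / 60
engineer_years = hours_per_year / 2000   # ~2,000 working hours per year

print(hours_per_year)    # 14000.0
print(engineer_years)    # 7.0
```

Seven engineer-years recovered from a five-minute daily saving is the kind of result that makes a tooling team look cheap.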

Additionally, not all time savings are equal. Peter makes a fantastic point about flow. Flow is a state of high productivity. If this were the 80’s, we’d call it “being in the zone.”

It usually takes about 15 minutes for someone to enter a state of flow, but just one minute to drop out of it. As a result, Peter conjectures, tech debt really is the nefarious beast most devs paint it as. Turning a shoestring-and-gum build script into a solid tool is absolutely worth it, in the context of flow.

My favorite part of this talk is Peter’s take on when to start focusing on engineering effectiveness. With modest assumptions, he shows the first 2 hires for a Tools and Frameworks team should occur at a dev headcount of about 100. The benefits of better tooling and unencumbered development start to show around this point.

Peter has a lot more to say about the stages that software goes through, as well. This presentation is definitely worth a look if, like me, you enjoy philosophical looks at programming.

Facebook: Structuring and Adopting a GraphQL Server

Nick Schrock, a software engineer at Facebook, popped the lid off GraphQL.

Do you think about the world in terms of JOINs? I sure don’t, and it sounds like the Facebook team doesn’t either. Nick Schrock was frustrated with the way SQL and REST represent the world. If all I want is the names of my friends, why isn’t there an easy way to just get a list of names?

That led Facebook to start hacking on GraphQL. The dream is simplicity. Fewer calls, only get what you ask for, and zero effort to support legacy clients and queries.

That is a dream! After watching Nick’s talk, I’m really excited to poke around at GraphQL. From my brief flirt with it, it seems like it’s completely agnostic, and can sit on top of most existing infrastructure as a thin wrapper.
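For a taste of what that looks like, here’s a hypothetical query in GraphQL’s query syntax. The `me` and `friends` field names are made up for illustration; the shape of any real query depends on the schema the server exposes.

```graphql
{
  me {
    friends {
      name
    }
  }
}
```

The server responds with JSON mirroring exactly that shape, names and nothing else, which is the “only get what you ask for” part of the dream.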

If you’re interested in giving it a try, head over to graphql.org.

But wait, there’s more!

Many more talks from @SCALE are online at their YouTube channel. Check ’em out! There’s a lot more worth viewing.

If social media’s your bag, hop over to their Facebook page, and let them know what you thought of the event.