GopherFest Summer 2016 Recap

Hundreds of Go enthusiasts gathered at the prestigious Bently Reserve in downtown San Francisco for a day full of talks about data science, scaling, testing, speed, code reuse and refactoring, all in the context of Golang. Below, a write-up of a few selected talks:

Built for Snappiness by Blake Mizerany @bmizerany

When I was in the Ruby community, I built Sinatra. I’ve been using Go since 2009. Now I’m the founder at backplane.io.

Snappy == happy users

Slowness is inexcusable. So much work has gone into allowing engineers to build snappy websites. If you have a powerful language that lets you do powerful things, and things are slow, it’s even more frustrating.

About 400ms is slow. We can’t use a CDN, because by the time the page is served, the data is stale.

We want you to be able to see what is happening (requests per second, backend health, last ping, uptime, load, routing configuration, routing graph) in near-real-time. We previously powered our API through some fairly slow Go endpoints, connected to a database. I ran into Dapper, Google’s distributed tracing system, which gives you insight into a distributed architecture. People who worked on it are now building Lightstep, and they started the OpenTracing initiative.

Blake went through a number of ways they optimized their API calls.

Q: Have you had problems caused by missing a broadcast?

A: This is a problem that is bound to happen eventually. If you lose the connection within 1s, you can have your instance reheat its cache itself.

Cleaning up your Go code with Interfaces by Faiq Raza @faiqus at Wercker

Interfaces are great because you don’t need to know much about the underlying system; you only need to know a common interface to it. In the same way, if you can drive one car, you understand the interface for driving other cars.

When a user wants to push to a container, Wercker pings the Docker Hub registry for permissions. If permission is given, we’ll continue the build. It turns out this is hard! Docker came out with a new registry format, AWS has its own registry… but at the end of the day, this can all boil down to an interface. The Authenticator interface is a way to ping a registry and get a “yes” or “no” about whether a user can access that container.
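To make the idea concrete, here is a minimal sketch of what such an interface could look like. Only the name Authenticator comes from the talk; the CanPush method, the staticRegistry type, and the startBuild function are hypothetical and are not Wercker’s actual code.

```go
package main

import "fmt"

// Authenticator answers one question: may this user push to this repository?
// The name comes from the talk; the method set is an assumption.
type Authenticator interface {
	CanPush(user, repository string) (bool, error)
}

// staticRegistry is a hypothetical in-memory stand-in for a real registry
// client. It maps a repository name to the users allowed to push to it.
type staticRegistry struct {
	name    string
	allowed map[string][]string
}

// CanPush reports whether user is listed for repository.
func (r staticRegistry) CanPush(user, repository string) (bool, error) {
	users, ok := r.allowed[repository]
	if !ok {
		return false, fmt.Errorf("%s: unknown repository %s", r.name, repository)
	}
	for _, u := range users {
		if u == user {
			return true, nil
		}
	}
	return false, nil
}

// startBuild depends only on the interface, not on any registry format.
func startBuild(auth Authenticator, user, repository string) error {
	ok, err := auth.CanPush(user, repository)
	if err != nil {
		return err
	}
	if !ok {
		return fmt.Errorf("%s may not push to %s", user, repository)
	}
	return nil // permission granted: the build can continue
}

func main() {
	hub := staticRegistry{
		name:    "hub",
		allowed: map[string][]string{"acme/api": {"alice"}},
	}
	fmt.Println(startBuild(hub, "alice", "acme/api")) // <nil>
	fmt.Println(startBuild(hub, "bob", "acme/api"))   // bob may not push to acme/api
}
```

Supporting a new registry format then means writing one more type with a CanPush method; startBuild does not change.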

Process Introspection in Production by Peter Sanford @psanford & Muir Manders

Peter and Muir are technical staff members for RetailNext, which has been using Go in production for three years.

Process introspection is looking inside a process to see what is going on. We’re not going to dive into arbitrary introspection; instead, we’ll focus on using hooks to examine the behavior of your application.

Our standard library tools are net/http/pprof and expvar. expvar is less well-known, but it lets you expose metrics and counters outside your application.

Security Issues

There are a few issues running net/http/pprof and expvar in production. They expose their handlers to the client, which could be a security issue. We run our debug server on a UNIX socket, and we have a proxy tool that allows us to run ‘go tool pprof’ against it. This also allows us to define custom debug handlers.

We’ve open-sourced a version of our debug server called cannula. Check it out.

Graph Databases in Go by Barak Michener @barakmich

Barak is an engineer at CoreOS, and spoke about Cayley, a graph database written in Go.

The era of search results is over, and mobile killed it. We don’t want to look at ten blue links on our phone, we just want the answer. Directly answering the question involves knowing nouns.

Triples

Let’s talk about triples: Alice is friends with Bob. A list of triples is a graph: Alice is friends with Bob, Bob is friends with Alice, Bob is friends with Eve…
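As a rough illustration of how little structure this needs, here is a sketch in plain Go. The Triple type and the objects helper are hypothetical and have nothing to do with Cayley’s internals.

```go
package main

import "fmt"

// Triple is one subject-predicate-object statement.
type Triple struct {
	Subject, Predicate, Object string
}

// objects returns every object o such that (subject, predicate, o) is in the
// graph, in the order the triples appear.
func objects(graph []Triple, subject, predicate string) []string {
	var out []string
	for _, t := range graph {
		if t.Subject == subject && t.Predicate == predicate {
			out = append(out, t.Object)
		}
	}
	return out
}

func main() {
	// A list of triples is already a graph.
	graph := []Triple{
		{"Alice", "is friends with", "Bob"},
		{"Bob", "is friends with", "Alice"},
		{"Bob", "is friends with", "Eve"},
	}
	fmt.Println(objects(graph, "Bob", "is friends with")) // [Alice Eve]
}
```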

Graph databases are really nice when your requirements change, because you don’t have to change a schema. In a SQL database, for example, changing a column to hold multiple values means adding JOINs, and it’s tricky to predict the overhead of these JOINs. By contrast, triples are simple and expressive. Graphs are effectively many-to-many made easy.

There are already a number of graph databases out there: Neo4j, Titan, AllegroGraph, and many, many more. Barak wrote Cayley. His initial goals were simplicity and agnosticism to query languages, backends, and data. There was also an intent to “open-source graphd” and emphasize reuse and speed. Go has speed, and a great stdlib. The decision was made to write Cayley in Go. Cayley is easy to get started with, uses JavaScript as a query language, and has multiple backend stores.

What cool things can I do with graphs?

“Big data” comes from multiple sources. Triples are a great interchange format: for example, you can merge two graphs with ‘cat’. The simplest unit of data is a binary relationship.
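The reason concatenation works as a merge is that each triple is self-contained. Here is a sketch of the same idea in Go; the Triple type, the merge function, and the sample data are made up for illustration, and dropping duplicates is an extra step that plain concatenation does not do.

```go
package main

import "fmt"

// Triple is one subject-predicate-object statement.
type Triple struct {
	Subject, Predicate, Object string
}

// merge concatenates graphs and drops exact duplicate triples, keeping the
// first-seen order.
func merge(graphs ...[]Triple) []Triple {
	seen := make(map[Triple]bool)
	var out []Triple
	for _, g := range graphs {
		for _, t := range g {
			if !seen[t] {
				seen[t] = true
				out = append(out, t)
			}
		}
	}
	return out
}

func main() {
	social := []Triple{{"Alice", "is friends with", "Bob"}}
	work := []Triple{
		{"Alice", "works at", "Acme"},
		{"Alice", "is friends with", "Bob"}, // duplicate of a social triple
	}
	for _, t := range merge(social, work) {
		fmt.Println(t.Subject, t.Predicate, t.Object)
	}
}
```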

Remember: the schema is data. The schema can be precise, and self-describing. Graphs have the benefit of containing their own schema. Schema.org uses microformats to share schemas, hopefully eventually leading to common bootstraps.

My rule of thumb: the most interesting part of the program is one connection farther. Graph databases are great for these queries.
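A “one connection farther” query is just following the same edge twice. The sketch below does that over an in-memory list of triples; the Triple type and both helper functions are hypothetical, and a real graph database would use indexes rather than scanning the whole list.

```go
package main

import "fmt"

// Triple is one subject-predicate-object statement.
type Triple struct {
	Subject, Predicate, Object string
}

// oneHop returns the nodes reachable from start by following predicate once.
func oneHop(graph []Triple, start, predicate string) []string {
	var out []string
	for _, t := range graph {
		if t.Subject == start && t.Predicate == predicate {
			out = append(out, t.Object)
		}
	}
	return out
}

// twoHops follows predicate twice from start. It skips the start node itself
// and duplicates, but it does not exclude nodes that are also one hop away.
func twoHops(graph []Triple, start, predicate string) []string {
	seen := map[string]bool{start: true}
	var out []string
	for _, mid := range oneHop(graph, start, predicate) {
		for _, end := range oneHop(graph, mid, predicate) {
			if !seen[end] {
				seen[end] = true
				out = append(out, end)
			}
		}
	}
	return out
}

func main() {
	graph := []Triple{
		{"Alice", "is friends with", "Bob"},
		{"Bob", "is friends with", "Alice"},
		{"Bob", "is friends with", "Eve"},
	}
	fmt.Println(twoHops(graph, "Alice", "is friends with")) // [Eve]
}
```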

Refactor your Go code with Confidence by David Calavera @calavera

David Calavera is head of engineering at Netlify, and a contributor to the Docker project. One of his spikes in contributions came when a huge number of commits of his were merged into Docker: splitting the CLI into two. Feedback was good, some people had doubts; it was merged on December 9, 2015.

Docker has a great test suite, including regression tests and testing the CLI and API. The code compiled, and the tests passed. Unfortunately, the TLS and CA had been taken out, so it was not really secure.

This is not the best way to do refactoring. The advantage of doing this in Docker is that it follows a 2-month release process, so if I break something I can do another merge right away. This is a luxury that some people working with production systems do not have.

You can simulate a crummy network connection using the comcast library; but if you can afford to put something in production, you’ll be able to do basically the same thing.

Here’s a small technique I use. Martin Fowler’s BranchByAbstraction is a technique for making large-scale changes to a production system gradually: you put an abstraction layer in front of the code you want to change, so the old and new implementations can both live behind it while the change is in progress.

Running Experiments using Scientist

First, let’s define an experiment with control code that we know works in production. We want to ensure that our users in production are always going to get the result of the control code.

Then, we define our experiments. If we want to try new code, we encapsulate it into a tryable. Users never see its result, even though the library runs it alongside the control code. At this point, you have two codepaths in production.
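The control/candidate pattern can be sketched in a few lines of Go. This is not the API of any Scientist port: the Experiment type, its fields, and the safely helper are hypothetical, and for simplicity the candidate runs after the control rather than at the same time.

```go
package main

import (
	"fmt"
	"reflect"
)

// Experiment pairs trusted control code with a candidate under trial.
// Hypothetical type for illustration only.
type Experiment struct {
	Name      string
	Control   func() (interface{}, error)
	Candidate func() (interface{}, error)
	// Mismatch is called when the candidate fails or disagrees with the control.
	Mismatch func(name string, control, candidate interface{})
}

// Run executes both code paths but always returns the control's result, so
// callers never observe the candidate's behavior.
func (e Experiment) Run() (interface{}, error) {
	want, err := e.Control()
	got, candErr := safely(e.Candidate)
	if e.Mismatch != nil && (candErr != nil || !reflect.DeepEqual(want, got)) {
		e.Mismatch(e.Name, want, got)
	}
	return want, err
}

// safely runs f and converts a panic into an error, so a broken candidate
// cannot take down the caller.
func safely(f func() (interface{}, error)) (v interface{}, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("candidate panicked: %v", r)
		}
	}()
	return f()
}

func main() {
	exp := Experiment{
		Name:      "sum",
		Control:   func() (interface{}, error) { return 1 + 2, nil },
		Candidate: func() (interface{}, error) { return 1 * 2, nil }, // deliberately wrong
		Mismatch: func(name string, control, candidate interface{}) {
			fmt.Printf("%s: control=%v candidate=%v\n", name, control, candidate)
		},
	}
	v, _ := exp.Run() // prints "sum: control=3 candidate=2"
	fmt.Println(v)    // 3
}
```

In production, the Mismatch hook is where you would log or emit a metric; once it stays quiet for long enough, the candidate can replace the control.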

Teach Alexa New Skills with Go by Mike Flynn @thatmikeflynn

Alexa’s skills are web services, but standing one up is a lot of work, and you don’t want to repeat it every time you build one of those services. Amazon knows this, and they push you to Lambda. It’s a cool service, and they’ve built in a bunch of the boilerplate. That’s cool, but I want to write it in Go, not JavaScript! And I don’t want to write it on Amazon’s service! Luckily, Mike wrote the boilerplate for you.

Mike’s Go-alexa library lets you build out your Alexa skill without having to rewrite a lot of the boilerplate.

A Gopher’s Guide to Data by Daniel Whitenack @dwhitena

Daniel spoke about data science without going too crazy with equations. Go’s unique language features can be applied to data science in interesting ways. Data scientists focus on ETL, data cleaning, and organization; parsing and extraction of patterns; and arithmetic. (This arithmetic could be a neural network, or it could be creating a bar chart.)

Scaling is one challenge; reproducibility, deployment, and maintenance are another. You don’t want your prediction model to be so complex you can’t change it or put it into production.

Data, Gopher Style

I want to inspire you to think that the way that we program in Go is very valuable in the data science context. Data-driven services can be predictably deployed. Coming from a Python data science background, I feel like when I’m deploying, I’m entering the war room and I’m putting all the pieces into place, managing dependencies, and at the end of the day I’m not totally confident that everything will work. When I came to Go, it was great to compile to a static binary and drop it in somewhere.