Howdy! I’m Paul, and I’m one of the newer faces here at iron.io. My plan is to blog, champion local meetups, and empower developers. That last one is a whiz-bang business term for “help folks learn how to do dead-simple job processing.”
Today, I’ll blog about an old love of mine, Common Table Expressions (CTEs). It’s a tool anyone who’s even sneezed near a SQL statement ought to know about. Just a select few SQL sorcerers seem to be aware of their existence. Let’s change that!
CTEs are an easy way to beautify and speed up SQL. They beautify by aliasing and separating small query chunks. They speed up the same way a temporary table would, but without the need for cleanup.
The net result of beauty and speed is maintainability. Which, is why I love CTEs.
Let’s take a look at a basic CTE (postgres syntax):
WITH table1 AS ( [SELECT statement here] ) , table2 AS ( [SELECT statement here] ) SELECT * FROM table1, table2 WHERE [...]
The basic components are:
- We tell the interpreter it’s a CTE by starting with
WITH. CTEs are sometimes called WITH queries for this reason.
- The names
table2are used as aliases. You can name just one table, or a bunch.
- The bottom-most
SELECTis the parent query. It joins everything together.
WITH first_acq_date AS ( SELECT company_permalink, min(acquired_quarter) AS min_acquired_quarter FROM tutorial.crunchbase_acquisitions GROUP BY company_permalink ) , raised_amount AS ( SELECT company_permalink, SUM(raised_amount_usd) FROM tutorial.crunchbase_investments WHERE raised_amount_usd is not null GROUP BY company_permalink ) , url AS ( SELECT permalink, homepage_url FROM tutorial.crunchbase_companies ) SELECT fad.*, ra.sum, url.homepage_url FROM first_acq_date fad, raised_amount ra, url WHERE fad.company_permalink = ra.company_permalink AND fad.company_permalink = url.permalink ORDER BY min_acquired_quarter, sum
The benefits of CTEs should be clear. If you name intelligently, aliasing each chunk of the query means CTEs are self-documenting. Martin Fowler is happy.
Follow the aliases in the above. We’re finding results for a company’s first acquisition date, the amount of funding raised, and a link to the company website. The parent query stitches the Frankenstein together, and voila! We’re done.
Thanks to Mode we get this snazzy time-series graph of capital raised, as measured in unicorns.
Adding market cap as a value would be interesting. Good news – since we’re using a WITH query that’s as simple as aliasing market cap, and updating the parent.
So, when should you use CTEs? They shine in queries with two or more joins or sub-selects. I also think they’re a fantastic way to write self-joins. This blurb from the postgres docs reveals an important use case:
A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work.
In other words, CTEs are a great candidate for speeding up expensive joins.
CTEs made my life better. Now that you know about them, they should do the same for you. Go forth! Write beautiful less ugly SQL. Here are docs in various SQL dialects: