Moving Past Averages in SQL (Postgres) – Percentiles

Often when you’re tracking a metric for the first time, you take a look at your average. For example, what is your ARPU (Average Revenue Per User)? In theory this tells you how much you’ll make off each new user you acquire. Or maybe you look at the average lifetime value of a customer. Yet for many who are more familiar with looking at data and extracting meaning from it, the median or a few different looks at percentiles can be much more meaningful.
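To make the contrast concrete, here is a minimal sketch in Python rather than SQL. The revenue figures and the `percentile` helper are made up for illustration, and the helper uses the nearest-rank definition of a percentile, which is only one of several common definitions.

```python
import math
import statistics

# Hypothetical revenue per user: most users pay little, one pays a lot.
revenue_per_user = [0, 0, 0, 0, 5, 5, 10, 10, 20, 950]


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


arpu = statistics.mean(revenue_per_user)
median = statistics.median(revenue_per_user)
p90 = percentile(revenue_per_user, 90)

print(f"average: {arpu}, median: {median}, 90th percentile: {p90}")
```

With this sample the average comes out at 100, while the median is 5 and the 90th percentile is 20. A single large customer pulls the average far away from what a typical user looks like.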

Read on

Upsert Lands in PostgreSQL 9.5 – a First Look

If you’ve followed anything I’ve written about Postgres, you know that I’m a fan. At the same time, you know that there’s been one feature that so many other databases have and Postgres lacks, and its absence causes a huge amount of angst: upsert. Well, the day has come. It’s finally committed and will be available in Postgres 9.5.

Sure, we’re still several months away from Postgres 9.5 being released, anywhere from 3-6 months as a best guess. That doesn’t mean we can’t take a first look at this feature. Before we get into it, though, a few special callouts of thanks: to Peter Geoghegan of the Heroku Postgres team for being the primary author on it, to Andres Freund, who recently joined Citus Data, for his heavy contributions, and to Heikki Linnakangas for his contributions as well.

Read on

A Product Management Blueprint

I find myself having more conversations with startups – both small and large – about product management. I’ve blogged about some of the tools in my chest here, but I haven’t talked much about my “blueprint” for product management, which I find myself laying out in many conversations over coffee. What follows is the process I’ve used a few times over with new teams to get product and engineering moving together, shipping in a predictable manner, and tackling bigger and more strategic projects.

Read on

A Simple Guide for DB Migrations

Most web applications will add/remove columns over time. This is extremely common early on and even mature applications will continue modifying their schemas with new columns. An all too common pitfall when adding new columns is setting a not null constraint in Postgres.

Read on

My Wishlist for Postgres 9.5

As I followed along with the 9.4 release of Postgres, I wrote a few posts about things I was excited about, some things that didn’t make it in, and a bit of a wrap-up. I thought this year (year in the sense of PG releases) I’d jump the gun and lay out areas I’d love to see addressed in PostgreSQL 9.5. And here it goes:

Read on

When to Ship It, When to Kill It

A few weeks ago at lunch I had the opportunity to catch up with a company in the current YC batch that is building something very similar to dataclips. We talked about a lot of things, from what we’ve learned from dataclips to marketing and other areas. One area we talked about was product, and when to ship vs. when to kill things. I realized I hadn’t talked publicly about my fairly simple but clear view on this, so here it is.

A large credit to Adam Wiggins for giving this model early on in Heroku and his approach to shipping product.

Read on

Scaling Organizations - Scribing

In the process of growing a company there are several hurdles based on the size of the company. What worked at 5 doesn’t work at 20, what worked at 20 doesn’t work at 50, and what worked at 50 doesn’t work at 150. There’s a lot of talk out there about two-pizza teams and scaling development teams. One thing I haven’t seen quite enough of is detail around scribing and documenting things.

Read on

Postgres and Connection Pooling

Connection pooling is quickly becoming one of the more frequent questions I hear. So here’s a primer on it. If there’s enough demand I’ll follow up a bit further with some detail on specific Postgres connection poolers and setting them up.

The basics

For those unfamiliar, a connection pool is a group of database connections sitting around waiting to be handed out and used. This means that when a request comes in, a connection is already there, whether in your framework or in some other pooling process, and is given to your application for that specific request or transaction. In contrast, without any connection pooling your application has to reach out to your database to establish a connection each time. While in the most basic sense you may think connecting to a database is quick, there’s often some overhead here. An example is SSL negotiation that may have to occur, which means you’re looking at not 1-2 ms but often closer to 30-50 ms.
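As a rough illustration of the idea, here is a minimal sketch of a pool in Python. `FakeConnection` and `ConnectionPool` are hypothetical names invented for this example, not part of any Postgres driver or pooler; a real pool would open actual database connections and handle errors, health checks, and timeouts.

```python
import queue


class FakeConnection:
    """Stand-in for a real database connection (hypothetical, for illustration)."""

    created = 0  # counts how many connections were ever opened

    def __init__(self):
        FakeConnection.created += 1
        self.id = FakeConnection.created


class ConnectionPool:
    """Opens a fixed number of connections up front and hands them out on request."""

    def __init__(self, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(FakeConnection())  # pay the setup cost once

    def acquire(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty if the timeout expires.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        # Return the connection so the next request can reuse it.
        self._idle.put(conn)


pool = ConnectionPool(size=2)

for _ in range(5):  # five "requests" share two connections
    conn = pool.acquire()
    # ... run queries for this request or transaction ...
    pool.release(conn)
```

Five requests are served here, but only two connections are ever opened, so the setup overhead is paid twice rather than five times.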

Read on

Personas, Data Science, K-means

If one of the industry lingo terms in the title didn’t make your skin crawl a little, then I need to try harder. At the same time, you’ve probably heard someone use one of them in a non-trolling way in the last month. All three of these can often mean the same or similar things; people just approach them differently depending on their perspective.

Personas don’t have to be marketing-only speak, and data science doesn’t have to be only for stats people. My goal here is to set the context for the rest of the post, which talks about how you can simply look at your data and let it surface things you may not have known.

Read on

Postgres Datatypes – the Ones You’re Not Using.

Postgres has a variety of datatypes, in fact quite a few more than most other databases. Most commonly, applications take advantage of the standard ones – integers, text, numeric, etc. Almost every application needs these basic types; the rarer ones are needed less frequently. And while they’re not needed in every application, when you do need them they can be extremely handy. So without further ado, let’s look at some of these rarer but awesome types.


Yes, I’ve talked about this one before, yet still not enough people are using it. Of this list of datatypes, this is one that could benefit most if not all applications.

Read on