An interview on what makes Postgres unique (extensions)

I’ve been at dinners before with developers that admitted developers, themselves included, can be a bit opinionated. In one case one said for example, “I love Postgres, but I have no idea why.” They were sitting at the wrong table to use Postgres as an example… But it is quite often that I am asked Why Postgres.

In fact a little over a year ago good friend Dimitri Fontaine asked if he could interview me for a book he’s working on for Postgres. I’ve long said their is a shortage of good books about Postgres and he’s done a great job with his in providing a guide targetted at developers, not just DBAs, that want to become better with their database. What follows is the excerpt of the interview from the book. And if you’re interested in picking up a copy he was friendly enough to share a discount code you can find below.

Intro

Craig heads up the Cloud team @citusdata now running product for Azure Postgres since being acquired by Microsoft. Citus extends Postgres to be a horizontally scalable distributed database. If you have a database, especially Postgres, that needs to scale beyond a single node (typically at 100GB and up) Craig is always happy to chat and see if Citus can help.

Previously Craig has spent a number of years @heroku, a platform-as-a-service, which takes much of the overhead out of IT and lets developers focus on building features and adding value. The bulk of Craig’s time at Heroku was spent running product and marketing for Heroku Data.

In your opinion, how important are extensions for the PostgreSQL open source project and ecosystem?

To me the extension APIs and growing ecosystem of extensions are the biggest advancement to Postgres in probably the last 10 years. Extensions have allowed Postgres to extend beyond a traditional relational database to much more of a data platform. Whether it’s the initial NoSQL datatypes (if we exclude XML that is) in hstore, to the rich feature set in geospatial with GIS, or approximation algorithms such as HyperLogLog or TopN you have extensions that now by themselves take Postgres into a new frontier.

Extensions allow the core to move at a slower pace, which makes sense. Each new feature in core means it has to be thoroughly tested and safe. That’s not to say that extensions don’t, but extensions that can exist outside core, then become part of the contrib provide a great on ramp for things to move much faster.

What are your favorite PostgreSQL extensions, and why?

My favorite three extensions are:

pg_stat_statements
Citus
HyperLogLog

pg_stat_statements is easily the most powerful extension for an application developer without having to understand deep database internals to get insights to optimize their database. For many application developers the database is a black box, but pg_stat_statements is a great foundation for AI for your database that I only expect to be improved upon in time.

Citus: I’m of course biased because I work there, but I followed Citus and pg_shard for 3 years prior to joining. Citus turns Postgres into a horizontally scalable database. Under the covers it’s sharded, but application developers don’t have to think or know about that complexity. With Citus Postgres is equipped to tackle larger workloads than ever before as previously Postgres was constrained to a single box or overly complicated architectures.

HyperLogLog: I have a confession to make. In part I just love saying it, but it also makes you seem uber-intelligent when you read about the algorithm itself. “K minimum value, bit observable patterns, stochastic averaging, harmonic averaging.” I mean who doesn’t want to use something with all those things in it? In simpler terms, it’s close enough approximate uniques that are compose-able with a really small footprint on storage. If you’re building something like a web analytics tool HyperLogLog is an obvious go to.

How do you typically find any extension you might need? Well, how do you know you might need a PostgreSQL extension in the first place?

pgxn.org and github are my two go-tos. Though Google also tends to work pretty well. And of course I stay up to date on new ones via PostgresWeekly.com.
Though in reality I often don’t always realize I need one. I search for the problem I’m trying to solve and discover it. I would likely never search for HyperLogLog, but a search for Postgres approximate count or approximate distincts would yield it pretty quickly.

Is there any downside you could think of when your application code base now relies on some PostgreSQL extension to run? I could think of extension’s availability in cloud and SaaS offerings, for instance.

It really depends. There are extensions that are much more bleeding edge, and ones that are more mature. Many of the major cloud providers support a range of extensions, but they won’t support any extension. If they do support it there isn’t a big downside to leveraging it. If they don’t you need to weigh the cost of running and managing Postgres yourself vs. how much value that particular extension would provide. As with all things managed vs. not, there is a trade-off there and you need to decide which one is right for you.

Though if something is supported and easy to leverage wherever you run, by all means, go for it.

If you’re looking for a deeper resource on Postgres I recommend the book The Art of PostgreSQL. It is by a personal friend that has aimed to create the definitive guide to Postgres, from a developer perspective. If you use code CRAIG15 you’ll receive 15% off as well.

2019-11-13