Dear Postgres

Dear Postgres,

I’ve always felt an affinity for you in my 9 years of working with you. I know others have known you longer, but that doesn’t mean they love you more. Years ago when others complained about your rigidness or that you weren’t as accommodating as others I found solace in your steadfast values:

  1. Don’t lose data
  2. Adhere to standards
  3. Move forward with a balancing act between new fads of the day while still continuously improving

You’ve been there and seen it all. Years ago you were being disrupted by XML databases. As companies made heavy investment into what such a document database would do for their organization you proceeded to “simply” add a datatype that accomplished the same and brought your years of progress along with it.

In the early years you had the standard format of index b-tree that most database engines leveraged. Then quietly but confidently you started adding more. Then came K-nearest neighbor, generalized inverted indexes (GIN), and generalized search-tree (GiST), only to be followed by space partitioned GiST and block range indexes (BRIN). Now the only question is which do I use?

All the while there was this other camp using for something that felt cool but outside my world: GIS. GIS, geographical information systems, I thought was something only civil engineers used. Then GPS came along, then the iPhone and location based devices came along and suddenly I wanted to find out the nearest path to my Peets, or manage geographical region for my grocery delivery service. PostGIS had been there all along building up this powerful feature set, sadly to this day I still mostly marvel from the sideline at this whole other feature set I long to take advantage of… one day… one day.

A little over 5 years ago I fell in love with your fastly improving analytical capabilities. No you weren’t an MPP system yet, but here came window functions and CTEs, then I almost understood recursive CTEs (still working on that one). I can iterate over data in a recursive fashion without PL/PgSQL? Yes please! I only want to use it more.

And then five years ago, document stores start taking over the world. I feel like I’ve seen this story before, wasn’t XML going to change the internet? Enter JSON, the JSON datatype, and JSONB. Wow, this is really nice to mix relational, document storage, join against things. I suddenly don’t get why more don’t take this flexible approach to building on a good foundation and layering on the refinements.

Extensions! Where have you been all my life? There’s Citus, and HyperLogLog, and ZomboDB, with each I can add functionality to Postgres without it being limited to the standard release, they can be in C or not. Wait, all along so much has been built on this foundation? PostGIS, full-text search, hstore? I like all those things, why didn’t you tell me all along about this foundation? Postgres, I like what I’m seeing how you’re allowing others to do more without having it be in the core of Postgres. This extension stuff is really kinda cool that it’s Postgres and then some, kinda like C and then ++, wait nevermind scratch that analogy.

Sorry, I’ve rambled a bit. You’re a little over twenty years old now. I’ve known you for nearly ten of those years so I know there’s so much about your background I don’t know, I hope we get to spend the time together to share it all. This 10 release is really an exciting one to me. We’ve spent all this time together and I feel like each passing year the bond grows fonder.

Now you’ve brought me better parallelism so I can further utilize my system resources. I now have partitioning. Thank you! I don’t have to roll my own hacks to help age out old data for my time series database. Logical replication will make so many other things possible, such as more online upgrades and integration with other systems.

Postgres, I just want to say thank you for the past ten years together. Thank you for all you’ve done and for all you’ll continue to do in the future.

Tracking and Managing Your Postgres Connections

Managing connections in Postgres is a topic that seems to come up several times a week in conversations. I’ve written some about scaling your connections and the right approach when you truly need a high level of connections, which is to use a connection pooler like pgBouncer. But what do you do before that point and how can you better track what is going on with your connections in Postgres? Read on

Better Database Migrations in Postgres

As your database grows and scales there are some operations that you need to take more care of than you did when you were just starting. When working with your application in your dev environment you may not be fully aware of the cost of some operations until you run them against production. And at some point most of us have been guilty of it, running some migration that starts at 5 minutes, then 15 minutes in it’s still running, and suddenly production traffic is impacted.

There are two operations that tend to happen quite frequently, each with some straightforward approaches to mitigate having any noticable amount of downtime. Let’s look at each of the operations, how they work and then how you can approach them in a safer way. Read on

Postgres Backups: Logical vs. Physical an Overview

It’s not a very disputed topic that you should backup your database, and further test your backups. What is a little less discussed, at least for Postgres, is the types of backups that exist. Within Postgres there are two forms of backups and understanding them is a useful foundation for anyone working with Postgres. The two backup types are

  1. Physical: which consist of the actual bytes on disk,
  2. Logical: which is a more portable format.

Let’s dig into each a bit more so you can better assess which makes sense for you. Read on

Postgres Open Silicon Valley Line-up: First Take

This year Postgres open and PGConf SV have combined to great a bigger and better conference right in downtown San Francisco. I’m obviously biased as I’m one of the co-chairs, and I know every conference organizer says picking the talks was hard, but I’m especially excited for the line-up this year. The hard part for me is going to be which talks do I miss out on because I’m sitting in the other session that’s ongoing. You can see the full list of talk and tutorial sessions, but I thought it’d be fun to do a rundown of some of my favorites. Read on

Working With Time in Postgres

A massive amount of reporting queries, whether really intensive data analysis, or just basic insights into your business involving looking at data over a certain time period. Postgres has really rich support for dealing with time out of the box, something that’s often very underweighted when dealing with a database. Sure, if you have a time-series database it’s implied, but even then how flexible and friendly is it from a query perspective? With Postgres there’s a lot of key items available to you, let’s dig in at the things that make your life easier when querying. Read on

Why Use Postgres (Updated for Last 5 Years)

Five years ago I wrote a post that got some good attention on why you should use Postgres. Almost a year later I added a bunch of things I missed. Many of those items bear repeating, and I’ll recap a few of those in the latter half of this post. But in the last 4-5 years there’s been a lot of improvements and more reasons added to the list of why you should use Postgres. Here’s the rundown of the things that make Postgres a great database you should consider using. Read on

Getting Started With JSONB in Postgres

JSONB is an awesome datatype in Postgres. I find myself using it on a weekly basis these days. Often in using some API (such as clearbit) I’ll get a JSON response back, instead of parsing that out into a table structure it’s really easy to throw it into a JSONB then query for various parts of it.

If you’re not familiar with JSONB, it’s a binary representation of JSON in your database. You can read a bit more about it vs. JSON here.

In working with JSONB here’s a few quick tips to get up and running with it even faster: Read on

Simple but Handy Postgres Features

It seems each week when I’m reviewing data with someone a feature comes up that they had no idea existed within Postgres. In an effort to continue documenting many of the features and functionality that are useful, here’s a list of just a few that you may find handy the next time your working with your data.

Psql, and \e

This one I’ve covered before, but it’s worth restating. Psql is a great editor that already comes with Postgres. If you’re comfortable on the CLI you should consider giving it a try. You can even setup you’re own .psqlrc for it so that it’s well customized to your liking. In particular turning \timing on is especially useful. But even with all sorts of customization if you’re not aware that you can use your preferred editor by using \e then you’re missing out. This will allow you to open up the last run query, edit it, save–and then it’ll run for you. Vim, Emacs, even Sublime text works just take your pick by setting your $EDITOR variable.

Read on

Syncing From Postgres to Salesforce - Data Mappings

For the second time now I’ve had to implement a system that syncs from my system of record into, the first at Heroku and now at Citus Data. The case here is pretty simple, I have a software-as-a-service, B2B product. It’s a homegrown application in these cases in Ruby, but could be Python, .Net, any language of your choosing. The problem is I don’t want to have to be rebuilding my own CRM, reporting, etc. on top of all of my internal database. And as soon as you’re at some reasonable size (sales guy of 1 or more) you need to be able to provide insights on what’s in that system of record database to others.

While my tooling isn’t a full fledged product by any means, here’s a bit of how I’ve developed this process a few times over and some of the annoying bits of code to help get you started. In this post I’ll walk through some of the basic datatypes, then we’ll follow-up with the overall architecture and tweaks you need to make to Salesforce, and finally we’ll provide some example code to help you create this setup yourself. Read on