Craig Kerstiens

Category: Postgres

Simple but handy Postgres features

It seems that each week, when I’m reviewing data with someone, a feature comes up that they had no idea existed within Postgres. In an effort to continue documenting the many useful features and functionality, here’s a list of just a few that you may find handy the next time you’re working with your data.

Psql, and \e

This one I’ve covered before, but it’s worth restating. Psql is a great editor that already comes with Postgres. If you’re comfortable on the CLI you should consider giving it a try. You can even set up your own .psqlrc for it so that it’s customized to your liking. In particular, turning \timing on is especially useful. But even with all sorts of customization, if you’re not aware that you can use your preferred editor with \e then you’re missing out. This lets you open the last run query, edit it, and save it, and then it’ll run for you. Vim, Emacs, even Sublime Text all work; just take your pick by setting your $EDITOR variable.

A tour of Postgres' Foreign Data Wrappers

SQL can be a powerful language for reporting. Whether you’re just exploring some data or generating reports that show month over month revenue growth, it’s the lingua franca for data analysis. But your data isn’t always in a SQL database. Even then, if you’re using Postgres you can likely still use SQL to analyze, query, and even join with that data. Foreign data wrappers have been around for years in Postgres, but they continue to mature and are a great option for joining disparate systems.

Overview of foreign data wrappers

If you’re unfamiliar, foreign data wrappers, or FDWs, allow you to connect from within Postgres to a remote system. You can then query that system directly from within Postgres. While there is an official Postgres FDW that ships with Postgres itself and lets you connect from one Postgres DB to another, there’s also a broad community of others.

At its core, Postgres provides certain APIs under the covers which each FDW extension can implement. This can include the ability to map SQL to whatever makes sense for a given system and to push down various operators like where clauses, and as of Postgres 9.3 an FDW can even write data.
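To make the overview above concrete, here is a minimal sketch using postgres_fdw, the FDW that ships with Postgres. The server name, host, database, credentials, and the remote users table are all hypothetical placeholders, so adjust them to your own setup.

```sql
-- Load the Postgres-to-Postgres foreign data wrapper.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Describe the remote system (hypothetical host and database name).
CREATE SERVER remote_pg
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'remote-db.example.com', port '5432', dbname 'reporting');

-- Map the current local user to a remote login (hypothetical credentials).
CREATE USER MAPPING FOR CURRENT_USER
  SERVER remote_pg
  OPTIONS (user 'app_reader', password 'secret');

-- Declare a local foreign table that points at the remote users table.
CREATE FOREIGN TABLE remote_users (
  id         integer,
  email      text,
  created_at timestamptz
)
  SERVER remote_pg
  OPTIONS (schema_name 'public', table_name 'users');

-- Query it like any other table; the filter is a candidate for push down.
SELECT count(*)
FROM remote_users
WHERE created_at >= '2016-01-01';
```

Once the foreign table exists it can be joined against local tables in the same query, which is where FDWs tend to pay off for reporting.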

Five mistakes beginners make when working with databases

When you start out as a developer there’s an overwhelming number of things to grasp. First there’s the language itself, then all the quirks of the specific framework you’re using, and after that (or maybe before) we’ll throw front-end development into the mix, and somewhere along the line you have to decide to store your data somewhere.

Early on, with so many things to quickly master, the database tends to be an afterthought in application design (perhaps because it doesn’t make a visible impact on the end user experience). As a result, a number of bad practices tend to get picked up when working with databases. Here’s a rundown of just a few.

Notice: Much of this post still applies, but now applies more directly to Citus. Since this post was originally published, pg_shard has been deprecated. You can find further guidance on sharding on the Citus blog and docs.

Back in 2012 I wrote an overview of database sharding. Since then I’ve had a few questions about it, which have really increased in frequency over the last two months. As a result I thought I’d do a deeper dive with some actual hands-on sharding. Though for this hands-on, because I do value my time, I’m going to take advantage of pg_shard rather than creating mechanisms from scratch.

For those unfamiliar, pg_shard is an open source extension from Citus Data, which has a commercial product that you can think of as pg_shard++ (and probably much more). Pg_shard adds a little extra to let data automatically distribute to other Postgres tables (logical shards) and Postgres databases/instances (physical shards), thus letting you outgrow a single Postgres node pretty simply.

Alright, enough talk about it, let’s get things up and running.

Writing more legible SQL

A number of times in a crowd I’ve asked how many people enjoy writing SQL, and often there’s a person or two. The follow-up is how many people enjoy reading other people’s SQL, and that’s unanimously 0. The reason for this is that so many people write bad SQL. It’s not that it doesn’t do the job, it’s just that people don’t tend to treat SQL the same as other languages and don’t follow strong code formatting guidelines. So, of course, here are some of my own recommendations on how to make SQL more readable.

My top 10 Postgres features and tips for 2016

I find during the holiday season many pick up new books, learn a new language, or brush up on some other skill in general. Here’s my contribution to hopefully giving you a few new things to learn about Postgres and ideally utilize in the new year. It’s not a top 10 list so much as 10 tips and tricks you should be aware of, because when you need them they become incredibly handy. But first, a shameless plug: if you find any of the following helpful, consider subscribing to Postgres Weekly, a weekly newsletter with interesting Postgres content.

Postgres 9.5 - The feature rundown

The headline of Postgres 9.5 is undoubtedly Insert… on conflict do nothing/update, more commonly known as Upsert or Merge. This closes one of the last remaining feature gaps that other databases had over Postgres. Sure, we’ll take a look at it, but first let’s browse through some of the other features you can look forward to when Postgres 9.5 lands:

Postgres and Node - Hands on using Postgres as a Document Store with MassiveJS

JSONB in Postgres is absolutely awesome, but it’s taken a little while for libraries to come around and make it as useful as would be ideal. For those not following along with Postgres lately, here’s the quick catch-up on it as a NoSQL database.

  • In Postgres 8.3, over 5 years ago, Postgres received hstore, a key/value store directly in Postgres. Its big limitation was that it was only for text.
  • In the years after, it got GIN and GiST indexes, which index the entire collection and make queries over hstore extremely fast.
  • In Postgres 9.2 we got JSON… sort of. Really this was only text validation, but it allowed us to create some functional indexes, which were still nice.
  • In Postgres 9.4 we got JSONB - the B stands for Better according to @leinweber. Essentially this is a full binary JSON on disk, which can perform as fast as other NoSQL databases using JSON.
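As a rough illustration of where that history leaves us, here is a small sketch of JSONB used as a document store on Postgres 9.4 or later. The events table and its keys are hypothetical and exist only for this example.

```sql
-- A hypothetical table holding one JSON document per row.
CREATE TABLE events (
  id   serial PRIMARY KEY,
  data jsonb NOT NULL
);

-- A GIN index covers the whole document, so containment queries can use it.
CREATE INDEX events_data_idx ON events USING gin (data);

INSERT INTO events (data) VALUES
  ('{"type": "signup",  "plan": "free"}'),
  ('{"type": "upgrade", "plan": "pro"}');

-- @> tests containment; ->> extracts a key as text.
SELECT id, data->>'plan' AS plan
FROM events
WHERE data @> '{"type": "signup"}';
```

The containment operator is the part that benefits from the GIN index; extracting a single key with ->> in a where clause would instead need a functional index on that expression.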

Node, Postgres, MassiveJS - A better database experience

First some background: I’ve always had a bit of a love-hate relationship with ORMs. ORMs are great for basic CRUD applications, which inevitably happen at some point for an app. The two main problems I have with ORMs are:

  1. They treat all databases as equal (yes, this is a little overgeneralized but typically true). They claim to do this for database portability, but in reality an app still can’t just up and move from one to another.
  2. They don’t handle complex queries well at all. As someone that sees SQL as a very powerful language, taking away all the power just leaves me with pain.

Of course these aren’t the only issues with them, just the two I personally run into over and over.

While playing with Node I was eager to explore Massive.JS, as it seems to buck the trend of just imitating all other ORMs. My initial result: it makes me want to do more with Node just for this library. After all, the power of a language is the ecosystem of libraries around it, not just the core language. So let’s take a quick tour through it, with a few highlights of what makes it really great.

Moving past averages in SQL (Postgres) – Percentiles

Often when you’re tracking a metric for the first time you take a look at your average. For example, what is your ARPU, or Average Revenue Per User? In theory this tells you how much you’ll make off a new user if you can acquire one. Or maybe you want your average lifetime value of a customer. Yet for many who are more familiar with looking at data and extracting meaning from it, the median or a few different looks at percentiles can be much more meaningful.
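As a quick sketch of the difference, the query below computes the average next to the median and the 95th percentile using the ordered-set aggregate percentile_cont, available in Postgres 9.4 and later. The purchases table and its amount column are hypothetical stand-ins for whatever metric you are tracking.

```sql
-- Hypothetical table: one row per purchase.
CREATE TABLE purchases (
  user_id integer NOT NULL,
  amount  numeric NOT NULL
);

-- Compare the mean against the median and the 95th percentile.
SELECT
  avg(amount)                                         AS mean_amount,
  percentile_cont(0.5)  WITHIN GROUP (ORDER BY amount) AS median_amount,
  percentile_cont(0.95) WITHIN GROUP (ORDER BY amount) AS p95_amount
FROM purchases;
```

When a handful of very large purchases skew the data, the mean drifts upward while the median stays close to what a typical user spends, which is why looking at the two side by side is useful.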