Skip to main content

Craig Kerstiens

Category: Postgres

Explaining your PostgreSQL data

I’ve written a bit before about understanding the output from EXPLAIN and EXPLAIN ANALYZE in PostgreSQL. Though understandably getting a grasp on execution plans could probably use some more guidance. Yet, this time around I’m taking a bit of a cop out and highlighting a few tools instead of documenting myself, which I’ve done in a talk I’ve frequently given Postgres Demystified.

Explain explained

Postgres Indexing - A collection of indexing tips

Even from intial reviews of my previous post on expression based indexes I received a lot of questions and feedback around many different parts of indexing in Postgres. Here’s a mixed collection of valuable tips and guides around much of that. Unused Indexes In an earlier tweet I joked about some SQL that would generate the SQL to add an index to every column: # SELECT 'CREATE INDEX idx_' || table_name || '_' || column_name || ' ON ' || table_name || ' ("' || column_name || '");' FROM information_schema.

Postgres Indexes – Expression/Functional Indexing

Postgres is rich with options for indexing. First you’ve got a variety of types, and beyond that you can do a variety of things with each of these such as create unique indexes, use conditions to index only a portion of your data, or create indexes based on complex expressions or functions. In cases where you commonly use various PostgreSQL functions in your application or reporting you can get some great gains from this.

My SQL Bad Habits

I’m reasonably proficient at SQL – a coworker when pseudocoding some logic for him pointed out that my pseudocode is what he thought was executable SQL. I’m fully capable of writing clear and readable SQL – which most SQL is not. Despite that I still have several bad habits when it comes to SQL. Without further adieu heres some of my dirty laundry so hopefully others can not make the same mistakes.

Using array_agg in Postgres – powerful and flexible

In almost any application it’s common to want to aggregate some set of values together, commonly in a comma separated form. Most developers do this by running a query to get much of the raw data, looping over the data and pushing it into a set, appending each new value to the appropriate key. Hopefully, it’s not a surprise that there’s a much better way to do this with PostgreSQL.

Postgres has a flexible and robust array datatype that comes with a variety of functions. Even without taking advantage of the array datatype in your application, you can still take advantage of some of the functions to get the functionality you need. Lets take a look at an example schema and use case.

Getting more out of psql (The PostgreSQL CLI)

After my last post I had a variety of readers reach out about many different tweaks they’d made to their workflows using with psql. One people Grégoire Hubert had a wondeful extensive list of items. Grégoire has been a freelance in web development and he has worked with Postgresql for some time now in addition to being the author of Pomm. Without further ado heres what he has to say on how he uses psql:

Get the most of psql

Psql, the CLI postgreSQL client, is a powerful tool. Sadly, lot of developers are not aware of the features and instead look for a GUI to provide what they need. Let’s fly over what can psql do for you.

How I work with Postgres – psql, My PostgreSQL Admin

On at least a weekly basis and not uncommonly multiple times in a single week I get this question:

@neilmiddleton I’ve been hunting for a nice PG interface that works within other things. PGAdmin kinda works, except the SQL editor is a piece of shit

Sometimes it leans more to, what is the Sequel Pro equivilant for Postgres. My default answer is I just use psql, though I do have to then go on to explain how I use it. For those just interested you can read more below or just get the highlights here:

  • Set your default EDITOR then use \e
  • On postgres 9.2 and up \x auto is your friend
  • Set history to unlimited
  • \d all the things

Before going into detail on why psql works perfectly fine as an interface I want to rant for a minute about what the problems with current editors are and where I expect them to go in the future. First this is not a knock on the work thats been done on previous ones, for their time PgAdmin, phpPgAdmin, and others were valuable tools, but we’re coming to a point where theres a broader set of users of databases than ever before and empowering them is becoming ever more important.

Empowering developers, DBA’s, product people, marketers and others to be comfortable with their database will lead to more people taking advantage of whats in their data. pg_stat_statements was a great start to this laying a great foundation for valuable information being captured. Even with all of the powerful stats being captured in the statistics of PostgreSQL so many are still terrified when they see something like:

                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=4.25..8.62 rows=100 width=107) (actual time=0.126..0.230 rows=100 loops=1)
   Hash Cond: (purchases.user_id = users.id)
   ->  Seq Scan on purchases  (cost=0.00..3.00 rows=100 width=84) (actual time=0.012..0.035 rows=100 loops=1)
   ->  Hash  (cost=3.00..3.00 rows=100 width=27) (actual time=0.097..0.097 rows=100 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 6kB
         ->  Seq Scan on users  (cost=0.00..3.00 rows=100 width=27) (actual time=0.007..0.042 rows=100 loops=1)
 Total runtime: 0.799 ms
(7 rows)

Empowering more developers by surfacing this information in a digestable form, such as building on top of pg_stat_statements tools such as datascope by @leinweber and getting this to be part of the default admin we will truly begin empowering a new set of user.

But enough of a detour, those tools aren’t available today. If you’re interested in helping build those to make the community better please reach out. For now I live in a work where I’m quite content with simple ole psql here’s how:

More on Postgres Performance

If you missed my previous post on Understanding Postgres Performance its a great starting point. On this particular post I’m going to dig in to some real life examples of optimizing queries and indexes. It all starts with stats I wrote about some of the great new features in Postgres 9.2 in the recent announcement on support of Postgres 9.2 on Heroku. One of those awesome features, is pg_stat_statements. Its not commonly known how much information Postgres keeps about your database (beyond the data of course), but in reality it keeps a great deal.

Using Postgres Arrays in Django

A few weeks back I did a brief feature highlight on Postgres arrays. Since that time I’ve found myself using them with increasing regularity on small side projects. Much of this time I’m using Django and of course not opting to write raw SQL to be able to use arrays. Django actually makes it quite simple to work with Arrays in Postgres with a package by Andrey Antukh. Lets get started by installing two libraries:

Redis in my Postgres

Yesterday there was a post which hit Hacker News that talked about using SQL to access Mongo. While this is powerful I think much of the true value was entirely missed within the post.

SQL is an expressive language, though people are often okay with accessing Mongo data through its own ORM. The real value is that you could actually query the data from within Postgres then join across your data stores, without having to do some ETL process to move data around. Think… joining sales data from Postgres with user reviews stored in Mongo or searching for visits to a website (retained in redis) against purchases by user in Postgres.

The mechanism pointed out was a MongoDB Foreign Data Wrapper. A Foreign Data Wrapper or FDW essentially lets you connect to an external datastore from within a Postgres database. In addition to the Mongo FDW released the other day there’s many others. For example Postgres 9.0 and up ships with one called db_link, which lets you query and join across two different Postgres databases. Beyond that there’s support for a variety of other data stores including some you may have never expected:

Lets look at actually getting the Redis one running then see what some of the power of it really looks like. First we have to get the code then build it: