
Craig Kerstiens

Exploring a new Postgres database

At past jobs I’d estimate we had 100 different production apps that in some way were powering key production systems. Sure, some were backend/internal apps, while others were key production apps such as the dashboard itself. At other companies we had a smaller handful of Heroku apps that powered our cloud service, about 5-10 in total. Even just working with those internal apps there is a lot to keep context on. But when it comes to interacting with something you don’t know, getting a lay of the land quickly is key. In helping a customer optimize and tune, or even just understand what is going on in their app, an understanding of the data model is essential.

Having just started at Crunchy Data a few months back, I found myself digging into a lot of new systems, quickly trying to ramp up and get a feel for them.

Over the past 10 years I’ve pretty well codified my steps for getting a feel for a new database. While I’m not a DBA and only a small portion of my job is spent inside a database, being able to quickly navigate one saves me hours each month and days out of the year. I’m sure my process isn’t perfect, but hopefully it helps others when navigating a new Postgres database for the first time.

First, the tooling

For any new database my go-to tool is psql. The built-in Postgres CLI is going to be the quickest thing for me to navigate around in. If you use a CLI for anything else, then it should be your preference here as well. I’m also going to have a psqlrc file set up with some good defaults. My go-to defaults in my psqlrc are:

-- Automatically format output based on result length and screen
\x auto 


-- Prettier nulls
\pset null '#'

-- Save history based on database name
\set HISTFILE ~/.psql_history- :DBNAME

-- Turn on automatic query timing
\timing
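
With those defaults in place, I point psql at whatever database I’m exploring. As a quick sketch (the host, user, and database name here are just placeholders, not from a real system):

psql "postgres://someuser@somehost:5432/somedb"

or the equivalent with individual flags:

psql -h somehost -U someuser -d somedb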

Getting a feel for the tables

The first thing I’m going to do is just take a look at which objects exist within the database with \d. This will spit out a mix of the tables, views, and sequences within your database. A cleaner version of this is \dt, which lists only tables:


\d 

                         List of relations
 Schema |             Name              |   Type   |     Owner
--------+-------------------------------+----------+----------------
 public | pg_stat_statements            | view     | postgres
 public | schema_migrations             | table    | postgres
 public | sessions                      | table    | postgres
 public | sessions_id_seq               | sequence | postgres
 public | tasks                         | table    | postgres
 public | teams                         | table    | postgres
 public | users                         | table    | postgres
(7 rows)

\dt

                         List of relations
 Schema |             Name              |   Type   |     Owner
--------+-------------------------------+----------+----------------
 public | schema_migrations             | table    | postgres
 public | sessions                      | table    | postgres
 public | tasks                         | table    | postgres
 public | teams                         | table    | postgres
 public | users                         | table    | postgres
(5 rows)

We can also use the describe operation (\d) on specific relations, such as tables, to get a feel for how they look:


\d users

                                       Table "public.users"
          Column           |           Type           | Collation | Nullable |      Default
---------------------------+--------------------------+-----------+----------+--------------------
 id                        | uuid                     |           | not null | uuid_generate_v4()
 email                     | text                     |           | not null | ''::text
 encrypted_password        | text                     |           | not null | ''::text
 reset_password_token      | text                     |           |          |
 reset_password_sent_at    | timestamp with time zone |           |          |
 remember_created_at       | timestamp with time zone |           |          |
 last_sign_in_at           | timestamp with time zone |           |          |
 created_at                | timestamp with time zone |           | not null |
 updated_at                | timestamp with time zone |           | not null |
 name                      | text                     |           | not null |
 team_id                   | uuid                     |           | not null |
 deleted_at                | timestamp with time zone |           |          |
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
    "index_users_on_email" UNIQUE, btree (email)
    "index_users_on_reset_password_token" UNIQUE, btree (reset_password_token)
Foreign-key constraints:
    "users_team_id_fkey" FOREIGN KEY (team_id) REFERENCES teams(id)

Human readable output

Of course you may want to go one step further and actually get a sense for the data. Here a basic SELECT with a LIMIT 1 tends to work. Since you don’t quite know the shape of the data, this is where having \x auto set up within your .psqlrc file is helpful to auto-format the output to your screen. You can also just manually run \x auto in your SQL session to get cleaner output.


SELECT *
FROM users
LIMIT 1;

-[ RECORD 1 ]-------------+--------------------------------------------------------------
id                        | 0a7a3cde-3613-4073-86a7-6a19b4e62bbe
email                     | craig.kerstiens@gmail.com
encrypted_password        | $#IJ937Gmsdf00297sEmdfu12234
reset_password_token      | #
reset_password_sent_at    | #
remember_created_at       | 2016-07-14 14:31:01.414795+00
last_sign_in_at           | 2020-02-12 21:32:53.629246+00
created_at                | 2016-02-18 03:03:26.403108+00
updated_at                | 2020-02-14 23:16:16.080729+00
name                      | Craig
team_id                   | d46e864-1886-45e6-b538-8991562d2e99
deleted_at                | #

Time: 91.592 ms

Most databases I work with now leverage JSONB. It’s a great tool for mixing semi-structured data with more structured data.

If you’re using JSON or JSONB, then there is also a handy utility function to clean up that output: jsonb_pretty(yourcolumnhere). This will take care of making that huge JSON blob nice and readable.
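
As a small sketch of what that looks like (assuming a tasks table with a JSONB column named metadata, an illustrative name rather than part of the schema above):

SELECT id,
       jsonb_pretty(metadata) AS metadata
FROM tasks
LIMIT 1;

Combined with \x auto, this turns a one-line JSON blob into an indented, readable block right in your terminal.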

Feel more at home when you encounter a new database

It doesn’t take hours of reading an ERD or schema files. Within about 5-10 minutes of connecting to a new database I’m able to get a sense of how things are structured and can actually start digging in. Don’t get me wrong, I’m by no means an expert in that time, but knowing some of these basic commands will really help the next time you encounter a database and are asked to help out or glean insights from it.