Postgres backups: Logical vs. Physical an overview
It’s not a very disputed topic that you should backup your database, and further test your backups. What is a little less discussed, at least for Postgres, is the types of backups that exist. Within Postgres there are two forms of backups and understanding them is a useful foundation for anyone working with Postgres. The two backup types are
- Physical: which consist of the actual bytes on disk,
- Logical: which is a more portable format.
Let’s dig into each a bit more so you can better assess which makes sense for you.
Logical backups are the most well known type within Postgres. This is what you get when you run
pg_dump against a database. There are a number of different formats you can get from logical backups and Postgres does a good job of making it easy to compress and configure this backup how you see fit.
When a logical backup is run against a database it is not throttled, this introduces a noticable load on your database.
As it’s reading the data from disk and generating (in layman terms) a bunch of SQL
INSERT statements, it has to actually see the data. It’s of note that older Postgres databases (read: prior to 9.3) there were no checksums against your database. Checksums are just one tool for you to help check against data corruption. Because a logical dump has to actually read and generate the data to insert it will discover any corruption that exists for you.
This portable format is also very useful to pull down copies from production to different environments. I.e. if you need a copy of production data down on your local laptop
pg_dump is the way to do it. Logical backups are also database specific, but then allow you to dump only certain tables.
All in all logical backups bring some good features, but come at two cost:
- Load on your system
- The backup contains data as of the time when it ran
Physical backups are another option when it comes to backing up your database. As we mentioned earlier it is the physical bytes on disk. To understand physical backups we need to know a bit more under the covers about how Postgres works.
Postgres, under the covers, is essentially one giant append only log. When you insert data it gets written to the log known as the write-ahead log (commonly called WAL). When you update data a new record gets written to the WAL. When you delete data a new record gets written to the WAL. Nearly all changes in Postgres including to indexes and otherwise cause an update to the WAL.
With physical backups what you require to be able to create a restore of your database is two things:
base backup, which is a copy of the bytes on disk as of that point and time
- Additional segments of the WAL to put the database in some consistent state.
A physical backup only requires a small amount of WAL to restore the database to some valid state, but this also gives you some new flexibility. With a base backup plus WAL you can start to replay transactions up to a specific point in time. This is often how point-in-time recovery is performed within Postgres. If you accidentally drop a table, yes… it happens, you can:
- Find a base backup before you dropped the table
- Restore that base backup
- Replay wal segments up to roughly that time just before you dropped the table.
If you’re considering setting up physical backups, consider using a tool like WAL-G to help.
Logical vs. Physical which to choose
Both are useful and provide different benefits. At smaller scale, say under 100 GB of data logical backups via
pg_dump are something you should absolutely be doing. Because backups happen quickly on smaller databases you may be able to get out without functionality like point-in-time recovery. At larger scale, as you approach 1 TB physical backups start to become your only option. Because of the load introduced by logical backups and the time lapse between capturing them they become less suitable for production.
Hopefully this primer helps provide a high level overview of the two primary types of backups that exist as options for Postgres. Of course there is much deeper you can go on each, but consider ensuring you have at least one of the two if not both in place. Oh and make sure to test them, an un-tested backup isn’t a backup at all.