Postgres 15 on Encrypted ZFS and Ubuntu 22.04 LTS
I’m using Postgres for my new startup. The startup is still in stealth mode, so I can’t talk about it much. But, given the focus of the startup is on communication platforms, Postgres made a lot of sense. This is the first time I’ve ever used Postgres, and honestly, I was quite impressed by just the sheer amount of online documentation and help available to get started.
My first impressions of Postgres is another blog post on its own. So, I won’t go into all that here. Right now, I wanted to get Postgres set up correctly in production. And that meant, running it on top of ZFS.
An Encrypted ZFS Pool
For this setup, I’d first create a ZFS pool, specifically to be used by Postgres. For a database, it’s safer to have a ZFS pool which is encrypted by default.
The full instructions on how to create a zpool is in my detailed guide to ZFS. But, I’d show you the basic commands that you need to create a ZFS pool. In this case, I’m creating a mirror pool named struct
over 3 drives.
sudo zpool create -f -o ashift=12 -O encryption=on -O keylocation=prompt -O
keyformat=passphrase -O compression=lz4 struct mirror /dev/disk/by-id/drive1
/dev/disk/by-id/drive2 /dev/disk/by-id/drive3
The above would ask you a passphrase to use for encryption. Do remember to store the passphrase that you used. I’ve been using 1Password for over 5 years now, and highly recommend it.
Note that we set ashift=12
and compression=lz4
. Both are recommended options and can just be left on by default for any new ZFS pools that you create. Rest of the options are specific to encryption.
Now, this pool wouldn’t automatically mount when you restart the machine. To mount it, you’d need to export and reimport the pool. When importing, specify the -l
flag. From man zpool-import
:
Indicates that this command will request encryption keys for all encrypted datasets it attempts to mount as it is bringing the pool online. Note that if any datasets have a keylocation of prompt this command will block waiting for the keys to be entered. Without this flag encrypted datasets will be left un‐ available until the keys are loaded.
sudo zpool export struct
sudo zpool import -l struct
# When prompted, enter the passphrase
This should now mount and make the pool available.
ZFS Settings for Postgres
For postgres data and to enforce corresponding settings, I want to create a specific pool within struct
called pgdata
. Do remember to replace the names with whatever is relevant to you.
$ sudo zfs set atime=off struct/pgdata
$ sudo zfs set compression=lz4 struct/pgdata
$ sudo zfs set recordsize=16K struct/pgdata
$ sudo zfs set primarycache=metadata struct/pgdata
$ sudo zfs set logbias=throughput struct/pgdata
$ sudo zfs get atime,compression,primarycache,recordsize,logbias struct/pgdata
NAME PROPERTY VALUE SOURCE
struct/pgdata atime off local
struct/pgdata compression lz4 inherited from struct
struct/pgdata primarycache metadata local
struct/pgdata recordsize 16K local
struct/pgdata logbias throughput local
Given Postgres would have its own data cache and is a lot closer to the actual queries, we don’t need ZFS primarycache
to be set to all
(the default). Instead we can just set it to metadata
, which stores things like file names, permissions, inode information, block pointers, space maps, properties etc. Having metadata in memory avoids disk reads — so even though we don’t want the actual file contents in memory, we can benefit from having their metadata stay in memory.
Next up is zfs_txg_timeout
. zfs_txg_timeout
is a module parameter in ZFS that controls the maximum time a transaction group (TXG) can stay open before being committed to disk. By default, it’s 5 seconds. We’d set it to 1 second, which means, any pending writes would be flushed to disk every second. That way, the max data loss that can have in case of a power outage would be limited to last 1 second worth of writes.
$ sudo su # Become root
# cat /sys/module/zfs/parameters/zfs_txg_timeout
5
# echo 1 > /sys/module/zfs/parameters/zfs_txg_timeout
# cat /sys/module/zfs/parameters/zfs_txg_timeout
1
# echo 'options zfs zfs_txg_timeout=1' >> /etc/modprobe.d/zfs.conf
With this setting, we can then set Postgres synchronous_commit = off
. We don’t need Postgres to tell ZFS when to flush files to disk. Instead, ZFS is doing that every second anyway, and therefore, the max amount of data loss that we can have in Postgres would also be 1s.
If you’re OK with losing the last second of writes in the worst case scenario, then this is a great way to achieve a potential 10x performance gains over synchronous writes.
Next up, let’s ensure that our DB never runs out of disk.
$ sudo zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
struct 928G 1.70M 928G - - 0% 0% 1.00x ONLINE -
# Setting it to just under 80% of the pool size.
$ sudo zfs set quota=736G struct
$ sudo zfs get quota struct
NAME PROPERTY VALUE SOURCE
struct quota 736G local
$ sudo zfs list
NAME USED AVAIL REFER MOUNTPOINT
struct 1.72M 736G 236K /struct
struct/pgdata 192K 736G 192K /struct/pgdata
We’d use the same struct/pgdata
for the Postgres state data and the write-ahead logs.
Set up Postgres
To setup and run Postgres, follow this guide: The Definitive Guide to Running a PostgreSQL Instance as non-Postgres user.
The only things which you would want to change from that guide are the config settings in /etc/postgresql/15/struct/postgresql.conf
, which we’ll adjust to use ZFS features.
Postgres Settings For ZFS
With ZFS, you don’t need to rely on the database to do integrity checks on the files and sectors. You can instead rely on ZFS, which does these things:
- Always writes a full block (full_page_writes)
- Always ensures filesystem consistency
- Provides compression at file system level (lz4 or zstd)
full_page_writes = off # ZFS always writes the full block anyway
wal_init_zero = off # No need to store
wal_recycle = off
synchronous_commit = off # ZFS flushing to disk every 1s
Postgres Settings with Tailscale
Given that we’re using Tailscale for our VPN and all inter-server communication, there’s no need for Postgres to do SSL communication. It’s already encrypted. So, we turn off the ssl setting.
ssl = off # Rely on Tailscale VPN
And for the rest you can follow the Postgres guide I linked above. Thanks!
Resources
- https://people.freebsd.org/~seanc/postgresql/scale15x-2017-postgresql_zfs_best_practices.pdf
- Recorded talk