Postgres 15 on Encrypted ZFS and Ubuntu 22.04 LTS

I’m using Postgres for my new startup. The startup is still in stealth mode, so I can’t talk about it much. But, given the focus of the startup is on communication platforms, Postgres made a lot of sense. This is the first time I’ve ever used Postgres, and honestly, I was quite impressed by just the sheer amount of online documentation and help available to get started.

My first impressions of Postgres is another blog post on its own. So, I won’t go into all that here. Right now, I wanted to get Postgres set up correctly in production. And that meant, running it on top of ZFS.

An Encrypted ZFS Pool

For this setup, I’d first create a ZFS pool, specifically to be used by Postgres. For a database, it’s safer to have a ZFS pool which is encrypted by default.

The full instructions on how to create a zpool is in my detailed guide to ZFS. But, I’d show you the basic commands that you need to create a ZFS pool. In this case, I’m creating a mirror pool named struct over 3 drives.

sudo zpool create -f -o ashift=12 -O encryption=on -O keylocation=prompt -O
keyformat=passphrase -O compression=lz4 struct mirror /dev/disk/by-id/drive1
/dev/disk/by-id/drive2 /dev/disk/by-id/drive3

The above would ask you a passphrase to use for encryption. Do remember to store the passphrase that you used. I’ve been using 1Password for over 5 years now, and highly recommend it.

Note that we set ashift=12 and compression=lz4. Both are recommended options and can just be left on by default for any new ZFS pools that you create. Rest of the options are specific to encryption.

Now, this pool wouldn’t automatically mount when you restart the machine. To mount it, you’d need to export and reimport the pool. When importing, specify the -l flag. From man zpool-import:

Indicates that this command will request encryption keys for all encrypted datasets it attempts to mount as it is bringing the pool online. Note that if any datasets have a keylocation of prompt this command will block waiting for the keys to be entered. Without this flag encrypted datasets will be left un‐ available until the keys are loaded.

sudo zpool export struct
sudo zpool import -l struct
# When prompted, enter the passphrase

This should now mount and make the pool available.

ZFS Settings for Postgres

For postgres data and to enforce corresponding settings, I want to create a specific pool within struct called pgdata. Do remember to replace the names with whatever is relevant to you.

$ sudo zfs set atime=off struct/pgdata
$ sudo zfs set compression=lz4 struct/pgdata
$ sudo zfs set recordsize=16K struct/pgdata
$ sudo zfs set primarycache=metadata struct/pgdata
$ sudo zfs set logbias=throughput struct/pgdata

$ sudo zfs get atime,compression,primarycache,recordsize,logbias struct/pgdata
NAME           PROPERTY      VALUE           SOURCE
struct/pgdata  atime         off             local
struct/pgdata  compression   lz4             inherited from struct
struct/pgdata  primarycache  metadata        local
struct/pgdata  recordsize    16K             local
struct/pgdata  logbias       throughput      local

Given Postgres would have its own data cache and is a lot closer to the actual queries, we don’t need ZFS primarycache to be set to all (the default). Instead we can just set it to metadata, which stores things like file names, permissions, inode information, block pointers, space maps, properties etc. Having metadata in memory avoids disk reads — so even though we don’t want the actual file contents in memory, we can benefit from having their metadata stay in memory.

Next up is zfs_txg_timeout. zfs_txg_timeout is a module parameter in ZFS that controls the maximum time a transaction group (TXG) can stay open before being committed to disk. By default, it’s 5 seconds. We’d set it to 1 second, which means, any pending writes would be flushed to disk every second. That way, the max data loss that can have in case of a power outage would be limited to last 1 second worth of writes.

$ sudo su # Become root
# cat /sys/module/zfs/parameters/zfs_txg_timeout
5
# echo 1 > /sys/module/zfs/parameters/zfs_txg_timeout
# cat /sys/module/zfs/parameters/zfs_txg_timeout
1
# echo 'options zfs zfs_txg_timeout=1' >> /etc/modprobe.d/zfs.conf

With this setting, we can then set Postgres synchronous_commit = off. We don’t need Postgres to tell ZFS when to flush files to disk. Instead, ZFS is doing that every second anyway, and therefore, the max amount of data loss that we can have in Postgres would also be 1s.

If you’re OK with losing the last second of writes in the worst case scenario, then this is a great way to achieve a potential 10x performance gains over synchronous writes.

Next up, let’s ensure that our DB never runs out of disk.

$ sudo zpool list
NAME     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
struct   928G  1.70M   928G        -         -     0%     0%  1.00x    ONLINE  -

# Setting it to just under 80% of the pool size.
$ sudo zfs set quota=736G struct
$ sudo zfs get quota struct
NAME    PROPERTY  VALUE  SOURCE
struct  quota     736G   local

$ sudo zfs list
NAME            USED  AVAIL     REFER  MOUNTPOINT
struct         1.72M   736G      236K  /struct
struct/pgdata   192K   736G      192K  /struct/pgdata

We’d use the same struct/pgdata for the Postgres state data and the write-ahead logs.

Set up Postgres

To setup and run Postgres, follow this guide: The Definitive Guide to Running a PostgreSQL Instance as non-Postgres user.

The only things which you would want to change from that guide are the config settings in /etc/postgresql/15/struct/postgresql.conf, which we’ll adjust to use ZFS features.

Postgres Settings For ZFS

With ZFS, you don’t need to rely on the database to do integrity checks on the files and sectors. You can instead rely on ZFS, which does these things:

  • Always writes a full block (full_page_writes)
  • Always ensures filesystem consistency
  • Provides compression at file system level (lz4 or zstd)
full_page_writes = off # ZFS always writes the full block anyway
wal_init_zero = off # No need to store
wal_recycle = off
synchronous_commit = off # ZFS flushing to disk every 1s

Postgres Settings with Tailscale

Given that we’re using Tailscale for our VPN and all inter-server communication, there’s no need for Postgres to do SSL communication. It’s already encrypted. So, we turn off the ssl setting.

ssl = off # Rely on Tailscale VPN

And for the rest you can follow the Postgres guide I linked above. Thanks!

Resources



Date
April 28, 2023