This is pretty interesting, because the effects of migrating from lz4 to zstd were:
- Total storage usage reduced by ~21% (for our dataset, this is on the order of petabytes)
- Average write operation duration decreased by 50% on our fullest machines
- No observable query performance effects
It seems like the better compression ratio and the resulting reduced IO more than make up for the increased CPU cost compared to lz4. I wish they had mentioned the actual effect on CPU.
Compare to the recent thread "The LZ4 introduced in PostgreSQL 14 provides faster compression" [0], where the loudest voices were saying that zstd would not work due to increased CPU. This is a different layer (filesystem compression vs db compression), but this article represents an interesting data point in the conversation.
[0]: https://news.ycombinator.com/item?id=29147656
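For anyone wanting to eyeball that tradeoff on their own data, here's a minimal sketch (mine, not from the article) using the third-party lz4 and zstandard Python packages:

```python
# Rough ratio/CPU comparison of lz4 vs zstd on a sample file.
# Requires: pip install lz4 zstandard. "sample.bin" is a placeholder
# for any representative chunk of your data.
import time

import lz4.frame
import zstandard

with open("sample.bin", "rb") as f:
    data = f.read()

candidates = [
    ("lz4", lz4.frame.compress),
    # Level 3 matches OpenZFS's default for compression=zstd.
    ("zstd-3", zstandard.ZstdCompressor(level=3).compress),
]

for name, compress in candidates:
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name}: ratio {len(data) / len(compressed):.2f}, {elapsed * 1000:.1f} ms")
```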
Author here - it's difficult to provide a single number to summarize what we've observed re: CPU, but one data point is that average CPU utilization across our cluster increased from ~40% to ~50%. This effect is more pronounced during NA daylight hours.
Worth noting that part of the reason this is relatively low impact for our read queries is that the hot portion of our dataset is usually in Postgres page cache where the data is already decompressed (we see a 95-98% cache hit rate under normal conditions). We've noticed the impact more for operations that involve large scans - in particular, backups and index builds have become more expensive.
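A minimal sketch (connection details assumed, not the author's actual tooling) of estimating that hit rate from pg_statio_user_tables with psycopg2:

```python
# Estimate the shared-buffers hit rate across user tables.
# "dbname=mydb" is a placeholder DSN.
import psycopg2

conn = psycopg2.connect("dbname=mydb")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT sum(heap_blks_hit)::float
             / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0)
        FROM pg_statio_user_tables
    """)
    hit_rate = cur.fetchone()[0]
    print(f"buffer cache hit rate: {hit_rate:.2%}")
```

Note this measures hits in Postgres's own buffer cache; blocks served from the OS page cache still count as reads here.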
Hey thanks for the clarification. That seems like a worthwhile tradeoff in your case.
For backups in particular, are ZFS snapshots alone not suitable to serve as a backup? Is there something else that the pg backup process does that is not covered by a "dumb" snapshot?
We use wal-g and extensively leverage its archive/point-in-time restore capabilities. I think it would be tricky to manage similar functionality with snapshots (and possibly more expensive if archival involved syncing to a remote pool).
That being said, wal-g has worked well enough for us that we haven't put a ton of time into investigating alternatives yet, so I can't say for sure whether snapshots would be a better option.
I'd personally recommend pgBackRest as a wal-g replacement. We (Covalent) started with wal-g ourselves, but pgBackRest does full backup and restore so much faster. Besides the parallelism (which is great), pgBackRest's backups are manifests, symbolically mapping to the individual objects in storage that may have come from previous backups. This means a differential or incremental backup doesn't need to be "replayed after" a full backup; instead it's just like a git commit, pointing to some newer and some older objects.
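A toy illustration of that manifest idea (invented structures, not pgBackRest's actual format):

```python
# Backups as manifests into a content-addressed object store: an
# incremental backup just points at a mix of new and existing objects,
# so a restore never has to replay one backup on top of another.
import hashlib

object_store = {}  # content hash -> file contents, shared by all backups

def take_backup(files):
    """Return a manifest mapping path -> content hash; only content
    not already in the store gets uploaded."""
    manifest = {}
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        object_store.setdefault(digest, content)  # skip already-stored objects
        manifest[path] = digest
    return manifest

def restore(manifest):
    """Materialize any backup directly from its manifest."""
    return {path: object_store[digest] for path, digest in manifest.items()}

full = take_backup({"base/16384": b"v1", "base/16385": b"big"})
incr = take_backup({"base/16384": b"v2", "base/16385": b"big"})  # reuses "big"
assert restore(incr) == {"base/16384": b"v2", "base/16385": b"big"}
```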
Also, the auto-expiry of no-longer-needed WAL segments (which we retain due to our reliance on async hot standbys), along with previous backups, is pretty great.
And we haven't even started taking advantage of pgBackRest's ability to do incremental restore, i.e. converging a dataset already on disk that may have fallen out of sync, using as few updates as possible. We're thinking we could use this to support data science use cases that involve writing to a replica: promote the replica, allow the writes, then converge it back into an up-to-date replica after the fact.
Have you looked at Postgres row/column compression? Obviously, compressing the same data twice won't be too helpful, but maybe there are more wins to be had.
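For context, Postgres 14 lets you choose the TOAST compression method per column (or globally via default_toast_compression); a minimal sketch with a hypothetical events table and an assumed psycopg2 connection:

```python
# Switch one column's TOAST compression to lz4 (requires a server
# built with lz4 support) and inspect the current settings.
# "events"/"payload" and the DSN are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=mydb")
with conn, conn.cursor() as cur:
    cur.execute("ALTER TABLE events ALTER COLUMN payload SET COMPRESSION lz4")
    cur.execute("""
        SELECT attname, attcompression
        FROM pg_attribute
        WHERE attrelid = 'events'::regclass AND attnum > 0
    """)
    for name, method in cur.fetchall():
        print(name, method or "(default)")
```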
After working for years with both, I'd say that PostgreSQL is much friendlier to the developer and more pleasant to work with, in every area: documentation, features, error messages, available SQL features, available extensions, available books.
One tiny example: I prefer to work with databases using CLI interfaces (mysql and psql).
The psql CLI is pleasant to use, has no bugs in the interface, and even gets improvements from time to time.
The mysql CLI is awful to use (e.g. it doesn't display long lines properly and has trouble with history editing) and looks like it hasn't seen a single improvement since 1996 (I'm sure there have been some; I just never felt their effect).
Actually, there was a significant regression. Many, many years ago, Oracle decided to drop support for the GPL-licensed readline altogether, likely because they can't ship it with MySQL Enterprise. To this day, Percona still carries a small patch to add that functionality back, which is great because I wouldn't touch any CLI without readline.
Yep, Postgres seems to have more features too, for sure.
Also, a fun little limitation of the MySQL CLI is that it truncates hostnames to 100 characters - not usually a problem, but AWS database VPC hostnames easily hit that limit, and it just silently truncates rather than failing.
Not MariaDB, but after working with Postgres for six years I'm now using MySQL and... the error messages are useless, and comparing different types works in strange ways instead of failing like it does in Postgres.
https://blog.cloudflare.com/squeezing-the-firehose/ is our story of how we moved from lz4 to zstd (with a stop at snappy in between) in our kafka deployments. Results are/were similar to what Heap is reporting here.
Indeed. It looks like they went from ZFS 0.8.x (lz4) to 2.x (zstandard) in one go. That would be too yolo for my taste, but the guys were under time pressure due to the rather sudden performance degradation. Replacing nodes would probably have bought some time, allowing the upgrade to be done in two phases.
zstd is almost always a better choice than lz4 if you have a superscalar, vectorizing CPU. This is certainly the case for any datacenter, desktop, or modern laptop CPU out there (although you need to tune zstd accordingly). Even newer phone CPUs are probably better off with zstd. The only place where lz4 is truly appropriate is in embedded/very low power applications, and maybe for opportunistic compression in general purpose applications where it's not known how compressible the data will be. It's worth noting that both algorithms were created by the same person, Yann Collet.
If your system experiences periods of greater and lesser load, then using its spare capacity during the quieter periods to further compress its contents might be worth the bother.
Perhaps better than switching to a different compression algorithm: zstd has multiple compression levels that could be used at different times. The advantage there is that the same decompression algorithm works for all of them.
One might reasonably hope that decompression tables could be shared among multiple 64k raw blocks, to further squeeze usage.
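To illustrate the same-decompressor-for-every-level point, a small sketch using the zstandard Python package ("sample.bin" is a placeholder for any representative data):

```python
# Higher zstd levels spend more CPU for a better ratio, but one
# decompressor reads frames produced at any level.
import zstandard

with open("sample.bin", "rb") as f:
    data = f.read()

dctx = zstandard.ZstdDecompressor()  # one decoder for all levels

for level in (1, 3, 9, 19):
    compressed = zstandard.ZstdCompressor(level=level).compress(data)
    assert dctx.decompress(compressed) == data
    print(f"level {level:2d}: ratio {len(data) / len(compressed):.2f}")
```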