
This is pretty interesting, because the effects of migrating from lz4 to zstd were:

- Total storage usage reduced by ~21% (for our dataset, this is on the order of petabytes)

- Average write operation duration decreased by 50% on our fullest machines

- No observable query performance effects

It seems like the better compression ratio and the resulting reduced IO more than make up for the increased CPU cost compared to lz4. I wish they had mentioned the actual effect on CPU.

Compare to the recent thread "The LZ4 introduced in PostgreSQL 14 provides faster compression" [0] where the loudest voices were saying that zstd would not work due to increased CPU. This is a different layer (filesystem compression vs db compression), but this article represents an interesting data point in the conversation.

[0]: https://news.ycombinator.com/item?id=29147656
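
For anyone wanting to sanity-check numbers like this on their own pools: OpenZFS 2.0+ lets you flip the property per dataset, and new writes pick it up while old blocks stay lz4 until rewritten (dataset name here is hypothetical):

    # switch new writes to zstd; existing blocks remain lz4 until rewritten
    zfs set compression=zstd tank/pgdata

    # report the achieved compression ratio for the dataset
    zfs get compression,compressratio tank/pgdata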



Author here - it's difficult to provide a single number to summarize what we've observed re: CPU, but one data point is that average CPU utilization across our cluster increased from ~40% to ~50%. This effect is more pronounced during NA daylight hours.

Worth noting that part of the reason this is relatively low impact for our read queries is that the hot portion of our dataset is usually in Postgres page cache where the data is already decompressed (we see a 95-98% cache hit rate under normal conditions). We've noticed the impact more for operations that involve large scans - in particular, backups and index builds have become more expensive.
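
If anyone wants to estimate a hit rate like this on their own cluster, the stock statistics views give a rough number (this isn't necessarily how we measure it):

    # rough buffer-cache hit rate across all databases
    psql -c "SELECT sum(blks_hit)::float
                    / (sum(blks_hit) + sum(blks_read)) AS hit_rate
             FROM pg_stat_database;"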


Hey thanks for the clarification. That seems like a worthwhile tradeoff in your case.

For backups in particular, are ZFS snapshots alone not suitable to serve as a backup? Is there something else that the pg backup process does that is not covered by a "dumb" snapshot?
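
To make the question concrete, the "dumb" approach I have in mind is roughly this (names hypothetical):

    # crash-consistent point-in-time copy of the dataset
    zfs snapshot tank/pgdata@nightly

    # incremental replication to another host/pool
    zfs send -i tank/pgdata@prev tank/pgdata@nightly | ssh backuphost zfs recv -F backup/pgdata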


We use wal-g and extensively leverage its archive/point-in-time restore capabilities. I think it would be tricky to manage similar functionality with snapshots (and possibly more expensive if archival involved syncing to a remote pool).

That being said, wal-g has worked well enough for us that we haven't put a ton of time into investigating alternatives yet, so I can't say for sure whether snapshots would be a better option.
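
For anyone unfamiliar, the integration is basically Postgres's archive_command plus periodic base backups; a minimal sketch (bucket/credential configuration omitted, paths hypothetical):

    # postgresql.conf: ship each completed WAL segment as it's finished
    archive_mode = on
    archive_command = 'wal-g wal-push %p'

    # periodic base backup of the data directory
    wal-g backup-push /var/lib/postgresql/14/main

    # point-in-time restore: fetch a base backup, then replay WAL
    # via restore_command = 'wal-g wal-fetch %f %p'
    wal-g backup-fetch /var/lib/postgresql/14/main LATEST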


I'd personally recommend pgBackRest as a wal-g replacement. We (Covalent) started with wal-g ourselves, but pgBackRest does full backups and restores so much faster. Besides the parallelism (which is great), pgBackRest's backups are manifests that symbolically map to the individual objects in storage, which may have come from previous backups. This means a differential or incremental backup doesn't need to be "replayed after" a full backup; instead it's just like a git commit, pointing to some newer and some older objects.

Also, the auto-expiry of no-longer-needed WAL segments (which we retain due to our reliance on async hot standbys) along with previous backups is pretty great.

And we haven't even started taking advantage of pgBackRest's ability to do incremental restore — i.e. to converge a dataset already on disk, that may have fallen out of sync, with as few updates as possible. We're thinking we could use this to allow data science use-cases that would involve writing to a replica, by promoting the replica, allowing the writes, and then converging it back to an up-to-date replica after the fact.
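
For anyone evaluating it, a minimal sketch of the pieces described above (stanza name and paths are made up; see the pgBackRest docs for the real knobs):

    # /etc/pgbackrest/pgbackrest.conf
    [global]
    repo1-path=/var/lib/pgbackrest
    # auto-expires old backups and their associated WAL
    repo1-retention-full=2
    # parallel backup/restore workers
    process-max=8

    [main]
    pg1-path=/var/lib/postgresql/14/main

    # full backup, then incrementals against the manifest
    pgbackrest --stanza=main --type=full backup
    pgbackrest --stanza=main --type=incr backup

    # delta restore: converge an existing data dir with minimal writes
    pgbackrest --stanza=main --delta restore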


Have you looked at postgres row/column compression? Obviously, compressing the same data twice won't be too helpful, but maybe there are more wins to be had.
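
In case it's useful: since Postgres 14 the TOAST compression method is selectable per column (pglz or lz4; table/column names hypothetical):

    # Postgres 14+: switch a column's TOAST compression from pglz to lz4
    psql -c "ALTER TABLE events ALTER COLUMN payload SET COMPRESSION lz4;"

    # or change the default for newly created columns
    psql -c "ALTER SYSTEM SET default_toast_compression = 'lz4';"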


How/why did you choose Postgres over MariaDB? I'm facing that decision now.


After working for years with both, I'd say that PostgreSQL is much friendlier to the developer and more pleasant to work with in every area: documentation, features, error messages, SQL feature coverage, available extensions, and books.

One tiny example: I prefer to work with databases using CLI interfaces (mysql and psql).

The psql CLI is pleasant to use, has no interface bugs, and even gets improvements from time to time.

The mysql CLI is awful to use (e.g. it doesn't display long lines properly, has trouble with history editing, etc.) and looks like it hasn't seen a single improvement since 1996 (I'm sure there have been some; I've just never felt their effect).


Actually, there was a significant regression. Many years ago, Oracle decided to drop support for the GPL-licensed readline altogether, likely because they couldn't ship it with MySQL Enterprise. To this day, Percona still carries a small patch to add that functionality back, which is great, because I wouldn't touch any CLI without readline.


Dropping a link to `rlwrap` in case anyone is not familiar with it:

https://github.com/hanslub42/rlwrap

Note that I've never tried it myself with the mysql/mariadb CLI, but I have used it with other tools, and it's brilliant.
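
Usage is just prefixing the command; something like this (untested with mysql specifically, as noted):

    # wrap the mysql client with readline line editing and history
    rlwrap mysql -u myuser -p mydb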


Yep, Postgres seems to have more features too, for sure. Also, a fun little limitation of the MySQL CLI is that it truncates hostnames to 100 characters. Not usually a problem, but AWS VPC database hostnames easily hit that limit, and it silently truncates rather than failing.


"It silently does something unexpected rather than failing" succinctly summarizes all the reasons you shouldn't be using MySQL.


I’ve been using a Postgres foreign data wrapper to interact with a MySQL database and it’s much nicer for interactive use.


Not MariaDB, but after working with Postgres for six years I'm now using MySQL and... the error messages are useless, and comparing different types works in strange ways rather than failing as it would in Postgres.


To add, execution plans presented by MariaDB are also nearly useless.


User grants in Maria/MySQL drive me nuts. I hope it's better in Postgres.


Postgres always


https://blog.cloudflare.com/squeezing-the-firehose/ is our story of how we moved from lz4 to zstd (with a stop at snappy in between) in our Kafka deployments. The results are/were similar to what Heap is reporting here.
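
For reference, in stock Kafka (2.1+) switching the producer codec is a one-line config change; the blog post covers the rest of the tuning:

    # producer configuration: compress record batches with zstd
    compression.type=zstd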


Amazing read, thanks for sharing!


> Average write operation duration decreased by 50% on our fullest machines

This is probably not a fair comparison.

On the existing lz4 machines, the zfs pools are already badly fragmented, making it difficult to find regular blocks: https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSGangBloc...

On the new zstd machines, the zfs pools are still pristine, after being restored from backup.

If we want to isolate the effects of migrating from lz4 to zstd, we also need some new lz4 machines for comparison.


That's a great point. Potentially they could implement regular replacement of nodes to obtain the benefits of de-fragmented pools.


Indeed. It looks like they went from ZFS 0.8.x (lz4) to 2.x (zstandard) in one go. That would be too yolo for my taste, but they were under time pressure due to the rather sudden performance degradation. Node replacement would probably have bought some time, allowing the upgrade to be done in two phases.


For reads, fragmentation shouldn't matter on SSDs. And for writes, since ZFS is tree-structured, the allocator would just create a few more nodes and links, so no big deal?


Link is 404 for me.



zstd is almost always a better choice than lz4 if you have a superscalar, vectorizing CPU. This is certainly the case for any datacenter, desktop, or modern laptop CPU out there (although you need to tune zstd accordingly). Even newer phone CPUs are probably better off with zstd. The only place where lz4 is truly appropriate is in embedded/very low power applications, and maybe for opportunistic compression in general purpose applications where it's not known how compressible the data will be. It's worth noting that both algorithms were created by the same person, Yann Collet.
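
The zstd CLI has a built-in benchmark mode that makes this easy to check against your own data (file name hypothetical):

    # benchmark zstd levels 1 through 9 on a sample file;
    # prints ratio plus compression/decompression speed per level
    zstd -b1 -e9 sample.dat

    # lz4's CLI has a comparable benchmark mode
    lz4 -b1 sample.dat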


I wonder what, if any, further improvement you'd get from using xz instead of zstd.

Obviously you need a LOT of CPU to throw at xz if you want to use it.

zstd is much more optimized for compressing at speeds comparable to traditional gzip.

I use xz primarily for things that are compressed once for long-term storage, where the time to create the archive isn't an important factor.

In this test: https://sysdfree.wordpress.com/2020/01/04/293/

zstd level 19 wins on time vs. xz levels 5 through 9, but xz's final compressed size is definitely smaller.
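
A quick way to reproduce that kind of comparison locally (archive name hypothetical):

    # high-effort zstd, multithreaded, keeping the input
    time zstd -19 -T0 -k data.tar

    # high xz level, also multithreaded; expect a smaller file
    # but a much longer runtime
    time xz -9 -T0 -k data.tar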


If your system experiences periods of greater and lesser load, then spending whatever capacity is left over during the quieter periods on further compressing its contents might be worth the bother.

Rather than stepping up to a different compression algorithm, zstd has multiple compression levels that could be used at different times. The advantage there is that the same decompression code works for all of them.
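
OpenZFS exposes exactly that: compression=zstd-1 through zstd-19 are all valid values, and blocks written at any level decompress with the same code (dataset name hypothetical):

    # write new blocks at a higher effort level during quiet periods;
    # blocks written earlier at other zstd levels stay readable
    zfs set compression=zstd-19 tank/archive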

One might reasonably hope that decompression tables could be shared among multiple of the 64k raw blocks, to squeeze usage further.


Part of the problem with xz is also that zstd decompresses far faster.


xz wasn't an option for ZFS.



