
This is pretty interesting, because the effects of migrating from lz4 to zstd were:

- Total storage usage reduced by ~21% (for our dataset, this is on the order of petabytes)

- Average write operation duration decreased by 50% on our fullest machines

- No observable query performance effects

It seems like the better compression ratio and the resulting reduced IO more than make up for the increased CPU cost compared to lz4. I wish they had mentioned the actual effect on CPU.

Compare to the recent thread "The LZ4 introduced in PostgreSQL 14 provides faster compression" [0] where the loudest voices were saying that zstd would not work due to increased CPU. This is a different layer (filesystem compression vs db compression), but this article represents an interesting data point in the conversation.

[0]: https://news.ycombinator.com/item?id=29147656
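
For anyone wanting to sanity-check numbers like this on their own pools: OpenZFS 2.0+ lets you flip the property per dataset, and new writes pick it up while old blocks stay lz4 until rewritten (dataset name here is hypothetical):

    # switch new writes to zstd; existing blocks remain lz4 until rewritten
    zfs set compression=zstd tank/pgdata

    # report the achieved compression ratio for the dataset
    zfs get compression,compressratio tank/pgdata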



Author here - it's difficult to provide a single number to summarize what we've observed re: CPU, but one data point is that average CPU utilization across our cluster increased from ~40% to ~50%. This effect is more pronounced during NA daylight hours.

Worth noting that part of the reason this is relatively low impact for our read queries is that the hot portion of our dataset is usually in Postgres page cache where the data is already decompressed (we see a 95-98% cache hit rate under normal conditions). We've noticed the impact more for operations that involve large scans - in particular, backups and index builds have become more expensive.
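
If anyone wants to estimate a hit rate like this on their own cluster, the stock statistics views give a rough number (this isn't necessarily how we measure it):

    # rough buffer-cache hit rate across all databases
    psql -c "SELECT sum(blks_hit)::float
                    / (sum(blks_hit) + sum(blks_read)) AS hit_rate
             FROM pg_stat_database;"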


Hey thanks for the clarification. That seems like a worthwhile tradeoff in your case.

For backups in particular, are ZFS snapshots alone not suitable to serve as a backup? Is there something else that the pg backup process does that is not covered by a "dumb" snapshot?
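
To make the question concrete, the "dumb" approach I have in mind is roughly this (names hypothetical):

    # crash-consistent point-in-time copy of the dataset
    zfs snapshot tank/pgdata@nightly

    # incremental replication to another host/pool
    zfs send -i tank/pgdata@prev tank/pgdata@nightly | ssh backuphost zfs recv -F backup/pgdata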


We use wal-g and extensively leverage its archive/point-in-time restore capabilities. I think it would be tricky to manage similar functionality with snapshots (and possibly more expensive if archival involved syncing to a remote pool).

That being said, wal-g has worked well enough for us that we haven't put a ton of time into investigating alternatives yet, so I can't say for sure whether snapshots would be a better option.
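
For anyone unfamiliar, the integration is basically Postgres's archive_command plus periodic base backups; a minimal sketch (bucket/credential configuration omitted, paths hypothetical):

    # postgresql.conf: ship each completed WAL segment as it's finished
    archive_mode = on
    archive_command = 'wal-g wal-push %p'

    # periodic base backup of the data directory
    wal-g backup-push /var/lib/postgresql/14/main

    # point-in-time restore: fetch a base backup, then replay WAL
    # via restore_command = 'wal-g wal-fetch %f %p'
    wal-g backup-fetch /var/lib/postgresql/14/main LATEST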


I'd personally recommend pgBackRest as a wal-g replacement. We (Covalent) started with wal-g ourselves, but pgBackRest does full backups and restores so much faster. Besides the parallelism (which is great), pgBackRest's backups are manifests that symbolically map to the individual objects in storage, which may have come from previous backups. This means a differential or incremental backup doesn't need to be "replayed after" a full backup; instead it's just like a git commit, pointing to some newer and some older objects.

Also, the auto-expiry of no-longer-needed WAL segments (which we retain due to our reliance on async hot standbys) along with previous backups is pretty great.

And we haven't even started taking advantage of pgBackRest's ability to do incremental restore — i.e. to converge a dataset already on disk, that may have fallen out of sync, with as few updates as possible. We're thinking we could use this to allow data science use-cases that would involve writing to a replica, by promoting the replica, allowing the writes, and then converging it back to an up-to-date replica after the fact.
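
For anyone evaluating it, a minimal sketch of the pieces described above (stanza name and paths are made up; see the pgBackRest docs for the real knobs):

    # /etc/pgbackrest/pgbackrest.conf
    [global]
    repo1-path=/var/lib/pgbackrest
    # auto-expires old backups and their associated WAL
    repo1-retention-full=2
    # parallel backup/restore workers
    process-max=8

    [main]
    pg1-path=/var/lib/postgresql/14/main

    # full backup, then incrementals against the manifest
    pgbackrest --stanza=main --type=full backup
    pgbackrest --stanza=main --type=incr backup

    # delta restore: converge an existing data dir with minimal writes
    pgbackrest --stanza=main --delta restore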


Have you looked at postgres row/column compression? Obviously, compressing the same data twice won't be too helpful, but maybe there are more wins to be had.
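
In case it's useful: since Postgres 14 the TOAST compression method is selectable per column (pglz or lz4; table/column names hypothetical):

    # Postgres 14+: switch a column's TOAST compression from pglz to lz4
    psql -c "ALTER TABLE events ALTER COLUMN payload SET COMPRESSION lz4;"

    # or change the default for newly created columns
    psql -c "ALTER SYSTEM SET default_toast_compression = 'lz4';"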


How/why did you choose Postgres over MariaDB? I'm facing that decision now.


After working for years with both, I'd say that PostgreSQL is much friendlier to the developer and more pleasant to work with in every area: documentation, features, error messages, SQL feature coverage, available extensions, and books.

One tiny example: I prefer to work with databases using CLI interfaces (mysql and psql).

The psql CLI is pleasant to use, has no interface bugs, and even gets improvements from time to time.

The mysql CLI is awful to use (e.g. it doesn't display long lines properly, has trouble with history editing, etc.) and looks like it hasn't seen a single improvement since 1996 (I'm sure there have been some; I've just never felt their effect).


Actually, there was a significant regression. Many years ago, Oracle decided to drop support for the GPL-licensed readline altogether, likely because they couldn't ship it with MySQL Enterprise. To this day, Percona still carries a small patch to add that functionality back, which is great, because I wouldn't touch any CLI without readline.


Dropping a link to `rlwrap` in case anyone is not familiar with it:

https://github.com/hanslub42/rlwrap

Note that I've never tried it myself with the mysql/mariadb CLI, but I have used it with other tools, and it's brilliant.
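
Usage is just prefixing the command; something like this (untested with mysql specifically, as noted):

    # wrap the mysql client with readline line editing and history
    rlwrap mysql -u myuser -p mydb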


Yep, Postgres seems to have more features too, for sure. Also, a fun little limitation of the MySQL CLI is that it truncates hostnames to 100 characters. Not usually a problem, but AWS VPC database hostnames easily hit that limit, and it silently truncates rather than failing.


"It silently does something unexpected rather than failing" succinctly summarizes all the reasons you shouldn't be using MySQL.


I’ve been using a Postgres foreign data wrapper to interact with a MySQL database and it’s much nicer for interactive use.


Not MariaDB, but after working with Postgres for six years I'm now using MySQL and... the error messages are useless, and comparing different types works in strange ways rather than failing as it would in Postgres.


To add, execution plans presented by MariaDB are also nearly useless.


User grants in Maria/MySQL drive me nuts. I hope it's better in Postgres.


Postgres always


https://blog.cloudflare.com/squeezing-the-firehose/ is our story of how we moved from lz4 to zstd (with a stop at snappy in between) in our Kafka deployments. The results are/were similar to what Heap is reporting here.
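
For reference, in stock Kafka (2.1+) switching the producer codec is a one-line config change; the blog post covers the rest of the tuning:

    # producer configuration: compress record batches with zstd
    compression.type=zstd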


Amazing read, thanks for sharing!


> Average write operation duration decreased by 50% on our fullest machines

This is probably not a fair comparison.

On the existing lz4 machines, the zfs pools are already badly fragmented, making it difficult to find regular blocks: https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSGangBloc...

On the new zstd machines, the zfs pools are still pristine, after being restored from backup.

If we want to isolate the effects of migrating from lz4 to zstd, we also need some new lz4 machines for comparison.


That's a great point. Potentially they could implement regular replacement of nodes to obtain the benefits of de-fragmented pools.


Indeed. It looks like they went from ZFS 0.8.x (lz4) to 2.x (zstandard) in one go. That would be too yolo for my taste, but they were under time pressure due to the rather sudden performance degradation. Node replacement would probably have bought some time, allowing the upgrade to be done in two phases.


For reads, fragmentation shouldn't matter on SSDs. And for writes, since ZFS is tree-structured, the allocator would just create a few more nodes and links, so no big deal?


Link is 404 for me.



zstd is almost always a better choice than lz4 if you have a superscalar, vectorizing CPU. This is certainly the case for any datacenter, desktop, or modern laptop CPU out there (although you need to tune zstd accordingly). Even newer phone CPUs are probably better off with zstd. The only place where lz4 is truly appropriate is in embedded/very low power applications, and maybe for opportunistic compression in general purpose applications where it's not known how compressible the data will be. It's worth noting that both algorithms were created by the same person, Yann Collet.
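
The zstd CLI has a built-in benchmark mode that makes this easy to check against your own data (file name hypothetical):

    # benchmark zstd levels 1 through 9 on a sample file;
    # prints ratio plus compression/decompression speed per level
    zstd -b1 -e9 sample.dat

    # lz4's CLI has a comparable benchmark mode
    lz4 -b1 sample.dat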


I wonder what, if any, further improvement you'd get from using xz instead of zstd.

Obviously you need a LOT of CPU to throw at xz if you want to use it.

zstd is much more optimized for compressing at speeds comparable to traditional gzip.

I use xz primarily for things that are compressed once for long-term storage, where the time to create the archive isn't an important factor.

In this test: https://sysdfree.wordpress.com/2020/01/04/293/

zstd level 19 wins on time vs. xz levels 5 through 9, but xz's final compressed size is definitely smaller.
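
A quick way to reproduce that kind of comparison locally (archive name hypothetical):

    # high-effort zstd, multithreaded, keeping the input
    time zstd -19 -T0 -k data.tar

    # high xz level, also multithreaded; expect a smaller file
    # but a much longer runtime
    time xz -9 -T0 -k data.tar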


If your system experiences periods of greater and lesser load, then spending whatever capacity is left over during the quieter periods on further compressing its contents might be worth the bother.

Rather than stepping up to a different compression algorithm, zstd has multiple compression levels that could be used at different times. The advantage there is that the same decompression code works for all of them.
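
OpenZFS exposes exactly that: compression=zstd-1 through zstd-19 are all valid values, and blocks written at any level decompress with the same code (dataset name hypothetical):

    # write new blocks at a higher effort level during quiet periods;
    # blocks written earlier at other zstd levels stay readable
    zfs set compression=zstd-19 tank/archive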

One might reasonably hope that decompression tables could be shared among multiple of the 64k raw blocks, to squeeze usage further.


Part of the problem with xz is also that zstd decompresses far faster.


xz wasn't an option for ZFS.



