There are many reasons why UTF-8 is a better encoding but UTF-16 does at least have the benefit of being simpler. Every scalar value is either encoded as a single unit or a pair of units (leading surrogate + trailing surrogate).
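A quick sketch of that property using Python's built-in codecs (big-endian form chosen only so the surrogate bytes are easy to read):

```python
# UTF-16 encodes every scalar value as either one 16-bit unit
# or a leading + trailing surrogate pair.
bmp = "é"      # U+00E9, inside the Basic Multilingual Plane: one unit
astral = "😀"  # U+1F600, above U+FFFF: needs a surrogate pair

print(bmp.encode("utf-16-be").hex())     # 00e9
print(astral.encode("utf-16-be").hex())  # d83dde00 (lead 0xD83D, trail 0xDE00)
```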
However, PowerShell (or, more often, the host console) has a lot of issues handling Unicode. This has been improving in recent years, but it's still a work in progress.
UTF-16 only makes sense if you were sure UCS-2 would be fine, and then, oops, Unicode turned out to need more than 16 bits, so UCS-2 can't work and you have to cope somehow. It makes zero sense to adopt it in greenfield projects today, whereas Java and Windows, which had bought into UCS-2 back in the early-to-mid 1990s, needed UTF-16; the alternative was throwing away all their 16-bit text APIs and starting over.
UTF-32 / UCS-4 is fine but feels very bloated, especially if a lot of your text data is more or less ASCII, which, if it isn't literally human prose, it usually will be. Even on a good day it's wasting at least 11 bits per character: Unicode code points need at most 21 bits, stored in 32.
UTF-8 is a little more complicated to handle than UTF-16, and certainly more than UTF-32, but it's nice and compact. It's also largely ASCII-compatible: lots of tools that work with ASCII also work fine with UTF-8, unless you insist on adding a spurious UTF-8 "byte order mark" to the front of the text. So it was a huge success once it was designed.
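Both points are easy to demonstrate in Python (`utf-8-sig` is Python's name for the BOM-prepending variant):

```python
# Pure ASCII text is byte-for-byte identical in UTF-8.
text = "plain old ascii"
assert text.encode("ascii") == text.encode("utf-8")

# The spurious BOM: "utf-8-sig" prepends the bytes EF BB BF.
print("hi".encode("utf-8-sig").hex())  # efbbbf6869
```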
So I guess if you're archiving pure CJK text, maybe you could get a 10% benefit, though I suspect non-Unicode encodings of that text would be more compact anyway.
From what I can remember, UTF-8 consumes more CPU as it's more complex to process, has space savings for mostly-ASCII and European code pages, but can significantly bloat storage sizes for character sets that consistently require 3 or 4 bytes per character.
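The size tradeoff is easy to measure; a rough comparison (sample strings are just my own illustration):

```python
# Byte counts per encoding for a few scripts.
samples = {"ASCII": "hello world!", "Greek": "γειά σου", "CJK": "你好，世界"}
for name, s in samples.items():
    print(f"{name}: utf-8={len(s.encode('utf-8'))} bytes, "
          f"utf-16={len(s.encode('utf-16-le'))} bytes")
# ASCII halves in size under UTF-8; CJK grows from 2 to 3 bytes per character.
```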
My team managed a system that read user data and did input validation on it. One day we got a smart-quote character that happened to be > U+10000. But because the validation happened in chunks, we only got half of it, which was an invalid character, so input validation failed.
In UTF-8, partial characters happen so often that they're likely to get tested. In UTF-16 they're much rarer, so things work until someone pastes in an emoji, and then it all falls apart.
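A sketch of that chunk-boundary failure mode, using an emoji split mid-surrogate-pair; an incremental decoder (here Python's `codecs` machinery) is the usual fix, since it buffers the partial unit instead of erroring:

```python
import codecs

# One astral character, four bytes in UTF-16, split across two "reads".
data = "😀".encode("utf-16-le")
chunk1, chunk2 = data[:2], data[2:]  # boundary falls mid-character

try:
    chunk1.decode("utf-16-le")       # a lone leading surrogate is invalid
except UnicodeDecodeError as e:
    print("validation failed:", e.reason)

# An incremental decoder carries state across chunks instead of failing:
dec = codecs.getincrementaldecoder("utf-16-le")()
print(dec.decode(chunk1) + dec.decode(chunk2, final=True))  # 😀
```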
> There are many reasons why UTF-8 is a better encoding but UTF-16 does at least have the benefit of being simpler. Every scalar value is either encoded as a single unit or a pair of units (leading surrogate + trailing surrogate).
UTF-16 is really not noticeably simpler. Decoding UTF-8 is really rather straightforward in any language that has even minimal bit-twiddling abilities.
And that’s assuming you need to write your own encoder or decoder, which seems unlikely.