One can imagine that in a world that had taken a turn towards Pascal strings, we would have ended up with string lengths being varint encoded -- 7 bits of string length per byte, plus a continuation bit. I'd imagine that the initial (pure software) implementation of this would have no specific length limit, but we'd be back in the days of < 1 MB of system memory; so, continuing the imagination, early 32-bit CISC processors would reasonably invest in a varint-to-int decode instruction, probably limited to a 4-byte memory read (28 bits of string length), which would be a significant accelerator and cover all sane use cases. In the mid-32-bit era the 256 MB string limit would become occasionally relevant, but assuming varint-to-int set an overflow flag if the continuation bit on the last byte was set, this could be mitigated sanely; and the late-32-bit era would give a 64-bit read variant (56 bits of string length plus the overflow flag), which would probably be cleaned up in the 64-bit era by dropping the overflow flag for lack of use once memory address sizes < 56 bits became standard.
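As a rough sketch of what that hypothetical 4-byte varint-to-int operation would compute -- the byte order (low 7 bits first, LEB128/protobuf style) and all names here are assumptions of mine, not anything standardized:

    #include <stddef.h>
    #include <stdint.h>

    /* Sketch of the hypothetical 4-byte varint-to-int decode: up to 28 bits
     * of string length, plus an overflow flag if the continuation bit is
     * still set on the fourth byte. */
    static uint32_t varint28_decode(const unsigned char *p,
                                    size_t *bytes_used, int *overflow)
    {
        uint32_t value = 0;
        size_t i;

        for (i = 0; i < 4; i++) {
            value |= (uint32_t)(p[i] & 0x7F) << (7 * i);
            if (!(p[i] & 0x80)) {      /* continuation bit clear: done */
                *bytes_used = i + 1;
                *overflow = 0;
                return value;
            }
        }
        *bytes_used = 4;               /* length doesn't fit in 28 bits */
        *overflow = 1;
        return value;
    }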
The above is, of course, speculation about an alternate timeline -- but it's certainly consistent and believable. What's interesting is what impact such a path would have had elsewhere. I'd expect two other consequences. First, our existing uses of varints (e.g. protobufs) would become faster and more efficient, and would probably appear earlier in the timeline. Second, UTF-8 would likely have become varint-representation-compatible, giving up the (valuable) ability to tell from a byte in isolation which position of a codepoint it encodes, in exchange for hardware acceleration of encoded<->codepoint conversion -- and likely giving us 2^28 codepoints from a 4-byte maximum length, rather than the current, somewhat arbitrary four-byte limit that gives only 2^21 codepoints.
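To make the second consequence concrete, here's what that alternate-timeline "varint UTF-8" might look like -- purely illustrative, and deliberately giving up real UTF-8's self-synchronization property:

    #include <stdint.h>

    /* Hypothetical varint-style codepoint encoding: 7 payload bits per byte,
     * high bit as the continuation flag.  ASCII still fits in one byte, and
     * four bytes give 2^28 codepoints -- but a byte in the middle of a
     * sequence is no longer distinguishable from a leading byte. */
    static int varint_codepoint_encode(uint32_t cp, unsigned char out[4])
    {
        int n = 0;

        if (cp > 0x0FFFFFFF)           /* more than 28 bits won't fit */
            return -1;
        do {
            unsigned char b = cp & 0x7F;
            cp >>= 7;
            if (cp)
                b |= 0x80;             /* more bytes follow */
            out[n++] = b;
        } while (cp);
        return n;                      /* 1..4 bytes written */
    }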
What else would be different? Hard to say… but I do think that size prefixes could have been an alternate route.
I cannot imagine varints working well for in-memory structures, especially in C. Iterating over a string with x[i] is (was?) a very common operation, and with a varint length there's an extra memory fetch, extra math, and a possible jump. On later CPUs there are also unaligned reads and a possible pipeline-stalling data dependency. Smart optimizers can help, but when C was designed, compilers were pretty simple. And there are other problems -- for example, the simple act of appending a character can grow the length prefix by a byte, shifting the whole string by one and invalidating existing pointers.
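To illustrate the cost (a sketch, with my own function names): even a plain s[i] now has to walk past a variable-length prefix before it can touch the byte it wants.

    #include <stddef.h>

    /* With a varint length prefix, indexing means first skipping the prefix:
     * an extra dependent load, masking, and a branch per prefix byte. */
    static unsigned char vstr_char_at(const unsigned char *s, size_t i)
    {
        while (*s & 0x80)              /* continuation bit set */
            s++;
        s++;                           /* final prefix byte */
        return s[i];
    }

And the pointer-invalidation case is exactly the moment the length crosses a 7-bit boundary (e.g. 127 -> 128 bytes): the prefix grows by one byte and every data byte shifts.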
Varints can be useful for on-disk or network streams, but I've never seen them used for actively modified in-memory storage.
Specifically in the case of string length encodings, these objections don't seem to apply. The memory fetch is from a cache line that's already needed for the (beginning of the) string data, so it only adds ops in the case of pure substring access -- and there the usual C pattern of holding an offset/length pair out of band applies anyway. Similarly, you're going to be accessing the string: unaligned reads are the least of your problems if you're doing byte-at-a-time ops, and you can always align the string object to a word boundary so the varint starts at one.
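In code, the common pattern looks like this (again just a sketch): the prefix is decoded in the same pass that hands you the data pointer, so for any realistic prefix length it rides along on a cache line you were about to touch anyway.

    #include <stddef.h>

    /* Decode the length prefix and return a pointer to the string data.
     * If the object is allocated word-aligned, the prefix starts at a word
     * boundary, which is what the hypothetical hardware decode would want. */
    static const unsigned char *vstr_open(const unsigned char *p, size_t *len)
    {
        size_t value = 0;
        unsigned shift = 0;

        while (*p & 0x80) {                       /* continuation bit set */
            value |= (size_t)(*p++ & 0x7F) << shift;
            shift += 7;
        }
        value |= (size_t)*p++ << shift;           /* final prefix byte */
        *len = value;
        return p;                                 /* data follows the prefix */
    }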
I totally agree that, as a general tool, varints have failure modes. But for (a) encoding Pascal-style string lengths, (b) encoding often-small numbers on the wire, and (c) encoding codepoints, they seem to work fine.