If any Cloudflare employees end up here who helped decide on Cap'n Proto over other options (e.g. protobuf), what considerations went into that choice? I'm curious whether the reasons are things that would matter to me, or things you don't need to worry about unless you deal with huge scale.
To this day, Cloudflare's data pipeline (which produces logs and analytics from the edge) is largely based on Cap'n Proto serialization. I haven't personally been much involved with that project.
As for Cloudflare Workers, of course, I started the project, so I used my stuff. Probably not the justification you're looking for. :)
That said, I would argue the extreme expressiveness of Cap'n Proto's RPC protocol compared to alternatives has been a big help in implementing sandboxing in the Workers Runtime, as well as distributed systems features like Durable Objects. https://blog.cloudflare.com/introducing-workers-durable-obje...
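For a concrete taste of what that expressiveness means: Cap'n Proto RPC is capability-based, so you can pass live object references over the wire and pipeline calls on results you haven't received yet. Here's a minimal C++ sketch, assuming a made-up Directory/File schema and its generated header (fs.capnp.h are hypothetical names, just for illustration); the client calls themselves are the standard capnp RPC API:

    // Hypothetical schema, compiled with the capnp tool:
    //   interface Directory { open @0 (name :Text) -> (file :File); }
    //   interface File      { read @0 () -> (data :Data); }
    #include <cstddef>
    #include <capnp/ez-rpc.h>
    #include "fs.capnp.h"  // hypothetical generated header

    size_t readFooSize(capnp::EzRpcClient& client) {
      Directory::Client dir = client.getMain<Directory>();

      auto openReq = dir.openRequest();
      openReq.setName("foo.txt");
      auto openPromise = openReq.send();

      // Promise pipelining: start the read() on the not-yet-returned File
      // capability, so open() + read() cost one network round trip, not two.
      auto readPromise = openPromise.getFile().readRequest().send();

      auto response = readPromise.wait(client.getWaitScope());
      return response.getData().size();
    }

That "call a method on a promise" pattern, plus the ability to hand a sandboxed guest a capability to exactly one object and nothing else, is harder to get from a plain request/response protocol like gRPC.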
I don't work at Cloudflare but follow their work and occasionally work on performance-sensitive projects.
If I had to guess, they looked at the landscape a bit like I do and put Cap'n Proto, FlatBuffers, SBE, etc. in one category (zero-copy formats), apart from other data formats like Avro, protobuf, and the like.
So once you're committed to record-shaped data (rather than columnar, like Parquet) with an upfront parse time of zero (nominally zero, anyway: there can still be some marshalling if you transmogrify the field values on read), the list gets pretty short.
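That "parse time of zero" property is easiest to see in code. A minimal C++ sketch, assuming a made-up Log schema and its generated header (log.capnp.h is a hypothetical name); the reader classes are the real capnp ones:

    // Hypothetical schema, compiled with the capnp tool:
    //   struct Log { status @0 :UInt16; url @1 :Text; }
    #include <cstdint>
    #include <capnp/message.h>
    #include <capnp/serialize.h>
    #include "log.capnp.h"  // hypothetical generated header

    uint16_t statusOf(kj::ArrayPtr<const capnp::word> wireWords) {
      // No decode pass: the reader overlays the received buffer in place.
      capnp::FlatArrayMessageReader reader(wireWords);
      Log::Reader log = reader.getRoot<Log>();
      // Fields are read lazily, straight out of the buffer, only when touched.
      return log.getStatus();
    }

Contrast with protobuf, where you typically pay to decode the whole message into an in-memory object graph before you can read the first field.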
Cap'n Proto was originally made for https://sandstorm.io/. That line of work (which Kenton has presumably continued at Cloudflare, since he's been employed there) eventually turned into Cloudflare Workers.
To summarize the situation from a little over a year after I joined there: Cloudflare was building out a way to ship logs from its edge to a central point, both for customer analytics and for serving logs to enterprise customers. As I understood it, the primary engineer who built all of that out, Albert Strasheim, benchmarked the most likely serialization options available and found Cap'n Proto to be appreciably faster than protobuf. It had a great C++ implementation (which we could use from nginx, IIRC with some Lua involved), and while the Go implementation, which we used on the consuming side, had its warts, folks were able to fix the key parts that needed attention.
Anyway. Cloudflare's always been pretty cost-efficient machine-wise, so it was a natural choice given the performance needs we had. In my time on the data team there, Cap'n Proto was always pretty easy to work with, and sharing schema definitions from a central schema repo worked pretty well, too. Thanks for your work, Kenton!
Albert here :) The decision basically came down to the following: we needed to build a 1 million events/sec log processing pipeline, and had only 5 servers (more were on the way, but would take many months to arrive at the data center). So v1 of this pipeline was 5 servers, each running Kafka 0.8, a service to receive events from the edge and put them into Kafka, and a consumer to aggregate the data. To squeeze all of this onto that hardware footprint, I spent about a week looking for a format that optimized for deserialization speed, since we had a few thousand edge servers serializing data, but only 5 deserializing. Cap'n Proto was a good fit :)
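(For scale: 1 million events/sec across 5 consumer servers works out to roughly 200,000 events/sec per box, and those same boxes were also running Kafka and doing the aggregation.)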
The article says they were using it before hiring him, though, so there must have been some prior motivation:
> In fact, you are using Cap’n Proto right now, to view this site, which is served by Cloudflare, which uses Cap’n Proto extensively (and is also my employer, although they used Cap’n Proto before they hired me)