
I think you're defining consumer as the literal line of code where the field is read, whereas a more natural definition would be something like "the moment the data structure is deserialized". After all, it's usually better to abort early than halfway through an operation.

It was quite realistic to improve protobufs to help dig web search out of their "everything+dog consumes an enormous monolithic data structure" problem, assuming that's what you're thinking of (my memory of the details of that time is getting fuzzy).

A simple brute-force fix for their situation would have been to make validation of required fields toggleable at the per-parse level, so they could disable validation for their own parts of the stack without taking it away from everyone else (none of the projects I worked on had problems with required fields that I can recall).
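
A sketch of what such a per-parse toggle might look like, written in TypeScript purely for illustration (protobuf never shipped an option like this; the names here are hypothetical):

    // Hypothetical per-parse validation toggle -- not a real protobuf API.
    interface FooRequest {
        query?: string;        // "required" at the schema level
        options?: string[];
    }

    function parseFooRequest(
        json: string,
        opts: { checkRequiredFields: boolean },
    ): FooRequest {
        const msg = JSON.parse(json) as FooRequest;  // stand-in for wire decoding
        if (opts.checkRequiredFields && msg.query === undefined) {
            throw new Error("FooRequest: missing required field 'query'");
        }
        return msg;
    }

    // Web search opts out for its legacy mega-struct:
    parseFooRequest("{}", { checkRequiredFields: false });     // ok
    // ...while everyone else keeps validation on:
    // parseFooRequest("{}", { checkRequiredFields: true });   // would throw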

A better fix would have been for protobufs to support composition. They could then have started breaking the mega-struct down into overlapping protos, with the original defined as a recursive merge of them. That would have let them start carving out semantically meaningful views of what the programs really needed.
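
TypeScript's intersection types give a rough picture of what that composition could look like (an analogy only; protobuf has no such feature, and the view names here are invented):

    // Narrow, semantically meaningful views over the mega-struct...
    interface CrawlView {
        url?: string;
        fetchTimeMs?: number;
    }

    interface RankingView {
        url?: string;      // overlapping field; the types must agree
        score?: number;
    }

    // ...with the original mega-struct recovered as the merge of the views.
    type MegaDoc = CrawlView & RankingView;

    // A ranking job can now depend on just the view it needs:
    function rank(doc: RankingView): number {
        return doc.score ?? 0;
    }

    const doc: MegaDoc = { url: "https://example.com", fetchTimeMs: 12, score: 0.9 };
    rank(doc);   // a MegaDoc is usable anywhere a RankingView is expected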

The worst fix was to remove validation features from the language, thus forcing everyone to manually re-add them without the help of the compiler.

Really, the protobuf type system was too simple for Google even in 2006. I recall during training wondering why it didn't have a URL type, given that this was a web-centric company.

Shortly after, I discovered a very simple and obvious bug in web search in which some local business results were 404s even though the URL existed. It had been there for months, maybe years, and I found it by reading the user support forums (nobody else did this; my manager considered me way out of my lane for doing so). The bug was that nothing anywhere in the pipeline checked that the website address entered by the business owner started with https://, so when the result was stuffed into an <a> tag it turned into <a href="www.business.com"> and the user ended up at https://www.google.com/www.business.com. Oops. These bad strings made it all the way from the business owner, through the LBC frontend, the data pipeline, the intermediate microservices and the web search frontends, to the user's browser. The URL did pass crawl validation, because when it was loaded into a URL type the missing protocol was silently added.

SREs were trained to do post-mortems, so after it got fixed and the database was patched up, I naively asked whether there was a systematic fix for this, like maybe adding a URL type to protobufs so data would be validated right at the start. The answer was "it sounds like you're asking how to not write bugs" and nothing was done, sigh. It's entirely possible that similar bugs recurred dozens of times without being detected.
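
The browser behaviour that bites here is ordinary relative-URL resolution, easy to reproduce (a minimal illustration in TypeScript, not the actual Google code):

    // A scheme-less href is treated as a path relative to the current page.
    const href = "www.business.com";               // as entered by the owner
    const page = "https://www.google.com/search";  // page hosting the <a> tag

    console.log(new URL(href, page).href);
    // -> "https://www.google.com/www.business.com"

    // What validation at ingestion would have caught:
    // new URL(href);   // throws TypeError: Invalid URL (no scheme)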

Those are just a couple of cases where the simplicity (or primitiveness) of the protobuf type system led to avoidable problems. Sure, there are complexity limits too, but the actual languages Googlers were using all had more sophisticated type systems than protobuf, and bugs at the edges weren't uncommon.



> I think you're defining consumer as the literal line of code where the field is read

I am.

> After all, it's usually better to abort early than halfway through an operation.

I realize this goes against common wisdom, but I actually disagree.

It's simply unrealistic to imagine that we can fully determine whether an operation will succeed by examining the inputs upfront. Even if the inputs are fully valid, all sorts of things can go wrong at runtime. Maybe a database connection is randomly dropped. Maybe you run out of memory. Maybe the power goes out.

So we already have to design our code to be tolerant to random failures in the middle. This is why we try to group our state changes into a single transaction, or design things to be idempotent.

Given we already have to do all that, I think trying to validate input upfront creates more problems than it solves. When your validation code is far away from the code that actually processes the data, it's easier to miss things and harder to keep the two in sync.

To be clear, though, this does not mean I like dynamic typing. Static types are great. But the reason I like them is that they make programming easier: they let you understand the structure of the data you're dealing with, and they let the IDE implement auto-complete, jump-to-definition, error checking, and so on.

Consider TypeScript, which implements static typing on JavaScript but explicitly performs no runtime checks whatsoever to validate types. It's absolutely possible that a value at runtime does not match the type TypeScript assigned to it. The result is a runtime exception when you try to access the value in a way it doesn't actually support (even though its type says it should). And yet people love TypeScript; it clearly provides value despite this.
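
A minimal illustration of that gap: the cast below is trusted at compile time and never checked at runtime.

    interface User {
        name: string;
        tags: string[];
    }

    const u = JSON.parse('{"name": "alice"}') as User;   // tags is missing

    // Type-checks fine, but at runtime:
    // TypeError: Cannot read properties of undefined (reading 'length')
    console.log(u.tags.length);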

This stuff makes a programming language theorist's head explode, but in practice it works. Look, anything can be invalid in ways you never thought of, and no type system can fully defend you from that. You gotta get comfortable with the idea that exceptions might be thrown from anywhere, and design systems to accommodate failure.


I agree with a lot of this, but:

1. The advantage of having it in the type system is that the compiler can't forget.

2. It's quite hard to unwind operations in C++. I think delaying validation to the last moment is easier when you have robust exceptions. At the top level the frameworks can reject RPCs or return a 400 or whatever it is you want to do; if the problem is found 20 frames deep into some massive chunk of code, you're very likely to lose useful context as the error gets unwound (and get worse error messages).

On forgetting, the risky situation is something like this:

    message FooRequest {
        required string query = 1;
        optional string options = 2;   // added later
    }
The intention is: in v1 of the message there's some default information returned, but in v2 the client is given more control, including the ability to return less information as well as more. In proto2 you can query whether options is set and, if not, select the right default. In proto3 you can't tell the difference between an old client and a client that wants no extra information returned. That's a bug waiting to happen: the difference between "not set" and "default value" is important. Other variants are things like adding an "int32 timeout" field that defaults to zero, or simply a client that forgets to set a required field by mistake.
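
The same distinction is easy to see in TypeScript terms (an analogy, not protobuf semantics): an optional field keeps "not set" distinct from an explicit empty value, which is what proto2's presence checks gave you.

    interface FooRequest {
        query: string;
        options?: string[];   // added later
    }

    function handle(req: FooRequest): string[] {
        if (req.options === undefined) {
            // Old client: field not set, so apply the v1 default behaviour.
            return ["default", "fields"];
        }
        // New client: an explicit empty list means "return nothing extra".
        return req.options;
    }

    handle({ query: "q" });               // -> v1 defaults
    handle({ query: "q", options: [] });  // -> nothing extra, on purpose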

TypeScript indeed doesn't validate type casts up front, but that's largely because it's specifically designed to be compatible with JavaScript, whose runtime doesn't enforce types. People like it compared to raw JS.


> Consider TypeScript, which implements static typing on JavaScript but explicitly performs no runtime checks whatsoever to validate types. It's absolutely possible that a value at runtime does not match the type TypeScript assigned to it. The result is a runtime exception when you try to access the value in a way it doesn't actually support (even though its type says it should). And yet people love TypeScript; it clearly provides value despite this.

> This stuff makes a programming language theorist's head explode, but in practice it works. Look, anything can be invalid in ways you never thought of, and no type system can fully defend you from that. You gotta get comfortable with the idea that exceptions might be thrown from anywhere, and design systems to accommodate failure.

It's only possible if you're doing something wrong type-wise. In particular, when ingesting an object you're supposed to validate it before, or as, you assign the type to it. Delaying the error until the particular field is accessed is bad TypeScript! Those kinds of exceptions aren't supposed to be thrown from just anywhere.
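
The idiomatic tool for that is a user-defined type guard at the ingestion boundary (a minimal sketch; real code often reaches for a schema library instead):

    interface User {
        name: string;
        tags: string[];
    }

    // Type guard: the runtime check and the type assignment happen together.
    function isUser(x: unknown): x is User {
        return (
            typeof x === "object" && x !== null &&
            typeof (x as { name?: unknown }).name === "string" &&
            Array.isArray((x as { tags?: unknown }).tags)
        );
    }

    const raw: unknown = JSON.parse('{"name": "alice"}');
    if (!isUser(raw)) {
        throw new Error("bad User payload");   // fails here, not 20 frames deep
    }
    raw.tags.length;   // safe: the type was earned, not asserted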


I think this comes from everyone wanting to use the same schema and parser. For example, a text editor and a compiler have obviously different needs when dealing with invalid programs.

Maybe there need to be levels of validation, like "it's a text file" versus "it parses" versus "it type checks."
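
Sketched in TypeScript (the names and shapes here are made up for illustration), the levels compose as a pipeline where each stage accepts the previous stage's output:

    // Each level narrows the type; consumers choose how far to go.
    type TextFile = { text: string };
    type Parsed = { ast: object };
    type Typed = { ast: object; types: Map<string, string> };

    function asTextFile(bytes: Uint8Array): TextFile {
        // Throws if the bytes aren't valid UTF-8.
        return { text: new TextDecoder("utf-8", { fatal: true }).decode(bytes) };
    }

    function parse(f: TextFile): Parsed {
        // ...real parsing here; throws on syntax errors.
        return { ast: { source: f.text } };
    }

    function typeCheck(p: Parsed): Typed {
        // ...real checking here; throws on type errors.
        return { ast: p.ast, types: new Map() };
    }

    // A text editor might stop after parse(); a compiler runs all three.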


Sure, that would also have been a fine solution. There are lots of ways to tackle it, really, and some of it is just very subjective. There are a lot of similarities here to the NoSQL vs SQL debates: do you want a schemaless collection of JSON documents or enforced schemas? People can debate this stuff for a long time.

You can also see it as a version control and awareness problem rather than a schema or serialization problem. The issues don't occur if you always have full awareness of what code is running and what's consuming what data, but that's hard, especially when you take batch jobs into account.



