
I think you're defining consumer as the literal line of code where the field is read, whereas a more natural definition would be something like "the moment the data structure is deserialized". After all, it's usually better to abort early than halfway through an operation.

It was quite realistic to improve protobufs to help dig web search out of their "everything+dog consumes an enormous monolithic data structure" problem, assuming that's what you're thinking of (my memory of the details of that time is getting fuzzy).

A simple brute-force fix for their situation would have been to make validation of required fields toggleable at the per-parse level, so they could disable validation for their own parts of the stack without taking it away from everyone else (none of the projects I worked on had problems with required fields that I can recall).
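
A sketch of what such a per-parse toggle might look like, written in TypeScript purely for illustration (protobuf never shipped an option like this; the names here are hypothetical):

    // Hypothetical per-parse validation toggle -- not a real protobuf API.
    interface FooRequest {
        query?: string;        // "required" at the schema level
        options?: string[];
    }

    function parseFooRequest(
        json: string,
        opts: { checkRequiredFields: boolean },
    ): FooRequest {
        const msg = JSON.parse(json) as FooRequest;  // stand-in for wire decoding
        if (opts.checkRequiredFields && msg.query === undefined) {
            throw new Error("FooRequest: missing required field 'query'");
        }
        return msg;
    }

    // Web search opts out for its legacy mega-struct:
    parseFooRequest("{}", { checkRequiredFields: false });     // ok
    // ...while everyone else keeps validation on:
    // parseFooRequest("{}", { checkRequiredFields: true });   // would throw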

A better fix would have been for protobufs to support composition. They could then have started breaking the mega-struct down into overlapping protos, with the original defined as a recursive merge of them. That would have let them start carving out semantically meaningful views of what the programs really needed.
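
TypeScript's intersection types give a rough picture of what that composition could look like (an analogy only; protobuf has no such feature, and the view names here are invented):

    // Narrow, semantically meaningful views over the mega-struct...
    interface CrawlView {
        url?: string;
        fetchTimeMs?: number;
    }

    interface RankingView {
        url?: string;      // overlapping field; the types must agree
        score?: number;
    }

    // ...with the original mega-struct recovered as the merge of the views.
    type MegaDoc = CrawlView & RankingView;

    // A ranking job can now depend on just the view it needs:
    function rank(doc: RankingView): number {
        return doc.score ?? 0;
    }

    const doc: MegaDoc = { url: "https://example.com", fetchTimeMs: 12, score: 0.9 };
    rank(doc);   // a MegaDoc is usable anywhere a RankingView is expected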

The worst fix was to remove validation features from the language, thus forcing everyone to manually re-add them without the help of the compiler.

Really, the protobuf type system was too simple for Google even in 2006. I recall during training wondering why it didn't have a URL type, given that this was a web-centric company.

Shortly after, I discovered a very simple and obvious bug in web search in which some local business results were 404s even though the URL existed. It had been there for months, maybe years, and I found it by reading the user support forums (nobody else did this; my manager considered me way out of my lane for doing so). The bug was that nothing anywhere in the pipeline checked that the website address entered by the business owner started with https://, so when the result was stuffed into an <a> tag it turned into <a href="www.business.com"> and the user ended up at https://www.google.com/www.business.com. Oops. These bad strings made it all the way from the business owner, through the LBC frontend, the data pipeline, the intermediate microservices and the web search frontends, to the user's browser. The URL did pass crawl validation, because when it was loaded into a URL type the missing protocol was silently added.

SREs were trained to do post-mortems, so after it got fixed and the database was patched up, I naively asked whether there was a systematic fix for this, like maybe adding a URL type to protobufs so data would be validated right at the start. The answer was "it sounds like you're asking how to not write bugs" and nothing was done, sigh. It's entirely possible that similar bugs recurred dozens of times without being detected.
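
The browser behaviour that bites here is ordinary relative-URL resolution, easy to reproduce (a minimal illustration in TypeScript, not the actual Google code):

    // A scheme-less href is treated as a path relative to the current page.
    const href = "www.business.com";               // as entered by the owner
    const page = "https://www.google.com/search";  // page hosting the <a> tag

    console.log(new URL(href, page).href);
    // -> "https://www.google.com/www.business.com"

    // What validation at ingestion would have caught:
    // new URL(href);   // throws TypeError: Invalid URL (no scheme)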

Those are just a couple of cases where the simplicity (or primitiveness) of the protobuf type system led to avoidable problems. Sure, there are complexity limits too, but the actual languages Googlers were using all had more sophisticated type systems than protobuf, and bugs at the edges weren't uncommon.



> I think you're defining consumer as the literal line of code where the field is read

I am.

> After all, it's usually better to abort early than halfway through an operation.

I realize this goes against common wisdom, but I actually disagree.

It's simply unrealistic to imagine that we can fully determine whether an operation will succeed by examining the inputs upfront. Even if the inputs are fully valid, all sorts of things can go wrong at runtime. Maybe a database connection is randomly dropped. Maybe you run out of memory. Maybe the power goes out.

So we already have to design our code to be tolerant to random failures in the middle. This is why we try to group our state changes into a single transaction, or design things to be idempotent.

Given we already have to do all that, I think trying to validate input upfront creates more problems than it solves. When your validation code is far away from the code that actually processes the data, it's easier to miss things and harder to keep the two in sync.

To be clear, though, this does not mean I like dynamic typing. Static types are great. But the reason I like them is that they make programming easier: they let you understand the structure of the data you're dealing with, and they let the IDE implement auto-complete, jump-to-definition, error checking, and so on.

Consider TypeScript, which implements static typing on JavaScript but explicitly performs no runtime checks whatsoever to validate types. It's absolutely possible that a value at runtime does not match the type TypeScript assigned to it. The result is a runtime exception when you try to access the value in a way it doesn't actually support (even though its type says it should). And yet people love TypeScript; it clearly provides value despite this.
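
A minimal illustration of that gap: the cast below is trusted at compile time and never checked at runtime.

    interface User {
        name: string;
        tags: string[];
    }

    const u = JSON.parse('{"name": "alice"}') as User;   // tags is missing

    // Type-checks fine, but at runtime:
    // TypeError: Cannot read properties of undefined (reading 'length')
    console.log(u.tags.length);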

This stuff makes a programming language theorist's head explode, but in practice it works. Look, anything can be invalid in ways you never thought of, and no type system can fully defend you from that. You gotta get comfortable with the idea that exceptions might be thrown from anywhere, and design systems to accommodate failure.


I agree with a lot of this, but:

1. The advantage of having it in the type system is that the compiler can't forget.

2. It's quite hard to unwind operations in C++. I think delaying validation to the last moment is easier when you have robust exceptions. At the top level the frameworks can reject RPCs or return a 400 or whatever it is you want to do; if the problem is found 20 frames deep into some massive chunk of code, you're very likely to lose useful context as the error gets unwound (and get worse error messages).

On forgetting, the risky situation is something like this:

    message FooRequest {
        required string query = 1;
        optional string options = 2;   // added later
    }
The intention is: in v1 of the message there's some default information returned, but in v2 the client is given more control, including the ability to return less information as well as more. In proto2 you can query whether options is set and, if not, select the right default. In proto3 you can't tell the difference between an old client and a client that wants no extra information returned. That's a bug waiting to happen: the difference between "not set" and "default value" is important. Other variants are things like adding an "int32 timeout" field that defaults to zero, or simply a client that forgets to set a required field by mistake.
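
The same distinction is easy to see in TypeScript terms (an analogy, not protobuf semantics): an optional field keeps "not set" distinct from an explicit empty value, which is what proto2's presence checks gave you.

    interface FooRequest {
        query: string;
        options?: string[];   // added later
    }

    function handle(req: FooRequest): string[] {
        if (req.options === undefined) {
            // Old client: field not set, so apply the v1 default behaviour.
            return ["default", "fields"];
        }
        // New client: an explicit empty list means "return nothing extra".
        return req.options;
    }

    handle({ query: "q" });               // -> v1 defaults
    handle({ query: "q", options: [] });  // -> nothing extra, on purpose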

TypeScript indeed doesn't validate type casts up front, but that's largely because it's specifically designed to be compatible with JavaScript, whose runtime doesn't enforce types. People like it compared to raw JS.


> Consider TypeScript, which implements static typing on JavaScript but explicitly performs no runtime checks whatsoever to validate types. It's absolutely possible that a value at runtime does not match the type TypeScript assigned to it. The result is a runtime exception when you try to access the value in a way it doesn't actually support (even though its type says it should). And yet people love TypeScript; it clearly provides value despite this.

> This stuff makes a programming language theorist's head explode, but in practice it works. Look, anything can be invalid in ways you never thought of, and no type system can fully defend you from that. You gotta get comfortable with the idea that exceptions might be thrown from anywhere, and design systems to accommodate failure.

It's only possible if you're doing something wrong type-wise. In particular, when ingesting an object you're supposed to validate it before, or as, you assign the type to it. Delaying the error until the particular field is accessed is bad TypeScript! Those kinds of exceptions aren't supposed to be thrown from just anywhere.
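
The idiomatic tool for that is a user-defined type guard at the ingestion boundary (a minimal sketch; real code often reaches for a schema library instead):

    interface User {
        name: string;
        tags: string[];
    }

    // Type guard: the runtime check and the type assignment happen together.
    function isUser(x: unknown): x is User {
        return (
            typeof x === "object" && x !== null &&
            typeof (x as { name?: unknown }).name === "string" &&
            Array.isArray((x as { tags?: unknown }).tags)
        );
    }

    const raw: unknown = JSON.parse('{"name": "alice"}');
    if (!isUser(raw)) {
        throw new Error("bad User payload");   // fails here, not 20 frames deep
    }
    raw.tags.length;   // safe: the type was earned, not asserted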


I think this comes from everyone wanting to use the same schema and parser. For example, a text editor and a compiler have obviously different needs when dealing with invalid programs.

Maybe there need to be levels of validation, like "it's a text file" versus "it parses" versus "it type checks."
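
Sketched in TypeScript (the names and shapes here are made up for illustration), the levels compose as a pipeline where each stage accepts the previous stage's output:

    // Each level narrows the type; consumers choose how far to go.
    type TextFile = { text: string };
    type Parsed = { ast: object };
    type Typed = { ast: object; types: Map<string, string> };

    function asTextFile(bytes: Uint8Array): TextFile {
        // Throws if the bytes aren't valid UTF-8.
        return { text: new TextDecoder("utf-8", { fatal: true }).decode(bytes) };
    }

    function parse(f: TextFile): Parsed {
        // ...real parsing here; throws on syntax errors.
        return { ast: { source: f.text } };
    }

    function typeCheck(p: Parsed): Typed {
        // ...real checking here; throws on type errors.
        return { ast: p.ast, types: new Map() };
    }

    // A text editor might stop after parse(); a compiler runs all three.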


Sure, that would also have been a fine solution. There are lots of ways to tackle it, really, and some of it is just very subjective. There are a lot of similarities here to the NoSQL vs SQL debates: do you want a schemaless collection of JSON documents or enforced schemas? People can debate this stuff for a long time.

You can also see it as a version control and awareness problem rather than a schema or serialization problem. The issues don't occur if you always have full awareness of what code is running and what's consuming what data, but that's hard, especially when you take batch jobs into account.



