There were two things I think went extremely poorly here:
1) Lack of validation of the configuration file.
Rolling out a config file across the global network every 5 minutes is extremely high risk. Even without hindsight, surely one would see the need for very careful validation of this file before taking on that risk?
There were several things "obviously" wrong with the file that validation should have caught:
- It was much bigger than expected.
- It had duplicate entries.
- Most importantly, when loaded into the FL2 proxy, the proxy would panic on every request. At the very least, shouldn't part of the validation involve loading the file into the proxy and serving a request?
2) Very long time to identify and then fix such a critical issue.
I can't understand the complete lack of monitoring or reporting? A panic in Rust code, especially from an unwrap, is the application screaming that there's a logic error! I don't understand how that can be conflated with a DDoS attack. How are your logs not filled with backtraces pointing to the exact "unwrap" in question?
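For illustration (this is obviously not the real FL2 code), here's what such a panic looks like; the runtime prints the exact source location, and a backtrace if enabled:

```rust
fn main() {
    // Hypothetical stand-in for "a config lookup that unexpectedly came back empty".
    let feature: Option<&str> = None;

    // This panics with a message along the lines of:
    //   thread 'main' panicked at src/main.rs:9:25:
    //   called `Option::unwrap()` on a `None` value
    // and RUST_BACKTRACE=1 adds a full backtrace pointing at this line.
    let value = feature.unwrap();
    println!("{value}");
}
```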
Then, once identified, why was it so hard to revert to a known good version of the configuration file? How did no one foresee the need to roll back this file when designing a feature that deploys a new one globally every 5 minutes?
IMO, safety and "idiomatic-ness" of Rust code are two separate concerns, with the former being easier to automate.
In most C code I've read, the lifetimes of pointers are not that complicated. They can't be that complicated, because complex lifetimes are too error prone without automated checking. That means those lifetimes can be easily expressed.
In that sense, a fairly direct C to Rust translation that doesn't try to generate idiomatic Rust, but does accurately encode the lifetimes into the type system (i.e. replacing pointers with references and Box) is already a huge safety win, since you gain automatic checking of the rules you were already implicitly following.
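As a hedged sketch of what such a direct, non-idiomatic translation might look like (hypothetical code, not output from any real tool):

```rust
// Hypothetical C:
//   struct buffer { unsigned char *data; size_t len; };
//   struct buffer *buffer_new(size_t len);   /* caller owns the result */
//   void buffer_clear(struct buffer *buf);   /* borrows for the call   */
//
// Direct translation: same shape, but ownership and borrowing are now
// encoded in the types instead of in comments and conventions.
struct Buffer {
    data: Box<[u8]>, // owning pointer -> Box, freed automatically on drop
    len: usize,
}

fn buffer_new(len: usize) -> Box<Buffer> {
    Box::new(Buffer { data: vec![0u8; len].into_boxed_slice(), len })
}

fn buffer_clear(buf: &mut Buffer) {
    // non-owning pointer parameter -> mutable reference, valid only for the call
    buf.data.fill(0);
    buf.len = 0;
}
```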
If that can be automated (which seems increasingly plausible) then the need to do such a translation incrementally also goes away.
Making it idiomatic would be a case of recognising higher level patterns that couldn't be abstracted away in C, but can be turned into abstractions in Rust, and creating those abstractions. That is a more creative process that would require something like an LLM to drive, but that can be done incrementally, and provides a different kind of value from the basic safety checks.
> In that sense, a fairly direct C to Rust translation that doesn't try to generate idiomatic Rust, but does accurately encode the lifetimes into the type system (i.e. replacing pointers with references and Box) is already a huge safety win, since you gain automatic checking of the rules you were already implicitly following.
Unfortunately, there's a lot of non-trivial C code that really does not come close to following the rules of existing Safe Rust, even at their least idiomatic. Giving up on idiomaticness can be very helpful at times, but it's far from a silver bullet. For example, much C code that uses "shared mutable" data makes no effort to either follow the constraints of Rust Cell<T> (which, loosely speaking, require get or set operations to be tightly self-contained, where the whole object is accessed in one go) or check for the soundness of ongoing borrows at runtime à la RefCell<T>: the invariants involved are simply implied in the flow of the C code. Such code must be expressed using unsafe in Rust.
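For concreteness, a minimal sketch of the two safe shared-mutability tools mentioned (illustrative only):

```rust
use std::cell::{Cell, RefCell};

fn main() {
    // Cell<T>: every access copies/moves the whole value in or out, so no
    // reference to the interior ever escapes.
    let counter = Cell::new(0u32);
    counter.set(counter.get() + 1);

    // RefCell<T>: references to the interior do escape, so the borrow rules
    // are enforced at runtime instead of compile time.
    let log = RefCell::new(Vec::<String>::new());
    {
        let reader = log.borrow();
        println!("{} entries so far", reader.len());
        // log.borrow_mut(); // would panic here: still immutably borrowed
    }
    log.borrow_mut().push("entry".to_string());
}
```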
Even something as simple (to C coders) as a doubly-linked list involves a kind of fancy "static Rc" where two pointers jointly "own" a single list node.
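In safe Rust that tends to be spelled out with reference counting; one common (though not the only) encoding looks roughly like this:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Each node is jointly owned via Rc: strong references point forward, weak
// references point back (to avoid a reference-count cycle), and RefCell
// provides the runtime-checked mutation through shared handles.
struct Node<T> {
    value: T,
    next: Option<Rc<RefCell<Node<T>>>>,
    prev: Option<Weak<RefCell<Node<T>>>>,
}
```

The closer-to-C alternative is an intrusive list built on raw pointers, which lands you back in `unsafe`.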
Borrowing patterns can be decoupled and/or "branded" in a way that needs "qcell" or the like in Rust, which we still don't really know how to express idiomatically, etc.
This is not to say that you can't translate such patterns to some variety of Rust, but it will be non-trivial and involve some kind of unsafe code.
I've found Gemini to be much better at completing tasks and following instructions. For example, let's say I want to extract all the questions from a word document and output them as a CSV.
If I ask ChatGPT to do this, it will do one of two things:
1) Extract the first ~10-20 questions perfectly, and then either just give up, or else hallucinate a bunch of stuff.
2) Write code that tries to use regex to extract the questions, which then fails because the questions are too free-form to be reliably matched by a regex.
If I ask Gemini to do the same thing, it will just do it and output a perfectly formed and most importantly complete CSV.
That would be because package version flexibility is an entirely orthogonal concept to lock files, and to conflate them shows a lack of understanding.
pyproject.toml describes the supported dependency versions. Those dependencies are then resolved to some specific versions, and the output of that resolution is the lock file. This allows someone else to install the same dependencies in a reproducible way. It doesn't prevent someone resolving pyproject.toml to a different set of dependency versions.
If you are building a library, downstream users of your library won't use your lockfile. Lockfiles can still be useful for a library: one can use multiple lockfiles to try to validate its dependency specifications. For example you might generate a lockfile using minimum-supported-versions of all dependencies and then run your test suite against that, in addition to running the test suite against the default set of resolved dependencies.
Agree, but I think there is a point to be made here: Go as a language has more subtle runtime invariants that must be upheld than other languages, and this has led to a relatively large number of really nasty bugs (e.g. there have also been several bugs relating to native function calling due to stack space issues and calling convention differences). By "nasty" I mean ones that are really hard to track down if you don't have the resources that a company like CF does.
To me this points to a lack of verification, testing, and most importantly awareness of the invariants that are relied on. If the GC relies on the stack pointer being valid at all times, then the IR needs a way to guarantee that modifications to it are not split into multiple instructions during lowering. It means that there should be explicit testing of each kind of stack layout, and tests that look at the real generated code and step through it instruction by instruction to verify that these invariants are never broken...
It doesn't link two versions of `rand-core`. That's not even possible with Rust (you can only link two semver-incompatible versions of the same crate). And dependency specifications in Rust don't work like that: unless you explicitly override it, all dependencies are semver constraints, so "0.9.0" will happily match "0.9.3".
So there's no difference at all between "0", "0.9" and "0.9.3" in cargo.toml (Since semver says only major version numbers are breaking)? As a decently experienced Rust developer, that's deeply surprising to me.
What if devs don't do a good job of versioning and there is a real incompatibility between 0.9.3 and 0.9.4? Surely there's some way to actually require an exact version?
Notice how the minimum bound changes while the upper bound is the same for all of them.
The reason for this is that unless otherwise specified, the ^ operator is used, so "0.9" is actually "^0.9", which then gets translated into the kind of range specifier I showed above.
There are other operators you can use; these are the common ones:
- `^` (default): semver compatible, as described above
- `>=`: inclusive lower bound only
- `<`: exclusive upper bound only
- `=`: exact bound
Note that while an exact bound will force that exact version to be used, it still doesn't allow two semver-compatible versions of a crate to exist together: if Cargo can't find a single version that satisfies all constraints, it will just error.
For this reason, if you are writing a library, you should in almost all cases stick to regular semver-compatible dependency specifications.
For binaries, it is more common to want exact control over versions, and you don't have downstream consumers for whom your exact constraints would be a nightmare.
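If you want to sanity-check how these requirement strings behave, the `semver` crate (which implements the same version-requirement matching Cargo uses) lets you test them directly; a rough sketch, assuming `semver = "1"` as a dependency:

```rust
use semver::{Version, VersionReq};

fn main() {
    let v = Version::parse("0.9.3").unwrap();

    // "0", "0.9" and "0.9.3" are all caret requirements with different
    // minimum bounds, and all of them are satisfied by 0.9.3:
    for req in ["^0", "^0.9", "^0.9.3"] {
        assert!(VersionReq::parse(req).unwrap().matches(&v));
    }

    // The other operators listed above:
    assert!(VersionReq::parse(">=0.9.1").unwrap().matches(&v)); // inclusive lower bound
    assert!(VersionReq::parse("<1.0.0").unwrap().matches(&v));  // exclusive upper bound
    assert!(VersionReq::parse("=0.9.3").unwrap().matches(&v));  // exact
    assert!(!VersionReq::parse("=0.9.2").unwrap().matches(&v));
}
```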
Note that in the output, there is rand 0.9.0, and two instances of rand_core 0.9.3. You may have thought it selected two versions because you missed the _core there.
> So there's no difference at all between "0", "0.9" and "0.9.3" in cargo.toml
No, there is a difference: in particular, they all specify different minimum bounds.
The trick is that these are using the ^ operator to match, which means that the version "0.9.3" will satisfy all of those constraints, and so Cargo will select 0.9.3 (the latest version at the time I write this comment) as the one version to satisfy all of them.
Cargo will only select multiple versions when they're not compatible, that is, if you had something like "1.0.0" and "0.9.0".
> Surely there's some way to actually require an exact version?
Yes, you'd have to use `=`, like `=0.9.3`. This is heavily discouraged because it would lead to a proliferation of duplicate dependency versions, which aren't necessary unless you are trying to avoid some specific bugfix. This is sometimes done in applications, but basically should never be done in libraries.
Sorry, I don't understand the "^ operator" in this context. Do I understand correctly that cargo will basically select the latest release that matches within a major version, so if I have two crates that specify "0.8" and "0.7.1" as dependencies then the compiler will use "0.8.n" for both? And then if I add a new dependency that specifies "0.9.5", all three crates would use "0.9.5"? Assuming I have that right, I'm quite surprised that it works in practice.
Semver specifies versions. These are the x.y.z (plus other optional stuff) triples you see. Nothing should be complicated there.
Tools that use semver to select versions also define syntax for defining which versions are acceptable. npm calls these “ranges”, cargo calls them “version requirements”, I forget what other tools call them. These are what you actually write in your Cargo.toml or equivalent. These are not defined by the semver specification, but instead, by the tools. They are mostly identical across tools, but not always. Anyway, they often use operators to define the ranges (that’s the name I’m going to use in this post because I think it makes the most sense.) So for example, ‘>3.0.0’ means “any version greater than 3.0.0.” “=3.0.0” means “any version where x is 3, y is 0, and z is 0” which 99% of the time means only one version.
When you write “0.9.3” in a Cargo.toml, you’re writing a range, not a version. When you do not specify an operator, Cargo treats that as if you used the ^ operator. So “0.9.3” is equivalent to “^0.9.3”. What does ^ do? It means two things, one if x is 0 and one if x is nonzero. Since “^0.9.3” has x of zero, this range means “any version where x is 0, y is 9, and z is >= 3.” Likewise, “0.9” is equivalent to “^0.9.0”, which is “any version where x is 0, y is 9, and z is >= 0.”
Putting these two together:
- 0.9.0 satisfies the latter, but not the former
- 0.9.1 satisfies the latter, but not the former
- 0.9.2 satisfies the latter, but not the former
- 0.9.3 satisfies both
Given that 0.9.3 is a version that has been released, if one package depends on “0.9” and another depends on “0.9.3”, version 0.9.3 satisfies both constraints, and so is selected.
If we had “0.8” and “0.7.1”, no version could satisfy both simultaneously, as “y must be 8” and “y must be 7” would conflict. Cargo would give you both versions in this case, whichever y=8 and y=7 versions have the highest z at the time.
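A small sketch of that conflict using the `semver` crate (assumed as a dependency), which follows the same matching rules:

```rust
use semver::{Version, VersionReq};

fn main() {
    let a = VersionReq::parse("0.8").unwrap();   // treated as ^0.8
    let b = VersionReq::parse("0.7.1").unwrap(); // treated as ^0.7.1

    // No single version satisfies both requirements, so Cargo would have to
    // keep two copies of the crate in the graph:
    for s in ["0.7.1", "0.7.9", "0.8.0", "0.8.5"] {
        let v = Version::parse(s).unwrap();
        assert!(!(a.matches(&v) && b.matches(&v)));
    }
}
```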
Awesome. Thanks for taking the time. Glad to understand all of this better. I feel a bit silly now meticulously going through and changing all of my "0.9.3"s to "0.9" in the past, but at least now I know better.
It is true that, if your code works on z < 3, you are expanding the possible set of versions a bit, so it's not useless; one could argue that you should only raise the minimum z when an earlier patch release has a bug and you want to make sure you get the versions from after the fix, and that otherwise there's no reason to restrict yourself. But it's not a big deal either way :)
Within a crate graph, for any given major version of a crate (e.g. D v1) only a single minor version can exist. So if B depends on D v1.x, and C depends on D v2.x, then two versions of D will exist. If B depends on D v1.2 and C depends on D v1.3, then only D v1.3 will exist.
I'm over-simplifying a few things here:
1. Semver has special treatment of 0.x versions. For these crates the minor version behaves like the major version and the patch version behaves like the minor version. So technically you could have v0.1 and v0.2 of a crate in the same crate graph.
2. I'm assuming all dependencies are specified "the default way", i.e. as just a number. When a dependency looks like "1.3", cargo actually treats this as "^1.3", i.e. the version must be at least 1.3, but can be any semver-compatible version (e.g. 1.4). When you specify an exact dependency like "=1.3" instead, the rules above still apply (you still can't have 1.3 and 1.4 in the same crate graph) but cargo will error if no version can be found that satisfies all constraints, instead of just picking a version that's compatible with all dependents.
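A quick check of point 1 (a sketch using the `semver` crate's matching, which mirrors Cargo's):

```rust
use semver::{Version, VersionReq};

fn main() {
    // For 0.x crates the minor version acts like a major version:
    // ^0.1 does not admit 0.2.x, so v0.1 and v0.2 can coexist in one graph.
    let req = VersionReq::parse("^0.1").unwrap();
    assert!(req.matches(&Version::parse("0.1.7").unwrap()));
    assert!(!req.matches(&Version::parse("0.2.0").unwrap()));
}
```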
"Can" does not mean "must". Cargo attempts to unify (aka deduplicate) dependencies where possible, and in this case, it can find a single version that satisfies the entire thing.
Also, isn't the way browsers interpolate colors in sRGB just a bug that I assume is retained for backwards compatibility? sRGB is a logarithmic encoding; you were never supposed to interpolate between colors directly in that encoding: the spec says you're supposed to convert to linear RGB first and do the interpolation there...
It's not a bug, it's a property of the colour space, which is partially tied to how the colour is represented (RGB). When doing linear interpolation through the RGB cube (e.g. for a gradient), you normally pick the shortest path. It just so happens that sometimes that path passes through some shade of gray as the different colour components are scaled.
Usually you fix it by moving your point through a different colour space. The choice depends on your requirements and the mediums you're working with (normally different types of light sources or screens).
I had to write a low-level colour interpolation library for a few interactive art projects, so I dipped a bit into this, but I'm no colour expert.
No, sRGB refers to both a colour space and an encoding of that colour space. The encoding is non-linear to make best use of the 256 levels available per channel, but you were never supposed to interpolate sRGB by linearly interpolating the encoded components: you're supposed to apply the transfer function, perform the linear interpolation at higher precision, and then convert back down into the non-linear encoding.
Failure to do this conversion is what leads to the bad results when interpolating: going from red to green will still go through grey but it should go through a much lighter grey compared to what happens if you get the interpolation wrong.
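A rough sketch of that procedure (the constants are the standard sRGB transfer-function parameters; the function names are just mine):

```rust
// Decode one 8-bit sRGB channel into linear light in 0.0..=1.0.
fn srgb_to_linear(c: u8) -> f64 {
    let c = c as f64 / 255.0;
    if c <= 0.04045 { c / 12.92 } else { ((c + 0.055) / 1.055).powf(2.4) }
}

// Re-encode linear light into the non-linear 8-bit sRGB encoding.
fn linear_to_srgb(c: f64) -> u8 {
    let c = if c <= 0.0031308 { c * 12.92 } else { 1.055 * c.powf(1.0 / 2.4) - 0.055 };
    (c.clamp(0.0, 1.0) * 255.0).round() as u8
}

// Interpolate two sRGB colours: decode, lerp in linear light, re-encode.
fn lerp_srgb(a: [u8; 3], b: [u8; 3], t: f64) -> [u8; 3] {
    let mut out = [0u8; 3];
    for i in 0..3 {
        let (la, lb) = (srgb_to_linear(a[i]), srgb_to_linear(b[i]));
        out[i] = linear_to_srgb(la + (lb - la) * t);
    }
    out
}

fn main() {
    // Midpoint of red and green: a much lighter result than naively averaging
    // the encoded component values.
    println!("{:?}", lerp_srgb([255, 0, 0], [0, 255, 0], 0.5));
}
```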
I think GP is referring to the difference between "normal" (gamma-encoded) sRGB and linear sRGB. Though it's not logarithmic but a power law. In any case, linear interpolation done in non-linear sRGB gives you intermediate colors that are darker than they should be (though historically it's been so common in computer graphics that people are accustomed to it).
Documents don't contain calls to action like "Download X" or "Tell me more about Y", so your argument falls down in relation to the examples presented by W3C.