Hacker News | new | past | comments | ask | show | jobs | submit | dcsommer's comments

We must not continue to develop media codecs in memory-unsafe languages. Small, auditable sections can opt out, perhaps, but choosing default-unsafe for this type of software is close to professional negligence.

Cryptography and video codecs are notable exceptions; they put a lot of effort into making the code provably memory safe: no recursion, limited use of stack variables, no dynamic allocations, etc. As a result, memory-safe languages bring nothing but trouble by making the code non-deterministic. That's especially true for crypto, where compiler "optimisations" guarantee you side-channel attacks.

Thank you for mentioning this.

I wonder whether, if Rust had an effects system, a Jasmin MIR transform (i.e. like SPIR-V is for shaders) would be useful?

https://github.com/jasmin-lang/jasmin


Video codecs just don't need to do dynamic allocations because they're not relevant to the problem. There are still certainly plenty of opportunities for memory bugs, though, because there's a lot of pointer math.

How is this POV compatible with the exploitable vulnerabilities, caused by memory unsafety, found in openh264, x264, dav1d, and practically every video decoder out there?

Easily. It's a tradeoff.

What in the world do you mean by “non-deterministic”?

C compilers, Rust compilers, and assemblers are all deterministic.


In cryptography, you want operations to run in constant time, even if it’s wasteful, otherwise an attacker could guess information about the key or plaintext by measuring execution times.
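As a minimal sketch of the idea (a hypothetical helper, not code from any codec or library mentioned here): instead of returning early at the first mismatch, a constant-time comparison examines every byte, so the running time doesn't leak where the inputs differ.

```rust
// Hypothetical sketch of a constant-time byte comparison.
// Every byte is examined regardless of earlier mismatches, so the
// running time does not depend on where (or whether) the inputs differ.
// Caveat: a real implementation must also stop the compiler from
// "optimising" this back into an early-exit comparison.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulate differences without branching
    }
    diff == 0
}
```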

Modern compilers are extremely clever and will produce machine code that takes full advantage of modern CPU branch predictors, and reorder instructions to better take advantage of pipelining. This in itself will make the same code run at different speeds depending on the input data.

Then there is the whole issue of compiler version roulette. As a developer you have no idea which versions of compilers your users and distros will use, and what new and wonderful optimisations they will bring.


I know that, but none of that makes the compiler output non-deterministic.

Determinism does not mean “easy to predict”, it just means “predictable”.


> C compilers, Rust compilers, and assemblers are all deterministic.

Within a version, yes, but not cross version. Different versions of GCC/Clang etc can give you completely different code.


For the codec itself, the majority of it is performance sensitive and often includes a significant amount of assembly, so a memory-safe language doesn't change much.

However, the container/extractor should absolutely be in a memory-safe language, and that's where a lot of the exploits/crashes are, too, since metadata parsing is fuzzier.

As a practical example of this, see something like CrabbyAVIF: all the parser code is Rust, but it delegates to dav1d for the actual codec portion.


Of the 3 software AV1 encoders, the only one that is fully dead is the Rust encoder (rav1e). If people truly wanted memory safe encoders/decoders, they would fund and develop them.

https://github.com/memorysafety/rav1d got funded and developed. It is unfortunately a bit slower (typically by a single-digit percentage) than dav1d.

I can totally understand why people would want a memory-safe decoder, but a memory-safe encoder is niche. Finding a memory-safety bug in a decoder is a matter of finding a single unchecked integer field somewhere; finding a memory-safety bug in an encoder requires first finding some sort of logic bug in the encoder and then crafting an adversarial input that survives a number of highly lossy transformations.

Compare the number of CVEs against x264 (its included decoders don't count!) with the number against FFmpeg's H.264 decoder.


Fully dead in what sense? Seems like it still has active development to me.

It hasn't had any proper quality/speed improvements in years. Only thing that has changed is updating deps and some bug fixes.

Encoding is a way, way less risky thing to be doing compared to decoding.

There are many paths to memory safety, even if the one Rust project seems to be going nowhere.

There are other memory-safe languages, and there's formal verification.

e.g. seL4 favors Pancake.


> If people truly wanted memory safe encoders/decoders

Really? How many codecs have your neighbors contributed money for the development of, just curious.


I think these conversations are directed by the parties funding the efforts. Example: "we (large company) want a fast AV2 decoder" -> they pay a specialized team to do it -> this team works in C for the most part, so it is done in C. If there were financial incentives to do it in Rust, they'd pay more for a Rust decoder.

I'm more interested in the idea of general "people" (the commons) funding complex video encoders. I do wish that was the world we lived in, however :)

Given Netflix's involvement with SVT-AV1, at least 1, and not even that indirectly.

Are you part of any codec development team to use "we" here?

Decoders written in Rust will be a lot slower than the equivalents in assembly.

I think GP is simply identifying a potential popular niche that could be satisfied in a future city builder game, which seems quite on topic.


Yeah, I don't want my preferred playstyle to be favoured, just treated fairly:

* Add all of the non-car transport options: walking paths (including underground and raised paths for walking between large buildings in the winter, a la PATH in Toronto), bike paths, buses, streetcars, light rail, subways, inter-city trains, high speed rail, ferries

* Add parking lots as a feature to all commercial and residential construction, require every car to be parked somewhere when not in use, but allow residents/property owners to decide whether to build parking or not

* Allow land-value taxes as an alternative to property taxes, as well as the possibility for things like street parking and pollution ordinances to give you levers to incentivize/disincentivize the construction of parking lots

* Simulate emissions appropriately from all transport methods

* Parking lots and heavy traffic should lower property values, as citizens complain about the ugliness, pollution, noise, and danger of excessive traffic

There's a whole conversation to be had about the design of games like SimCity and how it affects future urban planners, but that may be going too far afield. Still, I think it would be nice to have a game that doesn't reward car-centric planning while burying the drawbacks.


Just for reference, Wamedia ships on the major Meta apps and on iOS, Android, Desktop, and Web platforms.


We invested a lot into build system optimizations to bring this number down over time, although we did accept on the order of 200 KiB of size overhead initially for the stdlib. We initially launched using Gradle + CMake + Cargo with static linking of the stdlib and some basic linker optimizations. Transitioning WhatsApp Android to Buck2 has helped tremendously to bring the size down, for instance by improving LTO and getting the latest clang toolchain optimizations. Buck2 also hugely improved build times.


Thanks!


Great reading to see beyond the clichéd, sanitized retellings of that era. It really makes you consider the prices paid for what some call progress.


That's the first piece by Didion I've read; after her death, I'd always meant to read more of her. The mask-less account was refreshing. The only counter-weight to flower-power I knew about was Altamont, and I was getting heavy Hunter S. Thompson vibes.


Great work by the MS team. It is great progress to shift OOB access into a controlled crash. These kinds of panic bugs are then easy to remediate, with clear stack traces, as we see in the turnaround time from the report.


This is my experience as well: Writing parsers for complex file formats in Rust often leaves a few edge cases which might cause controlled panics. But controlled panics are essentially denial of service attacks. And panics have good logging, making them easy to debug. Plus, you can fuzz for them at scale easily, using tools like "cargo fuzz".

This is a substantial improvement over the status quo.
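As a toy illustration of the difference (hypothetical names and format, not any real parser): Rust's checked slicing turns an adversarial length field into a recoverable `None`, and plain indexing would instead give a controlled, loggable panic rather than a silent out-of-bounds read.

```rust
// Toy length-prefixed parser: a 1-byte length, then that many payload
// bytes. With `get`, a lying length field yields None; writing
// `&input[1..1 + len]` instead would turn it into a controlled panic.
// Either way, it can never read past the buffer.
fn parse_payload(input: &[u8]) -> Option<&[u8]> {
    let len = *input.first()? as usize;
    input.get(1..1 + len)
}
```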

Tools like WUFFS may be more appropriate for low level parsing logic when you're not willing to risk controlled panics, however.


That's true, but really this kind of problem screams out for the approach taken in WUFFS. Have the programmer who is Wrangling Untrusted File Formats prove that what they wrote is correct as part of that exercise.


> basically fine

How many type confusion 0 days and memory safety issues have we had in dynamic language engines again? I've really lost count.


Are you counting ones that involve running malicious code in a sandbox and not just trusted code on untrusted input? Because then I'd agree, but that's a much harder and different problem.

My impression is that for the trusted code untrusted input case it hasn't been that many, but I could be wrong.


It depends, what language was the sandbox written in?


Sandboxes are difficult independent of language, see all the recent speculation vulnerabilities for instance. Sure, worse languages make it even harder, but I think we're straying from the original topic of "python/ruby" by considering sandboxes at all.


How many ways to cause a segmentation fault in CPython, that don't start with deliberate corruption of the bytecode, are you aware of?

How is "type confusion" a security issue?


Is there a straightforward path to building Zig with polyglot build systems like Bazel and Buck2? I'm worried Zig's reliance on Turing complete build scripts will make building (and caching) such code difficult in those deterministic systems. In Rust, libraries that eschew build.rs are far preferable for this reason. Do Zig libraries typically have a lot of custom build setup?


For bazel:

https://github.com/aherrmann/rules_zig

Real world projects like ZML uses it:

https://github.com/zml/zml


FYI, build scripts are completely optional. Zig can build and run individual source code files regardless of build scripts (`build.zig`). You may need to decipher the build script to extract flags, but that's pretty much it. You can integrate Zig into any workflow that accepts GCC and Clang. (Note: `zig` is also a drop-in replacement C compiler[1])

[1]: https://andrewkelley.me/post/zig-cc-powerful-drop-in-replace...


It would be cool to build a "library clout" measure for all open source software. First collect for all deployed software systems measures of usage per platform and along other interesting dimensions like how that system relates to others (is it a common dependency or platform for other deployed software). Use this to generate "clout" at a deployed software unit level. Then detect all open source libraries compiled in it by binary signature matching or through the software's own build system if it is open. Then a library's "clout" is built from the clout of the projects that use it.

This clout score might be used to guide investments in a non-profit for funding critical OSS. Data collection would be challenging though, as would calibrating need.

Basically make a rigorous score to track some of the intuition from https://xkcd.com/2347/
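As a minimal sketch of the propagation step (entirely made-up names and data): each deployed system contributes its score to every library detected inside it, and a library's clout is the sum over its dependents.

```rust
use std::collections::HashMap;

// Hypothetical sketch: propagate "clout" from deployed systems to the
// open source libraries detected in them. Each system contributes its
// full score to every library it depends on; real weighting
// (transitive deps, per-platform usage) is left out.
fn library_clout(
    system_clout: &HashMap<&str, f64>,
    deps: &HashMap<&str, Vec<&str>>,
) -> HashMap<String, f64> {
    let mut clout: HashMap<String, f64> = HashMap::new();
    for (system, score) in system_clout {
        for lib in deps.get(system).into_iter().flatten() {
            // sum each dependent system's score into the library's total
            *clout.entry((*lib).to_string()).or_insert(0.0) += score;
        }
    }
    clout
}
```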


There is one, though, focused on security: https://openssf.org/projects/criticality-score/


Sounds like tidelift


This is interesting if true, but without data I can't take this claim at face value.

