Hacker News | new | past | comments | ask | show | jobs | submit | dcsommer's comments

We must not continue to develop media codecs in memory-unsafe languages. Small, auditable sections can opt out, perhaps, but choosing default-unsafe for this type of software is close to professional negligence.

Cryptography and video codecs are notable exceptions; they put a lot of effort into making the code provably memory safe: no recursion, limited use of stack variables, no dynamic allocations, etc. As a result, memory-safe languages bring nothing but trouble by making the code non-deterministic. That's especially true for crypto, where compiler "optimisations" guarantee you side-channel attacks.

Thank you for mentioning this.

I wonder whether, if Rust had an effects system, a Jasmin MIR transform (i.e. like SPIR-V is for shaders) would be useful?

https://github.com/jasmin-lang/jasmin


Video codecs just don't need to do dynamic allocations because they're not relevant to the problem. There are still certainly plenty of opportunities for memory bugs, though, because there's a lot of pointer math.

How is this POV compatible with the exploitable vulnerabilities, caused by memory unsafety, found in openh264, x264, dav1d, and practically every video decoder out there?

Easily. It's a tradeoff.

What in the world do you mean by “non-deterministic”?

C compilers, Rust compilers, and assemblers are all deterministic.


In cryptography, you want operations to run in constant time, even if it’s wasteful, otherwise an attacker could guess information about the key or plaintext by measuring execution times.
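As a minimal sketch of the idea (a hypothetical helper, not code from any codec or library mentioned here): instead of returning early at the first mismatch, a constant-time comparison examines every byte, so the running time doesn't leak where the inputs differ.

```rust
// Hypothetical sketch of a constant-time byte comparison.
// Every byte is examined regardless of earlier mismatches, so the
// running time does not depend on where (or whether) the inputs differ.
// Caveat: a real implementation must also stop the compiler from
// "optimising" this back into an early-exit comparison.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulate differences without branching
    }
    diff == 0
}
```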

Modern compilers are extremely clever and will produce machine code that takes full advantage of modern CPU branch predictors, and reorder instructions to better take advantage of pipelining. This in itself will make the same code run at different speeds depending on the input data.

Then there is the whole issue of compiler version roulette. As a developer you have no idea which versions of compilers your users and distros will use, and what new and wonderful optimisations they will bring.


I know that, but none of that makes the compiler output non-deterministic.

Determinism does not mean “easy to predict”, it just means “predictable”.


> C compilers, Rust compilers, and assemblers are all deterministic.

Within a version, yes, but not cross version. Different versions of GCC/Clang etc can give you completely different code.


For the codec itself, the majority of it is performance sensitive and often includes a significant amount of assembly, so a memory-safe language doesn't change much.

However, the container/extractor should absolutely be in a memory-safe language, and that's where a lot of the exploits/crashes are, too, since metadata parsing is fuzzier.

As a practical example of this, see something like CrabbyAVIF: all the parser code is Rust, but it delegates to dav1d for the actual codec portion.


Of the 3 software AV1 encoders, the only one that is fully dead is the Rust encoder (rav1e). If people truly wanted memory safe encoders/decoders, they would fund and develop them.

https://github.com/memorysafety/rav1d got funded and developed. It is unfortunately a bit slower (typically by a single-digit percentage) than dav1d.

I can totally understand why people would want a memory-safe decoder, but a memory-safe encoder is niche. Finding a memory-safety bug in a decoder is a matter of finding a single unchecked integer field somewhere; finding a memory-safety bug in an encoder requires first finding some sort of logic bug in the encoder and then crafting an adversarial input that survives a number of highly lossy transformations.

Compare the number of CVEs against x264 (its included decoders don't count!) with the number against FFmpeg's H.264 decoder.


Fully dead in what sense? Seems like it still has active development to me.

It hasn't had any proper quality/speed improvements in years. Only thing that has changed is updating deps and some bug fixes.

Encoding is a way, way less risky thing to be doing compared to decoding.

There are many paths to memory safety, even if the one Rust project seems to be going nowhere.

There are other memory-safe languages, and there's formal verification.

e.g. seL4 favors Pancake.


> If people truly wanted memory safe encoders/decoders

Really? How many codecs have your neighbors contributed money for the development of, just curious.


I think these conversations are directed by the parties funding the efforts. Example: "we (large company) want a fast AV2 decoder" -> they pay a specialized team to do it -> this team works in C for the most part, so it is done in C. If there were financial incentives to do it in Rust, they'd pay more for a Rust decoder.

I'm more interested in the idea of general "people" (the commons) funding complex video encoders. I do wish that was the world we lived in, however :)

Given Netflix's involvement with SVT-AV1, at least 1, and not even that indirectly.

Are you part of any codec development team to use "we" here?

Decoders written in Rust will be a lot slower than the equivalents in assembly.

I think GP is simply identifying a potential popular niche that could be satisfied in a future city builder game, which seems quite on topic.


Yeah, I don't want my preferred playstyle to be favoured, just treated fairly:

* Add all of the non-car transport options: walking paths (including underground and raised paths for walking between large buildings in the winter, a la PATH in Toronto), bike paths, buses, streetcars, light rail, subways, inter-city trains, high speed rail, ferries

* Add parking lots as a feature to all commercial and residential construction, require every car to be parked somewhere when not in use, but allow residents/property owners to decide whether to build parking or not

* Allow land-value taxes as an alternative to property taxes, as well as the possibility for things like street parking and pollution ordinances to give you levers to incentivize/disincentivize the construction of parking lots

* Simulate emissions appropriately from all transport methods

* Parking lots and heavy traffic should lower property values, as citizens complain about the ugliness, pollution, noise, and danger of excessive traffic

There's a whole conversation to be had about the design of games like SimCity and how it affects future urban planners, but that may be going too far afield. Still, I think it would be nice to have a game that doesn't reward car-centric planning while burying the drawbacks.


Just for reference, Wamedia ships on the major Meta apps and on iOS, Android, Desktop, and Web platforms.


We invested a lot into build system optimizations to bring this number down over time, although we did accept on the order of 200 KiB of size overhead initially for the stdlib. We initially launched using Gradle + CMake + Cargo with static linking of the stdlib and some basic linker optimizations. Transitioning WhatsApp Android to Buck2 has helped tremendously to bring the size down, for instance by improving LTO and getting the latest clang toolchain optimizations. Buck2 also hugely improved build times.


Thanks!


Great reading to see beyond the clichéd, sanitized retellings of that era. It really makes you consider the prices paid for what some call progress.


That's the first piece by Didion I've read; after her death, I'd always meant to read more of her. The mask-less account was refreshing. The only counter-weight to flower-power I knew about was Altamont, and I was getting heavy Hunter S. Thompson vibes.


Great work by the MS team. It is great progress to shift OOB access into a controlled crash. These kinds of panic bugs are then easy to remediate, with clear stack traces, as we see in the turnaround time from the report.


This is my experience as well: Writing parsers for complex file formats in Rust often leaves a few edge cases which might cause controlled panics. But controlled panics are essentially denial of service attacks. And panics have good logging, making them easy to debug. Plus, you can fuzz for them at scale easily, using tools like "cargo fuzz".

This is a substantial improvement over the status quo.
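As a toy illustration of the difference (hypothetical names and format, not any real parser): Rust's checked slicing turns an adversarial length field into a recoverable `None`, and plain indexing would instead give a controlled, loggable panic rather than a silent out-of-bounds read.

```rust
// Toy length-prefixed parser: a 1-byte length, then that many payload
// bytes. With `get`, a lying length field yields None; writing
// `&input[1..1 + len]` instead would turn it into a controlled panic.
// Either way, it can never read past the buffer.
fn parse_payload(input: &[u8]) -> Option<&[u8]> {
    let len = *input.first()? as usize;
    input.get(1..1 + len)
}
```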

Tools like WUFFS may be more appropriate for low level parsing logic when you're not willing to risk controlled panics, however.


That's true, but really this kind of problem screams out for the approach taken in WUFFS. Have the programmer who is Wrangling Untrusted File Formats prove that what they wrote is correct as part of that exercise.


> basically fine

How many type confusion 0 days and memory safety issues have we had in dynamic language engines again? I've really lost count.


Are you counting ones that involve running malicious code in a sandbox and not just trusted code on untrusted input? Because then I'd agree, but that's a much harder and different problem.

My impression is that for the trusted code untrusted input case it hasn't been that many, but I could be wrong.


It depends, what language was the sandbox written in?


Sandboxes are difficult independent of language, see all the recent speculation vulnerabilities for instance. Sure, worse languages make it even harder, but I think we're straying from the original topic of "python/ruby" by considering sandboxes at all.


How many ways to cause a segmentation fault in CPython, that don't start with deliberate corruption of the bytecode, are you aware of?

How is "type confusion" a security issue?


Is there a straightforward path to building Zig with polyglot build systems like Bazel and Buck2? I'm worried Zig's reliance on Turing complete build scripts will make building (and caching) such code difficult in those deterministic systems. In Rust, libraries that eschew build.rs are far preferable for this reason. Do Zig libraries typically have a lot of custom build setup?


For bazel:

https://github.com/aherrmann/rules_zig

Real world projects like ZML uses it:

https://github.com/zml/zml


FYI, build scripts are completely optional. Zig can build and run individual source code files regardless of build scripts (`build.zig`). You may need to decipher the build script to extract flags, but that's pretty much it. You can integrate Zig into any workflow that accepts GCC and Clang. (Note: `zig` is also a drop-in replacement C compiler[1])

[1]: https://andrewkelley.me/post/zig-cc-powerful-drop-in-replace...


It would be cool to build a "library clout" measure for all open source software. First collect for all deployed software systems measures of usage per platform and along other interesting dimensions like how that system relates to others (is it a common dependency or platform for other deployed software). Use this to generate "clout" at a deployed software unit level. Then detect all open source libraries compiled in it by binary signature matching or through the software's own build system if it is open. Then a library's "clout" is built from the clout of the projects that use it.

This clout score might be used to guide investments in a non-profit for funding critical OSS. Data collection would be challenging though, as would calibrating need.

Basically make a rigorous score to track some of the intuition from https://xkcd.com/2347/
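As a minimal sketch of the propagation step (entirely made-up names and data): each deployed system contributes its score to every library detected inside it, and a library's clout is the sum over its dependents.

```rust
use std::collections::HashMap;

// Hypothetical sketch: propagate "clout" from deployed systems to the
// open source libraries detected in them. Each system contributes its
// full score to every library it depends on; real weighting
// (transitive deps, per-platform usage) is left out.
fn library_clout(
    system_clout: &HashMap<&str, f64>,
    deps: &HashMap<&str, Vec<&str>>,
) -> HashMap<String, f64> {
    let mut clout: HashMap<String, f64> = HashMap::new();
    for (system, score) in system_clout {
        for lib in deps.get(system).into_iter().flatten() {
            // sum each dependent system's score into the library's total
            *clout.entry((*lib).to_string()).or_insert(0.0) += score;
        }
    }
    clout
}
```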


There is one, though, focused on security: https://openssf.org/projects/criticality-score/


Sounds like tidelift


This is interesting if true, but without data I can't take this claim at face value.

