Random idea: if you have a known sentinel value for "empty", could you avoid the reader needing to read the writer's index? Just try to read: if the slot holds the sentinel the queue is empty, otherwise take the item and put the sentinel back. Similarly for writing: check the slot, and if it isn't empty the queue is full.
It seems that in this case, as you get contention, the faster end will slow down (since it is operating on the slot the other end only just touched), and this will naturally create a small buffer and run at good speed.
The hard part is probably that sentinel and ensuring that it can be set/cleared atomically. In Rust you can use `Option<T>` to get a sentinel for any type (and it very often doesn't take any extra space), but I don't think there is an API to atomically set/clear that flag. (Technically I think this is always possible, because the sentinel that `Option` picks will always be small even if the `T` is very large, but I don't think there is an API for this.)
Yeah, or you could put a generation number in each slot adjacent to T and a read will only be valid if the slot's generation number == the last one observed + 1, for example. But ultimately the reader and writer still need to coordinate here, so we're just shifting the coordination cache line from the writer's index to the slot.
I think the key difference is that they only need to coordinate when the reader and writer are close together. If that slows one end down they naturally spread apart. So you don't lose throughput, only a little latency in the contested case.
That's a good point, they are very similar. I guess the sentinel design in theory doesn't need to synchronize at all as long as there is a decent buffer between them. But the cached design synchronizes less often the more free space there is, which sounds like it would be very similar in practice. The sentinel design might also have some thrashing issues when the reader and writer are working in the same cache lines, which would probably be a bit less of an issue with the cached-index design.
Programming in assembly isn't really "hard"; it mostly takes lots of discipline. Consistency and patterns are key. The language also provides very little implicit documentation, so always document which arguments are passed how and where, and which registers are caller- and callee-saved. Of course, it is also very tedious.
Now, writing very optimized assembly is very hard, because you need to break your consistency and conventions to squeeze out all the possible performance. The larger the "kernel" you optimize, the more pattern-breaking code you need to keep in your head at a time.
This makes sense and it's really that last step. It's one thing to do pattern matching or bit flipping routines. It's a whole different ballgame to build a game engine. Maybe if I knew gamedev better I wouldn't be as intimidated by it, but it really does seem like a herculean task.
I think it'd be cool to do assembler on a Pi Pico or something, that seems like it would be a fun exercise.
But it is about code syntax. Languages like Haskell make it part of the language by only supporting single-argument functions. So currying is the default behaviour for programmers.
I think you are focusing on the theoretical aspect of partial application and missing the actual argument of the article: that having it be the default, implicit way of defining and calling functions isn't a good programming interface.
It's still not very useful to hide the length. If you don't know the length and just start guessing with passwords of length 0 upwards, it only adds about 1/N extra guesses, where N is the alphabet size, compared to guessing strictly at the right length. So knowing the password length is a very small saving.
It might matter a bit more for dictionary-based attacks (you don't have to bother hashing dictionary permutations that don't match the expected length) but I still suspect it doesn't save you much.
For opportunistic attacks, this could help you identify the people with short passwords and only attack them. That is a factor-of-M speedup, where M is the size of the pool of people you are interested in attacking.
I think the more important aspect is that people will have 24h to slow down, think, and realize that they are being scammed. Urgency and pressure are among the top tactics used by scammers.
Scammers will definitely call back the next day to continue. But it is quite possible that by then the victim has realized, or talked to someone who helped them realize that they are being scammed.
There's been some reporting recently where I live about a case of some woman being scammed.
She went to a bank to transfer the scammer money. They told her no. She came back the next day. The police got involved and explained everything to her. Then she came back the next day. After that, she apparently found another location which let her transfer the money.
There's basically zero chance a 24-hour (or any length of) cooling-off period will help these people.
It's not one example. The scammers purposefully target people like these. That's their business.
Like, I'm sure there's a small amount of people who normally wouldn't get scammed but fall for it in a panic. But, is that really such a big concern for Google that they absolutely must continue stripping user freedoms from us? Is the current 30s popup which needs 3 confirmations not enough? Will the new one really work?
> helping some people is great even if it doesn't help everyone
It's kind of funny, but I very much agree with this. It's just in this case, it's hurting everyone (in ways most don't even realize) so that you can help a few people.
It's like putting everyone in prison, because some people might commit a crime and this would save some victims. A bit of an overreaction, no?
Regardless, one of the conditions is surely giving them permission to sell this to Starlink and everyone else. So whether the information is the same is probably irrelevant; how they are using it is what matters.
Probably, because you are now associating your internet browsing with your personal information. (I don't know if they have the sophistication to actually do this, but it is very possible.)
I actually disagree with Rule 3! While numbers are usually small, being fast on small cases generally isn't as important as performing acceptably on large cases. So I prefer to take the better big-O so that things don't slow down unacceptably under real-world edge-case stresses. (The type of workload that the devs often don't experience, but your big customers will.)
Of course there is a balance to this; the engineering time to implement both options is an important consideration. But given that both algorithms are relatively easy to implement, I will default to the one that is faster at large sizes, even if it is slower at common sizes. I do suspect that there is an implicit assumption that "fancy" algorithms take longer and are harder to implement. But in many cases both algorithms are in the standard library and just need to be selected. If this post focused on "fancy" in terms of actual time to implement, rather than speed at common sizes, I would be more inclined to agree with it.
I think it's important to think about architectural and domain bounds on problems and check if the big-O-optimal algorithm ever comes out on top. I remember Bjarne Stroustrup did a lecture where he compared a reasonably-implemented big-O-optimal algorithm on linked lists to a less optimal algorithm using arrays, and he used his laptop to test at what data size the big-O-optimal algorithm started to beat the less optimal algorithm. What he found was that the less optimal algorithm beat the big-O-optimal algorithm for every dataset he could process on the laptop. In that case, architectural bounds meant that the big-O-optimal algorithm was strictly worse. That was an extreme case, but it shows the value of testing.
Domain bounds can be dangerous to rely on, but not always. For example, the number of U.S. states is unlikely to change significantly in the lifetime of your codebase.
Rule 3 was true in 1989; back then computers were so slow and had so little RAM that most things you did were only reasonable for small numbers of inputs. Today we almost always have large numbers of inputs, so it's different.
Rule 3 is still very much real. Fancy fast algorithms often have other trade-offs. The best algorithm for the job is the one that meets all requirements well... Big-O is one aspect, the data is another, and determinism of the underlying mechanisms needed (dynamic memory allocation, etc.) can be another.
It is important to remember that the art of software engineering (like all engineering) lives in a balance between all these different requirements; not just in OPTIMIZE BIG-O.
Sure, but the default (and usually correct) assumption when working at Google (as an example) is basically "all numbers are big", so you have to be clued in about algorithms and data structures and not default to brute-forcing something.
At 99% of shops it should be the other way around.
Even when you are working with large numbers, most numbers are usually small. Most of the code is probably not dealing with the large things, and a large thing may consist of a large number of instances that are individually small.
I've personally found it useful to always have concrete numbers in mind. An algorithm or data structure designed for N will probably be fine for N/10 and 10N, but it will often be inefficient for N/1000 or 1000N.
That's not quite true. Only one pixel is being activated at a time, but the phosphors continue to emit light well after the beam moves on. In practice you get a handful of lines lit to varying degrees at a time: maybe 1-2 lines quite brightly lit, and then a trail of lines that are fading significantly (but still emitting light). Then yes, our persistence of vision fills in the rest to provide the appearance of a fully lit screen.
The scale on the left was also very stuttery. Even when scrolling slowly, I could see the distance at the bottom updating at a very high frame rate while the scale on the left only moved occasionally, which felt awful.
I find it hard to be too upset; better late than never. Would it have been better to upstream shortly after they wrote the code? Yes. Would it have been better if they also made a sizable contribution to ffmpeg? Yes. But at the end of the day they did contribute back valuable code, and that is worth celebrating even if it was done purely because of the benefit to them. Let's hope that this is a small step and they do even more in the future.
As I said, the contribution is good, it's the communication via this blog post that I don't entirely like. It could have been different. It could have acknowledged better ways of engaging with ffmpeg (that would've benefitted both Meta and ffmpeg/the community, not _just_ ffmpeg).
But corporate blog posts often go this way. I'm not mad at them or anything. Just a mild dislike ;)
Yeah, I see what you mean. It basically shows that they contributed to ffmpeg purely because it helped them, but then they wrote this post to get goodwill for that contribution.