
I'm a little nervous about the correctness of the memory orderings in this project, e.g.

Two back-to-back acquires are unnecessary here. In general, fetch_sub and fetch_add should give enough guarantees for this file with Relaxed ordering. https://github.com/frostyplanet/crossfire-rs/blob/master/src...
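To illustrate what I mean (a generic sketch with made-up names, not the crossfire code): a counter that is only ever read for its own value, and never used to publish other memory, can be Relaxed on both sides.

    use std::sync::atomic::{AtomicUsize, Ordering};

    // Hypothetical counter: the RMW is still atomic under Relaxed; the
    // stronger orderings only matter when other writes must become visible
    // to whoever observes the counter.
    static IN_FLIGHT: AtomicUsize = AtomicUsize::new(0);

    fn enter() {
        IN_FLIGHT.fetch_add(1, Ordering::Relaxed);
    }

    fn leave() {
        IN_FLIGHT.fetch_sub(1, Ordering::Relaxed);
    }

Whether that applies in crossfire depends on whether the counter also gates access to other data, which is exactly the kind of question Loom is good at answering.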

Congest is never written to with Release, so the Acquire is never used to form a release chain: https://github.com/frostyplanet/crossfire-rs/blob/dd4a646ca9...
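For context (again a generic sketch, not crossfire's code): an Acquire load only synchronizes-with a Release store to the same atomic, so if nothing ever stores with Release, the Acquire buys you nothing over Relaxed.

    use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

    static DATA: AtomicU32 = AtomicU32::new(0);
    static READY: AtomicBool = AtomicBool::new(false);

    // Writer: the Release store is what a later Acquire load pairs with.
    fn writer() {
        DATA.store(42, Ordering::Relaxed);
        READY.store(true, Ordering::Release);
    }

    // Reader: if READY were only ever stored with Relaxed, this Acquire
    // would not guarantee that the write to DATA is visible.
    fn reader() -> Option<u32> {
        if READY.load(Ordering::Acquire) {
            Some(DATA.load(Ordering::Relaxed))
        } else {
            None
        }
    }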

The queue appears to close the channel twice (once per rx/tx), which is discordant with the apparent care taken with the fencing. https://github.com/frostyplanet/crossfire-rs/blob/dd4a646ca9...

The author also proposes an incorrect optimization to Tokio here, which suggests a lack of understanding of the specific guarantees involved: https://github.com/tokio-rs/tokio/pull/7622

The tests do not appear to simulate the queue in Loom, which would be a very, very good idea.

This stuff is hard. I almost certainly made a mistake in what I've written above (edit: I did!). In practice, the queue is probably fine to use, but I wouldn't be shocked if there's a heisenbug lurking in this codebase that manifests something like this: everything works fine now, but the next LLVM version adds an optimization pass that breaks it on ARM in release mode, and after that the queue yields duplicate values in a busy loop every few million reads, triggered only on Graviton processors.

Or something. Like I said, this stuff is hard. I wrote a very detailed simulator for the Rust/C++ memory model, have implemented dozens of lockless algorithms, and I still make a mistake every time I go to write code. You need to simulate it with something like Loom to have any hope of a robust implementation.
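For the unfamiliar: a Loom test wraps the scenario in loom::model, and Loom then explores the thread interleavings and reorderings the memory model permits, checking your assertions under each one. A minimal sketch (not crossfire's tests) looks roughly like:

    #[cfg(test)]
    mod loom_tests {
        #[test]
        fn two_relaxed_increments() {
            loom::model(|| {
                use loom::sync::atomic::{AtomicUsize, Ordering};
                use loom::sync::Arc;
                use loom::thread;

                let counter = Arc::new(AtomicUsize::new(0));
                let other = counter.clone();

                let handle = thread::spawn(move || {
                    other.fetch_add(1, Ordering::Relaxed);
                });
                counter.fetch_add(1, Ordering::Relaxed);
                handle.join().unwrap();

                // Checked under every interleaving Loom generates.
                assert_eq!(counter.load(Ordering::Relaxed), 2);
            });
        }
    }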

For anyone interested in learning about Rust's memory model, I can't recommend enough Rust Atomics and Locks:

https://marabos.nl/atomics/


> The tests do not appear to simulate the queue in Loom, which would be a very, very good idea.

Loom is apparently this: https://github.com/tokio-rs/loom

I've used Tokio a bit in the past but wasn't aware of that tool at all. It looks really useful, and I'm probably not alone in never having heard of it before. Any tips & tricks or gotchas one should know beforehand?


I'm not going to thumb my nose at CPU design content from folks that aren't good at public speaking. They're almost entirely distinct skill sets.


Also, the overlap in the Venn diagram between (good public speaking) and (good public speaking that also reads well when transcribed) is probably pretty thin.


It's too early to tell.


This comment is a nugget of gold - I hadn't thought about it in those terms before but it makes total sense. Thank you!


One of my favourite YouTubers, Matt Orchard, did a video on cults that included Heaven's Gate. He interviews a surviving member. If this article interested you, it's worth a watch (the beginning is a bit silly):

https://www.youtube.com/watch?v=L9F-vb7s3DE


OpenAI automatically caches prompt prefixes on the API. Caching an infrequently changing, internally controlled system prompt is trivial by comparison.


See Tokio's Loom as an example: https://github.com/tokio-rs/loom

In development, you import Loom's mutex; in production, you import a regular one. The production build therefore has zero overhead, but the simulation testing itself is usually quite slow: only one thread executes at a time, and many iterations are required.
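Concretely, the convention from Loom's documentation is a cfg switch (Mutex and AtomicUsize here are just examples):

    // Simulation: RUSTFLAGS="--cfg loom" cargo test
    #[cfg(loom)]
    use loom::sync::{Mutex, atomic::AtomicUsize, atomic::Ordering};

    // Normal builds compile straight against std, so there's no runtime cost.
    #[cfg(not(loom))]
    use std::sync::{Mutex, atomic::AtomicUsize, atomic::Ordering};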


This is a misreading of their website. On the left, they compare the EPYC 9965 (launched 10/10/24) with the Xeon Platinum 8280 (launched Q2 '19) and make a TCO argument for replacing outdated Intel servers with AMD.

On the right, they compare the EPYC 9965 (launched 10/10/24) with the Xeon Platinum 8592+ (launched Q4 '23), a like-for-like comparison against Intel's competition at launch.

The argument is essentially in two pieces - "If you're upgrading, you should pick AMD. If you're not upgrading, you should be."


It’s true that they compare to different Intel CPUs in different parts of the webpage, and I don’t always understand the intentions behind those comparisons.

Still, if you decode the unreadable footnotes 2 & 3 at the bottom of the page, a few things stand out: avoiding AMX, using CPUs with different core counts and costs, and even running on a different Linux kernel version, which may affect scheduling…


Or maybe Quack III: Arena. https://m.slashdot.org/story/21054


Ooh, I remember this, but the practice is actually older than that.

First, nVidia and ATI used executable names to detect games; then they started adding heuristics.

If you think they stopped the practice, you're very mistaken. Every AMD and nVidia driver has game- and app-specific fixes and optimizations.

nVidia cheated in 3DMark that way, so its developers patched the benchmark to prevent it. nVidia also patched their drivers so that some of the more expensive but visually invisible calls in a particular game, like scene flushes, were batched (e.g. all 50 flushes executed at the 50th call) to keep the game from becoming a slide show even on expensive hardware.

This is also why AMD's and Intel's open source drivers under Linux are a success: they are vanilla drivers written from scratch to spec, and if your code calls OpenGL/Vulkan to spec, you're golden.

Some companies even cross-compile AMD's Linux drivers for Windows on embedded systems, since they're free of those game-specific optimizations.


Aah, that brings back memories...

Interestingly, most benchmark controversies from back in the day are now expected behaviour, i.e. game-specific optimizations with no visible image degradation (well, in this age of upscalers and other lossy optimization techniques, probably even somewhat visible). A gaming-specific driver with no game-specific improvements in its changelog would be considered strange, and it very much works with executable detection.

Back in the day, there was still the argument that drivers should not optimize for benchmarks even when visually identical, because it wouldn't show the hardware's real world potential. Kinda cute from today's perspective. :)

But of course there were the obvious cases...

The Quack3 case of lowered filtering quality shown above, of course (at least that one was later exposed in the driver as a toggleable setting).

But the cheekiest one has to be nVidia's 3DMark03 "optimizations", where they blatantly put static clip planes into the scenes so that everything outside the predefined camera path of the benchmark sequence would simply be cut from the scene early (which e.g. fully broke the freelook patched into 3DMark and would generally break any interactive application).


You beat me to it. Grrr...

Just kidding, nice to see another person who remembers these things. Want some root beer?


Now I want a Quake shooter but with ducks.


Not ducks, but chickens; it was very popular in Germany back in the day: https://en.wikipedia.org/wiki/Crazy_Chicken


Oh wow, that was a blast from the past. The Moorhuhn craze!

Many people, including me, didn't have an internet connection back in the day. The Sneakernet went into overdrive to get everyone a copy!


A Duck Hunt, if you will…


I think that was the first case (to go public), but I remember reading about this in game magazines a couple of times afterwards, for both ATI and nVidia.


But humans do quite intelligently pathfind around objects they're aware of, and update their mental models with new information. The front door is locked? I'll go around the back.

You can achieve exactly what you're describing by hiding information entities do not have from their pathfinding.

Graphs aren't the problem, and thinking along those lines won't get you where you're trying to go.
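To sketch the idea (everything below is hypothetical, not from any particular engine): run the search over the real graph, but filter edges through what the agent actually knows, and replan whenever that knowledge changes.

    use std::collections::{HashMap, HashSet, VecDeque};

    // Edges this agent has personally discovered to be blocked
    // (the locked front door).
    struct AgentKnowledge {
        known_blocked: HashSet<(u32, u32)>,
    }

    // BFS over the full graph, skipping only the edges the agent knows are
    // blocked. Unknown obstacles are optimistically assumed passable; when
    // the agent bumps into one, it records it and replans ("I'll go around
    // the back").
    fn find_path(
        graph: &HashMap<u32, Vec<u32>>,
        knowledge: &AgentKnowledge,
        start: u32,
        goal: u32,
    ) -> Option<Vec<u32>> {
        let mut prev: HashMap<u32, u32> = HashMap::new();
        let mut seen: HashSet<u32> = HashSet::from([start]);
        let mut queue: VecDeque<u32> = VecDeque::from([start]);

        while let Some(node) = queue.pop_front() {
            if node == goal {
                // Rebuild the path by walking predecessors back to the start.
                let mut path = vec![goal];
                let mut cur = goal;
                while let Some(&p) = prev.get(&cur) {
                    path.push(p);
                    cur = p;
                }
                path.reverse();
                return Some(path);
            }
            for &next in graph.get(&node).into_iter().flatten() {
                if knowledge.known_blocked.contains(&(node, next)) || !seen.insert(next) {
                    continue;
                }
                prev.insert(next, node);
                queue.push_back(next);
            }
        }
        None
    }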

