More

exDM69 · 2026-02-27T13:13:05 1772197985

Nice and concise description of quadtrees implemented with classical pointer chasing data structures.

A faster and (arguably) simpler way to construct quad/octrees is using morton codes, sorting and searching with flat arrays [0]. It's also probably easier to implement.

The gist of it is that you quantize the coordinates, do bit interleaving to construct morton code and then sort. The sorting is using numerical keys so you can use a radix sort for O(n) complexity which is much faster on a GPU (but on single CPU core a comparison based sort will probably win if the array isn't huge).

Now everything on the top half of the space is in the beginning of the array and the bottom half is in the end of the array (assuming yxyxyxyx morton code bits). Each of those halves is then split left/right and then each level below that alternates between vertical and horizontal splits.

To find the split point in the array, look at the first and last entry of the (sub)array you're looking at, and look for the first differing bit with (first ^ last).leading_zeros(). Then binary search for the first entry where that bit is high.

To traverse the quad/octree, repeat this process with the two halves you found. This can be done without recursion in O(1) memory using a fixed size stack because you know the depth of the "recursion" is at most half the number of bits in the morton code.

If you used radix sorting for building the array, you can avoid the binary search if you store the histograms from the counting phase of the sorting. Storing the whole histogram may be too much, but just a few highest order bits can already help.

Although I've found experimentally that for small (less that 10000 objects) just sorting and searching is faster if the whole array fits in L2 cache. On my 2015 laptop a single core can sort 10k objects with 64 bit keys in 1 millisecond. Traversing the tree for frustum culling is about 5x faster than just going through the entire array because a lot of geometry can be discarded very quickly.

With a good comparison based sort rebuilding the array after some changes is mighty fast because modern sorting algorithms are much faster for "almost sorted" inputs.

For range queries ("Find everything in the region" in the article), you can probably get better performance by using the BIGMIN/LITMAX method [1].

Now here's a brain teaser to delight your Friday: the article and the method I describe above is for storing points/centroids, but often you need to store axis aligned boxes instead (AABB). There is a very clever trick for this that I discovered independently but later found in some research papers. Can you come up with a (very) small change in the algorithm above to extend it to AABBs instead of centroids?

[0] Karras: Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees - https://research.nvidia.com/sites/default/files/pubs/2012-06... [1] https://en.wikipedia.org/wiki/Z-order_curve#Use_with_one-dim...

mmorse1217 · 2026-02-27T18:55:34 1772218534

Discretize the AABB with a certain length proportional to the morton code grid size so that each sample point will land in a distinct but continuous sequence of morton codes enclosing the AABB, compute the hash of each of those points, then dump all of those morton codes into the sort and reverse lookup the other AABBs with the same morton code :)

The problem comes at scale, when you have many AABBs with different sizes. Memory pressure can be punishing and you are bottlenecked by the network for distributed solutions. Is there a faster way with fewer points? I would love to know!

exDM69 · 2026-02-28T10:30:33 1772274633

Good attempt, but it's simpler than that.

For each AABB, you take the morton code of the minimum and maximum and store them in the array.

But for sorting and searching, construct a sort key from the pair of morton codes so that the most significant bits are the common prefix of the two morton codes and the least significant bits are the "level" (length of common prefix in bits).

E.g. for a 3d octree with 64 bit keys, you'd have 1 unused bit, 57 = 19 * 3 bits of morton code prefix and 6 bits of level. A 2d quadtree with 32b keys would have 1 unused bit, 26 = 2 * 13 bits of prefix and 5 bits of level.

You could also store just the sort key, but I've stored both AABB morton codes so that I can reconstruct the quantized AABBs for culling.

When you sort by this key, you'll get a linear octree where the beginning of the array is all the AABBs that overlap the subdivision plane, then everything on the left side of the plane and the rest is right side of the plane. When traversing the tree, you need two searches per node to find the entries belonging to this node and to find the split point between the left and the right side.

Having the "level" in the least significant bits works as a tiebreaker in the sorting to ensure that entries closer to the root of the tree are in the beginning of the array.

mmorse1217 · 2026-03-01T16:18:00 1772381880

Wow thats very clever with much better constants; thanks for this! I will need to give this a try.

pixelpoet · 2026-02-27T14:46:03 1772203563

I wrote about this solution vaguely the last time this link came up (wasn't that long ago either, I think?) and you've filled in all the details and the essential Karras paper link; great post :)

mandarax8 · 2026-02-27T18:02:20 1772215340

I give up, tell me the AABB trick please

exDM69 · 2026-02-28T10:31:17 1772274677

I have shared the trick in this thread above.

exDM69 · 2026-02-19T11:26:29 1771500389

> notorious Vulkan shader compilation lag spikes.

Vulkan gives all the tools to avoid any "lag spikes" from shader compiling. In fact, causing them is much more difficult than OpenGL where they could happen in surprising places (and only on certain hardware).

The issue is two fold: 1. Some engines produce a lot of shader permutations. Some AAA titles can have 60000 different shaders compiled. 2. Some GPU rasterizer states (such as color blending) are implemented as shader epilogues.

In Vulkan 1.0 almost all of the pipeline state had to be pre-baked into a pipeline state object compiled ahead of time. This lead to a "shader permutation explosion" where different states need different pipelines.

This requires the game engine to either a) compile all the pipeline combinations ahead of time (slow loading time) or b) compile them as needed (lag spikes).

The core issue for this was solved years ago and now most of the pipeline states can be added to the command buffers ("dynamic states"). This solves the permutation explosion. But at the same time it opens the door for issue 2: some states (blending in particular) can cause a state-based recompile (like ye olde OpenGL days) at runtime.

The only solution to the second problem is not to use the dynamic states that trigger recompiling. That's basically only blending as far as I know. You can't even have dynamic blend state on all GPUs.

For maximum developer flexibility there's the shader object extension that allows mixing and matching shaders and pipeline states any way you want. This will cause state based recompiles at unpredictable times but it's an opt-in feature and easy to avoid if lag spikes are not wanted.

tl;dr: shader recompilation is easy to avoid in Vulkan but porting legacy engine code or art content may take you off the happy path.

exDM69 · 2026-02-11T08:23:34 1770798214

> or you leave performance on the floor

The dynamic rendering local read feature is for this. You can get tiler gpus working on the happy path without the need for verbose render passes.

But mobile driver support being what it is, it'll take time until it's widespread in consumer devices (which don't get driver updates).

exDM69 · 2026-02-11T08:13:39 1770797619

> look up the extension on vulkan.gpuinfo.org and see ... currently 0.3% of all devices support it.

Afaik the extension isn't even finalized yet and they are pre-releasing it to gather feedback.

And you can't use gpuinfo for assessing how widely available something is or isn't. The stats contain reports from old drivers too so the numbers you see are no indication of hardware support.

To assess how widely supported something is, you need to look at gpuinfo, sort by date or driver version and cross reference something like steam hardware survey.

exDM69 · 2026-02-10T16:21:51 1770740511

That's the right way to go for simple use cases and especially getting started on a new project.

exDM69 · 2026-02-10T14:16:10 1770732970

You can use descriptor heaps with existing bindless shaders if you configure the optional "root signature".

However it looks like it's simpler to change your shaders (if you can) to use the new GLSL/SPIR-V functionality (or Slang) and don't specify the root signature at all (it's complex and verbose).

Descriptor heaps really reduce the amount of setup code needed, with pipeline layouts gone you can drop like third of the code needed to get started.

Similar in magnitude to dynamic rendering.

flohofwoe · 2026-02-10T15:10:37 1770736237

Having quite recently written a (still experimental) Vulkan backend for sokol_gfx.h, my impression is that starting with `VK_EXT_descriptor_buffer` (soon-ish to be replaced with `VK_EXT_descriptor_heap`), the "core API" is in pretty good shape now (with the remaining problem that all the outdated and depreciated sediment layers are still part of the core API, this should really be kicked out - e.g. when I explicitly request a specific API version like 1.4 I don't care about any features that have been deprecated in versions up to 1.4 and I don't care about any extensions that have been incorporated into the core API up until 1.4, so I'd really like to have them at least not show up in the Vulkan header so that code completion cannot sneak in outdated code (like EXT/KHR postfixes for things that have been moved into core).

The current OpenGL-like sediment-layer-model (e.g. never remove old stuff) is extremely confusing when not following Vulkan development very closely since 2016, since there's often 5 ways to do the same thing, 3 of which are deprecated - but finding out whether a feature is deprecated is its own sidequest.

What I actually wrestled with most was getting the outer frame-loop right without validation layer errors. I feel like this should be the next thing which the "Eye of Khronos" should focus on.

All official tutorial/example code I've tried doesn't run without swapchain-sync-related validation errors on one or another configuration. Even this 'best practices' example code which demonstrates how to do the frame-loop scaffolding correctly produces valiation layer errors, so it's also quite useless:

https://docs.vulkan.org/guide/latest/swapchain_semaphore_reu...

What's worse: different hardware/driver combos produce different validation layer errors (even in the swapchain-code which really shouldn't have different implementations across GPU vendors - e.g. shouldn't Khronos provide common reference code for those GPU-independent parts of drivers?). I wonder if there is actually any Vulkan code out there which is completely validation-layer-clean across all possible configs (I seriously doubt it).

Also the VK_[EXT/KHR]_swapchain_maintenance1 extension which is supposed to fix all those little warts has such a low coverage that it's not worth supporting (but it should really be part of the core API by now - the extension is from 2019).

Anyway... baby steps into the right direction, only a shame that it took a decade ;)

reactordev · 2026-02-10T15:53:09 1770738789

Vulkan is by far the most powerful and the most pain in the ass API I've ever worked with. I agree on every point you just made.

jorvi · 2026-02-10T17:31:49 1770744709

Isn't the idea that 99% of people use a toolkit atop of Vulkan?

Like, these days game devs just use Unreal Engine, which abstracts away having to work with the PS5 / PS4, DirectX 12, and Vulkan APIs.

I imagine unless it's either for A. edification or B. very bespoke purpose code, you're not touching Vulkan.

flohofwoe · 2026-02-10T17:47:39 1770745659

> Isn't the idea that 99% of people use a toolkit atop of Vulkan?

This idea creates a serious chicken-egg-problem.

Two or three popular engine code bases sitting on top of Vulkan isn't enough 'critical mass' to get robust and high performance Vulkan drivers. When there's so little diversity in the code hammering on the Vulkan API it's unlikely that all the little bugs and performance problems lurking in the drivers will be triggered and fixed, especially when most Unity or Unreal game projects will simply select the D3D11 or D3D12 backend since their main target platform on PC is Windows.

Similar problem to when GLQuake was the only popular OpenGL game, as soon as your own code used the GL API in a slightly different way than Quake did all kinds of things broke since those GL drivers only properly implemented and tested the GL subset used by GLQuake, and with the specific function call patterns of GLQuake.

From what I've seen so far, the MESA Vulkan drivers on Linux seem to be in much better shape than the average Windows Vulkan driver. The only explanation I have for this is that there are hardly any Windows games running on top of Vulkan (instead they use D3D11 or D3D12), while running those same D3D11/D3D12 games on Linux via Proton always goes through the Vulkan driver. So on Linux there may be more 'evolutionary pressure' to get high quality Vulkan drivers indirectly via D3D11/D3D12 games that run via Proton.

jorvi · 2026-02-10T20:04:52 1770753892

You might be unaware of this, but Vulkan Video Decode is slowly but surely replacing the disparate bespoke video decode acceleration on almost all platforms.

Vulkan is mature. It has been used in production since 2013 (!) in the form of Mantle. I have no idea why all the Vulkan doomsayers here think it still needs a half-to-whole decade to be 'useful'.

reactordev · 2026-02-10T19:41:02 1770752462

>”hardly any Windows games running on top of Vulkan”

I run all my windows games on Vulkan.

https://www.pcgamingwiki.com/wiki/List_of_Vulkan_games

flohofwoe · 2026-02-10T21:29:10 1770758950

280 games over 10 years really isn't impressive (2.5x less than even D3D8 which was an unpopular 'inbetween' D3D version and only relevant for about 2 years). D3D12 (890 games) isn't great either when compared to D3D11 (4.6k) or D3D9 (3.3k), it really demonstrates what a massive failure the modern 3D APIs are for real-world usage :/

I don't think those lists are complete, but they seem to show the right relative amount of 3D API usage across PC games.

reactordev · 2026-02-10T21:39:04 1770759544

I’m just pointing out that Vulkan is supported on all major modern engines, internal and public. Some also go so far as to do DX12 (fine, it’s a similar feeling API) but what’s really amazing is taking all of those games that run on OpenGL, DirectX, etc and forcing them to run on Vulkan…

Proton is amazing and Wine project deserves your support.

pjmlp · 2026-02-11T09:36:41 1770802601

What deserves our support is pushing for native Linux games, not helping to keep Windows status quo by making use of Wine and Proton.

It is no different from being happy to use Amiga, Atari, Nintendo, PlayStation, Spectrum, C64, NeoGeo emulators on Linux.

imtringued · 2026-02-11T10:17:04 1770805024

Video games are entertainment. In the old days you inserted a cartridge or optical disc into a physical device. You play the game, finish it and then move on. They are always self contained experiences with a custom UI independent of the OS.

In the best case, explicit Linux support does not affect the experience in a positive or negative way. In the worst case, explicit Linux support means the game can't be played anymore.

pjmlp · 2026-02-11T10:23:48 1770805428

> They are always self contained experiences with a custom UI independent of the OS.

Not really, as each OS and hardware provided different capabilities that made some games only possible in specific platforms.

Additionally depending on the platform, some ports were great, others were money thrown into the garbage bin.

jorvi · 2026-02-11T19:57:49 1770839869

Doing it this way actually makes games more stable on Linux. Often, Linux ports of games would be riddled with bugs because the QA just isn't worth it. Especially because desktop Linux is always in a fast flux of changes. Hence the joke that "Win32 is the only stable Linux ABI."

Now game studios can just develop for windows, work out all the bugs. And then Proton has a broad set of compatibility patches that can be applied to those Windows games.

Doing it this way also unlocks a gigantic library of old games that otherwise would have been unplayable on Linux.

So, no. No native Linux games please.

pjmlp · 2026-02-12T06:59:39 1770879579

Thus keeping Windows the main platform for game studios and gamers.

Enjoy those 3% market share.

jplusequalt · 2026-02-10T20:22:08 1770754928

>Like, these days game devs just use Unreal Engine

This is not true in the slightest. There are loads of custom 3D engines across many many companies/hobbyists. Vulkan has been out for a decade now, there are likely Vulkan backends in many (if not most) of them.

m-schuetz · 2026-02-10T17:37:47 1770745067

Many people need something in-between heavy frameworks and engines or oppinionated wrappers with questionable support on top of Vulkan; and Vulkan itself. OpenGL served that purpose perfectly, but it's unfortunately abandoned.

quantummagic · 2026-02-10T19:47:15 1770752835

Isn't that what the Zink, ANGLE, or GLOVE projects meant to provide? Allow you to program in OpenGL, which is then automatically translated to Vulkan for you.

reactordev · 2026-02-10T20:04:53 1770753893

Those are mostly designed for back porting and not new projects. OpenGL is dead for new projects.

m-schuetz · 2026-02-10T23:02:19 1770764539

I do all my new projects in OpenGL and Cuda since I'm not smart enough for Vulkan.

fc417fc802 · 2026-02-10T22:46:20 1770763580

> OpenGL is dead for new projects.

Says who? Why?

It looks long term stable to me so I don't see the issue.

reactordev · 2026-02-10T22:56:19 1770764179

DirectX 9 is long term stable so I don't see the issue...

No current gen console supports it. Mac is stuck on OpenGL 4.1 (you can't even compile anything OpenGL on a Mac without hacks). Devices like Android run Vulkan more and more and are sunsetting OpenGLES. No, OpenGL is dead. Vulkan/Metal/NVN/DX12/WebGPU are the current.

pjmlp · 2026-02-11T09:43:57 1770803037

Nintendo Switch supports OpenGL, across all its models.

https://www.khronos.org/conformance/adopters/conformant-prod...

reactordev · 2026-02-11T12:36:07 1770813367

It’s also almost 10 years old and not a current gen system.

pjmlp · 2026-02-11T13:03:35 1770815015

Owners of Switch 2 beg to differ.

And secondly that doesn't matter when we look at the amount of sold units.

Valve can only dream to sell a quarter of those numbers.

reactordev · 2026-02-13T18:57:00 1771009020

Valve owns the market, it doesn't have to...

Also, if we're talking Switch 2, you have Vulkan support on it so odds are you would choose 2x-10x performance gains over OpenGL. It isn't that powerful. My 6 year old iPhone is on par.

jorvi · 2026-02-14T16:51:51 1771087911

People generally don't realize how much more CPU efficient calls are with Vulkan, DX12 or Metal. And especially on handhelds, the SoC is always balancing its power envelope between the CPU and GPU so the more efficient you are with the CPU, the more power you can dedicate to the GPU.

fc417fc802 · 2026-02-10T23:09:26 1770764966

The aforementioned abstraction layers exist. You had dismissed those as only suitable for backporting. Can you justify that? What exactly is wrong with using a long term stable API whether via the native driver or an abstraction layer?

Edit: By the same logic you could argue that C89 is dead for new projects but that's obviously not true. C89 is eternal and so is OpenGL now that we've got decent hardware independent implementations.

jplusequalt · 2026-02-10T20:23:16 1770754996

Wasn't it announced last year that it was getting a new mesh shader extension?

m-schuetz · 2026-02-10T19:58:48 1770753528

I don't see the point of those when I can just directly use OpenGL. Any translation layer typically comes with limitations or issues. Also, I'm not that glued to OpenGL, I do think it's a terrible API, but there just isn't anything better yet. I wanted Vulkan to be something better, but I'm not going to use an API with entirely pointless complexity with zero performance benefits for my use cases.

pjmlp · 2026-02-11T09:42:02 1770802922

That is the approach Google is taking, by making Vulkan the only 3D API on Android.

See Android ANGLE on Vulkan roadmap,

https://developer.android.com/games/develop/vulkan/overview#...

reactordev · 2026-02-10T19:37:57 1770752277

No.

There are literally dozens of in-house engines that run on Vulkan. Not everything is Unreal or Unity.

sho_hn · 2026-02-10T15:38:41 1770737921

Are there any good Vulkan tutorials that are continuously updated to reflect these advancement and ease of use improvements?

It's a similar challenge to the many different historical strata of C++ resources.

jsheard · 2026-02-10T15:49:46 1770738586

https://howtovulkan.com is a recent one which targets the modern flavour of Vulkan that everything supports today.

Well, all desktop hardware and drivers at least. God help you if you want to ship on Android.

dismalaf · 2026-02-10T16:20:05 1770740405

The one on Vulkan.org recently got updated to use dynamic rendering and a bunch of the newest features (plus modern C++, Slang instead of glsl, etc...).

https://docs.vulkan.org/tutorial/latest/00_Introduction.html

positron26 · 2026-02-10T16:02:32 1770739352

Finding the optimal sub-language is about API coupling with client code, making a moving sweet spot for where bread & butter techniques live.

exDM69 · 2026-01-26T13:56:28 1769435788

The GPS jamming maps are based on commercial air traffic flying in the area.

While that gives some ideas of how widespread the jamming is, it won't give accurate information about the range (air traffic avoids areas with jamming) of the interference or any information from places where there is no commercial air traffic (war zones, etc).

exDM69 · 2026-01-26T13:34:17 1769434457

Starlink was also blocked by radio frequency interference.

Granted that can't possibly cover the entire area of the country.

exDM69 · 2026-01-11T09:23:38 1768123418

Using printf in shaders is awesome, it makes a huge difference when writing and debugging shaders. Vulkan and GLSL (and Slang) have a usable printf out of the box, but HLSL and D3D do not.

Afaik the way it works in Vulkan is that all the string formatting is actually done on the CPU. The GPU writes only writes the data to buffers with structs based on the format string.

All the shader prints are captured by tools such as Renderdoc, so you can easily find the vertex or pixel that printed something and then replay the shader execution in a debugger.

I only wish that we would've had this 20 years ago, it would have saved me so much time, effort and frustration.

pjmlp · 2026-01-11T19:49:11 1768160951

Maybe because Pix is quite good?

exDM69 · 2026-01-12T10:04:42 1768212282

Using printf and debuggers are complementary.

Finding which pixel to debug, or just dumping some info from the pixel under mouse cursor (for example) is better done with a simple printf. Then you can pick up the offending pixel/vertex/mesh/compute in the debugger if you still need it.

You get both, a debugger and printf related tooling in Renderdoc and it's better than either of those alone.

I've been writing a lot of GPU code over the past few years (and the few decades before it) and shader printf has been a huge productivity booster.

exDM69 · 2026-01-08T12:13:13 1767874393

Maybe this would be a suitable application for "Fibonacci hashing" [0][1], which is a trick to assign a hash table bucket from a hash value. Instead of just taking the modulo with the hash table size, it first multiplies the hash with a constant value 2^64/phi where phi is the golden ratio, and then takes the modulo.

There may be better constants than 2^64/phi, perhaps some large prime number with roughly equal number of one and zero bits could also work.

This will prevent bucket collisions on hash table resizing that may lead to "accidentally quadratic" behavior [2], while not requiring rehashing with a different salt.

I didn't do detailed analysis on whether it helps on hash table merging too, but I think it would.

[0] https://probablydance.com/2018/06/16/fibonacci-hashing-the-o... [1] https://news.ycombinator.com/item?id=43677122 [2] https://accidentallyquadratic.tumblr.com/post/153545455987/r...

SkiFire13 · 2026-01-08T12:47:17 1767876437

> This will prevent bucket collisions on hash table resizing

Fibonacci hashing is really adding another multiplicative hashing step followed by dropping the bottom bits using a shift operation instead of the top bits using an and operation. Since it still works by dropping bits, items that were near before the resize will still be near after the resize and it won't really change anything.

attractivechaos · 2026-01-08T14:17:41 1767881861

Exactly. And khashl uses Fibonacci hashing. Without salting, it has the same problem.