mbid's comments | Hacker News

How many systems are there that can't just spawn a thread for each task they have to work on concurrently? This has to be a system that is A) CPU- or memory-bound (since async doesn't make disk or network I/O faster) and B) must work on ~tens of thousands of tasks concurrently, i.e. can't just queue up tasks and work on only a small number concurrently. The only meaningful examples I can come up with are load balancers, embedded software, and perhaps something like browsers. But e.g. an application server implementing a REST API that needs to talk to a database anyway to answer each request doesn't really qualify, since the database connection and the work the database itself does are likely much more resource-intensive than the overhead of a thread.

I'm not sure this is the correct mental model of what async solves.

Async precisely improves disk/network I/O-bound applications because synchronous code has to waste a whole thread sitting around waiting for an I/O response (each with its own stack memory and scheduler overhead), and in something like an application server there will be many incoming requests doing so in parallel. Cancellation is also easier with async.

CPU-bound code would not benefit because the CPU is already busy, and async adds overhead.

See e.g. https://learn.microsoft.com/en-us/aspnet/web-forms/overview/... and https://learn.microsoft.com/en-us/aspnet/web-forms/overview/...


I have some test code that runs a comparison of Hyper pre-async (aka thread per request) vs async (via Tokio), and the pre-async version is able to process more requests per second in every scenario (I/O, CPU-heavy tasks, shared memory).

I'll publish my results shortly. I did these as baselines because I'm testing and finishing the User Managed Concurrency Groups (UMCG) proposal for the Linux kernel, which is an extension to provide faster kernel threads (which beat both of them).



Thank you for this! This is really helpful.

The UMCG implementation allows kernel thread context switches to happen in 150-200 microseconds, compared to the 1500-2000 microseconds for normal kernel thread context switches. My goal is to show that if UMCG could be merged into the Linux runtime, then it would be competitive with async Rust without the headache.


Did you mean nanoseconds instead of microseconds?

Yes - brain fart.

Async only exists because languages like Python and Javascript have global interpreter locks that don't play nice with threads.

Using async for languages like Rust or C++ is cargo cult by people who don't know what the hell they're doing.

[Caveat: there's a use case for async if you're doing embedded development where you don't have threads or call stacks at all.]


How many concurrent requests?

I'll have to check my work computer on Monday. It was an 8-CPU virtual machine on an M1 Mac. The UMCG and normal-thread versions were set to 1024 threads on the server; the Tokio version was 2 threads per core. Just from the top of my head: the I/O-bound requests topped out around 40k/second for the Tokio version, 60k/second for the normal Hyper version, and 80k/second for the UMCG Hyper version.

I'm pretty close to being done - I'm hoping to publish the entire GitHub repository with tests for the community to validate by next week.

UMCG is essentially an open-source version of Google Fibers, which is their internal extension to the Linux kernel for "lightweight" threads. It requires you to build a user-space scheduler, but that also allows you to create different types of schedulers. I can't remember which scheduler showed the results above, but I have at least 6 different UMCG schedulers I was testing.

So essentially you get the benefits of something like Tokio, where you can have different types of schedulers optimized for different use cases, but with the power of kernel threads, which means easy cancellation and easy programming (at least in Rust). It's still a Linux thread with an entire 8 MB(?) stack size, but from my testing it's far faster than what Tokio can provide, without the headache of async/await programming.


I read this argument ("async is for I/O-bound applications") often, but it makes no sense to me. If your app is I/O bound, how does reducing the work the (already idling!) CPU has to spend on context switching improve the performance of the system?

I/O bound might mean latency but not throughput, so you can up concurrency and add batching, both of which require more concurrent requests in flight to hit your real limit. I/O bound might also really mean contention for latches on the database, and different types of requests might hit different tables. Basically, I see people say they're I/O bound long before they're at the limit of a single disk, so obviously they are not I/O bound. Modern drives are absurdly fast. If everyone were really I/O bound, we'd need 1/1000 the hardware we needed 10-15 years ago.

It sounds like you're assuming both pieces are running on the same server, which may not be the case (and if you're bottlenecked on the database it probably shouldn't be, because you'd want to move that work off the struggling database server)

Assuming for the sake of argument that they are together, you're still saving stack memory for every thread that isn't created. In fact you could say it allows the CPU to be idle, by spending less time context switching. On top of that, async/await is a perfect fit for OS overlapped I/O mechanisms for similar reasons, namely not requiring a separate blocking thread for every pending I/O (see e.g. https://en.wikipedia.org/wiki/Overlapped_I/O, https://stackoverflow.com/a/5283082)


Right, I think the argument should be that transitioning from a synchronous to an asynchronous programming model can improve the performance of a previously CPU/memory-bound system so that it saturates the I/O interface.

If the system is CPU-bound doing useful work, that's not the case. Async shines when there are a lot of "tasks" that are not doing useful work, because they are waiting (e.g. on I/O). Waiting threads waste resources. That's what async greatly improves.

The simplest example is that you can easily be wasteful in your use of threads. If you just write blocking code, you will block the thread while waiting on I/O, and threads are a finite resource.

So avoiding that would mean a server can handle more traffic before running into limits based on thread count.


Inversion of thought pattern: Why is a thread such a waste that we can't have one per concurrent request? Make threads less wasteful instead. Go took things in this direction.

How do you suggest we just "make threads less wasteful"?

I mean, I suppose we could move the scheduling and tracking out of kernel mode and into user mode...

But then guess what we've just reinvented?


Colored functions. (Just thought I'd add that, because despite the rhetorical question, it took a while for me to come up with the answer).

Goroutines? Java virtual threads?

Pretty much anything that needs performance and has a lot of relatively light operations is not a candidate for spawning a thread. Context switching and the cost of threads is going to kill performance. A server spawning a thread per request for relatively lightweight request is going to be extremely slow. But sure, if every REST call results in a 10s database query then that's not your bottleneck. A query to a database can be very fast though (due to caches, indices, etc.) so it's not a given that just because you're talking to a database you can just spin up new threads and it'll be fine.

EDIT: Something else to consider is what if your REST call needs to make 5 queries. Do you serialize them? Now your latency can be worse. Do you launch a thread per query? Now you need to a) synchronize b) take 5x the thread cost. Async patterns or green threads or coroutines enable more efficient overlapping of operations and potentially better concurrency (though a server that handles lots of concurrent requests may already have "enough" concurrency anyway).
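As a rough sketch of that overlap (in Rust with Tokio, since that's what's being benchmarked elsewhere in the thread; the fetch_* functions are hypothetical stand-ins for the actual queries):

  // Run independent queries concurrently on one task instead of either
  // serializing them or paying for a thread per query.
  async fn fetch_user(id: u64) -> Result<String, std::io::Error> {
      Ok(format!("user-{id}")) // imagine a database round trip here
  }

  async fn fetch_orders(id: u64) -> Result<Vec<String>, std::io::Error> {
      Ok(vec![format!("order-for-{id}")])
  }

  async fn handle_request(id: u64) -> Result<(), std::io::Error> {
      // Both futures are awaited concurrently; neither ties up an OS
      // thread while it waits on I/O.
      let (user, orders) = tokio::try_join!(fetch_user(id), fetch_orders(id))?;
      println!("{user}: {orders:?}");
      Ok(())
  }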


Server applications don’t spawn threads per request, they use thread pools. The extra context switching due to threads waiting for I/O is negligible in practice for most applications. Asynchronous I/O becomes important when the number of simultaneous requests approaches the number of threads you can have on your system. Many applications don’t come close to that in practice.

There’s a benefit in being able to code the handling of a request in synchronous logic. A case has to be made for the particular application that it would cause performance or resource issues, before opting for asynchronous code that adds more complexity.


Thread pools are another variation on the theme. But if your threads block then your pool saturates and you can't process any more requests. So thread pools still need non-blocking operations to be efficient or you need more threads. If you have thread pools you also need a way of communicating with that pool. Maybe that exists in the framework and you don't worry about it as a developer. If you are managing a pool of threads then there's a fair amount of complexity to deal with.

I totally agree there are applications for which this is overkill and adds complexity. It's just a tool in the toolbox. Video games famously are just a single thread/main loop kind of application.


There’s also a really good operational benefit if you have limits like total RAM, database connections, etc. where being able to reason about resource usage is important. I’ve seen multiple async apps struggle with things like that because async makes it harder to reason about when resources are released.

Could you point out the issue here?

Why does async make it harder to reason about when resources are released?


Basically it’s the non-linear execution flow creating situations which are harder to reason about. Here’s an example I’m trying to help a Node team fix right now: something is blocking the main loop long enough that some of the API calls made in various places are timing out or getting auth errors, because the signature expires between when the request was prepared and when it is actually dispatched, since that gap is sporadically tens of seconds instead of milliseconds. Because it’s all async calls, there are hundreds of places which have to be checked, whereas if it were threaded this class of error either wouldn’t be possible or would be limited to the same thread or to an explicit synchronization primitive for something like a concurrency limit on the number of simultaneous HTTP requests to a given target. Also, the call stack and other context are unhelpful until you put effort into observability for everything, because you need to know what happened between hitting await and the exception deep in code which doesn’t share a call stack.

I still don't get it.

The execution flows of individual async tasks are still linear, much like individual threads are linear.

Scheduling (tasks by the async runtime vs. threads by the OS), however, results in a random execution order either way.

If there is a slow resource, both async tasks and threads will pile up, potentially increasing response times.

Whether async or threads, you can easily put a concurrency limit on resources using e.g. semaphores [1]:

- limit yourself to x connections (either wait or return an error)

- limit the resource to x concurrent usages (either wait until other users leave, or return an error)

Regarding blocking the main loop: with async and non-blocking operations, how would something block the main loop? And why would the main loop being blocked cause API calls to time out? Is it single-threaded?

[1]: https://docs.rs/tokio/latest/tokio/sync/struct.Semaphore.htm...
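To make the semaphore approach above concrete, here's a minimal Rust/Tokio sketch that limits a resource to 10 concurrent usages using the primitive from [1] (call_backend is a hypothetical stand-in for the guarded resource):

  use std::sync::Arc;
  use tokio::sync::Semaphore;

  async fn call_backend(i: u32) -> String {
      format!("response {i}") // imagine an HTTP call or query here
  }

  async fn run() {
      // At most 10 tasks may touch the backend at once; the rest wait
      // on the semaphore instead of piling onto the resource.
      let limit = Arc::new(Semaphore::new(10));
      let mut handles = Vec::new();
      for i in 0..100u32 {
          let limit = Arc::clone(&limit);
          handles.push(tokio::spawn(async move {
              let _permit = limit.acquire_owned().await.expect("semaphore closed");
              call_backend(i).await
          }));
      }
      for handle in handles {
          let _ = handle.await;
      }
  }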


> The execution flows of individual async tasks are still linear, much like individual threads are linear.

Think about what happens:

1. Request one hits an await in foo()

2. Runtime switches to request two in bar() until it awaits

3. Runtime switches to request three in baaz(), which blocks the loop for a while

4. Request one gets a socket timeout or expired API key

That error in #4 does not tell you anything about #2 or #3, and because execution spreads across everything in that process, you have to check everything. If it were a thread, you would either not have the problem at all, or it would show up clearly in request three, or you’d have a clear, informative failure on a synchronization primitive saying that #3 held a lock for too long.

That makes it harder to control when memory is allocated or released in garbage-collected languages, too, because you have to be very careful to trigger GC before doing something which can suspend execution for a while, or you’ll get odd patterns when a small but non-zero percentage of those async requests take longer than expected (e.g. "load image master, create derivative, send response" needs care to release the first two steps before the last, or you’ll get weird behavior when a slow client takes 5 minutes to finish transferring that response).

Arguably that’s something you want to do anyway but it dramatically undercuts the simplicity benefits of async code. I’m not saying that we should all give up async but there are definitely some pitfalls which many people stumble into.


Because async usually means you've stopped having "call stack" as a useful abstraction.

> Context switching

No such thing. In a preemptive multitasking OS (that's basically all of them today) you will get context switching regardless of what you do. Most modern OS's don't even give you the tools to mess with the scheduler at all; the scheduler knows best.


That's not accurate. Preemptive multitasking just means your thread will get preempted. Blocking still incurs additional context switching. The core your thread is running on isn't just going to sit idle while your thread blocks.

I agree: fork is fast, cheap and easy. If you're spawning something for significant work it tends to be in the noise.

Linux kernel uses 8k stacks (TBH, it's been a while), but there's also some copy-on-write overhead. Still, this is not the C10k problem...


Async does make NVMe I/O faster, because queueing multiple operations on the NVMe device itself is faster.

This is outside of my expertise, but wouldn't multiple threads each submitting a single operation in parallel have the same effect?

That is still “async” considering what gp wrote.

Because they wrote “thread per task” which I assume to mean something like “each os thread handles the work submitted by one user”.

This is beside the point, but something like io_uring is still significantly better than doing thread-pool NVMe I/O.


I think it's another case of the whole industry being driven by the needs of the very small number of systems that need to handle >10k concurrent requests.

Or biases inherited from deploying on single or dual core 32-bit systems from 20 years ago.

Honestly, it's a mostly obsolete approach. OS threads are fast. We have lots of cores. The cost of bouncing around on the same core and losing L1 cache locality is higher than the cost of firing up a new OS thread that could land on a new core.

The kernel scheduler gets tuned. Language specific async runtimes are unlikely to see so many eyeballs.


Also very useful for applications that have heavily animated graphical user interfaces, where you don't want I/O to interfere with UI animations on the UI thread (e.g. all Android apps, or any UI framework that has a message pump, which is all of them?).

I believe these ideas are much more mature and better explored for code gen, but similar techniques are also useful in the frontend of compilers, in the type checker. There's a blog post [1] by Niko Matsakis where he writes about adding equalities to Datalog so that Rust's trait solver can be encoded in Datalog. Instead of desugaring equality into a special binary predicate in normal Datalog as Niko suggests, it can also be implemented by keeping track of equality with union-find and then propagating equality through relations, eliminating now-duplicate rows recursively. The resulting system generalizes both Datalog and e-graphs, since the functionality axiom ("if f(x) = y and f(x) = z, then y = z") is a Datalog rule with equality if you phrase it in terms of the graph of functions.
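For a unary function f encoded via its graph relation f(x, y) (meaning f(x) = y), that functionality axiom is just a rule whose head is an equation rather than a new tuple; in informal Datalog-style notation, not any particular engine's syntax:

  y = z :- f(x, y), f(x, z).

Firing this rule merges y and z in the union-find, after which rows that have become equal are deduplicated.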

Systems implementing this are egglog [2] (related to egg mentioned in the article) and (self-plug, I'm the author) eqlog [3]. I've written about implementing Hindley-Milner type systems here: [4]. But I suspect that Datalog-based static analysis tools like CodeQL would also benefit from equalities/e-graphs.

[1] https://smallcultfollowing.com/babysteps/blog/2017/01/26/low...

[2] https://github.com/egraphs-good/egglog

[3] https://github.com/eqlog/eqlog

[4] https://www.mbid.me/posts/type-checking-with-eqlog-polymorph...


The post mentions the idea that querying a database D can be understood algebraically as enumerating all morphisms Q -> D, where Q is the "classifying" database of the query, i.e. a minimal database instance that admits a single "generic" result of the query. You can use this to give a neat formulation of Datalog evaluation. A Datalog rule then corresponds to a morphism P -> H, where P is the classifying database instance of the rule body and H is the classifying database instance for matches of both body and head. For example, for the transitivity rule

  edge(x, z) :- edge(x, y), edge(y, z).
you'd take for P the database instance containing two rows (a_1, a_2) and (a_2, a_3), and the database instance H contains additionally (a_1, a_3). Now saying that a database D satisfies this rule means that every morphism P -> D (i.e., every match of the premise of the rule) can be completed to a commuting diagram

  P --> D
  |    ^
  |   /
  ⌄  /
  H
where the additional map is the arrow H -> D, which corresponds to a match of both body and head.

This kind of phenomenon is known in category theory as a "lifting property", and there's rich theory around it. For example, you can show in great generality that there's always a "free" way to add data to a database D so that it satisfies the lifting property (the orthogonal reflection construction/the small object argument). Those are the theoretical underpinnings of the Datalog engine I'm sometimes working on [1], and there they allow you to prove that Datalog evaluation is also well-defined if you allow adjoining new elements during evaluation in a controlled way. I believe the author of this post is involved in the egglog project [2], which might have similar features as well.

[1] https://github.com/eqlog/eqlog [2] https://github.com/egraphs-good/egglog


Thank you @xlinux and @mbid. Very interesting and not something I knew much about before.

I had a look at eqlog and egglog, and if I'm understanding things correctly then one possible use case is type inference and optimization. In particular, I looked at the example in [1].

I'm thinking that this could be useful in the PRQL [2] compiler, in particular for: a) inference of type restrictions on input relations and resultant output relation types, b) optimization of resultant SQL queries.

Would you be able to comment on whether that's correct?

Any links to related examples, papers, or work would be appreciated. Thanks!

1: https://egglog-python.readthedocs.io/latest/tutorials/sklear...

2: https://www.prql-lang.org/


I actually started working on Eqlog because I wanted to use it to implement a type checker. You might want to skim the posts in my series on implementing a Hindley-Milner type system using Eqlog, starting here [1]. The meat is in posts 3 - 5.

The type checker of Eqlog is mostly implemented in Eqlog itself [2]. The general idea is that your parser populates a database with syntax nodes, which are represented as `...Node` types in the Eqlog program at [2], and then you propagate type information with Datalog/Eqlog evaluation. Afterwards, you check whether the database contains certain patterns that you want to rule out, e.g. a variable that doesn't have a type [3].

There are still some unsolved problems if you're interested in writing the whole type checker in Datalog. For example, variable lookup requires quadratic memory when implemented in Datalog. I mention this and a possible solution at [4]. However, Datalog as is can probably still be useful for some subtasks during type checking. For example, the Rust compiler uses Datalog in some parts of the type checker I believe. Reach out via e.g. github to mbid@ if you'd like to discuss in more detail.

Regarding optimization, you probably want to talk with somebody working with egglog; they have a dedicated Zulip [5]. I'd imagine that for PRQL you want to encode the algebraic rules of pipeline transformations, e.g. associativity of filter over append. Given the query AST, eqlog or egglog would give you all equivalent ways to write the query according to your rules. You'd then select the representation you estimate to be the most performant based on a score you assign to (sub)expressions.

[1] https://www.mbid.me/posts/type-checking-with-eqlog-parsing/

[2] https://github.com/eqlog/eqlog/blob/9efb4d3cb3d9b024d681401b...

[3] https://github.com/eqlog/eqlog/blob/9efb4d3cb3d9b024d681401b...

[4] https://www.mbid.me/posts/dependent-types-for-datalog/#morph...

[5] https://egraphs.zulipchat.com


Thank you. Will try to get into this on the weekend. I'll reach out once I can ask a more informed question.


Very interesting perspective I hadn't heard before on datalog, thanks. How far does it go - can you interpret extensions of datalog (say negation or constrained existentials) in a nice categorical way, for instance? I've given this very little thought but I imagine you'd have issues with uniqueness of these "minimal" database instances, and I'm not sure what that means for these lifting properties.

(if my question even makes sense, pardon the ignorance)


If you're interested in the details, you might want to have a look at papers [1] or [2].

You can add existentials in this framework, which basically means that the lifting problems mentioned above don't need to have unique solutions. But as you say, then the "minimal" databases aren't determined uniquely up to isomorphism anymore. So the result of Datalog evaluation now depends on the order in which you apply rules.

If I recall correctly, then [3] discusses a logic corresponding to accessible categories (Datalog + equality corresponds to locally presentable categories) which includes the theory of fields. The theory of fields involves the negation 0 != 1, so perhaps that might give you a nicer way to incorporate negations without stratification.

[1] https://www.mbid.me/eqlog-semantics/

[2] https://arxiv.org/abs/2205.02425

[3] Locally presentable and accessible categories, https://www.cambridge.org/core/books/locally-presentable-and...


Thanks for the references, those papers looks great! Will dig into them this evening =)


For me, the main problem with most tools that render to HTML was that they don't support all math typesetting libraries that latex supports. I used to work with category theory, where it's common to use the tikz-cd library to typeset commutative diagrams. tikz-cd is based on tikz, which is usually not supported for HTML output.

But apart from math typesetting, my latex documents were usually very simple: they just used sections, paragraphs, some theorem environments and references to those, perhaps similar to what the Stacks project uses [3]. Simple latex such as this corresponds relatively directly to HTML (except for the math formulas of course). But many latex-to-HTML tools try to implement a full TeX engine, which I believe means that they lower the high-level constructs to something more low-level (or that's at least my understanding). This results in very complicated HTML documents from even simple latex input documents.

So what would've been needed for me was a tool that can (1) render all math that pdflatex can render, but that apart from math only needs to (2) support a very limited set of other latex features. In a hacky way, (1) can be accomplished by simply using pdflatex to render each formula of a latex document in isolation to a separate PDF, then converting this PDF to SVG, and then including this SVG in the output HTML in the appropriate position. And (2) is simply a matter of parsing this limited subset of latex. I've prototyped a tool like that here [1]. An example output can be found here [2].
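A minimal sketch of that per-formula pipeline, shelling out from Rust (this is just an illustration of the idea, not how [1] is actually implemented; it assumes pdflatex and pdftocairo from poppler-utils are installed and that the .tex file is a bare filename in the current directory):

  use std::process::Command;

  fn formula_to_svg(tex_file: &str, out_svg: &str) -> std::io::Result<()> {
      // Render the standalone formula to a one-page PDF in the current directory.
      let status = Command::new("pdflatex")
          .args(["-interaction=nonstopmode", tex_file])
          .status()?;
      assert!(status.success(), "pdflatex failed");

      // Convert the PDF to SVG for inclusion in the HTML output.
      let pdf_file = tex_file.replace(".tex", ".pdf");
      let status = Command::new("pdftocairo")
          .args(["-svg", &pdf_file, out_svg])
          .status()?;
      assert!(status.success(), "pdftocairo failed");
      Ok(())
  }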

Of course, SVGs are not exactly great for accessibility. But my understanding is that many blind mathematicians are very good at reading latex source code, so perhaps an SVG with alt text set to the latex source for that image is already pretty good.

[1] https://github.com/mbid/latex-to-html

[2] https://www.mbid.me/lcc-model/

[3] https://stacks.math.columbia.edu/


Tangentially, for me the stacks project is the gold standard of mathematical typography on the web. Look at this beauty: https://stacks.math.columbia.edu/tag/074J

Also check the diagrams: https://stacks.math.columbia.edu/tag/001U

If anyone can explain to me, a complete noob regarding html, how they achieve this result with html, css and whichever latex engine they use, I would be grateful. I want to make a personal webpage in this style.


It's standard MathJaX that's rendered client-side. I managed to get MathJaX + XyPic rendered server-side on my website, which is a lot nicer.


Oh, you misunderstand the level of my question; rephrased, how do maek wabpag with "MathJaX that's rendered client-side"? (o´▽`o)


Take a look at MathJax's website: https://www.mathjax.org/#gettingstarted

They have a link to JSBin which contains an easy example html page.


Thanks!


uMatrix tells me there are 8 external sites to grant permissions for access to resources. Definitely not a "beauty".


I don't understand what this has to do with typography.


Have you seen Typst? I have moved over from LaTeX to Typst and most if not all of your use cases are covered.

https://typst.app/


Except the main theme, which was HTML export? https://github.com/typst/typst/issues/721

Though it's in the roadmap!


If you're going to send out math as SVGs anyway, you can also just use your normal latex to PDF renderer (e.g. pdflatex) on each formula, and then convert the output PDFs to SVGs. That way, you get the same output you'd get with latex, and you can also use latex packages that aren't supported by MathJax (e.g. tikz-cd). I've prototyped a latex to html converter [1] based on that approach, but it's probably not ready for serious use. Here's an example: https://www.mbid.me/lcc-model/

[1] https://github.com/mbid/latex-to-html


> using KaTeX [...] switched to server-side rendering with MathJax

I've been meaning to look into KaTeX. Could you elaborate on why you switched away from it? KaTeX appears to support server-side rendering already, in the sense that it generates plain HTML.


> Could you elaborate on why you switched away from it?

I started using KaTeX sometime after 2015 because it promised to be fast (the fastest! [1]). I had to change the representation of a bunch of expressions because KaTeX didn't support some environments, whilst MathJax did. It was a trade-off I was willing to accept at the time.

Many years later, I started writing a personal static-site generator. I wanted comparatively lightweight pages, so rendering server-side was an option. I re-evaluated MathJax vs KaTeX again and this time I leaned towards MathJax, as speed was no longer an issue for me. It looks like KaTeX has broader support now [2].

[1] https://katex.org

[2] https://katex.org/docs/support_table.html


MathJax has better MathML support.


The rename system call is not quite atomic. From https://linux.die.net/man/2/rename:

> However, when overwriting there will probably be a window in which both oldpath and newpath refer to the file being renamed.

A better way to think about rename is that it's given by two atomic operations in sequence: (1) An atomic link call to make the file available under the new name, and then (2) an atomic remove operation to delete the old name. Thus, it's possible to observe the state after (1) but before (2).
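A small Rust sketch of that decomposition (illustrative only; unlike rename(2), hard_link fails if newpath already exists, so this is not a drop-in replacement):

  use std::fs;

  fn rename_in_two_steps(old: &str, new: &str) -> std::io::Result<()> {
      fs::hard_link(old, new)?; // (1) the new name appears
      // At this point both `old` and `new` refer to the same file,
      // which is the window the man page describes.
      fs::remove_file(old)?;    // (2) the old name disappears
      Ok(())
  }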


Dan Luu also says rename is not atomic, at the bottom of this post: https://danluu.com/file-consistency/

I've wondered ever since I read that what the first bullet there means. It can't be what you're saying here, what you're saying would apply to racing processes as well as crashes?

So then.. what does he mean when he says rename isn't atomic if there's a crash? What's the failure condition / in-between state that can occur if there's a crash?


>what does he mean when he says rename isn't atomic if there's a crash?

Not sure. One of the papers he cites [1] has this to say about rename atomicity:

>Directory operations such as rename() and link() are seemingly atomic on all file systems that use techniques like journaling or copy-on-write for consistency

So it seems to depend on the file system. But the authors claim that not just link but also rename is atomic without qualification, which appears to be false, so I'm not sure how trustworthy this is.

There's a stackoverflow comment [2] that suggests that ext4 guarantees the kind of sequential atomicity property (atomic link followed by atomic remove) I mentioned:

>Kernel implementation depends on filesystem but here's implementation of Linux ext4 filesystem: elixir.bootlin.com/linux/v5.19.3/source/fs/ext4/namei.c#L3887 – it seems to first mark the inode as dirty, then create new link and after that delete the old link. If the kernel crashes in the middle, the end result could be two links and dirty mark. I would guess (but didn't investigate if this true) that journal recovery during the next mount would fix the end result to match atomic behavior of leaving either the old or new state depending on exact crash location.

In general, I find it rather weird how little authoritative documentation there is about whether or not this property holds given how important it is.

[1] https://www.usenix.org/system/files/conference/osdi14/osdi14...

[2] https://stackoverflow.com/questions/7054844/is-rename-atomic...


> I find it rather weird how little authoritative documentation there is about whether or not this property holds given how important it is.

Well, it can't be that important then, can it? :)


Usually the atomicity I care about is with regard to newpath: I either observe no file/previous file at newpath, or I observe the new file at newpath. This is in opposition to a copy for example, where I might observe a partially-written file.
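That newpath guarantee is what the usual write-temp-then-rename pattern relies on; a minimal Rust sketch (the temp file name is just for illustration):

  use std::fs;
  use std::io::Write;

  // Replace `path` so that readers only ever see the old contents or the
  // complete new contents, never a partially written file.
  fn atomic_write(path: &str, data: &[u8]) -> std::io::Result<()> {
      let tmp = format!("{path}.tmp");
      let mut file = fs::File::create(&tmp)?;
      file.write_all(data)?;
      file.sync_all()?;        // flush the contents before publishing them
      fs::rename(&tmp, path)?; // atomic with respect to newpath
      Ok(())
  }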

This is a great point though, when taking oldpath into account the operation is not atomic. TIL


The type of atomicity I've always seen it used for is: two processes try to rename a file at the same time, and only one will succeed; the winner does what it planned and then renames the file back.

I guess you shouldn't ignore the error when both processes use the same destination name, which is why folks generally tend to put their PID in the lock file name.


Umm, two concurrent rename(2) syscalls to the same destination path will likely both succeed. One of them will win (not be overwritten), but both should succeed.

If you want atomic creation of a path, where no one else can succeed also, you want link(2) not rename(2).
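A sketch of that link(2)-based exclusive creation in Rust, tying in the PID-in-the-filename idea from above (paths are made up for the example):

  use std::fs;
  use std::process;

  // Create a uniquely named file, then try to give it the shared lock name.
  // hard_link fails with AlreadyExists if someone else holds the lock.
  fn try_lock(dir: &str) -> std::io::Result<bool> {
      let unique = format!("{dir}/lock.{}", process::id());
      fs::File::create(&unique)?;
      let acquired = match fs::hard_link(&unique, format!("{dir}/lock")) {
          Ok(()) => true,
          Err(e) if e.kind() == std::io::ErrorKind::AlreadyExists => false,
          Err(e) => return Err(e),
      };
      fs::remove_file(&unique)?; // the unique name is no longer needed
      Ok(acquired)
  }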


This is also the natural solution when you're using Datalog to compute with the AST: Datalog operates on relations/tables, so associating types to AST nodes can be accomplished using a table with two columns: one for the ID representing an AST node, and one for the type associated to the AST node. I've recently written about this approach in the series starting here: https://www.mbid.me/posts/type-checking-with-eqlog-parsing/
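A rough sketch of what such rules can look like, in the same informal Datalog notation as the transitivity example above (the predicate names here are made up, not eqlog's actual schema):

  expr_type(e, t)    :- var_expr(e, v), binding_type(v, t).
  expr_type(e, bool) :- and_expr(e, lhs, rhs),
                        expr_type(lhs, bool), expr_type(rhs, bool).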


>Pompom is an attractive implementation of an extensional (!) dependently typed language

>Pompom provides [...] a strong normalization system

How is this possible? Extensional dependent type theory has undecidable term/type equality, so there cannot be a strong normalization algorithm.


Oh, it is extensional in the sense of supporting the K axiom, actually (not in the sense of propositional identity implying definitional equality) :).


As far as I know, Pompom has decidable type checking though.


Just to add to the confusion: monoidal categories are the appropriate notion of "monoid object" in the category of categories; the unit and associativity laws are replaced by coherent natural isomorphisms (see "categorification"). So that's where the "monoidal" comes from.

