
Stackful coroutines would definitely benefit from being built into the language, as you can get a significantly better ABI than you can with a pure library-based solution. You can sorta-kinda make it work with GCC extended inline assembly[1], but it is quite fragile as you need to handle exceptions, unwind info, red zones, etc.

Also you need compiler support to correctly handle thread_local.

[1] https://github.com/gpderetta/delimited/blob/master/delimited...



You can do somewhat better than that with clang.

__attribute__((naked)) on a function which has a single asm block as its implementation gives you control over argument passing and changing the stack pointer.

__attribute__((preserve_none)) on the same function spills most live registers to the stack in the caller. The coroutine switch doesn't need to do as many push/pop, which makes it a bit more readable, but mainly this means you don't spill dead registers. That's the big thing you need compiler support for.

I believe the x64 red zone is a non-issue here as you've called the switch function, rather than trying to call from within inline asm (which does need to be careful about that). The magic globals are a problem though (floating-point control state, maybe signal mask, errno et al.), so I guess don't use the magic globals from within fibres.

"thread_local" doesn't map very sensibly onto fibres. There have been compiler bugs in that area too. Storing some information at the start of the fibre stack works fine though, you just don't get syntactic support for allocating / dereferencing from it.
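To make the "store it at the start of the fibre stack" idea concrete, here is a minimal sketch. It assumes POSIX ucontext (as in glibc) as a stand-in for a real fibre switch, a downward-growing stack, and a stack block whose size is a power of two and equal to its alignment. The names fibre_local, fibre_self and run_fibre are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

/* Power of two; the block is also aligned to this size (assumption). */
#define FIBRE_STACK_SIZE ((size_t)64 * 1024)

/* Hypothetical per-fibre data, kept at the lowest address of the stack block. */
typedef struct fibre_local {
    int id;
    int hits;
} fibre_local;

/* Find the current fibre's data by rounding a stack address down to the
   base of the aligned block. Only meaningful when called on a fibre stack. */
static fibre_local *fibre_self(void) {
    char probe;
    uintptr_t base = (uintptr_t)&probe & ~((uintptr_t)FIBRE_STACK_SIZE - 1);
    return (fibre_local *)base;
}

/* Fibre entry point: no thread_local, just a lookup through the stack address. */
static void fibre_body(void) {
    fibre_self()->hits += fibre_self()->id;
}   /* returning resumes the context named in uc_link */

/* Runs one fibre with the given id and returns the hits value it left behind. */
static int run_fibre(int id) {
    void *block = aligned_alloc(FIBRE_STACK_SIZE, FIBRE_STACK_SIZE);
    if (!block) return -1;

    fibre_local *local = block;
    local->id = id;
    local->hits = 0;

    ucontext_t main_ctx, fibre_ctx;
    getcontext(&fibre_ctx);
    /* The usable stack starts just above the header. A stack overflow would
       run into the header, so a real implementation wants a guard page. */
    fibre_ctx.uc_stack.ss_sp = (char *)block + sizeof *local;
    fibre_ctx.uc_stack.ss_size = FIBRE_STACK_SIZE - sizeof *local;
    fibre_ctx.uc_link = &main_ctx;
    makecontext(&fibre_ctx, fibre_body, 0);
    swapcontext(&main_ctx, &fibre_ctx);

    int hits = local->hits;
    free(block);
    return hits;
}
```

Calling run_fibre(7) runs the body once on its own stack and hands back the value the fibre wrote into its header. The lookup costs a mask on the stack pointer, but as noted above there is no syntactic support: every access goes through fibre_self() by hand.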


Yes, preserve_none would be exactly what I want, except that I also want to avoid the call instruction in the final asm stream. As the call would not be paired with a ret, the call stack predictor will mispredict it on every context switch, while an indirect jmp has a much better chance of being predicted when two coroutines call each other in a tight loop (consider generators, for example).

Ideally I think that a ctx_t* __builtin_context_switch(ctx_t* to) would need to be provided by the compiler.
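For a feel of how code written against such a primitive might look, here is a rough sketch of a generator. The builtin does not exist, so context_switch below is a hypothetical stand-in built on POSIX swapcontext (glibc assumed), which is far slower than what is being asked for and saves the signal mask on every switch. The globals current and handoff, and the names squares and sum_of_squares, are illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

typedef struct ctx { ucontext_t uc; } ctx_t;

static ctx_t *current;  /* context that is running right now */
static ctx_t *handoff;  /* context that most recently switched away */

/* Stand-in for the proposed builtin: suspend the caller, resume `to`, and
   when resumed later return the context that switched back to us. */
static ctx_t *context_switch(ctx_t *to) {
    ctx_t *self = current;
    handoff = self;
    current = to;
    swapcontext(&self->uc, &to->uc);
    return handoff;  /* set by whoever resumed us */
}

static int gen_limit, gen_value, gen_done;

/* Generator body: yields 1, 4, 9, ... up to gen_limit squared. */
static void squares(void) {
    ctx_t *caller = handoff;  /* on first entry, whoever started us */
    for (int i = 1; i <= gen_limit; ++i) {
        gen_value = i * i;
        caller = context_switch(caller);  /* yield */
    }
    gen_done = 1;
    context_switch(caller);  /* final yield; never resumed after this */
}

/* Drives the generator and adds up everything it yields. */
static int sum_of_squares(int n) {
    enum { STACK_SIZE = 64 * 1024 };
    ctx_t main_ctx, gen_ctx;
    void *stack = malloc(STACK_SIZE);
    if (!stack) return -1;

    gen_limit = n;
    gen_done = 0;
    current = &main_ctx;
    getcontext(&gen_ctx.uc);
    gen_ctx.uc.uc_stack.ss_sp = stack;
    gen_ctx.uc.uc_stack.ss_size = STACK_SIZE;
    gen_ctx.uc.uc_link = &main_ctx.uc;
    makecontext(&gen_ctx.uc, squares, 0);

    int sum = 0;
    for (;;) {
        context_switch(&gen_ctx);  /* resume the generator */
        if (gen_done) break;
        sum += gen_value;
    }
    free(stack);
    return sum;
}
```

The consumer loop and the generator bounce between each other through the same switch call, which is the tight ping-pong pattern where the call/ret pairing matters for prediction.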

Re thread_local, I believe at least MSVC has (had?) a fiber-safe flag that would handle thread_locals correctly by not caching addresses across function calls.



