Yeah. I've been doing this for almost 10 years now. It's not APE/Cosmopolitan (which also "kinda works" with Nim but has many lowest-common-denominator platform support issues, e.g. posix_fallocate). However, it does let you have very portable cross-Linux binaries. Maybe beyond Linux.
Besides `ar`, as a sibling observed, you might also be thinking of pixz - https://github.com/vasi/pixz - but really any archive format (cpio, etc.) can, in principle, just put a stake in the ground and have its last file be some kind of binary index/file directory, the way Zip does. Or it could hog a special name like .__META_INF__ instead.
One aspect of the question is that "permissions" are mostly regulated at the time of open(), and user code should check for failures. This was a driving inspiration for the tiny, 27-line C virtual machine in https://github.com/c-blake/batch that allows you to, e.g., synthesize a single call that mmaps a whole file https://github.com/c-blake/batch/blob/64a35b4b35efa8c52afb64... which seems like it would also have helped the article author.
This description matches my own experience. E.g., I recall having to use my own macro-based syscall() things when the inotify system was first introduced, because glibc did not have support for years, and then it was years more before slow-moving Linux distros picked up the new glibc version.
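A sketch of the kind of workaround that era required (purely illustrative, not the exact macros from back then; the hard-coded number below is the x86-64 inotify_init syscall and other architectures differ):

    /* Sketch: calling inotify_init before glibc grew a wrapper,
       by invoking the raw syscall number directly. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    #ifndef __NR_inotify_init
    #  define __NR_inotify_init 253   /* x86-64 only; illustrative */
    #endif

    int main(void) {
        long fd = syscall(__NR_inotify_init);
        if (fd < 0) { perror("inotify_init"); return 1; }
        printf("inotify fd = %ld\n", fd);
        return 0;
    }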
Unsaid was that much of this project separation comes from glibc being born as (and probably still being) a "portable libc with extra GNU-ish features", not a Linux-specific thing.
Honestly, some of this pain might have been avoided had the Bell Labs guys made two libraries - the syscall interface part of `libc`, called say `libos`, and the more articulated language run-time (strings, buffered IO, etc.) the actual `libc`. Then the kernel could "easily" ship with libos and libc's could vary. To even realize this might be helpful someday likely required foresight beyond reason in the mid-1970s. Then, afterwards, Makefiles and other build system stuff probably wanted to stay with "-lc" in various places, and then glibc/others wanted to support that, and so it goes. Integration can be hard to undo.
While I was always a source-based/personalized-distribution personality type, this is also a big part of why I moved to Gentoo in early 2004 (for amd64, not RISC-V / other embedded per your example). While the Pentium 4's very deep pipelines and compiler-flag sensitivities (and the name itself, after the fastest penguin) drove the for-speed perception of the compile-just-for-my-system style, it really plays well to all customization/configuration hacker mindsets.
That is a fantastic historical parallel. The early amd64 days were arguably Gentoo's killer app moment. While the binary distributions were wrestling with the logistical nightmare of splitting repositories and figuring out the /lib64 vs /lib standard, Gentoo users just changed their CHOST, bootstrapped and were running 64-bit native. You nailed the psychology of it, too. The speed marketing was always a bit of a red herring. The ability to say "I do not want LDAP support in my mail client" and have the package manager actually respect that is cool. It respects the user's intelligence rather than abstracting it away.
Since you've been on the ride since '04, I'm curious to hear your thoughts. How do you feel the maintenance burden compares today versus the GCC 3.x era? With the modern binhost fallback and the improvements in portage, I feel like we now spend less time fighting rebuild loops than back then? But I wonder if long-time users feel the same.
> The ability to say "I do not want LDAP support in my mail client" and have the package manager actually respect that is cool.
I tried Gentoo around the time that OP started using it, and I also really liked that aspect of it. Most package managers really struggle with this, and when there is configuration, the default is usually "all features enabled". So, when you want to install, say, ffmpeg on Debian, it pulls in a tree of over 250 (!!) dependency packages. Even if you just wanted to use it once to convert a .mp4 container into .mkv.
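(For the curious, one quick way to count this on a Debian box that does not already have ffmpeg installed - a sketch; the exact number varies by release:)

    $ apt-get install --simulate ffmpeg | grep -c '^Inst'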
I also liked the idea when I used Gentoo 15 years ago, but you quickly realise it doesn't make much sense.
You are trading off having a system able to handle everything you will throw at it, and having the same binaries as everyone else, for, well, basically nothing. You have a supposedly smaller exploitable surface, but you have to trust that the Gentoo patches cutting these things out don't introduce new vulnerabilities and don't inadvertently shut off hardening features. You have slightly smaller packages, but I'm hard-pressed to think of a scenario where it would matter in 2026.
To me, the worse debuggability and the inability to properly communicate with the source project make it a bad idea. I find Arch's pledge to only ship strictly vanilla software much more sensible.
> Since you've been on the ride since '04, I'm curious to hear your thoughts. How do you feel the maintenance burden compares today versus the GCC 3.x era? With the modern binhost fallback and the improvements in portage, I feel like we now spend less time fighting rebuild loops than back then? But I wonder if long-time users feel the same.
I'm another one on it since the same era :)
In general, stable has become _really_ stable, and unstable is still mostly usable without major hiccups. My maintenance burden nowadays is limited compared to 10 years ago - pretty much running `emerge -uDN @world --quiet --keep-going` and fixing issues, if any. Maybe once a month I get package failures, but I run an llvm+libcxx system and also enable package tests, so I likely get more issues than the average user on GCC.
For me these days it's not about the speed anymore, of course, but really the customization options and the ability to build pretty much anything I need locally. I also really like the fact that ebuilds are basically bash scripts, and if I need to further customize or reproduce something I can literally copy-paste commands from the package manager and run them in a local folder.
The project has successfully implemented a lot of by-default optimizations and best practices, and in general I feel the codebases for system packages have matured to the point where it's odd to run into internal compiler errors, weird dependency issues, whole-world rebuilds, etc. From my point of view it also helped a lot that many compilers began enforcing more modern and stricter C/C++ standards over time, and at the same time we got GitHub, CI workflows, better testing tools, etc.
I run `emerge -e1 @world` maybe once a year just to shake out stuff lurking in the shadows (like stuff compiled with clang 19 vs clang 21), but it's really not normally needed anymore. The configuration stays pretty much untouched unless I want to enable a new USE flag for a new package I'm installing.
I am replying here as a kind of "better place to attach".
Anyway, to answer grandparent, I basically never had rebuild loops in 19 years... just emerge -uU world every day or sometimes every week. I have been running the same base system since... let's see:
I have never once had to rebuild the whole system from scratch in those 19 years. (I've just rsync'd the rootfs from machine to machine as I upgraded hardware and rebuilt gradually, because, as many others here have said, for me it wasn't about "perf of everything" or some kind of reproducible system, but "more customization + perf of some things".) The upgrade from monolithic X11 to split X11 was "fun", though. /s
I do engage in all sorts of package.mask, per-package USE, and many global USE flags. I have my own portage/local overlay for things where I disagree with upstream. I even have an automated system to "patch" my disagreements in. E.g., I control how fast I upgrade my LLVM junk, so I do it on my own timeline. Mostly I use gcc. I control that, too. Any really slow individual build, basically.
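For anyone unfamiliar with what that looks like in practice, a rough sketch (the package names, versions, and flags below are purely illustrative, not my actual config):

    # /etc/portage/package.use -- per-package USE flags (illustrative)
    net-misc/curl        -ldap
    media-video/ffmpeg   -vaapi x265

    # /etc/portage/package.mask -- hold back upgrades to my own timeline
    >=sys-devel/llvm-19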
If, over the decades, they ever did anything that made it look like crazy amounts of rebuilds would happen, I'd tend to wait a few days or a week or so and then figure something out. If some new dependency brings in a mountain of crap, I usually figure out how to block that.
gcc 3.3 to 3.4 was a big thing, and could cause some issues if people didn't follow the upgrade procedures, and also many C++ codebases would need minor adjustments; this has been much, much less of a problem since.
Additionally, Gentoo has become way more strict with USE flag dependencies, and it also checks whether binaries depend on old libs and doesn't remove those libs when updating a package, such that the "app depends on old libstdc++" problem doesn't happen anymore. It then automatically removes the old libs once nothing needs them anymore.
I have been running Gentoo since before '04, continuously, and things pretty much just work. I would be willing to put money on spending less time "managing my OS" than most who run other systems such as macOS, Windows, Debian, etc. Sure, my CPU gets to compile a lot, but that's about it.
And yes, the "--omg-optimize" was never really the selling point, but rather the USE flags, where there's complete control. Pretty much nothing else comes close, and it is why Gentoo is awesome.
To be fair, it was not that difficult to create a pure 64-bit binary distro, and there were a few of them. The real issue was figuring out how to do mixed 32/64-bit, and this is where the fight about /lib directories originated. In a pure 64-bit distro, the only way to run 32-bit binaries was to create a chroot with a full 32-bit installation. It took a while before better solutions were agreed to. This was the era of Flash and Acrobat Reader - all proprietary and all 32-bit only - so people really cared about 32-bit support.
I'd say "the fastest" is a side effect of "allowing one to tune their systems to their utmost liking": -march=native, throw away unused bits and pieces, build modules into the kernel, replace components with faster -- if more limited -- alternatives. And so on.
Popularity is one thing { and probably more people use it than a non-existent new PLang you are only at the design phase of :-) }, but I think you misunderstood the backend idea. Nim has a JavaScript backend via `nim js`, and on that backend you can use `emit` to do inline JavaScript, just as on the C/C++ backends you can use emit to put out C/C++. So, if you did do a Zig backend, being able to emit inline Zig would be part of that. It may still not be what you want, of course.
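E.g., a tiny sketch of the existing combination (build with `nim js file.nim`; the string in the emit pragma is passed through verbatim to the generated JavaScript):

    # Sketch: inline JavaScript via the emit pragma on Nim's JS backend.
    proc jsHello() =
      {.emit: """console.log("hello from inline JS");""".}

    jsHello()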
I see from `test/test_suite/compile_time_introspection/paramsof.c3t` that there is a way to get names & types of function parameters [1]. The language also seems to support default values { e.g. `int foo(int a, int b = 2) {...}` } and even calling with keyword arguments/named parameters [2], but I couldn't find any `defaultof` or similar thing in the code. Does anyone know if this is just an oversight / temporary omission?
I don't think it is available, no, and it's the first time I've heard of such an idea. Thinking on it, this would allow such cursed code (love that :D). I'll put it up for discussion in the Discord, as I'm interested in hearing whether `.defaultof` is a good idea or not.
One application of such a feature would be something like a "cligen.c3" (like the Nim https://github.com/c-blake/cligen or its /python/cg.py port, etc.). Mostly, though, it just seems like a more complete signature extraction. Any other kind of documentation system might also benefit.
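For the unfamiliar, the core of the idea in Nim terms is to derive the whole CLI, including option names and default values, from a proc signature (a sketch; assumes `nimble install cligen`):

    import cligen   # sketch only; install with `nimble install cligen`

    proc greet(name = "world", times = 1) =
      ## Print a greeting `times` times.
      for _ in 1 .. times:
        echo "hello, ", name

    when isMainModule:
      dispatch(greet)   # options & defaults are derived from the signature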
GraalVM has literally 500x the overhead of a statically linked dash script.
Maybe not an issue for terminal UIs, but the article mentions both TUIs and CLI tools. A lot of people use CLI tools from a shell. As soon as you do `for file in *.c; do tool "$file"; done` (as a simple example), pure overhead of even tens of milliseconds becomes noticeable. This is not theoretical. I recently had this trouble with python3, but I didn't want to rewrite all my f-strings into python2. So, it does arise in practice. (At least in the practice of some.)
I hadn't checked GraalVM in a long time. So, I got graalvm-jdk-25.0.1+8.1 for x86_64. It's a lot faster than Julia, and maybe 43 ms is not slow in "human terms", but it's still pretty slow compared to some of the other competition. This was for a helloworld.jar [1]. On my laptop (i7-1370P, p-cores) using tim [2]:
That is just a normal JVM with optional Graal components that are not used unless enabled. The default memory allocation is based on a percentage of available memory and is uncommitted (meaning it's available to other programs). When people mention Graal they mean an AOT-compiled executable that can be run without a JVM installed. Sometimes they may refer to the Graal JIT as a replacement for C1/C2, available also in VM mode. You are using a plain HotSpot VM in server mode, as the optimized client mode was removed when desktop use-cases were deprioritized (e.g. JWS discontinued).
You are correct and I apologize for the misimpression.
`native-image -jar helloworld.jar helloworld` did take a whopping 17 seconds to compile what might be the smallest possible project. That does make me worry about iterating to get better perf in a context where startup overhead matters, BUT the executable it produced did run much faster - only about 1.8x slower than `tcc -run`:
Perl has 2 more shared libraries for ld.so to link, but is somehow faster. So, there may still be some room for improvement, but anyway, thank you for the correction.
(Also, I included 4 of the faster comparative programs to show additionally that the error bars are vaguely credible. In truth, on time shared OSes, the distributions have heavier tails than Gaussian and so a single +- is inadequate.)
--
EDIT: So, the ld.so/dynamic-linking overhead was bothering me. I had to get a musl + zlib build environment going, but after a few minutes I did, and then found this result with a fully statically linked binary executable:
398.2 +- 4.2 μs ./helloworld-sta>/n
(I should have noted earlier that /n -> /dev/null is just a convenience symlink I put on all my systems. Also, this is all on Linux 6.18.2 and the same CPU as before.)
Also, only around 4.2 MiB of RSS compared to ~1.8 MiB for dash & awk. So, 2.4x the space and only ~4x the time of static awk & dash. That might sound like criticism, but those are usually the efficiency champs. The binary size is kind of hefty (~2x, like the RAM use):
$ size *sta
text data bss dec hex filename
2856644 3486552 3184 6346380 60d68c helloworld-sta
So, I guess, in 2026, Java start-up overhead is pretty acceptable. I hope that these "real numbers" can maybe add some precision to the discussion. Someone saying "mere milliseconds" just does not mean as much to me as 400 +- 4 microseconds, and perhaps there are others like me.
Thanks for the corrected evaluation. Just for your awareness, the HotSpot team is working on offering a spectrum of AOT/JIT options under the Leyden project [1]. Currently one has to choose between a fully open-world (JIT) and a fully closed-world (AOT) evaluation. The mixed world allows for a tunable knob, e.g. pre-warmup by AOT while retaining dynamic class loading and fast compile times for the application. This will soften the hard edges so developers can choose the constraints that best fit their application's deployment/usage model.
Some might appreciate a concrete instance of this advice inline here. For `foo.nim`, you can just add a `foo.nim.cfg`:
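Something like this, perhaps (assuming the advice in question is a static, musl-linked release build; musl-gcc and the exact flags are illustrative, not a prescription):

    # foo.nim.cfg -- sketch only; adjust to taste
    --gcc.exe:"musl-gcc"
    --gcc.linkerexe:"musl-gcc"
    --passL:"-static"
    -d:release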
There is also a "NimScript" syntax you could use in a `foo.nims`:
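Again just a sketch, mirroring the .cfg above (`switch` is how NimScript sets compiler options):

    # foo.nims -- NimScript equivalent of the .cfg sketch above
    switch("gcc.exe", "musl-gcc")
    switch("gcc.linkerexe", "musl-gcc")
    switch("passL", "-static")
    switch("define", "release")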