aidanhs's comments | Hacker News

One annoying part of using chroots, if you're creating them on the fly, is teardown - you have to manually invoke umount, and take care to get this right even for partially created chroots (maybe you detected an error after mounting proc, while you were still getting other files in place).
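
To make the pain concrete, here's a minimal sketch (the paths and the exact set of mounts are just illustrative) - every mount that succeeded during setup has to be unwound by hand, even if setup failed partway through:

    mkdir -p /tmp/root/proc /tmp/root/dev
    mount -t proc proc /tmp/root/proc
    mount --bind /dev /tmp/root/dev    # suppose a later setup step fails...
    # ...cleanup still has to remember every mount that succeeded:
    umount /tmp/root/dev /tmp/root/proc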

This was my original motivation in creating machroot (mentioned elsewhere in this thread) and having it use namespaces.


As a number of comments have noted, there are a bunch of different axes that chroot could be 'better' on - e.g. security and sandboxing.

I wrote https://github.com/aidanhs/machroot (initially forked from bubblewrap) a while ago to lean into the pure "pretend I see another filesystem" aspect of chroot, with additional conveniences (so no security focus). For example, it allows setting up overlay filesystems and mounting squashfs filesystems with an overlay on top...and because it uses a mount namespace, you don't need to tear down the mount points - just exit the command and you're done.

The codebase is pretty small so I just tweaked it with whatever features I needed at the time, rather than trying to make it a fully-fledged tool.

(honestly you can probably replicate most of it with a shell script that invokes unshare and appropriate mount commands)
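
Something along these lines, as an untested sketch (the rootfs and scratch paths are hypothetical) - because the mounts live in a private mount namespace, they simply vanish when the command exits:

    mkdir -p /tmp/upper /tmp/work /mnt
    # unshare -m gives a private mount namespace: nothing to umount afterwards
    sudo unshare -m sh -c '
        mount -t overlay overlay \
            -o lowerdir=/path/to/rootfs,upperdir=/tmp/upper,workdir=/tmp/work /mnt
        mount -t proc proc /mnt/proc
        chroot /mnt /bin/sh
    '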


Out of curiosity, what would you say the current state of the art is for fully compilable decompilation? This is something I have a vague interest in but I'm not involved enough in the space to be on top of the latest and greatest tooling.


Echoing IDA, but its pricing is a huge PITA if you're using it in a hobbyist capacity, i.e. you don't have an employer willing to pay for it. You could opt for the home version, but that's a yearly cost and you have to use their cloud decompiler. Ghidra's your best bet if you want something FOSS and community-driven, although it's not as great at decompilation.


It's not just the pricing by itself - every story I've heard about normal people trying to give them money suggests they don't actually want to sell it to anyone other than big players.

That said, depending on one's needs, they do offer a slimmed-down IDA Free: https://hex-rays.com/ida-free

I actually use the AUR to more-or-less track its releases: https://aur.archlinux.org/packages/ida-free


Hex-Rays used to be difficult to deal with if you wanted to purchase IDA Pro for the first time, due to their software getting leaked online.

They have since eased the procedure to buy from them, but from time to time they'll ask you to fill out your info with a national ID/passport (they say it's because they don't want to sell their software to individuals under sanctions). This is despite them being based in Belgium (not the US).

For any serious work IDA Pro is highly suitable (the customization and scripting, loader examples, processor plugins, etc.); on the other hand, for side projects and basic security research, Binary Ninja and Ghidra can go a long way.


Most decompilers do not strive for recompilability. [1] I believe there are (or were) some academic projects that aimed for recompilation as a core feature, but it is a hard problem.

On the commercial side, IDA / Hex-Rays [2] is very strong for C-like decompilation. If you're looking at Go, Rust, or even C++ it is going to be a little messier. As other commenters have said, you'll work function-by-function and it is expensive, though the free version does have decompilation (F5) for x86 and x64 (IIRC).

Binary Ninja [3] (no affiliation) is the coolest IMO, they have multiple intermediate representations they lift the assembly through. So you get like assembly -> low level IL -> medium level IL -> high level IL. There are also SSA forms (static single assignment) that can aid in programmatic analyses. The high level IL is very readable but makes no effort to be compilable as a programming language. That being said, Binary Ninja has implemented different "views" on the HLIL so you can show it as pseudo-C, Rust, etc. There is a free online version and the commercial version is cheaper than IDA but still expensive. Good Python API, good UI.

Ghidra [4] is the RE framework released by the NSA. It is free and open source, and it supports a ton of niche architectures. This is what most people use. I think the UI is awful, personally. It has a decompiler, and the results are OK. They have an intermediate representation (P-Code) and plugins are in Java (since it is written in Java). I haven't worked much with it.

Most online decompilations you see for old games are likely using Ghidra, some might be using IDA. This is largely a manual process of doing one function at a time and building up a mental map of the program and how things interact.

Also worth mentioning are lifters. There were a few projects that aimed to lift assembly to LLVM IR (compiler framework's intermediate representation), with the idea being that then all your analyses could be written over LLVM IR as a lingua franca. Since it is in LLVM IR, it would be also recompilable and retargetable. [5][6]

1. https://reverseengineering.stackexchange.com/questions/2603/...

2. https://hex-rays.com/ida-free

3. https://binary.ninja/free/

4. https://ghidra-sre.org/

5. https://github.com/avast/retdec

6. https://github.com/lifting-bits/mcsema


Meta has a foundation model trained on LLVM IR: https://ai.meta.com/research/publications/meta-large-languag...


lol ok, now we’re getting into pure-nonsense territory


It's not clear to me why that is so. An LLM trained on IR for the purpose of compilation is not quite what we're looking for here but it is in the same territory.


Looking at an individual function, IDA Hex-Rays output is often recompilable as-is (or with minor modifications), but it won't necessarily be idiomatic, especially if you don't have symbol information.


All the emulation of desktop machines in WASM I've seen so far has been for x86 - do you think there are significant additional hurdles for x86_64? Or is it just a matter of time?

Separately, one bit of feedback - it's cool that webvm is open source, but I think it's fair to ask you to be upfront, in the blog post itself where you talk about webvm licensing, that cheerpx itself is not (which is fine!). If I wasn't already familiar with the wasm emulation space I would have felt rather misled.


I cannot speak for other VMs, but in the case of CheerpX there is nothing fundamental preventing emulation/JIT-ting of 64-bit platforms. It is an inefficient choice, though, due to the current limitations of Wasm, in particular around how much memory can be used in total. In the best-case scenario it would be 4GB, but it's unlikely this can be achieved on all devices in the real world.

64-bit code by its nature consumes more memory - each pointer is twice the size - which makes the memory limitation even more pressing, for no advantage. Please note that this is unrelated to the work on WebAssembly Memory64. The issue is not a lack of address space, but rather the amount of memory that can actually be allocated in practice.

For this public deployment of WebVM we have chosen to limit the maximum memory to 700MB, which makes the demo work fine on the vast majority of devices, including mobile ones. That said, we do plan to support the 64-bit ISA in the future, with the main use case of supporting all the existing docker images available on Docker Hub and similar platforms.

We appreciate your feedback on licensing and will take it into account, but please note that this article is specifically about WebVM, which is indeed FOSS. A separate article dedicated to the CheerpX 1.0 release will be published soon, and it will of course be very clear that the latter is proprietary.


Interesting, that's helpful, thanks - so with the eventual arrival of memory64, and assuming I only wanted to target desktop systems, and assuming browser implementations permit large allocations (e.g. 8GB), large 64-bit apps could work fine. I have a use case for this I've been poking at for a bit, but implementing my own version of cheerpx would be a lot of work, maybe I'll just wait!

On open source - I can only give you feedback as an outside fresh pair of eyes :) I incorrectly interpreted that it was full stack OSS based on the overall blog post 'vibe' and had to deliberately double check because I was aware of cheerpx beforehand. Perhaps it's just me. I look forward to the cheerpx blog post!


The company I work at (Hadean) used to have this as a product - think Erlang-like multi-machine IPC, with automatic acquisition of cloud resources and language integration for Rust, C, C++, and Python. Pretty easy to point it at some machines and get them running a distributed application (as in simulation or big data).

But infrastructure for developers is hard to make money with - developers like to build it themselves, and people holding the purse strings point at Kubernetes and say "that's free". So we just use it as an internal platform for a distributed simulation engine and it works pretty well.

I did an analysis of removing it (it's a lot of bespoke code that we have to maintain for something that isn't our actual product) and I think you could probably implement something on top of Nomad that's close enough...but then Nomad went BSL and Kubernetes is a big complexity shift.

So...if anyone knows of something out there let me know, I'd love to be able to use it outside of work :)


ray.io seems to be doing pretty well financially...


Right, because Anyscale found a niche that distributed compute matters in (AI) and built great libraries/hosted platforms/services around that. I would venture that the money they make from people who pare back things to just ray core is ~0, which is why it's open source.

Put another way - building such a platform doesn't preclude commercial success, but (at least for us) it isn't sufficient. Fly.io might be able to pull it off if they want to explore that direction imo.

Fwiw if you dig around in the ray core codebase (as I did when I was doing competitor analysis years ago) you can use the core C code from other languages to build such a platform for Rust if you like - they had Java and C++ interfaces at the time, but I haven't looked in the last 5 years.


Can you elaborate on what you're getting at?

Syntax-wise it's about as similar to JSON as Erlang expressions are (i.e. superficially similar in some cases).

Semantics-wise I've personally found any superficial similarity to JSON to be actively unhelpful in understanding because of some important processing differences (e.g. paths, laziness).


The FunctionalScript language is supposed to be a superset of JSON and a subset of Javascript that's pure and functional. Not sure that's suited to the use case Nix is going for, but that sounds like what we'd get if we took the extended-JSON path.


JSON is not a language, it's a data representation. Nix data is equivalent to JSON, except Nix also has a "path" datatype.


First a minor quibble: if you're talking about 'Nix data', then starting the conversation by talking about 'Nix-the-language' is rather misleading.

That aside, Nix allows you to create infinite datastructures, e.g.

    $ nix eval --expr 'rec { z = { a = z; i = 5; }; }.z.a.a.a.i'
    5
which you can't do with JSON.

But even if JSON did have some way of handling datastructure 'loops', it's still not helpful because of laziness. You almost never want to eagerly evaluate a Nix expression to produce what you seem to term 'Nix data', because you'll invoke the `derivation` built-in function to create paths you never actually reference - this is why laziness is such an important property of Nix.
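
As a quick illustration of why laziness matters (the attribute names here are just for illustration):

    $ nix eval --expr '{ bad = throw "expensive or broken"; good = 5; }.good'
    5

An eager JSON-style dump of that whole value would have to evaluate `bad` and blow up; Nix only evaluates the attributes you actually reference.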

So I'm still not clear what user-facing part of Nix is isomorphic to JSON. If it's just "Nix types [0] are similar to JSON types" then...sure.

[0] https://nixos.org/manual/nix/stable/language/values


Agreed. I've tried to write a json-nix converter (because I wanted to try configuring my system in jsonnet) and it's absolutely not easy or obvious how to do this.


Of my series of PRs, I suspect the third (i.e. https://github.com/zellij-org/zellij/pull/3043) is most likely to have an effect. But if it does it'd only be as a side effect unfortunately - my focus was on fixing lag with splitting of extremely long lines.

From what I saw while making my changes, that area of the code has a bunch more possible optimisations, but it's 'good enough' for me at this point so I'm not planning to continue pulling at the thread right now. If you wanted to look yourself, I left the script I used for benchmarking and profiling in https://github.com/zellij-org/zellij/issues/2622#issuecommen...


I didn't believe you that it was broken, but you're right - very disappointing. For anyone interested, the bug for it being broken is at [1] (reported mid-2021).

The build failure is easy to fix, so I created a repo at [2] which builds a program against a glibc with static nss. I verified with strace that it does indeed check nsswitch.conf and try to load dynamic libraries. (I'd at least submit my patch [3] for the build failure, but I find mailing lists to be a hassle.)

All this said, I wouldn't call it undocumented - it's documented in the `configure --help` itself as well as the online version [4], and it has an FAQ entry [5].
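
For illustration, the classic symptom when statically linking something that uses nss looks like this (a sketch - the source file name is made up, and the exact wording varies by glibc/binutils version):

    $ gcc -static lookup.c -o lookup
    warning: Using 'getaddrinfo' in statically linked applications
    requires at runtime the shared libraries from the glibc version
    used for linking

--enable-static-nss is meant to make that dynamic loading unnecessary, which is the part that's broken.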

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=27959

[2] https://github.com/aidanhs/gcc-static-linking

[3] https://github.com/aidanhs/gcc-static-linking/blob/1f04425e2...

[4] https://www.gnu.org/software/libc/manual/html_node/Configuri...

[5] https://sourceware.org/glibc/wiki/FAQ#Even_statically_linked...


Hadean | Rust Engineer | London (flexible/hybrid) or REMOTE (UK) | Full-Time

Hadean are backed by the likes of Epic Games. Our speciality is in spatial compute - we’ve built a massive-scale distributed simulation engine and a connectivity layer to plug thousands of users into a single world. We’re using these to provide the infrastructure and computational power to build, run and monetise the Metaverse.

Our core (internal) platform, connectivity layer and some higher level components are written in Rust and we’re looking for people to work on the design, implementation, and maintenance of our products.

Check out our careers page at https://hadean.com/jobs/


What's the salary range?


(for context - I'm not interested in first class node support)

This seems pretty cool. I particularly like how 'gradual' it seems to be relative to things like Bazel, i.e. you can take some shell scripts and migrate things over. I did have a play and hit what I think is an initial problem around project caching, which I raised at [0].

One comment, from the paranoid point of view of someone who has built distributed caching build systems before, is that your caching is very pessimistic! I understand why you hash outputs by default (as well as inputs), but I think that will massively reduce the hit rate a lot of the time when it may not be necessary? I raised [1].

Edit: for any future readers, I spotted an additional issue around the cache not being pessimistic enough [3]

As an aside, I do wish build systems moved beyond the 'file-based' approach to inputs/outputs to something more abstract/extensible. For example, when creating docker images I'd prefer to define an extension that informs the build system of the docker image hash, rather than create marker files on disk (the same is true of initiating rebuilds on environment variable change, which I see moon has some limited support for). It just feels like language-agnostic build systems saw the file-based nature of Make and said 'good enough for us' (honorable mention to Shake, which is an exception [2]).
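
For contrast, the marker-file dance I mean looks something like this (a hypothetical sketch, not moon's actual interface):

    # materialise the image digest on disk so a file-based build
    # system can treat "did the image change?" as "did this file change?"
    docker image inspect --format '{{.Id}}' myimage > .myimage.digest

An extensible input API would let the build tool ask for that hash directly instead of round-tripping it through the filesystem.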

[0] https://github.com/moonrepo/moon/issues/637

[1] https://github.com/moonrepo/moon/issues/638

[2] https://shakebuild.com/why#expresses-many-types-of-build-rul...

[3] https://github.com/moonrepo/moon/issues/640


Thanks for the feedback and prototyping with it immediately! Always appreciated to get hands on feedback.

