Hacker News | matrss's comments

So, basically iocaine (https://iocaine.madhouse-project.org/). It has indeed been very useful to get the AI scraper load on a server I maintain down to a reasonable level, even with its not so strict default configuration.

https://blog.cloudflare.com/ai-labyrinth/

A bit like this? (iocaine is newer)


If I think about it, I find it awful. The fact that we need to put junk in our own stuff just for crawlers does not sit well with me.

Yup, it's a clown world.

Any functioning society would deal with the offenders directly and would have stopped this before it became an issue for most sites.


First time seeing that, but yes, it seems similar in concept. Iocaine can be self-hosted and put in as a "middleware" in your reverse proxy with a few lines of config; Cloudflare's seems tied to their services. Cloudflare's also generates garbage with generative models, while iocaine uses much simpler (and surely more "crude") methods of generating its garbage. Using LLMs to feed junk to LLMs just makes me cry, so much wasted compute.
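Iocaine's own docs describe the exact setup; as a purely hypothetical sketch of the routing idea in nginx (the port, upstream address, and user-agent list are all made up, not iocaine's actual configuration):

```nginx
# Route requests whose User-Agent matches known AI crawlers to a local
# iocaine instance; everything else goes to the real backend.
map $http_user_agent $backend {
    default      http://127.0.0.1:8080;   # the real site
    ~*GPTBot     http://127.0.0.1:42069;  # iocaine (hypothetical port)
    ~*ClaudeBot  http://127.0.0.1:42069;
}

server {
    listen 80;
    location / {
        proxy_pass $backend;
    }
}
```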

Is iocaine actually newer though? Its first commit dates to 2025-01, while the blog post is from 2025-03. I couldn't find info on when Cloudflare started theirs. There's also Nepenthes, which had its first release in 2025-01 too.


Yes, except with the content being based on the real content rather than completely random. My intuition says that this will be more effective, specifically poisoning the model wrt tokens relating to that content rather than just increasing the overall noise level a bit (the damage there being smoothed out over the wider model).

> But they’re roughly the same paradigm as docker, right?

Absolutely not. Nix and Guix are package managers that (very simplified) model the build process of software as pure functions mapping dependencies and source code as inputs to a resulting build as their output. Docker is something entirely different.

> they’re both still throwing in the towel on deploying directly on the underlying OS’s userland

The existence of an underlying OS userland _is_ the disaster. You can't build a robust package management system on a shaky foundation. If Nix or Guix were to use anything from the host OS, their packaging model would fundamentally break.

> unless you go all the way to nixOS

NixOS does not have a "traditional/standard/global" OS userland on which anything could be deployed (excluding /bin/sh for simplicity). A package installed with nix on NixOS is identical to the same package being installed on a non-NixOS system (modulo system architecture).

> shipping what amounts to a filesystem in a box

No. Docker ships a "filesystem in a box", i.e. an opaque blob, an image. Nix and Guix ship the package definitions from which they derive what they need to have populated in their respective stores, and either build those required packages or download pre-built ones from somewhere else, depending on configuration and availability.

With Docker, two independent images share nothing, except maybe some base layer, if they happen to use the same one. With Nix or Guix, packages automatically share their dependencies iff it is the same dependency. The thing is: if one package depends on lib foo compiled with -O2 and the other one depends on lib foo compiled with -O3, then those are two different dependencies. This nuance is something that only the nix model started to capture at all.
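A toy Python sketch of that model (all names made up for illustration): a store path is derived from a hash over every build input, so identical inputs dedupe automatically, while a single changed flag yields a distinct dependency:

```python
import hashlib
import json

def store_path(name: str, src_hash: str, deps: list[str], flags: list[str]) -> str:
    """Derive a store path from all build inputs, Nix-style (much simplified)."""
    inputs = json.dumps(
        {"name": name, "src": src_hash, "deps": sorted(deps), "flags": sorted(flags)},
        sort_keys=True,
    )
    digest = hashlib.sha256(inputs.encode()).hexdigest()[:12]
    return f"/nix/store/{digest}-{name}"

# Same source, different optimization flags => two distinct dependencies:
foo_o2 = store_path("libfoo-1.0", "abc123", [], ["-O2"])
foo_o3 = store_path("libfoo-1.0", "abc123", [], ["-O3"])
assert foo_o2 != foo_o3

# Identical inputs => identical store path, shared automatically:
assert foo_o2 == store_path("libfoo-1.0", "abc123", [], ["-O2"])
```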


> Docker ships a "filesystem in a box", i.e. an opaque blob, an image. Nix and Guix ship the package definitions from which they derive what they need to have populated in their respective stores, and either build those required packages or download pre-built ones from somewhere else, depending on configuration and availability.

The rest of your endorsement of NixOS is well taken, but this is a silly distinction to draw. Dockerfiles and nix package definitions are extremely similar. The fact that docker images are distributed with a heavier emphasis on opaque binary build step caching, and nix expressions have a heavier emphasis on code-level determinism/purity is accidental. The output of both is some form of a copy of a Linux user space “in a box” (via squashfs and namespaces for Docker, and via path hacks and symlinks for Nix). Zoom out even a little and they look extremely alike.


> This nuance is something that only the nix model started to capture at all.

Unpopular opinion, loosely held: the whole attempt to share any dependencies at all is the source of evil.

If you imagine the absolute worst case scenario, that every program shipped all of its dependencies and nothing was shared, then the end result would be… a few gigabytes of duplicated data? Which could plausibly be deduped at the filesystem level rather than at the build or deployment layer?

Feels like a big waste of time. Maybe it mattered in the 70s. But that was a long, long time ago.


I think the storage optimization aspect is secondary; it is more about keeping control over your distribution. You need processes to replace all occurrences of xz with an uncompromised version when necessary. When all packages in the distribution link against one and the same copy, that's easy.

Nix and Guix sort of move this into the source layer. Within their respective distributions you would update the package definition of xz, and all packages depending on it would be rebuilt to use the new version.

Using shared dependencies is a mostly irrelevant detail that falls out of this in the end. Nix can dedupe at the filesystem layer too, e.g. to reduce duplication between different versions of the same packages.

You can of course ship all dependencies for all packages separately, but you have to have a solution for security updates.


Node.js basically tried this — every package gets its own copy of every dependency in node_modules. Worked great until you had 400MB of duplicated lodash copies and the memes started.

pnpm fixed it exactly the way you describe though: content-addressable store with hardlinks. Every package version exists once on disk, projects just link to it. So the "dedup at filesystem level" approach does work, it just took the ecosystem a decade of pain to get there.
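A minimal Python sketch of that idea (paths and file contents are made up): files live once in a content-addressed store, and projects receive hardlinks, so byte-identical dependencies share a single inode:

```python
import hashlib
import os
import tempfile

# Toy content-addressable store: each file is stored once under its
# content hash; projects get hardlinks, so duplicates cost no extra disk.
store = tempfile.mkdtemp(prefix="store-")
project = tempfile.mkdtemp(prefix="proj-")

def add_to_store(data: bytes) -> str:
    path = os.path.join(store, hashlib.sha256(data).hexdigest())
    if not os.path.exists(path):      # only the first copy is written
        with open(path, "wb") as f:
            f.write(data)
    return path

def link_into_project(data: bytes, name: str) -> str:
    dest = os.path.join(project, name)
    os.link(add_to_store(data), dest)  # hardlink, not a copy
    return dest

a = link_into_project(b"module.exports = x => x;", "lodash-a.js")
b = link_into_project(b"module.exports = x => x;", "lodash-b.js")
# Both project entries point at the same inode: one copy on disk.
assert os.stat(a).st_ino == os.stat(b).st_ino
```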


Nix has a cache too, but only if the packages are reproducible.

Much harder to get reproducibility with C++ than JavaScript to say the least.


> If you imagine the absolute worst case scenario that every program shipped all of its dependencies and nothing was shared then the end result would be… a few gigabytes of duplicated data?

Honestly, I've seen projects that do this. In fact, a lot of projects that do this, at the compilation level.

It feels like a lot of the projects that I would want to use from git pull in their own dependencies via submodules when I compile them, even when I already have the development libraries needed to compile it. It's honestly kind of frustrating.

I mean, I get it - it makes it easier to compile for people who don't actually do things like that regularly. And yeah, I can see why that's a good thing. But at the very least, please give me an option to opt out and to use my own installed libraries.


Maybe the RAM crunch will get people optimizing for dedup again.


You have to differentiate between container images and "runtime" containers. You can have the former without the latter, and vice versa. They are entirely orthogonal things.

E.g. systemd exposes a lot of resource control as well as sandboxing options, to the point that I would argue that systemd services can be very similar to "traditional" runtime containers, without any image involved.
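For illustration, a hypothetical unit file showing some of those options (the service name and the limit values are made up; the directive names are real systemd options):

```ini
# /etc/systemd/system/myapp.service -- sandboxing and resource control
# without any container image involved.
[Service]
ExecStart=/usr/bin/myapp
DynamicUser=yes
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
NoNewPrivileges=yes
MemoryMax=512M
CPUQuota=50%
```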


Well, I did mention "or use cgroups" above.


And what I've said is that there are more options. You don't have to use cgroups directly, there are other tools abstracting over them (e.g. systemd) that aren't also container runtimes.


> i could never understand why anyone would us vi/m with its bs shortcuts, making BASIC text editing into a complete *.

I could never understand why anyone would use nano with its bs shortcuts, making basic text editing (in contrast to basic linear text writing, which even a non-modal editor like nano can do decently) into a complete *.

This is dumb. Sure, some people don't get modal editing. Others don't get how you could live without it. It is almost as if people work differently and have different preferences.


I get why OSes come with nano as the default editor, but it's so confining and slow to use when you're used to vim (or, I'm sure, emacs).


Emacs is a bit special in that the "canonical" way of editing a remote configuration file with it is probably using TRAMP, i.e. connecting your local emacs via ssh to edit the remote file as if it was local.
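For reference, a TRAMP file name looks like this (host and path are illustrative):

```
C-x C-f /ssh:admin@server.example.org:/etc/nginx/nginx.conf
```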


Vim is the exception, not the rule. Most people don't want a mental model just to type a sentence. Instead of the snark, you could just admit that your preference doesn't align with the median user.


> Most people don't want a mental model just to type a sentence.

"Just typing a sentence" is what I was referring to with "basic linear text writing", for which modal editing indeed does not bring much of a benefit. That's not text editing though.

> Instead of the snark, you could just admit that your preference doesn't align with the median user.

? I explicitly wrote that people work differently and have different preferences. What was snarky about that?

Besides, the median user does not edit configuration files via ssh, so they are hardly relevant here. The median user does not even know what a terminal is. If this was about the median user, then we would be discussing Word vs. Notepad, or whatever.


> I deploy using a dedicated user, which has passwordless sudo set up to work.

IMO there is no point in doing that over just using root, except maybe if you have multiple administrators and do it for audit purposes.

Anyway, what you can do is have a dedicated deployment key that is only allowed to execute a subset of commands (via the command= option in authorized_keys). I've used it to only allow starting the nixos-upgrade.service (and some other not necessarily required things), which then pulls updates from a predefined location.
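A sketch of what that can look like in ~/.ssh/authorized_keys on the target host (the key, unit name, and sudo setup are illustrative; `command=` and `restrict` are real OpenSSH authorized_keys options):

```
command="sudo systemctl start nixos-upgrade.service",restrict ssh-ed25519 AAAAC3Nza... deploy@ci
```

With `restrict` set, the key also loses port forwarding, agent forwarding, and PTY allocation, which is usually what you want for a single-purpose deployment key.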


Brew _is_ a Linux package manager.

There is also conda/mamba/pixi/etc. (anything in the conda-forge ecosystem), which can be used without root. Then there are Guix and Nix, which (mostly) need to be set up by someone with root privileges, but which then allow unprivileged users to install packages for themselves. I think I have even used emerge rootlessly at some point a few years ago.


> “No person shall circumvent a technological measure that effectively controls access to a work protected under this title.”

This wording always bothers me. If a person were to circumvent a technological measure that tries to control such access, then the circumvention itself proves that this measure was not effective at doing what it is supposed to be doing. Therefore the person is not circumventing something that _effectively_ controls anything. They just showed that it is ineffective, and therefore the law does not apply to them.

Of course, no one who actually has to interpret these laws shares my opinion.


The meaning of "effectively" here is not what you think it is. In this case it simply means "to bring into being", or "with the intent to".


So, Prolog is not code then?

> Except you can't run english on your computer.

I can't run C on it either, without translating it to machine code first. Is C code?


A prompt is for the AI to follow. C is for the computer to follow. I don't want to play games with definitions anymore, so I am no longer going to reply if you continue to drill down and nitpick about exact definitions.


If you don't want to argue about definitions, then I'd recommend you don't start arguments about definitions.

"AI" is not special-sauce. LLMs are transformations that map an input (a prompt) to some output (in this case the implementation of a specification used as a prompt). Likewise, a C compiler is a transformation that maps an input (C code) to some output (an executable program). Currently the big difference between the two is that LLMs are usually probabilistic and non-deterministic. Their output for the same prompt can change wildly in-between invocations. C compilers on the other hand usually have the property that their output is deterministic, or at least functionally equivalent for independent invocation with the same input. This might be the most important property that a compiler has to have, together with "the generated program does what the code told it to do".

Now, if multiple invocations of a LLM were to reliably produce functionally equivalent implementations of a specification as long as the specification doesn't change (and assuming that this generated implementation does actually implement the specification), then how does the LLM differ from a compiler? If it does not fundamentally differ from a compiler, then why should the specification not be called code?


> the prompt is for the AI.

and C is for the compiler not "the computer".

It's commonplace for a compiler on one computer to read C code created on a second computer and output (if successfully parsed) machine code for a third computer.


In other words: it's going downhill ever since the DB was privatized.


DB is not privatized. It is 100% owned by the state.


DB was reorganized as an AG in the 90s, i.e. a corporation under private law. They are forced to (at least try to) make a profit for their shareholders, which is a common trait of private organizations. They consistently do so via short-sighted (mis-)management, another trait they share with many private organizations. This privatized corporation is indeed fully owned by the state as its only shareholder, but unfortunately that doesn't manifest in DB being run as the critical infrastructure that it is. I suspect that the indirection in power over the corporation that the privatized structure imposes is a key reason why it became such a disaster.


I wonder how many times a low-effort, "truthy"-sounding comment like that is written without someone like you to correct and clarify it. There are also comments here suggesting the UK's privatisation fixed BR that I don't have the energy to correct anymore, so they just sit there, being wrong for all to see.


Is their comment true because you want it to be, or is it actually factually inaccurate and biased as many other people are saying?


> They are forced to (at least try to) make a profit for their shareholders,

This is not true at all.

The shareholders set the targets and since the shareholder is the government they can set any target they want: profitability, more trains, cheaper tickets etc..

If the shareholder wants to inject 10% every year in stead of taking a profit they are absolutely free to do so.


The DB AG has been specifically founded to be "market-oriented" and profit-making, so yes, it is true.

I am sure the state could try to do _something_ about it, but I am also sure that a very strong car lobby here in Germany is working against that. BTW, the road network, which I would consider conceptually the same kind of infrastructure as the rail network, is to my understanding mostly built and maintained by state organizations, so it is possible to do it that way.

I guess it is also harder to market "let's subsidize this private company with tax payer money so they can continue to offer mediocre service" to voters, compared to "let's use tax payer money to build and maintain one-of-a-kind critical infrastructure from which everyone (with a car, which due to the less-than-great alternatives is a lot of people) can profit".

Again, having it organized as a private company adds indirection, diffuses power and responsibility, and adds a certain more or less implicit expectation of what private companies are supposed to do. That's my main issue with it. Private companies aren't supposed to run critical infrastructure as a monopoly for profit. It's the state's job to provide and maintain critical infrastructure in the interest of all.


>The DB AG has been specifically founded to be "market-oriented" and profit-making, so yes, it is true.

Again, if the shareholders decide this is the reason: yes.

But shareholders can just as easily set other targets or incentives.

>I guess it is also harder to market "let's subsidize this private company with tax payer money so they can continue to offer mediocre service" to voters,

The government owns DB AG, it is not a private company. It is a public company.


> The government owns DB AG, it is not a private company. It is a public company.

It is a private company, as in it is a legal entity under private law. This is in contrast to an "öffentlich-rechtliches Unternehmen" (a public-law enterprise; I don't know if this even has a proper translation or equivalent in other jurisdictions). There are more than two options here: it can be both privatized and public according to your definition.


Any private company is fully free to financially ruin themselves if this is what the shareholders want.

You are under no obligation to make a profit.


Do you know that the government has set those targets?


> They are forced to (at least try to) make a profit for their shareholders [...]

Not true. Shareholder primacy is not as huge as in Delaware.

And in the end it's the government that owns all shares and thus can decide how much profit the company should make.


Just because it is even more true elsewhere does not mean it is untrue here.


That's ridiculous. DB is not even trying to become profitable, nor is there any evidence that its sole shareholder, aka the government, sets that as a target.


Well, apparently they were somewhat profitable from 2016 to 2019, and they have been paying a dividend to the state more often than not. I don't think their goal is actively losing money?


The site is pretty clear: "Free and works in browser", "Processed locally", "Private". But apparently the site (sorry for the harsh word, but I can't interpret it any other way) lies.


"is incorrect" is slightly less harsh, but in this case, I'd call it a lie. It's a rather subtle but important implementation detail. I don't think the author (who is here in this thread) is necessarily malicious because of this, but, well, it's a lie.

