*Systemd's response was to say that they should incorporate systemd's library, a...

shakna · on Jan 29, 2019

> Because it's a bug for some, and intended behavior for others. Look, you make it as if they introduced a bug on purpose to screw with some people. It's clearly not the case, there was a specific tradeoff involved.

They broke userland.

It doesn't matter what tradeoff they made - they went against POSIX behaviour, and as a result, broke numerous utilities, both past and future.

Let's say that again - systemd introduced breaking behaviour on userland, against POSIX, and instead of backing down and allowing for expected and specified behaviour, they said it's everyone else's problem.

That is neither professional, nor responsible.

When you make a mistake, a mistake that breaks the behaviour of POSIX, and POSIX utilities like _cron_, you apologise, and fix the problem.

You don't turn around and say that all the sysutils should incorporate your new idea.

poettering · on Jan 29, 2019

First of all, as mentioned above, we made this compile-time as well as runtime-configurable, so that downstream distros can choose whether they want to make this opt-in or opt-out. Hence blame your distros if you picked it in a way you didn't like.
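For anyone wanting to check which way their distro went: the runtime switch referred to here is, as far as I know, the `KillUserProcesses=` setting in the `[Login]` section of `logind.conf`. Below is a minimal Python sketch of reading it. The function name and sample text are illustrative only, and the sketch ignores drop-in directories; the compile-time default has to be passed in because the config file alone does not reveal it.

```python
import configparser

def kill_user_processes(conf_text, compiled_default):
    """Return the effective KillUserProcesses value from logind.conf text.

    compiled_default stands in for the distro's compile-time default,
    which this sketch cannot discover on its own.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key case as written in the file
    parser.read_string(conf_text)
    if parser.has_option("Login", "KillUserProcesses"):
        return parser.getboolean("Login", "KillUserProcesses")
    return compiled_default

# Sample text is made up for illustration; a real file lives under /etc/systemd/.
sample = """
[Login]
#NAutoVTs=6
KillUserProcesses=no
"""
print(kill_user_processes(sample, compiled_default=True))  # prints False
```

A commented-out line (leading `#`) is treated as absent, so the compiled default applies in that case.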

Moreover, this doesn't affect cron at all. Cron creates its own PAM session for each job it runs which means those jobs are independent from any real login session (i.e. ssh, graphical, tty login), and thus also don't get cleaned up by them.

This affected stuff that is forked off a login session and then stays around as "orphan" if you so will, i.e. with all session resources released, except for these processes that try hard to avoid clean-up (usually by double forking + detaching explicitly from any TTY/ignoring SIGHUP).
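The "double forking + detaching + ignoring SIGHUP" pattern mentioned above can be sketched roughly as follows in Python. This is a generic illustration of the traditional Unix idiom, not code from screen, tmux, or systemd; the function name `spawn_detached` is made up for the example.

```python
import os
import signal

def spawn_detached(argv):
    """Run argv as a double-forked, SIGHUP-ignoring background process."""
    pid = os.fork()
    if pid > 0:
        # Original process: reap the short-lived intermediate child and return.
        os.waitpid(pid, 0)
        return
    try:
        # Intermediate child: a new session has no controlling TTY.
        os.setsid()
        # An ignored signal stays ignored across exec, which is what nohup relies on.
        signal.signal(signal.SIGHUP, signal.SIG_IGN)
        if os.fork() > 0:
            os._exit(0)
        # Grandchild: not a session leader, so it cannot reacquire a TTY.
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)
        os.execvp(argv[0], argv)
    finally:
        # Reached only if something above failed in a child process.
        os._exit(127)

# Example (not run here): spawn_detached(["sleep", "600"])
```

Traditionally a process started this way outlives the login session. The behaviour under discussion is that, with session killing enabled, it is still inside the session's scope and gets cleaned up at logout anyway.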

MereInterest · on Jan 29, 2019

As many, many others have stated, ignoring SIGHUP is not a way to "avoid clean-up". It is the explicit and intended method that a program should use to indicate that it should not be cleaned up.

youdontknowtho · on Jan 29, 2019

This has more to do with feelings about you and the perception of you as a "bad guy" than it does about the technical discussion.

I tend to agree with the idea that the choice of defaults belongs to the distros. If the distros are deferring to the upstream project on default settings for a critical system component then they need to be more thorough and validate what they are shipping.

v_lisivka · on Jan 29, 2019

Maintaining of all these special cases requires lot of knowledge. If a maintainer is responsible for just the systemd package, then it's not a problem, but when the number of packages per maintainer is measured in hundreds, the maintainer will stick to the defaults unless users complain loudly enough to justify sacrificing a whole working day on the problem.

Redoubts · on Jan 29, 2019

> Maintaining of all these special cases requires lot of knowledge.

Distro maintainers need to have a lot of knowledge about their init system. There's no way out of that. It's probably something everyone should know a little about as well.

inferiorhuman · on Jan 29, 2019

> Distro maintainers need to have a lot of knowledge about their init system. There's no way out of that. It's probably something everyone should know a little about as well.

Then maybe the init system should be simpler and not attempt to ingratiate itself with UEFI or attempt to replace su, sudo, syslogd, netcat, resolvconf, etc.

zbentley · on Jan 30, 2019

> They broke userland.

That alludes to kernel development, which systemd is largely uninvolved with. A userland program chosen by various distributions failed to support conventions from a different userland program. That's all. Were the programs involved fundamental and highly important to many users' experience? Sure. Is busting out "you broke userland" like some magical shibboleth a useful means of conveying your unhappiness that your distribution maintainers chose to replace a widely-depended-upon program with a different program? I think not.

> they went against POSIX behaviour

Which? There's "tradition" and "specified behaviour". Both are important in different situations and in different degrees.

> You don't turn around and say that all the sysutils should incorporate your new idea.

Why not? They're no more privileged by the POSIX specification, or by the user/kernel -space divide than any other program.

pas · on Jan 30, 2019

POSIX was broken first. It's insecure by default.

Intel, the kernel, even Chrome broke my userland by mitigating Spectre.

It happens.

CRON was and is run as a system service, in its own scope. If you run your own cron instance, but forgot to set it up as a system service, yeah, it gets cleaned up as you exit your shell/session/scope.

xyzzyz · on Jan 29, 2019

> They broke userland.

So? "We don't break userland" is a Linux kernel thing. Systemd is not kernel, it's userland, and userland things break other userland things all the time. They already broke lots of existing stuff when they replaced /etc/init.d/ scripts with systemd definition files, should systemd also have not done that?

> It doesn't matter what tradeoff they made - they went against POSIX behaviour, and as a result, broke numerous utilities, both past and future.

Linux is not POSIX, so I don't see how that's relevant. For what it's worth, I don't even know what part of POSIX it broke. Care to enlighten me?

jimrandomh · on Jan 29, 2019

Right; the Linux kernel has a "we don't break userland" policy, systemd doesn't. That's a selling point for the Linux kernel, and a strike against systemd. Both systemd and the Linux kernel are infrastructure projects which, if they're doing their jobs well, will never cause me problems so I get to ignore them. Systemd has been causing other people problems, and doesn't seem to understand that in the role they're trying to fill, preventing that from happening is their first and most important responsibility.

majewsky · on Jan 29, 2019

Like it or not, the Linux kernel is clearly the outlier in terms of backwards compatibility. For example, Postgres changes their data format in most non-bugfix releases. Would you consider that "a strike against" Postgres?

dooglius · on Jan 29, 2019

They provide an upgrade process that makes this invisible to the end user, so it's not a fair comparison. If it started deleting tables when I exit a session, that would definitely be a strike against it.

SahAssar · on Jan 29, 2019

Postgres has session-bound resources, and in most cases no way to disable those from being deleted when exiting a session. For example in postgres you can't persist a prepared statement, but you can of course persist data within a table. Any function running will be killed when you exit (or at least not complete since the transaction is cancelled).

IMO when a user has logged out and has not had the permissions/foresight to set up a task in the system to run without a session, it should be killed.

I get that this has not been the default behavior in linux/UNIX, but to me it seems like the sensible one.

And that's before we ever argue about the possibility to turn it off.

kokada · on Jan 29, 2019

Systemd offers both a compile-time and a runtime option to turn this behaviour off, so it is a fair comparison.

shakna · on Jan 29, 2019

I think you're completely missing the point.

If you ruin everyone else's day, and change behaviour everyone else is expecting, then it's probably your own fault.

Approaching it as if everyone should simply change and do what you want, is the height of arrogance. You are generating work for others. And in this particular case, not only are you generating work for others, you are eradicating a category of software.

When a distribution adopts systemd, they let everyone know how things are changing, and slowly transition things over, releasing when stable.

We know systemd replaces init.d. It was difficult, but distributions using systemd got over that hurdle, but it did take time.

However, this is not the same.

Yes, systemd is userland, however it is also PID 1. It is a layer between most userland and the kernel, and so needs to reflect the responsibility of its position.

Ignoring how NOHUP is supposed to be interpreted, is a _bad idea_, and yes, a violation of POSIX, specifically signals (SIGHUP and nohup), and how they are supposed to be handled.

Moreso, it greatly heightens the difficulty of many utilities that are expected to work.

Why should cron (all implementations of cron), suddenly need to rely on another userland library to maintain its function?

You just broke most Linux automation. Across an entire industry.

Why should screen (all implementations of screen), suddenly need to rely on a userland library much bigger than most implementations, to continue its base function?

You just broke an entire category of background systems - including systems communicating with embedded hardware. You might have caused a factory-floor fault. Which could cause injury, or worse.

A breaking change of this level can cause industry-wide ramifications that are not just limited to the digital. Unexpected behaviour is exceptional, and should take time and considerable thought before occurring.

Systemd has responsibility that no other userland system has. It's PID 1.

If they're going to require a massive change in process behaviour, then they are going to require consultation, awareness within the industry, and transition time. They should be working with distributions, aware of the man-hours they're generating, before they put something in place.

BlackFly · on Jan 29, 2019

This discussion is very much apropos of what the article is talking about:

> The whole systemd battle, Rice said, comes down to a lot of disruptive change; that is where the tragedy comes in. Nerds have a complicated relationship to change; it's awesome when we are the ones creating the change, but it's untrustworthy when it comes from outside. Systemd represents that sort of externally imposed change that people find threatening. That is true even when the change isn't coming from developers like Poettering, who has shown little sympathy toward the people who have to deal with this change that has been imposed on them.

The posix violation is by design. If you think that posix dictates the wrong thing, then you will do something different and this is what Poettering has done. The fact that systemd has more or less been embraced by linux is an endorsement of his design philosophy, even if distributions reject specific features.

shakna · on Jan 29, 2019

I am not upset that there was divergence from POSIX.

Design choices are fine - I can understand why systemd takes a different approach.

What I don't like, and completely disagree with, is systemd not working with the community they directly affect to reduce disruption.

Like it or not, the product is an industry standard, and so will be held to industry expectations.

Rather than turning around and requiring everyone to change, they could have said, "Sorry, we're making changes, here are some preliminary patches that could help."

Or a timeline for a breaking change, wherein they can negotiate with others.

I don't have significant issues with systemd's software, though some reservations about quality. My main concern, and it has been since the beginning, is that systemd acts without thought or conscience to the effects that they might cause.

They lack the ability to be a team player, despite creating an environment where people depend on them.

systemd's adoption rate is an absolute credit to it. They have some very good design thoughts, and those working on it have done some excellent work.

However, it would be better if they communicated with the people they affect, rather than letting the community be an accidental QA team when things go wrong.

They do get this right sometimes, but that seems to be the exception, rather than the rule.

They approached the init.d situation calmly, and slowly. They worked with Debian, and Fedora and others to make sure it would work without interruption or loss of quality.

They approached the sigkill situation like they were a kid who just learned how to light a fire and wanted to burn the library down.

poettering · on Jan 29, 2019

You make plenty of assumptions there, in particular that there was no communication about the session killing thing. Turns out, however, there was. We informed downstreams about our intention and the reasons in detail, and we documented this for everybody else in NEWS. We also made sure there was an easy compile-time option to pick the default for this option, and then left the rest for the downstreams to decide: whether to default to on or off for this, taking in the information they got from us and from the rest of the community. If you think they made the wrong decision, then complain to them really. But seriously, you really just assume we wouldn't talk to anyone, without actually having any idea what communication is really taking place.

rauhl · on Jan 29, 2019

> We informed downstreams about our intention and the reasons in detail, and we documented this for everybody else in NEWS.

From The Hitchhiker’s Guide to the Galaxy, regarding the plans to destroy the Earth:

‘But the plans were on display …’

‘On display? I eventually had to go down to the cellar to find them.’

‘That’s the display department.’

‘With a flashlight.’

‘Ah, well, the lights had probably gone.’

‘So had the stairs.’

‘But look, you found the notice, didn’t you?’

‘Yes,’ said Arthur, ‘yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying “Beware of the Leopard.”’

Back in the real world: you built & shipped a system whose defaults were and are broken, and now you blame others for not enabling the DONT_BE_WRONG setting. You might as well blame end users for not becoming fully-versed with your code before their first login.

It’s not the users’ fault. It’s not the distros’ fault. It’s yours, and your project’s, for shipping code which breaks the user experience.

I appreciate your vision. It’s a good one. You’re a smart guy. But have some humility! Have a sense of your own limitations, and those of the distros and users who will use your code. You’re a human being; the distros are made up of human beings; your end users are … human beings. Think of them.

Redoubts · on Jan 29, 2019

This is kind of a ridiculous reply. Is the only solution then to admit that Linux is "done"? Because it sounds like there's no room for change, even when change is communicated and multiple options to avoid it are provided.

brmgb · on Jan 29, 2019

> What I don't like, and completely disagree with, is systemd not working with the community they directly affect to reduce disruption.

> Rather than turning around and requiring everyone to change, they could have said, "Sorry, we're making changes, here are some preliminary patches that could help."

> Or a timeline for a breaking change, wherein they can negotiate with others.

But they did exactly that.

They contacted the tmux maintainers and asked if some modifications would be possible to accommodate the new option (see poettering's comment here: run things as child of systemd --user or just register a separate PAM session). If I remember correctly, it would not even have been the first special case in tmux; there already is one for OSX.

The discussion was actually progressing nicely until the anti-systemd crowd flooded it. I remember seeing posts in a lot of places urging people to comment on the bug report with specious arguments. The whole thing was kind of upsetting.

irishsultan · on Jan 29, 2019

They did that 6 days after releasing the version that broke tmux, that's hardly preparing for or negotiating.

youdontknowtho · on Jan 29, 2019

POSIX isn't a law. You don't "violate" POSIX. It's a standard for compatibility. You can choose to not be compatible with a standard when you think it makes sense. That's something that lots of projects do. You are using standards compliance as a moral cudgel.

Your argument is way too impassioned to be just technical. You just basically accused Lennart of hurting people with no evidence whatsoever.

This sort of stuff really doesn't help.

cassianoleal · on Jan 29, 2019

When there is a standard and someone doesn't follow it, it is said that the standard has been violated.

It follows that when someone implements functionality that doesn't follow POSIX, POSIX has been violated.

There's nothing wrong with the statement.

youdontknowtho · on Jan 30, 2019

He accused Lennart of hurting people with no proof. Is that reasonable?

cassianoleal · on Jan 30, 2019

Please point out where in my comment I make any reference to reasonability.

youdontknowtho · on Jan 31, 2019

Apologies for that part, then. I just don't see standards compliance like other people do. Personally, I don't see standards as things that imply some kind of morality. They are tools to accomplish a goal. Sometimes other goals may supersede their usefulness.

cassianoleal · on Feb 8, 2019

That is fair enough. I have not argued against your point of view. My comment was more on the linguistic side of things.

You criticised the parent's language saying that "you don't violate a standard" because it "isn't a law". I was just pointing out that you do indeed violate a standard because it's a standard, and saying that does not add any kind of moral or passion value - it's just using the language the way it's intended.

jodrellblank · on Jan 29, 2019

Aren't we just a few weeks after Rich Hickey's "you have no right to make demands of open source software" rant?

> Systemd has responsibility that no other userland system has. It's PID 1.

No, you have the responsibility to check what the software you are installing does, and if you don't approve, change it or reject it. Or, don't check, and deal with it.

Systemd developers do not owe you working POSIX, working cron, industry wide working Linux automation, screen, separate userland for everything. They don't owe you anything. If you don't like their thing, don't use their thing.

buster · on Jan 29, 2019

Although I very much like the "don't break userland" approach, I agree with you. Especially given that 1. you can start your background process the systemd way (shown elsewhere in this thread), 2. you can configure the desired behavior, and 3. your distro has probably already configured it for you (Debian).

So it comes down to "something changed which is absolutely extremely important for me but I would rather discuss it for hours than take the few seconds to configure it". Especially since the new behavior is intended behavior and also has upsides for a lot of use cases.
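As a rough illustration of "the systemd way" of starting a background process: the documented route, as far as I know, is to launch it with `systemd-run --scope --user` so it lives in its own scope under the user's service manager rather than in the login session's scope. A small Python sketch, with hypothetical helper names:

```python
import subprocess

def in_user_scope(argv):
    """Prefix argv so it runs in its own scope under the user's systemd instance."""
    return ["systemd-run", "--scope", "--user", *argv]

def run_in_user_scope(argv):
    """Launch argv in a user scope; needs a running systemd user instance."""
    return subprocess.run(in_user_scope(argv), check=True)

print(in_user_scope(["tmux", "new-session", "-d"]))
# prints ['systemd-run', '--scope', '--user', 'tmux', 'new-session', '-d']
```

Whether the process then survives logout may also depend on lingering being enabled for the user (`loginctl enable-linger`), so treat this as a starting point rather than a complete recipe.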

So don't be ungrateful. Be happy that some people are really putting a lot of work behind the software you use daily FOR FREE and just configure the darn thing the way you like.

And last but not least, most people here (me included) are not in the position to complain so much about free software, unless they show some commitment to open source themselves.

Karunamon · on Jan 29, 2019

> If you don’t like their thing, don’t use their thing

Oh how I wish that was a course of action I could reasonably take in this instance...

cyphar · on Jan 29, 2019

> Annoying to have to learn a new thing, but hardly the unbearable burden.

The problem is now your scripts won't work on systems that don't use systemd. Shell scripts work on FreeBSD, but now you can't use them because they require systemd-specific code.

I am not necessarily anti-systemd in most respects (I like declarative definitions of services and less shell script hell), but the fact that they keep trying to get people (including container runtime developers like myself) to use _their_ API rather than the preexisting ones is fairly "anti-social".

poettering · on Jan 29, 2019

Aleksa,

I am not trying to get you to use our APIs. You are talking about the cgroups APIs again, if I am not mistaken? As I tried to explain again and again: if you want container runtimes to manage their own cgroups then just set Delegate=yes in the unit file of your manager, get your own cgroup subtree, and you can do below it whatever you want, you do not have to call into systemd ever. Not a single API call, no C call, no D-Bus call, nothing. You get your own kingdom if you set Delegate=yes, and systemd won't interfere with that. This is extensively documented.

I wished you'd actually listen to what I keep repeating to you. We tried to be really nice to container managers, knowing that they dislike systemd APIs, so we put a lot of work in making the delegation boundary clean, so that they can be entirely systemd agnostic beyond setting the Delegate=yes boolean in their unit file, but alas, we just keep hearing the same nonsense.

The LXC/LXD people btw did get this right: they manage their own cgroup subtree now, and systemd doesn't interfere, and they don't link to or do dbus calls into systemd either.
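To make "get your own cgroup subtree" a little more concrete: a manager whose unit sets Delegate=yes could locate its own cgroup by reading `/proc/self/cgroup` and then create child groups below it with ordinary filesystem calls, without talking to systemd. The Python sketch below only covers the lookup step on the unified (cgroup v2) hierarchy; the function name and the unit name in the sample are made up for illustration.

```python
def unified_cgroup_path(proc_cgroup_text):
    """Return the cgroup v2 path from /proc/<pid>/cgroup text, or None."""
    for line in proc_cgroup_text.splitlines():
        # Each line has the form hierarchy-ID:controller-list:cgroup-path
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hierarchy_id, controllers, path = parts
        # The unified hierarchy is the entry with ID 0 and no controller list.
        if hierarchy_id == "0" and controllers == "":
            return path
    return None

# Hypothetical unit name, for illustration only.
sample = "0::/system.slice/example-manager.service\n"
print(unified_cgroup_path(sample))  # prints /system.slice/example-manager.service
```

On a typical system the returned path is relative to the cgroup mount point (conventionally `/sys/fs/cgroup`), and the delegated subtree is whatever the manager creates beneath it.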

cyphar · on Jan 29, 2019

> then just set Delegate=yes in the unit file of your manager

In runc we don't have a dedicated manager or long-running daemon. Yes, Docker and cri-o use Delegate=yes (so I am quite aware of this option) but that really doesn't help people who are using runc in their own user sessions or wrote their own wrapper and aren't aware of Delegate=yes.

I get that we are quite odd, and don't fit into a system-service model. After all of the back-and-forth with both you and Tejun (especially when it comes to "rootless" delegation -- which systemd only offers if you get a privileged user to delegate for you), I'm not sure that there's much I can do on this topic. I get that what I care about is not something you care about, but I would hope you accept that I'm not just being obstinate for the sake of it.

> Not a single API call, no C call, no D-Bus call, nothing.

Right, unless you need to set this up for someone else. And we have code that does this too -- I don't really recommend people use it, but it is necessary (and I'm pretty sure some folks at Red Hat use it based on how many bug reports they submit related to it).

Since systemd is managing the entire cgroupv2 tree (and the fact we can get around that for cgroupv1 appears to be seen as a design flaw by both you and Tejun), obviously we have to talk to systemd to do this type of thing. I just wish this wasn't the way it was done (and if cgroupv2 had a named cgroup concept -- which is what systemd needs for tracking services -- I would think that this wouldn't be such a pain-point).

I guess I'm just annoyed that we can't use "better rlimits" with "rootless" container runtimes because of all of this.

> I wished you'd actually listen to what I keep repeating to you.

I am listening, and I am aware of Delegate=yes and all of that history. But as I outlined above, I don't necessarily agree with it entirely. And unlike a lot of people around here, I don't think any of these pain-points are coming up because of malice or something stupid like that -- I just think we disagree on our priorities.

> We tried to be really nice to container managers, knowing that they dislike systemd APIs, so we put a lot of work in making the delegation boundary clean

Don't get me wrong -- I do appreciate that we have Delegate now (there was a period of several years where "systemd decided to reorganise the cgroup tree, un-containing my containers" happened on several occasions -- and Delegate solved those issues).

And from what I've heard from the LXC folks, you were quite reasonable about getting systemd to work inside LXC. Which is good to hear.

> The LXC/LXD people btw did get this right: they manage their own cgroup subtree now, and systemd doesn't interfere, and they don't link to or do dbus calls into systemd either.

We do basically the same thing. We just don't support cgroupv2.

stiff · on Jan 29, 2019

They changed a decades-old behavior many people rely on, and it must have been obvious from the start that people would lose work because of it.

nerdponx · on Jan 29, 2019

It's a bug because it violates the expectations of an uninformed user. You aren't given a warning about it, it's not documented in big bold letters anywhere, and it's also not POSIX compliant.

zorpner · on Jan 29, 2019

> Annoying to have to learn a new thing, but hardly the unbearable burden.

Rather, a breaking change to everyone's scripts and processes for zero benefit.

Tor3 · on Jan 29, 2019

Our scripts and tools work similarly on the four Unix systems we have in-house. Are you saying that it's OK that they don't work on Linux? Please do not forget that Linux is a POSIX system, basically a re-implementation of Unix, and until systemd it's been a fully compliant -nix system. Where I work we have transparently been able to deploy our products on all -nix, including Linux, since the nineties.

EDIT: My reply was supposed to be to xyzzyz's post below, not the one I apparently replied to.. sorry about that.

xyzzyz · on Jan 29, 2019

There's a benefit, you're just not seeing it. Again, do you think that the systemd developers decided to implement it just to screw with people? As I said, there's a specific trade-off involved here.

I agree that it might not be the most desirable default, but if that's the case, then the guilt also falls on the distribution maintainers, who either ignored the big bold letters in the changelog, or didn't bother to test everyone's standard workflows before pushing to stable.

inferiorhuman · on Jan 29, 2019

> Again, do you think that the systemd developers decided to implement it just to screw with people?

Based on Lennart's behavior, yes I do.

michaelmrose · on Jan 29, 2019

Instead of pretending the benefit is so obvious it doesn't require you to discuss it perhaps you could explain it.

dvfjsdhgfv · on Jan 29, 2019

Not the parent nor Systemd developers, but apparently they think it's the only way to make sure the user's session is cleaned up.

But frankly, 100% of people would be fine with it if the default was left at no instead of changing it to yes. It's all about giving users a choice when a new feature is introduced, something Systemd developers understand only partially.

zorpner · on Jan 29, 2019

> There's a benefit, you're just not seeing it.

Not to appeal to self-authority, but I have been maintaining production Linux systems in large-scale environments since the late 90s. If there were a benefit that outweighed the unnecessary breaking changes, I would see it, even if I didn't appreciate it. There isn't.

You should stop and think before you assume that other people are incompetent, both because it would make you a better interlocutor, and as a bonus it wouldn't violate HN's principle of charity.

xyzzyz · on Jan 29, 2019

The benefit is, of course, clean up of orphan defunct processes. One might argue if this is outweighing the drawback of the change (it might not, but that’s what some distro maintainers chose to enable), but you shouldn’t suggest that they just broke you for no purpose, instead, you should stop and think before you assume that other people are incompetent, both because it would make you a better interlocutor, and as a bonus it wouldn't violate HN's principle of charity.

zorpner · on Jan 30, 2019

Your copy/paste doesn't apply to my comment, since I didn't assume you were incompetent, just that you'd made an overaggressive claim you didn't care to back up.

Of course, a defense of systemd's comically broken reaping behavior removes all necessity for assumption in this case. sysvinit at least consistently reaps on SIGCHLD -- systemd randomly reorders into the sd-event API and then does something random based on the order of receipt.

xyzzyz · on Jan 30, 2019

> Your copy/paste doesn't apply to my comment, since I didn't assume you were incompetent, just that you'd made an overaggressive claim you didn't care to back up.

Sorry, I assumed you're competent enough to figure it out, or at least look at the original sources where authors of the change explicitly explain the reason why they do it. Of course, since you assumed that they are incompetent, you didn't bother to do so, instead, completely uncharitably assumed that there's zero benefit for that.

andrewshadura · on Jan 29, 2019

I'm sorry to bring bad news, but there's indeed a benefit, you just don't see it.

zorpner · on Jan 29, 2019

Surely it can be articulated, then.

andrewshadura · on Jan 30, 2019

It was, many times, you can just google and educate yourself.