Hacker News | david_xia's comments

Hey, didn't notice your comment when I posted mine. But I'm seeing similar behavior on GKE 1.15.12-gke.3: CPU throttling even when CPU usage < CPU limits. https://news.ycombinator.com/item?id=24351566


I work on a team that operates multitenant GKE clusters for other engineers at our company. Earlier this year I read this blog post [1] about a CFS bug in the Linux kernel that unnecessarily throttles workloads. Kernel versions 4.19 and higher have been patched. I asked GCP support which GKE versions included this patch. They told me 1.15.9-gke.9. But my team is still getting reports of CPU throttling causing increased latencies on GKE workloads in these clusters.

This means one of the following:

1. We're using a kernel that doesn't contain the patch.

2. The patch wasn't sufficient to prevent unnecessary CPU throttling.

3. The latency is caused by something other than CPU throttling.

To rule out 1, I again checked that our GKE clusters (which are using nodes with Container Optimized OS [COS] VM images) are on a version that contains the CFS patch.

```

dxia@one-of-our-gke-nodes ~ $ uname -a
Linux one-of-our-gke-nodes 4.19.112+ #1 SMP Sat Apr 4 06:26:23 PDT 2020 x86_64 Intel(R) Xeon(R) CPU @ 2.30GHz GenuineIntel GNU/Linux

```
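A minimal sketch of scripting that check across nodes, assuming the "4.19 and higher" cutoff from the blog post is the right threshold. The function names are made up for illustration. A release number alone doesn't prove a given patch is present or absent, since distro kernels can carry backports, so treat this as a first filter only.

```python
import re
from typing import Tuple


def parse_kernel_release(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a release string such as '4.19.112+'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_at_least(release: str, minimum: Tuple[int, int] = (4, 19)) -> bool:
    """True if the release's (major, minor) is >= minimum.

    The (4, 19) default is an assumption taken from the blog post's claim.
    """
    return parse_kernel_release(release)[:2] >= minimum
```

On a node, `platform.release()` from the standard library returns the same string as `uname -r`, so `is_at_least(platform.release())` would be the whole check.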

Kernel version is 4.19.112+ which is a good sign. I also checked the COS VM image version.

gke-11512-gke3-cos-77-12371-227-0-v200605-pre

The cumulative diff in the [COS release notes][2] for cos-stable-77-12371-227-0 shows this lineage (see "Changelog (vs ..." in each entry).

cos-stable-77-12371-227-0
77-12371-208-0
77-12371-183-0
77-12371-175-0
77-12371-141-0 <- This one's notes say "Fixed CFS quota throttling issue."

Now looking into 2:

See this dashboard [5]. The top graph shows an example Container's CPU limit, request, and usage. The bottom graph shows the number of seconds the Container was CPU throttled, measured by sampling the local kubelet's Prometheus metric `container_cpu_cfs_throttled_seconds_total` over time. CPU usage data comes from the Containers' resource usage metrics in the [Kubernetes Metrics API][6], which returns metrics from the [metrics-server][7].

The first graph shows usage is not close to the limit. So there shouldn't be any CPU throttling happening.
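For anyone who wants to reproduce the bottom graph from raw scrapes, here is a rough sketch of turning counter samples into throttled seconds per second. It assumes samples are (timestamp, counter value) pairs you've already collected. The reset handling is a simplification of what Prometheus' `rate()` does, not a reimplementation of it, and the function name is made up.

```python
from typing import List, Tuple

# (unix timestamp in seconds, cumulative throttled seconds)
Sample = Tuple[float, float]


def throttled_rate(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Return (timestamp, throttled seconds per second) per consecutive pair.

    If the counter goes down (e.g. after a container restart), it is
    treated as having restarted from zero.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip duplicate or out-of-order scrapes
        delta = v1 - v0 if v1 >= v0 else v1
        rates.append((t1, delta / (t1 - t0)))
    return rates
```

A value near zero means almost no throttling in that interval. Because many threads can be throttled within the same wall-clock second, values above 1.0 are possible.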

The first drop in the top graph is where we decreased the CPU limit from 24 to match the CPU request of 16. That decrease actually caused CPU throttling to increase. We removed CPU limits from the Container on 8/31 12:00, which decreased the number of seconds of CPU throttling to zero. This makes me think the kernel patch wasn't sufficient to prevent unnecessary CPU throttling.
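For context on how limits are accounted, here is a small sketch of the arithmetic, assuming the default 100 ms CFS period (`cpu.cfs_period_us = 100000`). The quota is enforced per period, not on the averaged usage a dashboard shows. The burst numbers below are hypothetical and only illustrate the accounting. They don't explain the graphs above.

```python
DEFAULT_CFS_PERIOD_US = 100_000  # 100 ms, the default cpu.cfs_period_us


def cfs_quota_us(cpu_limit: float, period_us: int = DEFAULT_CFS_PERIOD_US) -> int:
    """CPU time in microseconds a cgroup may consume per period for a limit in cores."""
    return int(round(cpu_limit * period_us))


def exceeds_quota(cpu_time_us: float, cpu_limit: float,
                  period_us: int = DEFAULT_CFS_PERIOD_US) -> bool:
    """True if the CPU time consumed within one period is over the quota."""
    return cpu_time_us > cfs_quota_us(cpu_limit, period_us)


# Hypothetical burst: 32 threads each running 60 ms inside one 100 ms period.
burst_us = 32 * 60_000  # 1,920,000 us of CPU time in that period
print(cfs_quota_us(24), exceeds_quota(burst_us, 24))
print(cfs_quota_us(16), exceeds_quota(burst_us, 16))
```

With these made-up numbers the burst fits under a limit of 24 but not under 16, even though usage averaged over a longer window could sit well below either limit.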

This K8s GitHub issue ["CFS quotas can lead to unnecessary throttling #67577"][8] is still open. The linked [kernel bug][9] has a comment saying it should be marked fixed. I'm not sure whether there are still CPU throttling issues with CFS that aren't tracked in issue #67577, though.

Because of the strong correlation in the graphs between removing CPU limits and CPU throttling dropping to zero, I'm assuming the kernel patch named "Fixed CFS quota throttling issue." in COS 77-12371-141-0 wasn't enough.
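One way to cross-check the kubelet metric is to read the container cgroup's `cpu.stat` directly on the node. On cgroup v1 that file has `nr_periods`, `nr_throttled`, and `throttled_time` (nanoseconds). Below is a sketch that parses two snapshots and reports the fraction of periods that were throttled. The cgroup path differs per container, so the code takes the file contents as input. The sample values are invented.

```python
from typing import Dict


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse 'key value' lines from a cgroup v1 cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_fraction(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Fraction of CFS periods between two snapshots in which the cgroup was throttled."""
    periods = after["nr_periods"] - before["nr_periods"]
    throttled = after["nr_throttled"] - before["nr_throttled"]
    return throttled / periods if periods > 0 else 0.0


# Invented snapshots, roughly one minute apart (600 periods of 100 ms).
snapshot_a = parse_cpu_stat("nr_periods 1000\nnr_throttled 50\nthrottled_time 4000000000\n")
snapshot_b = parse_cpu_stat("nr_periods 1600\nnr_throttled 200\nthrottled_time 9000000000\n")
print(throttled_fraction(snapshot_a, snapshot_b))
```

If this fraction stays high while usage sits well below the limit, that points at the same behavior the dashboard shows, independent of the metrics pipeline.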

Questions

1. Anyone else using GKE run into this issue?

2. Does anyone have a link to the exact kernel patch that the COS entry "Fixed CFS quota throttling issue." contains? A Linux mailing list thread or the patch itself would be great so I can see if it's the same patch that various blog posts reference.

3. Anyone aware of any CPU throttling issues in the current COS version and kernel we're using? 77-12371-227-0 and 4.19.112+, respectively.

[1]: https://medium.com/omio-engineering/cpu-limits-and-aggressiv...

[2]: https://cloud.google.com/container-optimized-os/docs/release...

[5]: https://share.getcloudapp.com/o0u8KoEn

[6]: https://kubernetes.io/docs/tasks/debug-application-cluster/r...

[7]: https://github.com/kubernetes/kubernetes/tree/master/cluster...

[8]: https://github.com/kubernetes/kubernetes/issues/67577

[9]: https://bugzilla.kernel.org/show_bug.cgi?id=198197

[COS]: https://cloud.google.com/container-optimized-os/docs


Hey David, we talked on a podcast once :) Please raise a support case and send me the ticket number; I'll see if we can get to the bottom of this for you.


Thanks. It seems like these are specifically for web pages where you need to add a script tag to a page you control.

I was thinking very generally of just having the device specify an HTTP(S) proxy and having that remote server be able to inspect any HTTP(S) traffic, whether it's from a browser or native apps.



I think you are mistaking the dramatic flourishes of the essay for its true message. It's a well-written piece that has a twist ending (not entirely unexpected).

"Maybe we go out in order to fall short . . . because we want to learn how to be good at being people . . . and moreover, because we want to be people."

She's saying that addiction to people is not the same as addiction to cigarettes. This is a good kind of addiction because human beings are by nature social creatures and should interact with one another.


I agree with your assessment of the essay's conclusion. All I'm saying is that the essay's whole thought process began with the basic problem that social situations often backfire for the author in very negative ways. If these negative experiences didn't cut so deep, it might not have been necessary for her to spend 2500 words convincing herself that some socializing is better than completely cutting yourself off from the world.

I could be totally wrong about all this, and I really regret if my original comment comes off sounding at all presumptuous. My point is just that if indeed the author suffers in the way that the essay describes (which sounds totally believable to me), I have a lot of compassion, but also hope that things can be better for her.


FWIW, I think your comments DO come off a bit presumptuous, and a bit condescending, despite you saying that you have a lot of compassion and are incredibly empathetic.

Using language like "This is not normal" and "crippling anxiety" is really making a lot of assumptions about the author. It kind of presupposes that you know what's normal. Of course it's possible that you actually do, maybe you're a psychologist or therapist, and as an expert can assert that you do know what's normal, but I would argue you can't infer enough from this piece to know for sure.

There was a review of Sheila Heti's new book in this week's NYer, and from that I gleaned that her work draws from life but also involves a lot of dramatization and fictionalization of her life, and isn't entirely non-fiction.


That's fair. In retrospect I wish I had written my comment in more of the form: "If X, then Y", instead of presuming X. If the essay is fictionalized and she didn't actually move to a new city to avoid social interaction, or find herself totally consumed for weeks by one mean comment, then surely none of what I am saying applies.

Re: "this is not normal," if I sound like I'm claiming some kind of authority, it is only the authority of personal experience. I could have written those passages that I quoted, and not too long ago either. Those were my "normal." Once I experienced what it was like to feel secure and free of those unmanageable emotional responses, I found it incredibly liberating.

The strong belief I took from this experience is: no one should have to live that way. No one should have to structure their life around avoiding things that they emotionally can't handle. I can't say with certainty that this is her situation, but I know it was mine.


I would also like to see a detailed explanation of how they prepared the box to safely allow people to play (i.e., logins, permissions, etc.), and how they restricted privileges yet allowed just enough to make it varied and fun.


The big thing is that we are connected into a chroot that has neither /proc nor /sys mounted, and which itself is on a read-only filesystem.


First play with the webpage after entering the correct credentials. Then read through the PHP script that generates that page and understand what's going on behind it. Do you see any vulnerabilities in it?


Anyone have the text? There's some database error right now.



I think these socially sponsored ads are going to be money-making and also insidious if done right. It's a nuanced, strange mix of social content and advertising. They're trying to make all ads social or all their open graph actions into ads. If you play this out to its logical conclusion, that's kinda scary.

Imagine a world where a large chunk of your online interactions are actually being used as content/ads targeted at your friends.

