Personally I’m not a fan of Go’s default zero-initialisation. I’ve seen many bugs caused by adding a new field and forgetting to update constructors to initialise it to a “non-zero” value. I prefer Rust’s approach, where one has to be explicit.
That being said, it’s way less complex than C++’s rules, and that’s welcome.
I spent a year and a half writing Go code, and I found that it promised simplicity, but there’s an endless number of these kinds of issues where it boils down to "well, don't make that mistake".
It turns out that a lot of the complexity of modern programming languages comes from the language designers trying to make mistakes harder.
If you want to simplify by synthesising decades of accumulated knowledge into a coherent language, or to remove deprecated ideas (instead of the evolved spaghetti you get from decades of updating a language), then fine. If your approach to simplicity is to just not include the complexity, you will soon discover that the complexity was there for a reason.
The problem you are describing in Go is rarely a problem in C++. In my experience, a mature code base rarely has things with default constructors, so adding a new field will cause the compiler to complain that there's no default constructor for what you added, therefore avoiding this bug. Primitive types like `int` usually have a wrapper around them to clarify what kind of integer they hold, and the same goes for standard library containers like vector.
However I can't help but think that maybe I'm just so fortunate to be able to work in a nice code base optimized for developer productivity like this. C++ is really a nice language for experts.
Compare `int albumId, songId;` versus `AlbumId albumId; SongId songId;`. The former two variables can be assigned to each other, causing potential bugs and confusion. The latter two cannot. Once you have a basic wrapper for integers, further wrappers are just a one-liner, so why not? In practice, making the type more meaningful also leads you to shorter variable names, because the information is already expressed in the types.
Wouldn’t it just be considered bad practice to add a field and not initialize it? That feels strongly like something a code review is intended to catch.
It’s easy to miss this in large codebases. Having to check every single struct initialisation whenever a field is added is not practical. Some folks have mentioned that linters exist to catch implicit initialisation, but I would argue this shouldn’t require a third-party project that is completely opt-in to install and run.
I just finished this. To each their own of course, but I found the writing too padded and tonally off-putting at times. Some of the stories felt dated, both technologically and culturally. I prefer Azure's Cloud Pattern docs myself (though "Release It!" was really good if you prefer a storytelling approach).
This looks incredibly comprehensive, thanks for sharing!
Should have added that I read this book in 2016, and the first edition is even older, so there’s naturally been lots of new (and exciting) developments in this area!
Overall, I am for frame pointers, but after some years working in this space, I thought I would share some thoughts:
* Many frame pointer unwinders don't account for a problem they have that DWARF unwind info doesn't have: the fact that the frame set-up is not atomic. It's done in two instructions, `push %rbp` and `mov %rsp, %rbp`, and if a snapshot is taken during the `push`, we'll miss the parent frame. I think this might be fixable by inspecting the code, but that would only be a heuristic, as there could be other `push %rbp` instructions unrelated to the stack frame. I would love to hear if there's a better approach!
* I developed the solution Brendan mentions, which allows faster, in-kernel unwinding without frame pointers using BPF [0]. This doesn't use DWARF CFI (the unwind info) as-is but converts it into a random-access format that we can use in BPF. He mentions it not supporting JVM languages, and while it's true that right now it only supports JIT sections that have frame pointers, I planned to implement a full JVM interpreter unwinder. I have since left Polar Signals and shifted priorities, but it's feasible to get a JVM unwinder to work in lockstep with the native unwinder.
* In an ideal world, enabling frame pointers should be decided on a case-by-case basis. Benchmarking is key, and the tradeoffs you make might change a lot depending on the industry you are in and what your software is doing. In the past I have seen large projects enable or disable frame pointers without doing an in-depth assessment of the losses/gains in performance and observability, and how they connect to business metrics. The Fedora folks have done a superb and rigorous job here.
* Related to the previous point, having a build system that lets you change this system-wide, including the libraries your software depends on, can be awesome, not only to test these changes but also to put them in production.
* Lastly, I am quite excited about SFrame that Indu is working on. It's going to solve a lot of the problems we are facing right now while letting users decide whether they use frame pointers. I can't wait for it, but I am afraid it might take several years until all the infrastructure is in place and everybody upgrades to it.
On the third point, you have to do frame pointers across the whole Linux distro in order to be able to get good flamegraphs. You have to do whole-system analysis to really understand what's going on. The way that current binary Linux distros (like Fedora and Debian) work makes any alternative impossible.
It could be one instruction: ENTER N,0 (where N is the amount of stack space to reserve for locals)---this is the same as:
PUSH EBP
MOV EBP,ESP
SUB ESP,N
(I don't recall if ENTER is x86-64 or not). But even with this, the frame setup isn't atomic with respect to CALL, and if the snapshot is taken after the CALL but before the ENTER, we still don't get the frame setup.
As for the reason why ENTER isn't used, it was deemed too slow. LEAVE (MOV ESP,EBP; POP EBP) is used as it's just as fast as, if not faster than, the sequence it replaces. If ENTER were just the PUSH/MOV/SUB sequence, it probably would be used, but it's that other operand (which is 0 above in my example) that kills it performance-wise (it's for nested functions to gain access to outer stack frames and is very expensive to use).
Great comments, thanks for sharing. The non-atomic frame setup is indeed problematic for CPU profilers, but it's not an issue for allocation profiling, off-CPU profiling, or other types of non-interrupt-driven profiling. But as you mentioned, there might be ways to solve that problem.
There's always room for improvement, for example, Samply [0] is a wonderful profiler that uses the same APIs that `perf` uses, but unwinds the stacks as they come rather than dumping them all to disk and then having to process them in bulk.
Samply unwinds significantly faster than `perf` because it caches unwind information.
That being said, this approach still has some limitations, such as that very deep stacks won't be unwound, as the size of the process stack the kernel sends is quite limited.
Inlined functions can be symbolized using DWARF line information [0], while unwinding requires DWARF unwind information (CFI), which the x86_64 ABI mandates in every single ELF in the `.eh_frame` section.
So not a perf issue there, but they don't think the workflow is suitable for whole-system profiling. Perf issues were in the context of `perf` using DWARF:
Once it’s loaded in memory, if Kernel Samepage Merging is enabled it might not be as bad, but I would love to hear if somebody has any thoughts.
https://docs.kernel.org/admin-guide/mm/ksm.html
> KSM only merges anonymous (private) pages, never pagecache (file) pages.
So it wouldn't be able to help with static libraries loaded from different executables. (At any rate, they'd have to be at the same alignment within the page, which is unlikely without some special linker configuration.)
It uses eBPF to provide instrumentation of kernel calls, as well as hooking into networking for HTTP/2, pgsql, etc. Since the instrumentation runs in eBPF it’s essentially sandboxed, and all memory, kernel function calls, and even profiling are an option. They have an agent that collects this information and sends it to the server over RPC (protobuf/gRPC). You should check it out (however, some of the docs are in Chinese).
> High-performance storage engines. There are a number of storage engines and key-value stores optimized for flash. RocksDB [36] is based on an LSM-Tree that is optimized for low write amplification (at the cost of higher read amplification). RocksDB was designed for flash storage, but at the time of SATA SSDs, and therefore cannot saturate large NVMe arrays.
From this slightly tangent mention, I am guessing not.
Curious what the overhead of dealing with files in Go would be if finalisers were cheaper / not in use.
Sorry, there’s no link to the source (AFK right now), but files opened with os.Open will be automatically closed once all references to them have been collected.
Found out about this behaviour some months ago while debugging some code at work.
But you'd be forgiven for not knowing that: To my own surprise, I could not find the need for this idiom explained in the package documentation for os.Open, though you can see it in action throughout the std implementation. For example: https://cs.opensource.google/go/go/+/refs/tags/go1.20.4:src/...
FWIW, I never advocated for not explicitly calling Close(). I brought up the finalizer in File because it seems to have performance implications and it’s called for every file handle.
> Fd returns the integer Unix file descriptor referencing the open file. If f is closed, the file descriptor becomes invalid. If f is garbage collected, a finalizer may close the file descriptor, making it invalid; see runtime.SetFinalizer for more information on when a finalizer might be run.
To give more context on how I found this out: we were missing a Close for a couple of files in our codebase. As soon as I realised, I added them, but I checked production for file descriptor leaks and there were none. Checking Go’s source code led me to the finalizer, and this doc confirmed what I saw in the code.
Strong disagree. If a long-running program opens a lot of files, has no references to them and they are not closed, those files will stay open, using system resources. There's no way for the Go code to close them. Yes, it's a bug to open a file and lose all references before closing it, but the runtime is doing the right thing by closing them for the program.
You could go the other way: the program is incorrect if it doesn't explicitly close files. Allowing incorrect programs to somehow work correctly only encourages programmers to create more advanced incorrect programs, until they exceed the ability of the language/runtime to fix.
I tend more towards this line of thought. But I'd still put a finalizer on file handles. I just would have it yell very loudly that there's a bug. Maybe even close the file, but there's no way I wouldn't at the very least be generating a lot of diagnostics saying things were broken.
I agree with your take on correctness. The problem that you run into is that GC and thus finalization is triggered by memory pressure. But closing a file doesn't relieve memory pressure, it relieves FD pressure. So what you'd want is some sort of native thing not related to GC that closes unused files.
This isn't the only resource like this. If you write to a lot of temporary files, you'd want to start cleaning those up before the OS returns "no space left on device". If you listen on a lot of TCP ports, you want to clean up stale listeners before you've listened 65535 times. If you spawn a subprocess, you want to start reaping the children when PID pressure prevents creating future subprocesses.
This all ends up being rather esoteric so many programming languages just say "tell me as soon as you're done" and you pay the cost right then. This isn't optimal (especially "x := Open(thing); defer x.Close(); do something with x; do a bunch of slow stuff without x"). A big downside is that it's possible to have an object in memory that refers to something in an invalid state; you can write "x := Open(thing); x.Close(); x.Read()" just fine in many languages; the code compiles and it runs, but it returns incorrect results. This could be an area of focus for future languages, but I doubt much of this would be general. A lot of nitty gritty specifics depending on what resources you want to track.
It would also be weird to not close files just because someone turned GC off, right? If you tie finalizers to memory management, if the memory management isn't under memory pressure, then you start leaking FDs. Weird! All in all, the memory management subsystem is a weird place to manage non-memory resources. So to get this right, you really need something GC-like for everything that's not memory.
I think people are looking for something like, or are already very comfortable with, C++'s RAII. "I am certain this variable is allocated on the stack, so the compiler should fail to compile if that's untrue, and a Close method should be called when the function returns." But sometimes you will want the resource to be allocated beyond the current stack frame, and now you need special syntax for that. (Newer languages like to avoid exposing stack vs. heap to the programmer and let The Algorithm pick the best place. C++ never considered that an option so you do get some extra flexibility when you need it.)
Anyway this is a long post to say "all options are bad".
Sure. But if you think that a file will be closed by the runtime "when there are no more references to it", then you're prone to assume that the file must be closed when you later reopen it or pass that filename to some other process. And most of the time the file will indeed be closed, until the GC just happens to run the finalizer a little bit later than usual.
It's very hard to debug such bugs.
IMO it's much easier to make the leak more prominent, so that people are incentivised to properly close the files.
It's possible to use Go's pprof to find leaks. Also, if it's important enough then runtime.SetFinalizer(fd, any) will allow the programmer to do something appropriate.
Granted, but that only works for a subset of all cases. Remember that FDs can also refer to sockets and other things that aren't "files". A careful programmer will ensure that all FDs get cleaned up, but even careful programmers can fall victim to corner cases.
I don't see why 'defer' wouldn't work for FDs or any other resource. There's nothing about 'defer' that says it will work only with 'files'. In fact I've been using it for general-purpose resource management for a while now.