Hacker News | fuzzybear3965's comments


Of course, I saw that, but if the text of the book is not freely available, then the examples wouldn't really be helpful, no?

So buy the book? The expectation of free stuff is all too common.

Regardless, a link to a repo of disjointed examples is not very interesting or helpful.

If you don't wanna pay, Library Genesis has the first edition (2004), but, if you didn't find the examples to be at least modestly interesting in themselves, is this even your bag? As a Linux sysadmin and occasional writer of lousy C programs, I often consult NetBSD's source tree for when I want good examples that aren't as complex as GNU's, so I expect to come back to these.

Judging by the publisher's sample,[1] the second edition (2025) looked like a worthwhile upgrade, so I ordered it. Much of the material is in the manpages, but this presents it with better explanations.

___

1. <https://ptgmedia.pearsoncmg.com/images/9780135325520/samplep...>



Or perhaps free stuff is all too uncommon…

Sure, but on principle, looking at the paper, I'd expect it to outperform B-trees since write amplification is reduced, generally. You thinking about cases requiring ordering of writes to a given record (lock contention)?


I think their claims of write amplification reduction are a bit overstated given more realistic workloads.

It is true that b-trees aren't ideal in that respect, and you will see some amount of write amplification, but not enough that it should be a major consideration, in my experience.

You really have to take into account working set size and cache size to make any judgements there; your b-tree writes should be given by journal/WAL reclaim, which will buffer up updates.

A purely random update workload will kill a conventional b-tree on write amplification - like I mentioned, that's the absolute worst case scenario for a b-tree. But it just doesn't happen in the real world.

For the data I can give you, that would be bcachefs's hybrid b-tree - large btree nodes (256k, typically) which are internally log structured; I would consider it a minor variation on a classical b-tree. The log structuring means that we can incrementally write only the dirty keys in a node, at the cost of some compaction overhead (drastically less than a conventional LSM).

In actual real world usage, when I've looked at the numbers (not recently, so this may have changed) we're always able to do giant highly efficient b-tree writes - the journal and in-memory cache are batching things up as much as we want - which means write amplification is negligible.
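A rough way to see why batching matters so much here is a toy model. This is only a sketch under stated assumptions, not bcachefs's actual accounting: the function name, the 256-byte key size and the "rewrite every dirty node in full once per journal flush" rule are all hypothetical simplifications of a classical b-tree (the log-structured nodes described above would write even less).

```python
import random

def write_amplification(updates, batch_size, keys_per_node=1024, key_bytes=256):
    """Bytes written to the tree divided by bytes of logical updates.

    Toy model (an assumption): updates wait in a journal until `batch_size`
    of them have accumulated, then each dirty node is rewritten in full once.
    """
    node_bytes = keys_per_node * key_bytes  # 256 KiB nodes with the defaults
    written = 0
    for start in range(0, len(updates), batch_size):
        batch = updates[start:start + batch_size]
        # Many updates that land in the same node cost one node write.
        dirty_nodes = {key // keys_per_node for key in batch}
        written += len(dirty_nodes) * node_bytes
    return written / (len(updates) * key_bytes)

rng = random.Random(0)
random_updates = [rng.randrange(1_000_000) for _ in range(100_000)]
hot_updates = [rng.randrange(10_000) for _ in range(100_000)]  # small working set

print(write_amplification(random_updates, batch_size=1))   # 1024.0, one node per update
print(write_amplification(random_updates, batch_size=10_000))
print(write_amplification(hot_updates, batch_size=10_000))
```

With no batching every update pays for a whole node. Once the journal buffers updates and the working set is small relative to the batch, most updates share a node write and the ratio drops toward 1.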


Also you can use dense B+-Trees for reads, possibly with some bloom filters or the like if you expect/profile a high fraction of negative lookups, use LSM to eventually compact, and get both SSD/ZNS friendly write patterns as well as full freedom to only compact a layer once its finer state is no longer relevant to any MVCC/multi-phase-commit schemes. Being able to, e.g., run a compression algorithm until you just exceed the storage page size, take its state from just before it exceeded, and begin the next bundle with the entry that made you exceed the page size is quite helpful when storage space or IO bandwidth is somewhat scarce.

If you're worried about the last layer being a giant unmanageably large B+-Tree, just shard it similarly in key space to not need much free temporary working space on SSD to stream the freshly compacted data to while the inputs to the compaction still serve real time queries.
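The "compress until you just exceed the page size, then back up one entry" idea can be sketched with Python's zlib, whose compression objects support `copy()` for snapshotting state. This is a minimal illustration, not how any particular storage engine does it; `pack_pages` and its parameters are hypothetical names.

```python
import zlib

def pack_pages(entries, page_size=4096):
    """Greedily pack entries into zlib streams of at most page_size bytes each."""
    pages = []
    comp, out, count = zlib.compressobj(), b"", 0
    for entry in entries:
        while True:
            trial = comp.copy()           # comp itself keeps the pre-entry state
            chunk = trial.compress(entry)
            tail = trial.copy().flush()   # cost of finishing the stream right now
            if len(out) + len(chunk) + len(tail) <= page_size:
                comp, out, count = trial, out + chunk, count + 1
                break
            if count == 0:
                raise ValueError("a single entry does not fit in one page")
            # Overflow: close the page from the state before this entry,
            # then retry the same entry as the first one of a fresh page.
            pages.append(out + comp.flush())
            comp, out, count = zlib.compressobj(), b"", 0
    if count:
        pages.append(out + comp.flush())
    return pages

entries = [f"key{i:06d}=value{i * 7919}\n".encode() for i in range(1000)]
pages = pack_pages(entries, page_size=512)
print(len(pages), max(len(p) for p in pages))
```

Snapshotting the compressor for every entry is expensive, so a real implementation would likely probe less often; the point is only that each page ends up as full as the compressor allows without spilling over.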


Of course mileage may vary with different workloads, but are there any good benchmarks/suites to use for comparison in cases like these? They used YCSB but I don't know if those workloads ([1]) are relevant to modern/typical access patterns nor if they're applicable to SQL databases.

You thinking about running some benchmarks in a bcachefs branch (:pray:)?

I want to see this data structure prototyped in PostgreSQL.

[1]: https://github.com/brianfrankcooper/YCSB/tree/master/workloa...


I've got microbenchmarks for the bcachefs btree here: https://evilpiepirate.org/git/bcachefs.git/tree/fs/bcachefs/...

They're ancient, I only have pure random and sequential benchmarks - no zipf distribution, which really should be included.

Feel free to play around with them if you want :) I could even find the driver code, if you want.
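For the missing zipf case, a key generator along these lines might be enough to drive a skewed-update benchmark. It is a sketch using only the standard library; `zipf_keys`, the skew value and the rank-to-key mapping (key 0 is the hottest) are assumptions for illustration, not part of the bcachefs benchmarks.

```python
import itertools
import random
from collections import Counter

def zipf_keys(num_keys, count, skew=1.0, seed=0):
    """Return `count` keys in [0, num_keys); key k has weight 1 / (k + 1) ** skew."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** skew) for rank in range(1, num_keys + 1)]
    cum_weights = list(itertools.accumulate(weights))
    return rng.choices(range(num_keys), cum_weights=cum_weights, k=count)

keys = zipf_keys(num_keys=1000, count=100_000)
counts = Counter(keys)
print(counts.most_common(3))  # a handful of hot keys dominate
```

A real benchmark would probably also scramble the key order so the hot keys are not all neighbours in the same node.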

I've always been curious about PostgreSQL's core b-tree implementation. I ran into a PostgreSQL developer at a conference once, and exchanged a few words that as I recall were enough to get me intrigued, but never learned anything about it.

In a system as big, complex and well optimized as either bcachefs or postgres, the core index implementation is no longer the main consideration - there are layers and layers, and the stuff that's fun to optimize and write papers about eventually gets buried (and you start thinking a lot more about how to lay out your data structures and less about optimizing the data structures themselves).

But you know in something like that there's going to be some clever tricks, that few people know about or even remember anymore :)


I think a better candidate to prototype would be SQLite, at least to get a better sense of how bf-tree would behave in the real world.


Weeks? Try years. As a parent of a newborn, I can confidently say that my wife hasn't had a full night's sleep for even one night since the kid was born in early July - she watches the kid at night (we live in a one bedroom apartment with the crib in our room - it affects me, too, but I try to tune it out and focus on sleep, as breadwinner). She also watches the kid all day while I work (9-11 hours). She's drained. I don't see this getting any easier for at least 12 months. It's a long road.

We don't have family to support us and we can't afford childcare. So, that's our situation. Not complaining, just saying - it's not weeks of a little bit less sleep. It's chronically interrupted sleep for months, maybe years (according to Dr. Ferber), with severe effects on hormonal regulation and mood and weight.


I have two kids. After the first 3 months they were sleeping 7-8 hours through the night.


I hope not, three-month-olds are supposed to sleep 14-17 hours a day. Also, what happened when they dropped naps? Night sleep typically worsens right beforehand, as the residual nap pushes their lengthening wake window into night sleep. How did you manage their congestion when they were sick? Many kids, especially those using a pacifier, lose the ability to link their short, 45-minute sleep cycles and need to be comforted throughout the night even into toddlerhood.


7-8 hours at a stretch, there are more hours of sleep that happen on either side of that stretch.

Congestion with those little pipe things that you can use to suck the snot out of the baby’s nose.

We never used a pacifier. The baby slept in the same room and my wife nursed her for three years.


Not all kids are the same. Shocking I know…


Okay. Deal.


I'm not sure what "memory efficient" means. But, Go sprang up as a competitor to Java (portability, language stability, corporate language support/development) and C++ (faster compile times). Can't beat C++ in terms of memory management (performance, guys, not safety) much. But, you can fare well against the JVM, I'm guessing.


In this benchmark actually no, Go doesn't fare well. There is actually higher static overhead per goroutine than JVM VirtualThread. I presume this is because of a larger initial stack size though.

This probably doesn't matter in the real world as you will actually use the tasks to do some real work which should really dwarf the static overhead in almost all cases.


Yeah, I think you're wrong. It should only take ~10s. tokio::time::sleep records the time it was called before returning the future [1]. So, all 1 million tasks should be stamped with +/- the same time (within a few milliseconds).

[1]: https://docs.rs/tokio/1.41.1/src/tokio/time/sleep.rs.html#12...
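The same "N sleepers take about one sleep, not N sleeps" behaviour is easy to check on a small scale. This sketch uses Python's asyncio rather than tokio, so it is an analogy, not a reproduction: asyncio computes each deadline when the task first runs rather than when the future is created, and the task count and duration here are scaled-down assumptions.

```python
import asyncio
import time

async def run_sleepers(num_tasks, seconds):
    """Spawn num_tasks concurrent sleepers; return wall-clock time until all finish."""
    start = time.monotonic()
    tasks = [asyncio.create_task(asyncio.sleep(seconds)) for _ in range(num_tasks)]
    await asyncio.gather(*tasks)
    return time.monotonic() - start

elapsed = asyncio.run(run_sleepers(num_tasks=10_000, seconds=0.2))
print(f"10,000 sleepers finished in {elapsed:.2f}s")
```

All the deadlines land within a short window of each other, so the total stays close to a single sleep plus spawn overhead.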


This makes total sense!


Yep. ~$.33/kWh in Southern California (SoCal Edison) and going up all the time!


$0.51/kWh during peak hours in San Francisco!


Or like Pulumi?


Wait. Am I missing something? Isn't that ~90m of car time assuming mild traffic?


I don't think they were flying SFO to SJC, rather they considered rebooking to SJC to avoid the delays at SFO


SJC is so much better. Honestly it's almost worth going to SJC and taking Caltrain despite the extra time and cost just due to how much nicer it is


When SFO is operating on only one of its two runways (common due to low clouds, now due to runway construction) then 3h+ delays are par for the course for flights in the evening.


I don't understand how this works in the case of testing many applications running on many machines, where many services on many machines need to communicate with each other. We deploy a mix of systemd services and OCI containers (running on podman and Docker) to different machines, the exact mix on each machine depends on the machine's intended purpose.

We currently run CI tests using QEMU VMs. These VMs comprise a few systems representative of those that we deploy to production.

Does adopting Antithesis mean that all non-containerized applications would need to be OCI-ified and every interaction would need to be mocked? There's a sort of combinatorial explosion that I'm concerned about when I'm thinking about testing/adding a new service to a system: All services on which it depends need to be mocked and all services which depend on it require creating a mocked version of it.

Seems like a lot of work. Can someone please help clarify things for me?

Also, how could we test the behavior of non-application code like drivers or the kernel itself?


The relevant part is

> The Antithesis environment simulates one or more computers using a collection of containers, all running within a single virtual machine managed by our hypervisor.

No mocking needed, but everything needs to share the single VM.

And it sure sounds like they run a custom kernel in the guest, so this is not for kernelspace testing:

> Since the Antithesis platform controls the guest’s scheduler,


I believe they have a Discord for asking questions directly to their engineers: https://discord.gg/75cBWkbC


Thanks!

