It's great to see paths forward for the MySQL community! Exciting to see this become public.
Does the GPLv2 license here mean that anyone writing extensions for VillageSQL must license those extensions as GPLv2? If so, that's still quite far from the Postgres ecosystem, where plenty of companies are built largely around proprietary extensions.
Founder and CEO here. Great question. Right now, since you are linking against MySQL files, all extensions need to be under GPLv2 or a compatible license. In the future, we are hoping to have a stable SDK that is independent of the MySQL files, so that if you run an extension out of process, folks would have more licensing options. There is a proof of concept of an out-of-process mode for extensions in the alpha, but we haven't spent much time proving it out.
The article addresses this, sort of. I don't understand how you can run multiple postmasters.
> Most online resources chalk this up to connection churn, citing fork rates and the pid-per-backend yada, yada. This is all true but in my opinion misses the forest from the trees. The real bottleneck is the single-threaded main loop in the postmaster. Every operation requiring postmaster involvement is pulling from a fixed pool, the size of a single CPU core. A rudimentary experiment shows that we can linearly increase connection throughput by adding additional postmasters on the same host.
You don't need multiple postmasters to spawn connection processes if you have a set of Postgres proxies, each maintaining a fixed pool of long-standing connections and parceling them out to application servers on request. When the proxies use up all their allocated connections, they throttle the application servers rather than overwhelming Postgres itself (either the postmaster or the query-serving backends).
That said, proxies aren't perfect. https://jpcamara.com/2023/04/12/pgbouncer-is-useful.html outlines some dangers of using them (particularly when you might need session-level variables). My understanding is that PgDog does more tracking that mitigates some of these issues, but some of these are fundamental to the model. They're not a drop-in component the way other "proxies" might be.
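For what it's worth, the pooling model is easy to sketch client-side. Here's a minimal Python sketch using psycopg2's built-in ThreadedConnectionPool (the DSN, database name, and pool sizes are made up for illustration); a real proxy like PgBouncer does the same thing out of process, shared across all your app servers:

    from psycopg2.pool import ThreadedConnectionPool

    # A small set of long-standing connections, handed out on demand.
    # Assumes a local Postgres and a throwaway database called "app".
    pool = ThreadedConnectionPool(
        minconn=5,    # long-standing connections opened up front
        maxconn=20,   # hard cap: beyond this, getconn() raises instead of opening more
        dsn="dbname=app host=127.0.0.1 port=5432",
    )

    def run_query(sql, params=None):
        conn = pool.getconn()  # borrow an existing backend connection
        try:
            with conn.cursor() as cur:
                cur.execute(sql, params)
                return cur.fetchall()
        finally:
            # Any SET / session state made here leaks to the next borrower --
            # the same hazard the PgBouncer article describes.
            pool.putconn(conn)

    print(run_query("SELECT now()"))

The difference with a real proxy is that it multiplexes many client connections onto a small shared pool for the whole fleet, which is exactly what keeps the postmaster's fork rate down.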
> I don't understand how you can run multiple postmasters.
I believe they're just referring to having several completely-independent postgres instances on the same host.
In other words: say that postgres is maxing out at 2000 conns/sec. If the bottleneck actually was fork rate on the host, then having 2 independent copies of postgres on a host wouldn't improve the total number of connections per second that could be handled: each instance would max out at ~1000 conns/sec, since they're competing for process-spawning. But in reality that isn't the case, indicating that the fork rate isn't the bottleneck.
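A rough sketch of that kind of experiment (ports, credentials, worker counts, and duration are placeholders; it assumes one or two independent instances listening on 5432 and 5433):

    import threading, time
    import psycopg2

    DURATION = 10  # seconds to run each churn loop
    DSNS = [
        "dbname=postgres host=127.0.0.1 port=5432",  # first instance
        "dbname=postgres host=127.0.0.1 port=5433",  # second, independent instance
    ]
    WORKERS_PER_INSTANCE = 8

    results = []  # (dsn, connections opened) per worker

    def churn(dsn, stop_at):
        n = 0
        while time.time() < stop_at:
            conn = psycopg2.connect(dsn)  # server side: postmaster forks a backend
            conn.close()
            n += 1
        results.append((dsn, n))

    stop_at = time.time() + DURATION
    threads = [threading.Thread(target=churn, args=(dsn, stop_at))
               for dsn in DSNS for _ in range(WORKERS_PER_INSTANCE)]
    for t in threads: t.start()
    for t in threads: t.join()

    for dsn in DSNS:
        total = sum(n for d, n in results if d == dsn)
        print(f"{dsn}: ~{total / DURATION:.0f} conns/sec")

If fork rate on the host were the bottleneck, the per-instance numbers would roughly halve when you run both loops at once; if the postmaster loop is the bottleneck, each instance keeps its own ceiling.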
Great educational project! I'm curious why you are using Raft and also 2PC unless you're sharding data and doing cross-shard transactions? Or is Raft only for cluster membership but 2PC is for replicating data? If that's the case it kind of seems like overkill but I'm not sure.
Few distributed filesystems/object stores seem to use Raft (or consensus at all) for replicating data because it's unnecessary overhead. Chain replication is one popular way for replicating data (which uses consensus to manage membership but the data path is outside of consensus).
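To be concrete about the split, here's a toy sketch of the chain-replication write path (names and structure are mine, not from any particular system): writes enter at the head, flow node to node, and only the tail acknowledges; a small consensus service decides membership but never sits on the data path.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        name: str
        store: dict = field(default_factory=dict)
        successor: "Node | None" = None

        def write(self, key, value):
            self.store[key] = value  # apply locally
            if self.successor is not None:
                return self.successor.write(key, value)  # forward down the chain
            return f"ack from tail {self.name}"  # tail ack => fully replicated

    # Membership (head -> mid -> tail) decided out-of-band by consensus.
    tail = Node("tail")
    mid = Node("mid", successor=tail)
    head = Node("head", successor=mid)

    print(head.write("user:1", "alice"))  # clients write at the head
    print(tail.store["user:1"])           # reads can be served by the tail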
Thank you for this sharp and detailed question!
In minikv, both Raft and 2PC are implemented deliberately. That may look like overkill in some contexts, but the combination serves both educational and production-grade goals:
- Raft is used for intra-shard strong consistency: within each "virtual shard" (256 in total), data and metadata are replicated via Raft (with leader election and log replication), not just for cluster membership;
- 2PC (Two-Phase Commit) is used only when a transaction spans multiple shards: it provides atomic, distributed writes across partitions. Raft alone does not give atomicity across shards, hence the 2PC overlay (sketched at the end of this comment);
- The design aims to illustrate real-world distributed transaction tradeoffs, not just basic data replication. It shows what you gain and lose with a layered model versus simpler schemes like chain replication (which, as you noted, is more common for the data path in some object stores).
So yes, in a pure object store, consensus for data replication is often skipped in favor of lighter-weight methods. Here, the explicit Raft+2PC combo is an architectural choice for anyone learning, experimenting, or wanting strong, multi-shard atomicity.
In a production system focused only on throughput or simple durability, some of this could absolutely be streamlined.
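To make the layering concrete, here is a toy sketch of how 2PC sits on top of per-shard groups (illustrative names, not actual minikv code). Each shard would internally be a Raft group; 2PC only coordinates the prepare/commit decision across the shards a transaction touches:

    # Toy two-phase commit over per-shard groups. In the real system each
    # "shard" would be a Raft leader; here it is an in-memory dict so the
    # prepare/commit flow is visible.
    class Shard:
        def __init__(self, name):
            self.name = name
            self.data = {}
            self.staged = {}  # prepared-but-uncommitted writes

        def prepare(self, txid, writes):
            # In practice this staging would itself be a Raft-replicated log
            # entry, so a prepared vote survives leader failover.
            self.staged[txid] = writes
            return True  # vote yes

        def commit(self, txid):
            for k, v in self.staged.pop(txid, {}).items():
                self.data[k] = v

        def abort(self, txid):
            self.staged.pop(txid, None)

    def two_phase_commit(txid, writes_by_shard):
        shards = list(writes_by_shard)
        # Phase 1: every shard touched by the transaction must prepare.
        if all(s.prepare(txid, w) for s, w in writes_by_shard.items()):
            for s in shards:
                s.commit(txid)  # Phase 2: all voted yes -> commit everywhere
            return True
        for s in shards:
            s.abort(txid)       # any no/failure -> abort everywhere
        return False

    a, b = Shard("shard-a"), Shard("shard-b")
    ok = two_phase_commit("tx1", {a: {"user:1": "alice"}, b: {"balance:1": 100}})
    print(ok, a.data, b.data)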
Hey folks, we're doing the next reading in the Software Internals Book Club. This is a long book so we'll cover it in three chunks with breaks in between. There are over 2,500 members in the club mailing list overall and typically 300-800 join any given book reading. It's all asynchronous over email. Different people kick off discussion for each chapter which makes it more fun and more sustainable to run. Hope to see you there!