Yes, we've been doing that at my place basically since the start. Each tenant is a schema in Postgres. It works perfectly fine in one respect: your tables don't grow to 'infinity' just because you're adding more and more tenants. If a particular tenant has lots of data, only that tenant's tables and indexes grow huge and slow down, and only that tenant feels it. If a tenant leaves, you keep the schema around for some time so they can come back, and at some point you just drop the schema!
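To picture it, a minimal sketch of what provisioning and dropping a tenant schema can look like (psycopg2; the DSN, the tenant_ naming convention and the example table are made up for illustration, not our actual tooling):

    import psycopg2
    from psycopg2 import sql

    DSN = "dbname=app user=app"  # assumed connection string

    def provision_tenant(slug):
        # One schema per tenant; the tenant's tables live inside it.
        conn = psycopg2.connect(DSN)
        try:
            with conn, conn.cursor() as cur:  # commit on success, rollback on error
                schema = sql.Identifier(f"tenant_{slug}")
                cur.execute(sql.SQL("CREATE SCHEMA {}").format(schema))
                cur.execute(sql.SQL(
                    "CREATE TABLE {}.orders ("
                    " id bigserial PRIMARY KEY,"
                    " created_at timestamptz NOT NULL DEFAULT now())"
                ).format(schema))
        finally:
            conn.close()

    def drop_tenant(slug):
        # Run after the grace period if the tenant never comes back.
        conn = psycopg2.connect(DSN)
        try:
            with conn, conn.cursor() as cur:
                cur.execute(sql.SQL("DROP SCHEMA {} CASCADE")
                            .format(sql.Identifier(f"tenant_{slug}")))
        finally:
            conn.close()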
It does mean having to upgrade each schema individually, which makes things both easier and harder. Easier because the tables are smaller, so any schema change that requires, say, a table lock holds it for a shorter time and never affects more than the one tenant at a given time. Harder because you can of course get into an inconsistent state, where some of your tenants have all the latest DB upgrades while the migration failed on another subset.
At some point Postgres's internal catalog tables become a bit of a "problem": you want to run as many of these updates in parallel as you can for speed, which can lead to contention on Postgres's administrative side. You'll also still need to shard across multiple actual RDS instances, because you still have many tenants running against a single physical piece of hardware, and it will show its limitations if too many large or active tenants happen to land on the same shard.
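If it helps to picture the parallel-migration part, a rough sketch (not our actual tooling; the DSN, schema naming and worker count are assumptions) of running one DDL statement across every tenant schema with bounded concurrency, collecting the subset that failed:

    from concurrent.futures import ThreadPoolExecutor, as_completed
    import psycopg2
    from psycopg2 import sql

    DSN = "dbname=app user=app"  # assumed
    MAX_WORKERS = 8              # keep this modest to limit catalog/lock contention

    def migrate_one(schema, ddl):
        # One connection per schema, so a failure only affects that tenant.
        conn = psycopg2.connect(DSN)
        try:
            with conn, conn.cursor() as cur:  # commit on success, rollback on error
                cur.execute(sql.SQL("SET search_path TO {}").format(sql.Identifier(schema)))
                cur.execute(ddl)
        finally:
            conn.close()

    def migrate_all(schemas, ddl):
        ok, failed = [], []
        with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
            futures = {pool.submit(migrate_one, s, ddl): s for s in schemas}
            for fut in as_completed(futures):
                schema = futures[fut]
                try:
                    fut.result()
                    ok.append(schema)
                except Exception:
                    failed.append(schema)  # the "inconsistent subset" to retry later
        return ok, failed

Calling migrate_all(schemas, "ALTER TABLE orders ADD COLUMN note text") then gives you exactly that "which subset failed" split to retry.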
And then you have the problem of keeping a record of which physical RDS instance (i.e. shard) each tenant is on. Your application code will need to look that up (and cache that info for some time ;)) and you have a choice there as well: do you shard the application instances too, essentially load balancing as a 1:1 mapping to shards, or do you have your application layer connect to all database shards and handle any tenant? The first is more complicated, I would say, while the second can run out of connections depending on how you need to scale the application layer and what kind of RDS instance you have.
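For the lookup itself, the shape of it is basically this (the routing table name, DSNs and TTL are invented for the sketch):

    import time
    import psycopg2

    ROUTING_DSN = "dbname=routing user=app"  # the DB that records tenant -> shard (assumed)
    CACHE_TTL = 300                          # seconds; the "cache that info for some time" part
    _cache = {}                              # tenant_id -> (shard_dsn, fetched_at)

    def shard_dsn_for(tenant_id):
        hit = _cache.get(tenant_id)
        if hit and time.time() - hit[1] < CACHE_TTL:
            return hit[0]
        conn = psycopg2.connect(ROUTING_DSN)
        try:
            with conn.cursor() as cur:
                # tenant_shards(tenant_id, shard_dsn) is a made-up routing table.
                cur.execute("SELECT shard_dsn FROM tenant_shards WHERE tenant_id = %s",
                            (tenant_id,))
                row = cur.fetchone()
        finally:
            conn.close()
        if row is None:
            raise LookupError(f"unknown tenant {tenant_id}")
        _cache[tenant_id] = (row[0], time.time())
        return row[0]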
This is a very common approach and scales quite well. I worked for a company that had thousands of customers and each had their own schema. A single master database kept track of which customer was on which physical DB cluster, and this was globally replicated (EU, ANZ, NA).
It certainly needed a bunch of tooling, but it worked well. Some apps were stateless and could connect to any physical cluster; others were sticky and only connected to a subset.
My current company has a similar architecture as well: we serve nearly a thousand customer instances across 4 physical clusters.
We do have some basic tools to provision new customers on the emptiest cluster, move customers from one cluster to another, etc.
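In case it's useful, a sketch of what "provision on the emptiest cluster" can boil down to (the master DB tables and the count-the-customers heuristic are assumptions; in practice you might weigh by data size or activity instead):

    import psycopg2

    MASTER_DSN = "dbname=master user=app"  # the global "who lives where" database (assumed)

    def emptiest_cluster():
        # Simplest heuristic: the cluster with the fewest customers assigned.
        conn = psycopg2.connect(MASTER_DSN)
        try:
            with conn.cursor() as cur:
                cur.execute("""
                    SELECT c.cluster_id
                    FROM clusters c
                    LEFT JOIN customers cu ON cu.cluster_id = c.cluster_id
                    GROUP BY c.cluster_id
                    ORDER BY count(cu.customer_id) ASC
                    LIMIT 1
                """)
                return cur.fetchone()[0]
        finally:
            conn.close()

    def assign_customer(customer_id):
        cluster_id = emptiest_cluster()
        conn = psycopg2.connect(MASTER_DSN)
        try:
            with conn, conn.cursor() as cur:
                cur.execute("INSERT INTO customers (customer_id, cluster_id) VALUES (%s, %s)",
                            (customer_id, cluster_id))
        finally:
            conn.close()
        return cluster_id  # then create the schema on that cluster, as in the earlier sketch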
I recall a popular rails gem[1] once upon a time that provided multi-tenancy via postgres schemas.
As it turns out, even the company that initially developed the gem ended up ditching the approach due to some of the issues you outlined above.
Managing separate schemas feels like one of those insidious decisions that make things simple _initially_ but get you into a world of hurt when you need to scale. The company is now defunct, but they have an article where they discuss why they ditched the approach [2]; TL;DR, it's too difficult to maintain and scale.
Let's address these one by one based on our experience (at least the part of the journey I've been there for; the implementation of the solution predates me, but I live with it).
Migrations
Things slow down past 100 [...] No one wants friction in their deployment process, especially as we’re attempting to deploy daily or more frequently.
We have more than a thousand schemas per database shard. That is why I said you want things to run in parallel for speed, yes. However, we deploy way more than daily and it's not really an issue in that sense. Schema updates are not that frequent, but you do of course need to take into account that the schema update goes out as a separate PR: wait and check that it worked, then deploy the application change that actually makes use of it. Which honestly isn't much different from ensuring that your BE and FE changes are compatible or made in the right order, so that you don't get failed requests because an old FE happens to call a new BE node or vice versa :shrug:
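If it helps, the "check that it worked" step can be as dumb as scanning every schema for the migration you just shipped before the application deploy goes ahead (the per-schema schema_migrations table, the tenant_ prefix and the version id are assumptions here, just a sketch):

    import psycopg2
    from psycopg2 import sql

    DSN = "dbname=app user=app"            # assumed
    EXPECTED = "20240601_add_orders_note"  # made-up migration id

    def lagging_schemas():
        # Tenant schemas that have not recorded EXPECTED yet.
        lagging = []
        conn = psycopg2.connect(DSN)
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT schema_name FROM information_schema.schemata "
                            "WHERE schema_name ~ '^tenant_'")
                for (schema,) in cur.fetchall():
                    cur.execute(
                        sql.SQL("SELECT 1 FROM {}.schema_migrations WHERE version = %s")
                           .format(sql.Identifier(schema)),
                        (EXPECTED,))
                    if cur.fetchone() is None:
                        lagging.append(schema)
        finally:
            conn.close()
        return lagging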
Database resources
"Need too large of an instance" r3.4xl
We seem to have m5.large. That has 2 virtual cores. Some r5.larges etc. Their r3.4xl has 16?! So not sure what kind of load pattern they have :shrug:
Client Memory Bloat
Ruby ActiveRecord something
Yeah well, we don't have that, so not sure what to say :shrug: Definitely not a generalized reason to say "can't do, is too complicated for scaling".
Record Identification
One major drawback of schemas as tenants is that your sequence generators will reside independently in each tenant.
I respectfully disagree. This is a great advantage: just because you get more and more tenants (some of which churn and you throw away their data anyway), your identifiers don't grow past their limits nearly as easily. In fact, in most cases the identifiers never run out of runway at all.
They complain about "what if you need to join this data somewhere else, then you need to also add the tenantId". Yeah, so? We also have a data warehouse where we do just that. Not a problem at all. We also have services other than our main one which use different database technologies, where we use the tenant as part of the key (for loads that actually benefit from being in a NoSQL-type DB), and there we do not have sharding beyond what the NoSQL database does by itself, so to speak, by dividing the keyspace.
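Concretely, adding the tenant when the data leaves its schema is about this much work (table names, DSNs and the (tenant, order_id) warehouse key are made up; the point is only that per-tenant ids stay unique once you carry the tenant alongside them):

    import psycopg2
    from psycopg2 import sql

    SHARD_DSN = "dbname=app user=app"            # assumed
    WAREHOUSE_DSN = "dbname=warehouse user=etl"  # assumed

    def export_orders(schema):
        # Copy one tenant's orders into a shared warehouse table, adding the tenant id.
        src = psycopg2.connect(SHARD_DSN)
        dst = psycopg2.connect(WAREHOUSE_DSN)
        try:
            with src.cursor() as rcur, dst, dst.cursor() as wcur:
                rcur.execute(sql.SQL("SELECT id, created_at FROM {}.orders")
                                .format(sql.Identifier(schema)))
                for order_id, created_at in rcur:
                    # (tenant, order_id) is the key once many tenants share one table.
                    wcur.execute(
                        "INSERT INTO warehouse_orders (tenant, order_id, created_at) "
                        "VALUES (%s, %s, %s) ON CONFLICT (tenant, order_id) DO NOTHING",
                        (schema, order_id, created_at))
        finally:
            src.close()
            dst.close()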
That's it. End of article. Basically, we have none of these issues. They don't mention their actual scale. The only number they mention is the "hundreds of schemas". We have more than ten times that number per physical shard and have tens of thousands of tenants total. Again :shrug:
That's still two orders of magnitude smaller than the scale of Figma; they would need to somehow manage millions of Postgres schemas. I don't think that's a realistic possibility.
https://news.ycombinator.com/item?id=39711815 Shopify seems to have millions of customers. Granted, they use MySQL, and it might not be database schemas but something more MySQL-specific. In general though ...
"We can attach a shop_id to all shop-owned tables and have it serve as a sharding key. Moving a shop from one shard to another involves selecting all records from all tables that have the desired shop_id and copying them to another MySQL shard. For this post, it’s helpful to think of each pod as a MySQL shard."
Sounds like this is just normal sharding. Completely different from having a totally separate schema like GP proposes (with that approach you don't use sharding keys at all).
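To make the distinction concrete, roughly the difference in how a query gets routed under the two models (illustrative only; table and schema names invented):

    import psycopg2
    from psycopg2 import sql

    conn = psycopg2.connect("dbname=app user=app")  # assumed
    cur = conn.cursor()

    # Shard-key style (the Shopify description): one shared set of tables,
    # every shop-owned row carries shop_id and every query filters on it.
    cur.execute("SELECT id, total FROM orders WHERE shop_id = %s AND status = %s",
                (42, "open"))

    # Schema-per-tenant style: pick the tenant's schema once per connection,
    # after which the queries carry no tenant column at all.
    cur.execute(sql.SQL("SET search_path TO {}").format(sql.Identifier("tenant_acme")))
    cur.execute("SELECT id, total FROM orders WHERE status = %s", ("open",))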