Yes, we've been doing that at my place basically since the start. Each tenant is a schema in Postgres. It works perfectly fine in one respect: your tables don't grow to 'infinity' just because you're adding more and more tenants. If a particular tenant has lots of data, only that tenant's tables and indexes grow huge and slow down, and only that tenant feels it. If a tenant leaves, you keep the schema around for some time so they can come back, and at some point you just drop the schema!
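To picture it, a minimal sketch of what provisioning and dropping a tenant schema can look like (psycopg2; the DSN, the tenant_ naming convention and the example table are made up for illustration, not our actual tooling):

    import psycopg2
    from psycopg2 import sql

    DSN = "dbname=app user=app"  # assumed connection string

    def provision_tenant(slug):
        # One schema per tenant; the tenant's tables live inside it.
        conn = psycopg2.connect(DSN)
        try:
            with conn, conn.cursor() as cur:  # commit on success, rollback on error
                schema = sql.Identifier(f"tenant_{slug}")
                cur.execute(sql.SQL("CREATE SCHEMA {}").format(schema))
                cur.execute(sql.SQL(
                    "CREATE TABLE {}.orders ("
                    " id bigserial PRIMARY KEY,"
                    " created_at timestamptz NOT NULL DEFAULT now())"
                ).format(schema))
        finally:
            conn.close()

    def drop_tenant(slug):
        # Run after the grace period if the tenant never comes back.
        conn = psycopg2.connect(DSN)
        try:
            with conn, conn.cursor() as cur:
                cur.execute(sql.SQL("DROP SCHEMA {} CASCADE")
                            .format(sql.Identifier(f"tenant_{slug}")))
        finally:
            conn.close()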
It does mean having to upgrade each schema individually, which makes things both easier and harder. Easier because the tables are smaller, so any schema change that requires, say, a table lock holds it for a shorter time and never affects more than the one tenant at a given time. Harder because you can of course get into an inconsistent state, where some of your tenants have all the latest DB upgrades while the migration failed on another subset.
At some point Postgres's internal catalog tables become a bit of a "problem": you want to run as many of these updates in parallel as you can for speed, which can lead to contention on Postgres's administrative side. You'll also still need to shard across multiple actual RDS instances, because you still have many tenants running against a single physical piece of hardware, and it will show its limitations if too many large or active tenants happen to land on the same shard.
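If it helps to picture the parallel-migration part, a rough sketch (not our actual tooling; the DSN, schema naming and worker count are assumptions) of running one DDL statement across every tenant schema with bounded concurrency, collecting the subset that failed:

    from concurrent.futures import ThreadPoolExecutor, as_completed
    import psycopg2
    from psycopg2 import sql

    DSN = "dbname=app user=app"  # assumed
    MAX_WORKERS = 8              # keep this modest to limit catalog/lock contention

    def migrate_one(schema, ddl):
        # One connection per schema, so a failure only affects that tenant.
        conn = psycopg2.connect(DSN)
        try:
            with conn, conn.cursor() as cur:  # commit on success, rollback on error
                cur.execute(sql.SQL("SET search_path TO {}").format(sql.Identifier(schema)))
                cur.execute(ddl)
        finally:
            conn.close()

    def migrate_all(schemas, ddl):
        ok, failed = [], []
        with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
            futures = {pool.submit(migrate_one, s, ddl): s for s in schemas}
            for fut in as_completed(futures):
                schema = futures[fut]
                try:
                    fut.result()
                    ok.append(schema)
                except Exception:
                    failed.append(schema)  # the "inconsistent subset" to retry later
        return ok, failed

Calling migrate_all(schemas, "ALTER TABLE orders ADD COLUMN note text") then gives you exactly that "which subset failed" split to retry.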
And then you have the problem of keeping a record of which physical RDS instance (i.e. shard) each tenant is on. Your application code will need to look that up (and cache that info for some time ;)) and you have a choice there as well: do you shard the application instances too, essentially load balancing as a 1:1 mapping to shards, or do you have your application layer connect to all database shards and handle any tenant? The first is more complicated, I would say, while the second can run out of connections depending on how you need to scale the application layer and what kind of RDS instance you have.
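For the lookup itself, the shape of it is basically this (the routing table name, DSNs and TTL are invented for the sketch):

    import time
    import psycopg2

    ROUTING_DSN = "dbname=routing user=app"  # the DB that records tenant -> shard (assumed)
    CACHE_TTL = 300                          # seconds; the "cache that info for some time" part
    _cache = {}                              # tenant_id -> (shard_dsn, fetched_at)

    def shard_dsn_for(tenant_id):
        hit = _cache.get(tenant_id)
        if hit and time.time() - hit[1] < CACHE_TTL:
            return hit[0]
        conn = psycopg2.connect(ROUTING_DSN)
        try:
            with conn.cursor() as cur:
                # tenant_shards(tenant_id, shard_dsn) is a made-up routing table.
                cur.execute("SELECT shard_dsn FROM tenant_shards WHERE tenant_id = %s",
                            (tenant_id,))
                row = cur.fetchone()
        finally:
            conn.close()
        if row is None:
            raise LookupError(f"unknown tenant {tenant_id}")
        _cache[tenant_id] = (row[0], time.time())
        return row[0]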
This is a very common approach and scales quite well. I worked for a company that had thousands of customers and each had their own schema. A single master database kept track of which customer was on which physical DB cluster, and this was globally replicated (EU, ANZ, NA).
It certainly needed a bunch of tooling, but it worked well. Some apps were stateless and could connect to any physical cluster; others were sticky and only connected to a subset.
My current company has a similar architecture as well: we serve nearly a thousand customer instances across 4 physical clusters.
We do have some basic tools to provision new customers on the emptiest cluster, move customers from one cluster to another, etc.
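In case it's useful, a sketch of what "provision on the emptiest cluster" can boil down to (the master DB tables and the count-the-customers heuristic are assumptions; in practice you might weigh by data size or activity instead):

    import psycopg2

    MASTER_DSN = "dbname=master user=app"  # the global "who lives where" database (assumed)

    def emptiest_cluster():
        # Simplest heuristic: the cluster with the fewest customers assigned.
        conn = psycopg2.connect(MASTER_DSN)
        try:
            with conn.cursor() as cur:
                cur.execute("""
                    SELECT c.cluster_id
                    FROM clusters c
                    LEFT JOIN customers cu ON cu.cluster_id = c.cluster_id
                    GROUP BY c.cluster_id
                    ORDER BY count(cu.customer_id) ASC
                    LIMIT 1
                """)
                return cur.fetchone()[0]
        finally:
            conn.close()

    def assign_customer(customer_id):
        cluster_id = emptiest_cluster()
        conn = psycopg2.connect(MASTER_DSN)
        try:
            with conn, conn.cursor() as cur:
                cur.execute("INSERT INTO customers (customer_id, cluster_id) VALUES (%s, %s)",
                            (customer_id, cluster_id))
        finally:
            conn.close()
        return cluster_id  # then create the schema on that cluster, as in the earlier sketch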
I recall a popular rails gem[1] once upon a time that provided multi-tenancy via postgres schemas.
As it turns out, even the company that initially developed the gem ended up ditching the approach due to some of the issues you outlined above.
Managing separate schemas feels like one of those insidious decisions that make things simple _initially_ but get you into a world of hurt when you need to scale. The company is now defunct, but they have an article where they discuss why they ditched the approach [2]; TL;DR, it's too difficult to maintain and scale.
Let's address these one by one based on our experience (at least the part of the journey I've been there for; the implementation of the solution predates me, but I live with it).
Migrations
Things slow down past 100 [...] No one wants friction in their deployment process, especially as we’re attempting to deploy daily or more frequently.
We have more than a thousand schemas per database shard. That is why I said you want things to run in parallel for speed, yes. However, we deploy way more than daily and it's not really an issue in that sense. Schema updates are not that frequent, but you do of course need to take into account that the schema update goes out as a separate PR: wait and check that it worked, then deploy the application change that actually makes use of it. Which honestly isn't much different from ensuring that your BE and FE changes are compatible or made in the right order, so that you don't get failed requests because an old FE happens to call a new BE node or vice versa :shrug:
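If it helps, the "check that it worked" step can be as dumb as scanning every schema for the migration you just shipped before the application deploy goes ahead (the per-schema schema_migrations table, the tenant_ prefix and the version id are assumptions here, just a sketch):

    import psycopg2
    from psycopg2 import sql

    DSN = "dbname=app user=app"            # assumed
    EXPECTED = "20240601_add_orders_note"  # made-up migration id

    def lagging_schemas():
        # Tenant schemas that have not recorded EXPECTED yet.
        lagging = []
        conn = psycopg2.connect(DSN)
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT schema_name FROM information_schema.schemata "
                            "WHERE schema_name ~ '^tenant_'")
                for (schema,) in cur.fetchall():
                    cur.execute(
                        sql.SQL("SELECT 1 FROM {}.schema_migrations WHERE version = %s")
                           .format(sql.Identifier(schema)),
                        (EXPECTED,))
                    if cur.fetchone() is None:
                        lagging.append(schema)
        finally:
            conn.close()
        return lagging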
Database resources
"Need too large of an instance" r3.4xl
We seem to have m5.large. That has 2 virtual cores. Some r5.larges etc. Their r3.4xl has 16?! So not sure what kind of load pattern they have :shrug:
Client Memory Bloat
Ruby ActiveRecord something
Yeah well, we don't have that, so not sure what to say :shrug: Definitely not a generalized reason to say "can't do, is too complicated for scaling".
Record Identification
One major drawback of schemas as tenants is that your sequence generators will reside independently in each tenant.
I respectfully disagree. This is a great advantage: just because you get more and more tenants (some of which churn and you throw away their data anyway), your identifiers don't grow past their limits nearly as easily. In fact, in most cases the identifiers never run out of runway at all.
They complain about "what if you need to join this data somewhere else, then you need to also add the tenantId". Yeah, so? We also have a data warehouse where we do just that. Not a problem at all. We also have services other than our main one which use different database technologies, where we use the tenant as part of the key (for loads that actually benefit from being in a NoSQL-type DB), and there we do not have sharding beyond what the NoSQL database does by itself, so to speak, by dividing the keyspace.
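Concretely, adding the tenant when the data leaves its schema is about this much work (table names, DSNs and the (tenant, order_id) warehouse key are made up; the point is only that per-tenant ids stay unique once you carry the tenant alongside them):

    import psycopg2
    from psycopg2 import sql

    SHARD_DSN = "dbname=app user=app"            # assumed
    WAREHOUSE_DSN = "dbname=warehouse user=etl"  # assumed

    def export_orders(schema):
        # Copy one tenant's orders into a shared warehouse table, adding the tenant id.
        src = psycopg2.connect(SHARD_DSN)
        dst = psycopg2.connect(WAREHOUSE_DSN)
        try:
            with src.cursor() as rcur, dst, dst.cursor() as wcur:
                rcur.execute(sql.SQL("SELECT id, created_at FROM {}.orders")
                                .format(sql.Identifier(schema)))
                for order_id, created_at in rcur:
                    # (tenant, order_id) is the key once many tenants share one table.
                    wcur.execute(
                        "INSERT INTO warehouse_orders (tenant, order_id, created_at) "
                        "VALUES (%s, %s, %s) ON CONFLICT (tenant, order_id) DO NOTHING",
                        (schema, order_id, created_at))
        finally:
            src.close()
            dst.close()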
That's it. End of article. Basically, we have none of these issues. They don't mention their actual scale. The only number they mention is the "hundreds of schemas". We have more than ten times that number per physical shard and have tens of thousands of tenants total. Again :shrug:
That's still two orders of magnitude smaller than the scale of Figma; they would need to somehow manage millions of Postgres schemas. I don't think that's a realistic possibility.
https://news.ycombinator.com/item?id=39711815 Shopify seems to have millions of customers. Granted, they use MySQL, and it might not be database schemas but something more MySQL-specific. In general though ...
"We can attach a shop_id to all shop-owned tables and have it serve as a sharding key. Moving a shop from one shard to another involves selecting all records from all tables that have the desired shop_id and copying them to another MySQL shard. For this post, it’s helpful to think of each pod as a MySQL shard."
Sounds like this is just normal sharding. Completely different from having a totally separate schema like GP proposes (with that approach you don't use sharding keys at all).
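To make the distinction concrete, roughly the difference in how a query gets routed under the two models (illustrative only; table and schema names invented):

    import psycopg2
    from psycopg2 import sql

    conn = psycopg2.connect("dbname=app user=app")  # assumed
    cur = conn.cursor()

    # Shard-key style (the Shopify description): one shared set of tables,
    # every shop-owned row carries shop_id and every query filters on it.
    cur.execute("SELECT id, total FROM orders WHERE shop_id = %s AND status = %s",
                (42, "open"))

    # Schema-per-tenant style: pick the tenant's schema once per connection,
    # after which the queries carry no tenant column at all.
    cur.execute(sql.SQL("SET search_path TO {}").format(sql.Identifier("tenant_acme")))
    cur.execute("SELECT id, total FROM orders WHERE status = %s", ("open",))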