I'll point out that the ridiculous latency reductions don't apply to replicating writes to S3 and/or any replica servers; that still takes as long as it would to any other server across a network. The latency reductions are only for pure read traffic. Also, every company I've ever worked at had a policy of running at least two instances of a service in case of hardware failure. (Is it reasonable to extrapolate that policy to a company that might want to run on a single sqlite instance? I don't know, but just as a data point, I don't think any business should strive to run on a single instance.)
This write latency might be fine, although more than one backend app I know of renewed the expiry time of a user session on every hit, and would thus do at least one DB write per HTTP call. I don't think this is optimal, but it does happen, and simply going "well, don't do write traffic then" does not always line up with how apps are actually built. Replicated sqlite over Litestream is very cool, but it's definitely something you need to build your app around, and definitely something that costs you one of your innovation tokens.
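Django, for instance, will do exactly this with two stock settings (a sketch; both setting names are real, the behavior is the write-per-hit pattern I described):

    # settings.py: with the database session backend, Django rewrites
    # the session row (refreshing its expiry) on every request
    SESSION_ENGINE = "django.contrib.sessions.backends.db"
    SESSION_SAVE_EVERY_REQUEST = True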
There's no magic here (that there is no magic is part of the point). You have the same phenomenon in n-tier Postgres deployments: to be highly available, you need multiple instances; you're going to have a write leader, because you don't realistically want to run a Raft consensus round for every write; etc.
The point of the post is just that if you can get rid of most of the big operational problems with using server-side SQLite in a distributed application --- most notably, failing over and snapshotting --- then SQLite can occupy a much more interesting role in your stack than it's conventionally been assigned. SQLite has some very attractive properties that have been largely ignored because people assume they won't be able to scale it out and manage it. Well, you can scale it out and manage it. Now you've got an extremely simple database layer that's easy to reason about, doesn't require you to run a database server (or even a cache server) next to all your app instances, and happens to be extraordinarily fast.
Maybe it doesn't make sense for your app? There are probably lots of apps that really want Postgres and not SQLite. But the architecture we're proposing is one people historically haven't even considered. Now, they should.
I'm not sure "litestream replicate <file>" really costs a whole innovation token. It's just SQLite. You should get an innovation rebate for using it. :)
> But the architecture we're proposing is one people historically haven't even considered. Now, they should.
I think this offering, and this idea, are absolutely fantastic, and if not the future, at least a big part of it, for the reason outlined in the post: namely, that for a lot of apps and use cases, sqlite is more than enough.
But I also suspect this is probably already the case, and we don't know about it because people don't talk about it.
Amusingly, I was recently scolded here on HN for suggesting sqlite, by someone who said HN was a place for "professionals".
Directing specific queries to a write connection (dbserver) vs directing requests to specific application servers (potentially mid-request) does seem operationally “harder” though.
I'm coming at this from the perspective of a traditional Django app that calls .using("write") when it wants to write data. Otherwise you're replaying requests at a completely different app server.
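For context, this is the query-level routing I mean; a sketch of a Django database router, where "default" and "write" are illustrative alias names from DATABASES:

    # routers.py: reads go to the local alias, writes to the leader.
    class ReadWriteRouter:
        def db_for_read(self, model, **hints):
            return "default"

        def db_for_write(self, model, **hints):
            return "write"

    # settings.py (module path is illustrative):
    # DATABASE_ROUTERS = ["myapp.routers.ReadWriteRouter"]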
This may or may not be that hard, depending on your server. You could proxy all "mutation" HTTP verbs to your one writer instance, and probably do something similar if you are using GraphQL.
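A minimal sketch of the verb-based approach, as WSGI middleware; WRITER_HOST is a made-up name for your single writer, and a 307 keeps the method and body intact:

    SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}
    WRITER_HOST = "https://writer.internal.example"  # hypothetical

    def route_writes(app):
        # Redirect any mutating verb to the one writer instance;
        # read traffic is served locally as usual.
        def middleware(environ, start_response):
            if environ["REQUEST_METHOD"] not in SAFE_METHODS:
                location = WRITER_HOST + environ.get("PATH_INFO", "/")
                start_response("307 Temporary Redirect",
                               [("Location", location)])
                return [b""]
            return app(environ, start_response)
        return middleware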
If you are using something like gRPC I feel this might be more complicated because it's not obvious which actions are read/write.
I'm in the same boat as you though overall - I'm not sure what the ideal strategy is, or if one even exists, since this seems to create a problem that does not normally exist.
If you are greenfield, maybe you create one service that only does writes - this may be CQRS-like.
This is great and I'm definitely going to be using it this week in a client project.
That being said, you don't get an innovation rebate for using a new tool, even if, as here, it's a parsimony-enabler. It's still a new tool.
A description from TFA reads "The most important thing you should understand about Litestream is that it's just SQLite." (This reminds me an awful lot of the tagline for CoffeeScript: "It's just JavaScript" -- where did that leave us?) But that info box is just under a description of how the new tool is implemented in a way that makes it sound (to someone who's never looked at the SQLite codebase) like it's breaking some assumptions that SQLite is making. That's the sound of an innovation token being spent.
CoffeeScript was not in fact just Javascript. Litestream literally is just sqlite3; it's not an app dependency, and doesn't interpose itself between your app and sqlite3. You could add it to an existing single-process SQLite-backed app without changing any code or configuration, other than running the Litestream process.
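Concretely, the app side stays plain sqlite3; something like this runs unmodified, with Litestream pointed at the same file from its own process (paths and bucket are illustrative):

    import sqlite3

    # Run separately, with no app involvement:
    #   litestream replicate /data/app.db s3://mybucket/app.db
    conn = sqlite3.connect("/data/app.db")
    conn.execute("PRAGMA journal_mode=WAL")  # Litestream works off the WAL
    conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
    conn.execute("INSERT INTO notes VALUES ('no app changes required')")
    conn.commit()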
It's brilliant that a person can ship their WAL from an app that doesn't know anything about Litestream. That's cool. But it is not in fact just SQLite. If it were, there wouldn't be a blog post, or an additional binary to download, or a backup of my database in S3, or...
I think saying it is “just SQLite” is (unintentionally) misleading. Your app may not know it's anything else, but operationally it's a sidecar, so it's another process to manage.
I actually had to go look that up because it was a little unclear from the blog post and this comments section.
If we're designing a system that relies on an unconventional and otherwise quite rare use-case of a dependency in order to make critical long-term stability guarantees, I would rather that dependency be SQLite, for sure.
> The latency reductions are only for pure read traffic.
Well, no, because every insert will still be fast (until there are too many). The insert does not wait until the data has been written to e.g. S3; it returns as soon as the local commit lands.
So there's a window, let's say 1 second, of potential data loss.
I assume syncing the WAL to S3 is much faster than inserting into sqlite, so it will never fall behind, but I have not tested this.
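The first half of that is easy to check locally; a sketch that times a commit, which returns once the local WAL write is durable (Litestream ships WAL frames to S3 asynchronously):

    import sqlite3, time

    conn = sqlite3.connect("app.db")
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS t (v TEXT)")

    t0 = time.perf_counter()
    conn.execute("INSERT INTO t VALUES ('x')")
    conn.commit()  # local durability only; replication happens out of band
    print(f"local commit: {(time.perf_counter() - t0) * 1000:.3f} ms")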
> Also, every company I ever worked at had a policy to run at least two instances of a service in case of hardware failure.
Yeah, but the goal is not to have X instances, the goal is to limit downtime. In my experience the complicated setups have downtime as well, often related to how complicated they are.
In my mind a setup like this would only be used where some downtime is OK. But that's quite common.