OK I love SQLite but there has been a long-standing misconception, or issue, that I'd like to get clarification on: Is using SQLite from multiple processes safe?
For a long while, either SQLite itself or the Python bindings weren't safe for concurrent access. Is this still the case? Can I use SQLite for my Django app? With the backup system from yesterday's Tailscale post, the operational burden is much, much lower than Postgres for many use cases.
It is always safe, and by "safe" I mean "safe for data": you won't have to deal with data corruption. For the precise failure modes, see "How To Corrupt An SQLite Database File": https://www.sqlite.org/howtocorrupt.html
Now concurrent accesses from different processes/connections can lead to runtime errors (SQLITE_BUSY), because the database happens to be locked by one connection.
Those errors are greatly reduced by WAL mode (https://sqlite.org/wal.html), which provides ultra-robust single-writer/multiple-readers semantics:

- Readers don't block the writer, and the writer doesn't block readers; each reader sees a consistent snapshot.

- Writes cannot happen concurrently: a second writer gets SQLITE_BUSY.
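As a minimal sketch with Python's built-in sqlite3 module (the file path here is made up): enabling WAL is a single pragma, and journal_mode is persistent, so once set, every later connection to the same file also opens in WAL mode.

```python
import os
import sqlite3
import tempfile

# Create a throwaway database file (any real path works the same way).
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)

# journal_mode is stored in the database file itself: once switched to
# WAL, later connections to the same file also use WAL.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.close()
```

On a filesystem that supports it, the pragma returns "wal"; on an in-memory database it stays "memory", which is one way to notice WAL isn't in effect.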
Oh excellent, I wasn't sure the "snapshot isolation" level existed (the WAL kind of implied it, but I wasn't sure). SQLITE_BUSY only happens if a write lock is kept for longer than the readers' configured timeout, right? It shouldn't happen for short writes?
One should only expect SQLITE_BUSY for writes (when another writer already holds the lock and the busy timeout expires before it is released). So yes, prefer short writes, or raise your timeout. Generally speaking, SQLITE_BUSY cannot be 100% prevented for writes.
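A sketch of how that surfaces in Python's sqlite3 bindings (file path and variable names are made up): the connect() `timeout` argument is the busy timeout in seconds, and an expired timeout shows up as OperationalError("database is locked"), i.e. SQLITE_BUSY.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
init = sqlite3.connect(path)
init.execute("CREATE TABLE t (x)")
init.commit()

# First connection grabs the write lock and holds it.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("BEGIN IMMEDIATE")

# Second writer with a short 200 ms busy timeout: since the lock is
# held past the timeout, SQLITE_BUSY surfaces as OperationalError.
other = sqlite3.connect(path, timeout=0.2, isolation_level=None)
try:
    other.execute("BEGIN IMMEDIATE")
    busy = False
except sqlite3.OperationalError:
    busy = True

writer.execute("ROLLBACK")
```

In a real application you'd either retry the transaction on that error or set the timeout high enough that short writes never hit it.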
Yes. Now, many short writes look exactly like one very long write from the point of view of an enqueued write that is waiting for its turn :-) I don't quite remember how fair SQLite's scheduling is in practice.
I don’t think there is any scheduling. Each connection polls to see if the write lock is available up to the max busy timeout setting.
The connection polls at these intervals: static const u8 delays[] = { 1, 2, 5, 10, 15, 20, 25, 25, 25, 50, 50, 100 };
Those delays are in milliseconds, not seconds. So, if you are using the default 5 second (5000 ms) timeout and you try to acquire a lock while an exclusive lock is held, you sleep 1 ms, then 2 ms, then 5 ms, and so on up the table; once the table is exhausted, you keep retrying every 100 ms until the cumulative wait reaches the timeout.
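A back-of-the-envelope sketch of that arithmetic, modeled on the delays table above (this mimics the default busy handler's math, not the exact SQLite code, and the function name is made up):

```python
# The backoff table from sqliteDefaultBusyCallback(), in milliseconds.
delays = [1, 2, 5, 10, 15, 20, 25, 25, 25, 50, 50, 100]

def attempts_before_timeout(timeout_ms):
    """Count polling attempts before the cumulative delay hits the timeout.

    After the table is exhausted, each further retry sleeps the last
    delay (100 ms); the final sleep is truncated to land on the timeout.
    """
    waited, n = 0, 0
    while waited < timeout_ms:
        d = delays[n] if n < len(delays) else delays[-1]
        waited += min(d, timeout_ms - waited)
        n += 1
    return n
```

With a 5000 ms timeout the lock is polled dozens of times, not three: the 12 stepped delays add up to 328 ms, followed by 47 more retries at (up to) 100 ms each, 59 attempts in total.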
If you have a thread running many fast queries in a loop you can deny access to another thread that needs a lock. The other thread may get lucky and poll for the lock at the exact moment in between locks from the other thread, but it might not.
> The error message when using the connection from multiple threads is "outdated"
I think typically you'd still want to use one connection per thread because (for example) there can only be one transaction per connection at any given time.
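A sketch of the one-connection-per-thread pattern using thread-local storage (names and the file path are made up): each thread lazily opens its own connection, so each can run its own independent transaction.

```python
import os
import sqlite3
import tempfile
import threading

path = os.path.join(tempfile.mkdtemp(), "demo.db")
init = sqlite3.connect(path)
init.execute("CREATE TABLE t (x)")
init.commit()
init.close()

# One connection per thread, kept in thread-local storage so each
# thread has its own transaction state.
local = threading.local()
conns = []  # keep references so we can count distinct connections

def get_conn():
    if not hasattr(local, "conn"):
        local.conn = sqlite3.connect(path)
        conns.append(local.conn)
    return local.conn

def worker():
    conn = get_conn()
    conn.execute("INSERT INTO t VALUES (1)")
    conn.commit()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sqlite3.connect(path).execute("SELECT count(*) FROM t").fetchone()[0]
```

This also sidesteps Python sqlite3's check_same_thread restriction, since each connection is only ever used by the thread that created it.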
It appears to be safe, but potentially slow, assuming you have working fcntl() locking (so not NFS, for example). And you have to handle SQLITE_BUSY or set a timeout. More in their FAQ, Q5: https://www.sqlite.org/faq.html#q5
That looks great, thanks. I didn't realize it wasn't on by default, but it sounds like it'll greatly benefit the web app use case, where all web workers are on the same Linux machine.
Sounds like WAL is the way to go in most use cases, unless you want shared access from different machines.
WAL doesn't work over network filesystems (NFS/SMB) though, since it relies on shared memory between processes on the same host.
Found this out when I tried to store the Plex data directory on an NFS share in a VM and it had really weird issues. Turned out Plex uses SQLite with WAL enabled.
This is just what I've heard, so take it with a grain of salt, but as I understand it, SQLite is "mostly" safe to use concurrently. There are situations where it doesn't behave correctly under concurrent load, the one I've heard about being when the SQLite file is on a network-mounted drive. I'd love to hear from someone who knows more, though.
That's kind of expected, I think. Due to its design, SQLite relies on filesystem semantics to provide atomicity. If the filesystem doesn't provide the semantics, it makes sense that atomicity will fail.
I'm more asking whether it's still unsafe to use in a filesystem that DOES provide those semantics.
The biggest problem SQLite has is its size limitation. It can only hold ~281 TB in a database, unfortunately.
If you need more storage than that, that's the only reason I could endorse someone using a different database.
:P
My experience is that it is only safe to have one connection to an SQLite database.
Having multiple connections mostly works; however, if the database is stored on a slow medium, such as an SD card, concurrent queries will fail.
You need to explicitly set the "busy_timeout" PRAGMA to allow concurrent write queries to block under WAL mode. Otherwise (the default timeout is 0), a blocked query returns a BUSY error immediately.
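For bindings that don't expose a timeout argument, the pragma can be set directly; a minimal sketch in Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# busy_timeout is per-connection and measured in milliseconds.
conn.execute("PRAGMA busy_timeout = 5000")
# Querying the pragma with no argument reads back the value in effect.
val = conn.execute("PRAGMA busy_timeout").fetchone()[0]
conn.close()
```

Note it's per-connection state, not stored in the database file, so every connection that might write needs to set it.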