Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

OK I love SQLite but there has been a long-standing misconception, or issue, that I'd like to get clarification on: Is using SQLite from multiple processes safe?

For a long while, either SQLite itself or the Python bindings weren't safe for concurrent access, is this still the case? Can I use SQLite for my Django app? With the backup system on the Tailscale post yesterday, the operational burden is much much lower than Postgres for many use cases.



It is always safe, and by "safe" I mean "safe for data". You won't have to deal with data corruption. Precisely, see "How To Corrupt An SQLite Database File": https://www.sqlite.org/howtocorrupt.html

Now concurrent accesses from different processes/connections can lead to runtime errors (SQLITE_BUSY), because the database happens to be locked by one connection.

Those errors are greatly reduced by the WAL mode (https://sqlite.org/wal.html) which provides ultra-robust single-writer/multiple-readers semantics:

- Writes can not happen concurrently (SQLITE_BUSY).

- One can reduce the occurrences of such SQLITE_BUSY errors by using a built-in timeout (https://www.sqlite.org/c3ref/busy_timeout.html).

- Several reads can happen concurrently, including with writers.

- A writer connection can enter the "Serializable" isolation level.

- A reader connection can enter the "Snapshot Isolation" level.

For more details, see https://www.sqlite.org/isolation.html

During all the years I've been developing the GRDB library (https://github.com/groue/GRDB.swift), I could never see SQLite fail its documented guarantees. This made it possible to build one of the most concurrency-focused SQLite toolkit for Swift, and I'm pretty happy with it (https://github.com/groue/GRDB.swift/blob/master/Documentatio...).


Oh excellent, I wasn't sure the "snapshot isolation" level existed (the WAL kind of implied it, but I wasn't sure). SQLITE_BUSY only happens if a write lock is kept for longer than the readers' configured timeout, right? It shouldn't happen for short writes?


What's cool with WAL mode is that SQLITE_BUSY won't happen for readers (except very rare scenarios: https://www.sqlite.org/wal.html#sometimes_queries_return_sql...)

One should only expect SQLITE_BUSY for writes (if a writer is already holding the lock, and the busy timeout expires before the other writer releases the lock). So yes, prefer short writes, or adjust your timeout. Generally speaking, SQLITE_BUSY can not be 100% prevented for writes.


Yeah, that makes perfect sense. For most apps, it's fairly easy to keep writes short, so SQLite is a great fit.


Yes. Now, many short writes look exactly as one very long write from the point of view of an enqueued write that is waiting for its turn :-) I don't quite remember how fair is SQLite scheduling, in practice.


I don’t think there is any scheduling. Each connection polls to see if the write lock is available up to the max busy timeout setting.

The connection polls at these intervals: static const u8 delays[] = { 1, 2, 5, 10, 15, 20, 25, 25, 25, 50, 50, 100 };

So, if you are using the default 5 second timeout, and you are trying to acquire a lock while an exclusive lock is held, you will wait 1 second, then 2 seconds, then 5 seconds, and timeout. I’m not sure if you timeout after 3 total seconds have elapsed, or sometimes after the 2 and sometimes after the 5.

If you have a thread running many fast queries in a loop you can deny access to another thread that needs a lock. The other thread may get lucky and poll for the lock at the exact moment in between locks from the other thread, but it might not.


Hm, right. At that point, I guess it's time to increase throughput, but it's a fair observation.


> Is using SQLite from multiple processes safe?

Yes [0].

> The Python bindings weren't safe for concurrent access, is this still the case?

I think they're safe now [1]. The error message when using the connection from multiple threads is "outdated" [2].

[0]: https://www.sqlite.org/threadsafe.html [1]: https://bugs.python.org/issue45613 [2]: https://docs.python.org/3/library/sqlite3.html#sqlite3.threa...


> The error message when using the connection from multiple threads is "outdated"

I think typically you'd still want to use one connection per thread because (for example) there can only be one transaction per connection at any given time.


Excellent, thank you! Combined with WAL mode, it sounds like SQLite is eminently usable for most of my use cases.


It appears to be safe, but potentially slow, assuming you have working fcntl() locking, that is, for example, not using NFS. And you have to handle SQLITE_BUSY or set a timeout. More in their FAQ, Q5: https://www.sqlite.org/faq.html#q5

There's also "BEGIN CONCURRENT" with WAL mode: https://sqlite.org/src/doc/begin-concurrent/doc/begin_concur...


Interesting, thanks!

EDIT: Looks like it has a simple read/write lock, which isn't great for write-heavy workloads, but that's up to the application developer to decide.

BEGIN CONCURRENT looks great, however.


With WAL it's a generally smooth experience. Really depends on your use case though.

https://sqlite.org/wal.html


That looks great, thanks. I didn't realize it wasn't on by default, but it sounds like it'll greatly benefit the web app use case, where all web workers are on the same Linux machine.

Sounds like WAL is the way to go in most use cases, unless you want shared access from different machines.


The WAL doesn't work with shared directories (NFS/SMB) though.

Found this out when I tried to store the Plex data directory on a NSF share in a VM and it had really weird issues. Turned out Plex uses SQLite with WAL enabled.


That's point number two in the disadvantages:

> WAL does not work over a network filesystem.

Somewhat strange for plex to ship with that by default given how many people generally have network shares for such things.


Home page of SQLite uses SQLite concurrently.

https://www.sqlite.org/whentouse.html

If it exist a concurrency problem it should therefore either be in the bindings, operating system or the file system.

I only used SQLite in a non-concurrent fashion from PHP (single use command line scripts).

It should be easy to disprove with a concurrent test program if you can reliably tell when concurrency has failed.


This is just what I’ve heard so take with a grain of salt, but as I understand it sqlite is “mostly” safe to use concurrently. There are situations where it doesn’t behave correctly under concurrent load, the one I’ve heard about being when the sqlite is on a network-mounted drive. I’d love to hear from someone who knows more though.


That's kind of expected, I think. Due to its design, SQLite relies on filesystem semantics to provide atomicity. If the filesystem doesn't provide the semantics, it makes sense that atomicity will fail.

I'm more asking whether it's still unsafe to use in a filesystem that DOES provide those semantics.


Yeah it's safe.

The biggest problem SQLite has is it's size limitations. It can only hold ~281 TB in a database unfortunately. If you need more storage than that - that's the only reason I could endorse someone using a different database. :P


Haha, well, Postgres does have some niceties like isolation levels, JSON fields, etc.


SQLite also supports JSON fields and operators: https://www.sqlite.org/json1.html


Well at least I'm learning a ton of stuff about SQLite in this thread, thank you!


My experience is that it is only safe to have one connection to a SQLite database. Having multiple connections mostly works, however if the database is stored on a slow medium, such as an SD-card, concurrent queries will fail.


Hmm, fail completely, or just block?


One the queries will fail with something like SQLITE_BUSY


You need to explicitly set the “busy_timeout” PRAGMA to allow concurrent write queries to block under WAL mode. Otherwise the query will return a BUSY error.

I wrote up some tips/caveats on the Litestream site with more explanation. https://litestream.io/tips/

Generally there’s just a few settings you want to set when developing with SQLite:

PRAGMA journal_mode = wal;

PRAGMA busy_timeout = 5000;

Also worth setting STRICT mode and setting SYNCHRONOUS to NORMAL but those are less important.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: