He will also talk about how RethinkDB fits into the CAP theorem (they chose CA)
There's no such thing as "choosing CA". You can't "choose" to not have partitions any more than you can "choose" to never have hard drives fail. The question is: when things fail (yes, when, not if), how does your system respond?
Alas, I'm in Vancouver, not San Francisco, so I can't attend and rant about this in person.
CAP is about choosing which guarantees a system prioritizes. When you say P in CAP, it means "if there's a network partition that affects n machines, the rest of the system will take over that load and continue working just fine", for some number n. Saying "they chose CA" means they don't provide that guarantee, so there could be problems related to a network partition (which, by the way, is how SQL databases usually operate). In RethinkDB's case, they chose to sacrifice availability by default in those cases (but you can tune it).
Take Riak, for example: they chose "AP", which means the system will continue working just fine through a network partition at the expense of strong consistency (it offers eventual consistency instead). That doesn't mean the data isn't consistent at all, just that consistency isn't its main concern.
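To make "AP with eventual consistency" a bit more concrete, here is a toy sketch in Python. It assumes a simplified model: every replica accepts writes even while partitioned, and replicas converge after the partition heals by last-write-wins on a (logical timestamp, replica id) pair. The class and method names are hypothetical, and this is not how Riak is implemented; Riak's real conflict handling is more involved (it can keep conflicting siblings for the application to resolve).

```python
from dataclasses import dataclass, field


@dataclass
class LWWReplica:
    """Toy AP replica: always accepts writes, converges via last-write-wins."""
    replica_id: str
    clock: int = 0
    # key -> (logical timestamp, writer replica id, value)
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        # Accept the write locally even if peers are unreachable (availability).
        self.clock += 1
        self.data[key] = (self.clock, self.replica_id, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[2]

    def merge(self, other):
        # Anti-entropy after the partition heals: keep the entry with the
        # highest (timestamp, replica id); equal timestamps break ties by id.
        for key, entry in other.data.items():
            mine = self.data.get(key)
            if mine is None or entry[:2] > mine[:2]:
                self.data[key] = entry
        # Advance the logical clock past anything seen (Lamport-style).
        self.clock = max(self.clock, other.clock)


a, b = LWWReplica("a"), LWWReplica("b")
# Partition: both sides keep accepting writes.
a.write("cart", ["book"])
b.write("cart", ["lamp"])
b.write("cart", ["lamp", "mug"])
print(a.read("cart"), b.read("cart"))  # ['book'] ['lamp', 'mug']

# Partition heals: exchange state in both directions.
a.merge(b)
b.merge(a)
print(a.read("cart"), b.read("cart"))  # ['lamp', 'mug'] ['lamp', 'mug']
```

Both sides stayed writable the whole time and end up agreeing, but the `["book"]` write is silently lost. That is the price of "eventual" consistency under last-write-wins.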
"my past criticism of CAP not actually being about picking two of three out of C (consistency), A (availability), and P (partition tolerance) due to the fact that it does not make sense to reason about a system that is ‘CA’. (If there is no partition, any system can be both consistent and available --- the only question is what happens when there is a partition --- does consistency or availability get sacrificed?)"
To nail down some definitions: C.A.P. = Consistency, Availability, Partition tolerance.
If their goal is hardline 'C', that means (pedantically) that when a partition is detected, the database reports back to the application: "database partition exists. denying all reads and writes until resolved."
If you can't tolerate a partition but still want to claim 'A', you can stay read-available. ("Tolerating" here requires merging structured data cleverly, either with CRDT-like things ("eventually") or with last write wins, or, even better, random write wins, which is basically the traditional RDBMS approach.) The database can then report metadata back to the client saying you're in read-only mode due to partition crappage and data isn't going to update until the partition crappage resolves itself. Look! It's available! I can read from it!
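The two degraded modes described above (refuse everything vs. read-only with a metadata flag) can be sketched as a toy Python model. All names here (`Node`, `Policy`, `PartitionError`) are hypothetical and not any database's API. The sketch also assumes the partition is signalled by flipping a flag by hand, whereas a real system has to detect it itself.

```python
from enum import Enum


class Policy(Enum):
    REFUSE_ALL = "refuse_all"  # hardline 'C': deny reads and writes
    READ_ONLY = "read_only"    # serve (possibly stale) reads, deny writes


class PartitionError(Exception):
    pass


class Node:
    """Toy single-node view of a replicated store during a partition."""

    def __init__(self, policy):
        self.policy = policy
        self.partitioned = False
        self.data = {}

    def write(self, key, value):
        if self.partitioned:
            # Both policies refuse writes: a write accepted here could conflict
            # with one accepted on the other side of the partition.
            raise PartitionError("partition exists; writes denied until resolved")
        self.data[key] = value

    def read(self, key):
        if self.partitioned and self.policy is Policy.REFUSE_ALL:
            raise PartitionError(
                "partition exists; reads and writes denied until resolved")
        # Return metadata so the client knows the value may be stale.
        return {"value": self.data.get(key), "read_only": self.partitioned}


strict = Node(Policy.REFUSE_ALL)
relaxed = Node(Policy.READ_ONLY)
for node in (strict, relaxed):
    node.write("x", 1)
    node.partitioned = True

print(relaxed.read("x"))  # {'value': 1, 'read_only': True}
try:
    strict.read("x")
except PartitionError as exc:
    print("strict:", exc)
```

Neither mode lets a write through while partitioned; the only difference is whether the client gets a (flagged) read or an error.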
(notably: in Amazon's original Dynamo case, write availability was more important than read availability, which is where ESCAPE comes in ("Eventually Somewhat Consistently Available Partition tolerant Engine"))
Great doc reference, thanks. I've always been impressed by RethinkDB and so was surprised to see the blurb above about being "CA". Good to see that's not actually the party line.