
AFAICT it is very far from supporting full PostgreSQL.


As an aside, can somebody explain the constraints FoundationDB puts on replica distance?


FoundationDB cannot currently replicate its transaction log across datacenters, either synchronously or asynchronously: https://forums.foundationdb.org/t/multi-dc-replication/499

In the discussed proposal, datacenter failure is treated as an extraordinary event with major performance implications, which is different from a global "write anywhere, read anywhere" multi-datacenter partitioned log model like Fauna's.


FoundationDB has long supported both synchronous and asynchronous replication across regions, and its major users use multi-regional configurations. In the former mode, you will see 1xRTT latency for commits [1] from the active datacenter and 2xRTT latency for commits from other datacenters. In the latter mode, commits from the active datacenter are fast (0xRTT) but the durability property must be sacrificed if a region fails. I believe almost all users currently use asynchronous replication because the performance is so much better (than any geographically replicated synchronous database).

The new "satellite" replication model in FoundationDB 6.0 allows the best of both worlds: if you have two or more datacenters in each region (each storing transaction logs, but only one storing actual data) then you can do synchronous (0xRTT) commits to the regional datacenters and asynchronous replication to the other region(s). If there are failures in one region, the database can quickly and automatically fail over to the other region, while maintaining durability (by getting the last bits of transaction log from at least one datacenter in the failing region). Even if a whole region fails, as long as the failures of the datacenters and network links in it are spread out over time the database will complete a full ACID recovery. And in the worst case you can fall back to ACI recovery as in asynchronous replication.
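A toy model of that satellite arrangement (my own simplification, with invented names like `Region` and `commit`; this is not FoundationDB's actual code) might look like:

```python
# Toy model of "satellite" replication: a commit is acknowledged once the
# log entry is durable in the primary datacenter plus at least one
# same-region satellite; the remote region catches up asynchronously.

class Region:
    def __init__(self, name, datacenters):
        self.name = name
        self.logs = {dc: [] for dc in datacenters}

pending = []  # cross-region log entries waiting to ship

def commit(primary, remote, entry, satellite_quorum=1):
    dcs = list(primary.logs)
    primary_dc, satellites = dcs[0], dcs[1:]
    # Synchronous, intra-region (low latency): primary DC + satellite quorum.
    primary.logs[primary_dc].append(entry)
    for dc in satellites[:satellite_quorum]:
        primary.logs[dc].append(entry)
    # Asynchronous, cross-region: acknowledged before the remote region has it.
    pending.append((remote, entry))
    return "committed"

def replicate_async():
    # Runs later. If the primary region fails, durability survives as long
    # as some satellite still holds the tail of the log to hand over.
    while pending:
        remote, entry = pending.pop(0)
        for dc in remote.logs:
            remote.logs[dc].append(entry)

west = Region("us-west", ["dc1", "dc2"])
east = Region("us-east", ["dc3"])
commit(west, east, "txn-1")
assert west.logs["dc2"] == ["txn-1"]  # durable in-region at commit time
assert east.logs["dc3"] == []         # not yet shipped cross-region
replicate_async()
assert east.logs["dc3"] == ["txn-1"]
```

The point of the sketch is the latency split: the synchronous hop stays inside one region, while the cross-region hop is off the commit path.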

The question you are linking to is asking about something different, the ability to have different parts of a single database have subgeographic latencies in different regions.

[1] You will also see 1xRTT latencies when you start transactions, but you can solve this by setting the causal_read_risky transaction option on all read/write transactions. This is not actually risky because if the reads in the transaction turn out not to be externally consistent, the transaction will not commit.


Very interesting. People really care about latency, so we are also looking at more datacenter-local journaling schemes that maintain consistency at the expense of a theoretically unlikely hit to durability.

What do you mean, "active datacenter"? Can all datacenters accept transactions concurrently?


If you are using Raft for replication, at any given time your replica set has a leader and it is located somewhere, and I would assume that writes from near there are faster than from anywhere else. In FoundationDB this is handled at a somewhat higher layer of the system, and since our transaction pipeline is (very roughly) resolve conflicts -> transaction log -> storage rather than transaction log -> resolve conflicts -> storage, we are also doing conflict resolution in that region.
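A rough sketch of that ordering (purely illustrative; the names and data structures here are mine, not FoundationDB's) is that only transactions that pass conflict resolution ever reach the log:

```python
# Toy "resolve conflicts -> transaction log -> storage" pipeline:
# a transaction is aborted (and never logged) if any key it read was
# written after its read version.

log = []
last_write_version = {}  # key -> commit version of its last write
version = 0

def try_commit(read_set, write_set, read_version):
    global version
    # 1. Resolve conflicts: abort on any stale read.
    for key in read_set:
        if last_write_version.get(key, 0) > read_version:
            return None  # conflict -> abort; nothing is logged
    # 2. Transaction log: append only after passing resolution.
    version += 1
    log.append((version, dict(write_set)))
    for key in write_set:
        last_write_version[key] = version
    return version  # 3. storage servers apply the log later

v1 = try_commit(read_set={"x"}, write_set={"x": 1}, read_version=0)
v2 = try_commit(read_set={"x"}, write_set={"x": 2}, read_version=0)
assert v1 == 1
assert v2 is None          # lost the conflict: "x" changed after its read
assert len(log) == 1       # the aborted transaction never hit the log
```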

Moreover, most current users of FoundationDB aren't willing to accept even 1xRTT latencies anywhere in their transaction lifecycle, so they can't abstract away which region is the fast one. A common way (though not the only way) to set things up is that your whole application (not just the database) is active in region A, but prepared to fail over at a moment's notice to run in region B. Ideally this comes at basically no performance penalty relative to a single-region setup in region A. Alternatively, it's possible to read or write the database from passive region(s) (and individual reads will normally happen from the local region with low latency), but each transaction has to pay a geographic latency at some point (or, in the case of read-only transactions, accept reduced consistency).

I think it would be possible to implement something analogous to satellite replication for Raft. It's a really nice set of tradeoffs for many applications, and if your cloud provider or equivalent has their act at all together instantaneous failures of all the datacenters in a region or their networking should really be pretty rare.



"The current public version of Spanner does not support client-side interactive transactions either."

Did not know that...why?


I can't comment on Google's decision here. I don't see any technical reason why they can't support client-side interactive transactions. But I'm happy to comment on technical points I made in the post ...


(Disclaimer: I work on Cloud Spanner).

Cloud Spanner fully supports interactive read-write transactions.

I'm not sure what the source of the confusion here is. Maybe Daniel is using a new definition of "client-side interactive transactions" that I'm unfamiliar with. :)


Our source for that was this post: https://quizlet.com/blog/quizlet-cloud-spanner (SQL Dialect)

Maybe that's no longer the case?


Yeah, I've read that post, and I have no idea where it gives the impression that we don't support interactive transactions. What paragraph are you looking at?

Could you be referring to the fact that we make you do writes via a mutation API rather than DML? Obviously that has no impact on interactivity...


Yes, our misunderstanding and we misinformed Daniel. Fixed, and thank you.

It would be cool to know why Spanner is like that.


(disclosure: CockroachDB founder) The reason I've heard is that Spanner uses a separate mutation API instead of SQL DML because of a quirk of its transaction model. Writes within a transaction are not visible to subsequent reads within the same transaction (source: https://cloud.google.com/spanner/docs/transactions#rw_transa...). This is different from other SQL databases, so the use of a different API forces you to think about your read/write transactions in a Spanner-specific way.

(FWIW, CockroachDB does not have this limitation - transactions can read their own uncommitted writes, just like in other databases)
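A toy illustration of the two behaviors (the class names and API are invented for the example, not either database's real client code):

```python
# Spanner-style: writes go into a mutation buffer and reads see only the
# snapshot. CockroachDB-style: reads consult the transaction's own write
# buffer first, so uncommitted writes are visible to later reads.

snapshot = {"balance": 100}

class MutationBufferTxn:                      # Spanner-style model
    def __init__(self):
        self.buffer = {}
    def read(self, key):
        return snapshot[key]                  # buffered writes NOT visible
    def write(self, key, value):
        self.buffer[key] = value              # applied only at commit

class ReadYourWritesTxn(MutationBufferTxn):   # CockroachDB-style model
    def read(self, key):
        return self.buffer.get(key, snapshot[key])

t1 = MutationBufferTxn()
t1.write("balance", 50)
assert t1.read("balance") == 100  # still sees the snapshot value

t2 = ReadYourWritesTxn()
t2.write("balance", 50)
assert t2.read("balance") == 50   # sees its own uncommitted write
```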


+1, also curious about this. I speculated that Cloud Spanner is supposed to be F1[1], but the fact that F1 seems to fully support SQL DML makes this difference even more perplexing.

> Updates are also supported using SQL data manipulation statements, with extensions to support updating fields inside protocol buffers and to deal with repeated structure inside protocol buffers.

[1] https://research.google.com/pubs/pub41344.html


Sorry about that. As Evan said -- that sentence was based on something he told me. The post has now been fixed. My apologies.


"FaunaDB implements a process scheduler that dynamically allocates resources and enforces quality of service...Execution proceeds via cooperative multitasking."

Just like Windows 95! I don't think I've ever seen a database do this. Apparently it operates per-query, or at least per-application. The operating-system comparison seems right; this is a long paper.


Presto does this as well. Execution is done by "drivers" which move data pages between operators. It's a cooperative push model rather than a Volcano (iterator) style pull model. There are typically more drivers than worker threads. Drivers are assigned to threads using multilevel feedback queues (inspired by OS schedulers).
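In miniature, the push model looks something like this (my sketch, not Presto's Java implementation; a plain round-robin stands in for the multilevel feedback queues, but the driver/yield shape is the same):

```python
# Cooperative push scheduling: "drivers" are generators that process one
# page per step and yield; one worker thread multiplexes many drivers.
from collections import deque

def driver(name, pages, out):
    for page in pages:
        out.append((name, page))  # push one page downstream
        yield                     # cooperatively give the thread back

out = []
runnable = deque([driver("a", [1, 2, 3], out), driver("b", [10, 20], out)])

# Worker loop: run each driver for one step, then requeue it. A real
# scheduler would requeue into feedback queues instead of round-robin.
while runnable:
    d = runnable.popleft()
    try:
        next(d)
        runnable.append(d)
    except StopIteration:
        pass  # driver finished; drop it

assert out == [("a", 1), ("b", 10), ("a", 2), ("b", 20), ("a", 3)]
```

Because the drivers yield voluntarily, there are no preemption or thread-per-query costs, which is what makes "more drivers than worker threads" cheap.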


SQL Server also uses its own scheduler inside ;)


Dynamic isolation sounds cool especially without the overhead of containers. Seems obvious in hindsight I guess.


The post says the driver was benched at 25k rps. But yeah, that C implementation sucks.


AFAIK the issue is how they call listen() on the socket, and whether accepts are handled concurrently in some way.
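If that's the problem, the usual fix looks something like this (an assumption on my part, I haven't read the driver's code; the function name is made up):

```python
# Sketch of a listener set up for concurrent accepts: a deep listen()
# backlog so connection bursts aren't dropped, and SO_REUSEPORT so several
# worker processes can bind the same port and accept in parallel instead
# of serializing on one accept loop. (SO_REUSEPORT is Linux/BSD-specific.)
import socket

def make_listener(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(1024)  # large backlog; the classic mistake is listen(1)
    return s
```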


You didn't strace it to see what was going on?

