We haven't implemented disk spilling for aggregations yet. Aggregation differs from the algorithms discussed in this article in that the amount of memory used is proportional to the number of aggregation groups (buckets) rather than the number of input tuples. This makes it less likely that an aggregation will need to spill to disk, and it calls for a slightly specialized solution for serializing intermediate aggregation results.
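As a rough illustration of that memory behavior, here is a minimal sketch (with made-up names, not our actual operator code) of a hash aggregator computing a per-group SUM: memory grows with the number of distinct groups, while input tuples just stream through.

```go
package main

import "fmt"

// sumByGroup aggregates SUM(value) per grouping key. Memory usage is
// proportional to the number of distinct keys (one bucket per group),
// not the number of input tuples, which are consumed one at a time.
func sumByGroup(keys []string, values []int64) map[string]int64 {
	buckets := make(map[string]int64) // one entry per aggregation group
	for i, k := range keys {
		buckets[k] += values[i] // only the intermediate result stays in memory
	}
	return buckets
}

func main() {
	// A million input tuples with only three distinct groups would
	// still allocate just three buckets.
	keys := []string{"a", "b", "a", "c", "b"}
	values := []int64{1, 2, 3, 4, 5}
	fmt.Println(sumByGroup(keys, values)) // map[a:4 b:7 c:4]
}
```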
Sorting is a good idea. The in-memory aggregator could keep track of each bucket's grouping columns, sort both the buckets and the input that has not yet been processed, and then perform an ordered aggregation over the sorted input, using each group's already-computed intermediate result as the starting point for its final aggregation result.
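Here is a minimal sketch of that idea for a SUM aggregate, with hypothetical types rather than our actual column batches: the partial buckets and the leftover input are both sorted by grouping key, and a single merge pass finishes each group, seeded with its intermediate result.

```go
package main

import (
	"fmt"
	"sort"
)

// bucket holds the intermediate SUM for one aggregation group.
type bucket struct {
	key string
	sum int64
}

// tuple is one unprocessed input row: a grouping key and a value.
type tuple struct {
	key   string
	value int64
}

// sortSpillAggregate sketches the sort-based fallback: the partially
// aggregated in-memory buckets and the not-yet-processed input are both
// sorted by grouping key, and one ordered pass then finishes each group.
func sortSpillAggregate(partial []bucket, remaining []tuple) []bucket {
	sort.Slice(partial, func(i, j int) bool { return partial[i].key < partial[j].key })
	sort.Slice(remaining, func(i, j int) bool { return remaining[i].key < remaining[j].key })

	var out []bucket
	i := 0 // cursor into the sorted partial results
	var cur bucket
	open := false
	for _, t := range remaining {
		if !open || cur.key != t.key {
			if open {
				out = append(out, cur)
			}
			// Partial groups with smaller keys received no new input;
			// their intermediate results are already final.
			for i < len(partial) && partial[i].key < t.key {
				out = append(out, partial[i])
				i++
			}
			cur = bucket{key: t.key}
			if i < len(partial) && partial[i].key == t.key {
				cur.sum = partial[i].sum // seed from the intermediate result
				i++
			}
			open = true
		}
		cur.sum += t.value
	}
	if open {
		out = append(out, cur)
	}
	return append(out, partial[i:]...)
}

func main() {
	partial := []bucket{{"a", 10}, {"c", 5}}           // intermediate results
	remaining := []tuple{{"b", 2}, {"a", 1}, {"b", 3}} // unprocessed input
	fmt.Println(sortSpillAggregate(partial, remaining))
	// Output: [{a 11} {b 5} {c 5}]
}
```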
Another option is to partition the buckets and the remaining input by grouping key, which would subdivide the aggregation into independent pieces and avoid a sort entirely.
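A minimal sketch of the partitioning variant, again with invented names: hashing the grouping key routes a group's bucket and all of its remaining input to the same spill partition, so each partition can be finished independently (and re-partitioned recursively if it is still too large).

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const numPartitions = 4

// partitionOf routes a grouping key to one of numPartitions spill
// partitions. A group's bucket and its remaining input always land
// in the same partition.
func partitionOf(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % numPartitions
}

func main() {
	partialSums := map[string]int64{"a": 10, "c": 5} // intermediate results
	remainingKeys := []string{"b", "a", "b"}         // unprocessed input
	remainingVals := []int64{2, 1, 3}

	// Route the partial buckets and the unprocessed input by key.
	partSums := make([]map[string]int64, numPartitions)
	partInput := make([]map[string][]int64, numPartitions)
	for p := range partSums {
		partSums[p] = map[string]int64{}
		partInput[p] = map[string][]int64{}
	}
	for k, s := range partialSums {
		partSums[partitionOf(k)][k] = s
	}
	for i, k := range remainingKeys {
		p := partitionOf(k)
		partInput[p][k] = append(partInput[p][k], remainingVals[i])
	}

	// Finish each partition independently: seed from the partial sum,
	// then fold in that partition's share of the input. No sort needed.
	for p := 0; p < numPartitions; p++ {
		for k, vals := range partInput[p] {
			for _, v := range vals {
				partSums[p][k] += v
			}
		}
		if len(partSums[p]) > 0 {
			fmt.Println("partition", p, partSums[p])
		}
	}
}
```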
It's hard to say whether one would generally be a bigger gain than the other without experimenting. We chose to focus on just execution because it was a smaller project (with fewer side effects) that still promised a large performance improvement. In the future, we might consider keeping data in columnar format on learner replicas to offer even better performance for users who would like to run OLAP-style queries on slightly stale data, but this would be a larger project.
CockroachDB does target OLTP workloads. Note that the vectorized SQL engine covered in this blog post is execution-only (and is used only for queries that operate on many rows). The storage layer remains row-oriented, so write performance is not affected; rows are columnarized before the execution engine processes them.
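To make that last point concrete, here is a minimal sketch of columnarization with hypothetical types (not our actual batch representation): rows coming out of storage are transposed into fixed-size batches with one tightly packed slice per column, which is the layout the vectorized operators consume.

```go
package main

import "fmt"

// row is a row-oriented tuple as produced by the storage layer.
type row struct {
	id    int64
	price float64
}

// batch is a columnar batch: one slice per column.
type batch struct {
	ids    []int64
	prices []float64
}

// columnarize transposes row-oriented storage output into fixed-size
// column batches before execution.
func columnarize(rows []row, batchSize int) []batch {
	var batches []batch
	for start := 0; start < len(rows); start += batchSize {
		end := start + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		b := batch{
			ids:    make([]int64, 0, end-start),
			prices: make([]float64, 0, end-start),
		}
		for _, r := range rows[start:end] {
			b.ids = append(b.ids, r.id)
			b.prices = append(b.prices, r.price)
		}
		batches = append(batches, b)
	}
	return batches
}

func main() {
	rows := []row{{1, 9.5}, {2, 3.0}, {3, 7.25}}
	for _, b := range columnarize(rows, 2) {
		fmt.Println(b.ids, b.prices)
	}
	// [1 2] [9.5 3]
	// [3] [7.25]
}
```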