mihevc's comments

mihevc · 2025-12-08T18:38:41 1765219121

How does this compare to https://github.com/Query-farm/tributary ?

rustyconover · 2025-12-08T19:10:12 1765221012

The next major release of Tributary will support Avro, Protobuf and JSON along with the Schema Registry it will also bring the ability to write to Kafka with transactions.

But really you should get excited for DuckDB Labs to build out materialized views. Materialized views where you can ingest more streaming data to update aggregates. This way you could just keep pushing rows through aggregates from Kafka.

It is going to be a POWER HOUSE for streaming analytics.

Contact DuckDB Labs if you want to sponsor the work on materialized views: https://duckdb.org/roadmap

buremba · 2025-12-08T21:18:28 1765228708

Exactly. I have also been playing with DuckDB for streaming use cases, but it feels hacky to issue micro-batching queries on streaming data in short intervals.

DuckDB has everything that streaming engines such as Flink have; it just needs to support managing intermediate aggregate states and scheduling the materialized views itself.

trueno · 2025-12-09T03:27:51 1765250871

Is this to be used in an analytics application backend sort of scenario?

I am familiar with materialized views / dynamic tables from enterprise-grade cloud lake type offerings, but I've never quite understood where duckdb, though impressive, fits into everyones use case. I've toyed with it for personal things, it's very cool having a local instance of something akin to snowflake when it comes to processing and aggregating on Big Data™ but generally I don't see it used in operational settings. For application development people are generally tied to sqlite and postgres.

It all does seem really cool though, I guess I'm just not feeling creative enough to conjure up a stream-to-duckdb use case. Feel free to bombard me with cool ideas.

dm03514 · 2025-12-08T18:53:51 1765220031

Oh yes!! I've seen this a couple times. I am far from an expert in tributary so please take with a grain of salt.

Based on the tributary documentation, I understand that tributary embeds kafka consumers into duckdb. This makes duckdb the main process that you run to perform consumption. I think that this makes creating stream processing POCs very accessible. It looks like it is quite easy to start streaming data into duckdb. What I don't see is a full story around Devops, operations, testing, configuration as code etc.

SQLFlow is a service that embeds DuckDB as the storage and processing brains. Because of this, we're able to offer metrics, testing utilities, pipelines as code, and all the other DevOps utilities that are necessary to run a huge number of streaming instances 24x7. SQLFlow was created as a tool that I wish I had to for simple stream processing in production in high availability contexts :)

mihevc · 2025-12-08T19:05:58 1765220758

Nice! Thanks for the context, it's great to know!

mihevc · 2025-01-22T13:08:44 1737551324

I was there a week ago and it still wasn't back. The penis room was wide open though.

rawgabbit · 2025-01-23T17:01:35 1737651695

When I to the Naples Archaeological museum website, I get the following.

NOTICE Il Mosaico di Alessandro proveniente dalla Casa del fauno di Pompei è in restauro. Le indagini diagnostiche condotte negli ultimi anni hanno messo in luce una serie di criticità nello stato di conservazione del capolavoro. Attualmente è in corso di ultimazione la prima fase di lavoro, che ha riguardato la progettazione e la realizzazione del sistema di movimentazione del mosaico, il ribaltamento di quest'ultimo e a seguire una ulteriore campagna di indagini diagnostiche e saggi che permetteranno di completare il quadro della conoscenza del manufatto. Seguirà la seconda fase di lavoro, che prevede prima il progetto di restauro e poi l'esecuzione dei relativi interventi sia sulla parte posteriore che su quella frontale. Il pubblico può assistere alle attività del cantiere grazie a due distinti punti di osservazione appositamente predisposti nella Collezione Mosaici.

mihevc · on Jan 10, 2023

Here are some cookbook examples: https://arrow.apache.org/cookbook/py/data.html#group-a-table, https://arrow.apache.org/cookbook/. Datasets would probably be a good approach for the billions size, see: https://blog.djnavarro.net/posts/2022-11-30_unpacking-arrow-...

mihevc · on June 14, 2021

I wrote my master thesis on Acer C720 (2013) running xfce. I used some octave but I mostly needed it for latex. These days I don't use it anymore because the screen is not that great but it aged incredibly well.

mihevc · on Jan 1, 2021

Yet another format is CSF (Compressed Sparse Fiber): http://glaros.dtc.umn.edu/gkhome/node/1177

It's a generalized CSR/CSC (to ndim > 2) format that uses a tree data structure where each path from root to leaf encodes a nonzero element.

mihevc · on Jan 1, 2021

Nice post! For completeness: Apache Arrow is also adding sparse tensors in c++ [1] and wrapping them with python [2] (although the documentation for Python might be a bit lacking at the moment).

[1] https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=...

[2] https://github.com/apache/arrow/blob/master/python/pyarrow/t...