How do you connect multiple data sources? I have a use case with multiple data sources, both batch and streaming, that I need to analyze together. I have used a database to consolidate the various sources, but that does not give me the real-time results I need. I am exploring https://getdozer.io/. Any suggestions or feedback?
Sounds like a great use case for Debezium (capturing changes from databases with low latency) and Apache Flink (processing these change event streams, e.g. filtering them, joining them, applying pattern searches, or pushing aggregated data to a dashboard).
Disclaimer: I work for Decodable, where we build a managed platform around these technologies and their use cases
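To make the "join change streams with other data" idea concrete, here is a rough sketch in plain Python. It is not Debezium or Flink code. It assumes change events shaped like Debezium's envelope (an `op` code plus `before`/`after` row images), while the tables, field names, and helpers (`apply_change`, `revenue_by_region`) are hypothetical. In a real pipeline Debezium would typically publish such events to Kafka and Flink would keep the join state and update the aggregate continuously.

```python
from typing import Any

Row = dict[str, Any]

# Hypothetical batch source: customer reference data loaded once (e.g. a nightly export).
CUSTOMERS: dict[int, Row] = {
    1: {"id": 1, "name": "Alice", "region": "EU"},
    2: {"id": 2, "name": "Bob", "region": "US"},
}


def apply_change(table: dict[int, Row], event: Row) -> None:
    """Apply one Debezium-style change event to an in-memory table keyed by 'id'."""
    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update: upsert the new row image
        row = event["after"]
        table[row["id"]] = row
    elif op == "d":  # delete: remove the row identified by the old row image
        table.pop(event["before"]["id"], None)


def revenue_by_region(orders: dict[int, Row], customers: dict[int, Row]) -> dict[str, float]:
    """Join the current orders with customer data and sum order amounts per region."""
    totals: dict[str, float] = {}
    for order in orders.values():
        customer = customers.get(order["customer_id"])
        if customer is None:
            continue  # inner-join semantics: skip orders without a matching customer
        region = customer["region"]
        totals[region] = totals.get(region, 0.0) + order["amount"]
    return totals


# Hypothetical stream of order change events in a Debezium-like envelope.
events: list[Row] = [
    {"op": "c", "before": None, "after": {"id": 10, "customer_id": 1, "amount": 30.0}},
    {"op": "c", "before": None, "after": {"id": 11, "customer_id": 2, "amount": 20.0}},
    {"op": "u", "before": {"id": 10, "customer_id": 1, "amount": 30.0},
     "after": {"id": 10, "customer_id": 1, "amount": 45.0}},
    {"op": "d", "before": {"id": 11, "customer_id": 2, "amount": 20.0}, "after": None},
]

orders: dict[int, Row] = {}
for e in events:
    apply_change(orders, e)
    # Recompute the aggregate after every event, as a dashboard feed might.
    print(revenue_by_region(orders, CUSTOMERS))
```

Running it prints `{'EU': 30.0}`, `{'EU': 30.0, 'US': 20.0}`, `{'EU': 45.0, 'US': 20.0}` and finally `{'EU': 45.0}`: the result tracks inserts, updates, and deletes as they arrive instead of waiting for a batch reload. A stream processor would update the aggregate incrementally rather than recomputing it from scratch.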
Thanks. Debezium looks interesting, but it is a bit different from what I am looking for. I want an API approach, so that other systems can pull the data without having to care whether it comes from a batch or a streaming source.
I’ve just learned about the Multi-Catalog feature of Apache Doris (an analytic database). It lets you connect to various external data sources without first migrating the data, and then query those sources as simply as you would query internal data. (https://doris.apache.org/docs/dev/lakehouse/multi-catalog/)
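In Doris itself, per the linked docs, external sources are registered with `CREATE CATALOG` and queried with three-part `catalog.database.table` names. The Python below is only a toy illustration of that federated-query idea, not Doris code: the `Federation` class, the catalog names `internal` and `lake`, and the sample tables are all hypothetical. It shows one lookup interface resolving qualified names to different backends, so the caller can join across sources without knowing where each table lives.

```python
import csv
import io
from typing import Any, Callable, Iterable

Row = dict[str, Any]
# A reader takes (database, table) and yields rows from its backend.
Reader = Callable[[str, str], Iterable[Row]]


class Federation:
    """Hypothetical toy: maps 'catalog.db.table' names to backend reader functions."""

    def __init__(self) -> None:
        self._catalogs: dict[str, Reader] = {}

    def register(self, name: str, reader: Reader) -> None:
        self._catalogs[name] = reader

    def scan(self, qualified_name: str) -> list[Row]:
        # Split the three-part name and delegate to the catalog's reader.
        catalog, db, table = qualified_name.split(".")
        return list(self._catalogs[catalog](db, table))


# Hypothetical "internal" catalog: rows already held in memory.
INTERNAL: dict[tuple[str, str], list[Row]] = {
    ("sales", "orders"): [{"order_id": "10", "customer_id": "1", "amount": 45.0}],
}


def internal_reader(db: str, table: str) -> Iterable[Row]:
    return INTERNAL[(db, table)]


# Hypothetical external catalog: CSV text standing in for files in a data lake.
LAKE_FILES: dict[tuple[str, str], str] = {
    ("crm", "customers"): "customer_id,name\n1,Alice\n2,Bob\n",
}


def lake_reader(db: str, table: str) -> Iterable[Row]:
    return csv.DictReader(io.StringIO(LAKE_FILES[(db, table)]))


fed = Federation()
fed.register("internal", internal_reader)
fed.register("lake", lake_reader)

# Join across both catalogs through the same scan() call.
customers = {c["customer_id"]: c["name"] for c in fed.scan("lake.crm.customers")}
report = [
    {"name": customers[o["customer_id"]], "amount": o["amount"]}
    for o in fed.scan("internal.sales.orders")
]
print(report)  # [{'name': 'Alice', 'amount': 45.0}]
```

The design point is the single naming scheme: adding another source means registering another reader, while the querying code stays unchanged. A real engine also pushes filters down to each source and plans the join, which this sketch does not attempt.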