I have a tool[1] that only worked with Docker before, and I had been putting off supporting Podman for a while because I thought it would take some time. But it turned out to work straight out of the box, with no tweaking. Essentially frictionless.
I have a small open-source project that uses docker compose behind the scenes to help start up any service. You could look at adding it in (or I am also happy to add it in), and then users are one command away from running it (insta moose). I recently added Lakekeeper and various data annotation tools.
Interesting. How do you handle dependencies between those pieces of infrastructure, if there are any? For example, in our Docker Compose file, we have temporal, which depends on postgres, and then moose depends on temporal. How is that expressed in Insta-Infra?
It leverages docker compose 'depends_on' for the dependencies (https://docs.docker.com/compose/how-tos/startup-order/). For example, airflow depends on the airflow-init container completing successfully, which in turn depends on postgres.
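That chain can be sketched in Compose like this (image tags and the healthcheck command are illustrative, not taken from the actual insta-infra config):

```yaml
services:
  postgres:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s

  airflow-init:
    image: apache/airflow:2.9.0
    depends_on:
      postgres:
        condition: service_healthy  # wait for the healthcheck to pass

  airflow:
    image: apache/airflow:2.9.0
    depends_on:
      airflow-init:
        condition: service_completed_successfully  # init must exit 0 first
```

With `docker compose up airflow`, Compose resolves the chain and starts postgres first, then runs airflow-init to completion, then starts airflow.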
This looks really cool. I'm surprised I didn't find this before when I was searching for something like this. I've been using jpackage[1] for a while now, but this seems like it would be easier to manage with JReleaser, given there is Gradle support.
Would this be a simple lift-and-shift job to move to JReleaser (as it seems to just use jpackage behind the scenes)? With jpackage, if you want to create a Windows exe, it needs to be built on Windows; similarly, a dmg has to be built on macOS and a deb on Linux. Does JReleaser also require this?
Given that JReleaser relies on jpackage to create native installers, yes, you must run it on the target platform. Luckily, it's not that complicated to do on GitHub Actions. JReleaser offers plenty of examples for different setups; here's how to do it for jpackage: https://jreleaser.org/guide/latest/examples/java/jpackage.ht...
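The usual pattern is an OS matrix so each installer is built on its native runner. A minimal sketch (the Gradle task name and versions are assumptions; check the JReleaser examples linked above for the real setup):

```yaml
jobs:
  package:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: '21'
      # Each runner produces only its own platform's installer
      # (deb on Linux, dmg on macOS, exe/msi on Windows).
      - name: Assemble native installer
        run: ./gradlew jreleaserAssemble
```

A later job can then collect the three artifacts and run the release step once.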
I've taken a stab at a solution via https://github.com/data-catering/data-caterer. It focuses on making integration tests easier by generating data across batch and real-time data sources while maintaining any relationships across the datasets. You can set it to automatically pick up the schema definition from your database's metadata and generate data for it. Once your app/job/data consumers use the data, you can run data validations to ensure everything behaved as expected. Then you can clean up the data at the end (including data pushed to downstream data sources) if running in a shared test environment or locally. All of this runs within 60 seconds.
It also gives you the option of running other types of tests, such as load/performance/stress testing, by generating larger amounts of data.
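The "maintaining relationships across datasets" part is the core idea. A stripped-down sketch of it in plain Python (not Data Caterer's actual API; names and schemas here are made up for illustration): child rows draw their foreign keys from the generated parent rows, so referential integrity holds by construction, and a post-run validation confirms it.

```python
import random
import string


def gen_customers(n):
    """Generate parent rows with unique ids."""
    return [
        {"id": i, "name": "".join(random.choices(string.ascii_lowercase, k=8))}
        for i in range(1, n + 1)
    ]


def gen_orders(customers, n):
    """Generate child rows whose customer_id is drawn from existing
    parent ids, preserving referential integrity by construction."""
    ids = [c["id"] for c in customers]
    return [
        {
            "order_id": i,
            "customer_id": random.choice(ids),
            "amount": round(random.uniform(1, 500), 2),
        }
        for i in range(1, n + 1)
    ]


def validate(customers, orders):
    """Post-run check: every order points at a real customer."""
    ids = {c["id"] for c in customers}
    return all(o["customer_id"] in ids for o in orders)


customers = gen_customers(10)
orders = gen_orders(customers, 50)
assert validate(customers, orders)
```

ML-based generators have to learn these constraints from data; a metadata-driven approach gets them for free because the relationship is part of the generation plan.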
Too many times I've been frustrated when an installation doesn't work the first time, or it has dependencies you haven't installed (or worse, you have a different version of). Then you end up in a deep rabbit hole you can't dig out of. Now every tool I make must have a quick start with a single command.
The world of mock data generation is now flooded with ML/AI solutions for generating data, but this one understands that it is better to generate metadata to guide the data generation. I found this to be the case because the ML/AI solutions rely on production data and retraining, are slow, demand huge resources, offer no guarantee against leaking sensitive data, and struggle to retain referential integrity.
As mentioned in the article, I think there is a lot of room for improvement in this area. I've been working on a tool called Data Caterer (https://github.com/data-catering/data-caterer), a metadata-driven data generator that can also run validations on the generated data. That gives you full end-to-end testing with a single tool. There are also other metadata sources besides LLMs that can help drive these kinds of tools (i.e. data catalogs, data quality tools).
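"Metadata-driven" concretely means the generator reads the schema out of the target system instead of being told it. A minimal sketch using SQLite (this is generic illustration, not Data Caterer's implementation): pull column names and types via `PRAGMA table_info`, then generate rows that fit them.

```python
import random
import sqlite3
import string

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, age INTEGER)")


def columns(conn, table):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]


def gen_value(sql_type):
    # Type-directed generation: the schema metadata decides what to produce.
    if sql_type.upper() == "INTEGER":
        return random.randint(0, 100)
    return "".join(random.choices(string.ascii_lowercase, k=10))


def gen_rows(conn, table, n):
    cols = columns(conn, table)
    return [tuple(gen_value(t) for _, t in cols) for _ in range(n)]


rows = gen_rows(conn, "users", 5)
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
```

The same idea scales up when the metadata source is a data catalog or data quality rules instead of the database itself: richer metadata (formats, ranges, allowed values) just means more precise generators per column.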
I recently went down the rabbit hole of using PyScript for running a Python CLI app in the browser.
It felt hacky the whole time, especially once dependencies were involved. I had to create wrapper classes to work around Pydantic 2.x not being available. I tried to put all the logic into the Python files but found some things missing that I had to implement in JavaScript instead.
I think it could be a good fit where you want a simple UI with custom UI logic on top of your Python code, but Streamlit or Gradio might be more suitable otherwise.
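For anyone curious what the dependency story looks like, the basic shape is declaring packages in a `<py-config>` block. A hedged sketch (the release URL and package are placeholders; PyScript's tags and loading mechanism have changed across versions, so check the docs for whatever version you pin):

```html
<html>
  <head>
    <!-- Pin a specific PyScript release; URL is illustrative -->
    <script defer src="https://pyscript.net/releases/X.Y.Z/core.js"></script>
  </head>
  <body>
    <py-config>
      packages = ["numpy"]
    </py-config>
    <py-script>
      import numpy as np
      print(np.arange(3))
    </py-script>
  </body>
</html>
```

Only packages with Pyodide-compatible builds resolve this way, which is exactly where the wrapper-class workarounds come in for anything that doesn't.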
[1] Tool for reference: https://github.com/data-catering/insta-infra