We used two types of datasets for post-training: supervised finetuning data and preference data for the RLHF stage. You can actually use fewer than 1M samples to significantly boost the aesthetics. Quality matters A LOT. Quantity helps with generalisation and stability of the checkpoints, though.
The highest quality finetuning data was hand curated internally.
I would say our post-training pipeline is quite similar to the SeedDream 2.0 ~ 3.0 series from ByteDance. Similar to them, we use extensive quality filters and internal models to get the highest quality possible. Even then, we still hand-curate a subset.
We have not added a separate RTX accelerated version for FLUX.1 Krea, but the model is fully compatible with the existing FLUX.1 dev codebase. I don't think we made a separate ONNX export for it, though. Doing a 4~8 bit quantized version with SVDQuant would be a nice follow-up so that the checkpoint is friendlier for consumer-grade hardware.
I've used it in multiple projects and found it strikes the right balance between ORMs and writing raw SQL. It's also easily extensible and takes care of the many edge cases and nuances of rolling your own SQL generator.
Since SQL is ever-so-slightly different across databases, I imagine trying to cover all of them as a single dev is a nightmare (especially if that's not the problem you're trying to solve).
I wrote my own query builder because I know for sure I'm only targeting SQLite. The second I need my feed reader library to work with another database engine, I'm dumping my own for something more serious – either a full-blown database abstraction layer like SQLAlchemy or Peewee (likely without the ORM part), or something simpler like PyPika or python-sql.[1]
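A single-dialect builder really can stay tiny, because you only ever emit SQLite's flavor of SQL and its `?` placeholders. A minimal sketch of the idea (the function and schema here are my own illustration, not the commenter's actual library):

```python
import sqlite3

def select(table, columns=("*",), where=None, limit=None):
    """Build a parameterized SELECT targeting SQLite only.

    `where` is a dict of column -> value, ANDed together.
    Returns (sql, params) ready for sqlite3 execute().
    """
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if where:
        clauses = []
        for col, val in where.items():
            clauses.append(f"{col} = ?")  # SQLite-style placeholder
            params.append(val)
        sql += " WHERE " + " AND ".join(clauses)
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)
    return sql, params

# Usage against an in-memory database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER, feed TEXT, read INTEGER)")
conn.execute("INSERT INTO entries VALUES (1, 'hn', 0), (2, 'lobsters', 1)")

sql, params = select("entries", ("id", "feed"), where={"read": 0}, limit=10)
rows = conn.execute(sql, params).fetchall()
```

The moment you need Postgres's `%s` placeholders or MySQL's backtick quoting, every one of these string-building branches forks, which is exactly why the general-purpose libraries exist.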
Average dev salary at my org is north of 100k, but I do work in the US, so let's say we have a developer earning half that. At $125 a month, it works out to around 3% of monthly compensation (not including taxes and benefits). This only has to improve productivity by a tiny amount to be worth it.
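The arithmetic behind that 3% figure checks out:

```python
annual_salary = 50_000            # half of a ~$100k US dev salary
monthly_comp = annual_salary / 12  # ~$4,167/month, before taxes/benefits
tool_cost = 125                    # $/seat/month
share = tool_cost / monthly_comp
print(f"{share:.1%}")
```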
I think it'd be really hard to prove any single tool provides you a specific productivity boost. Most engineers probably have a tool-set, which means all those tools work together nicely. Taking one of them out, as well as adding one, might break the whole setup. Usually established engineers are not working in a vacuum; they already have their setups in place, so a 3% extra cost might be very hard to justify for very unclear benefits, if any. I'm not trying to make a definitive argument, just some food for thought.
That is not how it works. Having a water cooler in the office increases productivity 100x versus not having water; that does not mean you should pay millions for one. That is a myth invented by SaaS vendors and consultants to justify their sky-high prices. The price of course factors in the value offered, but many other things too (scarcity of the materials and resources to produce the good, cost of production, maintenance cost, prices of your competitors' products, risk of vendor lock-in, etc.)
Plenty of places pay X for tools that add more than X in productivity value.
In fact, nearly every tool I have ever gotten at a company worked like this. Most companies are also willing to test pricey tools to see if they would pay off, and when they do, the company starts buying such tools.
If you don't work at such a place, look for a place that values developer time.
I worked for a Fortune 20 company, so you can stop the patronizing tone. Paper and a pencil also increase productivity by a lot (perhaps more than any other tool); that does not mean you need to pay 5% of your developers' monthly salary for them.
But you likely would pay 5% of salary (or more) for paper and a pencil (to continue with your analogy) if you had no other choice and there was no alternative tool that could substitute. So I'm not sure what point you are trying to make.
Create a landing page. I will write a couple of posts saying that Sam Altman and Paul Graham are the smartest guys this century, and we will soon be launching a Show HN, new YC company.
It's the enterprise version. They can afford it. Besides, it's not like everyone in the org will have a seat; only the people doing data science.
I work in a very large enterprise that could definitely benefit from such a product, but at that cost I'm not even going to try mentioning it; it's never going to pass.
We have engineers in the thousands so that would be a budget in the millions per year, for a tool for which it's hard to demonstrate the productivity benefit over alternatives. Not going to happen.
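The scale is easy to sanity-check: at the ~$125/seat/month figure from upthread, even a conservative headcount lands in the millions per year (2,000 engineers is my own illustrative number for "thousands"):

```python
engineers = 2_000   # hypothetical headcount, low end of "thousands"
seat_cost = 125     # $/seat/month, figure from upthread
annual = engineers * seat_cost * 12
print(f"${annual:,}/year")
```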
It's not that the company can't afford it, but enterprises are strict when it comes to spending money. There are lots of processes, checks, and people that sign off.
On-prem (even in private cloud) solutions like this always seem to be pricier; they include dedicated support too. There's a cheaper $19/month tier and a free tier at https://datalore.jetbrains.com/