There's no lockfile or anything with this approach right? So in a year or two all of these scripts will be broken because people didn't pin their dependencies?
> So in a year or two all of these scripts will be broken because people didn't pin their dependencies?
People act like this happens all the time but in practice I haven't seen evidence that it's a serious problem. The Python ecosystem is not the JavaScript ecosystem.
I think that's because you don't maintain much Python code or use many third-party libraries.
An easy way to prove that this is the norm is to take some existing code you have now, update its dependencies to their latest versions, and watch everything break. You don't see the problem because those dependencies use pinned/very restricted versions, which hides its frequency from you. You'll also see that, in their issue trackers, they've closed all sorts of version-related bugs.
Are you sure you’re reading what I wrote fully? Getting pip, or any other installer, to ignore all version requirements, including those listed by the dependencies themselves, required modifying the source, last I tried.
I’ve had to modify code this week due to changes in some popular libraries. Some recent examples: NumPy 2.0 broke most code that used NumPy. They changed the C side (full interpreter crashes with trimesh) and removed/moved common functions, like array.ptp(). SciPy moved a bunch of stuff lately, and fully removed some image-related things.
If you think Python libraries are somehow stable over time, you just don't use many.
... So if the installer isn't going to ignore the version requirements, and thereby install an unsupported package that causes a breakage, then there isn't a problem with "scripts being broken because people didn't pin their dependencies". The packages listed in the PEP 723 metadata get installed by an installer, which resolves the listed (unpinned) dependencies to concrete ones (including transitive dependencies), following rules specified by the packages.
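For concreteness, this is roughly what the PEP 723 block in a standalone script looks like (the package names here are just illustrative); an installer such as uv or pipx reads it and resolves the listed requirements, plus their transitive dependencies, according to the packages' own rules before running the script:

    # /// script
    # requires-python = ">=3.9"
    # dependencies = [
    #     "requests",     # unpinned: the resolver picks the newest compatible release
    #     "rich>=13",     # lower bound only, no upper cap
    # ]
    # ///
    import requests
    from rich import print

    print(requests.get("https://example.com").status_code)

Running it with e.g. `uv run script.py` or `pipx run script.py` creates an environment on the fly from that block.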
I thought we were talking about situations in which following those rules still leads to a runtime fault. Which is certainly possible, but in my experience a highly overstated risk. Packages that say they will work with `foolib >= 3` will very often continue to work with foolib 4.0, and in the Python world the risk that they don't is commonly considered worth it to avoid the other problems caused by specifying `foolib >=3, <4` (as described in e.g. https://iscinumpy.dev/post/bound-version-constraints/ ).
The real problem is that there isn't a good way (from the perspective of the intermediate dependency's maintainer) to update the metadata after you find out that a new version of a (further-on) dependency is incompatible. You can really only upload a new patch version (or one with a post-release segment in the version number) and hope that people haven't pinned their dependencies so strictly as to exclude the fix. (Although they shouldn't be doing that unless they also pin transitive dependencies!)
That said, the end user can add constraints to Pip's dependency resolution by just creating a constraints file and specifying it on the command line. (This was suggested as a workaround when Setuptools caused a bunch of legacy dependencies to explode - not really the same situation, though, because that's a build-time dependency for some packages that were only made available as sdists, even pure-Python ones. Ideally everyone would follow modern practice as described at https://pradyunsg.me/blog/2022/12/31/wheels-are-faster-pure-... , but sometimes the maintainers are entirely MIA.)
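As a sketch of that workaround: put the extra pins in a constraints file and pass it to Pip with -c/--constraint. Constraints only restrict versions, they never add packages to the install set (the pins below are made up for illustration):

    # constraints.txt
    setuptools<58        # hypothetical pin to keep a legacy sdist building
    foolib>=3,<4         # end-user cap on the example package from upthread

and then:

    pip install -c constraints.txt somepackage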
> NumPy 2.0 is a very recent example that broke most code that used NumPy.
This is fair to note, although I haven't seen anything like a source that would objectively establish the "most" part. The ABI changes in particular are only relevant for packages that were building their own C or Fortran code against Numpy.
> `foolib >= 3` will very often continue to work with foolib 4.0,
Absolute nonsense. It's an industry standard that major versions are reserved for breaking changes. This is why you never see >= in any sane requirements list; you see `foolib == 3.*`. For anything you want to keep working for a reasonable amount of time, you see == 3.4.*, because deprecations often still happen within major versions, breaking all code that used those functions.
Breaking changes don't break everyone. For many projects, only a small fraction of users are broken at any given time. Firefox is on version 139 (similarly Chrome and other web browsers); how many times have you had to reinstall your plugins and extensions?
For that matter, have you seen any Python unit tests written before the Pytest 8 release that were broken by it? I think even ones that I wrote in the 6.x era would still run.
For that matter, the Python 3.x bytecode changes with every minor revision and things get removed from the standard library following a deprecation schedule, etc., and there's a tendency in the ecosystem to drop support for EOL Python versions, just to not have to think about it - but tons of (non-async) new code would likely work as far back as 3.6. It's not hard to avoid the := operator or the match statement (f-strings are definitely more endemic than that).
Agreed, this is a big problem, and exactly why people pin their dependencies, rather than leaving them wide open: pinning a dependency guarantees continued functionality.
If you don't pin your dependencies, you will get breakage, because your dependencies can have breaking changes from version bumps. If your dependencies don't fully pin, then they will get breaking changes from what they rely on. That's why exact version numbers are almost always pinned for anything that gets distributed: it's a frequent problem that you don't want the end user having to deal with.
Again, you don't see this problem often because you're lucky: you've installed at a time when the dependencies have already resolved all the breakage or, in the more common case, the dependencies were pinned tightly enough that those breaking changes were never an issue. In other words, everyone pinning their dependencies strictly enough is already the solution to the problem. The tighter the restriction, the stronger the guarantee of continued functionality.
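To tie this back to the scripts being discussed: nothing stops a script author from pinning exactly inside the PEP 723 block itself, which is the moral equivalent of a lockfile for a single-file script (the versions below are purely illustrative):

    # /// script
    # dependencies = [
    #     "numpy==1.26.4",
    #     "trimesh==4.4.0",   # illustrative pin, not a recommendation
    # ]
    # ///

That gives a single-file script the same guarantee a lockfile gives a project.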
1. Great Python support. Piping something from a structured data catalog into Python is trivial, and so is persisting results. With materialization, you never need to recompute something in Python twice if you don’t want to — you can store it in your data catalog forever.
Also, you can request any Python package you want, and even have different Python versions and packages in different workflow steps.
2. Catalog integration. Safely make changes and run experiments in branches.
3. Efficient caching and data re-use. We do a ton of tricks behind the scenes to avoid recomputing or rescanning things that have already been done, and pass data between steps as zero-copy Arrow tables. This means your DAGs run a lot faster, because the amount of time spent shuffling bytes around is minimal.
To me they seem like the Pythonic version of dbt! Instead of YAML, you write Python code. That, and a lot of on-the-fly computation to generate an optimized workflow plan.
Plenty of stuff in common with dbt's philosophy. One big difference though: dbt does not run your compute or manage your lake. It orchestrates your code and pushes it down to a runtime (e.g. 90% of the time Snowflake).
This IS a runtime.
You import bauplan, write your functions, and run them straight in the cloud - you don't need anything more. When you want to make a pipeline, you chain the functions together, and the system manages the dependencies, the containerization, the runtime, and gives you git-like abstractions over runs, tables and pipelines.
You technically just need storage (files in a bucket you own and control forever).
We bring you the compute as ephemeral functions, vertically integrated with your S3: table management, containerization, read/write optimizations, permissions etc. are all handled by the platform, plus obvious (at least to us ;-)) stuff like preventing you from running a DAG that is syntactically incorrect.
Since we manage your code (compute) and data (lake state through git-for-data), we can also provide full auditing with one-liners: e.g. "which specific run changed this specific table on this data branch?" -> bauplan commit ...
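To make this concrete, here is a simplified, illustrative sketch of what a two-step pipeline can look like. The decorator names and parameters are indicative of the ideas described above, not the exact API:

    import bauplan

    # Illustrative decorators: each function is one containerized step of the DAG,
    # with its own Python version and packages.
    @bauplan.python("3.11", pip={"pandas": "2.2"})
    @bauplan.model(materialize=True)            # persist the result to the catalog
    def clean_orders(orders=bauplan.Model("raw_orders")):
        df = orders.to_pandas()                  # data arrives as an Arrow table
        return df[df["status"] == "shipped"]

    @bauplan.python("3.10", pip={"pandas": "2.1"})
    @bauplan.model()
    def top_customers(orders=bauplan.Model("clean_orders")):
        df = orders.to_pandas()
        return df.groupby("customer_id", as_index=False)["amount"].sum()

Chaining is just naming the upstream table as an input; running on a branch is then a one-liner on the CLI (again, indicative syntax).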
I have worked with poetry professionally for about 5 years now and I am not looking back. It is exceptionally good. Dependency resolution speed is not an issue beyond the first run, since all that hard-to-acquire metadata is cached in a local index.
And even that first run is not particularly slow - _unless_ you depend on packages that are not available as wheels, which last I checked is not nearly as common nowadays as it was 10 years ago. However, it can still happen: for example, if you are working with Python 3.8 and using the latest version of some fancy library, they may have already stopped building wheels for that version of Python. That means the package manager has to fall back to the sdist and actually run the build scripts to acquire the metadata.
On top of all this, private package feeds (like the one provided by azure devops) sometimes don't provide a metadata API at all, meaning the package manager has to download every single package just to get the metadata.
The important bit of my little wall of text, though, is that this is all true for all the other package managers as well. You can't necessarily attribute slow dependency resolution to a solver being written in C++ or pure Python, given all of these other compounding factors, which are often overlooked.
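One cheap way to check whether you are about to hit that slow path is to ask PyPI's JSON API which files a release ships: if there is no wheel for your platform and Python version, the resolver has to build the sdist just to learn its metadata. A minimal sketch (the package/version queried is just an example):

    import json
    import urllib.request

    def release_files(package: str, version: str) -> list[str]:
        # PyPI's JSON API lists every file uploaded for a given release.
        url = f"https://pypi.org/pypi/{package}/{version}/json"
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        return [f["filename"] for f in data["urls"]]

    # If this prints only a .tar.gz and no matching .whl, expect the package
    # manager (pip, poetry, etc.) to fall back to building from source.
    print(release_files("numpy", "2.0.0"))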
I will! I'm sure it's faster when the data is available. But when it's not, in the common circumstances described above, network and disk IO are still the same unchanged bottlenecks, for any package manager.
In conversations like this, we are all too quick to project our experiences onto the package managers without sharing the circumstances in which we are using them.
Doesn't this potentially create security problems if process lifetime is very long? Changes to the certificate store on the system will potentially not be picked up?
Yes. And not just a security problem but an operational problem, since if you have to rotate a trust anchor you might have a hard time finding and restarting all such long-lived processes.
IMO SSL_CTX_load_verify_locations() should reload the trust store when it changes, though not more often than once a minute. IMO all TLS libraries should work that way, at least when the trust anchors are stored in external systems that can be re-read (e.g., files, directories, registries, etc.).
Apps can do something like that by re-creating an SSL_CTX when the current one is older than some number of minutes.
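In Python terms (the ssl module wraps the same OpenSSL machinery), that workaround looks roughly like the sketch below: a helper that throws away the cached context once it is older than a few minutes, so a changed trust store is picked up on the next rebuild. The helper name and the interval are just illustrative.

    import ssl
    import time

    _MAX_AGE = 5 * 60          # rebuild the context at most every 5 minutes
    _cached_ctx = None
    _cached_at = 0.0

    def get_ssl_context() -> ssl.SSLContext:
        """Return a verifying client context, recreated periodically so that
        changes to the trust store are eventually picked up."""
        global _cached_ctx, _cached_at
        now = time.monotonic()
        if _cached_ctx is None or now - _cached_at > _MAX_AGE:
            _cached_ctx = ssl.create_default_context()   # re-reads the CA store
            _cached_at = now
        return _cached_ctx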
In practice, we are talking about the root certificate store - the thing that organizations update every 10 or 20 years. Every other year there's an update, because there are quite a few of them, but your "very long" there carries a strong "very".
Well, it doesn't necessarily have to be 10 or 20 years long, all it takes is for the timeframe to overlap with a certificate being revoked, I guess. Process lifetimes of a few months are definitely not uncommon. Anyway, I can see the tradeoff. There just needs to be a mechanism to disable this performance optimization, or to invalidate the cache (e.g. periodically).
We should invent solutions for problems that we have, instead of wildly trying to apply this tech to everything we can think of. We are skipping the benefit/cost considerations.
Is AI not a solution for our problems? I can imagine a declining population, a younger generation that does not want to reproduce, nor take on the available work. We need a solution for that.
So we need systems that learn human knowledge, refine it, make it better and take over all of the work.
And for programmers: is sitting not the new cig smoking and drinking alcohol for early death?
How does AI help in solving the issues you brought up in your post?
> a declining population, a younger generation that does not want to reproduce, nor take on the available work. We need a solution for that.
And what do you reckon is the reason for that? "Kids these days are lazy and only want to have fun", or something to that extent?
Or is that because the cost of living has skyrocketed, the future is uncertain and people are afraid of the planet becoming more and more inhospitable for humans in the following decades or even years?
Maybe people do not want to work despite the job availability because those jobs offer piss poor conditions and no security? If not for all these reasons, why would people act so differently than in the past?
Once you put your precious A"I" into the equation, what do you get out of it? Even less job security, more leverage for bad employers to threaten workers with the magic wand of "you'll get substituted", and arguably way shittier quality of products all around (cases in point: A"I" customer service that fails to resolve customer issues or becomes a huge nuisance to deal with, or A"I" "art" in articles and blogs that's so easy to spot that you could arguably be better off with no illustrations at all).
> So we need systems that learn human knowledge, refine it, make it better and take over all of the work.
First off, who is "we"? And secondly, once it "takes over all of the work" (whatever that means), what are people going to do? Do you really think we'll get some utopian fantasy like UBI?
> And for programmers: is sitting not the new cig smoking and drinking alcohol for early death?
How tf does that have anything to do with A"I" helping? If I use Copilot to "help" me out in programming I'll still have to sit, will I not?
Declining birth-rates seem to happen to every society that passes a certain level of industrialization, with the most well-off and secure in these societies having the fewest kids, and birth rates as a whole declining as more people enter this "totally well-off and secure" demographic. In fact, people in these same late-industrialization cultures who aren't well-off, whose futures remain uncertain, do not experience any decline in aggregate birth-rate at all. Nor do people in societies where nobody is well-off — these in fact tending to be the societies with the highest birth-rates!
The trend in my own observations, from personal experience with both family-wanting "trad" people, and "child-free" people, is that as the self-perceived value and security of your own life goes up, two things seem to shift in your perspective:
1. your subjective resource-commitment bar for bringing a new life into the world grows ever higher proportionally to your available resources (i.e. a millionaire has the intuition that they'll have to allocate a good portion of their millions to any kids they have; a billionaire thinks the same, but now it's a good portion of their billions.) Having kids is always intuitively expensive — but as your own life becomes better and more secure, having kids begins to feel exceedingly, inhibitingly expensive. (Even though it's likely eminently practical for people with 1/1000th your resources, let alone for you!)
2. specifically for women, the act of gestating and raising of a baby grows ever-larger in its subjective potential for negative impact on their (increasingly-highly-valued and de-risked) lives — in terms of both time-cost and risk to their health, career, etc.
Given these trends, here's my hypothesis for what's happening:
We have seemingly evolved to feel driven to reproduce when under a sort of optimum level of scarcity and uncertainty. We feel this drive the most when:
• things are bad enough that you feel your own opportunities for a good life are done and spent — so having kids is your genes' Hail Mary to try again in 15 years, when opportunities could be better; with this lack of further opportunity making the risk to your health and resources of having a kid feel "worth it";
• but things still aren't so bad at the moment that any kids you conceive would literally starve and not make it to reproductive age themselves.
This heuristic worked just fine for the entirety of human evolution (and perhaps long before that); but it seems to "go wrong" in post-industrial society, for people experiencing no present scarcity or uncertainty. The heuristic wasn't designed to cope with this! It outputs nonsense — things like:
• "you never need to have kids, you'll be immortal, always healthy, and will always have infinite opportunities"
• "the right time to have kids is after you retire, when you'll have time, and also money accumulated to raise them. But of course, only if you can get someone else to carry the baby for you, since you won't have working gonads by then. And only if you can afford a team of nannies and private tutors, since you won't be able to handle the rambunctiousness of a toddler by then."
...both of which, as intuitions, tend to result in people just never having kids — despite often eventually regretting not having had kids. Because, by the time these intuitions shift or resolve positively, it's too late.
This broken intuition about when (or if) you should ever have kids, seems to lead to people also developing very different perspectives on sexual relationships. These "child-free" people — especially women — often seem to experience much lower sexual drive, or even fear of sex for its potential to force an (unwanted) child upon them. And this negative attitude toward sex, secondary to fear of reproduction, often then leads to either strained romantic relationships, or just not bothering with romantic relationships at all.
If you wanted to medicalize all this, you might call it a specific kind of neurosis that humans develop, when they no longer need to do anything much to ensure all their needs are satisfied. It's a compulsion toward catastrophizing all the negative aspects of reproduction, child-rearing, sex, and romantic relationships; and a disconnection from the emotional valence-weight that the positives of these same subjects would normally have.
(And I suspect that this is exactly the framing various societies will eventually take toward this developing "problem" — as I can totally imagine drug companies developing and marketing treatments for "reproductive neurosis", that trick just the part of your brain that cares about that sort of stuff, into thinking you're not well-off and not secure, so that it'll spit out the signals to tell you to feel more positively toward these subjects.)
> Do you really think we'll get some utopian fantasy like UBI?
Yes. Why wouldn't we? As soon as nobody has to labor, UBI is just one global (probably bloody) proletarian revolution away.
There has always been a "so who does the work nobody wants to do, then" blank spot at the end of the Marxist plan, one that left all previous Communist revolutions floundering after the "revolution" part.
But "intelligent robots grow all the food autonomously, cook it, and give it out for free, while also maintaining themselves and the whole pipeline that creates their parts, to the profit of no one but the benefit of all — the mechanistic equivalent of Sikh Langar, accomplished at megaproject scale in every country" (and analogous utopias re: clean water, housing, etc) form a set of neat snap-in answers to that blank spot. I have always presumed that these are what AI advocates are vaguely attempting to gesture toward when they imagine AI "taking over all of the work."
> I can imagine a declining population, a younger generation that does not want to reproduce, nor take on the available work.
Luckily the future does not depend on what you can or can't imagine. And no, even if that imagination were accurate, AI would not be the solution for any of those problems. We know the root causes, we know the effects, having a bunch of companies boil the oceans to have AI generate mediocre copy and uninspired illustrations does not help with anything other than making those companies richer and displacing even more workers and shutting down even more career paths.
Young people don't want to have kids or work because all the aspirational goals previous generations had have become unattainable for them; we're counting down the years until the climate catastrophe becomes impossible to ignore even in wealthier countries; and there is literally nothing the masses can do, because politicians across the world have shown a complete disregard for human life in the face of a global pandemic that had a death toll in the millions before we simply stopped counting.
> So we need systems that learn human knowledge, refine it, make it better and take over all of the work.
And how would that benefit anyone but those who own those systems and charge for their use? How would that benefit the "younger generation"? On the contrary, this would seem to do what automation always does: drive down wages and reduce workforces while harming workers' ability to bargain for better working conditions.
As tech workers we should take the time to understand that the "Luddites" were not actually opposed to technological advancement but to the consequences of it in a system that always rewards the business, but not the worker, for an increase in productivity. You don't need to withdraw into a cabin in the woods to realize that, the way we have set up the systems that govern us, technological progress will always only accelerate the wealth drain to the rich, never reverse it.
And that doesn't even get into how most AI startups are completely unsustainable (as growth-oriented startups tend to be) or how the proliferation of underpriced AI has contributed to the destruction of knowledge via search engine spam, "content generation" and social media bots.
If you want to create fully automated gay luxury space communism, be my guest, but you'll also need to work on the "communism" part if you want to make the "fully automated" part not result in the opposite direction.
What about education for all, better healthcare, and a solution to hunger? Or discrimination? I'm not seeing anything trying to solve those in the AI space right now. It's all about replacing workers and artists.