I really love the wave of new tools that are gaining traction in the Python world: tqdm, rich (and soon textual), fastapi, typer, pydantic, shiv, toga, doit, diskcache...
With the better error messages in 3.10 and 3.11, plus the focus on speed, it's a fantastic era for the language and it's a ton of fun.
I didn't expect to recapture that "too good to be true" feeling I had when starting out with 2.4, and yet here we are.
In fact, being a dev is kinda awesome right now, no matter where you look. JS and PHP are getting better ergonomics, you get a load of new low-level languages to squeeze your hardware, Java is modern now, C# runs on Unix, the Rust and Go communities are busy shipping fantastic tools (ripgrep, fdfind, docker, cue, etc.), Windows has a decent terminal and WSL, my mother is actually using Linux, and Apple came up with the M1. IDEs and browsers are incredible, and they do eat a lot of space, but I have a 32 GB RAM + 1 TB SSD laptop that's as slim as a sheet of paper.
Not to mention there is a lot of money to be made.
I know it's trendy right now to say everything is bad in IT, but I disagree, overall, we have it SO good.
New features keep getting added, and posts like this are a nice reminder to check the docs on a tool I otherwise never think about, because wrapping any long-running iterable in a `tqdm()` has become pure habit.
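For anyone who hasn't picked up the habit, it really is just this (the iterable and the work are placeholders):

    from tqdm import tqdm

    for item in tqdm(my_long_running_iterable):   # placeholder iterable
        do_work(item)                             # placeholder work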
Yeah, I've been using it for a while, but with the barrage of memes and politics on the internet nobody finds out about the good stuff until years later.
Way before @noamraph put it up on PyPI, it was available internally at our company. To my knowledge he didn't write it 30 years ago, but if he did, it certainly wasn't for Python!
Jokes aside, it's a great language. I would love to see a better package management ecosystem for it, as that is also my biggest issue. Python also does something no other mainstream scripting language does: it lets you install native extensions to the language through the same package manager, compiling libraries from source when wheels are unavailable, which makes packaging a much harder problem. At the same time I'm really happy that PyPI is run by a non-profit and won't have to go through the issues that something like NPM did.
I love Python (flexibility, ecosystem, datamodel). I love Java (verbosity, explicitness, robustness). Now, kill me. I don't let others define what I should and shouldn't think about a tool, in this case a programming language. I make up my own mind based on my own experience with the tools. Remember, they're just tools. Not the end game.
I really like Poetry for my own uses but as a semi-infrequent Python user my struggle is dealing with other people's Python repositories that aren't using it.
I can never remember all the differences and subtleties between virtualenv, pipenv, venv, pyenv-virtualenv, workon, conda, and so on when I encounter them in a random git repo.
I did AoC 2021 through Day 7 in Python before switching to Go (for personal learning; I can't remember if the times listed below were with "go run" or from running a "go build" binary). Day 7 I actually did in both, to try to solve a performance problem I was having in Python, and it turned out to be a big learning moment for how I structure my logic.
My Python code in Part 2 (some minor adjustments from Part 1) took about 40 seconds to run. Terrible, but usable to get a proper answer. I was able to bring it down to ~25 seconds with some optimization by adding a calculated lookup dictionary per loop. Now, the same exact logic in Go ran in about 0.8 seconds.
However, I wasn't satisfied with this and realized that if I moved the dictionary to be global rather than per-loop, I would be able to realize significant gains in performance by completely eliminating redundant calculations. This change dropped the runtime from 25 seconds to 0.35 seconds (of course, applying the same logic in Go brought it down from 0.8s to 0.05s).
Given the kind of performance you can get out of the Python interpreter, it can actually lead you down paths of learning better optimization strategies that you might initially write off in other languages (depending on the use case) because they perform inherently better. It made me think a bit more about what I was doing and how I could improve it, since (in this particular case) the impact of not doing so was pretty drastic.
Days 6 and 7 are fantastic for that. They're a learning moment where thinking about the problem for a few minutes instead of brute-forcing gives you a dramatic speed-up.
When I read Day 7, I saw that brute-forcing would lead to bad performance. Then I remembered that in high school, I learned a formula to calculate the nth term of a sequence without having to process the entire sequence.
I couldn't remember the formula, nor the name of the concept, so I googled around until I found some tutorials and relearned what I was taught as a child: arithmetic sums.
The consumption for a crab can then be calculated in constant time:
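Roughly, using the arithmetic sum 1 + 2 + ... + d = d(d+1)/2 (a sketch, function names are illustrative):

    def crab_fuel(distance):
        # arithmetic sum 1 + 2 + ... + distance
        return distance * (distance + 1) // 2

    def total_fuel(positions, target):
        return sum(crab_fuel(abs(p - target)) for p in positions)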
Interesting, I'll have to look at that. Actual algorithms are a big weak point for me; I'm not a developer by day, so I don't spend a lot of time learning code practices or computer science/math topics. This was my solution to Day 7 Part 2 (positions = sorted int list of the input data):
    def calculate_fuel(positions):
        fuel = None
        low_fuel_value = None
        calculated = dict()
        for value in range(positions[0], positions[-1] + 1):
            consumed = 0
            for position in positions:
                diff = abs(value - position)
                try:
                    consumed += calculated[diff]
                except KeyError:
                    if position != value:
                        consumption = sum([x for x in range(1, diff + 1)])
                        calculated[diff] = consumption
                        consumed += consumption
            if fuel:
                if consumed < fuel:
                    fuel = consumed
                    low_fuel_value = value
            else:
                fuel = consumed
                low_fuel_value = value
        print(f"Aligning to: {low_fuel_value}")
        return fuel
And Day 6 I fell for the bruteforce bait and had the thought "It can't be as easy as changing 80 to 256, right?". Then I realized the pain I had created for myself. BUT! My 6p2 code ran faster than my 6p1 by a good margin, which I was happy about.
You can even calculate the target without brute-forcing: n^2 + n is approximated by n^2 when n is large, so you can take the mean of all the positions and use that (or perhaps +/- 1).
And similarly for part1 take the median. Why it kinda works:
Part 1: The median made sense to me intuitively; in my head I thought about an example à la 1, 1, 3, 100. It never makes sense to use x > 3, because even though the crab at x=100 can then walk a shorter distance, there are 3 others having to walk longer. And x = 1, 2 or 3 doesn't matter; it just symmetrically changes which side has to walk one step less or one step more.
And for part 2 I thought similarly, except the cost grows quadratically with the distance, and therefore I want to minimize the average move and not the total moves, thus taking the mean.
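A rough sketch of both shortcuts (my own names; the three candidates cover the rounding around the mean):

    from statistics import mean, median

    def part1_target(positions):
        # the median minimises the sum of absolute distances
        return int(median(positions))

    def part2_target(positions):
        def cost(t):
            return sum(abs(p - t) * (abs(p - t) + 1) // 2 for p in positions)
        # the real-valued optimum is within 0.5 of the mean,
        # so the best integer target is one of these three candidates
        c = round(mean(positions))
        return min((c - 1, c, c + 1), key=cost)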
The better optimization with Python is to not use Python. Half joking. If you want awesome performance, transform your problem into a numerical one and use NumPy. NumPy is awesome. I always miss it when using languages other than Python.
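For instance, the whole Day 7 part 2 brute force collapses to a few vectorised lines (a sketch; the variable names and sample input are illustrative):

    import numpy as np

    positions = np.array([16, 1, 2, 0, 4, 2, 7, 1, 2, 14])      # example input
    targets = np.arange(positions.min(), positions.max() + 1)

    # distance of every crab to every candidate target, as a 2-D array
    dists = np.abs(positions[None, :] - targets[:, None])

    # part 2 cost: 1 + 2 + ... + d = d * (d + 1) / 2, summed per target
    fuel = (dists * (dists + 1) // 2).sum(axis=1)
    print(targets[fuel.argmin()], fuel.min())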
I don't understand your solution to Day7. What kind of dictionary, and how come even a naive solution would be so slow?
My Kotlin solution runs in about a second. And it was even so stupid that I didn't calculate the sum of the arithmetic series directly, but through a loop. Can't fathom something being slower.
line.split(",").map { it.toInt() }.let { crabs ->
(0..crabs.maxOf { it }).minOf { pos ->
crabs.sumOf {
(0..abs(it - pos)).sum() // slower than calculating arithmetic sum, but quicker to write
}
}
}
My proper Kotlin solution runs in less than a ms, though.
I do think that "it's good that python is slow because it forces you to optimize" is a weird take, though.
> I do think that "it's good that python is slow because it forces you to optimize" is a weird take, though.
My take wasn't that it forces you to, but that non-optimal code paths can be greatly exaggerated in comparison to other languages, particularly compiled ones. You can still ignore it (I mean, within reason), but it can give you that extra push to really look a bit deeper to understand what's going on. And of course, there's optimized libraries written in C/C++ that you can take advantage of for even better number crunching than standard CPython.
> What kind of dictionary, and how come even a naive solution would be so slow?
My naive solution was literally going through every single element for every loop and not storing any data besides the fuel buildup and the alignment number that generated it. The dictionary was added to act as a cache to store already computed fuel consumption values, initially per-loop then moved one level up to be global (because the summations wouldn't be different).
I'm not saying my method (posted in a sibling comment) is the best solution, but it's the way my brain walked through the problem.
Posted my optimized Kotlin solution in that sibling thread :)
Cool of you to participate without being a developer! Knowing lots of computer science topics makes it easier; it's so hard without them. For instance, graph searching / Dijkstra has been relevant this week.
Yeah, I was doing a CS minor in college but had to drop it, as the non-intro courses were consuming too much time from my other discipline. Big-O/time complexity was my usual failing in the intro algorithms course I took.
I'm not unfamiliar with programming, but I come from the sysadmin side of things. "Glue" work is usually where things are focused and the 'fun' nitty-gritty of algorithms can be a bit out-of-scope, though I'm not a sysadmin in my current role anymore so any dev-related work I do is purely personal now.
I've had to take a break from AoC, only got up through Day 10, but didn't get P2 for 8 and 9. It's a fun way to keep the mind going and to slip back into the coding space to at least not lose skills, even if the solutions are simple/non-optimal.
Yes! Tasks like AoC (relatively small, without stringent performance requirements but still requiring the correct algorithm) are where Python is not only a reasonable choice, it's unreasonably effective.
Doubly so this year, with the theme being linear algebra.
doit [0] is a superb toolkit for building task-oriented tools with dependencies. It isn't too complex to get started and gives you a lot to work with. I've used it for an internal tool used by everyone in the company for 5+ years and it has never given me a headache.
I mean, it's declarative, works on Windows, easy things are (very) easy, and because you can mix and match bash and python actions, hard things are suspiciously easy too.
Given how complicated the alternatives are (Maven, ninja, make, gulp...), you'd think it would have taken over the world years ago.
Yet I've only started to see people in the Python core dev team use it this year. It's only getting traction now.
Here is a task that groups all static files from a web project into a directory.
It has to make sure a bunch of directories exist, run a Node.js command to build some files from the proper dir, then run a Python command to regroup them plus all static files from all Django apps into one dir. Simple.
But then I had a problem I had to hack around, which required me to change an entry in a generated TOML file on the fly at every build.
doit just lets me add a 5-line Python function that does whatever I want, insert it between my bash tasks, and I'm done.
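A trimmed-down sketch of what that looks like in a dodo.py (the paths and commands here are made up, not my real ones):

    from pathlib import Path

    def patch_toml():
        # the 5-line hack: fix one entry of the generated TOML at every build
        cfg = Path("build/config.toml")
        cfg.write_text(cfg.read_text().replace("debug = true", "debug = false"))

    def task_collect_static():
        return {
            "actions": [
                "mkdir -p build/static",                      # plain shell actions...
                "npm run build",
                patch_toml,                                   # ...mixed with a Python callable
                "python manage.py collectstatic --noinput",
            ],
            "verbosity": 2,
        }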
It is a pain to work with invoke now. It is all fine and dandy for the basic features, but you're going to hit the seams as soon as you start trying advanced stuff. It looks like the project is going to be abandoned.
Love the idea of Schema, but I'm not a fan of that syntax; it doesn't seem Pythonic to me, more like something you'd see out of the JavaScript world.
Like looking at that very first example, I have no clue what "len" means in that context. Is it implicitly checking that it's not an empty string? Then on the next line, how come `int` has `Use()` around it, but on the previous line `str` didn't? I guess that int is being used as a converter, like str.lower on the next line, but the str was being used as a type check?
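If I'm reading it right, the convention is roughly: a bare type means an isinstance check, a bare callable must return something truthy, and Use() wraps a callable that converts the value first. Something like:

    from schema import And, Schema, Use

    schema = Schema({
        # must be a str AND len(name) must be truthy, i.e. non-empty
        'name': And(str, len),
        # convert with int() first, then validate the range
        'age': And(Use(int), lambda n: 18 <= n <= 99),
    })

    print(schema.validate({'name': 'Sue', 'age': '28'}))  # {'name': 'Sue', 'age': 28}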
I'll add BeeWare (https://beeware.org/) as a pretty nice way to deploy Python applications cross-platform easily. It's more like a suite of tools, though.
I love Python for many things: scripts, data science, prototypes, etc. But I would never use it to build a large backend system. I just can't handle all the warts. Yes it has been used by large companies, but not without their problems.
You use Python to rapidly build a small backend system and then, if any warts appear that need to be smoothed over, you carve that function off to a more robust solution.
Python is great for programming at the speed of inspiration.
for prototypes and MVPs time-to-learning and time-to-market are king, and programmer time is the most expensive thing. in that niche Python is helpful and tests just slow you down
tests are super helpful and wise in any sort of long term or safety-critical software, obviously
so, tests (and Python) are not inherently a win or a loss. they just give you different trade-offs
there are times to write testless Python and times to write test-heavy Go, Rust, etc.
Generally I believe in engineering tradeoffs, but I think some tests are always important, even with MVPs. Writing tests might double your time to market, but they will allow you to move faster and cause fewer regressions once in production. The early days are some of the least stable feature-wise, and having tests in place to verify things when massively changing a codebase is nice. Also, it can greatly help add confidence for newcomers who don't know the codebase.
understood. there's def an art to judging when it's too early to write tests vs the perfect time. it's easy for anyone to recognize once it's become "too late" and therefore more painful and costly to add them than it otherwise would have been
I therefore try to put each situation quickly into 1 of 3 buckets then move forward on that basis:
1. heck no
2. heck yes
3. either way. a gray area
I can generally spot cases 1 and 2 with confidence, so by process of elimination that also lets me deduce when it's case 3. And in those cases you can't go wrong. :)
I think that is also reasonable, if the production system does not have to service a heavy load, and does not need to be scalable. Scalability is often assumed to be a requirement, however it is not always required in practice.
It's energy inefficient like a school bus, not inefficient for the sake of it, but because it's a decent way to shuttle passengers of all experience levels through all weather.
Do you seriously think that 20 cars are more economic than a single bus?
Python performance problems are exaggerated for many use-cases: the hot paths are not written in Python, e.g. matching regexes happens in C code (re, regex), parsing XML too (ElementTree, lxml), sqlite, numpy/scipy (C, Fortran), etc. Cython makes it trivial to drop to the C level where necessary.
Wow, I didn't know this existed. I've been using IPython as a shell-ish replacement for not-so-serious things. I need to take a more detailed look at this after my initial glance over it. Thanks!
I think what OP is referring to is how fiddly it is to run a subprocess and capture the output. It's also super annoying in Java, by the way: you have to faff about with a thread that drains stdout before it gets jettisoned by the OS (and you can't even really do that properly in Python, which, as much as I love the language, is quite pathetic). In both cases, the long story short is that it's easy to write something that works when the output is short, but getting it to work on long output is an exercise in frustration. One neatly solved by plumbum, so I'm definitely with OP that its use makes code measurably less shit.
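For comparison, the plumbum version of "run a command and capture all of its output" is roughly this (the command is just an example):

    from plumbum import local

    ls = local["ls"]              # look up a command on the PATH
    output = ls("-la")            # runs it, blocks, and returns the full stdout as a str
    print(len(output.splitlines()), "lines captured")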
If you like tqdm, it's worth checking out pqdm, a parallelized version. If you have embarrassingly parallel work to process in a script, it makes it dead simple to parallelize it and monitor its progress. Highly recommend it.
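If I remember the API right, usage looks roughly like this (the function and inputs are placeholders):

    from pqdm.processes import pqdm   # pqdm.threads also exists for I/O-bound work

    def slow_square(n):               # placeholder workload
        return n * n

    results = pqdm(range(1000), slow_square, n_jobs=4)   # shows a progress bar while mapping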
+1 for PQDM, I use it a ton and most of the time it just works. I did have some rare cases where PQDM was much slower than a direct joblib implementation, but that could well be a fluke on my end. Either way, amazing package!
My current issue with tqdm is nested progress bars in multi-threading/processing causing deadlocks, plus tqdm.write just being broken in those contexts, either deadlocking or the UI just being wrong. Does pqdm do a better job?
Python is such an amazing language. On one hand, it's easy enough that you could probably train a non-technical person on it within about two months.
Yet you can still make 200K writing it. I don't know when I'll next create a command-line application in Python, but I'll keep this little tool in mind. I hope Python eats the world.
It may be easy to get going with Python, but it takes a non-trivial amount of time to understand what is going on. I have an advanced Python 3 course https://github.com/MoserMichael/python-obj-system that explains some of the more advanced concepts. One of the things covered is decorators (tqdm is a decorator).
Check out Will Crichton's talk "Type-Driven API Design in Rust", where he live-codes a tqdm-like progress bar for Rust iterators. Solid presentation, and it's eye-opening how straightforward it was to extend the core language through traits.
Indeed, this comes to mind right away. The question is: are we missing something? Of course, tqdm brings this functionality right into the Python REPL, which is new and looks great.
pv and tqdm would look even better if they were called implicitly (with an opt-out), since I always end up regretting not using pv when my command is taking too long. Too late.
I have such mixed feelings about Click; I've reached the point of using it so much that I now hate it. Which I think probably means that it's great, but only as a stopgap until you learn how to bend argparse to your will. It's just too magical, too much global state, and too much spooky action at a distance.
I actually don't think Click has too much global state and spookiness, at least compared to a lot of popular Python libraries.
I do wish I could hook into it to test better; the only thing you can really do right now is have it print stuff out and assert on the output string. That's not really necessary if you're just building a CLI with Click, but I want to build a library that integrates with it, and testing the integration is a PITA.
I want to write a config-loading library for CLI apps like Golang's Viper lib, but for Python
I went the opposite way. Used to do everything with argparse, then discovered click and never looked back. While argparse lets you do anything, click forces you to build a good CLI.
I had the same feeling. I wish there were an equivalent that tried to minimize the magic and global state a bit while still letting you make decent CLIs with a tiny amount of code.
I love Typer, it's my go-to tool for building CLIs, but I'm worried about its future. It's developed by a single developer, there are too many issues left unattended, no development for months, lacking a good API to extend it or to interact with Click.
I use tqdm (with argparse) in a PyInstaller-packaged exe. Five stars, it's great! I call the exe from Java to do some ML and forward the tqdm progress and status messages to a Swing progress bar. It makes the user experience seamless. Depending on the task and the user's settings, the tool usually takes about 7-8 seconds (including the PyInstaller extract), but it can also take up to a minute. When it's 7-8 seconds, the progress messages fly by and the tool feels snappy. When it's 50-60 seconds, the users are very grateful for the progress bar.
Meanwhile I can develop and test the tool from the command-line and see progress info and when I want to run the ML code in Jupyter notebooks the progress bars can still be made to work.
tqdm is one of the very few Python packages that makes it into every script I write. It's a very high ROI for managing and tracking simple loops.
My only complaint is the smoothing parameter; by default it predicts the estimated time remaining based on the most recent updates, so it can fluctuate wildly. smoothing=0 predicts based on the total runtime, which makes more sense given the law of large numbers.
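It's just a keyword argument on the constructor, e.g. (the iterable and the work are placeholders):

    from tqdm import tqdm

    for item in tqdm(items, smoothing=0):   # 0 = average over the whole run (the default is 0.3)
        process(item)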
Well, it seems to me that speed quite likely varies across the process, so the more recent updates are probably a more useful default: if the first part of the process is slower/faster, it won't permanently skew the estimates.
Is there a way to globally disable all tqdm progress bars?
Something like the NO_COLOR environment variable?
These progress bars are nice when you launched a single loop yourself, but when you are running an automated battery of many things they become annoying and pollute your terminal too much. I know that you can silence each particular program that uses tqdm by setting a 'disable' option. But this requires editing the python source code of the program.
    import os
    import sys
    from tqdm import tqdm as _tqdm

    def tqdm(*args, **kwargs):
        try:
            disable = bool(int(os.environ['NO_PROGRESSBARS']))
        except KeyError:
            disable = not sys.stdout.isatty()
        except (ValueError, TypeError):
            disable = False
        kwargs.setdefault('disable', disable)
        return _tqdm(*args, **kwargs)
Then import that instead of tqdm.tqdm.
sys.stdout.isatty() isn't a perfect answer to what people ask when they want to know "am I running in an automated environment, or is a human user looking at my output?", but it's close. More nuance is available online.
But it's not me who is importing tqdm to begin with! I call many programs in parallel from shell scripts (out of my control) and they all call tqdm individually. I need to stop tqdm output from outside these programs.
Your code should be part of tqdm itself, not written by individual programmers.
> am I running in an automated environment, or is a human user looking at my output?
But I want to stop tqdm output precisely because I'm a human looking at it. If you have more than one or two progress bars simultaneously, it becomes useless clutter.
As much as I like to use tqdm myself for my programs, I'm sad that as tqdm becomes more and more popular, my terminal output becomes more and more cluttered, to an absurd amount. Piping the output to a file does not help and is totally the wrong idea. I'm precisely interested in seeing--in real time--the part of the output that does not come from tqdm, such as warnings and errors.
That's a reasonable request. Discussion about a feature along those lines seems to be happening in https://github.com/tqdm/tqdm/issues/614; perhaps you could weigh in there?
I like how tqdm can manage multiple progress bars in the same terminal. I have used it to track the progress of multi-day processes. It also produces a nice animation in Jupyter notebooks.
Whoaaaa, this tqdm thing is, like, um, toats kewl!!! It's waaay better than the <blink> tag old people used to use, like, before I was even born lol. IM so so happy idont have to think too hard to get flashing light thingys to show up on my screen.
Wo dude, not at all. But you are literally saying exactly what my gramps used to say "but you are wrong." I figured his brane was too old to tell me why I was wrong. Computers are really fast now. Why cant they just give the answer right away?
Yeah, exactly. The software I write is complex enough to require logging of hundreds of progress bars at the same time.
The great thing is that just using the `logging` module is enough to log something like `Downloading file A, 55% [55MB/100MB]` (absolutely equivalent in terms of information to a "graphic bar"), and it also happens to be composable, in that I can then reuse that package as part of anything that is also non-interactive.
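i.e. something as plain as this (the numbers are made up):

    import logging

    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    log = logging.getLogger(__name__)

    done_mb, total_mb = 55, 100
    log.info("Downloading file A, %d%% [%dMB/%dMB]",
             100 * done_mb // total_mb, done_mb, total_mb)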
To clarify, the tqdm.notebook progress bar (which is much prettier) works on Colab. If you want to make the script more agnostic, you can import tqdm.auto.
A fun quirk is that tqdm.notebook didn't work with Colab's dark mode, so the text wasn't readable; this was very recently fixed.
I think the author doesn't fully appreciate how insanely difficult it is to write accurate progress meters for anything slightly more complicated than a single for-loop.
The halting problem is easy to solve most of the time. For instance, you can easily tell whether the following two programs halt:
    while True:
        pass

    while False:
        pass
What's impossible is solving the halting problem 100% of the time with 100% accuracy. Most people don't need to do that. Solving the halting problem most of the time and then saying "I don't know, that's too hard" the rest of the time is of immense practical value, and many practical systems (the kernel's eBPF verifier, the thing in your browser that detects stuck pages, etc.) do exactly that.
> The number of expected iterations. If unspecified, len(iterable) is used if possible. If float("inf") or as a last resort, only basic progress statistics are displayed (no ETA, no progressbar).
For things with a fixed length (i.e. things that return something when you call len() on them), it looks the length up ahead of time. For other things, such as generators, you can supply a total=N value when you create the progress bar. If you don't know the total ahead of time, it doesn't give you a percent completion, but it still tells you things like the number of iterations per second and the number of iterations finished.
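For example, with a generator whose length tqdm can't discover (names are placeholders):

    from tqdm import tqdm

    def rows():                       # a generator: len() doesn't work, so no ETA by default
        yield from range(10_000)

    for row in tqdm(rows(), total=10_000):   # supplying total= restores the percentage and ETA
        pass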
Have you ever used a for loop? You don't know how long it'll take, but you know how many iterations, and at any given point you can know how long past iterations have taken. Extrapolation is not hard, even if not always accurate.
Hey, I know this is a bit off-topic, but I can't wrap my mind around one thing. I'm getting a ton of Google ads on that website. Where do they come from? Is Github adding them? Did the devs add them to their docs?