More

mau · 2026-01-06T15:09:21 1767712161

German-style strings is a way to store array of strings for columnar dbs. The idea is to have an array of metadata. Metadata has a fixed size (16 bytes) The metadata includes the string length and either a pair of pointer + string prefix or the full string for short strings. For some operations the string prefix is enough in many cases avoiding the indirection.

This is different from Pascal strings.

mau · 2025-12-29T10:52:24 1767005544

https://archive.is/2025.12.29-103247/https://m.independent.i...

mau · 2025-10-16T15:48:27 1760629707

> Your coworkers and QA will thank you for learning LINQ and ditching the imperative methods that plague your Python brain.

This is a very unfortunate joke: Python has list (and generator) comprehension expression for a long time (2.3?) which are similar to LINQ. At some point in the history many languages stole useful expressions from other paradigms.

Let’s joke on BASIC, it always works.

dragonwriter · 2025-10-16T16:35:25 1760632525

> This is a very unfortunate joke: Python has list (and generator) comprehension expression for a long time (2.3?) which are similar to LINQ.

I love Python, its my main daily driver, both at work and by preference for most of my personal coding, but Python comprehensions and genexps are much more limited than LINQ language level query syntax (Scala’s visually-similar construct is more like LINQ in capabilities) and Python—purely because of core and stdlib convention which also drive convention for the ecosystem, not actual structural features—lacks anything like the method syntax as a common API (unlike, say, Ruby).

EDIT: Thinking about it a little bit, though, it should be possible in theory to implement LINQ in Python without language level changes (including providing something close to but not quite as clean as the language level query syntax[0]) as a library via creative use of inspect.getsource and ast.parse, both for providing the query syntax and for building the underlying expression tree functionality around which providers are built (support for future python versions would require implementing translation layers for the ASTs and rejecting unsupported new constructs). Conceptually, this is similar to how a lot of embedded DSLs in Python for numeric JIT, compiling GPU kernels, etc., from (subsets of) normal Python coded are done.

[0] existing comprehension/genexp syntax looks similar, but relies on simple iteration, not pushing code execution out to a provider which may be doing something very different behind the scenes, like mapping "if..." clauses into SQL WHERE clauses for a database query.

pjc50 · 2025-10-16T16:00:17 1760630417

List comprehension is pretty good, but I prefer LINQ method-style because it's executed left-to-right, whereas I keep having to look up the order of Python.

mau · on Aug 7, 2024

Arrow supports this string format: https://arrow.apache.org/docs/format/Columnar.html#variable-...

From the article:

> As luck would have it, the Arrow spec was also finally making progress with adding the long anticipated German Style string types to the specification. Which, spoiler alert, is the type we implemented.

mau · on April 8, 2023

Congratulations to the team, Pydantic is an amazing library.

If you find JSON serialization/deserialization a bottleneck, another interesting library (with much less features) for Python is msgspec: https://github.com/jcrist/msgspec

jammycrisp · on April 8, 2023

Are there any necessary features that you've found missing in msgspec?

One of the design goals for msgspec (besides much higher performance) was simpler usage. Fewer concepts to wrap your head around, fewer config options to learn about. I personally find pydantic's kitchen sink approach means sometimes I have a hard time understanding what a model will do with a given json structure. IMO the serialization/validation part of your code shouldn't be the most complicated part.

BerislavLopac · on April 8, 2023

The biggest issue missing from most conversion and validation libraries is creating models from JSON Schema. JS is ideal for central, platform agnostic single source of truth for data structures.

timinou · on April 8, 2023

In my use case, I find the lack of features of msgspec more freeing in the long run. Pydantic is good for prototyping, but with msgspec I can build nimble domain specific interfaces with fast serial/deserialisation without having to fight the library. YMMV!

mau · on March 14, 2023

I think the GH app is a workaround for this:

> Deploy keys only grant access to a single repository. More complex projects may have many repositories to pull to the same server.

mau · on Feb 9, 2023

I don't know what were the issues Yelp was facing, I have upgraded several times my past Kafka clusters and really never experienced any issues. Normally the upgrade instructions are documented (e.g. https://kafka.apache.org/31/documentation.html#upgrade) and the regular rolling upgrade comes with no downtimes.

Besides this, operating Kafka never required much effort a part when we needed to re-balance partitions across brokers. Earlier versions of Kafka required to handle it with some external tools to avoid network congestions, but I think this is part of the past now.

On the other hand, Kafka still needs to be used carefully, especially you need to plan topics/partitions/replications/retention but that really depends by the application needs.

Hackbraten · on Feb 9, 2023

I used to work in a project where every other rolling upgrade (Amazon’s managed Kafka offering) would crash all the Streams threads in our Tomcat-based production app, causing downtime due to us having to restart.

The crash happens 10–15 minutes into the downtime window of the first broker. Absolutely no one has been able to figure out why, or even to reproduce the issue.

Running out of things to try, we resorted to randomly changing all sorts of different combinations of consumer group timeouts, which are imho poorly documented so no one really understands which means which anyway. Of course all that tweaking didn’t help either (gunshot debugging never does).

This has been going on for the last two years. As far as I know, the issue still persists. Everyone in that project is dreading Amazon’s monthly patch event.

sumtechguy · on Feb 9, 2023

Check the errors coming back on your poll/commit. The kafka stack should tell you when you can retry items. If it is in the middle of something sometimes it does not always fail nicely but you can retry and it is usually fine. Usually I see that sort of behavior if the whole cluster just 'goes away' (reboots, upgrades, etc). It will yeet out a network error and then just stop doing anything. You have to watch for it and recreate your kafka object (sometimes, sometimes retry is fine). If they are bouncing the whole cluster on you each broker can take a decent amount of time before they are alive again. So if you have 3 and they restart all 3 in quick succession all at once you will see some nasty behavior out of the kafka stack. You can fiddle your retries and timeouts. However, if that is lower than it takes for the cluster to come back you can end up with what looks like a busted kafka stream. I have seen it take anywhere from 3-10 mins for a single broker to restart sometimes (other times it is like 10 seconds). So depending on the upgrade/patch script that can be a decent outage. It goes really sideways if the cluster has a lot of volume to replicate between topics on each broker (your replica factor).

mau · on Jan 3, 2023

You can write code that runs in both interpreters. Same syntax can have different behavior.

I guess nowadays, as the py2 compatibility is a thing of the past, most programs would crash when using that interpreter version.

mau · on Dec 20, 2022

One thing that is underestimated is keep the tools version in sync between your app dev dependencies and pre-commit. This also includes plugins for specific tools (for instance flake8). A solution would be to define the hooks in pre-commit to run the tools inside your venv.

About typings: I agree the eco-system is not mature enough, especially for some frameworks such as Django, but the effort is still valuable and in many cases the static analysis provided by mypy is more useful than not using it at all. So I would suggest to try do your best to make it work.

mau · on Dec 30, 2019

We are currently using Kafka Manager (https://github.com/yahoo/kafka-manager). It seems Kafka HQ have the same features (it is not clear from the doc if you can actually manage the partition arrangement or only view it) plus the possibility to view the actual messages that are streaming in Kafka that might be very useful.

tchiotludo · on Dec 30, 2019

View message inside Kafka was my main goal when building KafkaHQ ! Unfortunately, you can't, for now, reassign the partitions, but maybe I will can add the feature when this one will be ready : https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A... (targeted for Kafka 2.4, but I don't think this was release)

pul · on Dec 30, 2019

You can only set the number of partitions when you create a topic in KafkaHQ, not after the fact. You can view messages, even if they are AVRO encoded and backed by a schema registry.