This is how I would expect it to work. Caching mutable collections is a legitimate use case, distinct from caching the immutable values those collections contain.
`lru_cache` does not wrap your cached values in container types on its own, so if you’re getting bit by mutable values it’s probably because what you cached was a mutable container (like `list`, `dict`, or `set`). If you use primitive values in the cache, or immutable containers (like `tuple`, `namedtuple`, or `frozenset`), you won’t hit this problem.
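A minimal sketch of that difference (the function names are made up for illustration):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def get_list():
    # Built once; the very same list object is returned on every later call.
    return [1, 2, 3]

@lru_cache(maxsize=None)
def get_tuple():
    # Also built once, but a tuple can't be mutated in place.
    return (1, 2, 3)

a = get_list()
a.append(4)             # mutates the object sitting in the cache
print(get_list())       # [1, 2, 3, 4] -- the "cached" value has changed
print(get_list() is a)  # True -- same object every time

b = get_tuple()
print(get_tuple())      # (1, 2, 3) -- nothing you do to b can change this
```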
If you were to manually cache values yourself using a `dict`, you’d see similar issues with mutability when storing `list` objects as values. It’s not a problem specific to `lru_cache`.
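The same aliasing shows up with a hand-rolled `dict` cache, no `lru_cache` involved (a sketch with invented names, standing in for whatever real work you'd memoize):

```python
_cache = {}

def get_lines(path):
    # Plain dict-based memoization: store the list the first time,
    # then hand back that same list object on every later call.
    if path not in _cache:
        _cache[path] = ["line 1", "line 2"]  # stand-in for real work
    return _cache[path]

x = get_lines("a.txt")
x.append("line 3")          # mutates the object stored in _cache
print(get_lines("a.txt"))   # ['line 1', 'line 2', 'line 3']
```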
This isn’t special behavior that `lru_cache` “implements”; it’s intrinsic to how shared references to mutable objects work.
Imagine you `lru_cache` a zero-argument function that sleeps for a long time, then creates a new temp file with a random name and returns a writable handle to it. It should be no surprise that the cached version behaves differently: only one random file is ever created, and repeated calls to the cached function return the same handle that was opened originally.
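Something like this, roughly (the sleep and the temp-file details are just there to make the point):

```python
import tempfile
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def slow_temp_file():
    # Expensive setup, then a brand-new temp file with a random name.
    time.sleep(10)
    return tempfile.NamedTemporaryFile(mode="w+", delete=False)

f1 = slow_temp_file()   # sleeps once, creates one file
f2 = slow_temp_file()   # returns instantly, but it's the *same* handle
print(f1 is f2)         # True

f1.write("hello")
f1.seek(0)
print(f2.read())        # "hello" -- writes via f1 are visible via f2
```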
If you didn’t expect this, you might complain that writes and seeks suddenly and mysteriously persist across what should be distinct handles, and maybe this `lru_cache` thing isn’t as harmless as advertised. But it’s not the cache’s fault that you cached an object that can mutate state outside the cache, like on a filesystem.
Logically, the behavior remains the same if instead of actual file handles we simulate files in memory, or if we use a mutable list of strings representing lines of data. Generalizing, you could cache any mutable Python collection and see that in-place changes you make to it, much like data written to a file, will still be there when you read the cache the next time.
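In its simplest form, with an in-memory list of lines standing in for the file (names invented for illustration):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def load_lines(name):
    # Pretend this parses a file; really it just builds a list once per name.
    return [f"{name}: line {i}" for i in range(3)]

lines = load_lines("report")
lines.append("report: appended in place")  # like writing to the "file"
print(load_lines("report")[-1])            # the appended line is still there
```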
The reason you don’t see “frameworks” for this is that tracking references to live Python objects from outside the Python process is pointless: objects are garbage-collected and are not guaranteed to stay at the same memory location from one moment to the next. Further, if the lists themselves are small enough to fit in memory, there’s no need for larger-than-memory infrastructure just to cache references to those objects.
Stepping back, I think part of your surprise at `lru_cache` stems from familiarity with out-of-core or distributed caches, where the immutability of cached values is simply imposed by the architecture. In a distributed cache, modifying the cached value in-place means making another API call, so you can’t accidentally mutate the cached object because you mistook it for a copy.
The only way you can run into this confusion is if the actual cached object can somehow be handed back to you, and that only happens when everything in the cache is just a Python object living in your own process.
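You can mimic that architectural boundary in-process by serializing values on the way in and out, which is roughly what talking to an external cache forces on you anyway (a toy sketch, not how `lru_cache` works):

```python
import pickle

class CopyingCache:
    """Toy cache that never hands out the stored object itself."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        # Serialize on the way in, like pushing bytes to an external cache.
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        # Every read deserializes a fresh copy, so callers can't
        # mutate what the cache actually holds.
        return pickle.loads(self._store[key])

cache = CopyingCache()
cache.set("rows", [1, 2, 3])
rows = cache.get("rows")
rows.append(4)             # only changes your local copy
print(cache.get("rows"))   # [1, 2, 3]
```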