# Cute Python trick: an `@lru_cache` containing bound methods
Here’s a cute Python trick I just discovered. I’m probably not the first one to find it.
Let’s say I want an `@lru_cache` that can contain bound methods. I am not talking about a cache that decorates bound methods, like this:
```python
from functools import lru_cache

class Leaky:
    @lru_cache
    def mymethod(self):
        return whatever
```
It’s (hopefully) well-known that this pattern can leak references to `self`. When people try to do this, what they usually want is a cache whose lifetime is bound to the lifetime of the instance in `self`. Such a cache exists: `@cached_property`, or any one of its numerous backports or equivalents on PyPI.
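For reference, a minimal sketch of that pattern (the class and workload here are made up): `@cached_property` computes the value once per instance and stores it on the instance itself, so the cached value is freed when the instance is.

```python
from functools import cached_property

class NotLeaky:
    @cached_property
    def expensive_attr(self):
        # Computed on first attribute access, then stored in this
        # instance's __dict__; the cached value dies with the instance.
        return sum(i * i for i in range(10_000))

value = NotLeaky().expensive_attr  # Note: attribute access, not a call
```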
But that’s not what I’m interested in.
## Contrived problem: `@lru_cache`-decorated functions which receive methods
Let’s say I have some classes with methods, and I want to learn and remember something expensive about those methods. Let’s also say that the return value of a given method `foo` may change based on what instance it’s called on, but there’s still something (computationally expensive) I can learn by calling `foo`: something that generalizes across all instances it could ever be called on.
Let’s get more concrete, but also more contrived. Let’s say I want to make sure that `thing.method()` can be called with only one argument, for arbitrary values of `thing` and `method`. I need to verify that before I get into my main loop, because my main loop looks like this:
```python
for thing in many_things:
    start_missile_launch()
    thing.method("Oh shit, wrong button. Abort, abort!")
    abort_missile_launch()
```
You can see how a `TypeError: ThingClass.method takes 1 positional argument but 2 were given` could really ruin my day. Woe is me.
In the contrived world I live in, there is some good news and a lot of bad news.
Good news: it’s safe to call `method` implementations before the main loop. Safety is important in code that mentions missiles; that’s obviously why it was coded in Python.
The bad news/constraints:
- Many classes’ implementations of `method` take a very long time.
- Instances are not hashable or comparable, and inside-out comparison tricks won’t make it past code review. No keying things by instance.
- I also can’t key any caches by instance type, because constructor logic may re-bind `method` to point to different callables for the same type. I said it was silly, but I have also seen this in the wild. Don’t do this.
- `many_things` may contain a ton of duplicates. Zillions, even.
- I don’t know how many different classes and implementations of `method` are in `many_things`, and because of that missing hashability/eq-ability, I can’t pre-generate that list. I know, I know. Just go with it. It’ll be cool, I promise.
- I can’t use `inspect.Signature` or anything like that to check the arity of `method`.
- Leaking references to instances is not acceptable. `weakref` is not available.
- Multithreading is in the mix, so our caching system must be thread-safe (that is: simultaneous edits to the cache from different threads must not cause invalid results or errors).
What I want to do is this:
```python
@lru_cache
def takes_only_one_arg(func):
    try:
        func("test arg")
        return True
    except TypeError:
        return False

for thing in many_things:
    assert takes_only_one_arg(thing.method)

# If we make it here, we know all the methods accept one argument
...  # Launch the missiles etc.
```
But that won’t work! Or rather, it will execute successfully, but with two big problems:
- The cache won’t … uh … cache very well. Multiple `thing.method` bindings may refer to the same underlying function on different objects (or subclasses that share an implementation of `method`, and so on), but each binding to a different instance/type will hash to something different, causing a cache miss.
- The cache will leak instances from `many_things` forever, even after `many_things` is garbage collected. In other words, this code never prints `Deleting instance`:
```python
import os
from functools import lru_cache

class Foo:
    def meth(self): ...

    def __del__(self):
        print("Deleting instance")

@lru_cache
def cache(func): ...

def main():
    inst = Foo()
    cache(inst.meth)
    del inst
    print("Leaving main")

main()
os._exit(1)  # Skip final garbage collection
```
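As for the first problem, the root cause is that two bound methods from different instances compare unequal even though they share a single underlying function. A quick sketch, reusing `Foo` from above:

```python
a, b = Foo(), Foo()
print(a.meth == b.meth)                    # False: bound to different instances
print(a.meth.__func__ is b.meth.__func__)  # True: one shared function object
```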
Okay, so what do?
## Contrived solution: smuggling bound methods into a cache keyed by unbound methods
It’s simple, we ~~kill the bat~~ key the cache by the unbound `__func__`, and smuggle the bound version “around” the cache using a thread-local:
```python
from functools import lru_cache, partial, wraps
from threading import local

_smuggler = local()

@lru_cache
def _get_cached(_ignored, *args, **kwargs):
    # Keyed by the unbound function (plus the remaining arguments). On a
    # miss, call the bound version stashed in the thread-local instead.
    return _smuggler.bound(*args, **kwargs)

def lru_cache_by_unbound_method(func):
    @wraps(func)
    def inner(method, *args, **kwargs):
        # Stash the bound method where _get_cached can find it.
        _smuggler.bound = partial(func, method)
        try:
            return _get_cached(method.__func__, *args, **kwargs)
        finally:
            # Always clear the stash so no instance reference outlives the call.
            _smuggler.bound = None
    return inner
```
That works surprisingly well. But how?
- Assuming a bound method (a callable with `__func__`) is in the first argument position of the decorated function, `@lru_cache_by_unbound_method` will stash the bound method in a `threading.local`.
- Then, we key the `@lru_cache` with the unbound underlying function, plus the rest of the arguments.
- On a cache miss, the unbound function argument will be thrown away and the thread-local stash will be used to get the bound method to call and prime the cache. This saves us from having to deal with re-binding the method descriptor back to the instance at runtime, which wouldn’t be thread-safe or pleasant in general.
- The stash is unconditionally cleared after each cache access, preventing reference leaks and saving us from having to deal with some sort of hairy thread-local stack for nested uses of the decorator.
Then we can use our decorator on our contrived missile-launch-aborter code, like so:
```python
@lru_cache_by_unbound_method
def takes_only_one_arg(func):
    ...

for thing in many_things:
    assert takes_only_one_arg(thing.method)

# If we make it here, we know all the methods accept one argument
for thing in many_things:
    start_missile_launch()
    thing.method("Oh shit, wrong button. Abort, abort!")
    abort_missile_launch()
```
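As a quick sanity check (my addition, reusing the earlier leak demo): running the `Foo`/`__del__` example through the new decorator does print `Deleting instance`, because the cache only ever holds the unbound `__func__` and the computed result, never the instance:

```python
import os

class Foo:
    def meth(self): ...

    def __del__(self):
        print("Deleting instance")

@lru_cache_by_unbound_method
def check(method): ...

def main():
    inst = Foo()
    check(inst.meth)
    del inst  # Prints "Deleting instance": nothing kept a reference
    print("Leaving main")

main()
os._exit(1)  # Skip final garbage collection, as before
```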
## Is this a good idea?
Maybe?
The constraints we erected to end up needing this are pretty silly. Even in our specific contrived scenario, it would be much simpler to use a local cache based on the guaranteed hashability of function objects, like this:
```python
unique_methods = {f.method.__func__: f.method for f in many_things}
for bound in unique_methods.values():
    assert takes_only_one_arg(bound)
```
Even so, there might be scenarios where the code in `@lru_cache_by_unbound_method` makes more sense. Thinking out loud: perhaps it would be useful in ugly codebases that make a ton of (surprisingly expensive) calls to `inspect` methods in method decorators on very hot paths, e.g. for some kind of runtime type checking? Even then, it seems likely that the overhead of creating tons of new instances with decorated methods (if instance churn isn’t an issue, `@cached_property`s work well) would dwarf any plausible decorator/call overhead, but who knows?
It’s not doing anything too terribly evil, at least, but that could change:
## TODOs/Future Improvements
- As written, the bound method has to be in first argument position. ~~I~~ Someone could write logic to go pull apart `*args` and `**kwargs`, stash any bound methods, then reassemble them in the cache-miss handler. Sounds ugly.
- The new decorator could support `@lru_cache`’s `maxsize` and `typed` kwargs without too much trouble, if you’re willing to think through how the `nonlocal` captures work when defining `@lru_cache`-decorated functions inside the decorator’s scope next to `inner`. (There’s a rough sketch of this at the end of the post.)
- A custom callable could be returned instead of `inner`, which could proxy calls to `cache_parameters`, `cache_info`, `__wrapped__`, etc. to the underlying cache decorator, making the decorator theoretically drop-in compatible with `@lru_cache`.
- Type annotations, because they’re ~~digestible~~ a good idea. Hope you like `ParamSpec`s!
- A self-contained, single-function form is possible, if you’re willing to read really sad function preludes like this one:
```python
import threading
from functools import lru_cache

def hideous_decorator(
    func,
    _smuggler=threading.local(),
    _caches=dict(),
    _lock=threading.Lock(),
    **deco_args,
):
    key = tuple(sorted(deco_args.items()))
    with _lock:
        if key not in _caches:
            _caches[key] = lru_cache(**deco_args)
    ...  # Smuggle data around _caches[key](func)(unbound, *args, **kwargs)
```
The rest is left as an exercise for the reader.
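If you want to chase the `maxsize`/`typed` TODO yourself, here’s my rough sketch of the shape it could take (a guess, not battle-tested): turn the decorator into a decorator factory, so each decorated function gets its own private cache built with the requested parameters:

```python
from functools import lru_cache, partial, wraps
from threading import local

_smuggler = local()

def lru_cache_by_unbound_method(maxsize=128, typed=False):
    def decorator(func):
        # One private cache per decorated function, created with the
        # caller-supplied lru_cache parameters, right next to inner.
        @lru_cache(maxsize=maxsize, typed=typed)
        def _get_cached(_ignored, *args, **kwargs):
            return _smuggler.bound(*args, **kwargs)

        @wraps(func)
        def inner(method, *args, **kwargs):
            _smuggler.bound = partial(func, method)
            try:
                return _get_cached(method.__func__, *args, **kwargs)
            finally:
                _smuggler.bound = None

        return inner

    return decorator
```

Usage then needs parentheses (`@lru_cache_by_unbound_method(maxsize=None)`), and as a side effect separately decorated functions no longer share one global cache.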