Logic Fault

Have you considered running your software on a computer?


Cute python trick: an @lru_cache containing bound methods

Here’s a cute Python trick I just discovered. I’m probably not the first one to find it.

Let’s say I want an @lru_cache that can contain bound methods.

I am not talking about a cache that decorates bound methods, like this:

class Leaky:
    @lru_cache
    def mymethod(self):
        return whatever

It’s (hopefully) well-known that this pattern can leak references to self. When people try to do this, what they usually want is a cache whose lifetime is bound to the lifetime of the instance in self. Such a cache exists: @cached_property or any one of its numerous backports or equivalents on PyPI.
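
For completeness, here’s a minimal sketch of that instance-lifetime alternative, using the standard-library functools.cached_property (Python 3.8+); the class and attribute names are just illustrative:

```python
from functools import cached_property

class NotLeaky:
    computations = 0  # count how many times the expensive work actually runs

    @cached_property
    def expensive(self):
        # Runs at most once per instance; the result is stored in the
        # instance's __dict__, so the cache dies with the instance.
        type(self).computations += 1
        return 42

obj = NotLeaky()
assert obj.expensive == 42
assert obj.expensive == 42          # second access is served from the instance
assert NotLeaky.computations == 1   # the body ran only once
```

Note that cached_property caches an attribute lookup rather than a method call, which is exactly why its lifetime is tied to the instance.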

But that’s not what I’m interested in.

Contrived problem: @lru_cache-decorated functions which receive methods #

Let’s say I have some classes with methods, and I want to learn and remember something expensive about those methods. Let’s also say that the return value of a given method foo may change based on what instance it’s called on, but there’s still something (computationally expensive) I can learn by calling foo: something that generalizes across all instances it could ever be called on.

Let’s get more concrete, but also more contrived. Let’s say I want to make sure that thing.method() can be called with only one argument, for arbitrary values of thing and method. I need to verify that before I get into my main loop, because my main loop looks like this:

for thing in many_things:
    start_missile_launch()
    thing.method("Oh shit, wrong button. Abort, abort!")
    abort_missile_launch()

You can see how a TypeError: ThingClass.method takes 1 positional argument but 2 were given could really ruin my day. Woe is me.

In the contrived world I live in, there is some good news and a lot of bad news.

Good news: it’s safe to call method implementations before the main loop. Safety is important in code that mentions missiles; that’s obviously why it was coded in Python.

The bad news/constraints:

What I want to do is this:

from functools import lru_cache

@lru_cache
def takes_only_one_arg(func):
    try:
        func("test arg")
        return True
    except TypeError:
        return False

for thing in many_things:
    assert takes_only_one_arg(thing.method)
# If we make it here, we know all the methods accept one argument
...  # Launch the missiles etc.

But that won’t work! Or rather, it will execute successfully, but with two big problems:

  1. The cache won’t … uh … cache very well. Multiple thing.method bindings may refer to the same underlying function on different objects (or subclasses that share an implementation of method, and so on), but bindings to different instances/types compare and hash unequal, so every one of them is a cache miss.
  2. The cache will leak instances from many_things forever, even after many_things is garbage collected. In other words, this code never prints Deleting instance:
import os
from functools import lru_cache

class Foo:
    def meth(self): ...

    def __del__(self):
        print("Deleting instance")

@lru_cache
def cache(func): ...

def main():
    inst = Foo()
    cache(inst.meth)
    del inst
    print("Leaving main")

main()
os._exit(1)  # Skip final garbage collection
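
The first problem is just as easy to demonstrate. Bound methods to different instances compare unequal even when they wrap the very same function, so the cache misses once per instance (a small sketch; the names are illustrative):

```python
from functools import lru_cache

calls = 0

@lru_cache
def expensive(func):
    global calls
    calls += 1  # count cache misses
    return True

class Thing:
    def method(self): ...

a, b = Thing(), Thing()
assert a.method.__func__ is b.method.__func__  # same underlying function...
assert a.method != b.method                    # ...but unequal bound methods

expensive(a.method)
expensive(b.method)  # cache miss: bound to a different instance
expensive(a.method)  # cache hit: equal binding, equal hash
assert calls == 2
```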

Okay, so what do?

Contrived solution: smuggling bound methods into a cache keyed by unbound methods #

It’s simple: we key the cache by the unbound __func__, and smuggle the bound version “around” the cache using a thread-local:

from functools import lru_cache, partial, wraps
from threading import local

_smuggler = local()

@lru_cache
def _get_cached(_ignored, *args, **kwargs):
    # Keyed on `_ignored` (the unbound function) plus the remaining
    # arguments; on a miss, call whatever bound callable was smuggled in.
    return _smuggler.bound(*args, **kwargs)

def lru_cache_by_unbound_method(func):
    @wraps(func)
    def inner(method, *args, **kwargs):
        # Stash the bound call in a thread-local so it is never part of
        # the cache key and never stored by the cache.
        _smuggler.bound = partial(func, method)
        try:
            return _get_cached(method.__func__, *args, **kwargs)
        finally:
            _smuggler.bound = None
    return inner

That works surprisingly well. But how? The cache is keyed on method.__func__ (plus any remaining arguments) rather than on the bound method itself, so bindings that share an implementation share a single cache entry, and no instance ever ends up stored in the cache. The bound method travels “around” the cache in the thread-local, which is only consulted on a miss and is cleared immediately afterwards.

Then we can use our decorator on our contrived missile-launch-aborter code, like so:

@lru_cache_by_unbound_method
def takes_only_one_arg(func):
    ...

for thing in many_things:
    assert takes_only_one_arg(thing.method) 
# If we make it here, we know all the methods accept one argument
for thing in many_things:
    start_missile_launch()
    thing.method("Oh shit, wrong button. Abort, abort!")
    abort_missile_launch()
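
A quick self-check of both claims, reusing the decorator above verbatim (the Thing class and counters are illustrative; the weakref check relies on CPython’s refcounting freeing the instance promptly):

```python
import gc
import weakref
from functools import lru_cache, partial, wraps
from threading import local

_smuggler = local()

@lru_cache
def _get_cached(_ignored, *args, **kwargs):
    return _smuggler.bound(*args, **kwargs)

def lru_cache_by_unbound_method(func):
    @wraps(func)
    def inner(method, *args, **kwargs):
        _smuggler.bound = partial(func, method)
        try:
            return _get_cached(method.__func__, *args, **kwargs)
        finally:
            _smuggler.bound = None
    return inner

calls = 0

@lru_cache_by_unbound_method
def takes_only_one_arg(func):
    global calls
    calls += 1  # count how often the expensive check really runs
    try:
        func("test arg")
        return True
    except TypeError:
        return False

class Thing:
    def method(self, msg):
        pass

a, b = Thing(), Thing()
assert takes_only_one_arg(a.method)
assert takes_only_one_arg(b.method)
assert calls == 1        # second call hit the cache via the shared __func__

ref = weakref.ref(a)
del a
gc.collect()
assert ref() is None     # the cache did not keep the instance alive
```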

Is this a good idea? #

Maybe?

The constraints we erected to end up needing this are pretty silly. Even in our specific contrived scenario, it would be much simpler to use a local cache based on the guaranteed hashability of function objects, like this:

unique_methods = {f.method.__func__: f.method for f in many_things}
for bound in unique_methods.values():
    assert takes_only_one_arg(bound)

Even so, there might be scenarios where the code in @lru_cache_by_unbound_method makes more sense. Thinking out loud: perhaps it would be useful in ugly codebases that make a ton of (surprisingly expensive) calls to inspect methods in method decorators on very hot paths, e.g. for some kind of runtime type checking? Even then, the overhead of creating tons of new instances with decorated methods would likely dwarf any plausible decorator/call overhead (and if instance churn isn’t an issue, @cached_property works well), but who knows?

It’s not doing anything too terribly evil, at least, but that could change:

TODOs/Future Improvements #

import threading

def hideous_decorator(func, _smuggler=threading.local(), _caches=dict(), _lock=threading.Lock(), **deco_args):
    key = tuple(sorted(deco_args.items()))
    with _lock:
        if key not in _caches:
            _caches[key] = lru_cache(**deco_args)
    ...  # Smuggle data around _caches[key](func)(unbound, *args, **kwargs)

The rest is left as an exercise for the reader.