
Dealing with cross language memory management #1292

@Thrameos

Description


One of the largest problems with JPype's attempt to wrap Python objects in Java is how easily it can create irresolvable memory loops.

Background

Currently JPype holds a strong reference from each JObject to the Java object it wraps. As we currently operate, this is acceptable because the JVM is subordinate to the Python environment. Each memory object in Java is kept alive simply because of those global references. The special case is the proxy object, in which Java holds a back reference to Python. In this limited case we have special logic to keep the Python implementation alive without directly creating a reference loop. The problem is that this only works for the short loops that JpypeProxy creates. Further, I am not sure this "actually" solves the referencing problem: although we do increment a reference count from the Java side, those may not be true strong references.

This problem basically explodes if we ever introduce a way to easily reference a Python object from Java. In Python it is very easy to pick up unexpected references through the local dictionary, and users can easily produce their own reference loops with simple containers. We would have strong references from two languages, and neither supports any cooperative garbage collection. Because of that, I have been stumped at how to add this important feature. I have investigated other language wrappers and found that every one of them takes one of three approaches:

- it punts and ignores the problem;
- it requires users to solve the problem themselves by deliberately adding weak references in critical locations;
- it simply does not allow complex structures to be formed.

Ultimately all three approaches are unworkable. Ignoring the problem just produces memory leak reports. Weak-reference solutions shove those reports back onto the user ("your problem, not ours"). Preventing complex structures means the product is not usable.
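To make the loop concrete, here is a small simulation written entirely in Python. The JVM is modeled by a hypothetical `JavaHeap` class whose `pin()` method stands in for creating a global reference and returns an opaque integer handle. The Python object stores only that handle, so Python's collector sees no edge from it to the Java side. Java, in turn, sees the global reference as a root. This is a sketch under those assumptions, not JPype's actual machinery.

```python
import gc
import weakref


class JavaHeap:
    """Hypothetical stand-in for the JVM. Entries in `global_refs` act like
    JNI global references: from Python's point of view they are roots."""

    def __init__(self):
        self.global_refs = {}
        self._next_handle = 0

    def pin(self, obj):
        """Create a global (strong) reference and return an opaque handle."""
        self._next_handle += 1
        self.global_refs[self._next_handle] = obj
        return self._next_handle


class Holder:
    """A user-level Python object that happens to store a Java handle."""

    def __init__(self):
        self.java_handle = None


jvm = JavaHeap()

holder = Holder()
java_list = [holder]                     # "Java" container -> Python (strong)
holder.java_handle = jvm.pin(java_list)  # Python -> "Java" via opaque handle

probe = weakref.ref(holder)
del holder, java_list                    # the user drops every reference
gc.collect()

# holder -> handle -> global ref -> java_list -> holder survives: each
# collector sees only its half of the cycle, so the pair leaks.
leaked = probe() is not None
print("leaked:", leaked)  # leaked: True
```

Only explicitly releasing the global reference (here, clearing `jvm.global_refs`) frees the pair. That is exactly the manual step users should not have to perform.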

I could try to solve this by declaring one side or the other dominant at the beginning. If the bridge starts from Python, then Java is subservient and Java-to-Python links are weak (no loops possible). If the bridge starts from Java, then Python back-links are weak (no loops possible). But this is a nightmare for both the library and the user. Users would need a great deal of knowledge to understand why they can't hold a Python object from Java (without manual connections) merely because of which side started the bridge. This is worse than ignoring the problem. Many users who ignore it may never hit the memory leak, whereas this approach would actively break their code. No user will expect it, so we will get constant issue reports.
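The failure mode of the "dominant side" policy is easy to reproduce. The sketch below assumes a bridge started from Python, so every Java-to-Python link is forced to be weak. `JavaEventSource` and `Callback` are hypothetical names, and the Java object is simulated in Python. A listener handed to "Java" with no other Python reference disappears immediately, which is the surprise users would report.

```python
import gc
import weakref


class Callback:
    """A Python object the user hands to Java, e.g. as a listener."""

    def fire(self):
        return "fired"


class JavaEventSource:
    """Hypothetical Java object under a Python-dominant bridge: every
    Java -> Python link is weak by policy, so no loop can ever form."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, listener):
        self._listeners.append(weakref.ref(listener))  # weak by policy

    def fire_all(self):
        results = []
        for ref in self._listeners:
            listener = ref()
            results.append("listener vanished" if listener is None else listener.fire())
        return results


source = JavaEventSource()
source.add_listener(Callback())  # only "Java" refers to the callback
gc.collect()
print(source.fire_all())  # ['listener vanished']
```

The code is perfectly reasonable from the user's point of view. It only works if the user also keeps a Python reference to the callback, which is the hidden knowledge this design would demand.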

Proposed approach

Despite several years of research, I have not found any library that was satisfying. There are solutions for distributed garbage collection, but they all fall short here:

- they require leases, which we really can't maintain locally;
- they have no support for local distributed networks;
- all of them would be excessively costly.

Every time I research it, I come up short, give up, and conclude it would be nearly impossible to support. Nor have I found anyone interested in collaborating on this challenging task.

However, in my latest attempt I believe I have found a potential solution, which has two parts. The first is an "all weak reference system" in which every connection between the two languages is weak. Instead of forming a strong connection between the two languages, each language maintains a key/value list of the shared objects on its side. Each entry holds one strong reference to the side's own object and a weak reference to the other side. Because of the global reference on its own side, a shared object will never be cleaned up by its own collector, but the reference on the other side can be. We then place a hook on each side to traverse its list of weak references. If a weak link is broken, we drop the strong reference on that side, and the memory gets freed. This also solves the issue that the Python references we currently create to hold proxies are not really adequate to keep the object alive. Now both sides can mind their own business and keep their objects alive properly.
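Here is a minimal sketch of one side's registry in the all-weak-reference scheme, assuming both sides are simulated in Python. `SharedTable`, `export`, `sweep`, `Proxy` and `Payload` are hypothetical names. Each entry pairs a strong reference to the local object with a weak reference to the proxy the other side uses. `sweep()` plays the role of the collection hook. A real Java side would mirror this with `java.lang.ref.WeakReference`.

```python
import gc
import weakref


class Proxy:
    """Stand-in for the object the other side uses to reach a shared object."""

    def __init__(self, key):
        self.key = key


class SharedTable:
    """Hypothetical per-side registry:
    key -> (strong ref to the local object, weak ref to the remote proxy)."""

    def __init__(self):
        self._entries = {}

    def export(self, key, local_obj, remote_proxy):
        self._entries[key] = (local_obj, weakref.ref(remote_proxy))

    def sweep(self):
        """GC hook: release the strong ref of every entry whose remote
        proxy has been collected; return the released keys."""
        dead = [k for k, (_, remote) in self._entries.items() if remote() is None]
        for k in dead:
            del self._entries[k]
        return dead

    def __len__(self):
        return len(self._entries)


class Payload:
    pass


python_side = SharedTable()
obj = Payload()
probe = weakref.ref(obj)
java_proxy = Proxy("payload-1")               # what the Java side would hold
python_side.export("payload-1", obj, java_proxy)
del obj                                       # only the table keeps it alive

released_before = python_side.sweep()         # proxy still alive
del java_proxy                                # the other side lets go
gc.collect()
released_after = python_side.sweep()          # hook notices and releases
gc.collect()
print(released_before, released_after, probe() is None)  # [] ['payload-1'] True
```

Neither side ever holds a strong reference into the other's heap, so each collector only reasons about objects it owns.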

Unfortunately, this still doesn't deal at all with reference-counting loops. Neither side can explore the other. Java has NO traversal method whatsoever, and thus no way to establish that there is a referencing problem. Because our weak-link setup keeps things alive through the linkage from the global space, we are no better off than before.

However, here is where I believe a solution lies. In Python we can scan locally for referencing relationships, starting from the global list. Because Python has a traversal method, we can check whether something reachable from one of those Java-to-Python links leads to a Python-to-Java link. Previously this is where I thought there was no hope: without seeing the rest of the loop, it would never be resolvable. What I missed before was that when we find such a linkage, we can resolve it by shifting the burden. It moves from the Python side, where we can see the linkage, to the Java side, where the linkage is invisible but can be resolved. Whenever we find a Java-to-Python linked object that leads to a Python-to-Java linkage, we simply remove the global reference on the Java side. We then point its Python reference link at the actual linkage between the two objects, so that reference connection is now visible on the Java side. We notify the Python side of the rearrangement so that it can reconfigure as well. At this point both sides see a potential reference loop. Unfortunately, this still isn't enough. One side now has a pure weak reference, while the other side may still have a hard reference keeping the loop alive.
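The detection half of this step can be sketched with Python's own traversal, `gc.get_referents`. The sketch below does not attempt the rearrangement on the Java side. `JavaHandle` is a hypothetical marker for a Python-to-Java link. `find_java_links` walks the reference graph from one object that Java holds and collects every handle reachable from it. It deliberately skips types, modules, functions and code objects, because following those would drag in the whole interpreter rather than user data.

```python
import gc
import types


class JavaHandle:
    """Hypothetical marker for a Python -> Java link (e.g. a wrapper's global ref)."""

    def __init__(self, name):
        self.name = name


# Shared interpreter machinery we never walk into.
_SKIP = (type, types.ModuleType, types.FunctionType,
         types.BuiltinFunctionType, types.CodeType)


def find_java_links(exported):
    """From one object held by Java (a Java -> Python link), walk Python's
    reference graph and return every JavaHandle reachable from it."""
    found, seen = [], {id(exported)}
    stack = [exported]
    while stack:
        obj = stack.pop()
        for ref in gc.get_referents(obj):
            if id(ref) in seen or isinstance(ref, _SKIP):
                continue
            seen.add(id(ref))
            if isinstance(ref, JavaHandle):
                found.append(ref)  # loop candidate; Java side is opaque
            else:
                stack.append(ref)
    return found


class Node:
    def __init__(self):
        self.children = []


root = Node()  # imagine Java holds a global reference to this object
child = Node()
root.children.append({"peer": child})
child.back = JavaHandle("java.util.ArrayList@1")
print([h.name for h in find_java_links(root)])  # ['java.util.ArrayList@1']
```

Each handle found this way marks a place where a Java-to-Python link leads back out to Java, which is where the burden would then be shifted to the Java side.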

This is where I am not yet sure where to go. Imagine a series of horribly set-up objects that zipper back and forth between the two languages to form a chain. If I try to make one set of links strong, the other side will simply garbage collect the end of the chain and everything falls apart.
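The zipper failure can be illustrated with a toy model, again simulated entirely in Python. The chain alternates P0 -> J0 -> P1 -> J1 -> ... . Python-to-Java links are strong, and the hypothetical policy makes Java-to-Python links weak. `PyNode`, `JavaNode`, `build_zipper` and `walk` are illustrative names only. The user keeps just the head, which is a perfectly legitimate root. Yet every Python node past the head is reachable only through a weak link, so it is collected and the chain is cut after P0.

```python
import gc
import weakref


class PyNode:
    """Python half of the chain; its link to the next Java node is strong."""

    def __init__(self, name):
        self.name = name
        self.next_java = None


class JavaNode:
    """Java half of the chain (simulated); its link back into Python is
    weak under the hypothetical policy being tested."""

    def __init__(self):
        self._next_py = None

    def link(self, py_node):
        self._next_py = weakref.ref(py_node)

    def next_py(self):
        return None if self._next_py is None else self._next_py()


def build_zipper(length):
    """Build P0 -> J0 -> P1 -> J1 -> ... and return only the head P0."""
    head = prev = PyNode("P0")
    for i in range(1, length):
        j = JavaNode()
        prev.next_java = j
        nxt = PyNode(f"P{i}")
        j.link(nxt)
        prev = nxt
    return head


def walk(head):
    """Follow the chain as far as it still exists."""
    names, node = [], head
    while node is not None:
        names.append(node.name)
        node = node.next_java.next_py() if node.next_java else None
    return names


head = build_zipper(4)
gc.collect()
print(walk(head))  # ['P0']
```

Making the Java-to-Python links strong instead would keep the chain intact, but it would bring back the uncollectable loops from the start of this issue.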

Conclusion

This remains a very challenging problem. We can improve part of the system with an all-weak-reference system, but we can't fully resolve loops because Java does not support a traversal method. We may be able to follow member relationships or simple collections with reflection, but we don't yet have a universal solution.
