Example of Cross Language Garbage Collection #1296
| #def gc_callback(event, details): | ||
| # if event == "start": | ||
| # generation = details.get("generation", "unknown") | ||
| # print(f"GC cycle started for generation {generation}") | ||
| # elif event == "stop": | ||
| # generation = details.get("generation", "unknown") | ||
| # print(f"GC cycle ended for generation {generation}") | ||
| # print(helloworld.inspect_gc_generations()) |
Check notice
Code scanning / CodeQL
Commented-out code Note test
| def __init__(self): | ||
| self.a = helloworld.Foreign() | ||
| self.b = helloworld.Foreign() | ||
| pass |
Check warning
Code scanning / CodeQL
Unnecessary pass Warning test
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #1296 +/- ##
==========================================
+ Coverage 86.65% 91.26% +4.61%
==========================================
Files 112 27 -85
Lines 10255 2359 -7896
Branches 4019 0 -4019
==========================================
- Hits 8886 2153 -6733
+ Misses 762 206 -556
+ Partials 607 0 -607
|
|
This is definitely something I'll need to sit down with a cup of coffee early in the morning and read through to digest. I will hopefully have some decent input at some point in the future and will try to keep this marked as unread in my notifications (so it will bother me). This is mostly just an acknowledgement. |
|
Out of curiosity, why did you change your coding style? I thought you didn't like K&R (I guess it's actually one true brace) |
|
I absolutely detest K&R due to a reading disorder that makes K&R very hard to follow. However, I didn't write that code... it was written entirely by an AI using a specification and some edits from me. I provided boilerplate that tried to keep it from using K&R, but no matter what I do it falls back to K&R once I get too deep in the conversation. As I have to keep using meld to compare pieces of what it wrote as I update the specification, the easiest thing is to keep it in K&R until it meets all specifications, then reformat it once when done. |
|
Still not getting much visibility for this in Ideas. I need other language bindings such as Android and C# to chime in. I think there is a good way to get a cleaner API. Rather than exposing query functions that only work at specific stages and may have a different meaning in Python's polymorphic gc, I can have the callback receive a struct with the relevant information, such as which operations are valid. This will make it clear what operations are allowed at any point, and if an operation changes implementation the struct can supply the relevant function. |
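A rough sketch of the "callback receives a struct" idea, purely for discussion. Everything here (`GCEvent`, `make_event`, `bridge_callback`, the phase names and the operation names) is hypothetical and is not an existing Python or JPype API. It only shows how the payload itself can state which operations are valid at a given stage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class GCEvent:
    """Hypothetical payload handed to a bridge callback (not a real API)."""
    phase: str        # assumed phase names: "start", "analyze", "stop"
    generation: int
    # Only the operations that are valid in this phase are present.
    operations: Dict[str, Callable[..., object]] = field(default_factory=dict)


def make_event(phase, generation, tracked):
    """Build an event that exposes only the operations valid for `phase`."""
    ops = {}
    if phase == "analyze":
        ops["list_unreachable"] = lambda: list(tracked)
        ops["rescue"] = lambda obj: tracked.remove(obj)
    return GCEvent(phase, generation, ops)


def bridge_callback(event):
    """Rescue everything it is allowed to; return how many were rescued."""
    rescue = event.operations.get("rescue")
    if rescue is None:
        return 0  # rescuing is not a valid operation in this phase
    rescued = 0
    for obj in event.operations["list_unreachable"]():
        rescue(obj)
        rescued += 1
    return rescued
```

The callback never has to know which stage it is in; it just checks whether the operation was supplied, so a change in how the gc implements an operation only changes the function placed in the struct.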
I ran out of steam already, will continue reading tomorrow morning.
| * - The Java object is now held alive only by the weak reference | ||
| * originating from Python. |
This means that the Java garbage collector is free to collect it though. Is this supposed to imply that you know there is another existing Java strong reference somewhere that is being tracked, such that the Java object won't be collected on you? Or is this supposed to be occurring when Python gc occurs and Python no longer holds the Java object?
If Python doesn't need something it transfers the segment to Java. From there we have a few cases. There may be a link in Java, either because Java directly links it or because Python is holding a link indirectly. If Python holds no references and Java holds no references, it was a loop and Java collects its half on the next gc. Python will receive a notice and the decref will immediately take out the rest of the loop.
| * - **Case 2:** Weak Links | ||
| * - If the Java object is held alive only by a weak link from Python, it will | ||
| * be garbage collected by Java once Python removes its reference. |
A weak reference does not keep the object alive though. Am I just confusing the terminology of a weak link with a weak reference?
Normally we have one strong link running from Python to Java. This is replaced with two weak links running between the languages and one strong link held by the corresponding reference manager. The critical difference is that the ownership of the strong reference is now transferable.
As the reference manager for Java is in Java and the reference manager for Python is in Python, the cross language references end up being two weak references that can't hold anything alive without the aid of a centralized reference manager. Hence "double weak" linkage between languages.
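To make the "double weak" picture concrete, here is a toy model written entirely in Python, assuming CPython reference counting. Both halves and both managers are plain Python objects here, and `Proxy`, `ReferenceManager`, `link` and `transfer` are made-up names, so this illustrates only the ownership idea and not the actual JPype internals.

```python
import weakref


class Proxy:
    """Stand-in for one half of a shared object (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.peer = None  # weak reference to the other half


class ReferenceManager:
    """Holds the single strong reference for objects it currently owns."""

    def __init__(self):
        self._owned = set()

    def hold(self, obj):
        self._owned.add(obj)

    def release(self, obj):
        self._owned.discard(obj)

    def holds(self, obj):
        return obj in self._owned


def link(py_half, java_half):
    """Connect the halves with two weak links; neither keeps the other alive."""
    py_half.peer = weakref.ref(java_half)
    java_half.peer = weakref.ref(py_half)


def transfer(obj, source, target):
    """Move ownership of the strong reference from one manager to the other."""
    target.hold(obj)      # take the new strong reference first
    source.release(obj)   # then drop the old one
```

With this arrangement the object stays alive only as long as some manager holds it; once the last manager releases it, the weak links on both sides simply go dead.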
Oh ok, I was missing that there was a strong reference somewhere.
It is difficult to picture how it can operate if there are only weak links. But it is structured after DGC, which does not have any concept of strong references across a boundary. Instead it requires a symmetric system in which two weak connections are shared between shared objects and a reference entity is held by the manager at least until such time as one side has released its claim. When that happens the reference manager transfers control to the connections that still support it, if any. Thus although there is no way to probe Java's internal structure (which I have been trying to do for a long while), we instead transfer those connections from Python to Java. Java can then analyze them and break them as needed.
The first step of this process is that I need hooks in the Python gc module to give me a defined way to always be last in the generation 2 gc cycle, because that is the only time we can analyze the connection structures. The second step is that I need to reformulate JPype internals such that I can efficiently manage the references. It must all be O(1), but exactly how to pull it off properly is a challenge. Once I have those two pieces in play I can support full seamless integration between Python and Java in both directions.
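For reference, the closest thing the standard library offers today is `gc.callbacks`, which fires before and after a whole collection rather than at a defined last position inside the generation 2 pass. The sketch below (with the made-up names `on_gc` and `gen2_passes`) only shows how to observe completed full collections, so it approximates the timing but does not provide the hook described above.

```python
import gc

gen2_passes = []  # "collected" count for each completed full collection


def on_gc(phase, info):
    """Record each completed generation 2 (full) collection."""
    if phase == "stop" and info["generation"] == 2:
        gen2_passes.append(info["collected"])


gc.callbacks.append(on_gc)
try:
    gc.collect()  # default argument requests a full (generation 2) collection
finally:
    gc.callbacks.remove(on_gc)  # always unhook, even if the collection raised
```

Because the callback runs after the collector has already decided what to free, it can report on a cycle but cannot analyze or rescue objects within it.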
The reasons I thought this was hopeless for a long time are:
- There is no possibility of safe traversal of Java objects. You can use byte code asm to add traversal, but many primitives in the Java collections commons are native, meaning the only way to add it would be to have a special version of Java.
- The Python gc system has no method of rescuing objects that are doomed to die, meaning somehow you have to produce a zombie state in which an object has been tagged for collection but is still allowed to live on with all its components.
- The Java method for saving an object (finalize) is deprecated because it was fundamentally broken.
- The only tools supported by both languages are weak references, from which you can get a notification that there was a break but can do nothing about it.
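The last point can be seen with Python's own `weakref` module. This is a minimal sketch assuming CPython, where reference counting reclaims the object as soon as the last strong reference is deleted; `Resource`, `events` and `on_break` are illustrative names.

```python
import weakref


class Resource:
    """Plain object standing in for something shared across the bridge."""


events = []


def on_break(dead_ref):
    # By the time this runs the referent is already gone: calling the
    # weak reference yields None, so there is nothing left to rescue.
    events.append(dead_ref() is None)


obj = Resource()
ref = weakref.ref(obj, on_break)
del obj  # on CPython the object is reclaimed here and on_break fires
```

The callback is a notification only; it receives the dead reference, not the object.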
Given the very limited options that both languages provide, I had tried three times to find others who have tackled and solved the problem, or any reference on how to accomplish such a task. DGC does something similar but it is mostly based on time leases. The breakthrough came on my third attempt, when I realized that I can manipulate the Python gc to wedge an object into the last slot in the GC cycle. That gave me the ability to see relationships, report them, and then rescue objects in Python. Hence my proposal to the Python core developers.
Unfortunately, to get support from Python core that this should be a feature not just for my library, but for many language bridges, I need broad support from many interested parties. Android and C# seem like the best candidates, though some of the GUI libraries likely also face the same problems.
That doesn't mean that I won't try to support this for JPype even if I don't get my hook. I am well capable of hacking Python internals in ways that would leave the Python core developers aghast. The problem is that if I do so, we are going to have another round of some developer, never required to submit a PEP and with no regard for how something functions, deciding "let's get another 2% optimization here" and destroying the foundation I am working on. That is how they removed the ability to add memory both before AND after the object. The after was supposed to be the sole domain of the object as per the contract for alloc and free. They broke it. I would much prefer we do this cooperatively. But that requires core developers to reach into the trash can of the Python Ideas forum and realize that this is not a niche application thing, but an advancement in language bridge capabilities.
| * - **Case 3:** Synchronization Delays | ||
| * - If Python and Java garbage collection cycles are not synchronized, there | ||
| * may be slight delays in cleanup. This is not critical but could be | ||
| * optimized. |
This isn't a problem, and I would argue that this is how garbage collection works. Things aren't supposed to be cleaned up right away, as opposed to C++ destructors and Rust. Part of the advantage of garbage collection is reduced memory fragmentation, because things allocated together usually get cleaned up together.
The concern of the AI is that it will typically take up to 3 gc cycles to get a loop entirely free, assuming weak references don't actively report their breakage. Longer for complex structures.
If we have active breakage reporting, then it takes one Python cycle and one Java cycle to clear up shared resources.
|
I have made a lot of progress on this topic, but I have also hit a theory issue. While there is nothing wrong with my approach, it touches a live wire in theory. If I can make two gcs operate asynchronously, that implies we can also make N independent gcs in an acyclic distributed computing cloud operate asynchronously. Here is where the problems start. Like saying I found a P solution for an NP-hard problem, it would mean the solution is undoubtedly wrong. Asynchronous DGC is not known to be NP-hard, but it is viewed as impossible, which is a similarly huge leap, though unlike NP-hardness there are problems where "impossible" is based more on practicality: tracking live state in a changing graph, for example. There are some theories that state such asynchronous DGC is possible, but they do not have known practical implementations and they have strict properties that must be met for correctness. If those are not met, then the general consensus problem holds, which does not favor this being correct. We partially satisfied the key property required, BUT only partially, and the theory proof is an "if and only if". Unfortunately that level of theory is outside my depth, so I am struggling. Before I can safely use this I think we need some type of correctness proof. |

This is a demonstration for discussion about how we may be able to support integrated garbage collection between Java and Python. It is foundational to the Java to Python bridge.
I am trying to push to have this as an API in some unspecified future Python version, but that won't help us with the many versions of Python already out there. Instead I have taken to writing a dirty hack of the Python GC system which navigates the dangerous waters of their mutating generational gc to produce the same effect without actually modifying Python.
Anyone who has interest in this topic please place a review or join the discussion.
@marscher @astrelsky