[Mar 31] Fast Conservative Garbage Collection #305
Replies: 11 comments 9 replies
-
The authors explore a new space -- ambiguous roots + RC -- and find surprisingly good results. This is super neat, but I wonder if some empirical fact or trend in the benchmarks is quietly supporting these results. I think it would be super cool to figure out what that is, if it exists. Far from weakening the results, finding this out would teach us something about the way we tend to write programs. For example, the generational hypothesis not only helps generational GC, it also reveals an emergent programming pattern.
-
This paper is also very long and contains lots of GC-specific terms and techniques, but I'm trying to grasp the main insight: combining RC with a conservative collector to implement a high-performance conservative collector for managed languages. Since a naive conservative collector falls into the family of tracing algorithms, this advanced GC is also a combination of tracing and RC, which was mentioned in Tuesday's discussion. So I'm curious whether this method lies in the design space we discussed on Tuesday, and if so, whether we can evaluate this high-performance GC using those metrics (the time-space trade-off, etc.).
-
I worry that I'm missing something fairly basic about conservative GC -- is the main reason to do it just that you don't need to reason about what happens to your pointers while you're transforming the code? I understand that if you're in a memory-unsafe language you're forced to do conservative GC, but I'm wondering if there's more to the story.
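For others who were also shaky on the basics: the core move in conservative GC is to scan roots (e.g. the stack) word by word and treat any value that falls inside the heap's address range as a *potential* pointer, since the runtime can't tell pointers from integers. Here's a toy simulation of that idea in Python -- the addresses, heap range, and function name are all made up for illustration, not from the paper:

```python
# Toy model of conservative root scanning: any stack word whose value
# falls inside the heap's address range is treated as a potential
# ("ambiguous") pointer. All constants here are illustrative.

HEAP_START, HEAP_END = 0x1000, 0x2000  # pretend heap address range

def conservative_roots(stack_words):
    """Return the words that *might* be pointers into the heap."""
    return [w for w in stack_words if HEAP_START <= w < HEAP_END]

# A stack frame holding one real pointer (0x1040), one integer that
# happens to look like a heap pointer (0x1ABC), and a plain int (7).
stack = [0x1040, 0x1ABC, 7]
print(conservative_roots(stack))  # keeps both 0x1040 and 0x1ABC
```

The cost is visible even in the toy: `0x1ABC` is really an integer, but the collector must keep whatever it "points" to alive, and must never move that object, since it can't rewrite a value that might be an integer.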
-
I found it surprising that language designers would choose a conservative garbage collector purely for ease of implementation, but in hindsight this makes a lot of sense. The initial 1.0 release of Go used a simple conservative tracing garbage collector, which was eventually refined, along with the rest of the language, into the precise, concurrent, mark-sweep collector used today. I was also a bit surprised to see that one of the concerns with conservative garbage collection is speed. I expected conservative garbage collection to be a tradeoff between memory usage and performance, but it seems that conservative garbage collectors can perform significantly worse than precise collectors. My guess is this has to do with needing to trace extra paths from values that look like pointers but are actually integers -- is that correct?
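On the "extra paths" guess -- that is one part of the cost, usually called excess retention: an integer that happens to equal a heap address keeps a dead object (and everything it references) alive. A tiny simulation of a tracing mark phase shows it; the object graph and addresses here are invented for illustration:

```python
# Toy tracing mark phase: a "false pointer" (an integer whose value
# equals a heap address) forces the collector to retain a dead object.
# The heap maps each object's address to the addresses it references.

heap = {
    0x100: [0x200],  # live object pointing to 0x200
    0x200: [],
    0x300: [],       # actually dead, but 0x300 appears on the stack as an int
}

def mark(roots):
    live, work = set(), list(roots)
    while work:
        addr = work.pop()
        if addr in heap and addr not in live:
            live.add(addr)
            work.extend(heap[addr])
    return live

exact_live = mark([0x100])                # only the truly live objects
conservative_live = mark([0x100, 0x300])  # the false root retains 0x300
```

My understanding from the paper is that excess retention turns out to be small in practice, and the bigger costs are things like filtering ambiguous references and being unable to move pinned objects.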
-
I found the object map proposal interesting, since in my understanding it is what actually makes conservative reference counting possible. I understand that it is used to filter ambiguous references, but regarding its structure, it's not really clear to me whether it's just a bitmap or something more sophisticated. I'm not sure this is an accurate comparison, but Bloom filters are also backed by bitmaps and support a kind of probabilistic filtering. If possible, would a Bloom filter work for filtering out ambiguous references? It looks like there are some approaches that use Bloom filters to mark live objects:
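My reading is that it's essentially an exact bitmap over the heap, with one bit per possible object start, set at allocation time; an ambiguous reference survives filtering only if its bit is set. A minimal sketch of that idea, assuming an 8-byte allocation granule (the granule size and function names are my guesses, not the paper's layout):

```python
# Sketch of an object map: a bitmap with one bit per allocation
# granule, set when an object is allocated at that offset. An
# ambiguous reference is kept only if it hits a genuine object start.

GRANULE = 8          # assumed allocation granularity (bytes)
HEAP_SIZE = 1024
object_map = bytearray(HEAP_SIZE // GRANULE // 8)  # 1 bit per granule

def record_allocation(offset):
    g = offset // GRANULE
    object_map[g // 8] |= 1 << (g % 8)

def is_object_start(offset):
    g = offset // GRANULE
    return bool(object_map[g // 8] & (1 << (g % 8)))

record_allocation(64)
print(is_object_start(64))  # a real object starts here
print(is_object_start(72))  # ambiguous value filtered out
```

On the Bloom filter question: a Bloom filter has false positives but no false negatives, so it would still be *safe* here (a false positive just retains extra garbage, never frees a live object), at the cost of more excess retention than an exact bitmap -- though I'd guess a plain bitmap is already so compact that the space savings wouldn't be worth it.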
-
One thing that really surprised me about this paper was just how important locality is! The authors attribute the performance difference between their collectors and BDW almost entirely to locality, and it was an 8% difference. I wonder if this could mean it's possible for a collected language to beat a manually managed language? At least in C/C++, if you want to defragment your heap, you don't really have any options other than manually freeing and reallocating everything, which is kind of unrealistic.
-
I found the division of the heap into blocks and lines an interesting strategy, which I understood to be one way to avoid wasting space while pinning objects. I was wondering if there was a reason the block size was chosen to be exactly 32KB and the line size to be 256 bytes. I'm curious whether, in other work, the authors had tried other line sizes and found that 256 bytes was empirically the best on the chosen benchmarks.
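One practical constraint on those sizes: both are powers of two, so mapping an arbitrary address to its block and line is just shifts and masks, which matters on the hot path (e.g. when pinning the line an ambiguous reference lands on). A quick sketch of that address arithmetic, using the sizes cited above (purely illustrative):

```python
# Address arithmetic for a heap split into blocks and lines, using
# 32 KB blocks and 256-byte lines as cited above. Because both sizes
# are powers of two, block/line lookup is cheap integer arithmetic.

BLOCK_SIZE = 32 * 1024
LINE_SIZE = 256
LINES_PER_BLOCK = BLOCK_SIZE // LINE_SIZE  # 128 lines per block

def block_of(addr):
    """Index of the block containing this address."""
    return addr // BLOCK_SIZE

def line_of(addr):
    """Index of the line within its block."""
    return (addr % BLOCK_SIZE) // LINE_SIZE

addr = 0x0A1C0
print(block_of(addr), line_of(addr))
```

Beyond that, I'd also guess the line size trades line-level pinning waste (a whole line is held for one pinned object) against the size of the per-line metadata, so it would be interesting to see a sweep over line sizes.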
-
This is a very long paper with a lot of in-depth GC terms. Maybe I'm not well versed in this topic, so I'm still a bit confused -- sorry if this is a bad question. What's the intuition here that makes conservative + RC a combination that stands out? More generally, how do people pick a GC design from such a wide range of techniques and design choices?
-
I'm not sure if I lack some background knowledge, but I sometimes got lost reading this paper. Considering the performance and overhead of the proposed GC, I don't know what conservative GC is used for, especially in the setting of managed programming languages -- it seems like conservative GC is not as good as exact GC. Are there any industrial languages that use this kind of conservative GC?
-
One thing I find interesting in this paper is how the GC can affect the spatial locality of the program's data. Section 6.1 shows that the free-list collectors cause higher cache miss rates because they spread objects across the address space instead of placing them contiguously. I think this is an important thing to keep in mind while implementing garbage collectors: the layout of live objects can affect the mutator's locality.
-
I'm not sure if I was just not understanding this paper that well, but I don't totally understand when conservative garbage collection would be preferable to exact GC. I did a quick search for programming languages that use conservative garbage collection and couldn't find any, but then I stumbled across an implementation of the Boehm-Demers-Weiser (BDW) collector and its list of known clients. The list is surprisingly long, and notable clients include p4c and the Racket compiler. I'm curious why these projects chose BDW rather than an exact garbage collector.
-
This is a thread for Fast Conservative Garbage Collection by Rifat Shahriyar, Stephen M. Blackburn, and Kathryn S. McKinley that we will be discussing in our last class before Spring Break 🌞
@5hubh4m and I (@ayakayorihiro ) will be leading the discussion, please post your thoughts and questions here!