
8357471: GenShen: Share collector reserves between young and old #25357


Open

kdnilsen wants to merge 38 commits into master

Conversation


@kdnilsen kdnilsen commented May 21, 2025

GenShen independently reserves memory to hold evacuations into the young and old generations. We have found that, under duress, it is sometimes difficult for mixed evacuations to make progress: the reserves in old are too small, and we cannot expand old because young is running GC so frequently that it never has the excess memory required to justify expanding old (and shrinking young).

This PR exploits the fact that the reserves in young are often much larger than young requires to carry out its anticipated next GC cycle. In this case, we can share the young collector reserves with the old generation. This allows much more effective operation of mixed evacuations when GC is running at or near its full capacity.
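In rough terms, the sharing works as sketched below. This is only an illustration of the idea; the names and structure are not the PR's actual code, and the real computation also has to respect region-size alignment and the old evacuation percentage.

#include <cstddef>

// Illustrative sketch only: lend surplus young collector reserve to old so that
// mixed evacuations can make progress.  Hypothetical names, not GenShen's API.
size_t share_young_reserve_with_old(size_t young_reserve,   // reserved for young evacuations
                                    size_t young_demand,    // expected need of the next young cycle
                                    size_t old_reserve) {   // already reserved for old evacuations
  if (young_reserve > young_demand) {
    // Young holds more reserve than its next cycle is expected to use; the
    // surplus can back old (mixed) evacuations instead of sitting idle.
    old_reserve += young_reserve - young_demand;
  }
  return old_reserve;
}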

The following spreadsheet snapshots highlight the benefits of this change. In the control run with a 6G heap, we perform a large number of mixed evacuations, but each mixed evacuation has very low productivity (e.g., one region at a time). This causes excessive delays in reclaiming garbage from old, which is required to shrink old and expand young, and it is why we see a large number of unproductive GC cycles, many of which degenerate and a few of which upgrade to full GC. In the experiment at the same 6G heap size, there are far fewer mixed cycles, but each is much more productive. The total number of GC cycles decreases significantly.

[image: spreadsheet comparison at 6G heap size]

With 7G heap size, the benefits of this PR manifest as a decrease in mixed evacuations, which also allows us to decrease total GC cycles. By more quickly reclaiming old garbage, we are able to more quickly expand young, which decreases the number of young GC cycles. This reduces CPU load. The impact on response times is not as significant as with the 6G heap size. We see slight improvement at p50-p99.9, with slight degradation at p99.99 through p100.

[image: spreadsheet comparison at 7G heap size]

At 8G heap size, the GC is not stressed at all. We see approximately the same number of GC cycles, slight degradation of response times at p50-p99, and slight improvement at p99.9-p100.

[image: spreadsheet comparison at 8G heap size]

The command line for these comparisons follows:

            ~/github/jdk.share-collector-reserves/build/linux-x86_64-server-release/images/jdk/bin/java \
                -XX:+UnlockExperimentalVMOptions \
                -XX:-ShenandoahPacing \
                -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -Xms$s -Xmx$s \
                -XX:ShenandoahMinimumOldTimeMs=25 \
                -XX:ShenandoahFullGCThreshold=1024 \
                -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \
                -Xlog:"gc*=info,ergo" \
                -Xlog:safepoint=trace -Xlog:safepoint=debug -Xlog:safepoint=info \
                -XX:+UnlockDiagnosticVMOptions \
                -jar ~/github/heapothesys/Extremem/src/main/java/extremem.jar \
                -dInitializationDelay=45s \
                -dDictionarySize=3000000 \
                -dNumCustomers=300000 \
                -dNumProducts=60000 \
                -dCustomerThreads=500 \
                -dCustomerPeriod=7s \
                -dCustomerThinkTime=1s \
                -dKeywordSearchCount=4 \
                -dServerThreads=5 \
                -dServerPeriod=5s \
                -dProductNameLength=10 \
                -dBrowsingHistoryQueueCount=5 \
                -dSalesTransactionQueueCount=5 \
                -dProductDescriptionLength=32 \
                -dProductReplacementPeriod=25s \
                -dProductReplacementCount=10 \
                -dCustomerReplacementPeriod=30s \
                -dCustomerReplacementCount=1000 \
                -dBrowsingExpiration=1m \
                -dPhasedUpdates=true \
                -dPhasedUpdateInterval=90s \
                -dSimulationDuration=25m \
                -dResponseTimeMeasurements=100000 \
                >$t.genshen.share-reserves.$r-evac-ratio.$s.out 2>$t.genshen.share-reserves.$r-evac-ratio.$s.err
            gzip $t.genshen.share-reserves.$r-evac-ratio.$s.out $t.genshen.share-reserves.$r-evac-ratio.$s.err

We have tested this patch through our performance pipeline. Both aarch64 and x86 show similar results: a small increase in concurrent evacuation time on the graphchi benchmark, with slight improvements in other metrics on a number of other test workloads:

Genshen aarch64
-------------------------------------------------------------------------------------------------------
+16.35% graphchi/concurrent_evacuation p=0.00000
  Control:      1.895ms (+/-392.33us)        306
  Test:         2.205ms (+/-401.72us)        124

-33.43% specjbb2015/concurrent_marking_old p=0.00213
  Control:    513.923ms (+/-225.22ms)        338
  Test:       385.169ms (+/-231.25ms)         38

-28.58% specjbb2015/cm_parallel_mark_old p=0.00833
  Control:      1.022s  (+/-446.83ms)        333
  Test:       794.476ms (+/-440.83ms)         35

-25.31% crypto.aes/shenandoahfinalupdaterefs_stopped p=0.00000
  Control:      0.113ms (+/-  0.01ms)        285
  Test:         0.090ms (+/-  0.02ms)        158

-18.52% scimark.fft.small/shenandoahfinalupdaterefs_stopped p=0.00000
  Control:      0.106ms (+/-  0.01ms)        474
  Test:         0.090ms (+/-  0.01ms)        180

-15.29% hyperalloc_a3072_o2048/concurrent_marking_old p=0.00103
  Control:    384.599ms (+/- 76.47ms)        277
  Test:       333.581ms (+/- 89.51ms)         55

-15.28% hyperalloc_a3072_o2048/cm_total_old p=0.00105
  Control:    768.676ms (+/-152.94ms)        277
  Test:       666.786ms (+/-178.97ms)         55

-15.28% hyperalloc_a3072_o2048/cm_parallel_mark_old p=0.00105
  Control:    768.676ms (+/-152.94ms)        277
  Test:       666.786ms (+/-178.97ms)         55


Shenandoah aarch64
-------------------------------------------------------------------------------------------------------
-12.07% extremem-phased/update_references p=0.00050
  Control:    479.826ms (+/- 52.78ms)         23
  Test:       428.148ms (+/-  2.26ms)          3

Genshen x86
-------------------------------------------------------------------------------------------------------
+16.35% graphchi/concurrent_evacuation p=0.00000
  Control:      1.895ms (+/-392.33us)        306
  Test:         2.205ms (+/-401.72us)        124

-33.43% specjbb2015/concurrent_marking_old p=0.00213
  Control:    513.923ms (+/-225.22ms)        338
  Test:       385.169ms (+/-231.25ms)         38

-28.58% specjbb2015/cm_parallel_mark_old p=0.00833
  Control:      1.022s  (+/-446.83ms)        333
  Test:       794.476ms (+/-440.83ms)         35

-25.31% crypto.aes/shenandoahfinalupdaterefs_stopped p=0.00000
  Control:      0.113ms (+/-  0.01ms)        285
  Test:         0.090ms (+/-  0.02ms)        158

-18.52% scimark.fft.small/shenandoahfinalupdaterefs_stopped p=0.00000
  Control:      0.106ms (+/-  0.01ms)        474
  Test:         0.090ms (+/-  0.01ms)        180

-15.29% hyperalloc_a3072_o2048/concurrent_marking_old p=0.00103
  Control:    384.599ms (+/- 76.47ms)        277
  Test:       333.581ms (+/- 89.51ms)         55

-15.28% hyperalloc_a3072_o2048/cm_total_old p=0.00105
  Control:    768.676ms (+/-152.94ms)        277
  Test:       666.786ms (+/-178.97ms)         55

-15.28% hyperalloc_a3072_o2048/cm_parallel_mark_old p=0.00105
  Control:    768.676ms (+/-152.94ms)        277
  Test:       666.786ms (+/-178.97ms)         55


Shenandoah x86
-------------------------------------------------------------------------------------------------------
-12.07% extremem-phased/update_references p=0.00050
  Control:    479.826ms (+/- 52.78ms)         23
  Test:       428.148ms (+/-  2.26ms)          3



Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8357471: GenShen: Share collector reserves between young and old (Task - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25357/head:pull/25357
$ git checkout pull/25357

Update a local copy of the PR:
$ git checkout pull/25357
$ git pull https://git.openjdk.org/jdk.git pull/25357/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25357

View PR using the GUI difftool:
$ git pr show -t 25357

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25357.diff

Using Webrev

Link to Webrev Comment

  • This enables mixed evacuations to be more productive more quickly.
  • Before this commit, the release build made reference to a variable that is only present in debug builds.
  • At init update refs, we transfer remaining memory from the Collector and OldCollector partitions in order to allow more mutator allocations to succeed during update-refs. A conditional transfer is not reliable because soft_max_capacity() constraints may cause the transfer to fail. (A sketch of this idea follows the list.)
  • For non-generational mode, we don't have to set evacuation reserves; they are computed by ShenandoahFreeSet::reserve_regions().
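A minimal, self-contained sketch of that unconditional transfer follows; plain fields stand in for the free-set partitions, and the helper name is illustrative rather than the PR's code.

#include <cstddef>

struct Partitions {
  size_t mutator;        // bytes available for mutator allocations
  size_t collector;      // unused young collector reserve
  size_t old_collector;  // unused old collector reserve
};

// Unconditionally return whatever remains of the collector reserves to the
// mutator at init-update-refs.  A conditional transfer can fail when
// soft_max_capacity() constraints block the resize, leaving mutator
// allocations starved during update-refs.
void transfer_remaining_reserves_to_mutator(Partitions& p) {
  p.mutator      += p.collector + p.old_collector;
  p.collector     = 0;
  p.old_collector = 0;
}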
@bridgekeeper

bridgekeeper bot commented May 21, 2025

👋 Welcome back kdnilsen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented May 21, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk

openjdk bot commented May 21, 2025

⚠️ @kdnilsen This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request is integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

@openjdk

openjdk bot commented May 21, 2025

@kdnilsen The following labels will be automatically applied to this pull request:

  • hotspot-gc
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-gc hotspot-gc-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org labels May 21, 2025
@kdnilsen kdnilsen changed the title Share collector reserves 8357471: GenShen: Share collector reserves between young and old May 21, 2025
@openjdk openjdk bot added the rfr Pull request is ready for review label May 21, 2025
@mlbridge

mlbridge bot commented May 21, 2025

Webrevs

young_gen->increase_capacity(bytes_to_transfer);
old_gen->decrease_capacity(bytes_to_transfer);
const size_t new_size = young_gen->max_capacity();
log_info(gc)("Forcing transfer of %zu region(s) from %s to %s, yielding increased size: " PROPERFMT,
Contributor

Can this be gc, ergo like the method to transfer regions to old?

Contributor Author

Agreed. Making this change.

@@ -193,6 +193,19 @@ void ShenandoahGenerationSizer::force_transfer_to_old(size_t regions) const {
regions, young_gen->name(), old_gen->name(), PROPERFMTARGS(new_size));
Contributor

If this is now really only used for in-place promotions, can we change the log message to indicate the region is being promoted? I think when users see messages about things being "forced" in the log, they start to wonder if young/old sizes need to adjusted.

Contributor Author

It actually does have one other use. I'm refactoring to distinguish the different purposes. Replacing ShenandoahGenerationSizer log messages with log_develop_debug messages.

@@ -295,6 +295,7 @@ oop ShenandoahGenerationalHeap::try_evacuate_object(oop p, Thread* thread, Shena

if (copy == nullptr) {
// If we failed to allocate in LAB, we'll try a shared allocation.
#ifdef KELVIN_ORIGINAL
Contributor

This looks like debugging code? Should we back this out?

Contributor Author

Sorry I forgot to clean this up. Refined the comment and removed the #ifdef controls

@@ -1666,7 +1666,12 @@ void ShenandoahHeap::on_cycle_start(GCCause::Cause cause, ShenandoahGeneration*
set_gc_cause(cause);
set_gc_generation(generation);

generation->heuristics()->record_cycle_start();
if (mode()->is_generational()) {
// young-gen heuristics track young, bootstrap, and global GC cycle times
Contributor

This seems like an unrelated change. Mixing in bootstrap and global gc cycle times is likely to increase the average time and make the heuristics more aggressive.

Contributor Author

Good catch. I will remove this code from this PR. (It will come back in a subsequent PR, to support adaptive old evac ratios. There, we consolidate the accounting for all GC in the "young heuristics", but we parameterize each phase so as not to bias triggering. Marking phase time is parameterized by bytes marked, for example.)
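Purely as an illustration of that parameterization (none of these names are from the PR), the heuristic could track a per-byte marking rate rather than raw cycle times, so that bootstrap and global cycles, which mark very different amounts of memory, do not bias the trigger:

#include <cstddef>

struct MarkSample {
  double seconds;       // observed concurrent-mark time for one cycle
  size_t bytes_marked;  // live bytes marked in that cycle
};

// Predict the next mark duration from the historical seconds-per-byte rate.
double predict_mark_seconds(const MarkSample* history, int n, size_t expected_bytes) {
  double total_seconds = 0.0;
  size_t total_bytes = 0;
  for (int i = 0; i < n; i++) {
    total_seconds += history[i].seconds;
    total_bytes   += history[i].bytes_marked;
  }
  if (total_bytes == 0) return 0.0;
  return (total_seconds / (double) total_bytes) * (double) expected_bytes;
}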

@@ -276,6 +276,10 @@ class ShenandoahHeapRegion {
// Return adjusted max heap size
static size_t setup_sizes(size_t max_heap_size);

inline bool is_recycling() {
Contributor

Is this used anywhere?

Contributor Author

Good catch. Removing this.

@@ -374,26 +374,42 @@
"runs out of memory too early.") \
\
product(uintx, ShenandoahOldEvacRatioPercent, 75, EXPERIMENTAL, \
Contributor

Phew, this is a lot of explanatory text and it reads like the target audience are GC developers. If we are going to expose this as a user configurable option, I think the help text should just explain how the behavior changes as this values goes up or down. Something like:

Increasing this allows for more old regions in mixed collections. Decreasing this reduces the number of old regions in mixed collections.

The first sentence makes it seem as though this is the percentage of the entire heap to reserve for old evacuations, but the next clarifies that this is the percentage of the collection set.

Question about this sentence:

A value of 100 allows a mixed evacuation to focus entirely on old-gen memory, allowing no young-gen regions to be collected.

With a setting of 100, would GenShen still "preselect" young regions of tenuring age with sufficient garbage into the collection set?

I also find the name of the option slightly confusing - is it a ratio? or a percentage? Seems like it's really a percentage (though it controls the ratio of reserves used for the collection).

Contributor Author

Thank you for the honest feedback. Way too many words. I've pared the description back and tried to clarify the confusion you describe. I am also renaming the option to ShenandoahOldEvacPercent. Please let me know what you think.
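For illustration only, under the reading discussed above (the value bounds how much of the combined collector reserve may be spent evacuating old regions), the arithmetic is roughly as follows; this is a hypothetical sketch, not the option's authoritative definition:

#include <cstddef>

// Hypothetical reading of ShenandoahOldEvacPercent: cap the share of the
// combined collector reserve that may be dedicated to old-region evacuation.
size_t max_old_evac_bytes(size_t combined_reserve_bytes, unsigned old_evac_percent) {
  return combined_reserve_bytes * old_evac_percent / 100;
}
// Example: with a 400 MB combined reserve and the default of 75, up to 300 MB
// could go to old (mixed) evacuation, leaving at least 100 MB for young.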

void compute_old_generation_balance(size_t old_xfer_limit, size_t old_cset_regions);

// Transfers surplus old regions to young, or takes regions from young to satisfy old region deficit
TransferResult balance_generations();
Contributor

Are we still using this TransferResult thing? Seems like we might be able to delete it with this change.

Contributor Author

You're right. Deleting. Thanks.

"full GC events. Faiure to defragment old-gen memory can also " \
"result in unconstrained expansion of old-gen, and shrinkage of " \
"young gen, causing inefficient high frequency of young-gen GC.") \
product(uintx, ShenandoahOldEvacPercent, 75, EXPERIMENTAL, \
Contributor

This is much easier to read. Thank you for making the change!

heap->young_generation()->set_evacuation_reserve(young_evac_reserve - regions_transferred_to_old * region_size_bytes);
heap->old_generation()->set_evacuation_reserve(old_evac_reserve + regions_transferred_to_old * region_size_bytes);
}
heap->young_generation()->set_evacuation_reserve((size_t) (young_evac_bytes * ShenandoahEvacWaste));
Member
@ysramakrishna ysramakrishna May 31, 2025

So we are using the amount to be evacuated out of young (suitably marked up to account for waste) from the collection set of a specific cycle to predict the same for the next cycle? And similarly for the promotion bytes.

This seems reasonable, but how does that compare with using the live data identified in the most recent marking cycle instead? I can imagine that the former is more accurate under steady-state assumptions and the latter is an overestimate to the extent that not all live data will be evacuated, because some of it sits in mostly live, i.e. densely live, regions. However, it would be interesting to see how they compare and which tracks reality better. Since this is in the nature of a prediction/estimate, one can consider a control algorithm that tries to move the estimate closer to reality by minimizing some historical deviation between marked vs. evacuated bytes.

This need not be done here, but can be considered a future enhancement/experiment.
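One possible shape of such a control loop, entirely hypothetical (the names and the damping constant are assumptions, not anything in this PR):

#include <cstddef>

// Adjust the next reserve estimate by the damped historical ratio of bytes
// actually evacuated to bytes predicted, so a single noisy cycle does not
// swing the estimate.
struct ReserveEstimator {
  double correction = 1.0;              // running evacuated/predicted ratio
  static constexpr double alpha = 0.3;  // damping factor (assumed)

  void record_cycle(size_t predicted_bytes, size_t evacuated_bytes) {
    if (predicted_bytes == 0) return;
    double observed = (double) evacuated_bytes / (double) predicted_bytes;
    correction = (1.0 - alpha) * correction + alpha * observed;
  }

  size_t next_reserve(size_t predicted_bytes, double evac_waste) const {
    return (size_t) ((double) predicted_bytes * correction * evac_waste);
  }
};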

}
heap->young_generation()->set_evacuation_reserve((size_t) (young_evac_bytes * ShenandoahEvacWaste));
heap->old_generation()->set_evacuation_reserve((size_t) (old_evac_bytes * ShenandoahOldEvacWaste));
heap->old_generation()->set_promoted_reserve((size_t) (promo_bytes * ShenandoahPromoEvacWaste));
Member

Note that the census from the most recent mark provides both these bits of information, but it doesn't account for other criteria (i.e., density of liveness) that go into excluding certain regions (and their objects) from the collection set. Therein lies the rub, but armed with historical numbers for each, and for reality, one might be able to predict this well (maybe).

@openjdk openjdk bot removed the rfr Pull request is ready for review label Jun 9, 2025
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 12, 2025
@bridgekeeper

bridgekeeper bot commented Jun 28, 2025

@kdnilsen This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
