8360520: RISC-V: C1: Fix primitive array clone intrinsic regression after JDK-8333154 #25976

Open: wants to merge 5 commits into base: master

@feilongjiang feilongjiang commented Jun 25, 2025

Hi, please consider this change.
JDK-8333154 implemented the C1 clone intrinsic for RISC-V, which reuses the arraycopy code for primitive arrays.
The instruction flag OmitChecksFlag (introduced by JDK-8302850) is used to avoid instantiating array copy stubs for primitive array clones.
If OmitChecksFlag is set, all flags (including the unaligned flag) are cleared before the LIR_OpArrayCopy node is generated.
With -XX:+UseCompactObjectHeaders (COH) enabled, the unaligned flag is set for arraycopy, so clearing it may lead to the wrong arraycopy function being selected.
We observed a performance regression on a P550 SBC in the corresponding JMH tests when COH is enabled.

This PR keeps the unaligned flag on RISC-V so that the arraycopy function is selected correctly.
Other platforms are not affected, as the flag is always 0 when OmitChecksFlag is true.
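
To make the flag handling described above concrete, here is a minimal sketch in Java. It is not the HotSpot code: the class name, the constant names, and the bit values are all assumptions chosen for illustration. It only contrasts "clear every flag when checks are omitted" with "clear the check flags but keep the unaligned bit".

```java
// Illustrative sketch only. The names and bit values below are assumptions,
// not the definitions used by HotSpot's LIR_OpArrayCopy.
public class ArrayCopyFlagsSketch {
    // Hypothetical flag bits standing in for the arraycopy flags.
    static final int SRC_NULL_CHECK = 1 << 0;
    static final int DST_NULL_CHECK = 1 << 1;
    static final int TYPE_CHECK     = 1 << 2;
    static final int UNALIGNED      = 1 << 3;

    // Behaviour described as causing the regression:
    // every flag, including the unaligned bit, is dropped when checks are omitted.
    static int clearAll(int flags, boolean omitChecks) {
        return omitChecks ? 0 : flags;
    }

    // Behaviour described as the fix:
    // the check flags are dropped, but the unaligned bit survives.
    static int keepUnaligned(int flags, boolean omitChecks) {
        return omitChecks ? (flags & UNALIGNED) : flags;
    }

    public static void main(String[] args) {
        int flags = SRC_NULL_CHECK | TYPE_CHECK | UNALIGNED;
        // With all flags cleared, a later stage can no longer tell the copy is unaligned.
        System.out.println("clearAll keeps unaligned:      "
                + ((clearAll(flags, true) & UNALIGNED) != 0));
        System.out.println("keepUnaligned keeps unaligned: "
                + ((keepUnaligned(flags, true) & UNALIGNED) != 0));
    }
}
```

In this sketch, whichever code later selects an aligned or unaligned copy routine from the flags would only see the unaligned bit in the second variant.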

Tested on linux-riscv64:

  • Tier1-3

JMH data on P550 SBC for reference (w/o and w/ the patch):

Before:

Without COH:

Benchmark                 (size)  Mode  Cnt     Score   Error  Units
ArrayClone.byteArraycopy       0  avgt   15    50.854 ± 0.379  ns/op
ArrayClone.byteArraycopy      10  avgt   15    74.294 ± 0.449  ns/op
ArrayClone.byteArraycopy     100  avgt   15    81.847 ± 0.082  ns/op
ArrayClone.byteArraycopy    1000  avgt   15   480.106 ± 0.369  ns/op
ArrayClone.byteClone           0  avgt   15    90.146 ± 0.299  ns/op
ArrayClone.byteClone          10  avgt   15   130.525 ± 0.384  ns/op
ArrayClone.byteClone         100  avgt   15   251.942 ± 0.122  ns/op
ArrayClone.byteClone        1000  avgt   15   407.580 ± 0.318  ns/op
ArrayClone.intArraycopy        0  avgt   15    49.984 ± 0.436  ns/op
ArrayClone.intArraycopy       10  avgt   15    76.302 ± 1.388  ns/op
ArrayClone.intArraycopy      100  avgt   15   267.487 ± 0.329  ns/op
ArrayClone.intArraycopy     1000  avgt   15  1157.444 ± 1.588  ns/op
ArrayClone.intClone            0  avgt   15    90.130 ± 0.257  ns/op
ArrayClone.intClone           10  avgt   15   183.619 ± 0.588  ns/op
ArrayClone.intClone          100  avgt   15   296.491 ± 0.246  ns/op
ArrayClone.intClone         1000  avgt   15   828.695 ± 1.501  ns/op

-------------------------------------------------------------------------
With COH:

Benchmark                 (size)  Mode  Cnt       Score      Error  Units
ArrayClone.byteArraycopy       0  avgt   15      50.667 ±    0.622  ns/op
ArrayClone.byteArraycopy      10  avgt   15      76.917 ±    0.914  ns/op
ArrayClone.byteArraycopy     100  avgt   15      82.928 ±    0.056  ns/op
ArrayClone.byteArraycopy    1000  avgt   15     485.806 ±    0.653  ns/op
ArrayClone.byteClone           0  avgt   15      90.417 ±    1.059  ns/op
ArrayClone.byteClone          10  avgt   15    1634.691 ±    9.870  ns/op
ArrayClone.byteClone         100  avgt   15   18637.149 ±   30.985  ns/op
ArrayClone.byteClone        1000  avgt   15  193437.253 ±  435.771  ns/op
ArrayClone.intArraycopy        0  avgt   15      50.475 ±    0.545  ns/op
ArrayClone.intArraycopy       10  avgt   15      77.515 ±    0.958  ns/op
ArrayClone.intArraycopy      100  avgt   15     264.586 ±    0.237  ns/op
ArrayClone.intArraycopy     1000  avgt   15    1160.459 ±    1.394  ns/op
ArrayClone.intClone            0  avgt   15      90.776 ±    0.309  ns/op
ArrayClone.intClone           10  avgt   15    7794.589 ±   13.752  ns/op
ArrayClone.intClone          100  avgt   15   77303.097 ±  154.991  ns/op
ArrayClone.intClone         1000  avgt   15  773291.729 ± 1505.788  ns/op

After:

Without COH:

Benchmark                 (size)  Mode  Cnt     Score   Error  Units
ArrayClone.byteArraycopy       0  avgt   15    49.421 ± 0.588  ns/op
ArrayClone.byteArraycopy      10  avgt   15    71.687 ± 0.828  ns/op
ArrayClone.byteArraycopy     100  avgt   15    82.570 ± 0.068  ns/op
ArrayClone.byteArraycopy    1000  avgt   15   478.411 ± 0.505  ns/op
ArrayClone.byteClone           0  avgt   15    90.660 ± 0.314  ns/op
ArrayClone.byteClone          10  avgt   15   131.243 ± 0.407  ns/op
ArrayClone.byteClone         100  avgt   15   251.823 ± 0.192  ns/op
ArrayClone.byteClone        1000  avgt   15   404.857 ± 1.985  ns/op
ArrayClone.intArraycopy        0  avgt   15    49.672 ± 0.466  ns/op
ArrayClone.intArraycopy       10  avgt   15    78.996 ± 1.522  ns/op
ArrayClone.intArraycopy      100  avgt   15   263.690 ± 0.175  ns/op
ArrayClone.intArraycopy     1000  avgt   15  1155.155 ± 2.549  ns/op
ArrayClone.intClone            0  avgt   15    90.495 ± 0.296  ns/op
ArrayClone.intClone           10  avgt   15   184.500 ± 0.554  ns/op
ArrayClone.intClone          100  avgt   15   294.608 ± 0.139  ns/op
ArrayClone.intClone         1000  avgt   15   817.005 ± 0.551  ns/op

-------------------------------------------------------------------------
With COH:

Benchmark                 (size)  Mode  Cnt     Score   Error  Units
ArrayClone.byteArraycopy       0  avgt   15    51.322 ± 0.519  ns/op
ArrayClone.byteArraycopy      10  avgt   15    76.479 ± 0.679  ns/op
ArrayClone.byteArraycopy     100  avgt   15    82.936 ± 0.060  ns/op
ArrayClone.byteArraycopy    1000  avgt   15   487.030 ± 0.464  ns/op
ArrayClone.byteClone           0  avgt   15    89.688 ± 0.276  ns/op
ArrayClone.byteClone          10  avgt   15   109.446 ± 0.379  ns/op
ArrayClone.byteClone         100  avgt   15   221.747 ± 0.176  ns/op
ArrayClone.byteClone        1000  avgt   15   430.846 ± 0.370  ns/op
ArrayClone.intArraycopy        0  avgt   15    50.534 ± 0.524  ns/op
ArrayClone.intArraycopy       10  avgt   15    78.986 ± 1.341  ns/op
ArrayClone.intArraycopy      100  avgt   15   263.473 ± 0.168  ns/op
ArrayClone.intArraycopy     1000  avgt   15  1155.394 ± 1.396  ns/op
ArrayClone.intClone            0  avgt   15    89.698 ± 0.217  ns/op
ArrayClone.intClone           10  avgt   15   185.278 ± 0.673  ns/op
ArrayClone.intClone          100  avgt   15   375.374 ± 0.200  ns/op
ArrayClone.intClone         1000  avgt   15   872.398 ± 1.780  ns/op
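
As a rough way to read the "With COH" tables, the sketch below divides a pre-patch score by the matching post-patch score for the size 1000 clone benchmarks. The class and method names are hypothetical; the numbers are copied from the tables above. For these rows the ratio comes out at several hundred times.

```java
// Illustrative arithmetic on the reported JMH scores; names are hypothetical.
public class CloneRegressionRatio {
    // Ratio of the pre-patch score to the post-patch score (both in ns/op).
    static double slowdown(double beforeNsPerOp, double afterNsPerOp) {
        return beforeNsPerOp / afterNsPerOp;
    }

    public static void main(String[] args) {
        // Scores taken from the "With COH" tables, size = 1000.
        System.out.printf("byteClone: %.1fx%n", slowdown(193437.253, 430.846));
        System.out.printf("intClone:  %.1fx%n", slowdown(773291.729, 872.398));
    }
}
```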

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8360520: RISC-V: C1: Fix primitive array clone intrinsic regression after JDK-8333154 (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25976/head:pull/25976
$ git checkout pull/25976

Update a local copy of the PR:
$ git checkout pull/25976
$ git pull https://git.openjdk.org/jdk.git pull/25976/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25976

View PR using the GUI difftool:
$ git pr show -t 25976

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25976.diff

Using Webrev

Link to Webrev Comment

bridgekeeper bot commented Jun 25, 2025

👋 Welcome back fjiang! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Jun 25, 2025

@feilongjiang This change is no longer ready for integration - check the PR body for details.

openjdk bot added the rfr label (Pull request is ready for review) Jun 25, 2025

openjdk bot commented Jun 25, 2025

@feilongjiang The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk bot added the hotspot-compiler label (hotspot-compiler-dev@openjdk.org) Jun 25, 2025

mlbridge bot commented Jun 25, 2025

Webrevs

@RealFYang RealFYang left a comment

Seems fine to me. You need another reviewer.

openjdk bot added the ready label (Pull request is ready to be integrated) Jun 27, 2025

@feilongjiang (Member, Author) commented

Since this changes C1 shared code, could I have another reviewer, please? Perhaps the original authors of this work, @galderz and @rwestrel, could take a look?

galderz commented Jun 27, 2025

I can't really review it since I'm not familiar with RISC-V, the flag, or the COH logic.

openjdk bot removed the ready label (Pull request is ready to be integrated) Jul 6, 2025
Labels
hotspot-compiler (hotspot-compiler-dev@openjdk.org), rfr (Pull request is ready for review)
3 participants