Skip to content

8358880: Performance of parsing with DecimalFormat can be improved #25644

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

j3graham
Copy link
Contributor

@j3graham j3graham commented Jun 4, 2025

This PR replaces construction of intermediate strings to be parsed with more direct manipulation of numbers. It also has a more streamlined mechanism of handling Long.MIN_VALUE when parsing longs by using Long.parseUnsignedLong

As a small side-effect it also eliminates the use of a cached StringBuilder in DigitList.

Testing:

  • GHA
  • Local run of tier 2 and jtreg:jdk/java/text
  • New benchmark: DecimalFormatParseBench

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 2 Reviewers)

Issue

  • JDK-8358880: Performance of parsing with DecimalFormat can be improved (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25644/head:pull/25644
$ git checkout pull/25644

Update a local copy of the PR:
$ git checkout pull/25644
$ git pull https://git.openjdk.org/jdk.git pull/25644/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25644

View PR using the GUI difftool:
$ git pr show -t 25644

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25644.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 4, 2025

👋 Welcome back j3graham! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jun 4, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Jun 4, 2025

@j3graham The following labels will be automatically applied to this pull request:

  • core-libs
  • i18n

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added core-libs core-libs-dev@openjdk.org i18n i18n-dev@openjdk.org labels Jun 4, 2025
@j3graham
Copy link
Contributor Author

j3graham commented Jun 4, 2025

Rough performance results on AArch64 M4:

Original
Benchmark                                   Mode  Cnt      Score     Error   Units
-DecimalFormatParseBench.testParseDoubles  thrpt   15  15200.984 ± 409.547  ops/ms
-DecimalFormatParseBench.testParseLongs    thrpt   15  25777.899 ± 559.096  ops/ms

This PR
Benchmark                                   Mode  Cnt      Score     Error   Units
+DecimalFormatParseBench.testParseDoubles  thrpt   15  28041.325 ± 472.657  ops/ms
+DecimalFormatParseBench.testParseLongs    thrpt   15  34181.146 ± 655.719  ops/ms

temp.append("0".repeat(Math.max(0, decimalAt - count)));
return Long.parseLong(temp.toString());
long pow10 = Math.powExact(10L, Math.max(0, decimalAt - count));
return Math.multiplyExact(v, pow10);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two methods throw ArithmeticException. This needs to be rethrown as NumberFormatException.

Copy link
Contributor Author

@j3graham j3graham Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a little odd. The parse methods that call getLong are not supposed to throw NumberFormatException either. So wherever getLong is called, it must be preceded by a check to fitsIntoLong, which should avoid any exceptions here. That said, rethrowing as NFE would avoid new surprises. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will leave this question to I18N reviewers, who are ultimately in charge of DigitList.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing implementation does not throw NumberFormatException/ArithmeticException, but ParseException if parsing is failing. I would expect the same here.

Copy link
Contributor Author

@j3graham j3graham Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not seeing where the original could throw ParseException.

Copy link
Member

@naotoj naotoj Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I was unclear. I mean the parse() in the NumberFormat do not throw NumberFormatException/ArithmeticException, but ParseException, so if this piece of code need to throw something, it should be ParseException

Copy link
Contributor Author

@j3graham j3graham Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parse() methods where this code gets used in DecimalFormat unfortunately don't throw ParseException. The current calls to getLong always are guarded with a call to fitsIntoLong, which should avoid any exceptions actually being thrown here. So there is no parse failure as such - instead it tries to parse it as a double or a BigDecimal. If getLong were to be called without the guard, the exception would have come from Long.parseLong, which throws a NumberFormatException.

I've added a commit to follow @liach's suggestion to at least handle the ArithmeticException so as to not introduce new exceptions into the mix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sounds reasonable.

@j3graham j3graham changed the title Improve performance of parsing with DecimalFormat 8358880: Performance of parsing with DecimalFormat can be improved Jun 9, 2025
@ALUMINIS650
Copy link

ALUMINIS650 commented Jun 9, 2025

Hi @ALUMINIS650, thanks for making a comment in an OpenJDK project!

All comments and discussions in the OpenJDK Community must be made available under the OpenJDK Terms of Use. If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please Use "Add GitHub user ALUMINIS650" for the summary.

If you are not an OpenJDK Author, Committer or Reviewer, simply check the box below to accept the OpenJDK Terms of Use for your comments.

Your comment will be automatically restored once you have accepted the OpenJDK Terms of Use.

@j3graham j3graham marked this pull request as ready for review June 9, 2025 17:10
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 9, 2025
@mlbridge
Copy link

mlbridge bot commented Jun 9, 2025

Webrevs

Copy link
Member

@justin-curtis-lu justin-curtis-lu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvements. I think we need to prioritize behavioral compatibility with this change, so we will want to run the JCK tests as well for the extra safety.

Copy link
Member

@justin-curtis-lu justin-curtis-lu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On our CI, I did a java_text JCK run as well as tiers 1-3 on all platforms which both came back good. I just have a final comment.

@@ -1824,6 +1837,17 @@ private static BinaryToASCIIConverter getBinaryToASCIIConverter(float f) {
return buf;
}

static ASCIIToBinaryConverter readDoubleSignlessDigits(int decExp, char[] digits, int length) {
if (decExp < MIN_DECIMAL_EXPONENT) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check needed? I think ASCIIToBinaryConverter will return the proper zero value when doubleValue() is invoked.

  if (decExponent < MIN_DECIMAL_EXPONENT - 1) {
      return (isNegative) ? -0.0 : 0.0;

And if this explicit check is a shortcut, I don't think we would need one for an edge case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately some check is required (a test fails), but I now realize what I had was wrong. The issue is that on line 1084 (https://github.com/openjdk/jdk/pull/25644/files#diff-79e6fd549b5ec5e7f49658581beddcb07fcbb0c09ae8e1117c385b66514da6d2R1084)) exp can overflow and become positive again. I've updated the check to avoid the overflow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it, I see your point. We would have goten underflow in ASCIIToBinaryConverter.doubleValue() for some extreme cases without a check.

Is there a specific example you have that requires the switch to the newer check? Adding a comment along those lines might be helpful. Actually, I thought DigitList caps decimalAt to Integer.MIN/MAX, so then the first check you had would have been fine. (Maybe I am missing something?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a specific example, so I've reverted to my original check. I'm a bit unsettled by the check for an extreme value later in doubleValue() comparing against MIN_DECIMAL_EXPONENT - 1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the original check you had is easier to understand what is happening without further context, so I prefer your switch back.

I think we are fine from (negative) "extreme values" in doubleValue() because of the check you have implemented in the first place. i.e. we avoid any potential underflow from int exp = decExponent - kDigits;. I think we do need a comment to accompany the check. (Why do we check? why not check the max exponent value?)

Also, should the check be against MIN_DECIMAL_EXPONENT - 1 for consistency with doubleValue()? (Functionally, I don't think it matters.)

@jddarcy
Copy link
Member

jddarcy commented Jun 17, 2025

/reviewers 2 reviewer

@openjdk
Copy link

openjdk bot commented Jun 17, 2025

@jddarcy
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 2 Reviewers).

@bridgekeeper
Copy link

bridgekeeper bot commented Jul 15, 2025

@j3graham This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

Copy link
Member

@liach liach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable from a code cleanup point of view.

Copy link
Member

@justin-curtis-lu justin-curtis-lu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current form looks good to me. The long parsing did not change much on my machine performance wise, but I think it is a good simplification to include.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org i18n i18n-dev@openjdk.org rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

6 participants