fix(firestore): Further improved performance of UTF-8 string comparison logic #7098

Open
wants to merge 35 commits into
base: main
Changes from 34 commits
5f3ad6f Utf8Performance{Unit/Integration}Test.kt added (dconeybe, Jun 27, 2025)
3d231d3 new algorithm skeleton added (dconeybe, Jun 27, 2025)
ab55ee3 add code to fdc demo app so it can be built in release mode (dconeybe, Jun 27, 2025)
4b45cd2 fix signing for release (dconeybe, Jun 27, 2025)
ba905d4 add Util.java from Firestore (dconeybe, Jun 30, 2025)
cd29244 Utf8PerformanceIntegrationTest.kt added (dconeybe, Jun 30, 2025)
bc51956 MainViewModel.kt: call the performance test (dconeybe, Jun 30, 2025)
57a0bab run the test (dconeybe, Jun 30, 2025)
a7a99ef improve tests (dconeybe, Jun 30, 2025)
733a1fb Long -> Duration (dconeybe, Jun 30, 2025)
28f0766 remove firebase dependencies (dconeybe, Jun 30, 2025)
8127e5a enable r8 (dconeybe, Jun 30, 2025)
e3f682d start test button and Trace statements (dconeybe, Jun 30, 2025)
e8b455c tweaks (dconeybe, Jun 30, 2025)
e0d90ca Util1.java Util2.java and Util3.java added (dconeybe, Jul 1, 2025)
b92f6ed fix build (dconeybe, Jul 1, 2025)
8cc3206 Util4 added, but doesn't support SMP yet. (dconeybe, Jul 2, 2025)
958954d Util4: Got SMP done, just need to handle invalid surrogate pairs (dconeybe, Jul 2, 2025)
28054c0 work (dconeybe, Jul 3, 2025)
c4293a1 Utf8CompareTest.kt added (dconeybe, Jul 3, 2025)
76365ad fix test (dconeybe, Jul 3, 2025)
a1625fb fixes (dconeybe, Jul 3, 2025)
11308be Utf8PerformanceIntegrationTest.kt: fix (dconeybe, Jul 3, 2025)
f9f57d0 revert all changes, except Utf8Compare.java (dconeybe, Jul 3, 2025)
933336b Util.java: Remove compareUtf8Strings() and switch to the one from Utf… (dconeybe, Jul 3, 2025)
3612d6b CHANGELOG.md entry added (dconeybe, Jul 3, 2025)
29bedd8 Merge branch 'main' into dconeybe/firestore/Utf8StringComparePerforma… (dconeybe, Jul 3, 2025)
0cf7a45 ./gradlew :firebase-firestore:spotlessApply (dconeybe, Jul 3, 2025)
83ffa93 fix off-by-one error in utf8 logic (thanks copilot!) (dconeybe, Jul 3, 2025)
5c10ab1 Merge remote-tracking branch 'origin/main' into Utf8StringComparePerf… (dconeybe, Jul 3, 2025)
02b2f7a reduce diff from main branch, to make it easier for code review (dconeybe, Jul 3, 2025)
7d88250 massively simplify the comparison (dconeybe, Jul 3, 2025)
8e7514b simplify (dconeybe, Jul 3, 2025)
429cb1e Merge branch 'main' into dconeybe/firestore/Utf8StringComparePerforma… (dconeybe, Jul 3, 2025)
02c2371 reword comment (dconeybe, Jul 3, 2025)
3 changes: 3 additions & 0 deletions firebase-firestore/CHANGELOG.md
@@ -1,4 +1,7 @@
 # Unreleased
+* [fixed] Further improved performance of UTF-8 string ordering logic,
+  which had degraded in v25.1.2 and received some improvements in v25.1.3.
+  [#7053](//github.com/firebase/firebase-android-sdk/issues/7053)


 # 25.1.4
@@ -14,6 +14,8 @@

 package com.google.firebase.firestore.util;

+import static java.lang.Character.isSurrogate;
+
 import android.annotation.SuppressLint;
 import android.os.Handler;
 import android.os.Looper;
@@ -87,46 +89,42 @@ public static int compareIntegers(int i1, int i2) {

   /** Compare strings in UTF-8 encoded byte order */
   public static int compareUtf8Strings(String left, String right) {
-    int i = 0;
-    while (i < left.length() && i < right.length()) {
-      int leftCodePoint = left.codePointAt(i);
-      int rightCodePoint = right.codePointAt(i);
-
-      if (leftCodePoint != rightCodePoint) {
-        if (leftCodePoint < 128 && rightCodePoint < 128) {
-          // ASCII comparison
-          return Integer.compare(leftCodePoint, rightCodePoint);
-        } else {
-          // substring and do UTF-8 encoded byte comparison
-          ByteString leftBytes = ByteString.copyFromUtf8(getUtf8SafeBytes(left, i));
-          ByteString rightBytes = ByteString.copyFromUtf8(getUtf8SafeBytes(right, i));
-          int comp = compareByteStrings(leftBytes, rightBytes);
-          if (comp != 0) {
-            return comp;
-          } else {
-            // EXTREMELY RARE CASE: Code points differ, but their UTF-8 byte representations are
-            // identical. This can happen with malformed input (invalid surrogate pairs), where
-            // Java's encoding leads to unexpected byte sequences. Meanwhile, any invalid surrogate
-            // inputs get converted to "?" by protocol buffer while round tripping, so we almost
-            // never receive invalid strings from backend.
-            // Fallback to code point comparison for graceful handling.
-            return Integer.compare(leftCodePoint, rightCodePoint);
-          }
-        }
+    // noinspection StringEquality
+    if (left == right) {
+      return 0;
+    }
+
+    // Find the first differing characters in the strings and, if found, use them to determine the
+    // overall comparison result. This simple and efficient formula serendipitously works because
+    // of the properties of UTF-8 and UTF-16 encodings; that is, if both UTF-16 characters are
+    // surrogates or both are non-surrogates then the relative ordering of those individual
+    // characters is the same as the relative ordering of the lexicographical ordering of the UTF-8
+    // encoding of those characters (or character pairs, in the case of surrogate pairs). Also, if
+    // one is a surrogate and the other is not then it is assumed to be the high surrogate of a
+    // surrogate pair (otherwise it would not constitute a valid surrogate pair) and, therefore,
+    // would necessarily be ordered _after_ the non-surrogate because all surrogate pairs represent
+    // characters with code points above 0xFFFF and such characters produce a 4-byte UTF-8 encoding
+    // whose first byte is 11110xxx, and since the other character is a non-surrogate it represents
+    // a character with a code point less than or equal to 0xFFFF and produces a 1-byte, 2-byte, or
+    // 3-byte UTF-8 encoding whose first (or only) byte is 0xxxxxxx, 110xxxxx, or 1110xxxx,
+    // respectively, which is always less than 11110xxx when interpreted as a 2's-complement
+    // unsigned integer.
+    final int length = Math.min(left.length(), right.length());
+    for (int i = 0; i < length; i++) {
+      final char leftChar = left.charAt(i);
+      final char rightChar = right.charAt(i);
+      if (leftChar != rightChar) {
+        return (isSurrogate(leftChar) == isSurrogate(rightChar))
+            ? Util.compareIntegers(leftChar, rightChar)
+            : isSurrogate(leftChar) ? 1 : -1;
       }
-      // Increment by 2 for surrogate pairs, 1 otherwise.
-      i += Character.charCount(leftCodePoint);
     }

-    // Compare lengths if all characters are equal
+    // Use the lengths of the strings to determine the overall comparison result since either the
+    // strings were equal or one is a prefix of the other.
     return Integer.compare(left.length(), right.length());
   }

-  private static String getUtf8SafeBytes(String str, int index) {
-    int firstCodePoint = str.codePointAt(index);
-    return str.substring(index, index + Character.charCount(firstCodePoint));
-  }
-
   /**
    * Utility function to compare longs. Note that we can't use Long.compare because it's only
    * available after Android 19.
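
The reasoning in the new comment can be checked with a small standalone sketch. It is not part of the PR: the class name `Utf8CompareSketch` and the methods `compareCharsAsUtf8` and `compareUtf8Bytes` are hypothetical, and `Integer.compare` stands in for `Util.compareIntegers` so the file has no Firestore dependencies. The first method mirrors the char-by-char rule from the diff; the second encodes both strings to UTF-8 and compares them as unsigned bytes. The samples contain only well-formed strings, because `String.getBytes` replaces unpaired surrogates with `?` and would make the reference comparison meaningless for malformed input.

```java
import java.nio.charset.StandardCharsets;

final class Utf8CompareSketch {

  // Hypothetical mirror of the char-based rule in the diff: compare the first
  // differing UTF-16 code units, ordering a surrogate after any non-surrogate.
  static int compareCharsAsUtf8(String left, String right) {
    int length = Math.min(left.length(), right.length());
    for (int i = 0; i < length; i++) {
      char l = left.charAt(i);
      char r = right.charAt(i);
      if (l != r) {
        boolean leftIsSurrogate = Character.isSurrogate(l);
        if (leftIsSurrogate == Character.isSurrogate(r)) {
          return Integer.compare(l, r);
        }
        return leftIsSurrogate ? 1 : -1;
      }
    }
    // Equal strings, or one is a prefix of the other.
    return Integer.compare(left.length(), right.length());
  }

  // Reference ordering: encode to UTF-8 and compare byte by byte as unsigned values.
  static int compareUtf8Bytes(String left, String right) {
    byte[] a = left.getBytes(StandardCharsets.UTF_8);
    byte[] b = right.getBytes(StandardCharsets.UTF_8);
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  public static void main(String[] args) {
    // Well-formed samples covering 1-, 2-, 3-, and 4-byte UTF-8 encodings.
    String[] samples = {
      "", "a", "ab", "z", "\u00e9", "\u20ac", "\uffff", "a\uffff",
      "\ud800\udc00", "\ud83d\ude00", "a\ud83d\ude00", "\udbff\udfff"
    };
    for (String left : samples) {
      for (String right : samples) {
        int fast = Integer.signum(compareCharsAsUtf8(left, right));
        int reference = Integer.signum(compareUtf8Bytes(left, right));
        if (fast != reference) {
          throw new IllegalStateException("Orderings disagree for a sample pair");
        }
      }
    }
    System.out.println("Both orderings agree on all sample pairs");
  }
}
```

The pair `"\uffff"` and `"\ud800\udc00"` (U+10000) is the case the surrogate branch exists for: plain `String.compareTo` orders U+FFFF after the high surrogate 0xD800, while the UTF-8 encodings `EF BF BF` and `F0 90 80 80` order it before.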