fixes arm

lemire · lemire · commit e2e9f7423cc3 · 2024-06-06T10:37:03.000-04:00
diff --git a/README.md b/README.md
@@ -6,15 +6,31 @@ This is a fast C# library to validate UTF-8 strings.
 
 ## Motivation
 
-We seek to speed up the `Utf8Utility.GetPointerToFirstInvalidByte` function. Using the algorithm used by Node.js, Oracle GraalVM  and other important systems.
-
-- John Keiser, Daniel Lemire, [Validating UTF-8 In Less Than One Instruction Per Byte](https://arxiv.org/abs/2010.03090), Software: Practice and Experience 51 (5), 2021
+We seek to speed up the `Utf8Utility.GetPointerToFirstInvalidByte` function from the C# runtime library.
+[The function is private in the Microsoft Runtime](https://github.com/dotnet/runtime/blob/4d709cd12269fcbb3d0fccfb2515541944475954/src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf8Utility.Validation.cs), but we can expose it manually.
 
-The algorithm in question is part of popular JavaScript runtimes such as Node.js and Bun, [by PHP](https://github.com/php/php-src/blob/90e0ce7f0db99767c58dc21e4213c0f8763f657a/ext/mbstring/mbstring.c#L5270), by  Oracle GraalVM and many important systems. 
+Specifically, we provide the function `SimdUnicode.UTF8.GetPointerToFirstInvalidByte` which is a faster
+drop-in replacement:
+```cs
+// Returns &inputBuffer[inputLength] if the input buffer is valid.
+/// <summary>
+/// Given an input buffer <paramref name="pInputBuffer"/> of byte length <paramref name="inputLength"/>,
+/// returns a pointer to where the first invalid data appears in <paramref name="pInputBuffer"/>.
+/// The parameter <paramref name="Utf16CodeUnitCountAdjustment"/> is set according to the content of the valid UTF-8 characters encountered, counting -1 for each 2-byte character, -2 for each 3-byte character, and -3 for each 4-byte character.
+/// The parameter <paramref name="ScalarCodeUnitCountAdjustment"/> is set according to the content of the valid UTF-8 characters encountered, counting -1 for each 4-byte character.
+/// </summary>
+/// <remarks>
+/// Returns a pointer to the end of <paramref name="pInputBuffer"/> if the buffer is well-formed.
+/// </remarks>
+public unsafe static byte* GetPointerToFirstInvalidByte(byte* pInputBuffer, int inputLength, out int Utf16CodeUnitCountAdjustment, out int ScalarCodeUnitCountAdjustment);
+```
 
-[The function is private in the Microsoft Runtime](https://github.com/dotnet/runtime/blob/4d709cd12269fcbb3d0fccfb2515541944475954/src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf8Utility.Validation.cs), but we can expose it manually.
+The function uses advanced instructions (SIMD) on 64-bit ARM and x64 processors, but falls back on a
+conventional implementation on other systems. We provide extensive tests and benchmarks.
 
+We apply the algorithm used by Node.js, Bun, Oracle GraalVM, the PHP interpreter, and other important systems. The algorithm is described in the following article:
 
+- John Keiser, Daniel Lemire, [Validating UTF-8 In Less Than One Instruction Per Byte](https://arxiv.org/abs/2010.03090), Software: Practice and Experience 51 (5), 2021
 
 
 ## Requirements
diff --git a/src/UTF8.cs b/src/UTF8.cs
@@ -1037,13 +1037,13 @@ private unsafe static (int utfadjust, int scalaradjust) calculateErrorPathadjust
                             prevIncomplete = AdvSimd.SubtractSaturate(currentBlock, maxValue);
                             Vector128<sbyte> largestcont = Vector128.Create((sbyte)-65); // -65 => 0b10111111
                             contbytes += -AdvSimd.Arm64.AddAcross(AdvSimd.CompareLessThanOrEqual(Vector128.AsSByte(currentBlock), largestcont)).ToScalar();
-                            Vector128<byte> fourthByteMinusOne = Vector128.Create((byte)(0b11110000u - 1));
 
                             // computing n4 is more expensive than we would like:
-                            var largerthan0f = AdvSimd.CompareGreaterThan(currentBlock, fourthByteMinusOne);
-                            var largerthan0fones = AdvSimd.And(largerthan0f, Vector128.Create((byte)1));
-                            var largerthan0fonescount = AdvSimd.Arm64.AddAcross(largerthan0fones).ToScalar();
-                            n4 += largerthan0fonescount;
+                            Vector128<byte> fourthByteMinusOne = Vector128.Create((byte)(0b11110000u - 1));
+                            Vector128<byte> largerthan0f = AdvSimd.CompareGreaterThan(currentBlock, fourthByteMinusOne);
+                            byte n4add = (byte)AdvSimd.Arm64.AddAcross(largerthan0f).ToScalar();
+                            int negn4add = (int)(byte)-n4add;
+                            n4 += negn4add;
                         }
                         asciibytes -= (sbyte)AdvSimd.Arm64.AddAcross(AdvSimd.CompareLessThan(currentBlock, v80)).ToScalar();
                     }
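
Note on the `n4` change above: the new code drops the `And` with a vector of ones and instead sums the raw comparison mask, then negates the wrapped byte. The following is a minimal scalar sketch of why that gives the same count, assuming a true lane is `0xFF` and the byte-wide horizontal add wraps modulo 256. The class `MaskCountSketch` and its helper methods are hypothetical stand-ins written for illustration; they are not part of the library and do not call the `AdvSimd` intrinsics.

```cs
using System;

public static class MaskCountSketch
{
    // Scalar model (assumption) of a lane-wise unsigned compare:
    // a lane becomes 0xFF when block[i] > threshold, 0x00 otherwise.
    public static byte[] CompareGreaterThan(byte[] block, byte threshold)
    {
        var mask = new byte[block.Length];
        for (int i = 0; i < block.Length; i++)
        {
            mask[i] = block[i] > threshold ? (byte)0xFF : (byte)0x00;
        }
        return mask;
    }

    // Scalar model (assumption) of a byte-wide horizontal add:
    // the result is truncated to 8 bits, so it wraps modulo 256.
    public static byte AddAcross(byte[] lanes)
    {
        int sum = 0;
        foreach (byte b in lanes)
        {
            sum += b;
        }
        return unchecked((byte)sum);
    }

    // Counts bytes >= 0xF0 (leading bytes of 4-byte sequences) in a block
    // of at most 16 bytes. With k matching lanes the wrapped sum is
    // (k * 255) mod 256 = (256 - k) mod 256, so negating it as a byte yields k.
    public static int CountFourByteLeads(byte[] block)
    {
        if (block.Length > 16)
        {
            throw new ArgumentException("block must hold at most 16 bytes");
        }
        byte[] mask = CompareGreaterThan(block, 0b11110000 - 1);
        byte wrapped = AddAcross(mask);
        return unchecked((byte)-wrapped);
    }
}
```

The identity only holds while the count fits in a byte, which is the case for a 16-lane block (at most 16 matches). For a block containing two 4-byte characters, `MaskCountSketch.CountFourByteLeads` returns 2, matching what the removed mask-with-ones version would have accumulated.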