README.md (27 additions, 5 deletions)
@@ -7,7 +7,9 @@ This is a fast C# library to validate UTF-8 strings.
 ## Motivation
 
 We seek to speed up the `Utf8Utility.GetPointerToFirstInvalidByte` function from the C# runtime library.
-[The function is private in the Microsoft Runtime](https://github.com/dotnet/runtime/blob/4d709cd12269fcbb3d0fccfb2515541944475954/src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf8Utility.Validation.cs), but we can expose it manually.
+[The function is private in the Microsoft Runtime](https://github.com/dotnet/runtime/blob/4d709cd12269fcbb3d0fccfb2515541944475954/src/libraries/System.Private.CoreLib/src/System/Text/Unicode/Utf8Utility.Validation.cs), but we can expose it manually. The C# runtime
+function is well optimized and it makes use of advanced CPU instructions. Nevertheless, we propose
+an alternative that can be several times faster.
 
 Specifically, we provide the function `SimdUnicode.UTF8.GetPointerToFirstInvalidByte` which is a faster
 drop-in replacement:
@@ -35,7 +37,7 @@ We apply the algorithm used by Node.js, Bun, Oracle GraalVM, by the PHP interpre
 
 ## Requirements
 
-We recommend you install .NET 8: https://dotnet.microsoft.com/en-us/download/dotnet/8.0
+We recommend you install .NET 8 or better: https://dotnet.microsoft.com/en-us/download/dotnet/8.0
 
 
 ## Running tests
@@ -74,8 +76,6 @@ Or to target specific categories:
 dotnet test --filter "Category=scalar"
 ```
 
-
-
 ## Running Benchmarks
 
 To run the benchmarks, run the following command:
@@ -98,6 +98,28 @@ cd benchmark
 sudo dotnet run -c Release
 ```
 
+## Results (x64)
+
+To be completed.
+
+## Results (ARM)
+
+On an Apple M2 system, our validation function is two to three times
+faster than the standard library.
+
+| data set | SimdUnicode speed (GB/s) | .NET speed (GB/s) |
-[Add optimized UTF-8 validation and transcoding apis, hook them up to UTF8Encoding](https://github.com/dotnet/coreclr/pull/21948/files#diff-2a22774bd6bff8e217ecbb3a41afad033ce0ca0f33645e9d8f5bdf7c9e3ac248)
benchmark/UTF8_runtime.cs (2 additions, 16 deletions)
@@ -9,22 +9,8 @@
 using System.Runtime.Intrinsics.Arm;
 using System.Runtime.Intrinsics.X86;
 
-// Changes made from the Runtime (most of the stuff in the runtime is behind some private/internal class or some such. The path of least resistance was to copy paste.