
SIMD vectorization of Array.sum<int>, etc #18509

Open: wants to merge 13 commits into main

1 change: 1 addition & 0 deletions docs/release-notes/.FSharp.Core/10.0.100.md
@@ -7,5 +7,6 @@
### Changed

* Random functions support for zero element chosen/sampled ([PR #18568](https://github.com/dotnet/fsharp/pull/18568))
* Array.sum and Seq.sum to call System.Linq.Enumerable methods on base-types (float/float32/int/int64) to utilize vectorization. [PR #18509](https://github.com/dotnet/fsharp/pull/18509)

### Breaking Changes
29 changes: 27 additions & 2 deletions src/FSharp.Core/array.fs
@@ -1578,8 +1578,7 @@ module Array =
checkNonNull "array" array
Microsoft.FSharp.Primitives.Basics.Array.permute indexMap array

[<CompiledName("Sum")>]
let inline sum (array: ^T array) : ^T =
let inline private fsharpSumImpl (array: ^T array) : ^T =
checkNonNull "array" array
let mutable acc = LanguagePrimitives.GenericZero< ^T>

@@ -1588,6 +1587,32 @@ module Array =

acc

let isNetFramework = System.Runtime.InteropServices.RuntimeInformation.FrameworkDescription.StartsWith ".NET Framework"
Member

Is this going to functionally compile to a static readonly bool, so that the JIT can optimize things appropriately?

Is this going to do the "wrong" thing on custom runtimes or scenarios where vectorization may not be available or possible? For example, there was no SIMD acceleration on 32-bit Unix for a while and there is none on Arm32 today. Likewise, acceleration can be disabled via environment variables for testing purposes.

In general it's expected that Enumerable.Sum is going to do the most optimal thing over time based on the underlying hardware and other user options (like if you're compiling for size vs speed).
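For readers less familiar with the technique in the PR title, here is a minimal sketch of SIMD summation over an `int[]` using `System.Numerics.Vector<int>`, gated on `Vector.IsHardwareAccelerated` so that platforms without acceleration (such as the Arm32 case mentioned above) take a scalar path. `simdSumInt` is a hypothetical name, and this is not a description of how `Enumerable.Sum` is implemented. Unlike `Enumerable.Sum` and the `Checked.(+)` loop in `Seq.sum`, every addition here is unchecked and wraps on overflow.

```fsharp
open System.Numerics

/// Hypothetical SIMD sum over an int array. All additions are unchecked (they wrap).
let simdSumInt (arr: int[]) : int =
    let width = Vector<int>.Count
    if not Vector.IsHardwareAccelerated || arr.Length < width then
        // Scalar fallback: no SIMD available, or too few elements for one vector.
        let mutable acc = 0
        for x in arr do
            acc <- acc + x
        acc
    else
        let mutable vacc = Vector<int>.Zero
        let mutable i = 0
        // Add `width` lanes at a time.
        while i <= arr.Length - width do
            vacc <- vacc + Vector<int>(arr, i)
            i <- i + width
        // Horizontal reduction: a dot product with all-ones sums the lanes.
        let mutable total = Vector.Dot(vacc, Vector<int>.One)
        // Scalar tail for the elements that did not fill a whole vector.
        while i < arr.Length do
            total <- total + arr.[i]
            i <- i + 1
        total
```

Because wrapping integer addition is associative, the lane-wise order gives exactly the same `int` result as an unchecked left-to-right loop; applied to `float`, the same reordering could change the last bits of the result.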

Contributor Author

> Is this going to functionally compile to a static readonly bool, so that the JIT can optimize things appropriately?

This is a runtime check, not a compile-time check, because nothing guarantees that the code is compiled and run on similar machines.

> Is this going to do the "wrong" thing on custom runtimes or scenarios where vectorization may not be available or possible? For example, there was no SIMD acceleration on 32-bit Unix for a while and there is none on Arm32 today. Likewise, acceleration can be disabled via environment variables for testing purposes.

No, because Enumerable.Sum already checks for that in its own implementation.
The only reason for this check is that Enumerable.Sum is slow on the old .NET Framework.
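User code cannot use the `when ^T : int` static optimization conditionals, which are reserved for FSharp.Core, but the runtime dispatch described here can be sketched for `int[]` alone. `loopSum` and `sumInts` are hypothetical names. The flag is computed once, when the script or module initializes, mirroring the PR's module-level `isNetFramework`; nothing in this sketch guarantees that it becomes a `static readonly` field.

```fsharp
open System.Linq
open System.Runtime.InteropServices

// Computed once, when this binding is initialized (mirrors the PR's isNetFramework).
let isNetFramework =
    RuntimeInformation.FrameworkDescription.StartsWith ".NET Framework"

// Hypothetical scalar loop with overflow checking, used on .NET Framework.
let loopSum (arr: int[]) : int =
    let mutable acc = 0
    for i in 0 .. arr.Length - 1 do
        acc <- Checked.(+) acc arr.[i]
    acc

// Hypothetical user-level dispatch: manual loop on .NET Framework,
// Enumerable.Sum (which may vectorize) everywhere else.
let sumInts (arr: int[]) : int =
    if isNull (box arr) then nullArg "arr"
    if isNetFramework then loopSum arr
    else Enumerable.Sum(arr :> seq<int>)
```

Both paths throw `OverflowException` when an `int` sum overflows, so the choice of path only affects speed, not the result.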

Member, @tannergooding, Jun 18, 2025

> This is a runtime check, not a compile-time check, because nothing guarantees that the code is compiled and run on similar machines.

Yes, which is why you should ensure it compiles down to a static readonly bool in IL: that causes it to be initialized at runtime in Tier 0 and then lets the JIT treat it as a constant in Tier 1 (or under NativeAOT), so the check can be elided once the actual machine and runtime are known.

> The only reason for this check is that Enumerable.Sum is slow on the old .NET Framework.

The point was that you're doing a specific check for .NET Framework, which doesn't account for custom runtimes or other scenarios that may or may not be relevant. So I'm just asking whether that nuance has been fully considered.

Member

Hi @tannergooding, I think you have it right: Enumerable.Sum will do the best thing available on all versions of modern .NET. Even without vectorization, it still detects when the sequence passed to it can be treated as a ReadOnlySpan.

That just does not apply to the desktop .NET Framework, which is still supported and can be used with the latest F# and the latest FSharp.Core; there, Enumerable.Sum is slower than FSharp.Core's Array.sum.

@tannergooding: Is there a recommended way to benchmark a modern .NET version locally with vectorization intentionally disabled? That would let us confirm that .NET 9/10, even without vectorization, does not suffer the drastic slowdown that the .NET Framework implementation of Enumerable.Sum shows compared to FSharp.Core's Array.sum.
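The ReadOnlySpan detection mentioned in this comment can be illustrated with a minimal sketch, assuming .NET Core 2.1 or later for `ReadOnlySpan<T>`: arrays are summed through a span, any other sequence through its enumerator. `sumSpan` and `sumSeqInts` are hypothetical names; the real `Enumerable.Sum` may recognize more source types than arrays and may additionally vectorize the span loop.

```fsharp
open System

// Sums a contiguous span with overflow checking.
let sumSpan (span: ReadOnlySpan<int>) : int =
    let mutable acc = 0
    for i in 0 .. span.Length - 1 do
        acc <- Checked.(+) acc span.[i]
    acc

/// Hypothetical: take the span path when the sequence is really an array,
/// otherwise fall back to the IEnumerator<int> loop (as in the Seq.sum hunk).
let sumSeqInts (source: seq<int>) : int =
    match source with
    | :? (int[]) as arr -> sumSpan (ReadOnlySpan<int>(arr))
    | _ ->
        use e = source.GetEnumerator()
        let mutable acc = 0
        while e.MoveNext() do
            acc <- Checked.(+) acc e.Current
        acc
```

The span path avoids one interface call per element for `MoveNext`/`Current`, which is where much of the enumerator overhead comes from, even before any SIMD is involved.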

Member, @Happypig375, Jun 19, 2025

@T-Gro I think he means that the check should be emitted so that the JIT can optimize it away and call the relevant implementation directly depending on the framework: Array.sum on .NET Framework, Enumerable.Sum otherwise.

Member

That is one part of the concern: F# does not emit let bindings as initonly fields, because its file- and module-based initialization semantics guarantee execution in the order written in the file.

The other part of the concern is whether Enumerable.Sum is indeed no worse on every configuration other than .NET Framework, especially environments without vectorization support.
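Alongside the performance question, swapping implementations should not change results. A minimal check for `int` compares `Enumerable.Sum` with a `Checked.(+)` reference loop and counts an `OverflowException` from both as agreement; `referenceSum` and `sameResult` are hypothetical helpers. Floating-point inputs are deliberately left out, since a differently ordered summation can legitimately differ in the last bits.

```fsharp
open System
open System.Linq

// Reference implementation: plain left-to-right loop with overflow checking.
let referenceSum (data: int[]) : int =
    let mutable acc = 0
    for x in data do
        acc <- Checked.(+) acc x
    acc

/// True when Enumerable.Sum and the reference loop agree on `data`;
/// an OverflowException from both counts as agreement.
let sameResult (data: int[]) : bool =
    let attempt (f: unit -> int) : Result<int, string> =
        try Ok (f ()) with :? OverflowException -> Error "overflow"
    let reference = attempt (fun () -> referenceSum data)
    let linq = attempt (fun () -> Enumerable.Sum(data :> seq<int>))
    reference = linq
```

Running the same check on each target framework gives a quick signal that the runtime-specific dispatch preserves behavior, independently of any benchmark numbers.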


[<CompiledName("Sum")>]
let inline sum (array: ^T array) : ^T =
fsharpSumImpl array
when ^T : float =
if isNetFramework then fsharpSumImpl array
else
let r = (System.Linq.Enumerable.Sum : IEnumerable<float> -> float) (# "" array : IEnumerable<float> #)
(# "" r : 'T #)
when ^T : float32 =
if isNetFramework then fsharpSumImpl array
else
let r = (System.Linq.Enumerable.Sum : IEnumerable<float32> -> float32) (# "" array : IEnumerable<float32> #)
(# "" r : 'T #)
when ^T : int =
if isNetFramework then fsharpSumImpl array
else
let r = (System.Linq.Enumerable.Sum : IEnumerable<int> -> int) (# "" array : IEnumerable<int> #)
(# "" r : 'T #)
when ^T : int64 =
if isNetFramework then fsharpSumImpl array
else
let r = (System.Linq.Enumerable.Sum : IEnumerable<int64> -> int64) (# "" array : IEnumerable<int64> #)
(# "" r : 'T #)

[<CompiledName("SumBy")>]
let inline sumBy ([<InlineIfLambda>] projection: 'T -> ^U) (array: 'T array) : ^U =
checkNonNull "array" array
5 changes: 5 additions & 0 deletions src/FSharp.Core/array.fsi
@@ -2466,6 +2466,11 @@ module Array =
[<CompiledName("SortByDescending")>]
val inline sortByDescending: projection: ('T -> 'Key) -> array: 'T array -> 'T array when 'Key: comparison

/// Internal use of Array.sum to detect if vectorization can be used.
/// Due to sum "inline" this can't be private.
[<System.ComponentModel.EditorBrowsable(System.ComponentModel.EditorBrowsableState.Never)>]
val isNetFramework : bool

/// <summary>Returns the sum of the elements in the array.</summary>
///
/// <param name="array">The input array.</param>
34 changes: 30 additions & 4 deletions src/FSharp.Core/seq.fs
@@ -478,7 +478,8 @@ module Internal =
static member Bind(g: Generator<'T>, cont) =
match g with
| :? GenerateThen<'T> as g ->
GenerateThen<_>.Bind(g.Generator, (fun () -> GenerateThen<_>.Bind(g.Cont(), cont)))
GenerateThen<_>
.Bind(g.Generator, (fun () -> GenerateThen<_>.Bind(g.Cont(), cont)))
| g -> (new GenerateThen<'T>(g, cont) :> Generator<'T>)

let bindG g cont =
@@ -1463,15 +1464,40 @@ module Seq =
else
mkDelayedSeq (fun () -> countByRefType projection source)

[<CompiledName("Sum")>]
let inline sum (source: seq< ^a >) : ^a =
let inline private fsharpSumImpl (source: seq< ^a >) : ^a =
use e = source.GetEnumerator()
let mutable acc = LanguagePrimitives.GenericZero< ^a>

while e.MoveNext() do
acc <- Checked.(+) acc e.Current

acc
acc

let isNetFramework = System.Runtime.InteropServices.RuntimeInformation.FrameworkDescription.StartsWith ".NET Framework"

[<CompiledName("Sum")>]
let inline sum (source: seq< ^a >) : ^a =
fsharpSumImpl source
when ^a: int64 =
if isNetFramework then fsharpSumImpl source
else
let r = (System.Linq.Enumerable.Sum: IEnumerable<int64> -> int64) (# "" source : IEnumerable<int64> #)
(# "" r : 'a #)
when ^a: int =
if isNetFramework then fsharpSumImpl source
else
let r = (System.Linq.Enumerable.Sum: IEnumerable<int> -> int) (# "" source : IEnumerable<int> #)
(# "" r : 'a #)
when ^a: float32 =
if isNetFramework then fsharpSumImpl source
else
let r = (System.Linq.Enumerable.Sum: IEnumerable<float32> -> float32) (# "" source : IEnumerable<float32> #)
(# "" r : 'a #)
when ^a: float =
if isNetFramework then fsharpSumImpl source
else
let r = (System.Linq.Enumerable.Sum: IEnumerable<float> -> float) (# "" source : IEnumerable<float> #)
(# "" r : 'a #)

[<CompiledName("SumBy")>]
let inline sumBy ([<InlineIfLambda>] projection: 'T -> ^U) (source: seq<'T>) : ^U =
5 changes: 5 additions & 0 deletions src/FSharp.Core/seq.fsi
@@ -2329,6 +2329,11 @@ module Seq =
[<CompiledName("SortByDescending")>]
val inline sortByDescending: projection: ('T -> 'Key) -> source: seq<'T> -> seq<'T> when 'Key: comparison

/// Internal use of Seq.sum to detect if vectorization can be used.
/// Due to sum "inline" this can't be private.
[<System.ComponentModel.EditorBrowsable(System.ComponentModel.EditorBrowsableState.Never)>]
val isNetFramework : bool

/// <summary>Returns the sum of the elements in the sequence.</summary>
///
/// <remarks>The elements are summed using the <c>+</c> operator and <c>Zero</c> property associated with the generated type.</remarks>
@@ -141,6 +141,11 @@ type CollectionsBenchmark() =
|> Array.updateAt (x.Length - 1) 1
|> ignore

[<Benchmark>]
member x.ArraySum() =
array
|> Array.sum
|> ignore
/// Seq
[<Benchmark>]
member x.SeqBaseline() =