Parallel Query Job, all cores working on the same chunk #37
-
Hello, I'm making a bunch (50.000) of meshes orbit a point via Friflo and Godot's Multimesh. When using Parallel Query Job and crude debugging, it seems like one chunk is processed multiple times and the others are not.
private void UpdateTransforms(float delta)
{
    GD.Print("Start");
    var queryJob = _meshesQuery.ForEach((transforms, speeds, entities) =>
    {
        int i;
        for (i = 0; i < entities.Length; i++)
        {
            // Rotate the translation part of the transform around the Y axis.
            float angle = speeds[i].value * delta;
            System.Numerics.Vector3 position = new(transforms[i].m41, transforms[i].m42, transforms[i].m43);
            Matrix4x4 rotation = Matrix4x4.CreateRotationY(angle);
            position = System.Numerics.Vector3.Transform(position, rotation);
            transforms[i].value = transforms[i].value with
            {
                M41 = position.X,
                M42 = position.Y,
                M43 = position.Z
            };
        }
        GD.Print($"ChunkSize: {i}");
    });
    queryJob.RunParallel();

    // Copy the transforms to the Multimesh and count the ones that did not change.
    int n = 0;
    foreach (var (transforms, _) in _transQuery.Chunks)
    {
        var transformSpan = transforms.Span;
        int same = 0;
        for (int i = 0; i < transforms.Length; i++)
        {
            var oldTransform = Multimesh.GetInstanceTransform(n);
            var newTransform = AsTransform3D(ref transformSpan[i].value);
            if (oldTransform == newTransform) { same++; }
            Multimesh.SetInstanceTransform(n++, newTransform);
        }
        GD.Print($"Unmodified: {same}");
    }
    GD.Print($"Total: {n}");
}
This outputs the following each frame, with a runner on 4 cores:
ChunkSize: 12512
37.488 meshes are not moving at all and 12.512 of them are moving at 4x the given speed. If running with 5 cores, 10.000 meshes are moving 5x faster, etc. So my loop is applied by each core to the same chunk. It works fine, meaning all 50.000 meshes orbit, with a regular query or with queryJob.Run() instead of the parallel version. What am I doing wrong?
Replies: 3 comments
-
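hi, I tried to reproduce the issue. When doing this I remembered I had an issue with the Chunk<> indexer. This has an impact on the usage of transforms[i] and speeds[i] in your code. This issue is present in 2.2.0 and was fixed in all newer 3.0.0-preview versions. In case you still want to use 2.2.0 you should use the Chunk<>.Span. The Span is using the start index. The indexer does not, which is a bug in 2.2.0. It was fixed by Engine - ECS: fixed Chunk<> indexer. So I guess you are using 2.2.0. Correct? If the preview version has the same issue, I need the following info: Can you log queryJob.ParallelComponentMultiple and queryJob.MinParallelChunkLength before calling queryJob.RunParallel(); Also please log the sizeof() of both component types - Transform & Speed.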
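To illustrate why an indexer that ignores the start index produces exactly the reported symptom, here is a minimal sketch. The `Section` type is a hypothetical stand-in, not the friflo `Chunk<>` type, and the sizes (8 entities, 4 workers) are made up for readability. The workers run one after another here so the result is deterministic.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the part of a chunk handed to one worker.
// It only models a view with a start offset; it is NOT the friflo type.
public readonly struct Section
{
    private readonly float[] _components;
    private readonly int _start;
    public int Length { get; }

    public Section(float[] components, int start, int length)
    {
        _components = components;
        _start = start;
        Length = length;
    }

    // Correct view: begins at the section's start offset.
    public Span<float> Span => new Span<float>(_components, _start, Length);

    // Models the reported bug: the start offset is ignored.
    public ref float BuggyAt(int i) => ref _components[i];
}

public static class OffsetDemo
{
    // Every worker adds 1 to each element of its own section.
    public static float[] Run(bool useSpan)
    {
        const int workers = 4;
        const int sectionLength = 2;
        var angles = new float[workers * sectionLength];

        for (int w = 0; w < workers; w++)
        {
            var section = new Section(angles, w * sectionLength, sectionLength);
            Span<float> span = section.Span;
            for (int i = 0; i < section.Length; i++)
            {
                if (useSpan) span[i] += 1f;
                else section.BuggyAt(i) += 1f;
            }
        }
        return angles;
    }

    public static void Main()
    {
        // Offset ignored: the first section is updated by all 4 workers.
        Console.WriteLine(string.Join(",", Run(useSpan: false))); // 4,4,0,0,0,0,0,0
        // Offset respected: every element is updated exactly once.
        Console.WriteLine(string.Join(",", Run(useSpan: true)));  // 1,1,1,1,1,1,1,1
    }
}
```

With 50.000 entities and 4 workers the same pattern would give one section moving at 4x speed and the rest not moving, which matches the numbers in the question.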
-
You are absolutely right, I am using 2.2.0, and switching to 3.0.0-preview (17 at the time of writing) fixes the issue. Here is the log with the info you requested for 2.2.0:
The same log with 3.0.0-preview:
So everything works now, thank you! I do have another question related to parallel querying. I'm fairly new to the ECS way of doing and thinking, so this may appear naïve, but I thought that as long as the query involves multiple chunks, using multithreading can only be beneficial. Yet when I try the code in the first message (which now works perfectly thanks to switching to 3.0.0-preview), it is twice as slow as the same code without parallelization. This is the same code, giving the same log output, but iterating chunks instead of using jobs:
private void UpdateTransforms(float delta)
{
    foreach (var (transforms, speeds, _) in _meshesQuery.Chunks)
    {
        var speedSpan = speeds.Span;
        var transformSpan = transforms.Span;
        for (int i = 0; i < speeds.Length; i++)
        {
            float angle = speedSpan[i].value * delta;
            System.Numerics.Vector3 position = new(transformSpan[i].m41, transformSpan[i].m42, transformSpan[i].m43);
            Matrix4x4 rotation = Matrix4x4.CreateRotationY(angle);
            position = System.Numerics.Vector3.Transform(position, rotation);
            transformSpan[i].value = transformSpan[i].value with
            {
                M41 = position.X,
                M42 = position.Y,
                M43 = position.Z
            };
        }
    }
    int n = 0;
    foreach (var (transforms, _) in _transQuery.Chunks)
    {
        var transformSpan = transforms.Span;
        int same = 0;
        for (int i = 0; i < transforms.Length; i++)
        {
            var oldTransform = Multimesh.GetInstanceTransform(n);
            var newTransform = AsTransform3D(ref transformSpan[i].value);
            if (oldTransform == newTransform) { same++; }
            Multimesh.SetInstanceTransform(n++, newTransform);
        }
    }
}
Query job: 17 FPS
Am I using a query job in the wrong context? If so, when should a parallel query be used to improve performance? Or maybe I'm just plainly doing it wrong?
Edit: In contrast, this gives 32 FPS:
private void UpdateTransforms(float delta)
{
    // Parallel.ForEach requires: using System.Threading.Tasks;
    Parallel.ForEach(_meshesQuery.Chunks, chunk =>
    {
        var (transforms, speeds, _) = chunk;
        var speedSpan = speeds.Span;
        var transformSpan = transforms.Span;
        for (int i = 0; i < speeds.Length; i++)
        {
            float angle = speedSpan[i].value * delta;
            System.Numerics.Vector3 position = new(transformSpan[i].m41, transformSpan[i].m42, transformSpan[i].m43);
            Matrix4x4 rotation = Matrix4x4.CreateRotationY(angle);
            position = System.Numerics.Vector3.Transform(position, rotation);
            transformSpan[i].value = transformSpan[i].value with
            {
                M41 = position.X,
                M42 = position.Y,
                M43 = position.Z
            };
        }
    });
    int n = 0;
    foreach (var (transforms, _) in _transQuery.Chunks)
    {
        var transformSpan = transforms.Span;
        for (int i = 0; i < transforms.Length; i++)
        {
            Multimesh.SetInstanceTransform(n++, AsTransform3D(ref transformSpan[i].value));
        }
    }
}
-
To summarize the three alternatives:
I assume the entity count is still 50.000 in all cases.
Recommendations - but I can only guess here.
General optimization for all cases.
// before: indexes the span three times
System.Numerics.Vector3 position = new(transformSpan[i].m41, transformSpan[i].m42, transformSpan[i].m43);
// better: take one reference to the element and reuse it
ref var transform = ref transformSpan[i];
System.Numerics.Vector3 position = new(transform.m41, transform.m42, transform.m43);
Maybe you find some useful performance infos at
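Since FPS also includes the Multimesh upload and rendering, it may help to time only the update loop when comparing the alternatives. The following is a rough sketch of such a measurement using plain arrays and `Parallel.For`. It does not use friflo or Godot, the names `UpdateTiming`, `Update` and `Measure` are made up for illustration, and it uses one rotation angle for all entities instead of a per-entity speed.

```csharp
using System;
using System.Diagnostics;
using System.Numerics;
using System.Threading.Tasks;

public static class UpdateTiming
{
    // Simplified per-entity work: rotate each position around the Y axis.
    public static void Update(Vector3[] positions, int start, int end, float angle)
    {
        Matrix4x4 rotation = Matrix4x4.CreateRotationY(angle);
        for (int i = start; i < end; i++)
        {
            positions[i] = Vector3.Transform(positions[i], rotation);
        }
    }

    // Returns the average duration of one call in milliseconds.
    public static double Measure(Action action, int runs)
    {
        action(); // warm-up so JIT compilation is not part of the measurement
        var watch = Stopwatch.StartNew();
        for (int r = 0; r < runs; r++) { action(); }
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds / runs;
    }

    public static void Main()
    {
        const int count = 50_000;   // entity count from the thread
        const int sections = 4;     // assumed number of worker sections
        const int size = count / sections;

        var positions = new Vector3[count];
        for (int i = 0; i < count; i++) { positions[i] = new Vector3(i % 100, 0f, 1f); }

        double sequential = Measure(() => Update(positions, 0, count, 0.01f), 100);
        double parallel = Measure(() =>
        {
            Parallel.For(0, sections, s => Update(positions, s * size, (s + 1) * size, 0.01f));
        }, 100);

        Console.WriteLine($"sequential: {sequential:F4} ms, parallel: {parallel:F4} ms");
    }
}
```

The numbers depend on the machine and runtime, so treat them only as a relative comparison. If the parallel version is not clearly faster in such an isolated test, the per-entity work is probably too small to outweigh the scheduling overhead.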
hi,
I tried to reproduce the issue.
When doing this I remembered I had an issue with the Chunk<> indexer.
This has an impact on the usage of transforms[i] and speeds[i] in your code.
This issue is present in 2.2.0 and was fixed in all newer 3.0.0-preview version.
In case you still want to use 2.2.0 you should use the Chunk<>.Span.
The Span is using the start index. The indexer not - which is a bug in 2.2.0.
It was fixed by Engine - ECS: fixed Chunk<> indexer
So I guess you are using 2.2.0. Correct?
If the preview version had the same issue. I need following infos:
Can you log
queryJob.ParallelComponentMultiple
queryJob.MinParallelChunkLength
before calling
queryJob.RunParallel();
Also ple…