Skip to content

Ignore disabled hyperthreads #291

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

Flamefire
Copy link

@Flamefire Flamefire commented May 6, 2025

Only consider processors with an APIC_ID as valid.

For processors with Hyperthreads but disabled SMT the APIC_ID will never be set by CPUInfo and stays zero for the "disabled" threads. Among other inconsistencies this causes the first active thread/processor to be considered "the same" as all the disabled ones as they share APIC_ID=0

It also doesn't make sense to make and decisions based on the APIC_ID if we don't have any information on it.

So this patch adds a check for CPUINFO_LINUX_FLAG_APIC_ID.
The flag CPUINFO_LINUX_FLAG_APIC_ID will be set if and only if the APIC_ID has been set. If it has not then CPUInfo doesn't has any information about the APIC_ID, hence the default value of zero. See

processor->apic_id = apic_id;
processor->flags |= CPUINFO_LINUX_FLAG_APIC_ID;

On a dual socket AMD EPYC 9334 system (2x32 cores, 128 threads total (2/core)) the current output of cache-info is

Max cache size (upper bound): 33554432 bytes
L1 instruction cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 65 processors
L1 data cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 65 processors
L2 unified cache: 64 x 1 MB (inclusive), 8-way set associative (2048 sets), 64 byte lines, shared by 65 processors
L3 unified cache: 8 x 32 MB (exclusive), 16-way set associative (32768 sets), 64 byte lines, shared by 72 processors

Note the "65 processors" being one-off.

With this patch:

Max cache size (upper bound): 33554432 bytes
L1 instruction cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 1 processors
L1 data cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 1 processors
L2 unified cache: 64 x 1 MB (inclusive), 8-way set associative (2048 sets), 64 byte lines, shared by 1 processors
L3 unified cache: 8 x 32 MB (exclusive), 16-way set associative (32768 sets), 64 byte lines, shared by 8 processor

This matches the Zen4 architecture having 32KB Data and 32KB instruction L1 cache per core, 1MB L2 cache per core and 256MB L3 cache total.

Fixes #238

@malfet
Copy link
Contributor

malfet commented May 12, 2025

@Flamefire can you please fix lint? (namely run clang-format on your PR)
Also, do you mind adding a few more references in the doc to the PR description that will help with reviewing this change

@Flamefire
Copy link
Author

@Flamefire can you please fix lint? (namely run clang-format on your PR) Also, do you mind adding a few more references in the doc to the PR description that will help with reviewing this change

Fixed the clang format issue and squashed it into the initial commit

I'm not sure what to put into the description in addition to what I have. I added a paragraph linking to the logic of CPUInfo itself. I.e. the issue is that while CPUInfo "knows" that it doesn't has information on the APIC ID it still uses it as-if it was valid.
As for WHY the information is not available I wasn't able to find any reliable information. But the change at least makes sense from the perspective of the existing logic and observations.

@malfet
Copy link
Contributor

malfet commented May 22, 2025

@pytorchbot rebase

Only consider processors with an APIC_ID as valid.

For processors with Hyperthreads but disabled SMT the APIC_ID will never be set and stays zero for the "disabled" threads.
Among other inconsistencies this causes the first active thread/processor to be considered "the same" as all the disabled ones as they share `APIC_ID=0`

It also doesn't make sense to make and decisions based on the APIC_ID if we don't have any information on it.
@Flamefire
Copy link
Author

Rebased manually. Not sure why the bot didn't do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect cache-info output with nosmt linux kernel command line
3 participants