(docs) Describe the flows of setting NUMA node affinity in Xen by xenopsd #6335

Merged

Conversation

bernhardkaindl (Collaborator)

Describe xc_vcpu_setaffinity() and document its use by xenguest and xenopsd.

This PR is my third iteration of documenting how xenopsd and xenguest
interact, configure, and set the vCPU affinity. With the clarifications from
@edwintorok, I now have confidence that I finally got this right.
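
For readers who have not used libxc directly: xc_vcpu_setaffinity() takes both a hard and a soft CPU map plus a flags argument that selects which of the two to apply. Below is a minimal, hedged C sketch (not taken from xenguest or xenopsd; the domid, vCPU index and pCPU numbers are placeholders and error handling is trimmed) of the two calls this documentation describes: xenguest setting hard affinity, and xenopsd's NUMA placement setting soft affinity.

```c
/*
 * Sketch only: the xc_vcpu_setaffinity() calling convention as used by
 * xenguest (hard affinity) and by xenopsd's NUMA placement (soft affinity).
 * The domid, vCPU index and pCPU numbers below are placeholders.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <xenctrl.h>

int main(void)
{
    uint32_t domid = 23;   /* placeholder domain id */
    int vcpu = 0;          /* placeholder vCPU index */

    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    if (!xch)
        return 1;

    /* Hard affinity: restrict where the vCPU may run at all, as xenguest
     * does when platform/vcpu/<vcpu>/affinity is present in XenStore. */
    xc_cpumap_t hard = xc_cpumap_alloc(xch);
    if (hard) {
        hard[0] = 0xff;    /* pCPUs 0-7, like the unary "11111111" */
        if (xc_vcpu_setaffinity(xch, domid, vcpu, hard, NULL,
                                XEN_VCPUAFFINITY_HARD))
            fprintf(stderr, "setting hard affinity failed\n");
        free(hard);
    }

    /* Soft affinity: a scheduling preference only, as xenopsd uses it for
     * best_effort NUMA placement. */
    xc_cpumap_t soft = xc_cpumap_alloc(xch);
    if (soft) {
        soft[0] = 0x0f;    /* prefer pCPUs 0-3, e.g. one NUMA node */
        if (xc_vcpu_setaffinity(xch, domid, vcpu, NULL, soft,
                                XEN_VCPUAFFINITY_SOFT))
            fprintf(stderr, "setting soft affinity failed\n");
        free(soft);
    }

    xc_interface_close(xch);
    return 0;
}
```

In the real flows, the hard map comes from the XenStore key written by xenopsd and read by xenguest, and the soft map comes from xenopsd's NUMA placement computation.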

bernhardkaindl (Collaborator, Author)

I hope this looks good. I plan to document it further and add any missing pieces step by step.

A 2nd PR to improve the walkthrough of xenguest is in progress.
If you have any suggestions, I can incorporate them into the follow-up PRs.

Preview site links:

bernhardkaindl force-pushed the docs-describe-setting-numa-affinity branch 2 times, most recently from 554d434 to 68d4187 on March 3, 2025 16:57
bernhardkaindl requested review from mg12 and edwintorok and removed the request for edwintorok on March 3, 2025 17:15
bernhardkaindl (Collaborator, Author) commented Mar 4, 2025

@mg12 / @edwintorok: JFYI, in case you can approve this PR as well.

I resolved the review discussions so far.

To summarize it:

  • There was a discussion about the inconsistency of passing the hard-affinity via Xenstore while setting the soft-affinity directly from xenopsd using the dedicated hypercall for it.
    • andyhhp commented on this fact with:

      xenguest is XenServer's local version of part of the original xend toolstack. Parameter passing is a total mess, and could do with an overhaul. e.g. passing affinities in unary is horrible.

    • It could be said that it is at least readable this way (even though it could be better, e.g. "all" instead of "11111111": when a pool contains hosts with a different number of pCPUs, a hard-affinity of "11111111" could break on migration); see the parsing sketch after this list:
      /local/domain/23/platform/vcpu/0/affinity = "11111111"
      /local/domain/23/platform/vcpu/1/affinity = "11111111"
    • The conclusion by @edwintorok was that these XenStore keys are part of the VM's ABI, so we cannot change them.
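
To make the unary XenStore format concrete, here is a small illustrative C helper (hypothetical; not the actual configure_vcpus() code) that turns such a string into an xc_cpumap_t suitable for the hard-affinity hypercall:

```c
/*
 * Illustrative only: parse a unary affinity string such as "11111111"
 * (one character per pCPU, '1' = allowed) into an xc_cpumap_t, roughly
 * what xenguest has to do with the platform/vcpu/<vcpu>/affinity value
 * before calling xc_vcpu_setaffinity(..., XEN_VCPUAFFINITY_HARD).
 * This helper is hypothetical, not the actual xenguest code.
 */
#include <string.h>
#include <xenctrl.h>

int unary_affinity_to_cpumap(xc_interface *xch, const char *unary,
                             xc_cpumap_t map)
{
    int map_bytes = xc_get_cpumap_size(xch);   /* bytes in a host cpumap */
    size_t ncpus = strlen(unary);

    if (map_bytes <= 0 || ncpus > (size_t)map_bytes * 8)
        return -1;                             /* string longer than map */

    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        if (unary[cpu] == '1')
            map[cpu / 8] |= 1u << (cpu % 8);   /* one bit per pCPU */
        else if (unary[cpu] != '0')
            return -1;                         /* reject malformed input */
    }
    return 0;
}
```

This also shows why the format is fragile across hosts: the string encodes one character per pCPU, so a mask written for eight pCPUs does not adapt when the VM migrates to a host with a different pCPU count.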

In any case, the fact is that this is now apparently an ABI that we can't change, regardless of how horrible it might be.

The primary concern is that we need a place where we can find out what is going on.
Only with an understanding of what is currently done can we find a good solution for the design.

This PR is just the initial step that enables us to be aware of what the current code does, as before this PR all of this was hidden and not easily accessible.

Thus, let's review the PR not yet on the basis of possible future design changes, but primarily on the basis of:

Yeah, this is the current state.

But let's go forward, ("relatively" small) step by ("relatively" small) step.

I think it's a very useful update to have the added flowchart documenting the existing NUMA code, and it helps in designing improvements for it:

```mermaid
flowchart TD

subgraph VM.create["xenopsd VM.create"]

    %% Is xe vCPU-params:mask= set? If yes, write to Xenstore:

    is_xe_vCPUparams_mask_set?{"

            Is
            <tt>xe vCPU-params:mask=</tt>
            set? Example: <tt>1,2,3</tt>
            (Is used to enable vCPU<br>hard-affinity)

        "} --"yes"--> set_hard_affinity("Write hard-affinity to XenStore:
                        <tt>platform/vcpu/#vcpu/affinity</tt>
                        (xenguest will read this and other configuration data
                         from Xenstore)")

end

subgraph VM.build["xenopsd VM.build"]

    %% Labels of the decision nodes

    is_Host.numa_affinity_policy_set?{
        Is<p><tt>Host.numa_affinity_policy</tt><p>set?}
    has_hard_affinity?{
        Is hard-affinity configured in <p><tt>platform/vcpu/#vcpu/affinity</tt>?}

    %% Connections from VM.create:
    set_hard_affinity --> is_Host.numa_affinity_policy_set?
    is_xe_vCPUparams_mask_set? == "no"==> is_Host.numa_affinity_policy_set?

    %% The Subgraph itself:

    %% Check Host.numa_affinity_policy

    is_Host.numa_affinity_policy_set?

        %% If Host.numa_affinity_policy is "best_effort":

        -- Host.numa_affinity_policy is<p><tt>best_effort -->

            %% If has_hard_affinity is set, skip numa_placement:

            has_hard_affinity?
                --"yes"-->exec_xenguest

            %% If has_hard_affinity is not set, run numa_placement:

            has_hard_affinity?
                --"no"-->numa_placement-->exec_xenguest

        %% If Host.numa_affinity_policy is off (default, for now),
        %% skip NUMA placement:

        is_Host.numa_affinity_policy_set?
            =="default: disabled"==>
            exec_xenguest
end

%% xenguest subgraph

subgraph xenguest

    exec_xenguest

        ==> stub_xc_hvm_build("<tt>stub_xc_hvm_build()")

            ==> configure_vcpus("<tt>configure_vcpus()")

                %% Decision
                ==> set_hard_affinity?{"
                        Is <tt>platform/<br>vcpu/#vcpu/affinity</tt>
                        set?"}

end

%% do_domctl Hypercalls

numa_placement
    --Set the NUMA placement using soft-affinity-->
    XEN_VCPUAFFINITY_SOFT("<tt>xc_vcpu_setaffinity(SOFT)")
        ==> do_domctl

set_hard_affinity?
    --yes-->
    XEN_VCPUAFFINITY_HARD("<tt>xc_vcpu_setaffinity(HARD)")
        --> do_domctl

xc_domain_node_setaffinity("<tt>xc_domain_node_setaffinity()</tt>
                            and
                            <tt>xc_domain_node_getaffinity()")
                                <--> do_domctl

%% Xen subgraph

subgraph xen[Xen Hypervisor]

    subgraph domain_update_node_affinity["domain_update_node_affinity()"]
        domain_update_node_aff("<tt>domain_update_node_aff()")
        ==> check_auto_node{"Is domain's<br><tt>auto_node_affinity</tt><br>enabled?"}
          =="yes (default)"==>set_node_affinity_from_vcpu_affinities("
            Calculate the domain's <tt>node_affinity</tt> mask from vCPU affinity
            (used for further NUMA memory allocation for the domain)")
    end

    do_domctl{"do_domctl()<br>op->cmd=?"}
        ==XEN_DOMCTL_setvcpuaffinity==>
            vcpu_set_affinity("<tt>vcpu_set_affinity()</tt><br>set the vCPU affinity")
                ==>domain_update_node_aff
    do_domctl
        --XEN_DOMCTL_setnodeaffinity (not used currently)
            -->is_new_affinity_all_nodes?

    subgraph  domain_set_node_affinity["domain_set_node_affinity()"]

        is_new_affinity_all_nodes?{new_affinity<br>is #34;all#34;?}

            --is #34;all#34;

                --> enable_auto_node_affinity("<tt>auto_node_affinity=1")
                    --> domain_update_node_aff

        is_new_affinity_all_nodes?

            --not #34;all#34;

                --> disable_auto_node_affinity("<tt>auto_node_affinity=0")
                    --> domain_update_node_aff
    end

%% setting and getting the struct domain's node_affinity:

disable_auto_node_affinity
    --node_affinity=new_affinity-->
        domain_node_affinity

set_node_affinity_from_vcpu_affinities
    ==> domain_node_affinity@{ shape: bow-rect,label: "domain:&nbsp;node_affinity" }
        --XEN_DOMCTL_getnodeaffinity--> do_domctl

end
click is_Host.numa_affinity_policy_set?
"https://github.com/xapi-project/xen-api/blob/90ef043c1f3a3bc20f1c5d3ccaaf6affadc07983/ocaml/xenopsd/xc/domain.ml#L951-L962"
click numa_placement
"https://github.com/xapi-project/xen-api/blob/90ef043c/ocaml/xenopsd/xc/domain.ml#L862-L897"
click stub_xc_hvm_build
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L2329-L2436" _blank
click get_flags
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1164-L1288" _blank
click do_domctl
"https://github.com/xen-project/xen/blob/7cf163879/xen/common/domctl.c#L282-L894" _blank
click domain_set_node_affinity
"https://github.com/xen-project/xen/blob/7cf163879/xen/common/domain.c#L943-L970" _blank
click configure_vcpus
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1297-L1348" _blank
click set_hard_affinity?
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1305-L1326" _blank
click xc_vcpu_setaffinity
"https://github.com/xen-project/xen/blob/7cf16387/tools/libs/ctrl/xc_domain.c#L199-L250" _blank
click vcpu_set_affinity
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1353-L1393" _blank
click domain_update_node_aff
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1809-L1876" _blank
click check_auto_node
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1840-L1870" _blank
click set_node_affinity_from_vcpu_affinities
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1867-L1869" _blank
```
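
To complement the Xen-side branch of the chart, the following heavily simplified C sketch illustrates the relationship it shows between auto_node_affinity and domain_update_node_affinity(): while auto_node_affinity is enabled, node_affinity is recomputed from the vCPUs' affinities (narrowed to the soft set where that leaves something usable) and then guides NUMA memory allocation. The structs and the topology helper are invented for this example; it is not the real Xen code.

```c
/*
 * Heavily simplified sketch of the step shown in the chart: while
 * auto_node_affinity is enabled, domain_update_node_affinity() derives
 * the domain's node_affinity from its vCPUs' affinities. The structs,
 * bitmap widths and topology helper are invented for this example.
 */
#include <stdbool.h>
#include <stdint.h>

#define MAX_VCPUS 8

struct vcpu_aff {
    uint64_t hard;              /* pCPU bitmap: where the vCPU may run */
    uint64_t soft;              /* pCPU bitmap: where it prefers to run */
};

struct dom {
    bool auto_node_affinity;    /* cleared when XEN_DOMCTL_setnodeaffinity
                                   installs an explicit node mask */
    uint64_t node_affinity;     /* NUMA node bitmap guiding memory allocation */
    struct vcpu_aff vcpu[MAX_VCPUS];
    unsigned int nr_vcpus;
};

/* Example topology assumption: pCPUs 0-3 on node 0, pCPUs 4-7 on node 1. */
static uint64_t nodes_of_pcpus(uint64_t pcpus)
{
    uint64_t nodes = 0;
    if (pcpus & 0x0fULL)
        nodes |= 1u << 0;
    if (pcpus & 0xf0ULL)
        nodes |= 1u << 1;
    return nodes;
}

void domain_update_node_affinity_sketch(struct dom *d, uint64_t online_pcpus)
{
    if (!d->auto_node_affinity)
        return;                 /* keep the explicitly set node mask */

    uint64_t hard = 0, soft = 0;
    for (unsigned int i = 0; i < d->nr_vcpus; i++) {
        hard |= d->vcpu[i].hard;    /* union of the vCPUs' hard affinities */
        soft |= d->vcpu[i].soft;    /* union of the vCPUs' soft affinities */
    }

    hard &= online_pcpus;           /* only online pCPUs count */
    soft &= hard;                   /* narrow to soft & hard & online */

    /* Prefer the narrowed soft set if it is non-empty, else fall back
     * to the hard set, and record which NUMA nodes it covers. */
    uint64_t effective = soft ? soft : hard;
    d->node_affinity = nodes_of_pcpus(effective);
}
```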

…opsd

Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
bernhardkaindl force-pushed the docs-describe-setting-numa-affinity branch from 68d4187 to 40833fb on March 6, 2025 13:27
psafont added this pull request to the merge queue on Mar 11, 2025
Merged via the queue into xapi-project:master with commit d24d122 Mar 11, 2025
17 checks passed