[doc] Host network device ordering #6387

minglumlu
Member

No description provided.

@minglumlu minglumlu marked this pull request as ready for review March 25, 2025 07:45
on whether the device is embedded or not, PCI cards in ascending slot order, and
ports in ascending PCI bus/device/function order breadth-first. Since the hosts
are identical, the orders generated by the `biosdevname` are consistent across
the hosts.
Contributor

I believe there are a few more assumptions that could be stated:

  • The order of mac, PCI, eth is total
  • NICs have pairwise disjoint MAC addresses
  • NICs have pairwise disjoint network (ethX) slots
  • Two NICs may have identical PCI slots (but it is rare)
  • The sequence of PCI slots in use may have holes
  • The sequence of eth slots may have holes

Contributor

I would like to see biosdevname explained with an example. What does this look like?

Member Author

@minglumlu minglumlu Mar 26, 2025

The order of mac, PCI, eth is total

This is true within a single enumeration, but not across multiple enumerations. E.g. a NIC's (mac, PCI, eth) triple could be used to determine its position in one enumeration, but the PCI or eth might change in the next enumeration. Relying on the total order may therefore result in different positions in different enumerations.

Two NICs may have identical PCI slots (but it is rare)

This is not rare. But anyway it needs to be supported.

The sequence of eth slots may have holes

This doesn't have holes.
Update: Ah, I see. The eth slots mean the positions in the document. Yes, in that case, they may have holes.

order should remain the same regardless of how many times the host is rebooted.

To achieve this, the initial order should be saved persistently on the host's
local storage so it can be referenced in subsequent orderings. When performing
Contributor

Are we going to use file-based storage or the xapi database for this? Given that this is needed early in system start, xapi might not be available, at least in a failure case, so working with files could be easier.

Member Author

Yes, file based storage is to be used. The networkd doesn't use xapi.

Contributor

I would suggest using a single file for this to avoid inconsistencies. It could be an easy-to-parse line-oriented format or JSON.
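
To illustrate the single-file idea, here is a minimal sketch assuming JSON. The function names and the record fields (`position`, `mac`, `pci`, `present`) are hypothetical and not taken from the design; the point is only that writing to a temporary file and renaming it keeps the saved order consistent if the host goes down mid-write.

```python
import json
import os
import tempfile


def save_order(path, order):
    """Write the saved order (a list of dicts) to one JSON file, atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename over the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(order, f, indent=2)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_order(path):
    """Return the saved order, or an empty list on first boot (no file yet)."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return []
```

Because the rename replaces the file in one step, a reader sees either the old order or the new one, never a partial write.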

Therefore, the network devices in the saved order should have their MAC
addresses saved together, effectively mapping each position to a MAC address.
When performing an ordering, the stable position can be found by searching the
last saved order using the MAC address.
Contributor

Tying mac and ethX is plausible but contradicts the principle that PCI slot order (which is observable from the outside) determines NIC/ethX assignment.

Member Author

One example is:
initially, a:AA:0 c:CC:1 b:DD:2, where the lower-case letters stand for MAC addresses, the upper-case ones stand for PCI addresses, and the numbers are positions.
Now, a new card is plugged in at BB: a:AA x:BB c:CC b:DD
Pure PCI slot order would give x:BB position 1. But what is expected is a:AA:0 x:BB:3 c:CC:1 b:DD:2.
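
The expectation in this example can be sketched as follows. This is an illustration only: the function name is hypothetical, and the rule that a device with an unknown MAC takes the position after the highest saved one is an assumption inferred from this single example, not the documented algorithm.

```python
def assign_positions(saved, seen):
    """saved: dict mapping MAC -> saved position.
    seen: list of (mac, pci) tuples found in this enumeration.
    Returns (mac, pci, position) tuples listed in PCI order."""
    # Assumption: new devices are appended after the highest saved position.
    next_free = max(saved.values(), default=-1) + 1
    result = []
    for mac, pci in sorted(seen, key=lambda dev: dev[1]):
        if mac in saved:
            position = saved[mac]  # known MAC: keep its stable position
        else:
            position = next_free   # unknown MAC: take the next unused position
            next_free += 1
        result.append((mac, pci, position))
    return result


saved = {"a": 0, "c": 1, "b": 2}
seen = [("a", "AA"), ("x", "BB"), ("c", "CC"), ("b", "DD")]
print(assign_positions(saved, seen))
```

With the inputs above, x:BB ends up at position 3 while a, c and b keep positions 0, 1 and 2.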

position, MAC address, and PCI address are saved for future reference, and its
position will be reserved. This means there may be a gap in the order: a
position that was previously assigned to a network device is now vacant because
the device has been removed.
Contributor

I believe we need to discuss when such a slot can be re-used. But I agree that the information from a removed card should be remembered.

Contributor

I think the re-use scenario is actually Replacement.3 in the section above. But it would be better to mention here when the reserved position is released and assigned to another device. It affects the "Removed and then added back" case too. If the reserved position of the old device is released by a replacement, the old device is no longer remembered and will be regarded as newly added when it is added again.

Member Author

when the reserved position is released and assigned to another device.

The "replacement" and "removed and then added back" cases cover exactly this. Note that a replacement can also mean removing a card and then adding a new card in the same PCI slot.

Member Author

A replacement is not treated as newly added. The reserved position will not be released.

Rare cases that can not be handled automatically
------------------------------------------------

In summary, to keep the order stable, the auto-generated order needs to be saved
Contributor

How much of a problem are two NICs with an identical PCI slot? How does this arise? Is this a card with multiple ethernet ports? Or can you really have two physical cards with the same PCI slot (I would be surprised)?

Member Author

Two NICs with an identical PCI slot is not a rare case. The rare cases are the ones where the sorting logic's output doesn't match the user's expectation.
The sorting logic uses the MAC address or PCI slot to track changes, but it may fail. E.g. a user may unplug a device from one PCI slot and plug it into another. The current logic assumes the network of the NIC doesn't change. But what the user wants is to remove the NIC and use it for a new network, and then, a few host boots later, plug in a new NIC as the replacement for the original network.
The sorting logic can only handle a limited set of cases.

@lindig
Contributor

lindig commented Mar 25, 2025

I believe examples for different scenarios could illustrate the principles.

Enumerating Host Interfaces

At boot time, dom0 enumerates the ethernet interfaces; typically they
are synonymous with physical networks: in a pool we expect
interfaces in the same physical slot across hosts to be on the same
network.

Notation

mac:pci:eth
!mac:pci:eth

A NIC is characterised by

  • MAC address, which is unique
  • PCI slot, which is not unique: multiple MAC addresses can share a
    PCI slot (rarely). PCI slots correspond to hardware slots and thus are
    physically observable.
  • eth slot - the network slot assigned to this interface by dom0. At any
    given time, no slot is assigned twice but the sequence of slots may
    have holes.
  • The !mac:pci:eth notation indicates that this slot was previously used
    but currently is free.

On a Linux system, MAC and PCI addresses have a specific format, but here
we use symbolic names for simplicity: MAC addresses use
lower case letters, PCI slots upper case letters and eth slots use numbers.
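
As an aid for reading the scenarios, the notation can be parsed mechanically. The sketch below is an assumption-laden helper (the `Nic` type and both function names are hypothetical) and only works for the symbolic names used here, since real MAC and PCI addresses contain colons themselves.

```python
from typing import NamedTuple, Optional


class Nic(NamedTuple):
    mac: str
    pci: str
    eth: Optional[int]  # None when no eth slot has been assigned yet
    present: bool       # False for the "!mac:pci:eth" form


def parse_nic(token):
    """Parse 'mac:pci', 'mac:pci:eth' or '!mac:pci:eth' (symbolic names only)."""
    present = not token.startswith("!")
    parts = token.lstrip("!").split(":")
    if len(parts) == 2:
        return Nic(parts[0], parts[1], None, present)
    if len(parts) == 3:
        return Nic(parts[0], parts[1], int(parts[2]), present)
    raise ValueError(f"cannot parse NIC token: {token!r}")


def parse_line(line):
    """Parse a whitespace-separated list of NIC tokens."""
    return [parse_nic(token) for token in line.split()]
```

For example, `parse_line("a:AA:0 !b:BB:4")` yields one present NIC in eth slot 0 and one remembered, currently absent NIC whose slot 4 is free.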

Invariants

  • The order of mac, PCI, eth is total
  • NICs have pairwise disjoint MAC addresses
  • NICs have pairwise disjoint network slots
  • NICs may have identical PCI slots (but it is rare)
  • The sequence of PCI slots in use may have holes
  • The sequence of eth slots may have holes
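
The two uniqueness invariants can be checked directly. This is a small sketch with a hypothetical function name; it deliberately does not reject shared PCI slots or holes in the slot sequence, since both are allowed by the list above.

```python
def check_invariants(nics):
    """nics: list of (mac, pci, eth) tuples for one enumeration.
    Raises ValueError if MAC addresses or eth slots are not pairwise disjoint."""
    macs = [mac for mac, _pci, _eth in nics]
    eths = [eth for _mac, _pci, eth in nics]
    if len(set(macs)) != len(macs):
        raise ValueError("two NICs share a MAC address")
    if len(set(eths)) != len(eths):
        raise ValueError("two NICs share an eth slot")
    # Shared PCI slots and holes in the eth sequence are permitted.
    return True
```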

Common Cases

We believe the following cases are the most common:

  1. No changes to NICs from one system reboot to the next
  2. Replacing an existing NIC (new MAC, same PCI)
  3. Adding an additional NIC (new MAC, new PCI)
  4. Removing a NIC
  5. Moving a NIC (same MAC, new PCI)
  6. Anything involving more than one NIC

Principles

It would be good to establish a simple principle for how NICs and eth slots
are related, so that the assignment becomes predictable.

  • At least initially, eth slots are aligned with PCI slots. This is to
    make the connection between cabling and eth slots predictable.
    NICs in identical PCI slots have the same eth slot.

  • What happens when a NIC moves to a different PCI slot? Does it get a
    new eth slot (as per principle above) or does it keep the eth slot it
    occupied previously?

  • When a NIC is removed (and not replaced) its eth slot becomes
    available.

Scenarios

At boot time, the previous assignment of interfaces to slots is known
and the goal is to re-create as much as possible.

No Prior Assignment

This is a first-boot scenario: interfaces have never been assigned to
slots.

previous: (none)
input: a:AA b:DD c:CC
output: a:AA:0 c:CC:1 b:DD:2

Slots are assigned by order of PCI slots. Rationale: PCI slots are more
predictable than MAC addresses and correspond to physical locations.
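
The first-boot rule can be sketched in a few lines. The function name is hypothetical, and breaking ties between NICs in the same PCI slot by MAC address is an assumption that the text above does not specify.

```python
def initial_assignment(seen):
    """seen: list of (mac, pci) tuples found on first boot.
    Returns (mac, pci, eth) tuples with eth slots assigned in PCI order."""
    # Assumption: NICs sharing a PCI slot are ordered by MAC address.
    ordered = sorted(seen, key=lambda nic: (nic[1], nic[0]))
    return [(mac, pci, eth) for eth, (mac, pci) in enumerate(ordered)]


print(initial_assignment([("a", "AA"), ("b", "DD"), ("c", "CC")]))
```

For the input of this scenario the result is a:AA:0, c:CC:1, b:DD:2.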

No Change

A previous assignment exists; it has a hole, which is maintained. All
cards previously found are found again and assigned the same slots.

previous: a:AA:0 c:CC:1 b:DD:4
input: a:AA b:DD c:CC
output: a:AA:0 c:CC:1 b:DD:4

Changed MAC address (card replacement)

The card in PCI slot DD has a new MAC address. We consider it a
replacement for the previous card and retain the eth slot.

previous: a:AA:0 c:CC:1 b:DD:4
input: a:AA k:DD c:CC
output: a:AA:0 c:CC:1 k:DD:4

Changed PCI slot (card moved)

Card with MAC address c is found in a different PCI slot. We still
assign its prior network slot to it. This is debatable; we could assign
a new eth slot.

previous: a:AA:0 c:CC:1 b:DD:4
input: a:AA b:DD c:GG
output: a:AA:0 c:GG:1 b:DD:4

New Card

A new card is found. It goes into the first free network slot.

previous: a:AA:0 c:CC:1 b:DD:4
input: a:AA b:DD c:CC e:EE
output: a:AA:0 c:CC:1 e:EE:2 b:DD:4

Shared PCI Slot

We take the MAC address of b:CC to maintain its slot.

previous: a:AA:0 c:CC:1 b:DD:4
input: a:AA b:CC c:CC
output: a:AA:0 c:CC:1 b:CC:4

Remove Card

Card b:BB:4 is removed - but it is remembered.

previous: a:AA:0 c:CC:1 b:BB:4
input: a:AA c:CC
output: a:AA:0 c:CC:1 !b:BB:4

New Card

A new card takes a free slot - but it is not slot 4 because it does not
match the previous card that was in that slot.

previous: a:AA:0 c:CC:1 !b:BB:4
input: a:AA d:DD c:CC
output: a:AA:0 c:CC:1 !b:BB:4 d:DD:2

Card is Moved

Previously removed card b:BB comes back in a different PCI slot but
retains its old network slot. We match it based on its MAC address.

previous: a:AA:0 c:CC:1 !b:BB:4
input: a:AA b:GG c:CC
output: a:AA:0 c:CC:1 b:GG:4
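
Taken together, the scenarios suggest a matching procedure along the following lines. This is a sketch of one possible reading, not the algorithm adopted in the design: the `Nic` type and `reconcile` function are hypothetical, and treating an unknown MAC as a replacement only when exactly one missing card and exactly one unknown card share a PCI slot is an assumption made to avoid ambiguity.

```python
from typing import NamedTuple


class Nic(NamedTuple):
    mac: str
    pci: str
    eth: int
    present: bool = True


def reconcile(previous, seen):
    """previous: list of Nic from the last boot; seen: list of (mac, pci).
    Returns the new assignment, sorted by eth slot."""
    missing = {nic.mac: nic for nic in previous}
    result, unknown = [], []

    # 1. Same MAC: keep the eth slot, even if the PCI slot changed.
    for mac, pci in seen:
        if mac in missing:
            result.append(Nic(mac, pci, missing.pop(mac).eth))
        else:
            unknown.append((mac, pci))

    # 2. Unknown MAC in the PCI slot of exactly one missing NIC: replacement.
    new = []
    for mac, pci in unknown:
        old = [nic for nic in missing.values() if nic.pci == pci]
        rivals = [u for u in unknown if u[1] == pci]
        if len(old) == 1 and len(rivals) == 1:
            result.append(Nic(mac, pci, old[0].eth))
            del missing[old[0].mac]
        else:
            new.append((mac, pci))

    # 3. NICs still missing are remembered; their slots stay reserved.
    result.extend(nic._replace(present=False) for nic in missing.values())

    # 4. New NICs take the first free slot, handled in PCI order.
    used = {nic.eth for nic in result}
    for mac, pci in sorted(new, key=lambda u: (u[1], u[0])):
        eth = next(i for i in range(len(used) + 1) if i not in used)
        used.add(eth)
        result.append(Nic(mac, pci, eth))

    return sorted(result, key=lambda nic: nic.eth)
```

Matching by MAC first is what lets a moved or returning card keep its slot; the PCI match is only a fallback for cards that have never been seen before.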

Contributor

@changlei-li changlei-li left a comment

The document is readable and clear. I get the overview of the network device ordering. Though, I have two suggestions:

  • It's better to mention the static rules
  • It's better to adopt Christian's scenario examples

@minglumlu
Member Author

Hi @lindig @changlei-li
Thanks for your comments 😃 . I'm updating the doc. Will get back to you once I've the changes.

@minglumlu
Member Author

The document is readable and clear. I get the overview of the network device ordering. Though, I have two suggestions: It's better to mention the static rules It's better to adopt Christian's scenario examples

No static rules anymore. This is a legacy concept. Now it is just the user's configuration on the initial order.

@minglumlu
Member Author

Hi @lindig @changlei-li
I updated the document. Could you please help to review again? Thank you!

@minglumlu minglumlu requested review from psafont and robhoes March 26, 2025 11:12
Member

@robhoes robhoes left a comment

Looks good!

@@ -0,0 +1,342 @@
---
Member

Please move the file to https://github.com/xapi-project/xen-api/tree/master/doc/content/design

and change the header accordingly

---
title: Host Network Device Ordering on Networkd
layout: default
design_doc: true
revision: 1
status: proposed (9.0)
---

Member

I actually prefer this under xcp-networkd directly as a "how it works" doc rather than a design doc.

Member

@psafont psafont Mar 27, 2025

But this is a proposal, with use-cases and all, not how it works now :/
Having it in the design docs also gives more information, like the version that started having the behaviour

It makes sense to be in the design section as well as being mentioned in the networkd section. (with links to the design, even)

Therefore, the order derived from these values is used solely for determining
the initial order and the order of newly added devices.

Priniciples
Member

Suggested change
Priniciples
Principles

@minglumlu minglumlu force-pushed the private/mingl/host-net-dev-order branch from 7a0bfb1 to 8d96b36 Compare March 27, 2025 04:33
Signed-off-by: Ming Lu <ming.lu@cloud.com>
@minglumlu minglumlu force-pushed the private/mingl/host-net-dev-order branch from 8d96b36 to 1a2f5ff Compare March 27, 2025 04:41
@minglumlu minglumlu merged commit b89ed21 into xapi-project:feature/host-network-device-ordering Mar 27, 2025
17 checks passed
5 participants