Skip to content

Initial implementation for switch port settings #426

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

internet-diglett
Copy link

No description provided.

@sudomateo
Copy link
Collaborator

sudomateo commented Apr 18, 2025

@internet-diglett and I spent a few hours today updating the implementation to fix some issues with serialization and typing as well as chatting about a path forward. There's currently an issue with the nested addresses.addresses.address_lot.id attribute being computed that makes Terraform want to replace the resource every run.

EDIT: See #418 (comment) for more details but schema.*NestedAttribute is what we should be using, not schema.Block.

I'm not exactly sure why that is but perhaps this resource is a good candidate for nested blocks described in #418 rather than the nested arrays of objects.

I tested the following schema locally and did not run into the aforementioned issue.

Blocks: map[string]schema.Block{
    "link": schema.ListNestedBlock{
        NestedObject: schema.NestedBlockObject{
            Attributes: map[string]schema.Attribute{
                "name": schema.StringAttribute{
                    Required: true,
                },
            },
            Blocks: map[string]schema.Block{
                "address": schema.ListNestedBlock{
                    NestedObject: schema.NestedBlockObject{
                        Attributes: map[string]schema.Attribute{
                            "addr": schema.StringAttribute{
                                Required: true,
                            },
                        },
                        Blocks: map[string]schema.Block{
                            "address_lot": schema.SingleNestedBlock{
                                Attributes: map[string]schema.Attribute{
                                    "id": schema.StringAttribute{
                                        Computed: true,
                                    },
                                    "name": schema.StringAttribute{
                                        Required: true,
                                    },
                                },
                            },
                        },
                    },
                },
            },
        },
    },
},

Here's what the configuration ended up looking like.

resource "oxide_switch_port_configuration" "example" {
  name        = "foo"
  description = "Managed by Terraform."

  link {
    name = "foo"
    address {
      addr = "foo"
      address_lot {
        name = "bar"
      }
    }
  }
}

And no more delete and create anew!

> terraform plan
oxide_switch_port_configuration.example: Refreshing state... [id=973beb3d-e658-40c3-8dba-5b3d10ef42c0]

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

@sudomateo
Copy link
Collaborator

sudomateo commented Apr 18, 2025

I did some more testing and if you change the schema for both addresses from schema.SetNestedAttribute to schema.ListNestedAttribute then there is no longer an issue with plans wanting to replace the nested attributes. I'm not sure what's actually happening under the hood to change that but it works.

I still maintain that we should push for the more modern schema.Block usage but that can be discussed later.

EDIT: See #418 (comment) for more details but schema.*NestedAttribute is what we should be using, not schema.Block.

internet-diglett added a commit to oxidecomputer/omicron that referenced this pull request May 31, 2025
This PR aims to improve the usability of the switch port settings API by
returning data of similar shape and content as what users provide when
the create the switch port settings.

Issues resolved:
* Before this PR, you only need one call to create a complete switch
port settings object, but you have to perform several lookups to get the
complete switch port configuration. We were already returning the object
ids of the nested objects, so it makes a lot of sense to just return the
actual object (especially since the user supplied the entire object when
they configured it)
* The `HashMap<String_of_link_name, T>` pattern also did not give the
expected behavior in our go client generation. Changing this to a
`Vec<T_with_link_name_inside>` matches what the user provides during
creation and also resolves this behavior.
* Some of our `Vec<T>` fields can be empty but must be present. In the
Go client, an empty array is omitted from the json object. We needed to
add a `default` annotation to these fields so Serde handles this
situation correctly.

## Related
- oxidecomputer/oxide.go#278
- oxidecomputer/terraform-provider-oxide#426

Closes #5780
hawkw pushed a commit to oxidecomputer/omicron that referenced this pull request Jun 2, 2025
This PR aims to improve the usability of the switch port settings API by
returning data of similar shape and content as what users provide when
the create the switch port settings.

Issues resolved:
* Before this PR, you only need one call to create a complete switch
port settings object, but you have to perform several lookups to get the
complete switch port configuration. We were already returning the object
ids of the nested objects, so it makes a lot of sense to just return the
actual object (especially since the user supplied the entire object when
they configured it)
* The `HashMap<String_of_link_name, T>` pattern also did not give the
expected behavior in our go client generation. Changing this to a
`Vec<T_with_link_name_inside>` matches what the user provides during
creation and also resolves this behavior.
* Some of our `Vec<T>` fields can be empty but must be present. In the
Go client, an empty array is omitted from the json object. We needed to
add a `default` annotation to these fields so Serde handles this
situation correctly.

## Related
- oxidecomputer/oxide.go#278
- oxidecomputer/terraform-provider-oxide#426

Closes #5780
@internet-diglett internet-diglett marked this pull request as ready for review June 19, 2025 03:54
@internet-diglett internet-diglett changed the title WIP: initial implementation for switch port settings Initial implementation for switch port settings Jun 20, 2025
@sudomateo sudomateo force-pushed the switch-port-settings-resource branch from fbde127 to bf25088 Compare June 27, 2025 19:09
@sudomateo
Copy link
Collaborator

Quick update here. I haven't forgotten about this. A few things came up this week that I was handling first. Will review this next week.

internet-diglett and others added 3 commits July 10, 2025 15:11
This patch made the following changes.

* Renamed the resource from `oxide_switch_port_configuration` to
`oxide_switch_port_settings` to be consistent with the upstream API.

* Changed the schema to reflect the schema from the
`networking_switch_port_settings_create` API. Further
work is required to account for the schema returned by the
`networking_switch_port_settings_view` API.

* Created new functions to convert to and from Terraform models to Oxide
models.
@sudomateo sudomateo force-pushed the switch-port-settings-resource branch from bf25088 to 416afe0 Compare July 11, 2025 01:18
@sudomateo
Copy link
Collaborator

I made the following changes in 416afe0.

  • Renamed the resource from oxide_switch_port_configuration to oxide_switch_port_settings to be consistent with the upstream API.

  • Changed the schema to reflect the schema from the networking_switch_port_settings_create API. Further work is required to account for the schema returned by the networking_switch_port_settings_view API (e.g., nested IDs).

I following Running Omicron (Simulated) and used it to test these changes locally with the following configuration.

resource "oxide_switch_port_settings" "example" {
  name        = "example"
  description = "example"

  addresses = [
    {
      link_name = "phy0"
      addresses = [
        {
          address     = "10.85.0.30/24"
          address_lot = "example"
        },
      ]
    },
  ]

  links = [
    {
      autoneg = false
      fec     = "none"
      lldp = {
        enabled = false
      }
      mtu       = 1500
      link_name = "phy0"
      speed     = "speed1_g"
    },
  ]

  port_config = {
    geometry = "qsfp28x1"
  }

  routes = [
    {
      link_name = "phy0"
      routes = [
        {
          dst = "0.0.0.0/0"
          gw  = "10.85.0.1"
        },
        {
          dst = "1.1.1.1/32"
          gw  = "10.85.0.1"
        },
      ]
    },
  ]
}

However, I received the following error.

oxide_switch_port_settings.example: Creating...
╷
│ Error: Error creating switch port settings
│
│   with oxide_switch_port_settings.example,
│   on main.tf line 16, in resource "oxide_switch_port_settings" "example":
│   16: resource "oxide_switch_port_settings" "example" {
│
│ API error: POST http://127.0.0.1:12220/v1/system/networking/switch-port-settings
│ ----------- RESPONSE -----------
│ Status: 400
│ Message: unable to parse JSON body: addresses[0].addresses[0].address: data did not match any variant of untagged enum IpNet at
│ line 1 column 41
│ RequestID: 321a037c-4ccd-4a86-8c94-581e382a23f9
│ ------- RESPONSE HEADERS -------
│ Content-Type: [application/json]
│ X-Request-Id: [321a037c-4ccd-4a86-8c94-581e382a23f9]
│ Content-Length: [210]
│ Date: [Thu, 10 Jul 2025 22:55:25 GMT]
│
╵

It's likely that a simulated Omicron isn't sufficient to test these changes since that environment does not run the Rack Setup Service (RSS) to initialize the networking settings. I'll pair with @internet-diglett tomorrow since they have environments to test against.

The sheer size of the networking_switch_port_settings_create and networking_switch_port_settings_view APIs make this resource a behemoth to work with. Additionally, the APIs are asymmetric in the sense that the networking_switch_port_settings_create API takes a request body in a nested format (e.g., addresses[].addresses[].address) but the response body and the networking_switch_port_settings_view API return a non-nested format (e.g., addresses[].address). This, coupled with the link_name being adjacent to the array of resources it's managing on create but being nested within each array element on read, means that the Terraform resource needs to perform additional logic to ensure consistency that's not required by other APIs.

@karencfv
Copy link
Collaborator

Oof, looks like more fun with enums and Go 🙃

Looking at that error, I think the API isn't accepting your POST request, it's a 400 so I don't think this would work anywhere, sadly.

unable to parse JSON body: addresses[0].addresses[0].address: data did not match any variant of untagged enum IpNet

I think the API isn't recognising the way you are passing the "10.85.0.30/24" as an IpNet variant. I might be wrong but perhaps you may be missing a ValueString() when constructing NetworkingSwitchPortSettingsCreateParams?

Copy link
Collaborator

@karencfv karencfv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the entire PR but took a quick look at where the error you were encountering might be occurring. I hope this helps!

@sudomateo
Copy link
Collaborator

Thanks Karen! I thought I covered all those oxide.IpNet cases but I was mistaken.

With the changes in 3d7d79a I'm now getting the following error.

oxide_switch_port_settings.example: Creating...
╷
│ Error: Error creating switch port settings
│
│   with oxide_switch_port_settings.example,
│   on main.tf line 16, in resource "oxide_switch_port_settings" "example":
│   16: resource "oxide_switch_port_settings" "example" {
│
│ API error: POST http://127.0.0.1:12220/v1/system/networking/switch-port-settings
│ ----------- RESPONSE -----------
│ Status: 500 Internal
│ Message: Internal Server Error
│ RequestID: 12b35f5a-2255-4244-a5aa-f9c84032fb81
│ ------- RESPONSE HEADERS -------
│ Content-Type: [application/json]
│ X-Request-Id: [12b35f5a-2255-4244-a5aa-f9c84032fb81]
│ Content-Length: [124]
│ Date: [Fri, 11 Jul 2025 13:57:48 GMT]
│
╵

Which the logs tell me is an issue with the serialization for local_pref. Will debug.

{"msg":"request completed","v":0,"name":"omicron-dev","level":30,"time":"2025-07-11T13:57:48.670343242Z","hostname":"ms-ox01","pid":3113,"uri":"/v1/system/networking/switch-port-settings","method":"POST","req_id":"12b35f5a-2255-4244-a5aa-f9c84032fb81","remote_addr":"127.0.0.1:41766","local_addr":"127.0.0.1:12220","component":"dropshot_external","name":"e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c","error_message_external":"Internal Server Error","error_message_internal":"Unknown diesel error creating SwitchPortSettings called \"example\": Error deserializing field 'local_pref': Received more than 2 bytes while decoding an i16. Was an Integer expression accidentally marked as SmallInt?","latency_us":133766,"response_code":500}

@sudomateo
Copy link
Collaborator

I was thinking the local_pref name referred to the attribute under bgp_peers. It doesn't. It actually refers to the rib_priority attribute under routes. The rib_priority attribute used to be called local_pref until it was renamed in oxidecomputer/omicron#6693. The error that's being reported is coming from diesel, which still uses the local_pref name as the name of the column in the database.

In an effort to eliminate uncertainty from the Go SDK and Terraform, I tested against the API directly with the following request body. Note rib_priority is set to 0.

{
  "addresses": [
    {
      "addresses": [
        {
          "address": "10.85.0.30/24",
          "address_lot": "example",
          "vlan_id": 0
        }
      ],
      "link_name": "phy0"
    }
  ],
  "bgp_peers": [],
  "description": "example",
  "groups": [],
  "links": [
    {
      "autoneg": false,
      "fec": "none",
      "link_name": "phy0",
      "lldp": {
        "enabled": false
      },
      "mtu": 1500,
      "speed": "speed1_g",
      "tx_eq": {}
    }
  ],
  "name": "example",
  "port_config": {
    "geometry": "qsfp28x1"
  },
  "routes": [
    {
      "link_name": "phy0",
      "routes": [
        {
          "dst": "0.0.0.0/0",
          "gw": "10.85.0.1",
          "rib_priority": 0
        }
      ]
    }
  ]
}

I still received an internal server error.

$ curl --request POST --header "Authorization: Bearer $OXIDE_TOKEN" --header "Content-Type: application/json" $OXIDE_HOST/v1/system/networking/switch-port-settings --data @/tmp/settings.json
{
  "request_id": "6fdee63a-3630-404b-8d37-335ffcedc38b",
  "error_code": "Internal",
  "message": "Internal Server Error"
}

With the same error as before.

{"msg":"request completed","v":0,"name":"omicron-dev","level":30,"time":"2025-07-11T15:39:23.279978169Z","hostname":"ms-ox01","pid":43902,"uri":"/v1/system/networking/switch-port-settings","method":"POST","req_id":"6fdee63a-3630-404b-8d37-335ffcedc38b","remote_addr":"127.0.0.1:48642","local_addr":"127.0.0.1:12220","component":"dropshot_external","name":"e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c","error_message_external":"Internal Server Error","error_message_internal":"Unknown diesel error creating SwitchPortSettings called \"example\": Error deserializing field 'local_pref': Received more than 2 bytes while decoding an i16. Was an Integer expression accidentally marked as SmallInt?","latency_us":129375,"response_code":500}

I then removed rib_priority from the request body.

diff --git a/tmp/settings.json b/tmp/settings.json
index de538ff54..ce48cdd43 100644
--- a/tmp/settings.json
+++ b/tmp/settings.json
@@ -38,7 +38,6 @@
         {
           "dst": "0.0.0.0/0",
           "gw": "10.85.0.1",
-          "rib_priority": 0
         }
       ]
     }

Now the error is AddressLot not found, which I believe is expected since my simulated setup does not have a true networking stack configured.

$ curl --request POST --header "Authorization: Bearer $OXIDE_TOKEN" --header "Content-Type: application/json" $OXIDE_HOST/v1/system/networking/switch-port-settings --data @/tmp/settings.json
{
  "request_id": "bb5dfa11-f570-4022-af91-4ec3085440d9",
  "error_code": "InvalidRequest",
  "message": "AddressLot not found"
}

I then pushed aaa884d to allow setting rib_priority and vid to nil and Terraform is now reporting the same error when rib_priority is omitted from the configuration.

╷
│ Error: Error creating switch port settings
│
│   with oxide_switch_port_settings.example,
│   on main.tf line 16, in resource "oxide_switch_port_settings" "example":
│   16: resource "oxide_switch_port_settings" "example" {
│
│ API error: POST http://127.0.0.1:12220/v1/system/networking/switch-port-settings
│ ----------- RESPONSE -----------
│ Status: 400 InvalidRequest
│ Message: AddressLot not found
│ RequestID: 67fcc057-acce-4a41-ae2e-b051f90e1e28
│ ------- RESPONSE HEADERS -------
│ Content-Length: [129]
│ Date: [Fri, 11 Jul 2025 15:50:36 GMT]
│ Content-Type: [application/json]
│ X-Request-Id: [67fcc057-acce-4a41-ae2e-b051f90e1e28]
│
╵

If I'm interpreting all this correctly then somewhere in Omicron rib_priority's Option<u8> is becoming larger than diesel's Int2.

sudomateo added a commit to oxidecomputer/omicron that referenced this pull request Jul 12, 2025
I renamed the `local_pref` column to `rib_priority` to complete the
rename in #6693.

I also changed the type of the renamed column from `INT8` to
`INT2`, clamping existing values to `0` or `255`. This was missed
in #6693 and led to the
following error when reading the value from the database.

```
{"msg":"request completed","v":0,"name":"omicron-dev","level":30,"time":"2025-07-11T13:57:48.670343242Z","hostname":"ms-ox01","pid":3113,"uri":"/v1/system/networking/switch-port-settings","method":"POST","req_id":"12b35f5a-2255-4244-a5aa-f9c84032fb81","remote_addr":"127.0.0.1:41766","local_addr":"127.0.0.1:12220","component":"dropshot_external","name":"e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c","error_message_external":"Internal Server Error","error_message_internal":"Unknown diesel error creating SwitchPortSettings called \"example\": Error deserializing field 'local_pref': Received more than 2 bytes while decoding an i16. Was an Integer expression accidentally marked as SmallInt?","latency_us":133766,"response_code":500}
```

The error occurred because the Rust type was `SqlU8` and the database
type was `INT8`. Reads would fail because `INT8` columns could not be
read into `SqlU8` types without loss of precision.

This was caught in
oxidecomputer/terraform-provider-oxide#426
when implementing a Terraform provider resource for switch
port settings. It's likely that this has been broken since
#6693 and any user that
attempted to set `rib_priority` in their Rack Setup Service (RSS) would
have encountered this error. However, `rib_priority` is an uncommon
configuration option and none of our customer's RSS configurations show
this as being set.

Given this information, it seems safe to assume that no customer has the
`rib_priority` column set so the clamping logic implemented here should
work well for customer installations.
@sudomateo
Copy link
Collaborator

I submitted oxidecomputer/omicron#8587 to address the issue with rib_priority in Omicron. With the changes in aaa884d I'm not blocked and can proceed with testing this against an environment with proper networking configuration. I'll sync with @internet-diglett to do this on Monday.

sudomateo added a commit to oxidecomputer/omicron that referenced this pull request Jul 12, 2025
I renamed the `local_pref` column to `rib_priority` to complete the
rename in #6693.

I also changed the type of the renamed column from `INT8` to
`INT2`, clamping existing values to `0` or `255`. This was missed
in #6693 and led to the
following error when reading the value from the database.

```
{"msg":"request completed","v":0,"name":"omicron-dev","level":30,"time":"2025-07-11T13:57:48.670343242Z","hostname":"ms-ox01","pid":3113,"uri":"/v1/system/networking/switch-port-settings","method":"POST","req_id":"12b35f5a-2255-4244-a5aa-f9c84032fb81","remote_addr":"127.0.0.1:41766","local_addr":"127.0.0.1:12220","component":"dropshot_external","name":"e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c","error_message_external":"Internal Server Error","error_message_internal":"Unknown diesel error creating SwitchPortSettings called \"example\": Error deserializing field 'local_pref': Received more than 2 bytes while decoding an i16. Was an Integer expression accidentally marked as SmallInt?","latency_us":133766,"response_code":500}
```

The error occurred because the Rust type was `SqlU8` and the database
type was `INT8`. Reads would fail because `INT8` columns could not be
read into `SqlU8` types without loss of precision.

This was caught in
oxidecomputer/terraform-provider-oxide#426
when implementing a Terraform provider resource for switch
port settings. It's likely that this has been broken since
#6693 and any user that
attempted to set `rib_priority` in their Rack Setup Service (RSS) would
have encountered this error. However, `rib_priority` is an uncommon
configuration option and none of our customer's RSS configurations show
this as being set.

Given this information, it seems safe to assume that no customer has the
`rib_priority` column set so the clamping logic implemented here should
work well for customer installations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants