-
I'm still familiarizing myself with the JSON spec. Here's how I'm making sense of the proposal: I conceptualize the LMT hierarchy as System::Link::TorV::Port::Lane::Offset. The testStep corresponds to the Port scope, because a Port has one common test spec and Ports are tested sequentially.

At the other end of the hierarchy, the lowest level of an LMR measurement has the following raw data: step (with +/- direction), status (response payload[7:6]), error count, and sample count. That doesn't align with the MeasurementSeriesElement vs. the validator scheme, so I see why we have to interpret it into a "bit errors" measurement. Do we plan to keep the raw data in this output?

Somehow, I feel it's more appropriate to assign the lane number, instead of the BDF, to the measurementSeriesId. Then each measurementSeriesElement can represent an offset.

Overall, the LMT outputs a lot more details that we need to tuck away in the extension fields. The LMR reported capability parameters may go into the MeasurementSeriesStart.Metadata. We should standardize the format for that.
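To make the lane-as-series idea concrete, here is a rough sketch of one lane's offsets emitted as a measurement series. The names `RawMarginPoint` and `lane_series` are hypothetical, the artifact field names follow my reading of the OCP output spec and should be checked against it, and keeping the raw readings under `metadata` is only one possible placement. The enclosing testStepArtifact wrapper is omitted for brevity.

```python
import json
from dataclasses import dataclass


@dataclass
class RawMarginPoint:
    step: int          # signed offset step; the sign carries the +/- direction
    status: int        # response payload[7:6]
    error_count: int
    sample_count: int


def lane_series(lane: int, points: list[RawMarginPoint]) -> list[dict]:
    """Build start/element/end artifacts for one lane (assumed layout)."""
    series_id = f"lane-{lane}"  # the lane number, not the BDF, identifies the series
    artifacts = [{"measurementSeriesStart": {
        "measurementSeriesId": series_id,
        "name": "bit errors",
    }}]
    for index, point in enumerate(points):
        artifacts.append({"measurementSeriesElement": {
            "measurementSeriesId": series_id,
            "index": index,
            "value": point.error_count,
            # Raw readings kept next to the interpreted value.
            "metadata": {
                "step": point.step,
                "status": point.status,
                "sample_count": point.sample_count,
            },
        }})
    artifacts.append({"measurementSeriesEnd": {
        "measurementSeriesId": series_id,
        "totalCount": len(points),
    }})
    return artifacts


if __name__ == "__main__":
    demo = [RawMarginPoint(-1, 2, 0, 63), RawMarginPoint(1, 0, 5, 63)]
    for artifact in lane_series(0, demo):
        print(json.dumps(artifact))
```

Each element represents one offset, so a viewer can filter by series ID to get a single lane's sweep.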
-
Summarizing the proposal provided by Dan via email:
Summarizing the discussion from the 10/5 meeting with Dan (Google), Hua (Google), Francesco (Meta), Leland (Meta), Adrian (Meta), and Sathish (Meta):
Next steps:
-
@sksekar, Adrian and I had a 2.5hr meeting on 2024-02-23. Here are my takeaways from the meeting:

The spec we are defining is meant to help the consumer of the diag output interpret, process, and present the test result. The diags themselves may also play a role in processing and presenting the readings from the hardware. The spec should support a variety of LMT diags and use cases, so it needs to be as extensible and unrestrictive as possible.

Our approach is for each of us to come up with a spec proposal that suits our existing diags: https://github.com/opencomputeproject/ocp-diag-pci_lmt and https://github.com/google/pcie_lmt. We then resolve any conflicts between them and find common ground.

We have agreed on a hierarchy of the LMT subjects (LMT is the diag. LMR is the PCI-SIG-specified PCIe feature. LMT conducts LMR.):
At the very basic level, an LMR step margin operation returns three readings from the hardware: status, error_count, sample_count. We should start by mapping those to the output spec. This enables raw measurement collection. Here are a few rules of thumb I learned about mapping info to the output spec:
Based on the above takeaways from the meeting, here's my proposal. Let's first review the ideas; the details need to be refined. An LMT subject is mapped to a subcomponent. Here's an example:
The name The "location" parser pattern for "PCIELMT-MARGINPOINT-PCI" is
There are three measurements per a
The value is specified by the PCIe spec: 3: NAK; 2: Margining in progress; 1: Set up for margin in progress; 0: Too many errors.
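The status values above can be decoded with a small helper. This is a minimal sketch: the `MarginStatus` and `decode_status` names are hypothetical, and it assumes the status sits in bits [7:6] of an 8-bit response payload, as described earlier in this thread.

```python
from enum import IntEnum


class MarginStatus(IntEnum):
    """Step margin status values as listed above (names are illustrative)."""
    TOO_MANY_ERRORS = 0
    SETUP_IN_PROGRESS = 1
    MARGINING_IN_PROGRESS = 2
    NAK = 3


def decode_status(payload: int) -> MarginStatus:
    """Extract the 2-bit status from bits [7:6] of the response payload."""
    return MarginStatus((payload >> 6) & 0x3)


if __name__ == "__main__":
    # 0x80 has bits [7:6] == 0b10.
    print(decode_status(0x80).name)
```

Emitting the enum name next to the numeric value would keep the raw reading intact while sparing the consumer a lookup.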
There is another set of raw measurements: the RX LMR parameters read from the lane. These include Those measurements have the RXLANE subcomponent:
Also at this RXLANE level, the diag can output processed measurements, such as I can't think of anything that needs to be specified at the TestStep level. One consideration is that the volume of measurement output can be large if we require the raw measurements.
-
Thanks @mimir-d. Some comments and follow-ups:
SGTM. My preference would be to use
Looking closer, I think we can surface this information as part of the Device/Margining Capabilities measurement, which is already surfaced per Device.
Yes, that's correct. In our case, the user can select the goal (eye-scan vs. spot-check) through the config file provided as input.
Yes, there is no restriction on doing both (eye-scan and spot-check) as different steps.
-
As discussed, attached is an output example for our discussion:
-
We saw the ocp-diag-core-viewer demo a few weeks ago. With that in mind, I tuned the pcie_lmt OCP output to look the way I'd like to see it as a user. This pcie_lmt_ocp.json is a sample output. I can also demo it from the ocp-diag-core-viewer in the meeting.

The pcie_lmt can stream the OCP artifacts to a file or a named pipe. Our use case has a diag-runner which creates this OCP pipe and listens to it. It converts the PCIe-domain BDF info to the DUT-specific HW-Info. This way, the pcie_lmt can stay generic and only PCI-SIG-aware. The diag-runner is also generic in the sense that it can run various diags as a sub-process.

The pcie_lmt runs parallel TestSteps, each mapping to an RX-port. Within each TestStep, the lanes also run in parallel. I'm counting on the sorting and filtering features of the result viewers.

Instead of dumping all the raw measurements, I now only output what "matters". The raw measurements are still dumped in a log for reference. As a user, I'd like to see the interpreted results upfront, so the pcie_lmt outputs eye size, eye corner margin, BER, and/or status. Irrelevant and/or implied information, such as 0-error margin points in an eye-scan, is omitted. Still, there is more info to fit in a Measurement artifact. I'm overloading the
-
Objective
Add support for emitting test results compliant with the OCP Test and Validation Output Specification from the PCIe LMT Diagnostic tool.
Background
A sample LMT test run performs:
Current Output Format
CSV format
JSON format
Proposal
The current proposal is to use:
Sample Execution
pci_lmt -o ocp config_file
To Be Discussed