Computing is about communicating; some would also say about networking. Digital independence rides the wave of the "Recommendations and Roadmap for European Sovereignty in open source HW, SW and RISC-V Technologies" (2021), which calls for the development of critical open source IP blocks such as a PCIE Root Complex (RC). This is the first step in that direction.
Our project aims to open Artix7 PCIe Gen2 RC IP blocks for use outside of proprietary tool flows. While still reliant on Xilinx Series7 Hard Macros (HMs), it will surround them with open-source soft logic for PIO accesses: the RTL and, even more importantly, the layered software Driver with Demo App. All of that comes with full HW/SW opensource co-simulation. Augmented with a rock-solid openBackplane at the base of our hardware solution, the geek community will thus get everything it takes to build their own, end-to-end openCompute systems.
The project's immediate goal is to empower makers with the ability to drive PCIE-based peripherals from their own soft RISC-V SOCs.
Given that a PCIE End-Point (EP) with DMA is already available in opensource, opensource PCIE peripherals do exist for Artix7. Yet they are always, without exception, controlled by a proprietary RC on the motherboard side, typically a RaspberryPi ASIC or an x86 PC. This project intends to change that status quo.
Our long-term goal is to set the stage for the development of a full opensource PCIE stack, gradually phasing out Xilinx HMs from the solution. That's a long, ambitious track, especially when it comes to mixed-signal SerDes and high-quality PLLs. We therefore anticipate a series of follow-on projects that build on the foundations we hereby set.
This first phase is about implementing an open source PCIE Root Complex (RC) for Artix7 FPGA, utilizing Xilinx Series7 PCIE HM and GTP IP blocks, along with their low-jitter PLL.
- PCIE Primer by Simon Southwell
Almost all consumer PCIE installations have the RC chip soldered down on the motherboard, typically embodied in the CPU or "North Bridge" ASIC, where PCIE connectors are used solely for the EP cards. Similarly, all FPGA boards on the market are designed for EP applications. As such, they expect clock, reset and a few other signals from the infrastructure. It is only the professional and military-grade electronics that may have both RC and EP functions on add-on cards, with a backplane or mid-plane connecting them (see VPX chassis, or VITA 46.4).
This dev activity is about creating the minimal PCIE infrastructure necessary for using one of the plethora of ready-made FPGA EP cards as a Root Complex. This infrastructure takes the physical form of a mini backplane that provides the necessary PCIE context, similar to what a typical motherboard would give, but without a soldered-down RC chip that would conflict with our own FPGA RC node.
Such an approach is less work and less risk than designing our own PCIE motherboard with a large FPGA on it. But it is also a task whose scope we did not appreciate from the get-go. In a bit of a surprise, half-way through planning, we realized that a suitable, ready-made backplane was not available on the market. This initial disappointment then turned into excitement, knowing that this new outcome would make the project even more attractive and valuable for the community... especially once Envox.eu agreed to step in and help. They will take on the PCIE backplane PCB development activity.
- Create requirements document.
- Select components. Schematic and PCB layout design.
- Review and iterate the design to ensure robust operation at 5 GT/s (PCIe Gen2), possibly using openEMS for simulation of the high-speed traces.
- Manufacture a prototype. Debug and bring it up, using the AMD-proprietary on-chip IBERT IP core to assess Signal Integrity.
- Produce a second batch that includes all improvements. Distribute it, and release the design files with full documentation.
- Procure FPGA development boards and PCIE accessories.
- Put together a prototype system. Bring it up using proprietary RTL IP, proprietary SW Driver, TestApp and Vivado toolchain.
- HW development of opensource RTL that mimics the functionality of the proprietary PCIE RC solution.
- SW development of an opensource driver for the PCIE RC HW function. This may or may not be done within the Linux framework.
- Design SOC based on RISC-V CPU with PCIE RC as its main peripheral.
This dev activity is significantly beefed up compared to our original plan, which was to use a much simpler PCIE EP BFM and a non-SOC sim framework. While that would have reduced the time and effort spent on the sim, prompted by NLnet's astute questions we're happy to announce that wyvernSemi is now also onboard!
Their VProc can not only faithfully model the RISC-V CPU and SW interactions with HW; it also comes with an implementation of a PCIE model. That model has some EP capabilities with a configurable configuration space, which can be paired in sim with our RC RTL design. Moreover, the existence of both RC and EP models paves the way for future plug-and-play, pick-and-choose opensource sims of the entire PCIE subsystem.
With the full end-to-end simulation thus in place, we hope that the need for hardware debugging with ChipScope, expensive test equipment and PCIE protocol analyzers will be alleviated.
- Extension of the existing PCIE RC model for some additional configurability of the EP capabilities.
- Testbench development and build up. Execution and debug of sim testcases.
- Documentation of the EP model, TB and sim environment, with the objective of making it all simple enough to pick up, adapt and deploy in other projects.
- One by one, replace the proprietary design elements from PART2.b with our opensource versions (except for Vivado and the TestApp). Test along the way, fixing problems as they occur.
- Develop our opensource PIO TestApp software and representative Demo.
- Build the design with openXC7, reporting issues and working with its developers to fix them, possibly also trying the ScalePNR flow.
Given that PCIE is an advanced, high-speed design, and given our acute awareness of nextpnr-xilinx and openXC7 shortcomings, we expect to run into showstoppers on the timing-closure front. We therefore hope that the upcoming ScalePNR flow will be ready for heavy-duty testing within this project.
- Basic PCIE EP for LiteFury
- Regymm PCIE
- LiteX PCIE EP
- PCIE EP DMA - Wupper
- Xilinx UG477 - 7Series Integrated Block PCIe
- Xilinx DS821 - 7 Series PCIE Datasheet
- XAPP1052 - Bus Master DMA for EP
The hardware platform for this project is the SQRL Acorn CLE-215+, a versatile FPGA development board. Although originally designed as a crypto-accelerator, its powerful Artix-7 FPGA and modular design make it an excellent choice for general-purpose PCIe development.
The system consists of two main components:
- M.2 FPGA Module (Acorn CLE-215+): This is the core of the system, a compact board in an M.2 form factor. It houses the Xilinx Artix-7 XC7A200T FPGA and is designed to be plugged into a standard M.2 M-key slot.
(a) M.2 FPGA Module (Top View)
(b) M.2 FPGA Module (Bottom View)
- PCIe Adapter Board (Acorn Baseboard Mini): A carrier board that holds the M.2 FPGA module. Its primary function is to adapt the M.2 interface to a standard PCIe x4 edge connector, allowing the entire assembly to be installed and tested in a regular PC motherboard slot.
(a) PCIe Adapter Board (Top View)
(b) PCIe Adapter Board (Bottom View)
The fully assembled Acorn CLE-215+ development board, ready for use in a PCIe slot.
It is important to note that the Acorn CLE-215+ is functionally identical to the more widely known NiteFury board, with the primary difference being the amount of onboard memory. The Acorn model features 1 GB of DDR3 RAM, while the standard NiteFury has 512 MB. Therefore, the NiteFury schematic serves as a direct and accurate reference for the board's hardware layout.
The central component of the SQRL Acorn CLE-215+ system is the Xilinx Artix-7 XC7A200T-FBG484 chip. This FPGA is crucial for implementing the PCIe Endpoint functionality, possessing a range of features that make it highly suitable for this purpose.
The key specifications are summarized below:
Specification | Value |
---|---|
Family | Xilinx Artix-7 |
Speed Grade | -3 |
Logic Cells (LUT4-Equivalent)¹ | 215,360 |
LUT6 | 134,600 |
Flip-Flops | 269,200 |
Block RAM | 13 Mbit |
DSP Slices | 740 |
GTP Transceivers | 4 (up to 6.6 Gbit/s) |
DDR3 SDRAM (Board) | 1 GB, 16-bit |
QSPI Flash (Board) | 32 MB |
¹ The 'Logic Cells' count is a Xilinx metric derived from the physical 6-input LUTs to provide an estimated equivalent in simpler 4-input LUTs for comparison purposes. The numbers of physical LUTs and other resources are the exact counts for the XC7A200T chip.
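For reference, the two LUT figures in the table are consistent with the 1.6x conversion factor commonly used for 7-series parts: 134,600 LUT6 x 1.6 = 215,360 LUT4-equivalent logic cells.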
Properly programming and operating the Artix-7 FPGA on the SQRL board required two key hardware modifications.
The JTAG connector on the Acorn CLE-215+ is non-standard and not directly compatible with the standard 14-pin connector on the Xilinx Platform Cable. A custom adapter cable is therefore required.
Custom JTAG Cable connecting the Xilinx Programmer to the board
JTAG Connector Pinout on the Board
The connector on the board is a Molex Pico-Lock 1.50mm pitch male header. This is not a standard 2.54mm or 2.00mm header, so standard DuPont-style cables will not fit.
To simplify making the cable, we highly recommend purchasing a pre-assembled cable with the correct female connector.
- Recommended Part: Molex 0369200603 on Digi-Key
This cable has the correct female connector on both ends. The easiest method is to cut the cable in half, which gives you two connector cables with open ends. You can then splice one of these cable ends onto the wires of your Xilinx programmer cable, matching the signals according to the following wiring diagram.
JTAG Connection Guide: Physical Pinout and Wiring Diagram.
The board cannot be programmed or operated solely from the PCIe/M.2 slot power. It requires an external 12V supply to function correctly, especially when complex designs and high-speed transceivers are active. Power is provided via a standard 6-pin PCIe power connector from an ATX power supply.
External 12V power connection.
The complete system, including the custom cabling, is mounted in a test PC chassis for verification.
The complete FPGA system mounted in a PCIe slot.
After the hardware was prepared, the connection was verified using the Vivado Hardware Manager. As shown below, the tool successfully detected the JTAG programmer and identified the `xc7a200t_0` FPGA chip. This confirms that the physical connections are correct and the board is ready for programming.
Successful device detection in Vivado Hardware Manager.
The openpcie2-rc test bench aims to have a flexible approach to simulation which allows a common test environment to be used whilst selecting between alternative CPU components, one of which uses the VProc virtual processor co-simulation element. This allows simulations to be fully HDL, with a RISC-V processor RTL implementation such as picoRV32, IBEX or EDUBOS5, or to co-simulate software using the virtual processor, with a significant speed-up in simulation times. The test bench has the following features:
- A VProc virtual processor based `soc_cpu.VPROC` component (see the co-simulation sketch after this list)
  - Selectable between this or an RTL softcore
  - Can run natively compiled test code
  - Can run the application compiled natively with the auto-generated co-sim HAL
  - Can run RISC-V compiled code using the rv32 RISC-V ISS model
- The pcieVHost VIP is used to drive the logic's PCIe link
  - Uses a C sparse memory model
  - An HDL component instantiated in the logic gives the logic access to this memory
  - An API is provided to code running on VProc for direct access to this sparse memory C model, which the pcieVHost software implements
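To give a flavour of the co-simulation flow, here is a minimal sketch of test code running on the `soc_cpu.VPROC` component. It assumes VProc's C entry point and bus-access calls (`VUserMain0`, `VWrite`, `VRead`); the header name, node number and register offset are placeholders, so consult the VProc and pcieVHost documentation under `5.sim` for the actual API.

```c
// Minimal co-simulation sketch. Assumptions: VProc's C user API names and a
// hypothetical RC register at offset 0x0000 (not the project's real map).
#include <stdio.h>
#include "VUser.h"          // VProc user API header; name per the VProc distribution

#define NODE        0       // VProc node this program is attached to
#define RC_CTRL_REG 0x0000  // hypothetical PIO register offset

void VUserMain0(void)
{
    unsigned rdata = 0;

    // Write a test pattern over the soc_cpu.VPROC bus interface...
    VWrite(RC_CTRL_REG, 0xB, 0, NODE);

    // ...and read it back through the same path the RISC-V SW would use.
    VRead(RC_CTRL_REG, &rdata, 0, NODE);

    printf("RC_CTRL_REG readback = 0x%x\n", rdata);
}
```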
The figure below shows an overview block diagram of the test bench HDL.
More details on the architecture and usage of the test bench can be found in the README.md in the `5.sim` directory.
The control and status register hardware abstraction layer (HAL) software is auto-generated, as is the CSR RTL, using `peakrdl`. For co-simulation purposes an additional layer is auto-generated from the same SystemRDL specification using the `systemrdl-compiler` that accompanies the `peakrdl` tools. This produces two header files that define a common API to the application layer for both the RISC-V platform and the VProc based co-simulation verification environment. The details of the HAL generation can be found in the README.md in the `4.build/` directory; an illustrative sketch of how application code might sit on top of such an API follows.
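As an illustration only, application code written against that common API could look roughly like this. Every name here (`PCIE_RC_*`, `hal_write32`, `hal_read32`) is hypothetical; the real identifiers come out of the SystemRDL specification and the generators configured in `4.build/`.

```c
// Hypothetical sketch of application code using the auto-generated HAL.
// The same source is meant to compile for the RISC-V target and for the
// VProc co-simulation, with only the HAL back-end changing underneath.
#include <stdint.h>

/* Stand-ins for what the generated headers would provide (hypothetical). */
#define PCIE_RC_CTRL_ADDR        0x0000u
#define PCIE_RC_CTRL_ENABLE_MASK 0x1u
#define PCIE_RC_STATUS_ADDR      0x0004u

/* On RISC-V these would be bare-metal MMIO accessors; in co-simulation    */
/* they would forward to VProc bus transactions.                           */
extern void     hal_write32(uint32_t addr, uint32_t data);
extern uint32_t hal_read32(uint32_t addr);

/* Application-level code written once against the common API. */
void rc_enable_pio(void)
{
    hal_write32(PCIE_RC_CTRL_ADDR, PCIE_RC_CTRL_ENABLE_MASK);
    uint32_t status = hal_read32(PCIE_RC_STATUS_ADDR);
    (void)status;   /* a real application would act on the status bits here */
}
```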
More details of the test bench, the pcievhost component and its usage can be found in the 5.sim/README.md file.
- WIP
The design was implemented using the Xilinx Vivado Design Suite. The process follows a standard but critical workflow to ensure a functional PCIe Endpoint.
1. PCIe IP Core Generation
The foundation of the design is the PCIe Endpoint core, created using the Vivado IP Generator. This tool abstracts much of the complexity of the PCIe protocol. Within the generator, all fundamental parameters are configured (the resulting identifiers are sketched right after this list):
- Link settings (e.g., Lane Width, Max Speed).
- Device identifiers (Vendor ID, Device ID, Class Code).
- Base Address Register (BAR) memory space requirements.
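For orientation, the identifiers set here are exactly what the host later reports during enumeration (Vendor ID `10ee`, Device ID `7014`, class "Memory controller"). The sketch below lays out the corresponding fields of the standard PCI Type 0 configuration header in C; the concrete class-code byte values are an assumption for illustration.

```c
// Sketch: the standard PCI Type 0 configuration-space fields that the
// Vivado PCIe IP generator lets you set. The IDs mirror what the host
// later reports during enumeration; the class-code bytes shown in the
// comments (base class 0x05 = memory controller) are illustrative.
#include <stdint.h>

struct pci_cfg_type0_ids {
    uint16_t vendor_id;     /* offset 0x00: 0x10EE (Xilinx)                */
    uint16_t device_id;     /* offset 0x02: 0x7014 (this design)           */
    uint16_t command;       /* offset 0x04                                 */
    uint16_t status;        /* offset 0x06                                 */
    uint8_t  revision_id;   /* offset 0x08                                 */
    uint8_t  class_code[3]; /* 0x09..0x0B: prog IF, subclass, base class   */
    /* ... BAR0 follows at offset 0x10 and defines the memory window size  */
};
```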
2. Custom RTL Application Logic (Wrapper)
The generated IP core functions as a "black box" with a standard AXI4-Stream interface. To bring it to life, a custom RTL module (Verilog wrapper) was developed. This application logic is responsible for:
- Parsing incoming TLP packets from the host (e.g., Memory Read/Write requests).
- Handling the actual data access to the FPGA's internal Block RAM.
- Constructing and sending `Completion` TLP packets back to the host in response to read requests (see the header-field sketch after this list).
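The wrapper itself is Verilog, but to make the packet handling concrete, here is a C-level sketch of the 3DW TLP headers it deals with: decoding a 32-bit Memory Read request and filling in the matching Completion-with-Data header. Bit positions follow the PCIe specification; the helper names and the simplifications (full-DW reads only) are ours.

```c
// Illustrative C model of the TLP handling the Verilog wrapper performs:
// decode a 3DW 32-bit Memory Read request header and build the matching
// Completion-with-Data (CplD) header. Simplified to full-DW reads.
#include <stdint.h>

#define TLP_FMT(dw0)        (((dw0) >> 29) & 0x3)    /* DW0[30:29] (Gen2)  */
#define TLP_TYPE(dw0)       (((dw0) >> 24) & 0x1F)   /* DW0[28:24]         */
#define TLP_LENGTH(dw0)     ((dw0) & 0x3FF)          /* DW0[9:0], in DWs   */
#define REQ_REQUESTER(dw1)  (((dw1) >> 16) & 0xFFFF)
#define REQ_TAG(dw1)        (((dw1) >> 8)  & 0xFF)
#define REQ_ADDR(dw2)       ((dw2) & ~0x3u)          /* DW2[31:2] byte addr */

/* Build the 3DW header of a CplD answering a 32-bit MRd request. */
static void build_cpld_header(const uint32_t req[3], uint16_t completer_id,
                              uint32_t cpl[3])
{
    uint32_t len_dw     = TLP_LENGTH(req[0]);
    uint32_t byte_count = len_dw * 4;                /* simplified: full DWs */

    cpl[0] = (0x2u << 29) | (0x0Au << 24) | len_dw;  /* Fmt=10, Type=01010   */
    cpl[1] = ((uint32_t)completer_id << 16) | (byte_count & 0xFFF);
    cpl[2] = (REQ_REQUESTER(req[1]) << 16) | (REQ_TAG(req[1]) << 8)
           | (REQ_ADDR(req[2]) & 0x7Fu);             /* lower address bits   */
}
```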
3. Physical Constraints (XDC File)
To map the logical design onto the physical FPGA chip, a manual XDC (Xilinx Design Constraints) file is crucial. This file is not automatically generated and serves as the bridge between RTL and the physical world. It must define:
- The precise pin locations on the FPGA for the PCIe differential pairs (TX/RX lanes).
- The pin location and timing characteristics of the reference clock.
- The location of the system reset signal.
After programming the FPGA with the generated bitstream, the system was tested in a real-world environment to verify its functionality. The verification process was conducted in three main stages.
The first and most fundamental test was to confirm that the host operating system could correctly detect and enumerate the FPGA as a PCIe device. This was successfully verified on both Windows and Linux.
- On Windows, the device appeared in the Device Manager, confirming that the system recognized the new hardware.
- On Linux, the `lspci` command was used to list all devices on the PCIe bus. The output clearly showed the Xilinx card with the correct Vendor and Device IDs, classified as a "Memory controller" (the short sketch further below reads the same IDs directly from sysfs).
Device detected in Windows Device Manager
`lspci` output on Linux, identifying the device.
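The same check can be scripted without `lspci`. Below is a minimal C sketch that reads the Vendor and Device IDs straight out of the device's config space in sysfs; the `0000:01:00.0` BDF in the path is a placeholder for wherever the card enumerates on a given machine.

```c
// Minimal sketch: read Vendor/Device ID from PCI config space via sysfs.
// The BDF in the path is a placeholder; adjust it to match `lspci` output.
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const char *cfg = "/sys/bus/pci/devices/0000:01:00.0/config";
    uint8_t hdr[4];
    FILE *f = fopen(cfg, "rb");

    if (!f || fread(hdr, 1, sizeof hdr, f) != sizeof hdr) {
        perror("config read");
        return 1;
    }
    fclose(f);

    /* Config space is little-endian: bytes 0-1 vendor, 2-3 device. */
    printf("vendor=0x%04x device=0x%04x\n",
           hdr[0] | (hdr[1] << 8), hdr[2] | (hdr[3] << 8));
    return 0;   /* expect 0x10ee / 0x7014 for this design */
}
```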
While enumeration confirms device presence, directly testing read/write functionality required an isolated environment to prevent conflicts with the host OS. A Virtual Machine (VM) with PCI Passthrough was configured for this purpose.
This step was non-trivial due to a common hardware issue: IOMMU grouping. The standard Linux kernel grouped our FPGA card with other critical system devices (like USB and SATA controllers), making it unsafe to pass it through directly.
The solution involved a multi-step configuration of the host system:
1. BIOS/UEFI Configuration
The first step was to enable hardware virtualization support in the system's BIOS/UEFI:
- AMD-V (SVM - Secure Virtual Machine Mode): This option enables the core CPU virtualization extensions necessary for KVM.
- IOMMU (Input-Output Memory Management Unit): This is critical for securely isolating device memory. Enabling it is a prerequisite for VFIO and safe PCI passthrough.
2. Host OS Kernel and Boot Configuration
A standard Linux kernel was not sufficient due to the IOMMU grouping issue. To resolve this, the following steps were taken:
- Install XanMod Kernel: A custom kernel, XanMod, was installed because it includes the necessary ACS Override patch. This patch forces the kernel to break up problematic IOMMU groups.
- Modify GRUB Boot Parameters: The kernel's bootloader (GRUB) was configured to activate all required features on startup. The following parameters were added to the `GRUB_CMDLINE_LINUX_DEFAULT` line:
  - `amd_iommu=on`: Explicitly enables the IOMMU on AMD systems.
  - `pcie_acs_override=downstream,multifunction`: Activates the ACS patch to resolve the grouping problem.
  - `vfio-pci.ids=10ee:7014`: This crucial parameter instructs the VFIO driver to automatically claim our Xilinx device (Vendor ID `10ee`, Device ID `7014`) at boot, effectively hiding it from the host OS.
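After rebooting with these parameters, it is worth confirming that the card now sits in a sane IOMMU group and is bound to `vfio-pci`. The short C sketch below walks the group membership via sysfs; again, the `0000:01:00.0` BDF is a placeholder.

```c
// Sketch: print the IOMMU group of the FPGA card, the other devices
// sharing it, and the driver currently bound (should be vfio-pci).
// The BDF is a placeholder; adjust it to the real bus address of the card.
#include <stdio.h>
#include <unistd.h>
#include <dirent.h>
#include <libgen.h>

int main(void)
{
    const char *bdf = "0000:01:00.0";
    char path[256], link[256];
    ssize_t n;

    /* .../iommu_group is a symlink to /sys/kernel/iommu_groups/<N> */
    snprintf(path, sizeof path, "/sys/bus/pci/devices/%s/iommu_group", bdf);
    n = readlink(path, link, sizeof link - 1);
    if (n < 0) { perror("iommu_group"); return 1; }
    link[n] = '\0';
    printf("IOMMU group: %s\n", basename(link));

    /* List every device that shares the group with the FPGA. */
    snprintf(path, sizeof path,
             "/sys/bus/pci/devices/%s/iommu_group/devices", bdf);
    DIR *d = opendir(path);
    struct dirent *e;
    while (d && (e = readdir(d)) != NULL)
        if (e->d_name[0] != '.')
            printf("  member: %s\n", e->d_name);
    if (d) closedir(d);

    /* Which driver owns the card? Expect "vfio-pci" after the reboot. */
    snprintf(path, sizeof path, "/sys/bus/pci/devices/%s/driver", bdf);
    n = readlink(path, link, sizeof link - 1);
    if (n > 0) { link[n] = '\0'; printf("driver: %s\n", basename(link)); }
    return 0;
}
```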
3. KVM Virtual Machine Setup
With the host system properly prepared, the final step was to assign the device to a KVM virtual machine using `virt-manager`. Thanks to the correct VFIO configuration, the Xilinx card appeared as an available "PCI Host Device" and was successfully passed through.
This setup created a safe and controlled environment to perform direct, low-level memory operations on the FPGA without risking host system instability.
With the FPGA passed through to the VM, the final test was to verify the end-to-end communication path. This was done using the `devmem2` utility to perform direct PIO (Programmed I/O) on the memory space mapped by the card's BAR0 register.
The process was simple and effective:
- The base physical address of BAR0 (e.g., `fc500000`) was identified using `lspci -v`.
- A test value (`0xB`) was written to this base address.
- The same address was immediately read back.
The successful readback of the value `0xB` confirms that the entire communication chain is functional: from the user-space application, through the OS kernel and PCIe fabric, to the FPGA's internal memory and back.
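For completeness, the sketch below shows roughly what `devmem2` does under the hood. The BAR0 physical address is passed on the command line (e.g. `0xfc500000` from `lspci -v`); it must be run as root inside the VM, and the address is of course machine-specific.

```c
// devmem2-style PIO sketch: map one page of the card's BAR0 via /dev/mem,
// write a test value and read it back. Run as root; pass the BAR0 physical
// address reported by `lspci -v` (e.g. 0xfc500000).
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s <bar0_phys_addr>\n", argv[0]); return 1; }

    off_t bar0 = (off_t)strtoull(argv[1], NULL, 0);
    long  page = sysconf(_SC_PAGESIZE);
    int   fd   = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("/dev/mem"); return 1; }

    /* Map the page containing BAR0 and point at its first register. */
    volatile uint32_t *bar = mmap(NULL, page, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, bar0 & ~(off_t)(page - 1));
    if (bar == MAP_FAILED) { perror("mmap"); return 1; }
    volatile uint32_t *reg = bar + ((bar0 & (page - 1)) >> 2);

    reg[0] = 0xB;                               /* PIO write               */
    printf("readback = 0x%x\n", reg[0]);        /* expect 0xB on success   */

    munmap((void *)bar, page);
    close(fd);
    return 0;
}
```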
- PCIE Utils
- Debug PCIE issues using `lspci` and `setpci`
- Using busybox (devmem) for register access
- PCIE Sniffing
- Stark 75T Card
- ngpscope
- PCI Leech
- PCI Leech/ZDMA
- LiteX PCIE Screamer
- LiteX PCIE Analyzer
- Wireshark PCIe Dissector
- PCIe Tool Hunt
- PCIe network simulator
- An interesting PCIE tidbit: Peer-to-Peer communication. Also see this
- NetTLP - An invasive method for intercepting PCIE TLPs
We are grateful to NLnet Foundation for their sponsorship of this development activity.
wyvernSemi's wisdom and contribution made a great deal of difference. Thank you, we are honored to have you on the project.
Envox, our next-door buddy, is responsible for the birth of our backplane, which we like to call BB (not to be mistaken for their gorgeous blue beauty BB3 🙂).