Respectful Discussion and Suggestions Regarding the esp_eth and the emac driver (IDFGH-16281)

### Checklist

- [x] Checked the issue tracker for similar issues to ensure this is not a duplicate.
- [x] Provided a clear description of your suggestion.
- [x] Included any relevant context or examples.

### Issue or Suggestion Description

Over the past period, I have conducted a more in-depth review of the esp\_eth component and the related network protocol stack source code. Combined with my testing, I have identified the following issues:

## Poor small-packet performance in iperf testing.
When I forcibly set the `MSS` to 90, there were instances where no packets were received for an extended period! Besides, I also used hping for ICMP flood testing and the system was crashed when the packet interval is less than 500 µs.

I believe this issue is primarily caused by overly frequent interrupts. Whether during small-packet reception or under an ICMP flood, interrupts are triggered at a high rate, and each interrupt results in a context switch—an operation with considerable overhead.

On the other hand, our driver does not currently account for scenarios where multiple packets coexist in the EMAC. Regardless of whether it is the DM9051 or another EMAC, edge-triggered mode is invariably used for interrupt configuration. However, when multiple packets coexist in the EMAC, the interrupt pin level will not be pulled low after only one packet has been read. Unfortunately, our driver is currently unable to perform another read in such cases.

```c
  if (emac->int_gpio_num >= 0) {
      gpio_func_sel(emac->int_gpio_num, PIN_FUNC_GPIO);
      gpio_input_enable(emac->int_gpio_num);
      gpio_pulldown_en(emac->int_gpio_num);
      gpio_set_intr_type(emac->int_gpio_num, GPIO_INTR_POSEDGE);
      gpio_intr_enable(emac->int_gpio_num);
      gpio_isr_handler_add(emac->int_gpio_num, dm9051_isr_handler, emac);
  }
```

To address this, I converted the interrupt-driven approach to polling, which alleviated the issue . However, the existing polling implementation still has substantial room for optimization.

With regard to network-performance optimization, **Intel’s DPDK** offers valuable inspiration. Taking into account the characteristics of the ESP32 platform, I believe the following improvements are feasible:

1.  **Replace hard interrupts with polling.** We can poll the interrupt pin level. Notably, all Espressif products released in recent years support `Dedicated GPIO`, which allows specialized instructions to read the interrupt pin levels of multiple network interfaces rapidly, thereby significantly reducing the time cost of polling.

2. **Process multiple packets per wake-up.** Register the receive handlers of all network interfaces in a single queue and assign one task to service them. Each time any interface receives data, it wakes this task, which then iterates over each interface, reading packets until the corresponding interrupt pin level is deasserted. This approach can markedly reduce scheduling overhead and fully utilize the EMAC’s internal SRAM as buffering, mitigating packet loss.

3. **Leverage hardware offloading.** Tasks such as CRC and checksum calculation can be delegated more extensively to the EMAC. Erroneous packets can be dropped before reaching lwIP, preventing unnecessary load. Unfortunately, checksum offload is currently disabled by default. Meanwhile, in IDF’s lwIP configuration, TCP checksums are enabled by default and cannot be disabled via menuconfig; I believe this work can and should be handled by the EMAC.
```c
    /* do not generate checksum for UDP, TCP and IPv4 packets */
    ESP_GOTO_ON_ERROR(dm9051_register_write(emac, DM9051_TCSCR, 0x00), err, TAG, "write TCSCR failed");
    /* disable check sum for receive packets */
    ESP_GOTO_ON_ERROR(dm9051_register_write(emac, DM9051_RCSCSR, 0x00), err, TAG, "write RCSCSR failed");
```
Additionally, for lwIP implementation, platforms like the ESP32-S3 that include SIMD instruction extensions can achieve better performance. For example, using the `VLD` instruction to load 128-bit wide data in a single operation, together with` VCMP` and other extended instructions for rapid wide comparisons and logic operations, can significantly improve system throughput.

## Use of `memcpy` from buffer to `stack_input` in the driver without implementation of zero-copy.
```C
        if (status & ISR_PR) {
            do {
                uint32_t buf_len;
                if (emac->parent.receive(&emac->parent, emac->rx_buffer, &buf_len) == ESP_OK) {
                    /* if there is waiting frame */
                    if (buf_len > 0) {
                        uint8_t *buffer = malloc(buf_len);
                        if (buffer == NULL) {
                            ESP_LOGE(TAG, "no mem for receive buffer");
                        } else {
                            memcpy(buffer, emac->rx_buffer, buf_len);
                            ESP_LOGD(TAG, "receive len=%" PRIu32, buf_len);
                            /* pass the buffer to stack (e.g. TCP/IP layer) */
                            emac->eth->stack_input(emac->eth, buffer, buf_len);
                        }
                    }
                } else {
                    ESP_LOGE(TAG, "frame read from module failed");
                }
            } while (emac->packets_remain);
        }
```
Firstly, through tracing, I found that the emac->parent.receive method is not used anywhere except within the emac_xxx_task function. I honestly do not understand the intended purpose of this method. 

Based on this observation, I have rewritten the emac_ch390_receive method (using ch390 as an example). 
```c
static esp_err_t emac_ch390_receive2(esp_eth_mac_t *mac, uint8_t **buf, uint32_t *length)
{
  // ... some other code
        if (ready & CH390_PKT_RDY) {
            ESP_GOTO_ON_ERROR(ch390_io_memory_read(emac, (uint8_t *) & (rx_header), sizeof(rx_header)),
                              err, TAG, "peek rx header failed");
            *length = (rx_header.length_high << 8) + rx_header.length_low;
            if (rx_header.status & RSR_ERR_MASK) {
                ch390_drop_frame(emac, *length);
                *length = 0;
                return ESP_ERR_INVALID_RESPONSE;
            } else if (*length > ETH_MAX_PACKET_SIZE) {
                /* reset rx memory pointer */
                ESP_GOTO_ON_ERROR(ch390_io_register_write(emac, CH390_MPTRCR, MPTRCR_RST_RX), err, TAG, "reset rx pointer failed");
                return ESP_ERR_INVALID_RESPONSE;
            } else {
                *buf=heap_caps_malloc(*length,MALLOC_CAP_DMA);
                ESP_GOTO_ON_ERROR(ch390_io_memory_read(emac, *buf, *length), err, TAG, "read rx data failed");
                *length -= ETH_CRC_LEN;
                *buf=realloc(*buf, *length);
            }
        } else {
            *length = 0;
        }
        return ESP_OK;
    }
// ... some other code
```
In the emac_ch390_task, I chose to pass the buffer pointer directly to stack_input.
```c
        if (status & ISR_PR) {
            do {
                // if (emac->parent.receive(&emac->parent, emac->rx_buffer, &emac->rx_len) == ESP_OK) {
                //     if (emac->rx_len == 0) {
                //         break;
                //     } else {
                //         ESP_LOGD(TAG, "receive len=%lu", emac->rx_len);

                //         /* allocate memory and check whether allocation failed */
                //         buffer = malloc(emac->rx_len);
                //         if (buffer == NULL) {
                //             ESP_LOGE(TAG, "no memory for receive buffer");
                //             continue;
                //         }

                //         /* pass the buffer to stack (e.g. TCP/IP layer) */
                //         memcpy(buffer, emac->rx_buffer, emac->rx_len);
                //         emac->eth->stack_input(emac->eth, buffer, emac->rx_len);
                //     }
                // } else {
                //     ESP_LOGE(TAG, "frame read from module failed");
                //     break;
                // }

                if (emac_ch390_receive2(&emac->parent, &buffer, &emac->rx_len) == ESP_OK) {
                    if (emac->rx_len == 0) {
                        break;
                    } else {
                        ESP_LOGD(TAG, "receive len=%lu", emac->rx_len);
                        emac->eth->stack_input(emac->eth, buffer, emac->rx_len);
                    }
                } else {
                    ESP_LOGE(TAG, "frame read from module failed");
                    break;
                }
            } while (1);
        }
```

I fully understand that addressing these issues will inevitably require significant modifications to the **esp\_eth** component. **I would like to inquire whether you would be open to accepting a pull request of this.**


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Respectful Discussion and Suggestions Regarding the esp_eth and the emac driver (IDFGH-16281) #87

Checklist

Issue or Suggestion Description

Poor small-packet performance in iperf testing.

Use of `memcpy` from buffer to `stack_input` in the driver without implementation of zero-copy.

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Respectful Discussion and Suggestions Regarding the esp_eth and the emac driver (IDFGH-16281) #87

Description

Checklist

Issue or Suggestion Description

Poor small-packet performance in iperf testing.

Use of memcpy from buffer to stack_input in the driver without implementation of zero-copy.

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Use of `memcpy` from buffer to `stack_input` in the driver without implementation of zero-copy.