Skip to content

Bluetooth: Controller: Fix conn timeout due to delayed conn upd ind #92572

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

cvinayak
Copy link
Contributor

@cvinayak cvinayak commented Jul 3, 2025

Add connection parameter update to throughput test to cover
any connection timeout performing connection update while
subjected to high throughput scenarios.

Unlike legacy Zephyr Controller's double headed Tx queue
linked list, the new split Zephyr Controller's new LLCP does
not place a control PDU at the head of the Tx queue causing
control PDUs to be delayed on air.

Pause the data enqueue until pending PDUs are transmitted
across before enqueuing LE Connection Update Ind PDU, and
thereafter resume other PDUs being enqueued again.

There can still be connection timeout if enqueued PDUs are
not being transmitted/acked during repeated retransmissions.

Below steps will be used to ensure OTA DFU with high throughput and connection updates is handled gracefully, without connection timeout or assertions in used HCI Controller or Peripheral Device under test:

# Build and flash uart_hci onto nRF52840-DK
west build -p auto -d build_uart_hci/ --board nrf52840dk/nrf52840 samples/bluetooth/hci_uart
west flash -d build_uart_hci

# Build device firmware A
west build -p auto -d build_A --board nrf54l15dk/nrf54l15/cpuapp samples/subsys/mgmt/mcumgr/smp_svr/ -DCONFIG_DEBUG=y -DCONFIG_MCUMGR_TRANSPORT_BT_PERM_RW=y -DCONFIG_BT_DEVICE_NAME=\"firmwareA\" -DEXTRA_CONF_FILE="overlay-bt.conf" --sysbuild
# Build and flash device firmware B onto nRF54l15-DK
west build -p auto -d build_B --board nrf54l15dk/nrf54l15/cpuapp samples/subsys/mgmt/mcumgr/smp_svr/ -DCONFIG_DEBUG=y -DCONFIG_MCUMGR_TRANSPORT_BT_PERM_RW=y -DCONFIG_BT_DEVICE_NAME=\"firmwareB\" -DEXTRA_CONF_FILE="overlay-bt.conf" --sysbuild
west flash -d build_B

# Attach HCI
sudo btattach -S 1000000 -B /dev/ttyACM1 &
# Start FOTA to firmware A
sudo mcumgr --conntype ble --hci 1 --timeout 60 --connstring peer_name=\"firmwareA\" image upload build_A/smp_svr/zephyr/zephyr.bin
# Start FOTA to firmware B
sudo mcumgr --conntype ble --hci 1 --timeout 60 --connstring peer_name=\"firmwareB\" image upload build_B/smp_svr/zephyr/zephyr.bin

@cvinayak cvinayak force-pushed the github_test_conn_update branch from f3ac945 to f4d05b7 Compare July 3, 2025 04:47
@cvinayak cvinayak changed the title tests: bsim: Bluetooth: Add param update to throughput test Bluetooth: Controller: Fix conn timeout due to delayed conn upd ind Jul 3, 2025
@cvinayak cvinayak marked this pull request as ready for review July 3, 2025 05:07
Comment on lines +20 to +21
/* Count down number of metrics intervals before performing a param update */
#define PARAM_UPDATE_COUNTDOWN PHY_UPDATE_COUNTDOWN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this feels like even more test code that isn't related to GATT :( Would prefer that we test things like this in BSIM, instead of polluting a GATT sample with PHY tests (personal opinion)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be related to GATT but a typical user application requiring a GATT Write without Response will perform both PHY update and Connection Update to tune the throughput. In that sense, is a sample that demonstrate GATT Write without Response, PHY Update and Connection Update (there are no samples for these procedures currently in the upstream repo!).

@cvinayak cvinayak force-pushed the github_test_conn_update branch 3 times, most recently from 2750d88 to 8e7d7e8 Compare July 4, 2025 12:17
cvinayak added 4 commits July 5, 2025 05:59
Remove duplicate define CONN_DATA_BUFFERS and use
CONFIG_BT_BUF_ACL_TX_COUNT instead.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Add connection parameter update to throughput test to cover
any connection timeout performing connection update while
subjected to high throughput scenarios.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Make the throughput test to fail when increasing Tx buffers
that will delay a LE Connection Ind PDU from being
transmitted on air due to previously queued Data PDUs.

Unlike legacy Zephyr Controller's double headed Tx queue
linked list, the new split Zephyr Controller's new LLCP does
not place a control PDU at the head of the queue causing
control PDUs to be delayed on air.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Unlike legacy Zephyr Controller's double headed Tx queue
linked list, the new split Zephyr Controller's new LLCP does
not place a control PDU at the head of the Tx queue causing
control PDUs to be delayed on air.

Pause the data enqueue until pending PDUs are transmitted
across before enqueuing LE Connection Update Ind PDU, and
thereafter resume other PDUs being enqueued again.

There can still be connection timeout if enqueued PDUs are
not being transmitted/acked during repeated retransmissions.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
@cvinayak cvinayak force-pushed the github_test_conn_update branch from 8e7d7e8 to a3347a3 Compare July 5, 2025 04:06
Copy link

sonarqubecloud bot commented Jul 5, 2025

@cvinayak cvinayak requested a review from Copilot July 7, 2025 07:15
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds mechanisms to pause data ACL transmissions while waiting for LLCP Connection Update PDUs on-air, preventing connection timeouts under high throughput. It also adjusts test buffer sizes for bsim throughput tests and extends the central GATT write sample to exercise connection parameter updates.

  • Increased BT event and ACL TX buffer counts in throughput test config
  • Introduced LLCP_TX_QUEUE_PAUSE_DATA_CONN_UPD, pause/resume logic in the controller, and exposed a TX enqueue count API
  • Refactored FIFO definitions to use CONFIG_BT_BUF_ACL_TX_COUNT and updated the sample app to periodically send connection parameter updates

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/bsim/bluetooth/ll/throughput/prj.conf Increased BT HCI RX event and ACL TX buffer counts for high-throughput tests
subsys/bluetooth/controller/ll_sw/ull_llcp_internal.h Added new pause mask LLCP_TX_QUEUE_PAUSE_DATA_CONN_UPD
subsys/bluetooth/controller/ll_sw/ull_llcp_conn_upd.c Paused data enqueue until pending PDUs drain and resumed after sending Connection Update Ind
subsys/bluetooth/controller/ll_sw/ull_conn_internal.h Exposed ull_conn_tx_enqueue_count_get to get current TX queue length
subsys/bluetooth/controller/ll_sw/ull_conn.c Refactored MFIFO usage to use CONFIG_BT_BUF_ACL_TX_COUNT and implemented enqueue count
samples/bluetooth/central_gatt_write/src/gatt_write_common.c Added scheduling of periodic connection parameter updates in the sample app
Comments suppressed due to low confidence (4)

subsys/bluetooth/controller/ll_sw/ull_llcp_conn_upd.c:441

  • The new pause/resume logic around connection updates isn't covered by existing tests; consider adding a bsim test to validate that control PDUs are sent promptly and data resumes correctly under high throughput.
	if (llcp_lr_ispaused(conn) || !llcp_tx_alloc_peek(conn, ctx) ||

samples/bluetooth/central_gatt_write/src/gatt_write_common.c:21

  • [nitpick] The macro PARAM_UPDATE_COUNTDOWN is very similar to the variable param_update_countdown, which may cause confusion; consider using a more distinct name or referencing PHY_UPDATE_COUNTDOWN directly where needed.
#define PARAM_UPDATE_COUNTDOWN PHY_UPDATE_COUNTDOWN

tests/bsim/bluetooth/ll/throughput/prj.conf:27

  • [nitpick] Please add a comment explaining why the RX event buffer count was increased (e.g., to avoid HCI timeouts under high throughput).
CONFIG_BT_BUF_EVT_RX_COUNT=7

tests/bsim/bluetooth/ll/throughput/prj.conf:35

  • [nitpick] Similarly, a brief note on why the ACL TX buffer count is bumped would help future maintainers understand this change.
CONFIG_BT_BUF_ACL_TX_COUNT=6

Comment on lines 1478 to +1493
}
}

uint16_t ull_conn_tx_enqueue_count_get(struct ll_conn *conn)
{
uint16_t count = 0;
memq_link_t *curr;

curr = memq_peek(conn->lll.memq_tx.head, conn->lll.memq_tx.tail, NULL);
while (curr != NULL) {
count++;

curr = memq_peek(curr->next, conn->lll.memq_tx.tail, NULL);
}

return count;
Copy link
Preview

Copilot AI Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function iterates through the entire TX queue on each call (O(n) complexity); consider maintaining a counter that tracks the queue length to avoid walking the list every time.

Suggested change
}
}
uint16_t ull_conn_tx_enqueue_count_get(struct ll_conn *conn)
{
uint16_t count = 0;
memq_link_t *curr;
curr = memq_peek(conn->lll.memq_tx.head, conn->lll.memq_tx.tail, NULL);
while (curr != NULL) {
count++;
curr = memq_peek(curr->next, conn->lll.memq_tx.tail, NULL);
}
return count;
conn->tx_queue_count++;
}
}
uint16_t ull_conn_tx_enqueue_count_get(struct ll_conn *conn)
{
return conn->tx_queue_count;

Copilot uses AI. Check for mistakes.

@@ -448,6 +450,7 @@ static void lp_cu_send_conn_update_ind(struct ll_conn *conn, struct proc_ctx *ct
ctx->state = LP_CU_STATE_WAIT_NTF_AVAIL;
} else {
lp_cu_send_conn_update_ind_finalize(conn, ctx, evt, param);
llcp_tx_resume_data(conn, LLCP_TX_QUEUE_PAUSE_DATA_CONN_UPD);
Copy link
Preview

Copilot AI Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The resume call is duplicated in multiple branches; consider centralizing resume logic (e.g., in lp_cu_send_conn_update_ind_finalize) to reduce code duplication and ease maintenance.

Suggested change
llcp_tx_resume_data(conn, LLCP_TX_QUEUE_PAUSE_DATA_CONN_UPD);

Copilot uses AI. Check for mistakes.

Copy link
Contributor

@Thalley Thalley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me, but with a few minor comments

Comment on lines +186 to +194
.interval_min = 0x0029,
.interval_max = 0x0029,
.latency = 0,
.timeout = 31,
}, {
.interval_min = 0x0028,
.interval_max = 0x0028,
.latency = 0,
.timeout = 30,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using the BT_GAP_MS_TO_CONN_INTERVAL or BT_GAP_US_TO_CONN_INTERVAL and BT_GAP_MS_TO_CONN_TIMEOUT or BT_GAP_US_TO_CONN_TIMEOUT for more human-readable values

Comment on lines +205 to +206

err = bt_conn_le_param_update(conn, &update_params[param_update_count & 0x1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to add something like

Suggested change
err = bt_conn_le_param_update(conn, &update_params[param_update_count & 0x1]);
/* Switch between the 2 update_params based on param_update_count */
err = bt_conn_le_param_update(conn, &update_params[param_update_count & 0x1]);

Given that this is a sample

@cvinayak
Copy link
Contributor Author

cvinayak commented Jul 7, 2025

@thoh-ot or @erbr-ot Please give your review to the LLCP changes...

Copy link
Contributor

@thoh-ot thoh-ot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QACK.

I feel a better solution to pausing data would be to port the CONFIG_BT_CTLR_PHY_UPDATE_IND_TX_ACKED_INSTANT_OVERRIDE feature from PHY Update and then correct the instant in the LLL when transmitting the actual PDU.

@cvinayak
Copy link
Contributor Author

cvinayak commented Jul 9, 2025

correct the instant in the LLL

I thought we were against modification of PDUs in the LLL once ULL has enqueued it.

is CONFIG_BT_CTLR_PHY_UPDATE_IND_TX_ACKED_INSTANT_OVERRIDE in the upstream repo?

@cvinayak cvinayak added the DNM This PR should not be merged (Do Not Merge) label Jul 10, 2025
@thoh-ot
Copy link
Contributor

thoh-ot commented Jul 10, 2025

correct the instant in the LLL

I thought we were against modification of PDUs in the LLL once ULL has enqueued it.

is CONFIG_BT_CTLR_PHY_UPDATE_IND_TX_ACKED_INSTANT_OVERRIDE in the upstream repo?

Ahh, correct it is only in our downstream repo. It does also feel a bit hackish to modify the PDU in the LLL, but there we have LLL specific information, that can try to correct stale instant values. Dunno the reason why we never upstreamed that feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth Controller area: Bluetooth Host Bluetooth Host (excluding BR/EDR) area: Bluetooth area: Samples Samples DNM This PR should not be merged (Do Not Merge)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants