-
Notifications
You must be signed in to change notification settings - Fork 7.6k
Bluetooth: Controller: Fix conn timeout due to delayed conn upd ind #92572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Bluetooth: Controller: Fix conn timeout due to delayed conn upd ind #92572
Conversation
f3ac945
to
f4d05b7
Compare
/* Count down number of metrics intervals before performing a param update */ | ||
#define PARAM_UPDATE_COUNTDOWN PHY_UPDATE_COUNTDOWN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this feels like even more test code that isn't related to GATT :( Would prefer that we test things like this in BSIM, instead of polluting a GATT sample with PHY tests (personal opinion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may not be related to GATT but a typical user application requiring a GATT Write without Response will perform both PHY update and Connection Update to tune the throughput. In that sense, is a sample that demonstrate GATT Write without Response, PHY Update and Connection Update (there are no samples for these procedures currently in the upstream repo!).
2750d88
to
8e7d7e8
Compare
Remove duplicate define CONN_DATA_BUFFERS and use CONFIG_BT_BUF_ACL_TX_COUNT instead. Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Add connection parameter update to throughput test to cover any connection timeout performing connection update while subjected to high throughput scenarios. Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Make the throughput test to fail when increasing Tx buffers that will delay a LE Connection Ind PDU from being transmitted on air due to previously queued Data PDUs. Unlike legacy Zephyr Controller's double headed Tx queue linked list, the new split Zephyr Controller's new LLCP does not place a control PDU at the head of the queue causing control PDUs to be delayed on air. Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Unlike legacy Zephyr Controller's double headed Tx queue linked list, the new split Zephyr Controller's new LLCP does not place a control PDU at the head of the Tx queue causing control PDUs to be delayed on air. Pause the data enqueue until pending PDUs are transmitted across before enqueuing LE Connection Update Ind PDU, and thereafter resume other PDUs being enqueued again. There can still be connection timeout if enqueued PDUs are not being transmitted/acked during repeated retransmissions. Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
8e7d7e8
to
a3347a3
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds mechanisms to pause data ACL transmissions while waiting for LLCP Connection Update PDUs on-air, preventing connection timeouts under high throughput. It also adjusts test buffer sizes for bsim throughput tests and extends the central GATT write sample to exercise connection parameter updates.
- Increased BT event and ACL TX buffer counts in throughput test config
- Introduced
LLCP_TX_QUEUE_PAUSE_DATA_CONN_UPD
, pause/resume logic in the controller, and exposed a TX enqueue count API - Refactored FIFO definitions to use
CONFIG_BT_BUF_ACL_TX_COUNT
and updated the sample app to periodically send connection parameter updates
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
tests/bsim/bluetooth/ll/throughput/prj.conf | Increased BT HCI RX event and ACL TX buffer counts for high-throughput tests |
subsys/bluetooth/controller/ll_sw/ull_llcp_internal.h | Added new pause mask LLCP_TX_QUEUE_PAUSE_DATA_CONN_UPD |
subsys/bluetooth/controller/ll_sw/ull_llcp_conn_upd.c | Paused data enqueue until pending PDUs drain and resumed after sending Connection Update Ind |
subsys/bluetooth/controller/ll_sw/ull_conn_internal.h | Exposed ull_conn_tx_enqueue_count_get to get current TX queue length |
subsys/bluetooth/controller/ll_sw/ull_conn.c | Refactored MFIFO usage to use CONFIG_BT_BUF_ACL_TX_COUNT and implemented enqueue count |
samples/bluetooth/central_gatt_write/src/gatt_write_common.c | Added scheduling of periodic connection parameter updates in the sample app |
Comments suppressed due to low confidence (4)
subsys/bluetooth/controller/ll_sw/ull_llcp_conn_upd.c:441
- The new pause/resume logic around connection updates isn't covered by existing tests; consider adding a bsim test to validate that control PDUs are sent promptly and data resumes correctly under high throughput.
if (llcp_lr_ispaused(conn) || !llcp_tx_alloc_peek(conn, ctx) ||
samples/bluetooth/central_gatt_write/src/gatt_write_common.c:21
- [nitpick] The macro
PARAM_UPDATE_COUNTDOWN
is very similar to the variableparam_update_countdown
, which may cause confusion; consider using a more distinct name or referencingPHY_UPDATE_COUNTDOWN
directly where needed.
#define PARAM_UPDATE_COUNTDOWN PHY_UPDATE_COUNTDOWN
tests/bsim/bluetooth/ll/throughput/prj.conf:27
- [nitpick] Please add a comment explaining why the RX event buffer count was increased (e.g., to avoid HCI timeouts under high throughput).
CONFIG_BT_BUF_EVT_RX_COUNT=7
tests/bsim/bluetooth/ll/throughput/prj.conf:35
- [nitpick] Similarly, a brief note on why the ACL TX buffer count is bumped would help future maintainers understand this change.
CONFIG_BT_BUF_ACL_TX_COUNT=6
} | ||
} | ||
|
||
uint16_t ull_conn_tx_enqueue_count_get(struct ll_conn *conn) | ||
{ | ||
uint16_t count = 0; | ||
memq_link_t *curr; | ||
|
||
curr = memq_peek(conn->lll.memq_tx.head, conn->lll.memq_tx.tail, NULL); | ||
while (curr != NULL) { | ||
count++; | ||
|
||
curr = memq_peek(curr->next, conn->lll.memq_tx.tail, NULL); | ||
} | ||
|
||
return count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function iterates through the entire TX queue on each call (O(n) complexity); consider maintaining a counter that tracks the queue length to avoid walking the list every time.
} | |
} | |
uint16_t ull_conn_tx_enqueue_count_get(struct ll_conn *conn) | |
{ | |
uint16_t count = 0; | |
memq_link_t *curr; | |
curr = memq_peek(conn->lll.memq_tx.head, conn->lll.memq_tx.tail, NULL); | |
while (curr != NULL) { | |
count++; | |
curr = memq_peek(curr->next, conn->lll.memq_tx.tail, NULL); | |
} | |
return count; | |
conn->tx_queue_count++; | |
} | |
} | |
uint16_t ull_conn_tx_enqueue_count_get(struct ll_conn *conn) | |
{ | |
return conn->tx_queue_count; |
Copilot uses AI. Check for mistakes.
@@ -448,6 +450,7 @@ static void lp_cu_send_conn_update_ind(struct ll_conn *conn, struct proc_ctx *ct | |||
ctx->state = LP_CU_STATE_WAIT_NTF_AVAIL; | |||
} else { | |||
lp_cu_send_conn_update_ind_finalize(conn, ctx, evt, param); | |||
llcp_tx_resume_data(conn, LLCP_TX_QUEUE_PAUSE_DATA_CONN_UPD); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The resume call is duplicated in multiple branches; consider centralizing resume logic (e.g., in lp_cu_send_conn_update_ind_finalize
) to reduce code duplication and ease maintenance.
llcp_tx_resume_data(conn, LLCP_TX_QUEUE_PAUSE_DATA_CONN_UPD); |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK to me, but with a few minor comments
.interval_min = 0x0029, | ||
.interval_max = 0x0029, | ||
.latency = 0, | ||
.timeout = 31, | ||
}, { | ||
.interval_min = 0x0028, | ||
.interval_max = 0x0028, | ||
.latency = 0, | ||
.timeout = 30, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using the BT_GAP_MS_TO_CONN_INTERVAL
or BT_GAP_US_TO_CONN_INTERVAL
and BT_GAP_MS_TO_CONN_TIMEOUT
or BT_GAP_US_TO_CONN_TIMEOUT
for more human-readable values
|
||
err = bt_conn_le_param_update(conn, &update_params[param_update_count & 0x1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to add something like
err = bt_conn_le_param_update(conn, &update_params[param_update_count & 0x1]); | |
/* Switch between the 2 update_params based on param_update_count */ | |
err = bt_conn_le_param_update(conn, &update_params[param_update_count & 0x1]); |
Given that this is a sample
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QACK.
I feel a better solution to pausing data would be to port the CONFIG_BT_CTLR_PHY_UPDATE_IND_TX_ACKED_INSTANT_OVERRIDE
feature from PHY Update and then correct the instant in the LLL when transmitting the actual PDU.
I thought we were against modification of PDUs in the LLL once ULL has enqueued it. is |
Ahh, correct it is only in our downstream repo. It does also feel a bit hackish to modify the PDU in the LLL, but there we have LLL specific information, that can try to correct stale instant values. Dunno the reason why we never upstreamed that feature. |
Add connection parameter update to throughput test to cover
any connection timeout performing connection update while
subjected to high throughput scenarios.
Unlike legacy Zephyr Controller's double headed Tx queue
linked list, the new split Zephyr Controller's new LLCP does
not place a control PDU at the head of the Tx queue causing
control PDUs to be delayed on air.
Pause the data enqueue until pending PDUs are transmitted
across before enqueuing LE Connection Update Ind PDU, and
thereafter resume other PDUs being enqueued again.
There can still be connection timeout if enqueued PDUs are
not being transmitted/acked during repeated retransmissions.
Below steps will be used to ensure OTA DFU with high throughput and connection updates is handled gracefully, without connection timeout or assertions in used HCI Controller or Peripheral Device under test: