Skip to content

Bluetooth: Controller: Fix LLCP event_counter value used during prepare #92726

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cvinayak
Copy link
Contributor

@cvinayak cvinayak commented Jul 5, 2025

Fix LLCP to use event_counter value used during prepare to compare instant to be the value as when in prepare and not incorrectly use a stale value (when two radio events overlap).

@cvinayak cvinayak force-pushed the github_llcp_event_counter_fix branch 4 times, most recently from c47bfa7 to 5a1f863 Compare July 6, 2025 21:38
@cvinayak cvinayak marked this pull request as ready for review July 7, 2025 07:13
@cvinayak cvinayak requested a review from Copilot July 7, 2025 13:19
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes the LLCP instant comparison to use the event counter value captured during the prepare phase, preventing stale counter usage when radio events overlap.

  • Introduce ull_conn_event_counter_at_prepare() and replace direct calls to ull_conn_event_counter() in all LLCP instant checks.
  • Adjust channel map update instant to include slave latency.
  • Add declaration and implementation of the prepare-phase event counter helper.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
subsys/bluetooth/controller/ll_sw/ull_llcp_phy.c Replaced instant checks to use the new prepare-phase counter helper.
subsys/bluetooth/controller/ll_sw/ull_llcp_conn_upd.c Swapped out raw event counter calls for prepare-phase variant in connection update.
subsys/bluetooth/controller/ll_sw/ull_llcp_chmu.c Added slave latency to channel map instant; switched runtime checks to use prepare-phase counter.
subsys/bluetooth/controller/ll_sw/ull_llcp_cc.c Updated CIS create instant checks to utilize the prepare-phase counter.
subsys/bluetooth/controller/ll_sw/ull_conn_internal.h Declared ull_conn_event_counter_at_prepare().
subsys/bluetooth/controller/ll_sw/ull_conn.c Implemented ull_conn_event_counter_at_prepare() and replaced a counter usage in parameter update.
Comments suppressed due to low confidence (2)

subsys/bluetooth/controller/ll_sw/ull_conn_internal.h:87

  • Add a brief comment describing that this returns the event counter value as captured during the prepare phase (event_counter + latency_prepare + lazy).
uint16_t ull_conn_event_counter_at_prepare(struct ll_conn *conn);

subsys/bluetooth/controller/ll_sw/ull_conn.c:2180

  • No unit tests appear to cover ull_conn_event_counter_at_prepare. Adding tests for various latency_prepare and llcp.prep.lazy scenarios will help validate its behavior.
uint16_t ull_conn_event_counter_at_prepare(struct ll_conn *conn)

Comment on lines +121 to +122
ctx->data.chmu.instant = ull_conn_event_counter(conn) + conn->lll.latency +
CHMU_INSTANT_DELTA;
Copy link
Preview

Copilot AI Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This manual calculation may not match the helper's logic (which adds latency_prepare and lazy). Consider using ull_conn_event_counter_at_prepare(conn) + CHMU_INSTANT_DELTA to keep consistency across instant calculations.

Suggested change
ctx->data.chmu.instant = ull_conn_event_counter(conn) + conn->lll.latency +
CHMU_INSTANT_DELTA;
ctx->data.chmu.instant = ull_conn_event_counter_at_prepare(conn) + CHMU_INSTANT_DELTA;

Copilot uses AI. Check for mistakes.

@cvinayak cvinayak requested a review from Thalley July 8, 2025 07:45
Thalley
Thalley previously approved these changes Jul 8, 2025
Copy link
Contributor

@Thalley Thalley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion, but otherwise LGTM

@cvinayak cvinayak force-pushed the github_llcp_event_counter_fix branch from 5a1f863 to 25d1ea7 Compare July 8, 2025 12:01
@cvinayak cvinayak requested a review from Thalley July 8, 2025 12:01
@cvinayak cvinayak force-pushed the github_llcp_event_counter_fix branch from 25d1ea7 to c85adc3 Compare July 8, 2025 12:18
Thalley
Thalley previously approved these changes Jul 8, 2025
@cvinayak cvinayak added the bug The issue is a bug, or the PR is fixing a bug label Jul 8, 2025
@cvinayak cvinayak added this to the v4.2.0 milestone Jul 8, 2025
Copy link
Contributor

@thoh-ot thoh-ot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need time to look at this.

@cvinayak
Copy link
Contributor Author

cvinayak commented Jul 9, 2025

Need time to look at this.

The issue was discovered running the central_gatt_write and peripheral_gatt_write samples with conn update procedures that was added in #92572. The sample can be run in bsim to reproduce supervision timeout due to incorrect instant being checked when consecutive connection events (prepare requesting preemption) overlap under high throughput scenario.

@thoh-ot
Copy link
Contributor

thoh-ot commented Jul 10, 2025

This PR removes fixes that were made as part of #52822 to stabilize CIS therefore I'm not really fond of it.

ull_conn_event_counter_at_prepare() is just the prepare path (TX path) of ull_conn_event_counter(), so something must be wrong with using ull_ref_get() for the post-prepare path (RX path) of ull_conn_event_counter().

I'm also a bit curious to why the LLCP unit-tests are passing with this change, either they aren't being run or they don't catch this

@cvinayak
Copy link
Contributor Author

This PR removes fixes that were made as part of #52822 to stabilize CIS therefore I'm not really fond of it.

/* Check if this connection event is where we need to start the CIS */

Above is the added fixes/improvement comment, but which can execute anyway inside a connection event anywhere upto 150 us before the next connection event. During the overlap of preemption timeout duration, the ull_ref_get() value used in rp_cc_check_instant() call done insde the event (not a prepare) is in that corner case.

I'm also a bit curious to why the LLCP unit-tests are passing with this change, either they aren't being run or they don't catch this.

I do not think there is a test to cover the fixes/improvements added in #52822 where the call to rp_cc_check_instant() before next event and after prepare unto preempt is tested. The change in #52822 seems like a workaround compared to other control procedures that check instant in prepare and not during PDU reception.

@thoh-ot
Copy link
Contributor

thoh-ot commented Jul 10, 2025

This PR removes fixes that were made as part of #52822 to stabilize CIS therefore I'm not really fond of it.

/* Check if this connection event is where we need to start the CIS */

Above is the added fixes/improvement comment, but which can execute anyway inside a connection event anywhere upto 150 us before the next connection event. During the overlap of preemption timeout duration, the ull_ref_get() value used in rp_cc_check_instant() call done insde the event (not a prepare) is in that corner case.

I'm also a bit curious to why the LLCP unit-tests are passing with this change, either they aren't being run or they don't catch this.

I do not think there is a test to cover the fixes/improvements added in #52822 where the call to rp_cc_check_instant() before next event and after prepare unto preempt is tested. The change in #52822 seems like a workaround compared to other control procedures that check instant in prepare and not during PDU reception.

The original commit has a bit more explaining: 4f5b711

It's to handle the case where the CIS begins in the next event.

Fix LLCP to use event_counter value used during prepare to
compare instant to be the value as when in prepare and not
incorrectly use a stale value (when two radio events
overlap).

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
@cvinayak cvinayak force-pushed the github_llcp_event_counter_fix branch from c85adc3 to 3cad4b1 Compare July 11, 2025 07:27
@cvinayak
Copy link
Contributor Author

It's to handle the case where the CIS begins in the next event.

Correction, i.e it is an improvement to handle cases where CIS begins in the current ACL event (and before next ACL event) where the CIS_IND was received. Without the improvement, the CIS will still be established in the next ACL event after calculating any instant latency and appropriately compensating for elapsed CIS intervals and CIS offsets (hence no CI tests failed with/without this improvement)

I have added back the improvement with renamed function names to bring clarity to when the instant is checked, state machine run at prepare versus on reception of CIS_IND PDU.

Copy link

Copy link
Contributor

@Thalley Thalley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to test this fix? CI passes both before and after

Copy link
Contributor

@thoh-ot thoh-ot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if the usefulness of ull_conn_event_counter() is gone, now we know that the ull_ref_get() logic is flawed...

None the less, I'll approve these changes.

@cvinayak
Copy link
Contributor Author

I'm wondering if the usefulness of ull_conn_event_counter() is gone, now we know that the ull_ref_get() logic is flawed...

None the less, I'll approve these changes.

ull_ref_get() was intended to be used to count references when same state/role events overlap, like ACL connections occupying entire event length, directed advertising and continuous scanning so that state/role can be gracefully disabled when reference count reaches zero; definitely not to identify being inside an event, though it works. Yes, ull_conn_event_counter() is not useful in providing the event_counter, it is better to use the right event_counter value inlined based on where in the timeline the event_counter is needed.

That said, current use of ull_conn_event_counter() to assign instant value in initiating PDUs being in Central role is acceptable for now.

@cvinayak
Copy link
Contributor Author

Is there any way to test this fix? CI passes both before and after

Testable in unit test (otherwise flakky to test with a modified tester Controller implementation under bsim).

This PR is not fixing anything fatal, I can get back to adding unit test after some quiet period!

@cvinayak cvinayak removed this from the v4.2.0 milestone Jul 15, 2025
@nashif nashif removed the bug The issue is a bug, or the PR is fixing a bug label Jul 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants