Construct TF listeners passing nodes, spinning on separate thread #5406

roncapat · 2025-07-30T17:36:11Z

See #1182 - this is a re-assessment after 6 years.
Advantages: easier auditing of TF2 subscriptions across the ROS graph. The nav2 node(s) name(s) will appear for example when issuing ros2 topic info -v /tf.

Basic Info

Info	Please fill out this column
Ticket(s) this addresses	#1182
Primary OS tested on	Ubuntu
Robotic platform tested on	proprietary simulation & HW
Does this PR contain AI generated software?	No
Was this PR description generated by AI software?	No

Description of contribution in a few bullet points

Construct tf2_ros::TransformListener instances by passing a node pointer, so that no additional nodes with randomized names are spawned on the ROS graph (less pollution, better auditing), while enabling the spin_thread flag so that TF subscriptions are not interleaved with other nav2-related callbacks in the same executor.
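To illustrate why a dedicated spin thread keeps TF callbacks from being interleaved with (or stalled by) other callbacks, here is a toy model using only the Python standard library. It is a sketch, not the tf2_ros or rclcpp implementation: `MiniExecutor`, `slow_callback` and `on_tf` are hypothetical names introduced purely for illustration.

```python
import queue
import threading


class MiniExecutor:
    """Hypothetical single-threaded executor: runs queued callbacks in order."""

    def __init__(self) -> None:
        self._work = queue.Queue()

    def submit(self, callback) -> None:
        self._work.put(callback)

    def stop(self) -> None:
        self._work.put(None)  # sentinel that ends spin()

    def spin(self) -> None:
        while True:
            callback = self._work.get()
            if callback is None:
                return
            callback()


main_executor = MiniExecutor()  # stands in for the node's own executor
tf_executor = MiniExecutor()    # stands in for the listener's dedicated one

release_main = threading.Event()
tf_handled = threading.Event()
received = []


def slow_callback() -> None:
    # Simulates a long-running callback that occupies the main executor.
    release_main.wait(timeout=5.0)


def on_tf() -> None:
    received.append("tf")
    tf_handled.set()


main_executor.submit(slow_callback)
tf_executor.submit(on_tf)

main_thread = threading.Thread(target=main_executor.spin)
tf_thread = threading.Thread(target=tf_executor.spin)
main_thread.start()
tf_thread.start()

# The TF callback completes before the slow callback has been released.
handled_while_blocked = tf_handled.wait(timeout=5.0) and not release_main.is_set()

release_main.set()
main_executor.stop()
tf_executor.stop()
main_thread.join()
tf_thread.join()
print(handled_while_blocked, received)
```

If both callbacks were submitted to the same `MiniExecutor`, `on_tf` would only run after `slow_callback` returned, which is the interleaving the description wants to avoid.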

For Maintainers:

Check that any new parameters added are updated in docs.nav2.org
Check that any significant change is added to the migration guide
Check that any new features OR changes to existing behaviors are reflected in the tuning guide
Check that any new functions have Doxygen added
Check that any new features have test coverage
Check that any new plugins are added to the plugins page
If BT Node, Additionally: add to BT's XML index of nodes for groot, BT package's readme table, and BT library lists
Should this be backported to current distributions? If so, tag with backport-*.

nav2_bt_navigator/src/bt_navigator.cpp

mergify · 2025-07-30T18:38:06Z

@roncapat, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

nav2_amcl/src/amcl_node.cpp

SteveMacenski · 2025-07-31T19:02:17Z

Pull in / rebase on main once #5409 is merged to get CI to turn over. Sorry about that, we hit the wall with circleci limits and needing to load balance the builds.

Also make sure to sign off with DCO.

nav2_amcl/src/amcl_node.cpp

nav2_behaviors/src/behavior_server.cpp

Signed-off-by: Patrick Roncagliolo <ronca.pat@gmail.com>

SteveMacenski · 2025-08-01T17:13:53Z

... huh. A lot of tests failed. I'm rerunning, but if they fail again, I think this introduces a regression.

roncapat · 2025-08-01T17:17:05Z

Also noticed just now... I will be able to investigate as early as next week, sorry. It will be interesting to see what the cause is, since in my real testing scenario it works impressively well. Hope to learn something and have a fix!

SteveMacenski · 2025-08-01T17:19:46Z

> impressively well

How so? big perf boost? I wouldn't have expected that

roncapat · 2025-08-01T17:32:59Z

Nah I mean more like "without surprises" - but I am 99% sure that by using the node, it will benefit from enabled IPC on the /tf subscribers.

About two years ago I pushed some patches for IPC in the TransformListener; I need to check again under which conditions it gets enabled.

SteveMacenski · 2025-08-01T18:22:34Z

yeah this is still failing completely - I think there's something awry here. I sampled 2 of the 16 tests and the lifecycle transition never completes while it's waiting for a transform to be available (which seems awfully related, so I don't think it's a CI fluke)

Signed-off-by: Patrick Roncagliolo <ronca.pat@gmail.com>

roncapat · 2025-08-03T19:56:13Z

I began to study tf2_ros::TransformListener more deeply.

What I assessed, basically, is that current nav2 code uses the constructor:
TransformListener(tf2::BufferCore & buffer, bool spin_thread = true, bool static_only = false)
Notice the default spin_thread = true.

That is to say, the "only" difference introduced in this PR is the node used by the TransformListener implementation, not the spinning logic, which is kept exactly the same.

Of course, here we are passing a LifecycleNode, from which the TransformListener constructor will extract a set of NodeInterfaces. I will probably focus in the upcoming days on possible subtle implications of this - for example, whether the LifecycleNode's current state could influence the correct working of TransformListener.

Moreover, it seems the only place where we have problems in the tests is costmap_2d_ros. Reverting the modifications for that node alone makes all the tests pass. This may be a hint that it uses the TransformListener in a specific way that triggers the issue, unlike the other use cases, which restricts the "search area".

Will update you if I discover something more in the upcoming days.

roncapat · 2025-08-03T20:12:24Z

Ok, I may have understood the issue.
costmap_2d_ros expects TF to be received during the on_activate call.

Per https://design.ros2.org/articles/node_lifecycle.html, in the inactive state "...the node will not receive any execution time to read topics, perform processing of data, respond to functional service requests, etc."

Two options:

explicitly create a non-lifecycle node and pass it to the TransformListener -> node name can be customized***
defer the canTransform call to the "active" state of costmap_2d_ros

*** https://github.com/ros2/geometry2/blob/2b1742c80a4e91a411e5798eec78573928391a7c/tf2_ros/src/transform_listener.cpp#L46-L56

My personal take (a bit philosophical, take this with a grain of salt):

I fully understand why canTransform is called there, but I think it reveals a (minor) flaw in the choice of adopting a LifecycleNode-based architecture: we expect to receive something during the on_activate transition. This currently works because that responsibility is deferred to a "classic" node hidden inside TransformListener.
This also reveals that, when the costmap node is inactive, the listener is still receiving /tf messages - while being 100% strict, in principle, it should be "disabled" (not receiving) too.

SteveMacenski · 2025-08-04T19:58:46Z

Mhm, I don't think we can activate until we have all the inputs required to actually be able to process something. Unlike other things like having services available from other nodes that we can make sure are available intrinsically by the ordering of lifecycle transitions, the setting of the robot's initial pose is a user-application defined task (or SLAM if running SLAM) that we cannot know is completed without checking.

> defer the canTransform call to the "active" state of costmap_2d_ros

We could move it to the already-active state, but then requests could be submitted without actually being processable. For now, I think we should leave this as-is, though the design can be reopened. I suppose we could have a timer, or possibly the update map thread, check for this transform and apply a similar delay after activation. That would complicate the implementation a bit, but nothing terrible. My biggest concern there is that we have a timeout feature for waiting for that transform. If we cannot return a failure on a state transition when that timeout is exceeded, then the server ends up in an unrecoverable state. If we have some ideas around that, I wouldn't object to a redesign of this handling.
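The timeout concern above can be sketched as a bounded polling loop whose result decides whether the transition succeeds. This is a minimal illustration in plain Python, not the costmap_2d_ros code: `wait_for_transform`, `on_activate`, and the `"SUCCESS"`/`"FAILURE"` strings are hypothetical stand-ins, and `can_transform` is any zero-argument callable supplied by the caller.

```python
import time
from typing import Callable


def wait_for_transform(can_transform: Callable[[], bool],
                       timeout_s: float,
                       poll_s: float = 0.01) -> bool:
    """Poll can_transform() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if can_transform():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def on_activate(can_transform: Callable[[], bool], timeout_s: float) -> str:
    """Hypothetical transition callback: report failure if the wait times out."""
    if not wait_for_transform(can_transform, timeout_s):
        return "FAILURE"  # the caller can clean up and retry the transition
    return "SUCCESS"


# Usage: a transform that becomes available on the third poll, and one that never does.
polls = {"count": 0}


def available_on_third_poll() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


print(on_activate(available_on_third_poll, timeout_s=1.0))
print(on_activate(lambda: False, timeout_s=0.05))
```

Because the wait happens inside the transition, a timeout can be reported as a failed transition. A check moved into a timer or the update thread after activation would need some other way to surface that failure, which is the design question raised here.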

> This also reveals that, when the costmap node is inactive, the listener is still receiving /tf messages - while being 100% strict, in principle, it should be "disabled" (not receiving) too.

TF is not lifecycle enabled, so that's no surprise. This isn't doing 'work' or given 'execution time' on the application though so I think that's fine. The lifecycle transition quote you gave from the design document I think is talking about the work in the transition function to block the completion of transition. While perhaps TF could technically do some work given a message, the transition isn't dependent on it, so that's fine. How we use TF to block for the available transform however does break that principle. But your point is understood. If we wanted to be aggressively pure on Lifecycle Nodes, there are many ROS libraries that would need to have activate/deactivate functions enabled.

Anyway, but why does change with using the spinning thread and node not work? I'm a little unclear as to that, since there is no lifecycle subscription for the subscription within TF to not be processing. The spin thread should be creating its own executor spun in its own thread as well so that should be all working independently, from first glance.

roncapat · 2025-08-05T08:52:42Z

> Anyway, but why does change with using the spinning thread and node not work? I'm a little unclear as to that, since there is no lifecycle subscription for the subscription within TF to not be processing. The spin thread should be creating its own executor spun in its own thread as well so that should be all working independently, from first glance.

I may have misunderstood the design document (at least the part I quoted), but it seems to me that since we are using the Lifecycle Node interfaces to create the subscription inside TransformListener, that subscription also will not receive any execution time to read topics. I don't understand how yet; I will study the rclcpp_lifecycle code further.

SteveMacenski · 2025-08-05T16:47:49Z

I think this has more to do with the TF code spinning w.r.t. the main node. Maybe some print statements would help clarify. I think we should understand the 'why' before we merge, but once we do I'm happy to merge, assuming we don't find it's just hiding something buggy (or we find that this change is actually buggy and costmap2D is the only place showing the problem to us immediately)

roncapat · 2025-08-05T19:35:41Z

I agree!
I think I have found the issue. Took this screenshot while running some failing tests:

/tf and /tf_static get namespaced!

Will check a simple way to avoid this.

EDIT 1:
I tried to add remappings like

    rclcpp::NodeOptions().arguments({
    "--ros-args", "-r", std::string("__ns:=") + nav2_util::add_namespaces(parent_namespace, local_namespace),
    "--ros-args", "-r", nav2_util::add_namespaces(parent_namespace, local_namespace) + "/tf:=/tf",
    "--ros-args", "-r", nav2_util::add_namespaces(parent_namespace, local_namespace) + "/tf_static:=/tf_static",
    "--ros-args", "-r", "tf:=/tf",
    "--ros-args", "-r", "tf_static:=/tf_static",
    "--ros-args", "-p", "use_sim_time:=" + std::string(use_sim_time ? "true" : "false"),

in costmap_2d_ros.cpp but they won't work.
Apparently the issue lies in the nav2_system_test launchfile test_error_codes_launch.py, where:

    remappings = [('/tf', 'tf'), ('/tf_static', 'tf_static')]

is found, like in many nav2_bringup files.
Emptying that list will do the trick. Of course, it is not the right solution.
It seems that remappings passed from the CLI forcefully override any hardcoded ones.
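As a rough illustration of how a rule like `('/tf', 'tf')` can end up namespacing the topic, here is a deliberately simplified model of name expansion and first-match remapping in plain Python. It is an assumption-laden sketch, not the rcl implementation: `expand` and `resolve` are hypothetical helpers, and the `/local_costmap` namespace is only an example value.

```python
def expand(name: str, namespace: str) -> str:
    """Expand a topic name to a fully qualified one (simplified model)."""
    if name.startswith("/"):
        return name  # absolute names ignore the node namespace
    return namespace.rstrip("/") + "/" + name  # relative names get prefixed


def resolve(topic: str, namespace: str, remappings: list) -> str:
    """Apply the first matching (match, replacement) rule, then expand it."""
    fq_topic = expand(topic, namespace)
    for match, replacement in remappings:
        if expand(match, namespace) == fq_topic:
            # The replacement is relative, so it is expanded in the namespace
            # of the node that creates the subscription.
            return expand(replacement, namespace)
    return fq_topic


launch_remappings = [("/tf", "tf"), ("/tf_static", "tf_static")]

# A subscription to the absolute name "/tf" created through a namespaced node.
namespaced = resolve("/tf", "/local_costmap", launch_remappings)
# The same request with an empty remapping list stays on the global topic.
plain = resolve("/tf", "/local_costmap", [])
print(namespaced, plain)  # prints "/local_costmap/tf /tf"
```

In this model the launch-file rule is harmless for a node in the root namespace (`tf` expands back to `/tf`), but it moves the subscription under the namespace as soon as the listener is created through a namespaced node, which matches the behaviour reported above when the list is emptied.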

mergify · 2025-08-05T21:03:25Z

@roncapat, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

roncapat commented Jul 30, 2025

nav2_bt_navigator/src/bt_navigator.cpp (outdated)

SteveMacenski requested changes Jul 31, 2025

nav2_amcl/src/amcl_node.cpp (outdated)

SteveMacenski reviewed Jul 31, 2025

nav2_amcl/src/amcl_node.cpp (outdated)

roncapat commented Aug 1, 2025

nav2_behaviors/src/behavior_server.cpp (outdated)

Construct TF listeners passing nodes, spinning on separate thread

8cfda62

Signed-off-by: Patrick Roncagliolo <ronca.pat@gmail.com>

roncapat force-pushed the patch-2 branch from 0e3881b to 8cfda62 Compare August 1, 2025 12:31

SteveMacenski approved these changes Aug 1, 2025


(tentative) pin down of the impacting change

2db0a74

Signed-off-by: Patrick Roncagliolo <ronca.pat@gmail.com>

ros-navigation deleted a comment from claude bot Aug 5, 2025

WIP: check if number of failing tests decreases

ea90890

SteveMacenski mentioned this pull request Aug 5, 2025

Following server open-navigation/opennav_docking#43

Draft
