-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Construct TF listeners passing nodes, spinning on separate thread #5406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@roncapat, your PR has failed to build. Please check CI outputs and resolve issues. |
Pull in / rebase on main once #5409 is merged to get CI to turn over. Sorry about that, we hit the wall with circleci limits and needing to load balance the builds. Also makes sure to sign off with DCO |
Signed-off-by: Patrick Roncagliolo <ronca.pat@gmail.com>
... huh. Alot of tests failed. I'm rerunning but if they fail again, I think this introduces a regression |
Also noticed right now... I will be able to investigate as early as next week, sorry. Will be interesting to see what is the cause, since in my real testing scenario it works impressively well. Hope to learn something and have a fix! |
How so? big perf boost? I wouldn't have expected that |
Nah I mean more like "without surprises" - but I am 99% sure that by using the node, it will benefit from enabled IPC on the /tf subscribers. It has been maybe two years or so, I have pushed in the past some patches for IPC in the TransformListener, need to check again in which conditions it gets enabled or not. |
yeah this is still failing completely - I think there's something awry here. I sampled 2 of the 16 tests and the lifecycle transition never completes while its waiting for a transform to be available (which seems awfully related, so I don't think its a CI fluke) |
Signed-off-by: Patrick Roncagliolo <ronca.pat@gmail.com>
I began to study deeper the What I assessed, basically, is that current nav2 code uses the constructor: This is to say, the "only" difference introduced in this PR is the node used by Of course, here we are passing a Moreover, it seems the only place where we have problems in the tests is the Will update you if I discover something more in the upcoming days. |
Ok, I may have undestood the issue. Per https://design.ros2.org/articles/node_lifecycle.html, in the Two options:
My personal take (a bit phylosophic, take this with a grain of salt): I fully understand why |
Mhm, I don't think we can activate until we have all the inputs required to actually be able to process something. Unlike other things like having services available from other nodes that we can make sure are available intrinsically by the ordering of lifecycle transitions, the setting of the robot's initial pose is a user-application defined task (or SLAM if running SLAM) that we cannot know is completed without checking.
We could move it to the already actived state, but then requests are able to be submitted without actually being processable. At the moment, I think we should leave this as-is but can be reopened designwise. I suppose we could have a timer or possibly in the update map thread check for this transform and have the similar delay after activation. That would complicate the implementation a bit, but nothing terrible. My biggest concern there is that we have a timeout feature for waiting for that transform. If we cannot return a failure on a state transition when that timeout is exceeded, then the server becomes in an unrecoverable state. If we have some ideas around that, I wouldn't object to a redesign of this handling.
TF is not lifecycle enabled, so that's no surprise. This isn't doing 'work' or given 'execution time' on the application though so I think that's fine. The lifecycle transition quote you gave from the design document I think is talking about the work in the transition function to block the completion of transition. While perhaps TF could technically do some work given a message, the transition isn't dependent on it, so that's fine. How we use TF to block for the available transform however does break that principle. But your point is understood. If we wanted to be aggressively pure on Lifecycle Nodes, there are many ROS libraries that would need to have activate/deactivate functions enabled. Anyway, but why does change with using the spinning thread and node not work? I'm a little unclear as to that, since there is no lifecycle subscription for the subscription within TF to not be processing. The spin thread should be creating its own executor spun in its own thread as well so that should be all working independently, from first glance. |
I may have misunderstood the design document (at least the part I quoted), but it seems to me that since we are using the Lifecycle Node interfaces to create the subscription inside |
I think this has more to with the TF code spinning w.r.t. the main node. Maybe some print statements would help clarify. I think we should understand the 'why' before we merge, but once we do I'm happy to merge assuming we don't find its just hiding something buggy (or we find that this change is actually buggy and costmap2D is the only place showing the problem to us immediately) |
@roncapat, your PR has failed to build. Please check CI outputs and resolve issues. |
See #1182 - this is a re-assessment after 6 years.
Advantages: easier auditing of TF2 subscriptions across the ROS graph. The nav2 node(s) name(s) will appear for example when issuing
ros2 topic info -v /tf
.Basic Info
Description of contribution in a few bullet points
Construct
tf2_ros::TransformListener
instances passing a node pointer, so that no additional nodes with randomized names are spawned on the ROS graph (less pollution, better auditing), but enabling thespin_thread
flag, so that we ensure TF subscriptions are not interleaved with other nav2-related callbacks in the same executor.For Maintainers:
backport-*
.