Add ability to preempt execution and add test #684


Merged
merged 18 commits into moveit:ros2 from MarqRazz:preempt_execution
May 29, 2025

Conversation

MarqRazz

This PR is an update of #363 and adds a test to ensure it is working as expected.

Additional changes I made to make this possible:

  • Added the ability to skip starting RViz when running demo.launch.py (my test uses this launch file and needs RViz disabled)
  • Added a persistent node and action client to Task
    Extended testing of my motion system found that recreating the node on each execution request can cause a segfault after many hours of running. I am continuing to test this and will update with any results, but if you feel this is out of scope I can move this feature to a different PR. Here is a link to the upstream issue with cyclonedds and changes they have made.
[my_motion_node-9] my_motion_node: ./src/core/ddsc/src/dds_handles.c:430: dds_handle_drop_childref_and_pin: Assertion `(cf & HDL_REFCOUNT_MASK) > 0' failed.
[my_motion_node-9] Stack trace (most recent call last) in thread 686402:
[my_motion_node-9] #27   Object "/usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1", at 0xffffffffffffffff, in 
[my_motion_node-9] #26   Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0xffffb3adba4b, in 
[my_motion_node-9] #25   Object "/usr/lib/aarch64-linux-gnu/libc.so.6", at 0xffffb3a7595b, in 
[my_motion_node-9] #24   Object "/usr/lib/aarch64-linux-gnu/libstdc++.so.6.0.33", at 0xffffb3d81adf, in 
...
[my_motion_node-9] #19   Object "/root/ws/install/my_motion_node", at 0xaaaad172b93b, in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release_last_use_cold()
[my_motion_node-9] #18   Object "/root/ws/install/my_motion_node", at 0xaaaad176fff7, in std::_Sp_counted_deleter<rclcpp_action::Client<moveit_task_constructor_msgs::action::ExecuteTaskSolution>*, rclcpp_action::create_client<moveit_task_constructor_msgs::action::ExecuteTaskSolution>(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>, std::shared_ptr<rclcpp::node_interfaces::NodeGraphInterface>, std::shared_ptr<rclcpp::node_interfaces::NodeLoggingInterface>, std::shared_ptr<rclcpp::node_interfaces::NodeWaitablesInterface>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<rclcpp::CallbackGroup>, rcl_action_client_options_s const&)::{lambda(rclcpp_action::Client<moveit_task_constructor_msgs::action::ExecuteTaskSolution>*)#1}, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose()
[my_motion_node-9] #17   Object "/root/ws/install/my_motion_node", at 0xaaaad176f58f, in rclcpp_action::create_client<moveit_task_constructor_msgs::action::ExecuteTaskSolution>(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>, std::shared_ptr<rclcpp::node_interfaces::NodeGraphInterface>, std::shared_ptr<rclcpp::node_interfaces::NodeLoggingInterface>, std::shared_ptr<rclcpp::node_interfaces::NodeWaitablesInterface>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<rclcpp::CallbackGroup>, rcl_action_client_options_s const&)::{lambda(rclcpp_action::Client<moveit_task_constructor_msgs::action::ExecuteTaskSolution>*)#1}::operator()(rclcpp_action::Client<moveit_task_constructor_msgs::action::ExecuteTaskSolution>*) const

@MarqRazz MarqRazz requested review from v4hn and rhaschke May 23, 2025 17:59
Contributor

@rhaschke rhaschke left a comment


The original review comments (#363 (review)) still apply.
It would be much more reasonable to provide access to the ActionServer:

auto as = task.getExecutionActionServer();
auto goal_handle = as->async_send_goal(solution);  // convert the solution to a goal and send it to the action server
...

Additionally, I have a few other remarks:

  • Your approach to reuse demo.launch.py from moveit_task_constructor_demo doesn't work as this is a downstream package of moveit_task_constructor_capabilities
  • You don't use callback groups as in #363. Isn't that needed?
  • Parts of the code are overly complex. I will commit some simplifications.

@MarqRazz
Author

Thanks for the simplifications and updates Robert!

It would be much more reasonable to provide access to the ActionServer:

The problem with this approach is that the user would need access to the ActionClient and node to properly manage the request. I did not think it was a good approach to expose both of those objects to the user's program, but if you have other ideas on how they could properly be managed I'm open to suggestions. For example, here is how execute is currently waiting for the result:

while (result_future.wait_for(std::chrono::milliseconds(10)) != std::future_status::ready) {
	...
	rclcpp::spin_some(execute_solution_node_);
}

I also looked into having the user pass in their own node to execute but it means that they would be responsible for spinning things such that the client request could complete correctly.

The user can always create their own ActionClient and node to handle requests as they see best if they need fine control over execution. Given that execute() is an exposed feature of Task, I feel this is an easy and robust way to make it feature complete. In my use case MTC is wrapped in an ActionServer, and having a single call to preempt planning or execution simplifies my setup: I no longer need to know which state it is in, and things are guaranteed to be properly stopped.

Your approach to reuse demo.launch.py from moveit_task_constructor_demo doesn't work as this is a downstream package of moveit_task_constructor_capabilities

I can copy the launch file into the execution test or move it over to the demo package, do you have a preference on how I fix this?

You don't use callback groups as in #363. Isn't that needed?

I did not find that the ActionServer needs a Reentrant callback group in order to cancel the request.

@rhaschke
Contributor

It would be much more reasonable to provide access to the ActionServer:

The problem with this approach is that the user would need access to the ActionClient and node

Agreed. Given that situation, handling the cancellation internally in Task seems reasonable.
I have one other idea to avoid the busy loop: One could simply cancel the execution goal when calling Task::preempt().
This would require saving the latest goal_handle.

I can copy the launch file into the execution test or move it over to the demo package, do you have a preference on how I fix this?

Could you simply reuse core/test/test.launch.py?

@MarqRazz
Author

I have one other idea to avoid the busy loop: One could simply cancel the execution goal when calling Task::preempt().
This would require saving the latest goal_handle.

I'm still looking into this but wouldn't it still require a busy loop blocking execute from returning until motion or cancel is complete?

@rhaschke
Contributor

rhaschke commented May 27, 2025

No. I think you would simply use the existing code:
rclcpp::spin_until_future_complete(node, result_future)

@MarqRazz
Author

So sorry, I thought I hit quote and it edited your post 😊.

spin can only be called on a node once so we would need a busy loop in preempt to wait for the cancel request to complete. If you are okay with that I can get the change implemented.

@rhaschke
Contributor

spin can only be called on a node once

Really? Is that documented somewhere? What happens if one spins a node from several threads simultaneously?

@MarqRazz
Author

MarqRazz commented May 27, 2025

I was actually thinking of calling spin multiple times on an executor. For example, the single_threaded_executor throws an exception if you do this.

I tried saving the goal handle and moved the cancel call to preempt and my test segfaults if I wait for the cancel result.

void Task::preempt() {
	pimpl()->preempt_requested_ = true;
	if (execute_goal_handle_) {
		auto cancel_future = execute_ac_->async_cancel_goal(execute_goal_handle_);
		if (rclcpp::spin_until_future_complete(execute_solution_node_, cancel_future) !=
		    rclcpp::FutureReturnCode::SUCCESS) {
			RCLCPP_ERROR(execute_solution_node_->get_logger(), "Could not preempt execution");
		}
	}
}

To fix this I tried to make sure the thread was joinable but it still segfaults so I suspect something is going out of scope when cancel and rclcpp::spin_until_future_complete in execute attempt to return at the same time.

1: [test_execution-6]          55: 	// cancel the trajectory half way through the expected execution time
1: [test_execution-6]          56: 	rclcpp::sleep_for(exec_duration.to_chrono<std::chrono::milliseconds>() / 2);
1: [test_execution-6] #6  | Source "/usr/include/c++/13/bits/std_thread.h", line 173, in __terminate
1: [test_execution-6]     |   171:     {
1: [test_execution-6]     |   172:       if (joinable())
1: [test_execution-6]     | > 173: 	std::__terminate();
1: [test_execution-6]     |   174:     }
1: [test_execution-6]       Source "/usr/include/x86_64-linux-gnu/c++/13/bits/c++config.h", line 322, in ~thread [0x5821e663a802]
1: [test_execution-6]         319:   inline void __terminate() _GLIBCXX_USE_NOEXCEPT
1: [test_execution-6]         320:   {
1: [test_execution-6]         321:     void terminate() _GLIBCXX_USE_NOEXCEPT __attribute__ ((__noreturn__));
1: [test_execution-6]       > 322:     terminate();
1: [test_execution-6]         323:   }
1: [test_execution-6]         324: #pragma GCC visibility pop
1: [test_execution-6]         325: }
1: [test_execution-6] #5    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.33", at 0x7d6ba97cda54, in std::terminate()
1: [test_execution-6] #4    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.33", at 0x7d6ba97e30d9, in 
1: [test_execution-6] #3    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.33", at 0x7d6ba97cdff4, in 
1: [test_execution-6] #2    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7d6ba953e8fe, in abort
1: [test_execution-6] #1    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7d6ba955b27d, in raise
1: [test_execution-6] #0    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7d6ba95b4b2c, in pthread_kill
1: [test_execution-6] Aborted (Signal sent by tkill() 58817 0)

Removing waiting for the cancel result in preempt works as expected. Does this sound like a better approach than manually spinning and calling cancel from execute?

@rhaschke
Contributor

Why does move_group crash on shutdown? This shouldn't happen.

@rhaschke
Contributor

Why does move_group crash on shutdown? This shouldn't happen.

This crash is flaky. With the asan build move_group seems to crash even before running the test binary.

@MarqRazz
Author

Why does move_group crash on shutdown? This shouldn't happen.

move_group has been flaky at shutting down for me for quite a while. Here is the crash I normally see.

[move_group-5] #10   Source "/root/my_ws/src/moveit2/moveit_ros/planning/planning_scene_monitor/src/planning_scene_monitor.cpp", line 549, in scenePublishingThread [0x7d8db9c1afaf]
[move_group-5]         546:       planning_scene_publisher_->publish(msg);
[move_group-5]         547:       if (is_full)
[move_group-5]         548:         RCLCPP_DEBUG(logger_, "Published full planning scene: '%s'", msg.name.c_str());
[move_group-5]       > 549:       rate.sleep();
[move_group-5]         550:     }
[move_group-5] #9    Object "/opt/ros/jazzy/lib/librclcpp.so", at 0x7d8db914c480, in rclcpp::Rate::sleep()
[move_group-5] #8    Object "/opt/ros/jazzy/lib/librclcpp.so", at 0x7d8db90a0238, in rclcpp::Clock::sleep_for(rclcpp::Duration, std::shared_ptr<rclcpp::Context>)

@MarqRazz
Author

This crash is flaky. With the asan build move_group seems to crash even before running the test binary.

I pulled moveit/moveit2:rolling-source and built this branch, but move_group always runs successfully. I'm still working on re-creating the ASan environment so that I can debug what is going on. It looks like it might be an issue loading the ExecuteTaskSolutionCapability, which didn't change here 🤔

[move_group-3] #21   Object "/root/ws_moveit/install/moveit_ros_move_group/lib/moveit_ros_move_group/move_group", at 0x5556cb5dd3ed, in move_group::MoveGroupExe::configureCapabilities()
[move_group-3] #20   Object "/root/ws_moveit/install/moveit_ros_move_group/lib/moveit_ros_move_group/move_group", at 0x5556cb5cc020, in pluginlib::ClassLoader<move_group::MoveGroupCapability>::createUniqueInstance(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
[move_group-3] #19   Object "/root/ws_moveit/install/moveit_ros_move_group/lib/moveit_ros_move_group/move_group", at 0x5556cb5bf4ff, in pluginlib::ClassLoader<move_group::MoveGroupCapability>::loadLibraryForClass(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
...
[move_group-3] #13   Object "/opt/ros/rolling/lib/librcutils.so", at 0x7fca7070d95c, in rcutils_load_shared_library
[move_group-3] #12   Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7fca702db25e, in dlopen

rhaschke added 2 commits May 28, 2025 18:04
If planning fails, we cannot access the solution...

codecov bot commented May 28, 2025

Codecov Report

Attention: Patch coverage is 73.91304% with 6 lines in your changes missing coverage. Please review.

Project coverage is 49.69%. Comparing base (4de8d23) to head (3fc04e3).
Report is 142 commits behind head on ros2.

Files with missing lines Patch % Lines
core/src/task.cpp 73.92% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             ros2     #684      +/-   ##
==========================================
+ Coverage   42.47%   49.69%   +7.22%     
==========================================
  Files          83       99      +16     
  Lines        8086     8967     +881     
==========================================
+ Hits         3434     4455    +1021     
+ Misses       4652     4512     -140     


@MarqRazz
Author

@rhaschke I was able to reproduce the issue locally by pulling moveit/moveit2:rolling-source, making a separate workspace for MTC, and building only MTC with --cmake-args -DCMAKE_CXX_FLAGS=-fsanitize=address.

The plugin failing to load when built with -fsanitize=address strongly suggests an ABI incompatibility or a runtime linking issue between the plugin compiled with AddressSanitizer and other parts of the ROS 2 system (like class_loader, move_group itself, or underlying libraries) that were not compiled with it.

From my research, if you link or dynamically load modules where some are instrumented and some are not, you can run into crashes, especially around memory allocation/deallocation or when the sanitizer's runtime tries to manage state across these boundaries.

Do we have access to a docker container with MoveIt also built with --cmake-args -DCMAKE_CXX_FLAGS=-fsanitize=address?
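For reference, one way to avoid the mixed-instrumentation boundary is to build both MoveIt and MTC in a single workspace with ASan enabled. This is a hedged build-configuration sketch, not a command from the PR; exact flags, mixin availability, and package names depend on the workspace:

```shell
# Hypothetical: instrument the whole workspace so no ASan/non-ASan plugin
# boundary exists. If colcon mixins are installed, the asan-gcc mixin does
# this in one flag:
colcon build --mixin asan-gcc

# Otherwise pass the sanitizer flags explicitly (compile AND link):
colcon build --cmake-args -DCMAKE_CXX_FLAGS=-fsanitize=address \
             -DCMAKE_EXE_LINKER_FLAGS=-fsanitize=address

# Leak checking can be silenced at test time if startup noise drowns out
# the report being investigated:
ASAN_OPTIONS=detect_leaks=0 colcon test
```

Loading an ASan-instrumented plugin into an uninstrumented host via dlopen is a known-problematic configuration, which fits the pluginlib/dlopen frames in the backtrace above.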

@rhaschke rhaschke merged commit 584197a into moveit:ros2 May 29, 2025
5 checks passed
MarqRazz added a commit to MarqRazz/moveit_task_constructor that referenced this pull request May 29, 2025
@MarqRazz MarqRazz deleted the preempt_execution branch May 29, 2025 20:19