Skip to content

Fix segfault, added tests for extractLabeledEuclideanClusters #4084

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

clunietp
Copy link

extractLabeledEuclideanClusters would segfault at the following line if the output array labeled_clusters was not properly sized, or was empty:

labeled_clusters[cloud.points[i].label].push_back (r); // We could avoid a copy by working directly in the vector

This PR fixes this issue and adds some basic tests for LabeledEuclideanClusterExtraction

@kunaltyagi
Copy link
Member

Currently, it seems like it's the caller's responsibility to allocate properly based on the number of labels. You're shifting the responsibility to extractLabeledEuclideanClusters so pedantically speaking, it's not a fix, it's an enhancement 😄

Keeping that in mind, if PCL seg-faults on it's own code, would it not make more sense to fix it at the source rather than inside the function?

If it's not caused by PCL, it'd make more sense to pre-allocate based on the max label_cluster idx that can be possibly accessed

const auto& max_label_idx = *std::max_element(cloud.begin(), cloud.end(), [](const auto& a, const auto&b) { return a.label < b.label; });
labeled_clusters.resize(max_label_idx);

@kunaltyagi kunaltyagi added needs: author reply Specify why not closed/merged yet module: segmentation needs: more work Specify why not closed/merged yet labels May 11, 2020
@clunietp
Copy link
Author

Hmm yes I can see how this is an enhancement rather than a fix. I suppose I framed it that way because it seems rather counter-intuitive to have to preallocate the output parameter, but then again I'm new to PCL. At the very least, it would help if this requirement was documented somewhere, since I spent many hours tracking this down. Simply changing the documentation would help with that.

Also, consider the edge case with a single pointcloud with label of std::numeric_limits::max(). You would need to preallocate an array of that size just to store a single cluster, which seems very inefficient to me as opposed to enhancing the PCL code.

I'm happy to make any changes; I'm just trying to prevent others from going through the same pain I've just went through :)

@kunaltyagi
Copy link
Member

Also, consider the edge case with a single pointcloud

Where are the labels coming from? I'm actually not 100% sure that the labels are generated solely by the library.

If they are coming via library, they'd be in order and packed. Else your point stands and using a map-like approach makes sense.

changing the documentation would help with that

You can create another PR to fix that while we discuss improvements here

@clunietp
Copy link
Author

I am generating my points externally, and using the labels for semantic meaning. I want to extract/cluster the labeled points for further processing. In my case, I'm using object classification on 2D images to classify objects (and using the Label for classid with a custom map), and then projecting them to 3D points in space so I can later merge/deconflict/etc with RADAR/LIDAR scans, which are already pointclouds.

At this point I can choose/limit my label values so I don't expect to need more than 30-40 right now (with values <= that amount), and I don't need this PR. I was just trying to smooth out what I perceived to be a rough edge in the library.

Happy to implement a map version if you feel that's more appropriate, it shouldn't take me long

@kunaltyagi
Copy link
Member

Instead of a map implementation, I'd recommend the following:

  • Improve the documentation for the std::vector<std::vector<T>> case
  • Create another API for std::map<label_type, std::vector<T>>
  • Refactor the 2 implementations (perhaps by using boost::variant)

@kunaltyagi kunaltyagi removed the needs: author reply Specify why not closed/merged yet label May 11, 2020
Copy link
Member

@kunaltyagi kunaltyagi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This will significantly change the behavior of extractLabeledEuclideanClusters with std::vector<std::vector<pcl::PointIndices>>, esp wrt

  • speed
  • memory pressure

Could someone please benchmark this with a real-world dataset? IFF the difference is significant, no need to panic. We can refactor the meat-of-the-proposal using a std::variant<std::map<T>, std::vector<T>> (with T = std::vector<pcl::Indices>) and a visitor on operator[]


// get max label
const auto extracted_max_label = std::max_element( labeled_map.begin(), labeled_map.end(),
[]( const pcl::labeled_cluster_map_t::value_type& lhs, const pcl::labeled_cluster_map_t::value_type& rhs ) { return lhs.first < rhs.first; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[]( const pcl::labeled_cluster_map_t::value_type& lhs, const pcl::labeled_cluster_map_t::value_type& rhs ) { return lhs.first < rhs.first; }
[]( const auto& lhs, const auto& rhs ) { return lhs.first < rhs.first; }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any C++11 concerns? Or are we >= C++14 now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PCL requires C++14

}

// move clusters to output vector
// convert pcl::Indices to pcl:PointIndices
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are planning on deprecating the PointIndices. Could you please add an interface for vec-o-vec-o-Indices and move this code to modify the output of call to that interface?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you envision this class having 3 extraction methods? I had assumed 2: the new map method would supercede the vec-o-vec-o-PointIndices. This function exists to provide backcompat. Or I may be misunderstanding

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're touching this, I thought it might be easy to add it now, rather than later when someone is going through the interface updating the API.

That means 3 interfaces:

  • vec-o-vec-Indices: new
  • map-o-vec-Indices: new
  • vec-o-vec-PointIndices: old

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If vec-o-vec-Indices will behave similarly to vec-o-vec-PointIndices , and therefore inherit the memory inefficiencies in the sparse label case, why not just deprecate and replace with the map version?

Actually, in searching for existing usage within PCL, I stumbled across ConditionalEuclideanClustering, which seems like a more generic solution to this grouping problem. Are there any good reasons not to simply deprecate this entire class in favor of ConditionalEuclideanClustering with a trivial label matching function? Or, under the hood, rewrite this class to use ConditionalEuclideanClustering, and then transform the result to comply with existing behavior. Maybe I'm missing something obvious

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrt deprecation and the alternate of rewriting, I'm not a user of these classes. Hopefully others can chime in

// iterator in labeled_clusters for the current label
const auto current_label_cluster_it = labeled_clusters.find( current_label );

// max labels check
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to check for max_labels? The interface had an extra parameter but we don't need to add a potential pitfall for that (esp since it is unused in the existing implementation, with no check or documentation)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to break the interface by removing max_labels, but I didn't want it to be left unimplemented either. So I implemented it. Should it be removed entirely, then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new interface: remove it; in the old interface: keep it. I'll create a PR for deprecating it while adding an overload without the extra parameter.

{
// Search for sq_idx
int ret = tree->radiusSearch (seed_queue[sq_idx], tolerance, nn_indices, nn_distances, std::numeric_limits<int>::max());
pcl::Indices nn_indices = {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulling out the vectors from the loop reduces the total dynamic allocations required

Copy link
Author

@clunietp clunietp May 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I had assumed the compiler would optimize this but a brief googling tells me I am probably wrong. I will allocate outside and clear the array instead (which radiusSearch should take care of, but just in case...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the compiler would optimize this

C++20 brings that (compiler optimize matching pair of new and delete). In any case, it'd not apply here because the allocations/deallocations aren't happening in pairs. If only there was a way to share a buffer in a loop from one iteration to another (without needing to use a PITA allocator)

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/** \brief Map of label and associated clusters
*/
using labeled_cluster_map_t = std::map<std::uint32_t, std::vector<pcl::Indices>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the keys are uint32, it might make sense to either let the user choose the map by templating the function, or use a faster map than std::map.

Pinging @Morwenn for the assist

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any plans to typedef the label_type? Ideally, a label_t or equivalent would go right here.

Also, I'm torn between unordered_map and map, any suggestions? I'm assuming a small n, where n==number of labels, and a map may be slightly better. But I could be wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any plans to typedef the label_type

Disclaimer: Personal opinion: None, though it makes sense.

Comment on lines +157 to +164
if ( current_label_cluster_it == labeled_clusters.end() ) // label not found in map, add
{
std::vector<pcl::Indices> v = {};
v.emplace_back(std::move(seed_queue));
labeled_clusters.emplace(current_label, std::move(v));
}
else // label already exists in map; append seed_queue to label vector
current_label_cluster_it->second.emplace_back(std::move(seed_queue));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ( current_label_cluster_it == labeled_clusters.end() ) // label not found in map, add
{
std::vector<pcl::Indices> v = {};
v.emplace_back(std::move(seed_queue));
labeled_clusters.emplace(current_label, std::move(v));
}
else // label already exists in map; append seed_queue to label vector
current_label_cluster_it->second.emplace_back(std::move(seed_queue));
labeled_clusters[current_label].emplace_back(std::move(seed_queue));

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how I missed that one -- I guess it's been a while since I used map. Thanks!

@kunaltyagi kunaltyagi added the needs: code review Specify why not closed/merged yet label May 15, 2020
@kunaltyagi kunaltyagi added needs: feedback Specify why not closed/merged yet and removed needs: code review Specify why not closed/merged yet labels May 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module: segmentation needs: feedback Specify why not closed/merged yet needs: more work Specify why not closed/merged yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants