Scalability with large datasets? #263
Replies: 7 comments
-
Hi again Alexis.

1: There's been one quite significant performance improvement since then, achieved by removing several unnecessary calls to std::round(). Otherwise, I've been pretty much occupied with numerous minor bugfixes, updating documentation and, just in the last few days, the addition of a new, extremely fast RectClip function.

2: OK. I've just tried this out today and unfortunately it made no impact on performance. clipper.engine.h: … clipper.engine.cpp: …

3: I'm completely ignorant of …
-
Hey Angus,

1: So what's going on? Well, std::round() is specified to be able to raise FE_INEXACT exceptions and to handle other details for abnormal input, and we don't care about any of that. (By the way, I was benchmarking the C++ version with -O3 -ffast-math -march=native -mtune=native -flto; the C port is still roughly 30% faster after all that.)

2: Rather, I'm referring to changes that would turn the whole algorithm from O(n^3) to probably O(n log n). Let's see: ideally, BuildIntersectList() should only test the edges that can possibly intersect at the current scanline. Proposed solution: …

Does that make more sense? I haven't written the code yet because it's a bit complicated, and I'm concerned about unseen interactions... but I think it's a pretty sound solution. Thanks!

EDIT: Oh, and the new RectClip() function is pretty cool :), well done!
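For illustration, a minimal sketch of the rounding shortcut being discussed; the name fast_round is hypothetical, not Clipper2's API. For finite, in-range inputs it matches std::round()'s half-away-from-zero behaviour while compiling to just an add and a truncating cast:

```cpp
#include <cstdint>

// Hypothetical stand-in for std::round(): same half-away-from-zero
// result for finite, in-range inputs, but without the FE_INEXACT /
// abnormal-input obligations, so it reduces to an add plus a
// truncating conversion.  (Values within one ULP of a .5 boundary
// can differ from std::round() in rare cases.)
inline int64_t fast_round(double x)
{
    return static_cast<int64_t>(x < 0.0 ? x - 0.5 : x + 0.5);
}
```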
-
Hi, alexisnaveros and Angus, I am also very interested in the scalability of Clipper2. Is it possible to share some testing datasets for performance measurement? Will the algorithm's complexity fall to O(n^3) in the worst case, when the input subject has n points and n(n-1)/2 edges? BTW, would it help performance to store the active edges in a balanced binary search tree instead of a doubly linked list? Thanks for the discussion.
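To make that last question concrete, here is a rough sketch (with hypothetical Edge fields, not Clipper2's actual structures) of an active edge list kept in a balanced tree ordered by each edge's X at the current scanline. It also shows the catch: the ordering key drifts as the sweep advances, so edges must be erased and re-inserted around every event that can swap their order.

```cpp
#include <set>

// Sketch only: a Vatti active edge list (AEL) stored in a balanced
// tree (std::multiset) instead of a doubly linked list.  Edge fields
// here are illustrative placeholders.
struct Edge {
    double curr_x;   // this edge's X at the current scanline
    // ... remaining Vatti edge fields elided ...
};

struct ByCurrX {
    bool operator()(const Edge* a, const Edge* b) const {
        return a->curr_x < b->curr_x;
    }
};

// Insertion at a local minimum becomes O(log n), but tree order is
// only valid while the comparator's answers stay fixed: any pair of
// edges that swaps order (i.e. an intersection) must be erased and
// re-inserted around that event -- exactly the bookkeeping the
// linked list avoids.
using ActiveEdgeTree = std::multiset<Edge*, ByCurrX>;
```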
-
I just can't see this working. It's not just neighbours that need to be considered. Any newly promoted horizontal (or near-horizontal) line will intersect with any number of edges in the active edge list (ie Vatti's AET). So you'd need some way of checking the inactives and restoring those that will be intersected by edges newly promoted at the top of each scanline. Likewise, whenever an edge is inserted (ie at a local minima) into the AET, you'd again need to check how many inactive edges will be affected by this, and how soon.
-
Right, sorry, the plan outlined above does need some corrections (it's based on my notes/drawings from 3 months ago, so it's not quite fresh). Earlier, I was talking about updating the edges from the "inactive list" just above any new vertex (updating their "scanline value at which the edge could possibly intersect anything" and their placement in the "inactive list" heap). As you correctly pointed out, that's insufficient.

Rather, we could update all the edges of the inactive list that overlap the X range (the horizontal span) defined by the new incoming vertex and its two connected edges. A long horizontal edge could update dozens of edges, although I guess a typical case would be updating 2-3 edges. If we have a red-black tree keeping a sorted list (on X) of all inactive edges, then it's not expensive to find the first edge overlapping that X range, then walk (find-next-right) the tree to find the next edges until we get out of that X span.

I think the same strategy applies to any new active edge: just check the inactive list for overlaps on the given X range, and update. Would that work, or am I (still) missing something?...
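A rough sketch of that lookup, with hypothetical field names (x_left/x_right, wake_y): a tree keyed on each dormant edge's left X, walked rightward across the queried span. The comments flag the one gap this simple key leaves open:

```cpp
#include <cstdint>
#include <map>

// Hypothetical inactive-edge index: a red-black tree (std::multimap
// as a stand-in) keyed on each dormant edge's left X.  Field names
// are illustrative, not Clipper2's.
struct InactiveEdge {
    int64_t x_left, x_right;  // horizontal extent while dormant
    int64_t wake_y;           // scanline at which it could next matter
};

using InactiveIndex = std::multimap<int64_t, InactiveEdge*>;

// Visit the inactive edges overlapping [x_min, x_max] -- e.g. the
// span covered by a new vertex and its two connected edges -- by
// walking rightward from the first candidate key.
template <typename Fn>
void for_each_overlapping(const InactiveIndex& idx,
                          int64_t x_min, int64_t x_max, Fn&& fn)
{
    // Walking from lower_bound(x_min) finds every edge that *starts*
    // inside the span.  Edges starting left of x_min but reaching
    // into it are missed by this simple key, so a full solution
    // would augment the tree (interval tree, or max x_right per
    // subtree) to catch those as well.
    for (auto it = idx.lower_bound(x_min);
         it != idx.end() && it->first <= x_max; ++it)
        fn(it->second);
}
```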
-
I guess it could work, but it would add enormously to code complexity (and the library is already horridly complex). If you have relatively many contours that never intersect during their life spans in the AET (except at bottom and top), then it mightn't be overly complex to archive (make inactive) these edges in the AET.
-
Yeah, it's all fairly complicated. Optimized algorithms for complicated problems tend to get pretty nasty... I think that potential solution is on the right track to improve that O(n^3) runtime, but perhaps we could simplify further or find a better solution. I'm all for exploring the problem further, please don't hesitate to shout any idea you might have!

For now, I'm still using my old code (not Clipper2-based), working with almost a billion vertices. Runtime performance is okay (I think it's O(n log n)-ish), but flexibility is poor, and robustness is just awful (I have special code to handle/repair a gazillion edge cases). On the other hand, Clipper2 is flexible, robust and elegant (you have just one edge case: horizontal edges). It only lacks better scalability.

Frankly, I really want to ditch the old code and switch to Clipper2, even if it's an overcomplicated scalable rewrite that you won't touch with a 3.048 meter pole. ;)
-
Hey people. We had discussed the topic a bit some 3 months ago, and I'm wondering if there have been further thoughts about improving the scalability of Clipper2 for large datasets?
Last I checked, my C port was around 30% faster than the C++ version (it could do better; I purposely didn't implement any optimization that would imply a massive reordering of the algorithm and memory layout, since that would make it harder to import improvements/fixes). But the scalability really has to be improved for the datasets I'm working with...
I could try improving the scalability in my C version. My plan would be to maintain two lists of edges: active and inactive. Only the active edges can possibly intersect the current scanline, so we would only test these in BuildIntersectList(), with edges moving between the two lists as the scan advances. I expect a massive performance gain whenever polygons have thousands/millions/billions of edges. Any thoughts? Is that something you would like to look at?
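As a rough sketch of the bookkeeping (all names here are hypothetical, not Clipper2 internals): inactive edges could wait in a min-heap keyed on the scanline at which they could next intersect anything, and be promoted back into the active list once the sweep reaches that scanline, so BuildIntersectList() only ever walks the active set.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Sketch only -- names are hypothetical, not Clipper2's internals.
struct Edge {
    int64_t wake_y;   // scanline at which this edge could next
                      // intersect anything
    // ... remaining edge fields elided ...
};

struct WakesLater {
    bool operator()(const Edge* a, const Edge* b) const {
        return a->wake_y > b->wake_y;   // min-heap on wake_y
    }
};

struct EdgePartition {
    std::vector<Edge*> active;    // the only edges BuildIntersectList() scans
    std::priority_queue<Edge*, std::vector<Edge*>, WakesLater> inactive;

    // Call once per scanline; assumes the sweep advances with
    // increasing scan_y (flip the comparisons if it runs the other
    // way).  Promotion costs O(log n) per woken edge.
    void promote_due(int64_t scan_y) {
        while (!inactive.empty() && inactive.top()->wake_y <= scan_y) {
            active.push_back(inactive.top());
            inactive.pop();
        }
    }
};
```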
I notice that Area(), used to compute clockwiseness, is still an issue ( #136 ), though I can solve that with Shewchuk summation (a kind of infinite-precision summation). The overhead is tiny compared to the scalability issue.
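For the curious, a compact sketch of the idea: the shoelace sum accumulated with Neumaier-style compensation, a lighter cousin of full Shewchuk expansion arithmetic (a complete treatment would also capture the rounding of each product, e.g. via an FMA-based TwoProduct). Note it must be compiled without -ffast-math, which would let the compiler optimize the correction term away:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct PointD { double x, y; };  // illustrative point type

// Shoelace area with Neumaier compensated summation: each step
// recovers the exact roundoff of (sum + term) into a running
// correction, which is usually enough to stabilise the sign used
// for orientation.  The products themselves still round; a full
// Shewchuk treatment would capture that too.  Positive result means
// counter-clockwise in standard axes.
double compensated_area(const std::vector<PointD>& poly)
{
    double sum = 0.0, comp = 0.0;
    const std::size_t n = poly.size();
    for (std::size_t i = 0, j = n - 1; i < n; j = i++) {
        const double term =
            (poly[j].x + poly[i].x) * (poly[i].y - poly[j].y);
        const double t = sum + term;
        if (std::fabs(sum) >= std::fabs(term))
            comp += (sum - t) + term;   // roundoff lost from term
        else
            comp += (term - t) + sum;   // roundoff lost from sum
        sum = t;
    }
    return 0.5 * (sum + comp);
}
```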
Thanks!