Scalability with large datasets? #263
Replies: 7 comments
-
Hi again Alexis.

1: There's been one quite significant performance improvement since then, achieved by removing several unnecessary calls to std::round(). Otherwise, I've been pretty much occupied with numerous minor bugfixes, updating documentation and, just in the last few days, the addition of a new, extremely fast RectClip function.

2: OK. I've just tried this out today and unfortunately it made no impact on performance. clipper.engine.h: … clipper.engine.cpp: …

3: I'm completely ignorant of …
-
Hey Angus,

1: So what's going on? Well, std::round() is specified to be able to raise FE_INEXACT exceptions and to handle other details for abnormal input, and we don't care about any of that. (By the way, I was benchmarking the C++ version with -O3 -ffast-math -march=native -mtune=native -flto; the C port is still roughly 30% faster after all that.)

2: Rather, I'm referring to changes that would turn the whole algorithm from O(n^3) to probably O(n log n). Let's see: ideally, BuildIntersectList() should only test the edges that can possibly intersect at the current scanline. Proposed solution: …

Does that make more sense? I haven't written the code yet because it's a bit complicated, and I'm concerned about unseen interactions... but I think it's a pretty sound solution. Thanks!

EDIT: Oh, and the new RectClip() function is pretty cool :), well done!
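For illustration, a minimal sketch of the rounding shortcut being discussed; the name fast_round is hypothetical, not Clipper2's API. For finite, in-range inputs it matches std::round()'s half-away-from-zero behaviour while compiling to just an add and a truncating cast:

```cpp
#include <cstdint>

// Hypothetical stand-in for std::round(): same half-away-from-zero
// result for finite, in-range inputs, but without the FE_INEXACT /
// abnormal-input obligations, so it reduces to an add plus a
// truncating conversion.  (Values within one ULP of a .5 boundary
// can differ from std::round() in rare cases.)
inline int64_t fast_round(double x)
{
    return static_cast<int64_t>(x < 0.0 ? x - 0.5 : x + 0.5);
}
```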
-
Hi, alexisnaveros and Angus, I am also very interested in the scalability of Clipper2. Is it possible to share some testing datasets for performance measurement? Will the algorithm's complexity fall to O(n^3) in the worst case, when the input subject has n points and n(n-1)/2 edges? BTW, would it help performance to store the active edges in a balanced binary search tree instead of a doubly linked list? Thanks for the discussion.
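To make that last question concrete, here is a rough sketch (with hypothetical Edge fields, not Clipper2's actual structures) of an active edge list kept in a balanced tree ordered by each edge's X at the current scanline. It also shows the catch: the ordering key drifts as the sweep advances, so edges must be erased and re-inserted around every event that can swap their order.

```cpp
#include <set>

// Sketch only: a Vatti active edge list (AEL) stored in a balanced
// tree (std::multiset) instead of a doubly linked list.  Edge fields
// here are illustrative placeholders.
struct Edge {
    double curr_x;   // this edge's X at the current scanline
    // ... remaining Vatti edge fields elided ...
};

struct ByCurrX {
    bool operator()(const Edge* a, const Edge* b) const {
        return a->curr_x < b->curr_x;
    }
};

// Insertion at a local minimum becomes O(log n), but tree order is
// only valid while the comparator's answers stay fixed: any pair of
// edges that swaps order (i.e. an intersection) must be erased and
// re-inserted around that event -- exactly the bookkeeping the
// linked list avoids.
using ActiveEdgeTree = std::multiset<Edge*, ByCurrX>;
```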
-
I just can't see this working. It's not just neighbours that need to be considered. Any newly promoted horizontal (or near-horizontal) line will intersect with any number of edges in the active edge list (ie Vatti's AET). So you'd need some way of checking the inactives and restoring those that will be intersected by edges newly promoted at the top of each scanline. Likewise, whenever an edge is inserted (ie at a local minima) into the AET, you'd again need to check how many inactive edges will be affected by this, and how soon.
-
Right, sorry, the plan outlined above does need some corrections (it's based on my notes/drawings from 3 months ago, so it's not quite fresh). Earlier, I was talking about updating the edges from the "inactive list" just above any new vertex (updating their "scanline value at which the edge could possibly intersect anything" and their placement in the "inactive list" heap). As you correctly pointed out, that's insufficient.

Rather, we could update all the edges of the inactive list that overlap the X range (the horizontal span) defined by the new incoming vertex and its two connected edges. A long horizontal edge could update dozens of edges, although I guess a typical case would be updating 2-3 edges. If we have a red-black tree keeping a sorted list (on X) of all inactive edges, then it's not expensive to find the first edge overlapping that X range, then walk (find-next-right) the tree to find the next edges until we get out of that X span.

I think the same strategy applies to any new active edge: just check the inactive list for overlaps on the given X range, and update. Would that work, or am I (still) missing something?...
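A rough sketch of that lookup, with hypothetical field names (x_left/x_right, wake_y): a tree keyed on each dormant edge's left X, walked rightward across the queried span. The comments flag the one gap this simple key leaves open:

```cpp
#include <cstdint>
#include <map>

// Hypothetical inactive-edge index: a red-black tree (std::multimap
// as a stand-in) keyed on each dormant edge's left X.  Field names
// are illustrative, not Clipper2's.
struct InactiveEdge {
    int64_t x_left, x_right;  // horizontal extent while dormant
    int64_t wake_y;           // scanline at which it could next matter
};

using InactiveIndex = std::multimap<int64_t, InactiveEdge*>;

// Visit the inactive edges overlapping [x_min, x_max] -- e.g. the
// span covered by a new vertex and its two connected edges -- by
// walking rightward from the first candidate key.
template <typename Fn>
void for_each_overlapping(const InactiveIndex& idx,
                          int64_t x_min, int64_t x_max, Fn&& fn)
{
    // Walking from lower_bound(x_min) finds every edge that *starts*
    // inside the span.  Edges starting left of x_min but reaching
    // into it are missed by this simple key, so a full solution
    // would augment the tree (interval tree, or max x_right per
    // subtree) to catch those as well.
    for (auto it = idx.lower_bound(x_min);
         it != idx.end() && it->first <= x_max; ++it)
        fn(it->second);
}
```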
-
I guess it could work, but it would add enormously to code complexity (and the library is already horridly complex). If you have relatively many contours that never intersect during their life spans in the AET (except at bottom and top), then it mightn't be overly complex to archive (make inactive) these edges in the AET.
-
Yeah, it's all fairly complicated. Optimized algorithms for complicated problems tend to get pretty nasty... I think that potential solution is on the right track to improve that O(n^3) runtime, but perhaps we could simplify further or find a better solution. I'm all for exploring the problem further, please don't hesitate to shout any idea you might have!

For now, I'm still using my old code (not Clipper2-based), working with almost a billion vertices. Runtime performance is okay (I think it's O(n log n)-ish), but flexibility is poor, and robustness is just awful (I have special code to handle/repair a gazillion edge cases). On the other hand, Clipper2 is flexible, robust and elegant (you have just one edge case: horizontal edges). It only lacks better scalability.

Frankly, I really want to ditch the old code and switch to Clipper2, even if it's an overcomplicated scalable rewrite that you won't touch with a 3.048 meter pole. ;)
-
Hey people. We had discussed the topic a bit some 3 months ago, and I'm wondering if there have been further thoughts about improving the scalability of Clipper2 for large datasets?
Last I checked, my C port was around 30% faster than the C++ version (it could do better; I purposely didn't implement any optimization that would imply a massive reordering of the algorithm and memory layout, since that would make it harder to import improvements/fixes). But the scalability really has to be improved for the datasets I'm working with...
I could try improving the scalability in my C version. My plan would be to maintain two lists of edges: active and inactive. Only the active edges can possibly intersect the current scanline, so we would only test these in BuildIntersectList(), with edges moving between the two lists as the scan advances. I expect a massive performance gain whenever polygons have thousands/millions/billions of edges. Any thoughts? Is that something you would like to look at?
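As a rough sketch of the bookkeeping (all names here are hypothetical, not Clipper2 internals): inactive edges could wait in a min-heap keyed on the scanline at which they could next intersect anything, and be promoted back into the active list once the sweep reaches that scanline, so BuildIntersectList() only ever walks the active set.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Sketch only -- names are hypothetical, not Clipper2's internals.
struct Edge {
    int64_t wake_y;   // scanline at which this edge could next
                      // intersect anything
    // ... remaining edge fields elided ...
};

struct WakesLater {
    bool operator()(const Edge* a, const Edge* b) const {
        return a->wake_y > b->wake_y;   // min-heap on wake_y
    }
};

struct EdgePartition {
    std::vector<Edge*> active;    // the only edges BuildIntersectList() scans
    std::priority_queue<Edge*, std::vector<Edge*>, WakesLater> inactive;

    // Call once per scanline; assumes the sweep advances with
    // increasing scan_y (flip the comparisons if it runs the other
    // way).  Promotion costs O(log n) per woken edge.
    void promote_due(int64_t scan_y) {
        while (!inactive.empty() && inactive.top()->wake_y <= scan_y) {
            active.push_back(inactive.top());
            inactive.pop();
        }
    }
};
```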
I notice that Area(), used to compute clockwiseness, is still an issue ( #136 ), though I can solve that with Shewchuk summation (a kind of infinite-precision summation). The overhead is tiny compared to the scalability issue.
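For the curious, a compact sketch of the idea: the shoelace sum accumulated with Neumaier-style compensation, a lighter cousin of full Shewchuk expansion arithmetic (a complete treatment would also capture the rounding of each product, e.g. via an FMA-based TwoProduct). Note it must be compiled without -ffast-math, which would let the compiler optimize the correction term away:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct PointD { double x, y; };  // illustrative point type

// Shoelace area with Neumaier compensated summation: each step
// recovers the exact roundoff of (sum + term) into a running
// correction, which is usually enough to stabilise the sign used
// for orientation.  The products themselves still round; a full
// Shewchuk treatment would capture that too.  Positive result means
// counter-clockwise in standard axes.
double compensated_area(const std::vector<PointD>& poly)
{
    double sum = 0.0, comp = 0.0;
    const std::size_t n = poly.size();
    for (std::size_t i = 0, j = n - 1; i < n; j = i++) {
        const double term =
            (poly[j].x + poly[i].x) * (poly[i].y - poly[j].y);
        const double t = sum + term;
        if (std::fabs(sum) >= std::fabs(term))
            comp += (sum - t) + term;   // roundoff lost from term
        else
            comp += (term - t) + sum;   // roundoff lost from sum
        sum = t;
    }
    return 0.5 * (sum + comp);
}
```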
Thanks!