feat(search): lineage search performance #13545
🔴 Meticulous spotted visual differences in 91 of 1358 screens tested: view and approve differences detected. Meticulous evaluated ~8 hours of user flows against your PR. Last updated for commit 3e5c628. This comment will update as new commits are pushed.
Added a couple of comments in the tech spec: https://www.notion.so/acryldata/Tech-Spec-Lineage-Measure-investigate-Lineage-Query-Performance-1f5fc6a64277802abe45f9c1fadbc0c2#203fc6a6427780a39fcfc4dfa30da95b (I needed to comment on the ES query source). @david-leifker
* remove aggregation from limited entities query
* prevent slow responses from large page sizes (prevent 10k queries)
* parallelize upstream/downstream 1-hop search
* implement pagination to remove 1k upstream/downstream limit
* add configuration options to application.yaml
SearchAcrossLineage Performance Optimization:
For complex lineage graphs with high fanout, this optimization delivers performance improvements of >30x, reducing response times from >30s to <1s for standard lineage visualization queries.
Key improvements:
`search_after` pagination to handle large fanout. The previous implementation was constrained to 1k relationships per hop due to Elasticsearch aggregation bucket limits.
Technical details:
The optimization particularly benefits scenarios with:
High-fanout entities (100s to 1000s of relationships)
Multi-entity lineage exploration where limits need to be distributed fairly
Deep lineage traversal requiring pagination beyond the 1k limit
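To make the `search_after` pagination mentioned above concrete, here is a minimal Python sketch. It is not the code in this PR: `iter_hits` and the injected `search` callable are hypothetical names, and the only Elasticsearch behaviour assumed is that each hit carries a `sort` array that can be passed back as `search_after` in the next request body.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport: takes a search request body, returns the raw response dict.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_hits(
    search: SearchFn,
    query: Dict[str, Any],
    sort: List[Dict[str, Any]],
    page_size: int = 1000,
    max_hits: Optional[int] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield hits page by page, resuming each page after the last hit's sort values."""
    body: Dict[str, Any] = {"query": query, "sort": sort, "size": page_size}
    yielded = 0
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        for hit in hits:
            yield hit
            yielded += 1
            if max_hits is not None and yielded >= max_hits:
                return
        if len(hits) < page_size:
            return  # a short page means there is nothing left to fetch
        # Resume strictly after the last hit of this page.
        body = {**body, "search_after": hits[-1]["sort"]}
```

Because each request is bounded by `page_size` and no aggregation is involved, the number of relationships that can be walked per hop is limited only by `max_hits`. The sort should include a unique tiebreaker field so that pages neither overlap nor skip documents.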
Query Optimization:
Our shared query builder can produce deeply nested boolean queries when composed across different components. To address this, we've added optimization logic that simplifies query structure by flattening redundant nesting and converting single-clause boolean operations. Enable this feature with the `ELASTICSEARCH_SEARCH_GRAPH_QUERY_OPTIMIZATION` environment variable. While performance gains are unverified, the resulting queries are notably more readable.
An example query
Before:
After:
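As a rough illustration of the kind of simplification described in this section, the Python sketch below flattens `must`-inside-`must` and `filter`-inside-`filter` nesting and unwraps a `bool` that contains only a single `must` or `should` clause. The function name `simplify` and the exact rewrite rules are assumptions chosen for the example; the PR's actual optimizer may apply different or additional rules.

```python
from typing import Any, Dict, List

Query = Dict[str, Any]
OCCURRENCES = ("must", "filter", "should", "must_not")


def _as_list(clauses: Any) -> List[Query]:
    # Elasticsearch accepts either a single clause or a list of clauses.
    return clauses if isinstance(clauses, list) else [clauses]


def simplify(query: Query) -> Query:
    """Return an equivalent query with redundant bool nesting removed."""
    if set(query) != {"bool"}:
        return query  # leaf query (term, match, ...): nothing to do
    result: Dict[str, Any] = {}
    for key, value in query["bool"].items():
        if key not in OCCURRENCES:
            result[key] = value  # e.g. minimum_should_match, boost
            continue
        flattened: List[Query] = []
        for clause in _as_list(value):
            clause = simplify(clause)
            inner = clause["bool"] if set(clause) == {"bool"} else None
            # Hoist children only when the nested bool uses the same occurrence
            # type and nothing else, so matching semantics are unchanged.
            if key in ("must", "filter") and inner is not None and set(inner) == {key}:
                flattened.extend(_as_list(inner[key]))
            else:
                flattened.append(clause)
        result[key] = flattened
    # A bool holding exactly one must (or one should) clause is just that clause.
    if set(result) in ({"must"}, {"should"}):
        (only_key,) = result
        if len(result[only_key]) == 1:
            return result[only_key][0]
    return {"bool": result}
```

A single `filter` clause is deliberately left wrapped, since unwrapping it would turn a non-scoring clause into a scoring one.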