Commit 90fd9ca

Added support for searching a large number of indices (#412)
**Description:**

When searching the catalog with the `/search` endpoint, a `GET /<indices>/_search` request is sent with all indices listed in the URL path. When such a search covers a large number of indices, the request line can exceed Elasticsearch's maximum allowed HTTP line length (4096 bytes), resulting in the following error:

`{"code":"RequestError","description":"RequestError(400, 'too_long_http_line_exception', 'An HTTP line is larger than 4096 bytes.')"}`

This commit moves the indices from the URL path into the request body once the index string grows past a threshold (`ES_MAX_URL_LENGTH - 300`). The indices in the path are replaced by `ITEM_INDICES`, and the query gets a `terms` filter on the requested collection ids. Because the query still restricts results to the requested collections, this change preserves the behavior while avoiding the URL length limit.

**PR Checklist:**

- [x] Code is formatted and linted (run `pre-commit run --all-files`)
- [x] Tests pass (run `make test`)
- [ ] Documentation has been updated to reflect changes, if applicable
- [x] Changes are added to the changelog

---------

Co-authored-by: Stijn Caerts <stijn.caerts@vito.be>
1 parent a0b77cb commit 90fd9ca

4 files changed: +53 additions, -2 deletions


CHANGELOG.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -12,6 +12,10 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 
 - Added the ability to set timeout for Opensearch and Elasticsearch clients by setting the environmental variable `ES_TIMEOUT` [#408](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/408)
 
+### Changed
+
+- Updated collection to index logic to support searching a large amount of indices [#412](https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch/pull/412)
+
 ## [v6.0.0] - 2025-06-22
 
 ### Added
```

stac_fastapi/elasticsearch/stac_fastapi/elasticsearch/database_logic.py

Lines changed: 7 additions & 0 deletions
```diff
@@ -43,6 +43,10 @@
     return_date,
     validate_refresh,
 )
+from stac_fastapi.sfeos_helpers.database.query import (
+    ES_MAX_URL_LENGTH,
+    add_collections_to_body,
+)
 from stac_fastapi.sfeos_helpers.database.utils import (
     merge_to_operations,
     operations_to_script,
@@ -520,6 +524,9 @@ async def execute_search(
         query = search.query.to_dict() if search.query else None
 
         index_param = indices(collection_ids)
+        if len(index_param) > ES_MAX_URL_LENGTH - 300:
+            index_param = ITEM_INDICES
+            query = add_collections_to_body(collection_ids, query)
 
         max_result_window = MAX_LIMIT
 
```
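The guard added to `execute_search` compares the length of the index string destined for the URL path against `ES_MAX_URL_LENGTH - 300`, presumably keeping 300 characters of headroom for the rest of the request line (method, `/_search`, query parameters). The standalone sketch below isolates that decision. The `items_` naming scheme, the `index_name` helper, and the `"items_*"` value for `ITEM_INDICES` are hypothetical placeholders, not the project's real index names.

```python
from typing import List, Optional, Tuple

ES_MAX_URL_LENGTH = 4096  # value the commit adds to sfeos_helpers.database.query
URL_HEADROOM = 300  # margin the commit subtracts in its threshold check
ITEM_INDICES = "items_*"  # hypothetical placeholder for the real ITEM_INDICES constant


def index_name(collection_id: str) -> str:
    """Hypothetical per-collection index naming scheme."""
    return f"items_{collection_id}"


def choose_index_param(
    collection_ids: List[str],
) -> Tuple[str, Optional[List[str]]]:
    """Return the index string for the URL path and, when the path would be
    too long, the collection ids that must instead be filtered in the body."""
    if not collection_ids:
        # No collections requested: search every item index.
        return ITEM_INDICES, None
    index_param = ",".join(index_name(cid) for cid in collection_ids)
    if len(index_param) > ES_MAX_URL_LENGTH - URL_HEADROOM:
        # Too long for the request line: target all item indices and let a
        # terms filter in the query body restrict hits to these collections.
        return ITEM_INDICES, collection_ids
    return index_param, None
```

For instance, `choose_index_param(["a", "b"])` returns `("items_a,items_b", None)`, while 500 ids of the form `collection-0000` join into a string of roughly 11,000 characters and fall back to `("items_*", ids)`. Note that `len()` counts characters rather than bytes; for ASCII collection ids the two agree.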

stac_fastapi/opensearch/stac_fastapi/opensearch/database_logic.py

Lines changed: 10 additions & 2 deletions
```diff
@@ -42,6 +42,10 @@
     return_date,
     validate_refresh,
 )
+from stac_fastapi.sfeos_helpers.database.query import (
+    ES_MAX_URL_LENGTH,
+    add_collections_to_body,
+)
 from stac_fastapi.sfeos_helpers.database.utils import (
     merge_to_operations,
     operations_to_script,
@@ -532,6 +536,12 @@ async def execute_search(
         """
         search_body: Dict[str, Any] = {}
         query = search.query.to_dict() if search.query else None
+
+        index_param = indices(collection_ids)
+        if len(index_param) > ES_MAX_URL_LENGTH - 300:
+            index_param = ITEM_INDICES
+            query = add_collections_to_body(collection_ids, query)
+
         if query:
             search_body["query"] = query
 
@@ -544,8 +554,6 @@ async def execute_search(
 
         search_body["sort"] = sort if sort else DEFAULT_SORT
 
-        index_param = indices(collection_ids)
-
         max_result_window = MAX_LIMIT
 
         size_limit = min(limit + 1, max_result_window)
```
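In the OpenSearch file, the commit also moves `index_param = indices(collection_ids)` (together with the new guard) above `if query: search_body["query"] = query`. The placement matters whenever the incoming `query` is `None`: `add_collections_to_body` then returns a new dict instead of mutating an existing one, so rewriting the query after the body is assembled would drop the collection filter. The sketch below contrasts both orderings; `with_collection_filter` and `build_body` are hypothetical simplifications, not project code.

```python
from typing import Any, Dict, List, Optional


def with_collection_filter(
    collection_ids: List[str], query: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Hypothetical stand-in for add_collections_to_body (bool queries only)."""
    query = {} if query is None else query  # None yields a brand-new dict
    filters = query.setdefault("bool", {}).setdefault("filter", [])
    filters.append({"terms": {"collection": collection_ids}})
    return query


def build_body(
    collection_ids: List[str],
    query: Optional[Dict[str, Any]],
    rewrite_first: bool,
) -> Dict[str, Any]:
    """Assemble a search body, rewriting the query before or after copying it in."""
    search_body: Dict[str, Any] = {}
    if rewrite_first:
        # Order used after this commit: rewrite the query, then copy it into the body.
        query = with_collection_filter(collection_ids, query)
    if query:
        search_body["query"] = query
    if not rewrite_first:
        # Rewriting afterwards only reaches search_body through in-place mutation
        # of an existing dict; a new dict created for a None query is discarded.
        with_collection_filter(collection_ids, query)
    return search_body
```

With `query=None`, only the rewrite-first ordering produces a body containing the `terms` filter; with an existing bool query, both orderings happen to agree because the dict is mutated in place.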

stac_fastapi/sfeos_helpers/stac_fastapi/sfeos_helpers/database/query.py

Lines changed: 32 additions & 0 deletions
```diff
@@ -7,6 +7,8 @@
 
 from stac_fastapi.sfeos_helpers.mappings import Geometry
 
+ES_MAX_URL_LENGTH = 4096
+
 
 def apply_free_text_filter_shared(
     search: Any, free_text_queries: Optional[List[str]]
@@ -83,3 +85,33 @@ def populate_sort_shared(sortby: List) -> Optional[Dict[str, Dict[str, str]]]:
         return {s.field: {"order": s.direction} for s in sortby}
     else:
         return None
+
+
+def add_collections_to_body(
+    collection_ids: List[str], query: Optional[Dict[str, Any]]
+) -> Dict[str, Any]:
+    """Add a list of collection ids to the body of a query.
+
+    Args:
+        collection_ids (List[str]): A list of collections ids.
+        query (Optional[Dict[str, Any]]): The query to add collections to. If none, create a query that filters
+            the collection ids.
+
+    Returns:
+        Dict[str, Any]: A query that contains a filter on the given collection ids.
+
+    Notes:
+        This function is needed in the execute_search function when the size of the URL path will exceed the maximum of ES.
+    """
+    index_filter = {"terms": {"collection": collection_ids}}
+    if query is None:
+        query = {"query": {}}
+    if "bool" not in query:
+        query["bool"] = {}
+    if "filter" not in query["bool"]:
+        query["bool"]["filter"] = []
+
+    filters = query["bool"]["filter"]
+    if index_filter not in filters:
+        filters.append(index_filter)
+    return query
```
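`add_collections_to_body` makes sure the query carries a `terms` filter on `collection`, so that searching the broader `ITEM_INDICES` still returns only items from the requested collections. The standalone variant below is a sketch, not the committed code, and differs in three ways: a missing query starts from an empty dict (the committed helper seeds it with `{"query": {}}`, which leaves a stray `query` key next to `bool` once the result is placed under `search_body["query"]`, and that does not form a valid query clause); a non-bool query such as `match_all` is wrapped in `bool.must` so the filter can sit beside it; and a `filter` given as a single object is normalised to a list.

```python
from typing import Any, Dict, List, Optional


def add_collections_to_body(
    collection_ids: List[str], query: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return `query` with a bool filter restricting hits to `collection_ids`.

    Sketch: an absent query becomes a bare bool query, and a non-bool query
    is wrapped in bool.must before the filter is added.
    """
    index_filter = {"terms": {"collection": collection_ids}}
    if query is None:
        # No existing query: start from an empty dict so the result is a bare bool query.
        query = {}
    elif query and "bool" not in query:
        # Wrap a non-bool query (e.g. match_all) so the filter can sit beside it.
        query = {"bool": {"must": [query]}}

    bool_clause = query.setdefault("bool", {})
    filters = bool_clause.setdefault("filter", [])
    if isinstance(filters, dict):
        # A single filter clause may be written as an object; normalise it to a list.
        filters = [filters]
        bool_clause["filter"] = filters

    # Append only once, so repeated calls do not duplicate the terms filter.
    if index_filter not in filters:
        filters.append(index_filter)
    return query
```

As in the committed helper, an existing bool query is mutated in place, but the wrapping branch returns a new dict, so callers should always use the return value, as `execute_search` does with `query = add_collections_to_body(collection_ids, query)`.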
