ES|QL index resolution on planning is broken #127347

Open
smalyshev opened this issue Apr 24, 2025 · 2 comments
Labels
:Analytics/ES|QL AKA ESQL >bug medium-risk An open issue or test failure that is a medium risk to future releases Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

smalyshev commented Apr 24, 2025

Currently, ES|QL index resolution works like this:

  1. Take the index pattern, and remove unavailable clusters from it (e.g. from enrich resolution)
  2. If the resulting pattern is empty, then resolution is empty
  3. If we have something to resolve, add the filter, if present, to it and send it all to field-caps, with ignore_unavailable=true (which means unknown indices and other such errors are ignored)
  4. Collect field-caps response. If it's empty, the resolution is invalid (not found).
  5. If resolution is valid:
  6. Find clusters that were in the original pattern but not in the response. For any such clusters, if any concrete index had been requested, produce failure with Verification error.
  7. Check if there are any unavailable clusters - if those are skippable, mark them as skipped, otherwise it’s a failure.
  8. If it was a CCS search and no clusters are left to search, fail with NoClustersToSearchException
  9. Try running the analysis step. For relations, this checks, for each index pattern, that there is a resolution with this index pattern. It does not check individual indices in the pattern, only that the whole pattern resolves to something.
  10. If the analysis step succeeded, we're done. If not, we run the field-caps resolution step again, but this time without the filter.
  11. Then we try to run the analysis step again, using the non-filtered resolution now.
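
The two-pass flow in steps 3 and 9-11 can be sketched with a toy model. This is only an illustration under stated assumptions, not the Elasticsearch implementation: `CLUSTER`, `field_caps`, `analyze` and `plan` are hypothetical names, the filter is modelled as a simple predicate on index names, and the analysis step is assumed to fail when a referenced field is absent from the resolved field set.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical toy cluster: index name -> field names.
CLUSTER: Dict[str, Set[str]] = {
    "logs-1": {"@timestamp", "message"},
    "logs-2": {"@timestamp", "message", "status"},
}

Resolution = Dict[str, Set[str]]


def field_caps(pattern: str,
               index_filter: Optional[Callable[[str], bool]] = None) -> Resolution:
    """Toy stand-in for field-caps with ignore_unavailable=true:
    unknown indices are dropped silently instead of raising."""
    resolved: Resolution = {}
    for name in pattern.split(","):
        fields = CLUSTER.get(name)
        if fields is None:
            continue  # lenient: a missing index is not an error here
        if index_filter is not None and not index_filter(name):
            continue  # the filter excludes this index
        resolved[name] = fields
    return resolved


def analyze(resolution: Resolution, needed_fields: Set[str]) -> bool:
    """Toy analysis (assumption): the pattern as a whole must resolve to
    something, and every referenced field must exist somewhere in it."""
    if not resolution:
        return False
    available = set().union(*resolution.values())
    return needed_fields <= available


def plan(pattern: str,
         index_filter: Optional[Callable[[str], bool]],
         needed_fields: Set[str]) -> Tuple[str, Resolution]:
    """First pass with the filter, second pass without it if analysis fails."""
    filtered = field_caps(pattern, index_filter)
    if analyze(filtered, needed_fields):
        return "filtered", filtered
    unfiltered = field_caps(pattern)
    if analyze(unfiltered, needed_fields):
        return "unfiltered", unfiltered
    raise LookupError(f"cannot resolve {pattern!r}")
```

In this model, `plan("logs-1,no-such-index", None, {"message"})` succeeds on the first pass even though one concrete index does not exist, because only the pattern as a whole is checked.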

This causes various issues:

  1. Indices are treated as a single blob, not as individual ones, which forces us to defer the real index check to runtime and leads to various issues with partial results (ESQL: silently empty result in case of missing index instead of ValidationException #126275) and LIMIT 0 (ES|QL: inconsistent "index not found exception" scenario for "limit 0" queries #114495).
  2. Dual field-caps resolution leads to inconsistencies in the field set when filters are applied: sometimes filtered-out fields are added, sometimes they are not.
  3. The errors for a missing index are inconsistent: sometimes a 400 from a Verification error, sometimes a 404.
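
To make the first point concrete, here is a small sketch contrasting a whole-pattern check with a per-index check. The function names are hypothetical, wildcard handling is reduced to "contains `*`", and remote-cluster prefixes are ignored. It is meant only to show what the blob-style check cannot see, not to suggest how a fix should be implemented.

```python
from typing import Iterable, List


def resolves_as_blob(resolved: Iterable[str]) -> bool:
    """Whole-pattern check: passes as soon as anything at all resolved."""
    return any(True for _ in resolved)


def missing_concrete_indices(pattern: str, resolved: Iterable[str]) -> List[str]:
    """Per-index check: list concrete (non-wildcard) names that did not resolve."""
    resolved_set = set(resolved)
    return [
        name
        for name in pattern.split(",")
        if "*" not in name and name not in resolved_set
    ]
```

For the pattern `logs-1,no-such-index,metrics-*` resolving to `logs-1` and `metrics-a`, the blob check passes while the per-index check reports `no-such-index`.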

Further comment from @dnhatn:

I took a look at the issue and I think there are several problems:

  1. There is a disparity in the indices option between the field-caps API (planning time) and the search-shards API (runtime). We use ALLOW_UNAVAILABLE_TARGETS for field-caps and ERROR_WHEN_UNAVAILABLE_TARGETS for search-shards. This leads to cases where field-caps does not return failures, but the runtime does. With allow_partial_results, we then ignore the runtime failures and return partial results instead of failing the request.

  2. We do not strictly check the index failures returned by the field-caps API.

  3. Another issue is related to security exceptions. Since we use ALLOW_UNAVAILABLE_TARGETS in the field-caps API, it reports an index as unknown when the user lacks the privilege to access it. However, if multiple index patterns are specified, we return an unauthorized error from the runtime instead (see EsqlSecurityIT).

  4. There are cases where we return a 400 error, and others where we return a 404.
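
The disparity described in the first point can be illustrated with a toy model. This is a simplified sketch, not the actual field-caps or search-shards code: `resolve` and `run_query` are hypothetical, and the two option sets are reduced to a single boolean (lenient for ALLOW_UNAVAILABLE_TARGETS, strict for ERROR_WHEN_UNAVAILABLE_TARGETS).

```python
from typing import Dict, List, Set


class IndexNotFound(Exception):
    """Toy stand-in for a runtime index-not-found failure."""


def resolve(names: List[str], existing: Set[str], allow_unavailable: bool) -> List[str]:
    """Resolve names against existing indices, leniently or strictly."""
    missing = [n for n in names if n not in existing]
    if missing and not allow_unavailable:
        raise IndexNotFound(", ".join(missing))
    return [n for n in names if n in existing]


def run_query(names: List[str], existing: Set[str],
              allow_partial_results: bool) -> Dict[str, object]:
    # Planning time: lenient, so a missing index produces no failure.
    planned = resolve(names, existing, allow_unavailable=True)
    try:
        # Runtime: strict, so the same missing index now fails.
        targets = resolve(names, existing, allow_unavailable=False)
        return {"indices": targets, "is_partial": False}
    except IndexNotFound:
        if allow_partial_results:
            # The runtime failure is swallowed and partial results come back.
            return {"indices": planned, "is_partial": True}
        raise
```

With `allow_partial_results=True`, a query over `["a", "b"]` where only `a` exists returns partial results rather than failing, which is the behaviour the comment describes.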

@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Apr 24, 2025
@smalyshev smalyshev added :Analytics/ES|QL AKA ESQL >bug medium-risk An open issue or test failure that is a medium risk to future releases and removed needs:triage Requires assignment of a team area label labels Apr 24, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Apr 24, 2025
@elasticsearchmachine

Pinging @elastic/es-analytical-engine (Team:Analytics)

@smalyshev

/cc @dnhatn @astefan
