
ES|QL index resolution on planning is broken #127347

Open
@smalyshev


Currently, ES|QL index resolution works like this:

  1. Take the index pattern, and remove unavailable clusters from it (e.g. from enrich resolution)
  2. If the resulting pattern is empty, then resolution is empty
  3. If we have something to resolve, add the filter, if present, to it and send it all to field-caps, with ignore_unavailable=true (which means unknown indices and other such errors are ignored)
  4. Collect field-caps response. If it's empty, the resolution is invalid (not found).
  5. If resolution is valid:
  6. Find clusters that were in the original pattern but not in the response. For any such cluster, if a concrete index had been requested on it, produce a failure with a Verification error.
  7. Check if there are any unavailable clusters - if those are skippable, mark them as skipped, otherwise it’s a failure.
  8. If it was a CCS search and no clusters are left to search, fail with NoClustersToSearchException
  9. Try running the analysis step. For relations, this checks, for each index pattern, that there is a resolution for that pattern. It does not check individual indices in the pattern, only that the whole pattern resolves to something.
  10. If the analysis step succeeded, we're done. If not, we run the field-caps resolution step again, but this time without the filter.
  11. Then we try to run the analysis step again, using the non-filtered resolution now.
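
The flow above (steps 3-4 and 9-11) can be modeled with a small toy sketch. This is not the Elasticsearch implementation: `INDICES`, `field_caps`, `analyze`, and `resolve` are hypothetical stand-ins, and the field-caps call is reduced to "merge the field maps of every index that matches and passes the filter".

```python
from fnmatch import fnmatch

# Toy cluster state: index name -> {field: type}. All names are hypothetical.
INDICES = {
    "logs-1": {"@timestamp": "date", "message": "text"},
    "logs-2": {"@timestamp": "date", "status": "keyword"},
}


def field_caps(pattern, index_filter=None):
    """Toy stand-in for field-caps with ignore_unavailable=true:
    parts of the pattern that match nothing are silently dropped."""
    merged = {}
    for part in pattern.split(","):
        for name, fields in INDICES.items():
            if fnmatch(name, part) and (index_filter is None or index_filter(name)):
                merged.update(fields)
    return merged


def analyze(query_fields, resolution):
    """Toy analysis step: fails if the whole pattern resolved to nothing
    or if a referenced field is missing from the resolution."""
    if not resolution:
        raise LookupError("unknown index")
    missing = [f for f in query_fields if f not in resolution]
    if missing:
        raise ValueError(f"unknown column {missing}")
    return resolution


def resolve(pattern, query_fields, index_filter=None):
    """Filtered pass first; on analysis failure, retry without the filter."""
    try:
        return analyze(query_fields, field_caps(pattern, index_filter))
    except (LookupError, ValueError):
        return analyze(query_fields, field_caps(pattern))
```

In this model, `resolve("logs-*", ["message"], lambda n: n == "logs-1")` returns only the fields of `logs-1`, while the same filter with `["status"]` falls through to the unfiltered pass and returns fields from both indices. `resolve("logs-1,missing", ["message"])` succeeds even though `missing` does not exist, because only the pattern as a whole is checked.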

This causes various issues:

  1. Indices are treated as a single blob, not as individual ones, which forces us to defer the real index check to runtime and leads to various issues with partial results (ESQL: silently empty result in case of missing index instead of ValidationException #126275) and LIMIT 0 (ES|QL: inconsistent "index not found exception" scenario for "limit 0" queries #114495).
  2. Dual field-caps resolution leads to inconsistencies in the field set when filters are applied - sometimes filtered-out fields are added, sometimes they are not.
  3. The errors for a missing index are inconsistent - sometimes a 400 from a Verification error, sometimes a 404.
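
To illustrate the first point, here is a minimal sketch contrasting a whole-pattern check with a per-part check. It is only an illustration of the difference, not a proposed fix; `KNOWN`, `pattern_resolves`, and `unresolved_parts` are hypothetical names, and wildcard handling is reduced to `*`.

```python
from fnmatch import fnmatch

KNOWN = {"logs-1", "logs-2"}  # hypothetical existing indices


def pattern_resolves(pattern):
    """Blob-style check: true if any part of the pattern matches any index."""
    return any(fnmatch(name, part) for part in pattern.split(",") for name in KNOWN)


def unresolved_parts(pattern):
    """Per-part check: concrete (non-wildcard) names that match no index."""
    return [
        part
        for part in pattern.split(",")
        if "*" not in part and part not in KNOWN
    ]
```

For `"logs-1,missing"`, the blob-style check passes, while the per-part check reports `["missing"]`. A wildcard part that matches nothing, such as `nomatch-*`, is not reported by either check in this sketch.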

Further comment from @dnhatn:

I took a look at the issue and I think there are several problems:

  1. There is a disparity in the indices options between the field-caps API (planning time) and the search-shards API (runtime). We use ALLOW_UNAVAILABLE_TARGETS for field-caps and ERROR_WHEN_UNAVAILABLE_TARGETS for search-shards. This leads to cases where field-caps does not return failures, but the runtime does. With allow_partial_results, we then ignore the runtime failures and return partial results instead of failing the request.

  2. We do not strictly check the index failures returned by the field-caps API.

  3. Another issue is related to security exceptions. Since we use ALLOW_UNAVAILABLE_TARGETS in the field-caps API, it reports an index as unknown if the user lacks the privilege to access it. However, if multiple index patterns are specified, we return an unauthorized error from the runtime instead (see EsqlSecurityIT).

  4. There are cases where we return a 400 error, and others where we return a 404.
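
The first point can be sketched as a toy model in which planning resolves targets leniently and runtime resolves them strictly. Only the option names ALLOW_UNAVAILABLE_TARGETS, ERROR_WHEN_UNAVAILABLE_TARGETS, and allow_partial_results come from the comment above; `KNOWN`, `IndexNotFound`, `resolve_targets`, `run_query`, and the shape of the returned dictionary are assumptions made for illustration.

```python
KNOWN = {"logs-1"}  # hypothetical existing indices


class IndexNotFound(Exception):
    """Toy stand-in for an index-not-found failure."""


def resolve_targets(names, allow_unavailable):
    """Return the known targets; raise on unknown ones unless lenient."""
    found = [n for n in names if n in KNOWN]
    missing = [n for n in names if n not in KNOWN]
    if missing and not allow_unavailable:
        raise IndexNotFound(", ".join(missing))
    return found


def run_query(names, allow_partial_results):
    # Planning: lenient, modeled on ALLOW_UNAVAILABLE_TARGETS for field-caps.
    planned = resolve_targets(names, allow_unavailable=True)
    # Runtime: strict, modeled on ERROR_WHEN_UNAVAILABLE_TARGETS for search-shards.
    try:
        targets = resolve_targets(names, allow_unavailable=False)
        return {"targets": targets, "is_partial": False}
    except IndexNotFound:
        if not allow_partial_results:
            raise
        # The runtime failure is swallowed and a partial result is returned.
        return {"targets": planned, "is_partial": True}
```

In this model, `run_query(["logs-1", "missing"], allow_partial_results=True)` returns a partial result over `logs-1` instead of failing, whereas the same call with `allow_partial_results=False` raises `IndexNotFound` only at runtime, after planning has already succeeded.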

Metadata

Labels

:Analytics/ES|QL (AKA ESQL), >bug, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), medium-risk (An open issue or test failure that is a medium risk to future releases)
