The implementation of json inverted index #41624

YolandaLyj · 2025-04-30T02:37:29Z

I see two JSON inverted index implementations: JsonInvertedIndex and JsonKeyStatsInvertedIndex.

What are the differences between these two indexes?

yhmo · 2025-04-30T03:10:02Z

yhmo
Apr 30, 2025
Collaborator

JsonInvertedIndex is for this feature:
#35528

JsonKeyStatsInvertedIndex is for this enhancement:
#36995

For a JSON field, only one index type is supported: INVERTED. JsonInvertedIndex implements this index; it is a new index type that users can choose.
JsonKeyStatsInvertedIndex is not a feature but an enhancement that optimizes filtered search/query on JSON fields. I believe it is an "internal optimization", not a feature that users can select directly.

7 replies

YolandaLyj Apr 30, 2025
Author

https://milvus.io/docs/zh/use-json-fields.md

yhmo Apr 30, 2025
Collaborator

I don't know the implementation details. I can ask for a design doc if you are interested.

YolandaLyj Apr 30, 2025
Author

Thank you very much! Looking forward to your reply.

czs007 Apr 30, 2025
Maintainer

@YolandaLyj A comprehensive design document will be added to the doc directory of the Milvus repository later. Below is a summary of the differences between the two.

Two-Level JSON Indexing Strategy for Query Optimization

Key Presence Index (jsonKeyStats)
Structure: Maintains inverted lists tracking:
  Which rows contain each JSON key
  Byte positions (start, length) of corresponding values in raw JSON strings
Optimization Benefits:
  Scan Reduction: For sparse keys (e.g., appearing in 1% of 1M rows), reduces scanned rows from 1M → ~10K
  Partial Parsing: Directly extracts target values using recorded byte ranges without full JSON unmarshaling
Ideal For: Sparse key queries (existence checks + value comparisons)
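
To make the key presence idea concrete, here is a minimal Python sketch. It is not the Milvus implementation (which is not shown in this thread); the names `build_key_stats` and `rows_where` are hypothetical, and it assumes flat, well-formed, ASCII-only JSON objects so that character offsets equal byte offsets. It records, per key, which rows contain it and where the value sits in the raw string, then answers a filter by touching only those rows and parsing only the recorded slice.

```python
import json
from collections import defaultdict

_decoder = json.JSONDecoder()


def _skip_ws(s, i):
    """Advance i past JSON whitespace."""
    while i < len(s) and s[i] in " \t\r\n":
        i += 1
    return i


def _index_row(raw, row_id, index):
    """Record (row_id, start, length) for each top-level key of one object."""
    i = _skip_ws(raw, 0)
    assert raw[i] == "{", "sketch only handles JSON objects"
    i = _skip_ws(raw, i + 1)
    while raw[i] != "}":
        key, i = _decoder.raw_decode(raw, i)      # parse the key string
        i = _skip_ws(raw, i)
        assert raw[i] == ":"
        start = _skip_ws(raw, i + 1)
        _, end = _decoder.raw_decode(raw, start)  # parse value to find its end
        index[key].append((row_id, start, end - start))
        i = _skip_ws(raw, end)
        if raw[i] == ",":
            i = _skip_ws(raw, i + 1)


def build_key_stats(rows):
    """Hypothetical helper: key -> list of (row_id, start, length)."""
    index = defaultdict(list)
    for row_id, raw in enumerate(rows):
        _index_row(raw, row_id, index)
    return index


def rows_where(rows, index, key, predicate):
    """Visit only rows that contain `key`; parse only the value slice."""
    hits = []
    for row_id, start, length in index.get(key, []):
        value = json.loads(rows[row_id][start:start + length])
        if predicate(value):
            hits.append(row_id)
    return hits


rows = ['{"a": 5, "b": "x"}', '{"c": true}', '{"a": 7}', '{"b": "y", "a": 5}']
key_stats = build_key_stats(rows)
matches = rows_where(rows, key_stats, "a", lambda v: v == 5)
print(matches)  # [0, 3]
```

In this toy data the row that lacks "a" is never touched, which is the scan reduction described above; a real implementation would also have to handle nested paths and multi-byte encodings.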

Value Inverted Index (JsonInvertedIndex)
Structure: Pre-built inverted index mapping specific JSON path values → row IDs
  Example: Index on path "a" creates mapping like 5 → [row23, row45, ...]
Optimization Benefits:
  Direct Row Targeting: Eliminates scanning entirely for exact match queries (e.g., a=5)
  Selective Parsing: Can combine with position data to avoid parsing entire JSON
Ideal For: High-frequency keys (>80% coverage) with common filter conditions
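
The value inverted index can be sketched the same way. This is an illustration only, not how Milvus stores the index: `build_path_index` is a hypothetical helper, the "path" is assumed to be a single top-level key, and values are assumed to be scalars of one type so they can serve as dictionary keys.

```python
import json
from collections import defaultdict


def build_path_index(rows, path):
    """Hypothetical helper: map each value found at `path` to its row IDs."""
    postings = defaultdict(list)
    for row_id, raw in enumerate(rows):
        doc = json.loads(raw)          # parsed once, at index build time
        if path in doc:
            postings[doc[path]].append(row_id)
    return postings


rows = ['{"a": 5, "b": "x"}', '{"c": true}', '{"a": 7}', '{"b": "y", "a": 5}']
a_index = build_path_index(rows, "a")

# An exact-match query such as a == 5 becomes a single lookup, with no scan.
print(a_index.get(5, []))  # [0, 3]
```

The parsing cost is paid once when the index is built, so an exact-match filter at query time never reads the raw JSON at all.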

Answer selected by YolandaLyj

xiaofan-luan · 2025-04-30T07:45:45Z

xiaofan-luan
Apr 30, 2025
Maintainer

JsonInvertedIndex is designed to build an index on a specific JSON path, such as json["a"]. It is effective for accelerating queries like json["a"] == 1 where the path is explicitly known.

On the other hand, JsonKeyStatsInvertedIndex (not yet officially released) automatically creates inverted indexes for all keys found in the JSON field. This makes it highly flexible and especially useful when each row contains different key names.

For example, if your JSON field contains dynamic data such as logs:

{ "error_code": "404", "timestamp": "2025-04-30T12:00:00Z" }
{ "status": "ok", "user": "alice" }
{ "action": "click", "page": "home" }

In such cases, where the structure varies per row, JsonKeyStatsInvertedIndex can significantly speed up filtering and querying.

However, if your JSON field is more like a dense column, where every row contains the same key (e.g., all rows have an "a" key), then the performance benefit from JsonKeyStatsInvertedIndex is limited. In those cases, it's better to use JsonInvertedIndex and explicitly define the path.
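
One way to judge whether a field looks "sparse" or "dense" in this sense is to measure, on a sample, the fraction of rows that contain each key. The sketch below is only a rough aid under that assumption; `key_coverage` is a hypothetical helper and not part of Milvus or pymilvus.

```python
import json
from collections import Counter


def key_coverage(rows):
    """Hypothetical helper: fraction of rows containing each top-level key."""
    counts = Counter()
    for raw in rows:
        counts.update(json.loads(raw).keys())
    total = len(rows)
    return {key: n / total for key, n in counts.items()}


# The log-style rows from the example above: every key is rare.
log_rows = [
    '{ "error_code": "404", "timestamp": "2025-04-30T12:00:00Z" }',
    '{ "status": "ok", "user": "alice" }',
    '{ "action": "click", "page": "home" }',
]
log_coverage = key_coverage(log_rows)

# A dense, column-like field: "a" appears in every row.
dense_rows = ['{"a": 1}', '{"a": 2, "b": 3}']
dense_coverage = key_coverage(dense_rows)
print(dense_coverage)  # {'a': 1.0, 'b': 0.5}
```

Keys with low coverage, as in the log rows, are the case where a key-level index avoids the most work; a key with coverage near 1.0, like "a" in the dense rows, is the case where an explicit path index is the better fit.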
