Adds Json Parsing to nested object during update Query step in ML Inference Request processor #3856

brianf-aws · 2025-05-17T01:01:02Z

Description

Ensures that Sparse Vectors can be parsed by ML Inference Request processor during rewrite.

A possible next step is to write a tutorial showing how to run neural sparse search with the ML Inference request processor.

Related Issues

Resolves #3767

Testing

spotlessApply && ./gradlew test

Check List

New functionality includes testing.
New functionality has been documented.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov · 2025-05-17T02:08:35Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.98%. Comparing base (ade09e6) to head (29c6745).
Report is 2 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #3856      +/-   ##
============================================
- Coverage     78.02%   77.98%   -0.04%     
- Complexity     7338     7388      +50     
============================================
  Files           656      658       +2     
  Lines         33074    33279     +205     
  Branches       3708     3749      +41     
============================================
+ Hits          25806    25954     +148     
- Misses         5681     5719      +38     
- Partials       1587     1606      +19

Flag	Coverage Δ
ml-commons	`77.98% <100.00%> (-0.04%)`	⬇️


mingshl · 2025-05-20T15:34:49Z

plugin/src/main/java/org/opensearch/ml/processor/MLInferenceSearchRequestProcessor.java

@@ -360,6 +361,9 @@ private String updateQueryTemplate(String queryTemplate, Map<String, String> out
                    String newQueryField = outputMapEntry.getKey();
                    String modelOutputFieldName = outputMapEntry.getValue();
                    Object modelOutputValue = getModelOutputValue(mlOutput, modelOutputFieldName, ignoreMissing, fullResponsePath);
+                    if (modelOutputValue instanceof Map) {
+                        modelOutputValue = new Gson().toJson(modelOutputValue);
+                    }


This is the right place to fix the issue; the conversion to a JSON string should happen before string substitution. So the next question is: should this toJson conversion only be applied when the value is a Map?
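To make the ordering concern concrete, here is a minimal stdlib-only sketch, not the processor's actual code. `toJson` is a hypothetical stand-in for the `new Gson().toJson(...)` call in the diff and only handles a flat map with numeric values; the template, the `embedding` field name, and the `${sparse_vector}` placeholder are made up for illustration, and `String.replace` stands in for whatever substitution the processor performs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class MapSubstitutionSketch {

    // Hypothetical stand-in for new Gson().toJson(map). Assumes a flat map with
    // numeric values and keys that need no escaping.
    public static String toJson(Map<?, ?> map) {
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        for (Map.Entry<?, ?> e : map.entrySet()) {
            joiner.add("\"" + e.getKey() + "\":" + e.getValue());
        }
        return joiner.toString();
    }

    // Mirrors the shape of the fix: a Map is serialized to JSON *before* it is
    // substituted into the template; any other value is used as-is.
    public static String substitute(String template, String placeholder, Object value) {
        Object resolved = (value instanceof Map) ? toJson((Map<?, ?>) value) : value;
        return template.replace("${" + placeholder + "}", String.valueOf(resolved));
    }

    public static void main(String[] args) {
        Map<String, Number> sparseVector = new LinkedHashMap<>();
        sparseVector.put("hello", 1.5);
        sparseVector.put("world", 0.25);

        String template = "{\"query\":{\"neural_sparse\":{\"embedding\":{\"query_tokens\":${sparse_vector}}}}}";

        // Without the conversion, Map.toString() leaks into the query:
        // the placeholder becomes {hello=1.5, world=0.25}, which is not JSON.
        System.out.println(template.replace("${sparse_vector}", sparseVector.toString()));

        // With the conversion, the placeholder is replaced by a JSON object.
        System.out.println(substitute(template, "sparse_vector", sparseVector));
    }
}
```

The only point of the sketch is the ordering: once the value has been turned into text with `toString()`, the JSON structure is already lost, so the conversion has to happen before substitution.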

Map should be a catch-all for the many scenarios I can think of off the top of my head.

I may need to test that this solution doesn't break queries returned from a remote LLM. I also wonder whether a list should be supported as well, for example a list of maps; this instanceof check wouldn't work in that scenario.
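The list-of-maps concern can be illustrated the same way. The sketch below is an assumption, not what the PR implements: `toJson` is a hypothetical stdlib-only replacement for Gson that covers maps, lists, strings, numbers, booleans, and null, and `prepareForSubstitution` is a hypothetical helper showing what a check broadened from `Map` to `Map` or `List` might look like.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ListOfMapsSketch {

    // Hypothetical minimal JSON writer; a real implementation would use Gson
    // or another JSON library instead.
    public static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Map) {
            return ((Map<?, ?>) value).entrySet().stream()
                .map(e -> quote(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        }
        if (value instanceof List) {
            return ((List<?>) value).stream()
                .map(ListOfMapsSketch::toJson)
                .collect(Collectors.joining(",", "[", "]"));
        }
        if (value instanceof Number || value instanceof Boolean) {
            return String.valueOf(value);
        }
        return quote(String.valueOf(value));
    }

    // Escapes backslashes and double quotes only; control characters are not handled.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Hypothetical broadened check: serialize both maps and lists before substitution.
    public static Object prepareForSubstitution(Object modelOutputValue) {
        if (modelOutputValue instanceof Map || modelOutputValue instanceof List) {
            return toJson(modelOutputValue);
        }
        return modelOutputValue;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> categories = List.of(
            Map.of("name", "electronics"),
            Map.of("name", "computers"));

        // A list of maps has the same problem as a single map: toString() is not JSON.
        System.out.println(categories);
        System.out.println(prepareForSubstitution(categories));
    }
}
```

Whether the processor needs this broader check depends on how lists are handled elsewhere in the rewrite path, which this thread does not show.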

I wrote a UT where an LLM returns a query and a query rewrite occurs. It succeeds with the code change, e.g.:

/**
 * {
 *   "inference_results" : [ {
 *     "output" : [ {
 *       "name" : "response",
 *       "dataAsMap" : {
 *         "content": [
 *           "text": "{\"query\": \"match_all\" : {}}"
 *       }
 *     } ]
 *   } ]
 * }
 */

Off the top of my head, I can't think of another scenario where this parsing issue occurs, since I see that your UTs cover a lot of edge cases, such as a list.

We should cover more use cases in testing, because the model output can come in various formats.

I have two cases in mind that you can try adding and verifying in the UT.

A list of maps:

{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "categories": [
              {"name": "electronics", "id": 1},
              {"name": "computers", "id": 2},
              {"name": "phones", "id": 3}
            ]
          }
        }
      ]
    }
  }
}

A list of nested maps (maps of maps):

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "products",
            "query": {
              "bool": {
                "must": [
                  {
                    "terms": {
                      "products.variants": [
                        {
                          "color": {
                            "primary": {"hex": "#FF0000", "name": "red"},
                            "secondary": {"hex": "#000000", "name": "black"}
                          },
                          "size": {
                            "dimensions": {"width": 10, "height": 20},
                            "label": {"us": "M", "eu": "38"}
                          }
                        },
                        {
                          "color": {
                            "primary": {"hex": "#0000FF", "name": "blue"},
                            "secondary": {"hex": "#FFFFFF", "name": "white"}
                          },
                          "size": {
                            "dimensions": {"width": 12, "height": 22},
                            "label": {"us": "L", "eu": "40"}
                          }
                        }
                      ]
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

For scenario one (a list of maps), I don't think this is a valid OpenSearch query; consequently, if it runs through the processor, the query rewrite fails.

However, I did try this, taken from the docs:

query_template =
{ "query": { "bool": { "must": [ ${llm_response} ] } } }

llm_response =
{ "bool": { "should": [ { "match": { "text_entry": "love" } }, { "match": { "text": "hate" } } ] } },
{ "bool": { "should": [ { "match": { "text_entry": "life" } }, { "match": { "text": "grace" } } ] } }
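A rough sketch of why this case is unaffected by the Map check: the LLM response here is already a String, so it is substituted verbatim. The class and method names are hypothetical, `String.replace` stands in for the processor's real substitution, and `bracketsBalanced` is only a structural sanity check, not a JSON parser.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StringResponseSketch {

    // Substitutes a response that is already text; no JSON conversion is involved.
    public static String rewrite(String queryTemplate, String placeholder, String llmResponse) {
        return queryTemplate.replace("${" + placeholder + "}", llmResponse);
    }

    // Returns true when {} and [] are balanced outside of string literals.
    public static boolean bracketsBalanced(String text) {
        Deque<Character> stack = new ArrayDeque<>();
        boolean inString = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                stack.push(c);
            } else if (c == '}' || c == ']') {
                if (stack.isEmpty()) {
                    return false;
                }
                char open = stack.pop();
                // A closing brace must match an opening brace, a bracket a bracket.
                if ((c == '}') != (open == '{')) {
                    return false;
                }
            }
        }
        return stack.isEmpty() && !inString;
    }

    public static void main(String[] args) {
        String queryTemplate = "{ \"query\": { \"bool\": { \"must\": [ ${llm_response} ] } } }";
        String llmResponse =
            "{ \"bool\": { \"should\": [ { \"match\": { \"text_entry\": \"love\" } }, { \"match\": { \"text\": \"hate\" } } ] } }, "
            + "{ \"bool\": { \"should\": [ { \"match\": { \"text_entry\": \"life\" } }, { \"match\": { \"text\": \"grace\" } } ] } }";

        String rewritten = rewrite(queryTemplate, "llm_response", llmResponse);
        System.out.println(rewritten);
        System.out.println(bracketsBalanced(rewritten));
    }
}
```

Because the two clauses are placed inside the `must` array of the template, the comma-separated pair becomes two array elements after substitution.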

mingshl · 2025-05-20T15:36:29Z

Thanks @brianf-aws for taking up the fix! Overall it's a pretty neat fix; I left some comments.

brianf-aws · 2025-05-21T18:31:47Z

plugin/src/test/java/org/opensearch/ml/processor/MLInferenceSearchRequestProcessorTests.java

+         *     "output" : [ {
+         *       "name" : "response",
+         *       "dataAsMap" : {
+         * 		  "content": [


Not sure why it's displayed out of place on GitHub, but it looks normal in IntelliJ.

jngz-es · 2025-05-21T19:46:21Z

Spotless failed, let's apply spotless.

jngz-es · 2025-05-21T21:00:20Z

Another error.

* What went wrong:
Execution failed for task ':opensearch-ml-plugin:forbiddenPatterns'.
> Found invalid patterns:
  - tab on line 1671 of plugin/src/test/java/org/opensearch/ml/processor/MLInferenceSearchRequestProcessorTests.java
  - tab on line 1672 of plugin/src/test/java/org/opensearch/ml/processor/MLInferenceSearchRequestProcessorTests.java

brianf-aws · 2025-05-21T21:55:43Z

Hey @jngz-es, I fixed the issue.

You can see the check passing locally:

./gradlew opensearch-ml-plugin:forbiddenPattern                 
=======================================
OpenSearch Build Hamster says Hello!
  Gradle Version        : 8.11.1
  OS Info               : Mac OS X 15.4.1 (aarch64)
  JDK Version           : 21 (Amazon Corretto JDK)
  JAVA_HOME             : /Users/<>/.sdkman/candidates/java/21.0.5-amzn
  Random Testing Seed   : 9B2C658F659F0CDD
  In FIPS 140 mode      : false
=======================================
[Incubating] Problems report is available at: file:///Users/<>/IdeaProjects/ml-commons/build/reports/problems/problems-report.html

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.11.1/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD SUCCESSFUL in 556ms

I'm guessing tabs are prohibited when writing javadocs
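The error message above reports tab characters by file and line number, which suggests a line-level scan rather than anything Javadoc-specific, though that is an inference from the log. The sketch below is not the `forbiddenPatterns` task itself; `TabCheckSketch` and `linesWithTabs` are hypothetical names for a small helper that finds offending lines before pushing.

```java
import java.util.ArrayList;
import java.util.List;

public class TabCheckSketch {

    // Returns the 1-based numbers of all lines that contain a tab character.
    public static List<Integer> linesWithTabs(String source) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].indexOf('\t') >= 0) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String comment = String.join("\n",
            "         *       \"dataAsMap\" : {",
            "         * \t\t  \"content\": [",
            "         *       }");
        // Prints the line numbers that would need tabs replaced with spaces.
        System.out.println(linesWithTabs(comment));
    }
}
```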

mingshl · 2025-05-22T00:52:15Z

Overall it's looking good. Could you increase the code coverage?

brianf-aws · 2025-05-22T01:14:44Z

> Overall it's looking good. Could you increase the code coverage?

I only edited one file and introduced one branch. But I did rebase recently so I assume the code coverage extension is judging me based on that too.

dhrubo-os · 2025-05-22T01:20:19Z

> > Overall it's looking good. Could you increase the code coverage?
>
> I only edited one file and introduced one branch. But I did rebase recently so I assume the code coverage extension is judging me based on that too.

I think we should be looking at this: https://github.com/opensearch-project/ml-commons/pull/3856/checks?check_run_id=42675669550

We should have a delta coverage report like this

@brianf-aws could you please create an issue on this? Thanks

mingshl

the code coverage for this change is 100%, approved

brianf-aws · 2025-05-22T18:19:43Z

> I think we should be looking at this: https://github.com/opensearch-project/ml-commons/pull/3856/checks?check_run_id=42675669550
>
> We should have a delta coverage report like this
>
> @brianf-aws could you please create an issue on this? Thanks

Hey @dhrubo-os, I created an issue to track this: #3866

…erence Request processor (#3856)

* adds json parsing to modelOutputValue during query rewrite
  Signed-off-by: Brian Flores <iflorbri@amazon.com>
* add Unit Test for queryTemplate change
  Signed-off-by: Brian Flores <iflorbri@amazon.com>
* refactors and adds new UT
  Signed-off-by: Brian Flores <iflorbri@amazon.com>
* apply spotless
  Signed-off-by: Brian Flores <iflorbri@amazon.com>
* replace tab with spaces on multi-line comment
  Signed-off-by: Brian Flores <iflorbri@amazon.com>

---------

Signed-off-by: Brian Flores <iflorbri@amazon.com>
(cherry picked from commit 532284e)

…erence Request processor (#3856) (#3867)

* adds json parsing to modelOutputValue during query rewrite
* add Unit Test for queryTemplate change
* refactors and adds new UT
* apply spotless
* replace tab with spaces on multi-line comment

---------

(cherry picked from commit 532284e)
Signed-off-by: Brian Flores <iflorbri@amazon.com>
Co-authored-by: Brian Flores <iflorbri@amazon.com>

…erence Request processor (#3856) (#3868)

* adds json parsing to modelOutputValue during query rewrite
* add Unit Test for queryTemplate change
* refactors and adds new UT
* apply spotless
* replace tab with spaces on multi-line comment

---------

(cherry picked from commit 532284e)
Signed-off-by: Brian Flores <iflorbri@amazon.com>
Co-authored-by: Brian Flores <iflorbri@amazon.com>

…erence Request processor (opensearch-project#3856)

* adds json parsing to modelOutputValue during query rewrite
  Signed-off-by: Brian Flores <iflorbri@amazon.com>
* add Unit Test for queryTemplate change
  Signed-off-by: Brian Flores <iflorbri@amazon.com>
* refactors and adds new UT
  Signed-off-by: Brian Flores <iflorbri@amazon.com>
* apply spotless
  Signed-off-by: Brian Flores <iflorbri@amazon.com>
* replace tab with spaces on multi-line comment
  Signed-off-by: Brian Flores <iflorbri@amazon.com>

---------

Signed-off-by: Brian Flores <iflorbri@amazon.com>
Signed-off-by: Abdul Muneer Kolarkunnu <muneer.kolarkunnu@netapp.com>

brianf-aws requested review from b4sjoo, dhrubo-os, mingshl, jngz-es, model-collapse, rbhavna, ylwu-amzn, zane-neo, Zhangxunmt, austintlee, HenryL27 and xinyual as code owners May 17, 2025 01:01

brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval May 17, 2025 01:02 — with GitHub Actions Inactive

brianf-aws mentioned this pull request May 17, 2025

Adding Neural Sparse Search preset opensearch-project/dashboards-flow-framework#687

Closed

1 task

brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval May 17, 2025 02:09 — with GitHub Actions Inactive

brianf-aws had a problem deploying to ml-commons-cicd-env-require-approval May 17, 2025 02:09 — with GitHub Actions Failure

mingshl reviewed May 20, 2025


mingshl reviewed May 20, 2025


brianf-aws added 3 commits May 21, 2025 10:37

adds json parsing to modelOutputValue during query rewrite

5608c63

Signed-off-by: Brian Flores <iflorbri@amazon.com>

add Unit Test for queryTemplate change

46db551

Signed-off-by: Brian Flores <iflorbri@amazon.com>

refactors and adds new UT

f7d0f24

Signed-off-by: Brian Flores <iflorbri@amazon.com>

brianf-aws force-pushed the nested-obj-sub branch from f0dc3af to f7d0f24 Compare May 21, 2025 18:18

brianf-aws commented May 21, 2025


apply spotless

de23341

Signed-off-by: Brian Flores <iflorbri@amazon.com>

brianf-aws had a problem deploying to ml-commons-cicd-env-require-approval May 21, 2025 20:32 — with GitHub Actions Failure

Zhangxunmt previously approved these changes May 21, 2025


replace tab with spaces on multi-line comment

29c6745

Signed-off-by: Brian Flores <iflorbri@amazon.com>

brianf-aws dismissed Zhangxunmt’s stale review via 29c6745 May 21, 2025 21:53

brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval May 21, 2025 21:54 — with GitHub Actions Inactive

brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval May 21, 2025 23:00 — with GitHub Actions Inactive

mingshl approved these changes May 22, 2025


mingshl added the backport 3.0 label May 22, 2025

brianf-aws mentioned this pull request May 22, 2025

[BUG] CodeCov incorrectly runs on subsequent PRs #3866

Open

dhrubo-os approved these changes May 22, 2025


dhrubo-os merged commit 532284e into opensearch-project:main May 22, 2025
13 of 14 checks passed

opensearch-trigger-bot bot mentioned this pull request May 22, 2025

[Backport 3.0] Adds Json Parsing to nested object during update Query step in ML Inference Request processor #3867

Merged

mingshl added the backport 2.19 label May 23, 2025

opensearch-trigger-bot bot mentioned this pull request May 23, 2025

[Backport 2.19] Adds Json Parsing to nested object during update Query step in ML Inference Request processor #3868

Merged

saimedhi mentioned this pull request Jun 5, 2025

Added workflow template for Semantic Search using Sparse Encoders opensearch-project/dashboards-flow-framework#742

Merged

1 task
