Skip to content

Adds Json Parsing to nested object during update Query step in ML Inference Request processor #3856

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

brianf-aws
Copy link
Contributor

@brianf-aws brianf-aws commented May 17, 2025

Description

Ensures that Sparse Vectors can be parsed by ML Inference Request processor during rewrite.

Next steps can be to write a tutorial showing how we can run neural sparse search with ML Inference request processor.

Related Issues

Resolves #3767

Testing

spotlessApply && ./gradlew test

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@brianf-aws brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval May 17, 2025 01:02 — with GitHub Actions Inactive
@brianf-aws brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval May 17, 2025 01:02 — with GitHub Actions Inactive
@brianf-aws brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval May 17, 2025 01:02 — with GitHub Actions Inactive
@brianf-aws brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval May 17, 2025 01:02 — with GitHub Actions Inactive
Copy link

codecov bot commented May 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.98%. Comparing base (ade09e6) to head (29c6745).
Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3856      +/-   ##
============================================
- Coverage     78.02%   77.98%   -0.04%     
- Complexity     7338     7388      +50     
============================================
  Files           656      658       +2     
  Lines         33074    33279     +205     
  Branches       3708     3749      +41     
============================================
+ Hits          25806    25954     +148     
- Misses         5681     5719      +38     
- Partials       1587     1606      +19     
Flag Coverage Δ
ml-commons 77.98% <100.00%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@brianf-aws brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval May 17, 2025 02:09 — with GitHub Actions Inactive
@brianf-aws brianf-aws had a problem deploying to ml-commons-cicd-env-require-approval May 17, 2025 02:09 — with GitHub Actions Failure
@@ -360,6 +361,9 @@ private String updateQueryTemplate(String queryTemplate, Map<String, String> out
String newQueryField = outputMapEntry.getKey();
String modelOutputFieldName = outputMapEntry.getValue();
Object modelOutputValue = getModelOutputValue(mlOutput, modelOutputFieldName, ignoreMissing, fullResponsePath);
if (modelOutputValue instanceof Map) {
modelOutputValue = new Gson().toJson(modelOutputValue);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the right place to fix the issue, the string convert to json string should happen before string substitution. So the next question is, should this toJson conversion only added when it's a map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map should be a catch all for many scenarios that I can think of at the top of my head.

I may need to test if this solution doesn't break the query from remote llm. I wonder if a list should also be supported as well for example a list of maps. then this instanceof check wouldnt work in that scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a UT where a llm returns a query and a query rewrite occurs. This happened to succeed with the code change e.g.

        /**
         * {
         *   "inference_results" : [ {
         *     "output" : [ {
         *       "name" : "response",
         *       "dataAsMap" : {
         * 		  "content": [
         * 			"text": "{\"query\": \"match_all\" : {}}"
         *       }
         *     } ]
         *   } ]
         * }
         */

Off the top of my head I cant think of another scenario where this parsing issue occurs since I see that your UTs cover a lot of edge cases such as a list.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should cover more use cases in testing, because the model output can be various format,

I have two cases in mind that you can try add and verify in the UT

  1. list of map,
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "categories": [
              {"name": "electronics", "id": 1},
              {"name": "computers", "id": 2},
              {"name": "phones", "id": 3}
            ]
          }
        }
      ]
    }
  }
}
  1. list of nested map (map of map),
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "products",
            "query": {
              "bool": {
                "must": [
                  {
                    "terms": {
                      "products.variants": [
                        {
                          "color": {
                            "primary": {"hex": "#FF0000", "name": "red"},
                            "secondary": {"hex": "#000000", "name": "black"}
                          },
                          "size": {
                            "dimensions": {"width": 10, "height": 20},
                            "label": {"us": "M", "eu": "38"}
                          }
                        },
                        {
                          "color": {
                            "primary": {"hex": "#0000FF", "name": "blue"},
                            "secondary": {"hex": "#FFFFFF", "name": "white"}
                          },
                          "size": {
                            "dimensions": {"width": 12, "height": 22},
                            "label": {"us": "L", "eu": "40"}
                          }
                        }
                      ]
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Scenario one list of Map I dont think this is a valid OpenSearch query consequently If it runs through the processor it fails the query rewrite

However I did try this, taken from docs :

query_template=
{
  "query": {
    "bool": {
      "must": [
       ${llm_response}
      ]
    }
  }
}


llm_response = 
 {
          "bool": {
            "should": [
              {
                "match": {
                  "text_entry": "love"
                }
              },
              {
                "match": {
                  "text": "hate"
                }
              }
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "match": {
                  "text_entry": "life"
                }
              },
              {
                "match": {
                  "text": "grace"
                }
              }
            ]
          }
        }

@mingshl
Copy link
Collaborator

mingshl commented May 20, 2025

Thanks @brianf-aws for taking up the fix!!! overall it's a pretty neat fix, leave some comments.

Signed-off-by: Brian Flores <iflorbri@amazon.com>
Signed-off-by: Brian Flores <iflorbri@amazon.com>
Signed-off-by: Brian Flores <iflorbri@amazon.com>
* "output" : [ {
* "name" : "response",
* "dataAsMap" : {
* "content": [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why its displayed out of place on github but looks normal in intelij

@jngz-es
Copy link
Collaborator

jngz-es commented May 21, 2025

Spotless failed, let's apply spotless.

Signed-off-by: Brian Flores <iflorbri@amazon.com>
@brianf-aws
Copy link
Contributor Author

Spotless failed, let's apply spotless.

Ah! okay have fixed can you take a look @jngz-es ?

@brianf-aws brianf-aws had a problem deploying to ml-commons-cicd-env-require-approval May 21, 2025 20:32 — with GitHub Actions Error
@brianf-aws brianf-aws had a problem deploying to ml-commons-cicd-env-require-approval May 21, 2025 20:32 — with GitHub Actions Failure
@brianf-aws brianf-aws had a problem deploying to ml-commons-cicd-env-require-approval May 21, 2025 20:32 — with GitHub Actions Error
@brianf-aws brianf-aws had a problem deploying to ml-commons-cicd-env-require-approval May 21, 2025 20:32 — with GitHub Actions Failure
Zhangxunmt
Zhangxunmt previously approved these changes May 21, 2025
@jngz-es
Copy link
Collaborator

jngz-es commented May 21, 2025

Another error.

* What went wrong:
Execution failed for task ':opensearch-ml-plugin:forbiddenPatterns'.
> Found invalid patterns:
  - tab on line 1671 of plugin/src/test/java/org/opensearch/ml/processor/MLInferenceSearchRequestProcessorTests.java
  - tab on line 1672 of plugin/src/test/java/org/opensearch/ml/processor/MLInferenceSearchRequestProcessorTests.java

Signed-off-by: Brian Flores <iflorbri@amazon.com>
@brianf-aws brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval May 21, 2025 21:54 — with GitHub Actions Inactive
@brianf-aws brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval May 21, 2025 21:54 — with GitHub Actions Inactive
@brianf-aws brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval May 21, 2025 21:54 — with GitHub Actions Inactive
@brianf-aws brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval May 21, 2025 21:54 — with GitHub Actions Inactive
@brianf-aws
Copy link
Contributor Author

Hey @jngz-es I fixed the issue

you can see locally the test passing

./gradlew opensearch-ml-plugin:forbiddenPattern                 
=======================================
OpenSearch Build Hamster says Hello!
  Gradle Version        : 8.11.1
  OS Info               : Mac OS X 15.4.1 (aarch64)
  JDK Version           : 21 (Amazon Corretto JDK)
  JAVA_HOME             : /Users/<>/.sdkman/candidates/java/21.0.5-amzn
  Random Testing Seed   : 9B2C658F659F0CDD
  In FIPS 140 mode      : false
=======================================
[Incubating] Problems report is available at: file:///Users/<>/IdeaProjects/ml-commons/build/reports/problems/problems-report.html

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.11.1/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD SUCCESSFUL in 556ms

I'm guessing tabs are prohibited when writing javadocs

@brianf-aws brianf-aws deployed to ml-commons-cicd-env-require-approval May 21, 2025 23:00 — with GitHub Actions Active
@brianf-aws brianf-aws temporarily deployed to ml-commons-cicd-env-require-approval May 21, 2025 23:00 — with GitHub Actions Inactive
@mingshl
Copy link
Collaborator

mingshl commented May 22, 2025

overall it;s looking good, would you increase the codecov?

@brianf-aws
Copy link
Contributor Author

overall it;s looking good, would you increase the codecov?

I only edited one file and introduced one branch. But I did rebase recently so I assume the code coverage extension is judging me based on that too.

@dhrubo-os
Copy link
Collaborator

overall it;s looking good, would you increase the codecov?

I only edited one file and introduced one branch. But I did rebase recently so I assume the code coverage extension is judging me based on that too.

I think we should be looking at this: https://github.com/opensearch-project/ml-commons/pull/3856/checks?check_run_id=42675669550

We should have a delta coverage report like this

@brianf-aws could you please create an issue on this? Thanks

Copy link
Collaborator

@mingshl mingshl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code coverage for this change is 100%, approved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enhancement] Support nested object substitution in query_template within ML inference Request Processor
5 participants