-
Notifications
You must be signed in to change notification settings - Fork 158
Adds Json Parsing to nested object during update Query step in ML Inference Request processor #3856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3856 +/- ##
============================================
- Coverage 78.02% 77.98% -0.04%
- Complexity 7338 7388 +50
============================================
Files 656 658 +2
Lines 33074 33279 +205
Branches 3708 3749 +41
============================================
+ Hits 25806 25954 +148
- Misses 5681 5719 +38
- Partials 1587 1606 +19
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
plugin/src/main/java/org/opensearch/ml/processor/MLInferenceSearchRequestProcessor.java
Outdated
Show resolved
Hide resolved
@@ -360,6 +361,9 @@ private String updateQueryTemplate(String queryTemplate, Map<String, String> out | |||
String newQueryField = outputMapEntry.getKey(); | |||
String modelOutputFieldName = outputMapEntry.getValue(); | |||
Object modelOutputValue = getModelOutputValue(mlOutput, modelOutputFieldName, ignoreMissing, fullResponsePath); | |||
if (modelOutputValue instanceof Map) { | |||
modelOutputValue = new Gson().toJson(modelOutputValue); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the right place to fix the issue, the string convert to json string should happen before string substitution. So the next question is, should this toJson conversion only added when it's a map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Map should be a catch all for many scenarios that I can think of at the top of my head.
I may need to test if this solution doesn't break the query from remote llm. I wonder if a list should also be supported as well for example a list of maps. then this instanceof check wouldnt work in that scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a UT where a llm returns a query and a query rewrite occurs. This happened to succeed with the code change e.g.
/**
* {
* "inference_results" : [ {
* "output" : [ {
* "name" : "response",
* "dataAsMap" : {
* "content": [
* "text": "{\"query\": \"match_all\" : {}}"
* }
* } ]
* } ]
* }
*/
Off the top of my head I cant think of another scenario where this parsing issue occurs since I see that your UTs cover a lot of edge cases such as a list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should cover more use cases in testing, because the model output can be various format,
I have two cases in mind that you can try add and verify in the UT
- list of map,
{
"query": {
"bool": {
"must": [
{
"terms": {
"categories": [
{"name": "electronics", "id": 1},
{"name": "computers", "id": 2},
{"name": "phones", "id": 3}
]
}
}
]
}
}
}
- list of nested map (map of map),
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "products",
"query": {
"bool": {
"must": [
{
"terms": {
"products.variants": [
{
"color": {
"primary": {"hex": "#FF0000", "name": "red"},
"secondary": {"hex": "#000000", "name": "black"}
},
"size": {
"dimensions": {"width": 10, "height": 20},
"label": {"us": "M", "eu": "38"}
}
},
{
"color": {
"primary": {"hex": "#0000FF", "name": "blue"},
"secondary": {"hex": "#FFFFFF", "name": "white"}
},
"size": {
"dimensions": {"width": 12, "height": 22},
"label": {"us": "L", "eu": "40"}
}
}
]
}
}
]
}
}
}
}
]
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Scenario one list of Map I dont think this is a valid OpenSearch query consequently If it runs through the processor it fails the query rewrite
However I did try this, taken from docs :
query_template=
{
"query": {
"bool": {
"must": [
${llm_response}
]
}
}
}
llm_response =
{
"bool": {
"should": [
{
"match": {
"text_entry": "love"
}
},
{
"match": {
"text": "hate"
}
}
]
}
},
{
"bool": {
"should": [
{
"match": {
"text_entry": "life"
}
},
{
"match": {
"text": "grace"
}
}
]
}
}
Thanks @brianf-aws for taking up the fix!!! overall it's a pretty neat fix, leave some comments. |
Signed-off-by: Brian Flores <iflorbri@amazon.com>
Signed-off-by: Brian Flores <iflorbri@amazon.com>
Signed-off-by: Brian Flores <iflorbri@amazon.com>
f0dc3af
to
f7d0f24
Compare
* "output" : [ { | ||
* "name" : "response", | ||
* "dataAsMap" : { | ||
* "content": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why its displayed out of place on github but looks normal in intelij
Spotless failed, let's apply spotless. |
Signed-off-by: Brian Flores <iflorbri@amazon.com>
Ah! okay have fixed can you take a look @jngz-es ? |
Another error.
|
Signed-off-by: Brian Flores <iflorbri@amazon.com>
Hey @jngz-es I fixed the issue you can see locally the test passing
I'm guessing tabs are prohibited when writing javadocs |
overall it;s looking good, would you increase the codecov? |
I only edited one file and introduced one branch. But I did rebase recently so I assume the code coverage extension is judging me based on that too. |
I think we should be looking at this: https://github.com/opensearch-project/ml-commons/pull/3856/checks?check_run_id=42675669550 We should have a delta coverage report like this @brianf-aws could you please create an issue on this? Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code coverage for this change is 100%, approved
Description
Ensures that Sparse Vectors can be parsed by ML Inference Request processor during rewrite.
Next steps can be to write a tutorial showing how we can run neural sparse search with ML Inference request processor.
Related Issues
Resolves #3767
Testing
spotlessApply &&
./gradlew test
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.