langchain, openai, and smolagents all produce slightly different trace formats, even though they are all instrumented with openinference and exported with the JSONExporter.
To correctly parse specific information (e.g. tool use, input/output messages, etc.), we need to standardize the outputs before feeding them into the evaluation harness.
Build a class that loads the JSON trace from openai, langchain, or smolagents and normalizes it. Use this normalized output when feeding into the evaluation code. A rough sketch of the idea is below.
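A minimal sketch of what that class could look like. Everything here is illustrative, not confirmed against the actual exporter output: the `NormalizedSpan` schema and method names are invented, and it assumes the file holds a JSON array of span dicts with an `openinference.span.kind` attribute.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class NormalizedSpan:
    """Framework-agnostic view of one span (hypothetical schema)."""
    name: str
    kind: str  # e.g. "LLM", "TOOL", "CHAIN"
    attributes: Dict[str, Any] = field(default_factory=dict)


class TraceNormalizer:
    """Load a JSONExporter trace file and map it to NormalizedSpan objects."""

    SUPPORTED = {"openai", "langchain", "smolagents"}

    def __init__(self, framework: str):
        # The raw files don't reliably identify their producer, so the
        # caller has to name the framework explicitly.
        if framework not in self.SUPPORTED:
            raise ValueError(f"unsupported framework: {framework}")
        self.framework = framework

    def load(self, path: str) -> List[NormalizedSpan]:
        # Assumes the exporter wrote a JSON array of span dicts; adjust if
        # the file is actually line-delimited JSON.
        with open(path) as f:
            raw_spans = json.load(f)
        parse = getattr(self, f"_parse_{self.framework}")
        return [parse(span) for span in raw_spans]

    # One hook per framework; today they are identical, but this is where
    # per-framework quirks would be absorbed.
    def _parse_openai(self, span: Dict[str, Any]) -> NormalizedSpan:
        return self._from_openinference(span)

    def _parse_langchain(self, span: Dict[str, Any]) -> NormalizedSpan:
        return self._from_openinference(span)

    def _parse_smolagents(self, span: Dict[str, Any]) -> NormalizedSpan:
        return self._from_openinference(span)

    def _from_openinference(self, span: Dict[str, Any]) -> NormalizedSpan:
        attrs = span.get("attributes", {})
        return NormalizedSpan(
            name=span.get("name", ""),
            kind=attrs.get("openinference.span.kind", "UNKNOWN"),
            attributes=attrs,
        )
```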
Digging into this now, and it appears that normalization is something openinference is already working on; the issues we are seeing are things openinference is working to fix.
What doesn't appear to be covered is that, because different agent frameworks have different processing steps, the actual traces and sequences of spans that come out are still completely different. If the span names and metadata are standardized, that will help us identify what each span is doing, but we still need to figure out how to use them.
The answer may not be to normalize the traces per se; instead, we may need normalized access functions.
For example, we could implement methods like retrieve_tool_calls or retrieve_span_name_sequence (see the sketch below). Trying to normalize the transcripts before accessing specific characteristics of the trace is looking like a mammoth task that is probably beyond this POC.