How to calculate the Correct Path Rate metric? #18

@dhx20150812

Hi, @Yifan-Song793

Thanks for releasing this dataset and code.

I have a question: how is the Correct Path Rate calculated from the tool execution log? For example, take the following query and golden solution path:

{
    "query": "give me the number of movies directed by Sofia Coppola",
    "solution": [
        "GET /search/person",
        "GET /person/{person_id}/movie_credits"
    ]
},

I have the following execution log:

Query: give me the number of movies directed by Sofia Coppola
Planner: search for the person with the name "Sofia Coppola"  
API response: Sofia Coppola's person_id is 4567  
collect the list of movies directed by Sofia Coppola whose person_id is 4567  
API response: The list of movies directed by Sofia Coppola includes The Virgin Suicides, Lost in Translation, Marie Antoinette, Somewhere, The Bling Ring, A Very Murray Christmas, and The Beguiled.  
count the number of movies in the list directed by Sofia Coppola  
API response: The number of movies directed by Sofia Coppola is 7.  
Thought: I am finished executing a plan and have the information the user asked for or the data the user asked to create  
Final Answer: Sofia Coppola has directed a total of 7 movies.
API Selector: GET /search/person to search for Sofia Coppola  
API response: Sofia Coppola's person_id is 4567  
Instruction: Continue. Collect the list of movies directed by Sofia Coppola whose person_id is 4567.  
GET /person/4567/movie_credits to collect the list of movies directed by Sofia Coppola.  
API response: The list of movies directed by Sofia Coppola includes The Virgin Suicides, Lost in Translation, Marie Antoinette, Somewhere, The Bling Ring, A Very Murray Christmas, and The Beguiled.  
Instruction: Continue. Count the number of movies in the list directed by Sofia Coppola.  
No API call needed.  
Final Answer: Sofia Coppola has directed a total of 7 movies.

So how should I determine whether the actual execution path matches the golden solution path?
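
To make the question concrete, here is a minimal sketch of one way such a check might work. It is not the repository's evaluation code. It rests on three assumptions: that a path counts as correct when the golden solution appears in the executed calls as a subsequence (in order, not necessarily contiguous), that every executed call shows up in the log as `METHOD /path`, and that `{param}` placeholders in the golden path match any single path segment. All function names are hypothetical.

```python
import re

# Assumption: executed calls appear in the log as "METHOD /path".
HTTP_CALL = re.compile(r"\b(GET|POST|PUT|PATCH|DELETE)\s+(/\S+)")


def extract_calls(log: str) -> list[str]:
    """Return every 'METHOD /path' call found in the log, in order."""
    calls = []
    for m in HTTP_CALL.finditer(log):
        # Drop any query string and trailing punctuation from the path.
        path = m.group(2).split("?")[0].rstrip(".,;")
        calls.append(f"{m.group(1)} {path}")
    return calls


def template_to_regex(template: str) -> re.Pattern:
    """Turn 'GET /person/{person_id}/movie_credits' into a regex
    in which each {param} matches exactly one path segment."""
    method, path = template.split(" ", 1)
    literal_parts = re.split(r"\{[^/{}]+\}", path)
    body = "[^/]+".join(re.escape(part) for part in literal_parts)
    return re.compile(rf"{method} {body}/?")


def is_correct_path(log: str, solution: list[str]) -> bool:
    """True if the golden steps occur in the log as an ordered subsequence."""
    calls = iter(extract_calls(log))
    # Each any() consumes the iterator up to the first match, so the
    # next golden step can only match calls that come later in the log.
    return all(
        any(template_to_regex(step).fullmatch(call) for call in calls)
        for step in solution
    )


def correct_path_rate(examples: list[tuple[str, list[str]]]) -> float:
    """Fraction of (log, solution) pairs whose path is judged correct."""
    if not examples:
        return 0.0
    hits = sum(is_correct_path(log, solution) for log, solution in examples)
    return hits / len(examples)
```

Under these assumptions, the log above would count as correct: `GET /search/person` is found first, and `GET /person/4567/movie_credits` then matches the template `GET /person/{person_id}/movie_credits`. Whether extra or repeated calls should be penalized is exactly the part I am unsure about.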
