This repository contains a benchmark for evaluating the search performance (recall) of UnifAI's service discovery functionality.
The benchmark simulates realistic service discovery scenarios by:
- Generating queries that users might use when looking for specific services
- Measuring whether the expected service appears in the search results, and at what position
- Calculating recall@k metrics (the percentage of queries where the expected service appears in the top k results)
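The recall@k calculation described above can be sketched as follows. This is a minimal illustration and not the benchmark's actual code: the helper names `rank_of` and `recall_at_k` and the sample service names are assumptions made up for this example.

```python
from typing import Optional, Sequence


def rank_of(expected: str, results: Sequence[str]) -> Optional[int]:
    """Return the 1-based position of `expected` in `results`, or None if absent."""
    for position, name in enumerate(results, start=1):
        if name == expected:
            return position
    return None


def recall_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of queries whose expected service appears within the top k results."""
    if not ranks:
        return 0.0
    hits = sum(1 for rank in ranks if rank is not None and rank <= k)
    return hits / len(ranks)


# Hypothetical search outcomes for four queries (illustrative data only).
ranks = [
    rank_of("weather", ["weather", "news", "maps"]),    # found at position 1
    rank_of("swap", ["price", "bridge", "swap"]),       # found at position 3
    rank_of("translate", ["news", "maps", "weather"]),  # not found
    rank_of("maps", ["news", "maps"]),                  # found at position 2
]

recall_by_k = {k: recall_at_k(ranks, k) for k in (1, 3, 5)}
print(recall_by_k)
```

A query whose expected service is missing from the results counts as a miss for every k, so recall@k can only stay flat or rise as k grows.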
The graph shows the recall rate at different values of k, where k represents the top k search results. A higher recall rate indicates better search accuracy.
The benchmark uses generated search queries stored in search_queries.jsonl. Each query represents a realistic user request paired with the expected service that should be found.
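As a rough sketch of how such a file could be read, the snippet below parses JSON Lines records into (query, expected service) pairs. The field names `query` and `expected_service` are assumptions for illustration; check search_queries.jsonl for the actual schema. The sample records are invented.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_queries(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (query, expected_service) pairs from JSON Lines input.

    The keys "query" and "expected_service" are hypothetical field names.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        yield record["query"], record["expected_service"]


# Invented records standing in for lines of the real file.
sample_lines = [
    '{"query": "what is the weather in Paris", "expected_service": "weather"}',
    "",
    '{"query": "swap tokens on chain", "expected_service": "swap"}',
]

pairs = list(parse_queries(sample_lines))
print(pairs)
```

Because `parse_queries` accepts any iterable of strings, an open file handle for search_queries.jsonl can be passed to it directly in place of `sample_lines`.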
The test data was generated by an LLM (with access to UnifAI tools through unifai-mcp-server), and you can view the prompt and generation process here
Results may vary based on the specific search queries used and the total number of available actions. At the time of this benchmark, there were 89 actions available in the UnifAI ecosystem.