async #1
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
The change set replaces legacy async-specific evaluators with unified evaluator classes, migrates the evaluation pipeline to native async, removes the Ollama-specific Ragas evaluator, adds a universal embedding adapter framework, consolidates connectors around a single async base with streaming support, reworks the factory/manager into async multi-evaluator orchestration, updates main usage, and refreshes test data/results fields.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant App as App
    participant EM as EvaluatorManager
    participant EF as EvaluatorFactory
    participant AE as AcademicEvaluator
    participant RE as RagasEvaluator
    App->>EM: initialize_async(config)
    EM->>EF: create_all_evaluators_async(config, types)
    EF-->>EM: { "ragas": RE, "academic": AE }
    EM-->>App: evaluators ready
    note over EM,EF: Async multi-evaluator initialization
```

```mermaid
sequenceDiagram
    autonumber
    participant EM as EvaluatorManager
    participant BE as BaseEvaluator
    participant RE as RagasEvaluator
    participant AE as AcademicEvaluator
    participant RG as ragas.evaluate
    participant EA as EmbeddingAdapter
    EM->>RE: evaluate_with_timeout(questions,...)
    RE->>RG: evaluate(Dataset, metrics, llm, embeddings)
    RG-->>RE: scores[]
    RE-->>EM: {relevancy[], correctness[], faithfulness[], context_precision[], context_recall[]}
    EM->>AE: evaluate_with_timeout(questions,...)
    AE->>EA: embed_query(answer), embed_query(ground_truth)
    EA-->>AE: vectors
    AE-->>EM: {relevancy[], correctness[], completeness[], clarity[]}
    note over EM: Aggregates per-evaluator results concurrently
```

```mermaid
sequenceDiagram
    autonumber
    participant Client as Client
    participant Dify as DifyConnector
    participant API as "Dify API (stream)"
    Client->>Dify: query_with_timeout(question)
    Dify->>API: POST /chat-messages (response_mode=streaming)
    API-->>Dify: SSE events (message, message_end)
    Dify-->>Client: Aggregated answer + metadata
    note over Dify: Parses streaming events and finalizes result
```
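For reviewers unfamiliar with Dify's streaming mode, here is a rough sketch of the aggregation step the last diagram describes. The endpoint and event names follow the diagram; the parsing details and the helper name are assumptions, not the connector's actual code:

```python
import json
import aiohttp

async def stream_dify_answer(base_url: str, api_key: str, question: str) -> str:
    """Aggregate SSE `message` events into one answer, stopping at `message_end` (sketch)."""
    payload = {"query": question, "response_mode": "streaming", "user": "reviewer"}
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    answer_parts = []
    async with aiohttp.ClientSession() as session:
        async with session.post(f"{base_url}/chat-messages", headers=headers, json=payload) as resp:
            async for raw_line in resp.content:
                line = raw_line.decode("utf-8").strip()
                if not line.startswith("data: "):
                    continue  # skip keep-alives and blank separator lines
                event = json.loads(line[len("data: "):])
                if event.get("event") == "message":
                    answer_parts.append(event.get("answer", ""))
                elif event.get("event") == "message_end":
                    break  # metadata (e.g. retriever resources) arrives here
    return "".join(answer_parts)
```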
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
Summary of Changes
Hello @boma086, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly refactors the evaluation system by transitioning it to a fully asynchronous architecture. The changes streamline the codebase by removing deprecated synchronous components and adopting native asynchronous capabilities, particularly for Ragas evaluations. This refactoring aims to enhance performance, simplify the system's design, and lay the groundwork for future optimizations outlined in the newly added optimization roadmap.
Highlights
- Unified Asynchronous Evaluator System: The evaluation framework has been refactored to be exclusively asynchronous, removing all legacy synchronous evaluator classes and their associated factories and managers.
- Native Ragas Async Integration: The Ragas evaluator (ragas_evaluator.py) now leverages Ragas's native asynchronous API for both single and batch evaluations, eliminating the need for asyncio.run_in_executor and improving performance.
- Simplified Evaluator Naming: Evaluator class and file names have been streamlined by removing the async_ prefix (e.g., async_academic_evaluator.py is now academic_evaluator.py), as all evaluators are now inherently asynchronous.
- Improved Embedding API Compatibility: The academic evaluator (academic_evaluator.py) now supports multiple embedding API response formats and includes a fallback to text similarity if embedding API keys are not configured.
- Optimization Roadmap: A new optims.md file has been added, detailing a comprehensive plan for future architectural, performance, and code quality improvements, including further async optimizations, unified evaluator management, and enhanced monitoring.
Code Review
This pull request does a great job of unifying the evaluator architecture around an asynchronous-first approach. Removing the synchronous evaluators and factories simplifies the codebase significantly. The enhancements to the `AcademicEvaluator`, such as better embedding API compatibility and more robust response parsing, are also valuable improvements.

However, I've found a few issues that should be addressed:

- There's a critical performance issue in `RagasEvaluator` where a synchronous, blocking call is used within an `async` method, which will freeze the event loop.
- The `EvaluatorFactory` can be improved to initialize evaluators concurrently for faster startup.
- The renaming of classes and modules has resulted in some confusing aliases and inconsistent naming, which impacts code maintainability.
My detailed comments provide specific suggestions to resolve these points. After addressing these, the PR will be in excellent shape.
```python
result = evaluate(
    dataset,
    metrics=self.metrics,
    llm=self.eval_llm,
    embeddings=self.embeddings,
    raise_exceptions=False
)
```
The call to `ragas.evaluate()` is synchronous and will block the event loop. This is a critical issue in an async application, as it negates the performance benefits of `asyncio`. The `ragas` library provides an async alternative, `aevaluate`, which should be used for non-blocking evaluation.

To fix this, you should:

- Change the import on line 4 from `from ragas import evaluate` to `from ragas import aevaluate`.
- Use `await aevaluate(...)` here.
```diff
-result = evaluate(
-    dataset,
-    metrics=self.metrics,
-    llm=self.eval_llm,
-    embeddings=self.embeddings,
-    raise_exceptions=False
-)
+result = await aevaluate(
+    dataset,
+    metrics=self.metrics,
+    llm=self.eval_llm,
+    embeddings=self.embeddings,
+    raise_exceptions=False
+)
```
main_multi_eval_async.py (Outdated)

```diff
 from connectors.universal import UniversalRAGConnector
 from evaluators.factory import EvaluatorManager
-from evaluators.async_factory import AsyncEvaluatorManager
+from evaluators.evaluator_factory import EvaluatorManager as AsyncEvaluatorManager
```
The import `from evaluators.evaluator_factory import EvaluatorManager as AsyncEvaluatorManager` is confusing. The class is named `EvaluatorManager` in `evaluator_factory.py`, but it's being aliased to `AsyncEvaluatorManager` here. This refactoring has led to inconsistent naming that can make the code harder to understand and maintain.

Consider renaming the classes in `evaluator_factory.py` back to `AsyncEvaluatorFactory` and `AsyncEvaluatorManager`, or updating the consumer code to use the new names directly without aliasing. A consistent naming scheme across the project would be beneficial. For example, if all evaluators are now async, the `Async` prefix might be redundant and could be removed everywhere for simplicity, but this should be done consistently.
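As a minimal sketch of the unaliased style (assuming the class keeps the name `EvaluatorManager` in `evaluator_factory.py`; the constructor arguments are taken from the legacy manager's signature and may differ):

```python
# Import the manager under its real name; no Async* alias is needed
# once every evaluator in the project is async by construction.
from evaluators.evaluator_factory import EvaluatorManager

manager = EvaluatorManager(chat_config, embedding_config)
```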
PR Code Suggestions ✨
Explore these optional code suggestions:
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
evaluators/evaluator_factory.py (1)

52-64: Create the evaluators concurrently, as the comment promises; the current implementation still awaits them serially. Wrap the coroutines in tasks and gather them.

```diff
-        # 并发创建所有评估器
-        tasks = []
-        for evaluator_type in types:
-            task = cls.create_evaluator_async(evaluator_type, config)
-            tasks.append((evaluator_type, task))
-
-        # 等待所有评估器创建完成
-        for evaluator_type, task in tasks:
-            evaluator = await task
-            if evaluator:
-                evaluators[evaluator_type] = evaluator
+        # 并发创建所有评估器
+        task_map = {
+            evaluator_type: asyncio.create_task(cls.create_evaluator_async(evaluator_type, config))
+            for evaluator_type in types
+        }
+        results = await asyncio.gather(*task_map.values(), return_exceptions=True)
+        for evaluator_type, result in zip(task_map.keys(), results):
+            if isinstance(result, BaseEvaluator):
+                evaluators[evaluator_type] = result
```

evaluators/ragas_evaluator.py (1)

117-124: Do not block the event loop: run ragas.evaluate in a thread. `evaluate(...)` is synchronous; calling it directly inside async code will block. Use `asyncio.to_thread` to offload.

```diff
-        # 使用Ragas评估(同步函数)
-        result = evaluate(
-            dataset,
-            metrics=self.metrics,
-            llm=self.eval_llm,
-            embeddings=self.embeddings,
-            raise_exceptions=False
-        )
+        # 使用线程池避免阻塞事件循环
+        result = await asyncio.to_thread(
+            evaluate,
+            dataset,
+            metrics=self.metrics,
+            llm=self.eval_llm,
+            embeddings=self.embeddings,
+            raise_exceptions=False
+        )
```
🧹 Nitpick comments (35)
evaluators/base_evaluator.py (1)

26-26: Consider lowering the init log level to debug to reduce noise. Initialization logs can be verbose across many evaluators; suggest `logger.debug` or gating behind a config flag.

```diff
-        logger.info(f"Evaluator initialized: {name}")
+        logger.debug("Evaluator initialized: %s", name)
```

evaluators/factory.py (1)
9-16: Tidy up the legacy placeholder: fix lint warnings, typing, and logging. Address ClassVar typing for the mutable class attributes, remove f-strings without placeholders, mark optional types, and use logging instead of print. Also avoid "unused arg" warnings without breaking the public signature.

```diff
-from typing import Dict, List, Any, Optional
+from typing import Dict, List, Any, Optional, ClassVar, Type
+import logging
+
+logger = logging.getLogger(__name__)

 class EvaluatorFactory:
@@
-    EVALUATOR_TYPES = {
-        # 注: 同期評価器は削除されました。非同期バージョンを使用してください。
-    }
+    EVALUATOR_TYPES: ClassVar[Dict[str, Type[BaseEvaluator]]] = {}
@@
-    DEFAULT_PRIORITY = []  # 同期評価器は使用されません
+    DEFAULT_PRIORITY: ClassVar[List[str]] = []  # 同期評価器は使用されません
@@
     def create_evaluator(cls, evaluator_type: str, config: Dict[str, Any]) -> Optional[BaseEvaluator]:
         """指定されたタイプの評価器を作成"""
-        print(f"⚠️ {evaluator_type} 同期評価器は削除されました。非同期バージョンを使用してください。")
+        logger.warning("⚠️ %s 同期評価器は削除されました。非同期バージョンを使用してください。", evaluator_type)
         return None
@@
     def create_all_evaluators(cls, config: Dict[str, Any],
                               types: Optional[List[str]] = None) -> Dict[str, BaseEvaluator]:
         """全ての利用可能な評価器を作成"""
-        print("⚠️ 同期評価器はすべて削除されました。非同期バージョンを使用してください。")
+        logger.warning("⚠️ 同期評価器はすべて削除されました。非同期バージョンを使用してください。")
         return {}
@@
     def get_evaluator_info(cls) -> Dict[str, Dict[str, Any]]:
         """全評価器の情報を取得"""
-        print("⚠️ 同期評価器はすべて削除されました。非同期バージョンを使用してください。")
+        logger.warning("⚠️ 同期評価器はすべて削除されました。非同期バージョンを使用してください。")
         return {}
@@
-        print("⚠️ EvaluatorManager はレガシークラスです。AsyncEvaluatorManager を使用してください。")
-        self.evaluators = {}
-        print(f"🔧 同期評価器はすべて削除されました。非同期バージョンを使用してください。")
+        logger.warning("⚠️ EvaluatorManager はレガシークラスです。AsyncEvaluatorManager を使用してください。")
+        # 保持して将来の互換性と静的解析の警告解消
+        self.chat_config = chat_config
+        self.embedding_config = embedding_config
+        self.evaluators = {}
+        logger.info("🔧 同期評価器はすべて削除されました。非同期バージョンを使用してください。")
@@
-    def evaluate_all(self, questions: List[str], answers: List[str],
-                     ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, Dict[str, List[float]]]:
+    def evaluate_all(self, questions: List[str], answers: List[str],
+                     ground_truths: List[str], contexts: Optional[List[List[str]]] = None) -> Dict[str, Dict[str, List[float]]]:
         """全評価器で評価を実行"""
-        print("⚠️ 同期評価器はすべて削除されました。非同期バージョンを使用してください。")
+        logger.warning("⚠️ 同期評価器はすべて削除されました。非同期バージョンを使用してください。")
         return {}
@@
-        print("⚠️ 同期評価器はすべて削除されました。非同期バージョンを使用してください。")
+        logger.warning("⚠️ 同期評価器はすべて削除されました。非同期バージョンを使用してください。")
         return {
             "total_evaluators": 0,
             "available_evaluators": [],
             "evaluator_details": {}
         }
```

Notes:
- Kept parameter names to avoid breaking keyword callers; stored them on self to satisfy ARG002.
- _get_evaluator_description legitimately uses an f-string with a placeholder; retained.

Also applies to: 18-35, 37-40, 44-55, 58-61, 48-48
results/detailed_evaluation_results.json (1)

1-134: Add JSON validation to CI. Validate the JSON output in your CI pipeline to catch formatting regressions early. For example, add a step running:

```bash
jq -e . results/detailed_evaluation_results.json >/dev/null && echo "JSON OK"
```
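A minimal Python equivalent for pipelines without jq (the test file name here is hypothetical):

```python
# test_results_json.py: fail CI if the results file is not valid JSON
import json
from pathlib import Path

def test_detailed_results_is_valid_json():
    path = Path("results/detailed_evaluation_results.json")
    data = json.loads(path.read_text(encoding="utf-8"))
    assert isinstance(data, (list, dict))  # parsed successfully into a container
```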
optims.md (4)

3-8: Avoid absolute paths and naming contradictions; use repo-relative paths and a consistent naming policy.
- Local absolute paths in the document, such as "/Users/mabo/.../ragas_ollama.py", are not portable; prefer repo-relative paths (e.g. "evaluators/ragas_ollama.py").
- The document requires artifact names to no longer contain "async", yet the repo still has "main_multi_eval_async.py". Unify the naming once the migration completes, or state the transition policy explicitly in the document.
81-97: The unified manager interface is inconsistent with the existing BaseEvaluator API. The repo's BaseEvaluator is built around evaluate_answers_async/evaluate_with_timeout, while the document's UnifiedEvaluatorManager probes for evaluate/evaluate_async; the naming mismatch adds implementation and call-site burden. Align the document with the code (keep evaluate_with_timeout and the batch methods).
212-226: EvaluationCache is declared async but contains no await; make it synchronous, or add a lock for concurrency safety. get/set perform no async I/O and can be plain sync methods. If they stay async, protect writes to self.cache with at least an asyncio.Lock to avoid races with clear_expired, especially under many concurrent tasks.

Sketch (switch to sync):

```diff
-    async def get(...):
+    def get(...):
         ...
-    async def set(...):
+    def set(...):
         ...
```

Or keep async and introduce a lock:

```python
self._lock = asyncio.Lock()

async with self._lock:
    self.cache[key] = {...}
```
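A fuller sketch of the locked variant; the class shape and TTL field are assumptions, not the document's code:

```python
import asyncio
import time
from typing import Any, Dict, Optional

class EvaluationCache:
    """TTL cache guarded by an asyncio.Lock (sketch, not the repo's implementation)."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self.cache: Dict[str, Dict[str, Any]] = {}
        self._lock = asyncio.Lock()

    async def get(self, key: str) -> Optional[Any]:
        async with self._lock:
            entry = self.cache.get(key)
            if entry is None or time.monotonic() - entry["ts"] > self.ttl:
                self.cache.pop(key, None)  # drop missing/expired entry
                return None
            return entry["value"]

    async def set(self, key: str, value: Any) -> None:
        async with self._lock:
            self.cache[key] = {"value": value, "ts": time.monotonic()}
```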
350-360: The BaseEvaluator example diverges from the repo's implementation. The document's BaseEvaluator (evaluate/evaluate_batch) differs from the repo's BaseEvaluator (evaluate_answers_async/evaluate_with_timeout/...). Unify the terminology and method names to avoid ambiguity as the API evolves.
main_multi_eval_async.py (5)

13-15: Remove the unused import to avoid ambiguity. `from evaluators.factory import EvaluatorManager` is unused and coexists with the AsyncEvaluatorManager alias, which is confusing; delete it.

```diff
-from evaluators.factory import EvaluatorManager
 from evaluators.evaluator_factory import EvaluatorManager as AsyncEvaluatorManager
```
44-51: Read the JSON on a background thread to avoid blocking the event loop. load_test_cases currently performs synchronous file I/O; wrap the read in asyncio.to_thread.

```diff
     async def load_test_cases(self, file_path: str) -> list:
-        """加载测试用例"""
-        try:
-            with open(file_path, 'r', encoding='utf-8') as f:
-                return json.load(f)
+        """加载测试用例(在线程池中执行以避免阻塞事件循环)"""
+        def _read_json(fp: str):
+            with open(fp, 'r', encoding='utf-8') as f:
+                return json.load(f)
+        try:
+            return await asyncio.to_thread(_read_json, file_path)
         except Exception as e:
             raise ValueError(f"测试用例加载失败 {file_path}: {e}")
```
52-66: Test the connections concurrently to speed up startup. test_connections awaits each connector serially; firing the connection tests concurrently noticeably shortens the total time.

```diff
     async def test_connections(self) -> Dict[str, bool]:
         """测试所有连接"""
-        results = {}
-
-        for system_name, connector in self.connectors.items():
-            try:
-                is_connected = await connector.test_connection_async()
-                results[system_name] = is_connected
-                print(f"{'✅' if is_connected else '❌'} {system_name} RAG系统连接{'成功' if is_connected else '失败'}")
-            except Exception as e:
-                results[system_name] = False
-                print(f"❌ {system_name} RAG系统连接测试失败: {e}")
-
-        return results
+        tasks = {
+            name: asyncio.create_task(conn.test_connection_async())
+            for name, conn in self.connectors.items()
+        }
+        results: Dict[str, bool] = {}
+        for name, task in tasks.items():
+            try:
+                is_ok = await task
+                results[name] = bool(is_ok)
+                print(f"{'✅' if is_ok else '❌'} {name} RAG系统连接{'成功' if is_ok else '失败'}")
+            except Exception as e:
+                results[name] = False
+                print(f"❌ {name} RAG系统连接测试失败: {e}")
+        return results
```
71-88: Run the per-connector queries concurrently to cut per-question latency. query_rag_systems awaits serially; use tasks plus gather.

```diff
     async def query_rag_systems(self, question: str) -> Dict[str, Dict[str, Any]]:
         """查询所有RAG系统"""
-        results = {}
-
-        for system_name, connector in self.connectors.items():
-            try:
-                result = await connector.query_with_timeout(
-                    question,
-                    timeout=ASYNC_CONFIG["rag_query_timeout"]
-                )
-                results[system_name] = result
-
-                if result.get("error"):
-                    print(f"  {system_name} 错误: {result['error']}")
-                else:
-                    print(f"  {system_name} 成功获取回答")
-
-            except Exception as e:
-                results[system_name] = {"answer": "", "contexts": [], "error": str(e)}
-                print(f"  {system_name} 查询失败: {e}")
-
-        return results
+        tasks = {
+            name: asyncio.create_task(
+                connector.query_with_timeout(question, timeout=ASYNC_CONFIG["rag_query_timeout"])
+            )
+            for name, connector in self.connectors.items()
+        }
+        results: Dict[str, Dict[str, Any]] = {}
+        for name, task in tasks.items():
+            try:
+                result = await task
+                results[name] = result
+                print(f"  {name} {'错误: ' + result['error'] if result.get('error') else '成功获取回答'}")
+            except Exception as e:
+                results[name] = {"answer": "", "contexts": [], "error": str(e)}
+                print(f"  {name} 查询失败: {e}")
+        return results
```
1-2: Naming alignment. The plan document requires final artifacts to drop "async" from their names. Once the unified manager/factory lands, rename this file to main_multi_eval.py (and update the entry point).
evaluators/evaluator_factory.py (6)

3-3: Extend the typing imports: use ClassVar/Type/Tuple to satisfy lint and type safety. Annotate the class attributes with ClassVar and import Type/Tuple.

```diff
-from typing import Dict, List, Any, Optional
+from typing import Dict, List, Any, Optional, Type, ClassVar, Tuple
```
37-41: Route factory error messages through logger and keep the context. A single logging outlet makes production observability easier.

```diff
-            print(f"⚠️ {evaluator_type}评估器不可用")
+            logger.warning("评估器不可用: %s", evaluator_type)
             return None
         except Exception as e:
-            print(f"❌ {evaluator_type}评估器创建失败: {e}")
+            logger.exception("评估器创建失败: %s", evaluator_type)
             return None
```
115-116: Remove the placeholder-less f-string, or switch to logger. This clears F541.

```diff
-        print(f"🔧 评估器管理器初始化完成")
+        logger.info("🔧 评估器管理器初始化完成")
```
138-140: PEP 484: annotate Optional explicitly. Change the contexts default of None to Optional[...].

```diff
-    async def evaluate_all_async(self, questions: List[str], answers: List[str],
-                                 ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, Dict[str, List[float]]]:
+    async def evaluate_all_async(self, questions: List[str], answers: List[str],
+                                 ground_truths: List[str], contexts: Optional[List[List[str]]] = None
+                                 ) -> Dict[str, Dict[str, List[float]]]:
```
144-159: Replace print with logger throughout, to enable structured logging and level control. Runtime output should go through the logger.

```diff
-                print(f"\n📊 使用{evaluator_name}评估器评估中...")
+                logger.info("📊 使用评估器评估: %s", evaluator_name)
 ...
-                print(f"  ✅ 完成")
+                logger.info("  ✅ 完成: %s", evaluator_name)
-            except Exception as e:
-                print(f"  ❌ 失败: {e}")
+            except Exception as e:
+                logger.exception("  ❌ 失败: %s", evaluator_name)
 ...
         return all_results
```
100-103: (Optional) Fullwidth parentheses trigger lint warnings. If Ruff RUF001 is enabled, change the fullwidth parentheses (()) in the description strings to ASCII (). No functional impact; this only silences the warning.

evaluators/ragas_evaluator.py (9)
86-90: Type hints allow None but the signature says Dict[str, float]. You return None values when unavailable or the answer is empty; annotate the return type and context parameter accordingly.

```diff
-    async def evaluate_single_answer_async(self, question: str, answer: str, ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def evaluate_single_answer_async(
+        self,
+        question: str,
+        answer: str,
+        ground_truth: str,
+        context: Optional[List[str]] = None
+    ) -> Dict[str, Optional[float]]:
```
102-104: Same Optional typing for the helper. Keep the helper consistent with the public method.

```diff
-    async def _evaluate_ragas_native_async(self, question: str, answer: str, ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def _evaluate_ragas_native_async(
+        self,
+        question: str,
+        answer: str,
+        ground_truth: str,
+        context: Optional[List[str]] = None
+    ) -> Dict[str, Optional[float]]:
```
167-175: Validate input lengths for the batch path. Guard against mismatched list lengths to avoid silent misalignment.

```diff
     async def evaluate_answers_async(self, questions: List[str], answers: List[str], ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, List[float]]:
         """使用Ragas原生异步API批量评估多个答案"""
         if not self._available:
             return {"relevancy": [None] * len(answers), "correctness": [None] * len(answers), "faithfulness": [None] * len(answers), "context_precision": [None] * len(answers), "context_recall": [None] * len(answers)}
-
+        if not (len(questions) == len(answers) == len(ground_truths)):
+            raise ValueError("questions, answers, and ground_truths must have the same length")
+
         try:
             # 准备评估数据
             eval_contexts = contexts if contexts else [['相关上下文'] for _ in range(len(questions))]
```
30-37: Use the BaseEvaluator timeout instead of hard-coded values. Propagate self.timeout to the client for consistency and configurability.

```diff
 self.eval_llm = LangchainLLMWrapper(ChatOpenAI(
     api_key=config["api_key"],
     base_url=config["base_url"],
     model=config.get("model", "gpt-3.5-turbo"),
     temperature=0,
     max_tokens=1000,
-    timeout=60  # 增加超时时间
+    timeout=self.timeout
 ))
```
55-60: The default OpenAI embedding model is outdated. text-embedding-ada-002 is deprecated; prefer text-embedding-3-small or -3-large.

```diff
 self.embeddings = OpenAIEmbeddings(
     api_key=config.get("embedding_api_key", config["api_key"]),
-    model=embedding_config.get("model", "text-embedding-ada-002")
+    model=embedding_config.get("model", "text-embedding-3-small")
 )
-embedding_name = f"{embedding_config.get('model', 'text-embedding-ada-002')} (OpenAI)"
+embedding_name = f"{embedding_config.get('model', 'text-embedding-3-small')} (OpenAI)"
```
89-101: Inconsistent "missing score" semantics across evaluators. This class returns None on failure/unavailable, while BaseEvaluator defaults to 0.0. Align on one convention to avoid downstream schema confusion.

Do you want to standardize on None for "not evaluated" and update BaseEvaluator._get_default_scores/_get_default_single_score accordingly?
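One possible shape for the standardized placeholder (a sketch; the helper name is hypothetical, not BaseEvaluator's current API):

```python
from typing import Dict, List, Optional

def default_scores(metrics: List[str], n: int) -> Dict[str, List[Optional[float]]]:
    """Uniform 'not evaluated' placeholder: None rather than a misleading 0.0."""
    return {metric: [None] * n for metric in metrics}
```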
71-83: Prefer logger over print for observability. Route init diagnostics through logging to enable level control and structured sinks.
155-156: Avoid noisy prints in hot paths. Consider logging at debug level instead of printing every sample's scores.
253-272: Broad exception swallowing; tighten the handlers. Catching bare Exception repeatedly hides actionable errors. Narrow the types where possible, or at least log with the traceback.

evaluators/academic_evaluator.py (8)
95-105: Optional typing for context and return values. These methods return None for metrics on several paths; reflect that in the annotations.

```diff
-    async def evaluate_single_answer_async(self, question: str, answer: str,
-                                           ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def evaluate_single_answer_async(
+        self,
+        question: str,
+        answer: str,
+        ground_truth: str,
+        context: Optional[List[str]] = None
+    ) -> Dict[str, float]:
```
110-117: Same Optional for the helpers. Keep the signatures consistent for readability and type checkers.

```diff
-    async def _evaluate_hybrid_mode(self, question: str, answer: str, ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def _evaluate_hybrid_mode(self, question: str, answer: str, ground_truth: str, context: Optional[List[str]] = None) -> Dict[str, float]:
```
140-149: Same Optional for the pure chat path.

```diff
-    async def _evaluate_pure_chat_mode(self, question: str, answer: str, ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def _evaluate_pure_chat_mode(self, question: str, answer: str, ground_truth: str, context: Optional[List[str]] = None) -> Dict[str, float]:
```
298-306: Same Optional for the quality helper.

```diff
-    async def _assess_enhanced_quality_with_chat_model(self, question: str, answer: str, ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def _assess_enhanced_quality_with_chat_model(self, question: str, answer: str, ground_truth: str, context: Optional[List[str]] = None) -> Dict[str, float]:
```
382-384: Avoid bare except. Use `except Exception` to prevent masking system-exiting exceptions and to aid debugging.

```diff
-        except:
+        except Exception:
             pass
```
419-421: Avoid bare except. Same change here.

```diff
-        except:
+        except Exception:
             pass
```
523-526: Sync wrappers may fail inside running event loops. get_event_loop().run_until_complete will raise when a loop is already running. Consider an executor, or expose only the async API.
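A sketch of one tolerant pattern (the helper name is hypothetical); it starts a loop when none is running and falls back to a worker thread when one is:

```python
import asyncio
import concurrent.futures

def run_coro_sync(coro):
    """Run a coroutine from sync code, tolerating an already-running loop (sketch)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)  # no loop running: safe to start one
    # A loop is already running (e.g. Jupyter): run the coroutine in a
    # separate thread that owns its own event loop.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```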
24-29: Add a timeout to ChatOpenAI for consistency with BaseEvaluator.

```diff
 self.chat_llm = ChatOpenAI(
     api_key=config.get("chat_api_key", config.get("api_key")),
     base_url=config.get("chat_base_url", config.get("base_url")),
     model=config.get("chat_model", config.get("model", "gpt-3.5-turbo")),
-    temperature=0
+    temperature=0,
+    timeout=self.timeout
 )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- results/multi_evaluation_results.csv is excluded by !**/*.csv

📒 Files selected for processing (10)
- evaluators/__init__.py (1 hunks)
- evaluators/academic_evaluator.py (7 hunks)
- evaluators/base_evaluator.py (3 hunks)
- evaluators/evaluator_factory.py (6 hunks)
- evaluators/factory.py (1 hunks)
- evaluators/ragas_evaluator.py (6 hunks)
- evaluators/ragas_ollama.py (0 hunks)
- main_multi_eval_async.py (1 hunks)
- optims.md (1 hunks)
- results/detailed_evaluation_results.json (3 hunks)
💤 Files with no reviewable changes (1)
- evaluators/ragas_ollama.py
🧰 Additional context used
🧬 Code graph analysis (6)

main_multi_eval_async.py (2)
- evaluators/evaluator_factory.py (1): EvaluatorManager (105-173)
- evaluators/factory.py (1): EvaluatorManager (41-63)

evaluators/factory.py (3)
- evaluators/base_evaluator.py (2): BaseEvaluator (10-184), get_evaluator_info (171-184)
- evaluators/evaluator_factory.py (5): EvaluatorFactory (12-103), get_evaluator_info (67-94), _get_evaluator_description (97-103), EvaluatorManager (105-173), get_evaluator_summary (162-173)
- evaluators/ragas_evaluator.py (1): get_evaluator_info (282-290)

evaluators/__init__.py (5)
- evaluators/base_evaluator.py (1): BaseEvaluator (10-184)
- evaluators/academic_evaluator.py (1): AcademicEvaluator (14-533)
- evaluators/ragas_evaluator.py (1): RagasEvaluator (21-290)
- evaluators/evaluator_factory.py (2): EvaluatorFactory (12-103), EvaluatorManager (105-173)
- evaluators/factory.py (2): EvaluatorFactory (6-39), EvaluatorManager (41-63)

evaluators/academic_evaluator.py (1)
- evaluators/base_evaluator.py (1): BaseEvaluator (10-184)

evaluators/evaluator_factory.py (3)
- evaluators/base_evaluator.py (1): BaseEvaluator (10-184)
- evaluators/academic_evaluator.py (1): AcademicEvaluator (14-533)
- evaluators/ragas_evaluator.py (1): RagasEvaluator (21-290)

evaluators/ragas_evaluator.py (2)
- evaluators/base_evaluator.py (2): BaseEvaluator (10-184), evaluate_answers_async (29-43)
- evaluators/academic_evaluator.py (1): evaluate_answers_async (46-92)
🪛 LanguageTool
optims.md
- [uncategorized] ~19-~19: 动词的修饰一般为"形容词(副词)+地+动词"。您的意思是否是:不必要"地"同步 (wb4)
  Context: ...cutor` 包装同步代码,而非利用 Ragas 的原生异步能力 - 存在不必要的同步依赖,影响性能和架构清晰度 - 异步实现本质上是"异步外壳 + 同步内核"的...

🪛 Ruff (0.12.2)
evaluators/factory.py
- 10-12, 15: Mutable class attributes should be annotated with typing.ClassVar (RUF012)
- 18, 24: Unused class method argument: config (ARG003)
- 25: Unused class method argument: types (ARG003)
- 44: Unused method arguments: chat_config, embedding_config (ARG002)
- 48: f-string without any placeholders; remove the extraneous f prefix (F541)
- 50-51: Unused method arguments: questions, answers, ground_truths, contexts (ARG002)
- 51: PEP 484 prohibits implicit Optional; convert to Optional[T] (RUF013)

evaluators/academic_evaluator.py
- 42: Do not catch blind exception: Exception (BLE001)
- 217: Docstring contains ambiguous fullwidth parentheses () instead of ASCII () (RUF002)
- 220: Comment contains ambiguous fullwidth comma , instead of ASCII , (RUF003)

evaluators/evaluator_factory.py
- 16-19, 22: Mutable class attributes should be annotated with typing.ClassVar (RUF012)
- 39: Do not catch blind exception: Exception (BLE001)
- 87: Do not use bare except (E722)
- 100-101: String contains ambiguous fullwidth parentheses () instead of ASCII () (RUF001)
- 115: f-string without any placeholders; remove the extraneous f prefix (F541)
- 139: PEP 484 prohibits implicit Optional; convert to Optional[T] (RUF013)

evaluators/ragas_evaluator.py
- 98, 253, 270: Do not catch blind exception: Exception (BLE001)
- 102, 168: PEP 484 prohibits implicit Optional; convert to Optional[T] (RUF013)
- 117, 186: Comment contains ambiguous fullwidth parentheses () (RUF003)
- 205, 244: Comment contains ambiguous fullwidth comma , (RUF003)
- 251: String contains ambiguous fullwidth comma , (RUF001)
- 262-268: Consider moving this statement to an else block (TRY300)
🔇 Additional comments (5)

evaluators/base_evaluator.py (1)

1-1: Header comment clarity improved. No action required.
results/detailed_evaluation_results.json (1)

21-36: Confirm downstream handling of nulls in the metric arrays. faithfulness includes nulls; ensure consumers ignore nulls or impute safely to avoid NaN in aggregates.
optims.md (2)

127-135: Update the Pydantic settings style if the project uses v2. The document's example uses the v1 nested Config class. If the project has already moved to Pydantic v2, switch to the SettingsConfigDict style to avoid migration pitfalls.

For v2, roughly (sketch):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class AppConfig(BaseSettings):
    database: DatabaseConfig = DatabaseConfig()
    llm: LLMConfig = LLMConfig()
    evaluation: EvaluationConfig = EvaluationConfig()

    model_config = SettingsConfigDict(env_file=".env", env_nested_delimiter="__")
```
31-44: Confirm that Ragas' native async interface is available. The official docs show that Ragas metrics provide the async methods `single_turn_ascore(sample, callbacks=None, timeout=None)` and `multi_turn_ascore(sample, callbacks=None, timeout=None)`, so the example's `await metric.single_turn_ascore(sample)` is valid (docs.ragas.io), and the top-level `evaluate(..., is_async=True)` supports batched async execution.
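A sketch of the native async metric API described above (Ragas v0.2-style; the exact import paths and constructor arguments vary by version and should be treated as assumptions):

```python
# Import paths follow ragas v0.2 docs; verify against the pinned version.
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import Faithfulness

async def score_one(evaluator_llm, question: str, answer: str, contexts: list[str]) -> float:
    sample = SingleTurnSample(
        user_input=question,
        response=answer,
        retrieved_contexts=contexts,
    )
    metric = Faithfulness(llm=evaluator_llm)
    # Native async scoring: no run_in_executor shell needed
    return await metric.single_turn_ascore(sample)
```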
evaluators/__init__.py (1)

12-21: __all__ matches the re-exports. After the import fix, __all__ looks good.
evaluators/__init__.py

```diff
 from .base import BaseEvaluator
-from .ragas_ollama import RagasOllamaEvaluator  # Re-enabled with compatible versions
-# from .academic_evaluator import AcademicEvaluator  # Removed in favor of async version
-from .factory import EvaluatorFactory, EvaluatorManager
-from .async_base import AsyncBaseEvaluator
-from .async_academic_evaluator import AsyncAcademicEvaluator
-from .async_ragas_evaluator import AsyncRagasEvaluator
-from .async_factory import AsyncEvaluatorFactory, AsyncEvaluatorManager
+from .base_evaluator import BaseEvaluator as AsyncBaseEvaluator
+from .academic_evaluator import AcademicEvaluator
+from .ragas_evaluator import RagasEvaluator
+from .factory import EvaluatorFactory, EvaluatorManager  # Legacy classes
+from .evaluator_factory import EvaluatorFactory as AsyncEvaluatorFactory, EvaluatorManager as AsyncEvaluatorManager
```
Broken import: .base does not exist. `from .base import BaseEvaluator` will raise ImportError, and you end up re-exporting BaseEvaluator twice. Import solely from base_evaluator and alias.

```diff
-from .base import BaseEvaluator
-from .base_evaluator import BaseEvaluator as AsyncBaseEvaluator
+from .base_evaluator import BaseEvaluator
+AsyncBaseEvaluator = BaseEvaluator
 from .academic_evaluator import AcademicEvaluator
 from .ragas_evaluator import RagasEvaluator
 from .factory import EvaluatorFactory, EvaluatorManager  # Legacy classes
 from .evaluator_factory import EvaluatorFactory as AsyncEvaluatorFactory, EvaluatorManager as AsyncEvaluatorManager
```
📝 Committable suggestion

```python
from .base_evaluator import BaseEvaluator
AsyncBaseEvaluator = BaseEvaluator
from .academic_evaluator import AcademicEvaluator
from .ragas_evaluator import RagasEvaluator
from .factory import EvaluatorFactory, EvaluatorManager  # Legacy classes
from .evaluator_factory import EvaluatorFactory as AsyncEvaluatorFactory, EvaluatorManager as AsyncEvaluatorManager
```
🤖 Prompt for AI Agents

```
In evaluators/__init__.py around lines 3 to 9, remove the broken import
"from .base import BaseEvaluator" and instead import BaseEvaluator only from
.base_evaluator (aliasing if you need an async name), ensuring you don't
re-export the same symbol twice; keep the other evaluator and factory imports
as-is and export a single BaseEvaluator symbol (plus any intentional aliases)
from base_evaluator.
```
evaluators/evaluator_factory.py (Outdated)

```diff
 # 可用的评估器类型
 EVALUATOR_TYPES = {
-    "async_academic": AsyncAcademicEvaluator,
-    "async_ragas": AsyncRagasEvaluator
+    "academic": AcademicEvaluator,
+    "ragas": RagasEvaluator
 }

 # 默认评估器优先级
-DEFAULT_PRIORITY = ["async_ragas", "async_academic"]
+DEFAULT_PRIORITY = ["ragas", "academic"]
```
🛠️ Refactor suggestion

The class attributes should be immutable and explicitly declared ClassVar. This prevents the mutable class attributes from being shared and mutated through instances; annotate them with Type[BaseEvaluator] as well.

```diff
-    # 可用的评估器类型
-    EVALUATOR_TYPES = {
-        "academic": AcademicEvaluator,
-        "ragas": RagasEvaluator
-    }
+    # 可用的评估器类型
+    EVALUATOR_TYPES: ClassVar[Dict[str, Type[BaseEvaluator]]] = {
+        "academic": AcademicEvaluator,
+        "ragas": RagasEvaluator,
+    }
 ...
-    # 默认评估器优先级
-    DEFAULT_PRIORITY = ["ragas", "academic"]
+    # 默认评估器优先级(使用不可变元组)
+    DEFAULT_PRIORITY: ClassVar[Tuple[str, ...]] = ("ragas", "academic")
```
📝 Committable suggestion

```python
# 可用的评估器类型
EVALUATOR_TYPES: ClassVar[Dict[str, Type[BaseEvaluator]]] = {
    "academic": AcademicEvaluator,
    "ragas": RagasEvaluator,
}

# 默认评估器优先级(使用不可变元组)
DEFAULT_PRIORITY: ClassVar[Tuple[str, ...]] = ("ragas", "academic")
```
🧰 Tools
🪛 Ruff (0.12.2)
- 16-19, 22: Mutable class attributes should be annotated with typing.ClassVar (RUF012)
🤖 Prompt for AI Agents

```
In evaluators/evaluator_factory.py around lines 15 to 23, the class-level
EVALUATOR_TYPES and DEFAULT_PRIORITY are currently mutable and lack proper
typing; change their declarations to use typing.ClassVar with explicit types
(EVALUATOR_TYPES: ClassVar[Mapping[str, Type[BaseEvaluator]]] and
DEFAULT_PRIORITY: ClassVar[Tuple[str, ...]]), replace the mutable dict with an
immutable Mapping (or wrap with types.MappingProxyType) and use a tuple (or
frozenset/tuple) for DEFAULT_PRIORITY to prevent instance-level mutation and
ensure correct static typing.
```
evaluators/evaluator_factory.py (Outdated)

```diff
             info[name] = {
                 "name": name,
                 "supported_metrics": [],
-                "description": cls._get_evaluator_description(name),
-                "async": True
+                "description": cls._get_evaluator_description(name)
             }
```
🛠️ Refactor suggestion

Avoid bare except; record the exception context. A bare except swallows system-exiting exceptions; catch Exception and log it.

```diff
-        except:
-            info[name] = {
+        except Exception as e:
+            logger.debug("获取评估器信息失败: %s (%s)", name, e)
+            info[name] = {
```
📝 Committable suggestion

```python
        except Exception as e:
            logger.debug("获取评估器信息失败: %s (%s)", name, e)
            info[name] = {
                "name": name,
                "supported_metrics": [],
                "description": cls._get_evaluator_description(name)
            }
```
🧰 Tools
🪛 Ruff (0.12.2)
- 87: Do not use bare except (E722)
🤖 Prompt for AI Agents

```
In evaluators/evaluator_factory.py around lines 87 to 92, replace the bare
"except:" with "except Exception as e" and record the exception context before
continuing to populate info[name]; use the module logger (or import logging) and
call logger.exception or logging.exception with a message that includes the
evaluator name so the stack trace and error message are preserved, then keep the
existing info[name] assignment unchanged.
```
evaluators/factory.py

```python
# 評価器ファクトリー - Factory Pattern実装 (Legacy - すべて非同期バージョンを使用)

from typing import Dict, List, Any, Optional
from .base import BaseEvaluator
```
Broken import: module name mismatch. This will raise ImportError at import time. The base class lives in base_evaluator.py.

```diff
-from .base import BaseEvaluator
+from .base_evaluator import BaseEvaluator
```

📝 Committable suggestion

```python
from .base_evaluator import BaseEvaluator
```
🤖 Prompt for AI Agents

```
In evaluators/factory.py around line 4 the import "from .base import
BaseEvaluator" is incorrect because the base class is defined in
base_evaluator.py; change the import to reference the correct module name (e.g.,
from .base_evaluator import BaseEvaluator) so the module loads without
ImportError.
```
optims.md (Outdated)

```python
tasks = [self.process_batch(evaluator, batch) for batch in batches]
results = await asyncio.gather(*tasks, return_exceptions=True)

# 展平结果
final_results = []
for result in results:
    if isinstance(result, Exception):
        final_results.extend([{"error": str(result)} for _ in batch])
    else:
        final_results.extend(result)

return final_results
```
Bug: the exception branch references an undefined `batch`, causing a NameError and the wrong number of placeholders. In the flattening phase of process_all, the exception branch uses the undefined variable batch. Use the length of batches[idx] via the batch index for the placeholders, and create the tasks truly concurrently.

Apply this fix:

```diff
-    tasks = [self.process_batch(evaluator, batch) for batch in batches]
-    results = await asyncio.gather(*tasks, return_exceptions=True)
+    tasks = [asyncio.create_task(self.process_batch(evaluator, batch)) for batch in batches]
+    results = await asyncio.gather(*tasks, return_exceptions=True)

     # 展平结果
     final_results = []
-    for result in results:
-        if isinstance(result, Exception):
-            final_results.extend([{"error": str(result)} for _ in batch])
+    for idx, result in enumerate(results):
+        if isinstance(result, Exception):
+            final_results.extend([{"error": str(result)} for _ in batches[idx]])
         else:
             final_results.extend(result)
```
📝 Committable suggestion

```python
tasks = [asyncio.create_task(self.process_batch(evaluator, batch)) for batch in batches]
results = await asyncio.gather(*tasks, return_exceptions=True)

# 展平结果
final_results = []
for idx, result in enumerate(results):
    if isinstance(result, Exception):
        final_results.extend([{"error": str(result)} for _ in batches[idx]])
    else:
        final_results.extend(result)

return final_results
```
🤖 Prompt for AI Agents

```
In optims.md around lines 278 to 289, the exception branch in the
result-flattening loop references an undefined variable batch and will mis-fill
placeholders; change the code to create tasks concurrently (build tasks =
[self.process_batch(evaluator, batch) for batch in batches] before awaiting) and
when iterating results use enumerate so on exception you can use batches[idx]
(or its length) to generate the correct number of error placeholder entries
(e.g., final_results.extend([{"error": str(result)} for _ in batches[idx]]));
ensure non-exception results are extended as before.
```
@claude fix it.

No operation ID found for this PR

@claude fix 1

No operation ID found for this PR
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
connectors/dify.py (3)
27-48: Add an Accept header for SSE when using streaming mode. Dify streaming uses text/event-stream; without Accept, some gateways may downgrade to buffered JSON.

```diff
         return {
             "method": "POST",
             "url": f"{self.config['base_url']}/chat-messages",
             "headers": {
                 "Authorization": f"Bearer {self.config['api_key']}",
-                "Content-Type": "application/json"
+                "Content-Type": "application/json",
+                "Accept": "text/event-stream"
             },
             "body": {
                 "inputs": {
                     "background": "I am a software developer working on Japanese apps",
                     "instruction": "Please help with software development questions"
                 },
                 "query": question,
                 "response_mode": "streaming",
                 "auto_generate_name": True,
                 "user": user_id
             }
         }
```
50-71: Propagate timeouts and raise-for-status; use typed exceptions. Currently no client timeout is set, and HTTP errors rely on manual status checks. Use aiohttp.ClientTimeout and raise_for_status; convert to a custom exception and chain the original errors.

```diff
     async def send_request_async(self, request_data: Dict[str, Any]) -> Dict[str, Any]:
         """异步发送HTTP请求到Dify API"""
         headers = request_data["headers"]
         url = request_data["url"]
         body = request_data["body"]
         try:
-            async with aiohttp.ClientSession() as session:
-                async with session.post(url, headers=headers, json=body) as response:
-                    if response.status == 200:
-                        # 检查是否是streaming模式
-                        if body.get("response_mode") == "streaming":
-                            return await self._parse_streaming_response(response)
-                        else:
-                            return await response.json()
-                    else:
-                        error_text = await response.text()
-                        raise Exception(f"Dify API error: {response.status} - {error_text}")
-        except asyncio.TimeoutError:
-            raise Exception("Dify API请求超时")
-        except Exception as e:
-            raise Exception(f"Dify API请求失败: {str(e)}")
+            timeout = aiohttp.ClientTimeout(total=self.timeout)
+            async with aiohttp.ClientSession(timeout=timeout) as session:
+                async with session.post(url, headers=headers, json=body) as response:
+                    response.raise_for_status()
+                    if body.get("response_mode") == "streaming":
+                        return await self._parse_streaming_response(response)
+                    return await response.json()
+        except asyncio.TimeoutError as e:
+            raise DifyAPIError("Dify API请求超时") from e
+        except aiohttp.ClientError as e:
+            raise DifyAPIError(f"Dify API请求失败: {e!s}") from e
+        except Exception as e:
+            raise DifyAPIError(f"Dify API未知错误: {e!s}") from e
```

Add once at module scope:

```python
class DifyAPIError(Exception):
    pass
```

Also applies to: 60-65
class DifyAPIError(Exception): passAlso applies to: 60-65
115-121
: Remove sys.path hacking; use normal imports.sys.path mutation is brittle and can mask packaging issues.
- import sys - import os - sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) - from utils.async_utils import AsyncUtils + from utils.async_utils import AsyncUtilsOptionally move the import to module scope if import cost is small.
connectors/ragflow.py (2)
45-47
: Timeouts not actually enforced in HTTP calls.You catch asyncio.TimeoutError but aiohttp calls have no timeout configured; requests may hang indefinitely. Pass ClientTimeout.
- async with aiohttp.ClientSession() as session: + timeout = aiohttp.ClientTimeout(total=self.timeout) + async with aiohttp.ClientSession(timeout=timeout) as session: async with session.post(url, headers=headers, json=body) as response: if response.status == 200: return await response.json() else: error_text = await response.text() - raise Exception(f"RagFlow API error: {response.status} - {error_text}") + raise Exception(f"RagFlow API error: {response.status} - {error_text}")
59-63
: Remove sys.path hacks; use proper imports.Manipulating sys.path inside a hot path is brittle and slows calls. Import AsyncUtils at module scope via the package path.
- import sys - import os - sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) - from utils.async_utils import AsyncUtils + from utils.async_utils import AsyncUtilsIf package layout requires relative import, switch to:
from ..utils.async_utils import AsyncUtilsevaluators/ragas_evaluator.py (1)
117-124: Blocking call: wrap ragas.evaluate in a thread, or use aevaluate. This blocks the event loop; use asyncio.to_thread (or ragas.aevaluate if available).

```diff
-        # 使用Ragas评估(同步函数)
-        result = evaluate(
-            dataset,
-            metrics=self.metrics,
-            llm=self.eval_llm,
-            embeddings=self.embeddings,
-            raise_exceptions=False
-        )
+        # 非阻塞执行同步evaluate
+        result = await asyncio.to_thread(
+            evaluate,
+            dataset,
+            metrics=self.metrics,
+            llm=self.eval_llm,
+            embeddings=self.embeddings,
+            raise_exceptions=False
+        )
```
♻️ Duplicate comments (5)
results/detailed_evaluation_results.json (1)
11-12: Fix invalid JSON: unterminated/truncated dify_answer strings. The value near line 11 ends mid-SQL and lacks closing quotes/backticks; the API-performance answer also appears truncated. This breaks JSON parsers.

```diff
-    "dify_answer": "語彙データベースを設計する際は、以下の構造が適切です:\n\n1. **基本テーブル構造**\n - **words**(単語)\n ...snip...\n**実装例(SQL)**:\n```sql\nCREATE TABLE words (\n  id INTEGER PRIMARY KEY,"
+    "dify_answer": "語彙データベースを設計する際は、以下の構造が適切です:\n\n1. **基本テーブル構造**\n - **words**(単語)\n ...snip...\n**実装例(SQL)**:\n```sql\nCREATE TABLE words (\n  id INTEGER PRIMARY KEY\n);\n```\n"
@@
-    "dify_answer": "To optimize API performance for delivering lesson content in your Japanese app, consider the following strategies:\n\n### 1. **Caching Mechanisms**\n ...snip...\n### 7. **Load Balancing & Auto-Scaling**\n - Distribute traffic across multiple servers using Nginx, HAProxy, or cloud-native solutions (e.g., AWS ALB).\n - Enable auto-scaling to handle traffic spikes during peak learning hours in Japan.\n\n###"
+    "dify_answer": "To optimize API performance for delivering lesson content in your Japanese app, consider the following strategies:\n\n### 1. **Caching Mechanisms**\n ...snip...\n### 7. **Load Balancing & Auto-Scaling**\n - Distribute traffic across multiple servers using Nginx, HAProxy, or cloud-native solutions (e.g., AWS ALB).\n - Enable auto-scaling to handle traffic spikes during peak learning hours in Japan.\n"
```

Also applies to: 21-21
evaluators/ragas_evaluator.py (1)
186-193: The batch path still blocks the loop; mirror the to_thread change. Same issue as the single-item path.

```diff
-        # 使用Ragas批量评估(同步函数)
-        result = evaluate(
-            dataset,
-            metrics=self.metrics,
-            llm=self.eval_llm,
-            embeddings=self.embeddings,
-            raise_exceptions=False
-        )
+        # 使用线程池避免阻塞事件循环
+        result = await asyncio.to_thread(
+            evaluate,
+            dataset,
+            metrics=self.metrics,
+            llm=self.eval_llm,
+            embeddings=self.embeddings,
+            raise_exceptions=False
+        )
```
35-40
: Add validation for embedding_base_url when embedding_api_key is provided.The code might attempt to call
.rstrip('/')
onNone
ifembedding_base_url
is not provided butembedding_api_key
is set.Based on the code in
utils/embedding_adapter.py
, theEmbeddingAdapterFactory.create_adapter
expects a valid base_url. Add validation:# 初始化通用嵌入适配器 embedding_config = { "api_key": config.get("embedding_api_key", ""), "base_url": config.get("embedding_base_url"), "model": config.get("embedding_model", "nomic-embed-text:latest"), "timeout": config.get("embedding_timeout", 30) } + +# Ensure base_url is set if api_key is provided +if embedding_config["api_key"] and not embedding_config["base_url"]: + # Try fallback URLs + embedding_config["base_url"] = (config.get("chat_base_url") or + config.get("base_url") or + "https://api.openai.com/v1")evaluators/__init__.py (1)
3-3
: Fix broken import path.The import
from .base import BaseEvaluator
will fail because the base class is inbase_evaluator.py
, notbase.py
.-from .base import BaseEvaluator +from .base_evaluator import BaseEvaluatorevaluators/factory.py (1)
4-4
: Fix broken import path.The import will fail as the base class is in
base_evaluator.py
, notbase.py
.-from .base import BaseEvaluator +from .base_evaluator import BaseEvaluator
🧹 Nitpick comments (33)
config.py (1)
5-11: Scope the warning suppression to Pydantic (avoid a global mute). A global ignore can hide unrelated, useful warnings. Narrow it to the emitting module and keep the regex literal.

```diff
 import warnings

-# 过滤 Pydantic 命名空间冲突警告
-warnings.filterwarnings("ignore", category=UserWarning, message="Field \"model_name\" has conflict with protected namespace")
+# 仅忽略来自 Pydantic 的该条提示
+warnings.filterwarnings(
+    "ignore",
+    message=r'Field "model_name" has conflict with protected namespace',
+    category=UserWarning,
+    module=r"pydantic(\.|$)"
+)
```
143-151
: Make connection test assert non-empty answer.Current check passes even when answer is empty.
- return result.get("error") is None + return not result.get("error") and bool(result.get("answer"))
152-169: Harden context extraction for alternate Dify payload shapes. Some payloads nest metadata under message; fall back to that path.

```diff
-        contexts = []
-
-        # 提取上下文信息
-        if "metadata" in response_data and "retriever_resources" in response_data["metadata"]:
-            contexts = [
-                resource.get("content", "")
-                for resource in response_data["metadata"]["retriever_resources"]
-                if resource.get("content")
-            ]
+        contexts: List[str] = []
+        metadata = (
+            response_data.get("metadata")
+            or response_data.get("message", {}).get("metadata")
+            or {}
+        )
+        resources = metadata.get("retriever_resources") or []
+        contexts = [
+            r.get("content", "")
+            for r in resources
+            if r.get("content")
+        ]
```
63-78
: Timeout None-check and structured logging.
- Use None-check to allow explicit 0 if ever intended.
- Prefer logger.exception/parameterized logging to satisfy linters and keep tracebacks.
- timeout = timeout or self.timeout + timeout = self.timeout if timeout is None else timeout @@ - logger.warning(f"{self.system_name} {error_msg}") + logger.warning("%s %s", self.system_name, error_msg) @@ - error_msg = f"查询失败: {str(e)}" - logger.error(f"{self.system_name} {error_msg}") + error_msg = f"查询失败: {e!s}" + logger.exception("%s %s", self.system_name, error_msg)- timeout = timeout or self.timeout + timeout = self.timeout if timeout is None else timeout @@ - logger.warning(f"{self.system_name} 连接测试超时({timeout}秒)") + logger.warning("%s 连接测试超时(%s秒)", self.system_name, timeout) @@ - except Exception as e: - logger.error(f"{self.system_name} 连接测试失败: {str(e)}") + except Exception as e: + logger.exception("%s 连接测试失败: %s", self.system_name, e)Also applies to: 90-102
`21-23`: Avoid external mutation of config. Make a shallow copy to prevent callers mutating internal state accidentally.

```diff
-        self.config = config
+        self.config = dict(config)
```

`55-61`: Fix fullwidth punctuation in docstrings/strings (Ruff RUF001/RUF002). Standardize to ASCII punctuation for consistency and lint compliance.

```diff
-            timeout: 超时时间(秒)
+            timeout: 超时时间(秒)
@@
-            error_msg = f"查询超时({timeout}秒)"
+            error_msg = f"查询超时({timeout}秒)"
@@
-            timeout: 超时时间(秒)
+            timeout: 超时时间(秒)
@@
-            logger.warning(f"{self.system_name} 连接测试超时({timeout}秒)")
+            logger.warning(f"{self.system_name} 连接测试超时({timeout}秒)")
@@
-            错误信息列表,空列表表示验证通过
+            错误信息列表, 空列表表示验证通过
```

Also applies to: 71-74, 84-89, 97-99, 109-111
connectors/ragflow.py (2)

`88-89`: Leverage the base timeout wrapper for the connectivity test. Reuse query_with_timeout for consistent timeout handling.

```diff
-            result = await self.query_async("test connection", max_retries=1)
+            result = await self.query_with_timeout("test connection", timeout=self.timeout)
```

`91-91`: Log with traceback on connection test failures. Use logger.exception to retain the stack trace.

```diff
-            logger.error(f"RagFlow连接测试失败: {e}")
+            logger.exception("RagFlow连接测试失败: %s", e)
```

results/detailed_evaluation_results.json (2)
`14-14`: Minor: remove duplicated question text. The question string repeats the same sentence twice.

```diff
-    "question": "iPhone 13 Pro Maxのスペックを教えてください iPhone 13 Pro Maxのスペックを教えてください ",
+    "question": "iPhone 13 Pro Maxのスペックを教えてください",
```

`1-148`: Add a JSON validity check to CI. Include a pipeline step that runs `jq . results/detailed_evaluation_results.json >/dev/null` to fail on malformed JSON and prevent regressions.
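If the pipeline is Python-based, the same gate can live in the test suite instead of a shell step — a minimal sketch (the file and test names are illustrative, not part of the repository):

```python
# test_results_json.py — fails the build when the results file is not valid JSON
import json
from pathlib import Path

def test_detailed_results_are_valid_json():
    path = Path("results/detailed_evaluation_results.json")
    data = json.loads(path.read_text(encoding="utf-8"))  # raises json.JSONDecodeError on malformed JSON
    assert isinstance(data, (list, dict))  # sanity-check the top-level shape
```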
evaluators/base.py (2)

`91-94`: Use logger.exception and avoid stringifying exceptions. Keeps tracebacks and cleaner logs.

```diff
-        except Exception as e:
-            error_msg = f"评价失败: {str(e)}"
-            logger.error(f"{self.name} {error_msg}")
+        except Exception:
+            logger.exception(f"{self.name} 评价失败")
             return self._get_default_scores(len(answers))
@@
-        except Exception as e:
-            error_msg = f"单个评价失败: {str(e)}"
-            logger.error(f"{self.name} {error_msg}")
+        except Exception:
+            logger.exception(f"{self.name} 单个评价失败")
             return self._get_default_single_score()
```

Also applies to: 125-127

`32-58`: Docstring punctuation: replace full-width parentheses with ASCII to appease linters; the content can remain Chinese.

Also applies to: 66-76, 100-111, 154-159
utils/embedding_adapter.py (2)

`110-113`: Preserve tracebacks with logger.exception and re-raise. Improves debuggability; avoid wrapping with a broad Exception.

```diff
-        except ImportError as e:
-            logger.error(f"LangChain导入失败: {e}")
-            raise Exception(f"不支持的嵌入模型配置: {self.config}")
+        except ImportError:
+            logger.exception("LangChain导入失败")
+            raise
@@
-        except Exception as e:
-            logger.error(f"嵌入失败: {e}")
-            raise Exception(f"嵌入失败: {str(e)}")
+        except Exception:
+            logger.exception("嵌入失败")
+            raise
@@
-        except Exception as e:
-            logger.error(f"批量嵌入失败: {e}")
-            raise Exception(f"批量嵌入失败: {str(e)}")
+        except Exception:
+            logger.exception("批量嵌入失败")
+            raise
@@
-        except Exception as e:
-            logger.error(f"直接嵌入失败: {e}")
-            raise Exception(f"嵌入失败: {str(e)}")
+        except Exception:
+            logger.exception("直接嵌入失败")
+            raise
```

Also applies to: 118-121, 126-129, 191-194

`176-190`: Reuse a single ClientSession for performance. Creating a session per request is expensive; consider a per-adapter session lifecycle.
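One possible shape for that lifecycle, sketched with aiohttp; the class name and wiring are assumptions for illustration, not the adapter's actual API:

```python
import aiohttp
from typing import Optional

class SessionReusingAdapter:
    """Sketch: lazily create one aiohttp ClientSession and reuse it for all requests."""

    def __init__(self, timeout: int = 30):
        self._timeout = aiohttp.ClientTimeout(total=timeout)
        self._session: Optional[aiohttp.ClientSession] = None

    async def _get_session(self) -> aiohttp.ClientSession:
        # Created on first use so __init__ can stay synchronous.
        if self._session is None or self._session.closed:
            self._session = aiohttp.ClientSession(timeout=self._timeout)
        return self._session

    async def close(self) -> None:
        # Call once when the adapter is discarded (e.g. at shutdown).
        if self._session is not None and not self._session.closed:
            await self._session.close()
```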
evaluators/ragas_evaluator.py (3)

`30-37`: Respect the configured timeout. Use BaseEvaluator.timeout instead of a hardcoded 60s.

```diff
-        self.eval_llm = LangchainLLMWrapper(ChatOpenAI(
+        self.eval_llm = LangchainLLMWrapper(ChatOpenAI(
             api_key=config["api_key"],
             base_url=config["base_url"],
             model=config.get("model", "gpt-3.5-turbo"),
             temperature=0,
             max_tokens=1000,
-            timeout=60  # 增加超时时间
+            timeout=self.timeout
         ))
```

`86-94`: Type hints: make context(s) Optional to match usage. Prevents implicit-Optional violations and matches the Base change.

```diff
-    async def evaluate_single_answer_async(self, question: str, answer: str, ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def evaluate_single_answer_async(self, question: str, answer: str, ground_truth: str,
+                                           context: Optional[List[str]] = None) -> Dict[str, Optional[float]]:
@@
-    async def evaluate_answers_async(self, questions: List[str],
-                                     answers: List[str], ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, List[float]]:
+    async def evaluate_answers_async(self, questions: List[str],
+                                     answers: List[str], ground_truths: List[str],
+                                     contexts: Optional[List[List[str]]] = None) -> Dict[str, List[Optional[float]]]:
```

Also applies to: 167-169

`71-85`: Prefer logger over print for structured logs. Unify on logging like BaseEvaluator for consistency and level control.

Also applies to: 98-101, 155-165, 251-272
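For reference, the usual module-level pattern looks like this (a generic sketch, not this repository's exact setup):

```python
import logging

logger = logging.getLogger(__name__)  # one logger per module

def report_progress(count: int) -> None:
    # Replaces print(f"...{count}..."): lazy %-formatting plus runtime level control.
    logger.info("Ragas批量评估完成,处理了 %s 个样本", count)
```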
evaluators/academic_evaluator.py (5)

`1-1`: Comment formatting: replace fullwidth characters with standard ASCII equivalents.

```diff
-# 增强学术评估器 - 合并学术和混合模型优势
+# Enhanced academic evaluator - combines academic and hybrid model advantages
```

`43-49`: Avoid catching bare Exception. Catching bare `Exception` can mask unexpected errors. Be more specific about the exceptions you expect.

```diff
 try:
     self.embedding_adapter = EmbeddingAdapterFactory.create_adapter(embedding_config)
     print(f"✅ 通用嵌入适配器初始化成功: {embedding_config['model']}")
-except Exception as e:
+except (ValueError, ConnectionError, KeyError) as e:
     print(f"⚠️ 嵌入适配器初始化失败,将使用文本相似度: {e}")
     self.embedding_adapter = None
```

`50-50`: Fix fullwidth characters in comments.

```diff
-# 评估模式:pure_chat(纯聊天模型)或 hybrid(混合模式)
+# 评估模式: pure_chat(纯聊天模型)或 hybrid(混合模式)
```

`230-230`: Fix fullwidth parentheses in the docstring.

```diff
-"""使用嵌入模型计算语义相似度(混合模式用)- 使用通用适配器"""
+"""使用嵌入模型计算语义相似度(混合模式用) - 使用通用适配器"""
```

`233-237`: Fix fullwidth comma in string and comment.

```diff
-# 如果没有嵌入适配器,直接使用文本相似度
+# 如果没有嵌入适配器,直接使用文本相似度
 if not self.embedding_adapter:
-    print("🔍 嵌入适配器不可用,使用文本相似度")
+    print("🔍 嵌入适配器不可用,使用文本相似度")
     return self._calculate_text_similarity(answer, ground_truth)
```
1-1
: Fix fullwidth parentheses in comment.Replace fullwidth parentheses with ASCII equivalents.
-# 评估器模块 - 统一接口 (全部使用异步版本) +# 评估器模块 - 统一接口 (全部使用异步版本)
8-8
: Improve comment accuracy.The comment states all evaluators support async API, but based on the code, the evaluators also provide sync methods for backward compatibility.
-# 注意: 所有评估器都支持异步API +# 注意: 所有评估器都支持异步API,并提供同步方法以保持向后兼容evaluators/factory.py (9)
evaluators/factory.py (9)

`16-19`: Add a ClassVar annotation for mutable class attributes. Mutable class attributes should be annotated with `typing.ClassVar` to indicate they're shared across all instances.

```diff
+from typing import Dict, List, Any, Optional, ClassVar
 from .base import BaseEvaluator

 class EvaluatorFactory:
     """评估器工厂类"""

     # 可用的评估器类型
-    EVALUATOR_TYPES = {
+    EVALUATOR_TYPES: ClassVar[Dict[str, type[BaseEvaluator]]] = {
         "academic": AcademicEvaluator,
         "ragas": RagasEvaluator
     }
```

`22-22`: Add a ClassVar annotation for DEFAULT_PRIORITY. Add a type annotation for the mutable class attribute.

```diff
 # 默认评估器优先级
-DEFAULT_PRIORITY = ["ragas", "academic"]
+DEFAULT_PRIORITY: ClassVar[List[str]] = ["ragas", "academic"]
```

`39-41`: Avoid catching bare Exception. Be more specific about the exceptions you expect to handle.

```diff
-except Exception as e:
+except (ValueError, AttributeError, TypeError) as e:
     print(f"❌ {evaluator_type}评估器创建失败: {e}")
     return None
```

`100-101`: Fix fullwidth parentheses in strings.

```diff
-"academic": "增强学术评估器 - 支持6维度质量评估(相关性、正确性、完整性、清晰度、连贯性、有用性)",
-"ragas": "Ragas框架评估器 - 完整的RAG评估指标集(相关性、正确性、忠实性、上下文精度、上下文召回率)"
+"academic": "增强学术评估器 - 支持6维度质量评估(相关性、正确性、完整性、清晰度、连贯性、有用性)",
+"ragas": "Ragas框架评估器 - 完整的RAG评估指标集(相关性、正确性、忠实性、上下文精度、上下文召回率)"
```

`115-115`: Remove the unnecessary f-string prefix. The string doesn't contain any placeholders.

```diff
-print(f"🔧 评估器管理器初始化完成")
+print("🔧 评估器管理器初始化完成")
```

`128-128`: Fix fullwidth characters in the comment. Replace the fullwidth colon and comma with ASCII equivalents.

```diff
-"evaluation_mode": "hybrid"  # 使用混合模式:embedding计算相关性,聊天模型评估质量
+"evaluation_mode": "hybrid"  # 使用混合模式: embedding计算相关性,聊天模型评估质量
```

`152-152`: Remove the unnecessary f-string prefix.

```diff
-print(f"  ✅ 完成")
+print("  ✅ 完成")
```

`153-153`: Avoid catching bare Exception. Be more specific about expected exceptions.

```diff
-except Exception as e:
+except (asyncio.TimeoutError, ValueError, AttributeError) as e:
     print(f"  ❌ 失败: {e}")
```

`139-139`: Add an explicit Optional annotation. PEP 484 prohibits implicit Optional.

```diff
-async def evaluate_all_async(self, questions: List[str], answers: List[str],
-                             ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, Dict[str, List[float]]]:
+async def evaluate_all_async(self, questions: List[str], answers: List[str],
+                             ground_truths: List[str], contexts: Optional[List[List[str]]] = None) -> Dict[str, Dict[str, List[float]]]:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)

- results/multi_evaluation_results.csv is excluded by `!**/*.csv`
📒 Files selected for processing (14)

- config.py (1 hunks)
- connectors/async_base.py (0 hunks)
- connectors/base.py (1 hunks)
- connectors/dify.py (3 hunks)
- connectors/ragflow.py (1 hunks)
- data/test_cases_jp.json (1 hunks)
- evaluators/__init__.py (1 hunks)
- evaluators/academic_evaluator.py (5 hunks)
- evaluators/base.py (1 hunks)
- evaluators/factory.py (3 hunks)
- evaluators/ragas_evaluator.py (6 hunks)
- main_multi_eval_async.py (1 hunks)
- results/detailed_evaluation_results.json (2 hunks)
- utils/embedding_adapter.py (1 hunks)
💤 Files with no reviewable changes (1)
- connectors/async_base.py
🚧 Files skipped from review as they are similar to previous changes (1)
- main_multi_eval_async.py
🧰 Additional context used
🧬 Code graph analysis (8)
connectors/ragflow.py (1)
- connectors/base.py (1): AsyncBaseRAGConnector (10-152)

connectors/dify.py (1)
- connectors/base.py (1): AsyncBaseRAGConnector (10-152)

evaluators/academic_evaluator.py (2)
- evaluators/base.py (1): BaseEvaluator (10-184)
- utils/embedding_adapter.py (6): EmbeddingAdapterFactory (201-229), detect_embedding_config (240-265), create_adapter (205-214), embed_query (29-31), embed_query (114-120), embed_query (167-193)

connectors/base.py (4)
- connectors/dify.py (5): query_async (115-141), test_connection_async (143-150), validate_config (18-25), build_request (27-48), parse_response (152-169)
- connectors/ragflow.py (5): query_async (57-83), test_connection_async (85-92), validate_config (14-21), build_request (23-36), parse_response (94-109)
- connectors/universal.py (4): query_async (59-83), test_connection_async (134-141), query_with_timeout (85-105), get_system_info (161-163)
- connectors/factory.py (1): get_system_info (56-69)

evaluators/ragas_evaluator.py (2)
- evaluators/base.py (2): BaseEvaluator (10-184), evaluate_answers_async (29-43)
- evaluators/academic_evaluator.py (1): evaluate_answers_async (59-105)

evaluators/__init__.py (4)
- evaluators/base.py (1): BaseEvaluator (10-184)
- evaluators/academic_evaluator.py (1): AcademicEvaluator (18-503)
- evaluators/ragas_evaluator.py (1): RagasEvaluator (21-290)
- evaluators/factory.py (2): EvaluatorFactory (12-103), EvaluatorManager (105-173)

evaluators/base.py (3)
- evaluators/academic_evaluator.py (3): evaluate_answers_async (59-105), evaluate_single_answer_async (107-121), get_supported_metrics (481-487)
- evaluators/ragas_evaluator.py (5): evaluate_answers_async (167-272), evaluate_single_answer_async (86-100), get_supported_metrics (274-276), is_available (278-280), get_evaluator_info (282-290)
- evaluators/factory.py (1): get_evaluator_info (67-94)

evaluators/factory.py (3)
- evaluators/base.py (4): BaseEvaluator (10-184), get_evaluator_info (171-184), evaluate_with_timeout (62-94), get_supported_metrics (153-160)
- evaluators/academic_evaluator.py (2): AcademicEvaluator (18-503), get_supported_metrics (481-487)
- evaluators/ragas_evaluator.py (3): RagasEvaluator (21-290), get_evaluator_info (282-290), get_supported_metrics (274-276)
🪛 Ruff (0.12.2)
connectors/dify.py

- 67-67: Abstract `raise` to an inner function (TRY301)
- 67-67, 69-69, 71-71, 113-113: Create your own exception (TRY002)
- 67-67, 69-69, 71-71: Avoid specifying long messages outside the exception class (TRY003)
- 69-69, 71-71: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
- 70-70: Do not catch blind exception: `Exception` (BLE001)
- 71-71: Use explicit conversion flag; replace with conversion flag (RUF010)
- 91-91, 102-102, 108-108: Comment contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF003)
utils/embedding_adapter.py

- 47-47, 111-111, 119-119, 127-127, 192-192, 228-228: Use `logging.exception` instead of `logging.error`; replace with `exception` (TRY400)
- 48-48, 112-112, 120-120, 128-128, 193-193: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
- 48-48, 112-112, 120-120, 128-128, 190-190, 193-193: Create your own exception (TRY002)
- 48-48, 112-112, 120-120, 128-128, 190-190, 193-193: Avoid specifying long messages outside the exception class (TRY003)
- 64-64: Comment contains ambiguous `(` (FULLWIDTH LEFT PARENTHESIS) and `)` (FULLWIDTH RIGHT PARENTHESIS) (RUF003)
- 118-118, 126-126, 191-191, 210-210, 227-227: Do not catch blind exception: `Exception` (BLE001)
- 120-120, 128-128, 193-193: Use explicit conversion flag; replace with conversion flag (RUF010)
- 190-190: Abstract `raise` to an inner function (TRY301)
- 211-211: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)
- 225-225: f-string without any placeholders; remove extraneous `f` prefix (F541)
evaluators/academic_evaluator.py

- 46-46, 55-55: Do not catch blind exception: `Exception` (BLE001)
- 47-47, 235-235: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)
- 50-50: Comment contains ambiguous `:` (FULLWIDTH COLON) and ambiguous `(`/`)` (FULLWIDTH PARENTHESES, twice) (RUF003)
- 230-230: Docstring contains ambiguous `(` and `)` (FULLWIDTH PARENTHESES) (RUF002)
- 233-233: Comment contains ambiguous `,` (FULLWIDTH COMMA) (RUF003)
connectors/base.py

- 57-57, 85-85: Docstring contains ambiguous `(` and `)` (FULLWIDTH PARENTHESES) (RUF002)
- 70-70: Consider moving this statement to an `else` block (TRY300)
- 72-72, 98-98: String contains ambiguous `(` and `)` (FULLWIDTH PARENTHESES) (RUF001)
- 75-75, 100-100: Do not catch blind exception: `Exception` (BLE001)
- 76-76, 101-101: Use explicit conversion flag; replace with conversion flag (RUF010)
- 77-77, 101-101: Use `logging.exception` instead of `logging.error`; replace with `exception` (TRY400)
- 110-110: Docstring contains ambiguous `,` (FULLWIDTH COMMA) (RUF002)
evaluators/ragas_evaluator.py

- 98-98, 253-253, 270-270: Do not catch blind exception: `Exception` (BLE001)
- 102-102, 168-168: PEP 484 prohibits implicit Optional; convert to `Optional[T]` (RUF013)
- 117-117, 186-186: Comment contains ambiguous `(` and `)` (FULLWIDTH PARENTHESES) (RUF003)
- 205-205, 244-244: Comment contains ambiguous `,` (FULLWIDTH COMMA) (RUF003)
- 251-251: String contains ambiguous `,` (FULLWIDTH COMMA) (RUF001)
- 262-268: Consider moving this statement to an `else` block (TRY300)
evaluators/base.py

- 30-30, 47-47, 63-63, 97-97: PEP 484 prohibits implicit Optional; convert to `Optional[T]` (RUF013)
- 38-38, 55-55, 72-72, 73-73, 106-106, 107-107: Docstring contains ambiguous `(` and `)` (FULLWIDTH PARENTHESES) (RUF002)
- 85-85, 119-119: Consider moving this statement to an `else` block (TRY300)
- 87-87, 121-121: String contains ambiguous `(` and `)` (FULLWIDTH PARENTHESES) (RUF001)
- 91-91, 124-124: Do not catch blind exception: `Exception` (BLE001)
- 92-92, 125-125: Use explicit conversion flag; replace with conversion flag (RUF010)
- 93-93, 126-126: Use `logging.exception` instead of `logging.error`; replace with `exception` (TRY400)
evaluators/factory.py

- 16-19, 22-22: Mutable class attributes should be annotated with `typing.ClassVar` (RUF012)
- 28-28: Avoid specifying long messages outside the exception class (TRY003)
- 39-39, 153-153: Do not catch blind exception: `Exception` (BLE001)
- 100-100, 101-101: String contains ambiguous `(` and `)` (FULLWIDTH PARENTHESES) (RUF001)
- 115-115, 152-152: f-string without any placeholders; remove extraneous `f` prefix (F541)
- 128-128: Comment contains ambiguous `:` (FULLWIDTH COLON) and `,` (FULLWIDTH COMMA) (RUF003)
- 139-139: PEP 484 prohibits implicit Optional; convert to `Optional[T]` (RUF013)
🔇 Additional comments (4)
connectors/dify.py (2)

`7-7`: Import path consolidation looks correct. Aligned with the base class relocation.

`14-17`: Constructor wiring LGTM. System name and config passed to base correctly.

results/detailed_evaluation_results.json (1)

`16-16`: Verify device facts; remove inaccurate claims. iPhone 13 Pro Max does not ship with Always‑On Display or Micro‑LED. Please correct or drop that bullet to avoid penalizing models incorrectly.

```diff
-その他の特徴として、Always-Onディスプレイ、Micro-LEDディスプレイ技術、およびAR機能強化が含まれます。
+その他の特徴として、AR機能の活用が挙げられます。
```

Run a quick check and adjust the OS-upgrade wording (avoid pinning to a specific current iOS in static data).

evaluators/ragas_evaluator.py (1)

`4-15`: Import aevaluate when available. If using ragas>=0.2 with aevaluate, prefer it over to_thread for clarity.

Would you like me to switch to aevaluate if present and fall back to to_thread otherwise?
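A hedged sketch of that switch — it assumes newer ragas releases export `aevaluate` (per the note above) and keeps the `to_thread` path as the fallback:

```python
import asyncio
from ragas import evaluate

try:
    from ragas import aevaluate  # assumed available on ragas>=0.2; otherwise fall back
except ImportError:
    aevaluate = None

async def run_ragas(dataset, metrics, llm, embeddings):
    if aevaluate is not None:
        # Native async entry point: no worker thread needed.
        return await aevaluate(dataset, metrics=metrics, llm=llm, embeddings=embeddings)
    # Blocking evaluate() runs on a thread so the event loop stays responsive.
    return await asyncio.to_thread(evaluate, dataset, metrics=metrics,
                                   llm=llm, embeddings=embeddings)
```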
connectors/base.py

```diff
         return {
-            "name": self.system_name,
-            "description": self.__class__.__doc__ or "",
-            "config_valid": len(self.validate_config()) == 0
+            "system_name": self.system_name,
+            "timeout": self.timeout,
+            "config": {k: v for k, v in self.config.items() if k != 'api_key'}
         }
```
🛠️ Refactor suggestion
Harden secret redaction in get_system_info.
Only removing api_key risks leaking other secrets (token, password, auth headers). Redact a broader set of sensitive keys.
```diff
-    return {
-        "system_name": self.system_name,
-        "timeout": self.timeout,
-        "config": {k: v for k, v in self.config.items() if k != 'api_key'}
-    }
+    SENSITIVE_KEYS = ("api_key", "key", "token", "secret", "password", "authorization", "cookie")
+    return {
+        "system_name": self.system_name,
+        "timeout": self.timeout,
+        "config": {
+            k: ("***" if any(s in k.lower() for s in SENSITIVE_KEYS) else v)
+            for k, v in self.config.items()
+        },
+    }
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
        SENSITIVE_KEYS = ("api_key", "key", "token", "secret", "password", "authorization", "cookie")
        return {
            "system_name": self.system_name,
            "timeout": self.timeout,
            "config": {
                k: ("***" if any(s in k.lower() for s in SENSITIVE_KEYS) else v)
                for k, v in self.config.items()
            },
        }
```
🤖 Prompt for AI Agents
In connectors/base.py around lines 148 to 152, get_system_info currently only
filters out the 'api_key' key which can leak other secrets; update the logic to
redact a predefined set of sensitive keys (e.g., api_key, apiToken, token,
password, secret, auth, authorization, bearer) in a case-insensitive way and
return those keys with their values replaced by a constant like "REDACTED"
(rather than omitting them) so the shape remains predictable; ensure the filter
handles nested dicts at the top level only and does not modify non-sensitive
entries.
connectors/dify.py

```python
    async def _parse_streaming_response(self, response) -> Dict[str, Any]:
        """解析streaming响应"""
        import json

        message_events = []
        message_end_result = None

        async for line in response.content:
            line = line.decode('utf-8').strip()
            if line.startswith('data: '):
                try:
                    data = json.loads(line[6:])  # 移除 'data: ' 前缀
                    event_type = data.get("event")

                    if event_type == "message":
                        message_events.append(data)
                    elif event_type == "message_end":
                        message_end_result = data
                        break  # 找到message_end后停止,这是最完整的响应

                except json.JSONDecodeError:
                    continue

        # 合并所有message事件的答案
        full_answer = ""
        for msg in message_events:
            if "answer" in msg:
                full_answer += msg["answer"]

        # 使用message_end作为基础,它包含完整的metadata
        if message_end_result:
            final_result = message_end_result
            final_result["answer"] = full_answer
            return final_result
        elif message_events:
            # 如果没有message_end,使用最后一个message事件
            last_message = message_events[-1]
            last_message["answer"] = full_answer
            return last_message
        else:
            raise Exception("无法从streaming响应中解析出有效消息")
```
SSE parser is chunk-fragile; switch to line-based read and handle [DONE]/error events.
Iterating response.content yields arbitrary chunks, so startswith('data: ') can break across boundaries.
```diff
-    async def _parse_streaming_response(self, response) -> Dict[str, Any]:
-        """解析streaming响应"""
-        import json
-
-        message_events = []
-        message_end_result = None
-
-        async for line in response.content:
-            line = line.decode('utf-8').strip()
-            if line.startswith('data: '):
-                try:
-                    data = json.loads(line[6:])  # 移除 'data: ' 前缀
-                    event_type = data.get("event")
-
-                    if event_type == "message":
-                        message_events.append(data)
-                    elif event_type == "message_end":
-                        message_end_result = data
-                        break  # 找到message_end后停止,这是最完整的响应
-
-                except json.JSONDecodeError:
-                    continue
-
-        # 合并所有message事件的答案
-        full_answer = ""
-        for msg in message_events:
-            if "answer" in msg:
-                full_answer += msg["answer"]
-
-        # 使用message_end作为基础,它包含完整的metadata
-        if message_end_result:
-            final_result = message_end_result
-            final_result["answer"] = full_answer
-            return final_result
-        elif message_events:
-            # 如果没有message_end,使用最后一个message事件
-            last_message = message_events[-1]
-            last_message["answer"] = full_answer
-            return last_message
-        else:
-            raise Exception("无法从streaming响应中解析出有效消息")
+    async def _parse_streaming_response(self, response) -> Dict[str, Any]:
+        """解析streaming响应(逐行SSE解析)"""
+        import json
+
+        full_answer_parts: List[str] = []
+        message_end_result: Dict[str, Any] | None = None
+
+        while True:
+            raw = await response.content.readline()
+            if not raw:
+                break
+            line = raw.decode("utf-8", errors="ignore").strip()
+            if not line or line.startswith(":"):
+                continue
+            if not line.startswith("data:"):
+                continue
+            payload = line[5:].strip()
+            if payload in ("", "[DONE]"):
+                continue
+            try:
+                data = json.loads(payload)
+            except json.JSONDecodeError:
+                continue
+
+            event_type = data.get("event")
+            if event_type == "message":
+                ans = data.get("answer") or data.get("message", {}).get("content", "")
+                if ans:
+                    full_answer_parts.append(ans)
+            elif event_type == "message_end":
+                message_end_result = data
+                break
+            elif event_type == "error":
+                raise DifyAPIError(f"Streaming error: {data.get('error') or data!s}")
+
+        full_answer = "".join(full_answer_parts)
+        if message_end_result:
+            final = dict(message_end_result)
+            final["answer"] = full_answer
+            return final
+        if full_answer:
+            return {"answer": full_answer}
+        raise DifyAPIError("无法从streaming响应中解析出有效消息")
```
📝 Committable suggestion

```python
    async def _parse_streaming_response(self, response) -> Dict[str, Any]:
        """解析streaming响应(逐行SSE解析)"""
        import json

        full_answer_parts: List[str] = []
        message_end_result: Dict[str, Any] | None = None

        while True:
            raw = await response.content.readline()
            if not raw:
                break
            line = raw.decode("utf-8", errors="ignore").strip()
            if not line or line.startswith(":"):
                continue
            if not line.startswith("data:"):
                continue
            payload = line[5:].strip()
            if payload in ("", "[DONE]"):
                continue
            try:
                data = json.loads(payload)
            except json.JSONDecodeError:
                continue

            event_type = data.get("event")
            if event_type == "message":
                ans = data.get("answer") or data.get("message", {}).get("content", "")
                if ans:
                    full_answer_parts.append(ans)
            elif event_type == "message_end":
                message_end_result = data
                break
            elif event_type == "error":
                raise DifyAPIError(f"Streaming error: {data.get('error') or data!s}")

        full_answer = "".join(full_answer_parts)
        if message_end_result:
            final = dict(message_end_result)
            final["answer"] = full_answer
            return final
        if full_answer:
            return {"answer": full_answer}
        raise DifyAPIError("无法从streaming响应中解析出有效消息")
```
🧰 Tools
🪛 Ruff (0.12.2)
- 91-91, 102-102, 108-108: Comment contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF003)
- 113-113: Create your own exception (TRY002)
data/test_cases_jp.json

```json
  {
    "question": "iPhone 13 Pro Maxのスペックを教えてください iPhone 13 Pro Maxのスペックを教えてください ",
    "ground_truth": "iPhone 13 Pro Maxの主な仕様は以下の通りです:\n\n- **ディスプレイ**: 6.7インチ Super Retina XDR(解像度2778×1284ピクセル) \n- **プロセッサ**: A15 Bionicチップ \n- **RAM**: 6GB \n- **ストレージ**: 128GB / 256GB / 512GB / 1TB(日本モデルは一部容量が異なる場合あり) \n- **カメラ**: \n  - トリプル12MPシステム(超広角・広角・望遠) \n  - ライダー(LiDAR)スキャナー搭載 \n- **バッテリー**: 約21時間の動画再生(再生時間は使用状況により変化) \n- **通信**: 5G対応、Wi-Fi 6 \n- **OS**: 発売時はiOS 15を搭載 \n\nアプリ開発において特に重要なのは画面解像度やカメラ機能、5G対応の有無です。日本向けアプリの場合、キャリアや地域固有の仕様(例: 防水性能)も考慮が必要な場合があります。何か特定の機能や開発上の懸念があれば、詳しく教えてください!"
  },
```
🛠️ Refactor suggestion
Fix duplicated question text and incorrect battery spec in JP test case.
- Question string is duplicated and has a trailing space.
- iPhone 13 Pro Max video playback is up to 28 hours (not ~21). Ambiguous note about JP storage variants is likely incorrect.
```diff
-  {
-    "question": "iPhone 13 Pro Maxのスペックを教えてください iPhone 13 Pro Maxのスペックを教えてください ",
-    "ground_truth": "iPhone 13 Pro Maxの主な仕様は以下の通りです:\n\n- **ディスプレイ**: 6.7インチ Super Retina XDR(解像度2778×1284ピクセル) \n- **プロセッサ**: A15 Bionicチップ \n- **RAM**: 6GB \n- **ストレージ**: 128GB / 256GB / 512GB / 1TB(日本モデルは一部容量が異なる場合あり) \n- **カメラ**: \n  - トリプル12MPシステム(超広角・広角・望遠) \n  - ライダー(LiDAR)スキャナー搭載 \n- **バッテリー**: 約21時間の動画再生(再生時間は使用状況により変化) \n- **通信**: 5G対応、Wi-Fi 6 \n- **OS**: 発売時はiOS 15を搭載 \n\nアプリ開発において特に重要なのは画面解像度やカメラ機能、5G対応の有無です。日本向けアプリの場合、キャリアや地域固有の仕様(例: 防水性能)も考慮が必要な場合があります。何か特定の機能や開発上の懸念があれば、詳しく教えてください!"
-  },
+  {
+    "question": "iPhone 13 Pro Maxのスペックを教えてください",
+    "ground_truth": "iPhone 13 Pro Maxの主な仕様は以下の通りです:\n\n- **ディスプレイ**: 6.7インチ Super Retina XDR(2778×1284)\n- **プロセッサ**: A15 Bionic\n- **RAM**: 6GB\n- **ストレージ**: 128GB / 256GB / 512GB / 1TB\n- **カメラ**: トリプル12MP(超広角・広角・望遠)+LiDARスキャナ\n- **バッテリー**: 最大28時間のビデオ再生(使用状況により変動)\n- **通信**: 5G、Wi‑Fi 6\n- **OS**: 発売時はiOS 15\n\n開発観点では画面解像度、ProMotion(120Hz)、カメラ機能、5G対応が重要です。特定の要件があれば教えてください。"
+  },
```
📝 Committable suggestion

```json
  {
    "question": "iPhone 13 Pro Maxのスペックを教えてください",
    "ground_truth": "iPhone 13 Pro Maxの主な仕様は以下の通りです:\n\n- **ディスプレイ**: 6.7インチ Super Retina XDR(2778×1284)\n- **プロセッサ**: A15 Bionic\n- **RAM**: 6GB\n- **ストレージ**: 128GB / 256GB / 512GB / 1TB\n- **カメラ**: トリプル12MP(超広角・広角・望遠)+LiDARスキャナ\n- **バッテリー**: 最大28時間のビデオ再生(使用状況により変動)\n- **通信**: 5G、Wi‑Fi 6\n- **OS**: 発売時はiOS 15\n\n開発観点では画面解像度、ProMotion(120Hz)、カメラ機能、5G対応が重要です。特定の要件があれば教えてください。"
  },
```
🤖 Prompt for AI Agents
In data/test_cases_jp.json around lines 10 to 13, the "question" value repeats
the same text and includes an extra trailing space, and the "ground_truth" has
an incorrect battery playback spec and an unnecessary/ambiguous note about
Japanese storage variants; fix by keeping a single non-duplicated question
string with no trailing space, update the battery line to state video playback
up to 28 hours, and remove or rephrase the parenthetical about Japan-specific
storage variants so the spec block is accurate and unambiguous.
evaluators/academic_evaluator.py

```python
import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from utils.embedding_adapter import EmbeddingAdapterFactory, detect_embedding_config
```
🛠️ Refactor suggestion
Remove unnecessary sys.path manipulation.
Adding the parent directory to sys.path is an anti-pattern that can cause import issues. The import should work without path manipulation since `utils` is at the repository root.

```diff
-import sys
-import os
-sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from utils.embedding_adapter import EmbeddingAdapterFactory, detect_embedding_config
+from utils.embedding_adapter import EmbeddingAdapterFactory, detect_embedding_config
```
📝 Committable suggestion

```python
from utils.embedding_adapter import EmbeddingAdapterFactory, detect_embedding_config
```
🤖 Prompt for AI Agents
In evaluators/academic_evaluator.py around lines 6 to 9, remove the sys.path
manipulation (sys.path.append(...)) and its os import; simply rely on the
repository root being on PYTHONPATH so the line "from utils.embedding_adapter
import EmbeddingAdapterFactory, detect_embedding_config" works. Delete the
unnecessary imports and sys.path modification, ensure any test or runtime
entrypoint sets the project root on PYTHONPATH or uses a proper package entry
(e.g., run as a module) so imports resolve without manually mutating sys.path.
evaluators/base.py

```python
    async def evaluate_answers_async(self, questions: List[str], answers: List[str],
                                     ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, List[float]]:
        """
```
🛠️ Refactor suggestion
Fix typing: use Optional for contexts and allow None scores
Align with callers that return None on failure/unavailable. Update signatures to Optional and return types to include Optional[float].
```diff
-    async def evaluate_answers_async(self, questions: List[str], answers: List[str],
-                                     ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, List[float]]:
+    async def evaluate_answers_async(self, questions: List[str], answers: List[str],
+                                     ground_truths: List[str],
+                                     contexts: Optional[List[List[str]]] = None
+                                     ) -> Dict[str, List[Optional[float]]]:
     ...
-    async def evaluate_single_answer_async(self, question: str, answer: str,
-                                           ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def evaluate_single_answer_async(self, question: str, answer: str,
+                                           ground_truth: str, context: Optional[List[str]] = None
+                                           ) -> Dict[str, Optional[float]]:
     ...
-    async def evaluate_with_timeout(self, questions: List[str], answers: List[str],
-                                    ground_truths: List[str], contexts: List[List[str]] = None,
-                                    timeout: Optional[int] = None) -> Dict[str, List[float]]:
+    async def evaluate_with_timeout(self, questions: List[str], answers: List[str],
+                                    ground_truths: List[str], contexts: Optional[List[List[str]]] = None,
+                                    timeout: Optional[int] = None) -> Dict[str, List[Optional[float]]]:
     ...
-    async def evaluate_single_with_timeout(self, question: str, answer: str,
-                                           ground_truth: str, context: List[str] = None,
-                                           timeout: Optional[int] = None) -> Dict[str, float]:
+    async def evaluate_single_with_timeout(self, question: str, answer: str,
+                                           ground_truth: str, context: Optional[List[str]] = None,
+                                           timeout: Optional[int] = None) -> Dict[str, Optional[float]]:
```
Also applies to: 46-47, 62-65, 96-99
🧰 Tools
🪛 Ruff (0.12.2)
- 30-30: PEP 484 prohibits implicit Optional; convert to `Optional[T]` (RUF013)
🤖 Prompt for AI Agents
In evaluators/base.py around lines 29-31, change the signature to use Optional
for the contexts parameter and allow None scores in the return type (e.g.,
contexts: Optional[List[List[str]]] = None and return Dict[str,
List[Optional[float]]]); import Optional from typing. Apply the same typing
changes to the other affected method signatures at lines 46-47, 62-65, and 96-99
so that any lists of scores are List[Optional[float]] and optional contexts use
Optional; update docstrings/comments to reflect that evaluators may return None
for individual scores.
evaluators/base.py

```python
        return {metric: [0.0] * count for metric in metrics}

    def _get_default_single_score(self) -> Dict[str, float]:
        """
        获取默认单个评分
        Returns:
            默认单个评分字典
        """
        metrics = self.get_supported_metrics()
        return {metric: 0.0 for metric in metrics}

    @abstractmethod
```
🛠️ Refactor suggestion
Default fallbacks should be “missing” (None), not 0.0
0.0 conflates “bad score” with “no score”. Return None consistently.
```diff
-        return {metric: [0.0] * count for metric in metrics}
+        return {metric: [None] * count for metric in metrics}
 ...
-        return {metric: 0.0 for metric in metrics}
+        return {metric: None for metric in metrics}
```
📝 Committable suggestion

```python
        metrics = self.get_supported_metrics()
        return {metric: [None] * count for metric in metrics}

    def _get_default_single_score(self) -> Dict[str, float]:
        """
        获取默认单个评分
        Returns:
            默认单个评分字典
        """
        metrics = self.get_supported_metrics()
        return {metric: None for metric in metrics}
```
🤖 Prompt for AI Agents
In evaluators/base.py around lines 139 to 151, the default fallbacks currently
return 0.0 which conflates “bad score” with “missing”; update the two functions
so they return None for missing scores instead of 0.0, adjust return type hints
to use Optional[float] (e.g., Dict[str, List[Optional[float]]] for the
multi-score method and Dict[str, Optional[float]] for the single-score method),
and add/import typing.Optional at the top of the module; keep function behavior
otherwise the same so callers receive None to represent missing scores.
evaluators/ragas_evaluator.py (Outdated)
```python
        try:
            scores_dict = result.scores

            # scores_dict 是列表格式,每个元素是一个字典
            if scores_dict and len(scores_dict) > 0:
                for i, item_scores in enumerate(scores_dict):
                    if i < len(answers):
                        # Answer Relevancy
                        if 'answer_relevancy' in item_scores:
                            rel_score = item_scores['answer_relevancy']
                            relevancy_scores.append(float(rel_score) if rel_score is not None and not math.isnan(rel_score) else None)
                        else:
                            relevancy_scores.append(None)

                        # Answer Correctness
                        if 'answer_correctness' in item_scores:
                            cor_score = item_scores['answer_correctness']
                            correctness_scores.append(float(cor_score) if cor_score is not None and not math.isnan(cor_score) else None)
                        else:
                            correctness_scores.append(None)

                        # Faithfulness
                        if 'faithfulness' in item_scores:
                            faith_score = item_scores['faithfulness']
                            faithfulness_scores.append(float(faith_score) if faith_score is not None and not math.isnan(faith_score) else None)
                        else:
                            faithfulness_scores.append(None)

                        # Context Precision
                        if 'context_precision' in item_scores:
                            ctx_prec_score = item_scores['context_precision']
                            context_precision_scores.append(float(ctx_prec_score) if ctx_prec_score is not None and not math.isnan(ctx_prec_score) else None)
                        else:
                            context_precision_scores.append(None)

                        # Context Recall
                        if 'context_recall' in item_scores:
                            ctx_rec_score = item_scores['context_recall']
                            context_recall_scores.append(float(ctx_rec_score) if ctx_rec_score is not None and not math.isnan(ctx_rec_score) else None)
                        else:
                            context_recall_scores.append(None)
            else:
                # 如果没有足够的评估结果,填充None
                relevancy_scores.append(None)
                correctness_scores.append(None)
                faithfulness_scores.append(None)
                context_precision_scores.append(None)
                context_recall_scores.append(None)

            print(f"  Ragas原生异步批量评估完成,处理了 {len(relevancy_scores)} 个样本")
```
Ensure metric lists match len(answers) when ragas returns fewer rows
Currently you don’t pad trailing entries; consumers may break on length mismatch.
```diff
         if scores_dict and len(scores_dict) > 0:
             for i, item_scores in enumerate(scores_dict):
                 if i < len(answers):
                     ...
-        else:
-            # 如果没有足够的评估结果,填充None
-            relevancy_scores.append(None)
-            correctness_scores.append(None)
-            faithfulness_scores.append(None)
-            context_precision_scores.append(None)
-            context_recall_scores.append(None)
+        # 结果条目不足时补齐
+        missing = len(answers) - len(relevancy_scores)
+        if missing > 0:
+            relevancy_scores.extend([None] * missing)
+            correctness_scores.extend([None] * missing)
+            faithfulness_scores.extend([None] * missing)
+            context_precision_scores.extend([None] * missing)
+            context_recall_scores.extend([None] * missing)
```
📝 Committable suggestion

```python
        try:
            scores_dict = result.scores

            # scores_dict 是列表格式,每个元素是一个字典
            if scores_dict and len(scores_dict) > 0:
                for i, item_scores in enumerate(scores_dict):
                    if i < len(answers):
                        # Answer Relevancy
                        if 'answer_relevancy' in item_scores:
                            rel_score = item_scores['answer_relevancy']
                            relevancy_scores.append(
                                float(rel_score) if rel_score is not None and not math.isnan(rel_score) else None
                            )
                        else:
                            relevancy_scores.append(None)
                        # Answer Correctness
                        if 'answer_correctness' in item_scores:
                            cor_score = item_scores['answer_correctness']
                            correctness_scores.append(
                                float(cor_score) if cor_score is not None and not math.isnan(cor_score) else None
                            )
                        else:
                            correctness_scores.append(None)
                        # Faithfulness
                        if 'faithfulness' in item_scores:
                            faith_score = item_scores['faithfulness']
                            faithfulness_scores.append(
                                float(faith_score) if faith_score is not None and not math.isnan(faith_score) else None
                            )
                        else:
                            faithfulness_scores.append(None)
                        # Context Precision
                        if 'context_precision' in item_scores:
                            ctx_prec_score = item_scores['context_precision']
                            context_precision_scores.append(
                                float(ctx_prec_score) if ctx_prec_score is not None and not math.isnan(ctx_prec_score) else None
                            )
                        else:
                            context_precision_scores.append(None)
                        # Context Recall
                        if 'context_recall' in item_scores:
                            ctx_rec_score = item_scores['context_recall']
                            context_recall_scores.append(
                                float(ctx_rec_score) if ctx_rec_score is not None and not math.isnan(ctx_rec_score) else None
                            )
                        else:
                            context_recall_scores.append(None)
            # 结果条目不足时补齐
            missing = len(answers) - len(relevancy_scores)
            if missing > 0:
                relevancy_scores.extend([None] * missing)
                correctness_scores.extend([None] * missing)
                faithfulness_scores.extend([None] * missing)
                context_precision_scores.extend([None] * missing)
                context_recall_scores.extend([None] * missing)

            print(f"  Ragas原生异步批量评估完成,处理了 {len(relevancy_scores)} 个样本")
```
🧰 Tools
🪛 Ruff (0.12.2)
- 205-205, 244-244: Comment contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF003)
- 251-251: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)
🤖 Prompt for AI Agents
In evaluators/ragas_evaluator.py around lines 202 to 251, the metric lists
(relevancy_scores, correctness_scores, faithfulness_scores,
context_precision_scores, context_recall_scores) can be shorter than
len(answers) when ragas returns fewer rows or no scores; ensure each metric list
is padded with trailing None entries so their lengths equal len(answers). After
the existing loop (and also when scores_dict is falsy), append None repeatedly
to each metric list until each list's length == len(answers) to guarantee
consumers won't break on length mismatch.
utils/embedding_adapter.py

```python
    def _get_api_format(self) -> Dict[str, Any]:
        """获取API格式配置"""
        if self.provider == EmbeddingProvider.OLLAMA:
            return {
                "input_field": "prompt",
                "response_path": ["embedding"],
                "headers": {
                    "Content-Type": "application/json",
                    "Authorization": f"Bearer {self.config.get('api_key', '')}"
                }
            }
        else:
            return {
                "input_field": "input",
                "response_path": ["data", 0, "embedding"],
                "headers": {
                    "Content-Type": "application/json",
                    "Authorization": f"Bearer {self.config.get('api_key', '')}"
                }
            }
```
Fix Ollama endpoint; use provider-specific path
Ollama expects /api/embeddings. Current hardcoded /embeddings will 404.
```diff
     def _get_api_format(self) -> Dict[str, Any]:
         """获取API格式配置"""
         if self.provider == EmbeddingProvider.OLLAMA:
             return {
+                "path": "/api/embeddings",
                 "input_field": "prompt",
                 "response_path": ["embedding"],
                 "headers": {
                     "Content-Type": "application/json",
                     "Authorization": f"Bearer {self.config.get('api_key', '')}"
                 }
             }
         else:
             return {
+                "path": "/embeddings",
                 "input_field": "input",
                 "response_path": ["data", 0, "embedding"],
                 "headers": {
                     "Content-Type": "application/json",
                     "Authorization": f"Bearer {self.config.get('api_key', '')}"
                 }
             }
@@
-        url = f"{self.config['base_url'].rstrip('/')}/embeddings"
+        url = f"{self.config['base_url'].rstrip('/')}{self.api_format['path']}"
```
Also applies to: 167-175
🤖 Prompt for AI Agents
In utils/embedding_adapter.py around lines 146-165 (and also check 167-175), the
Ollama configuration is still using the generic/hardcoded embeddings path which
causes 404s; update the adapter to use the Ollama-specific endpoint path
"/api/embeddings" when provider == EmbeddingProvider.OLLAMA and ensure other
providers keep their expected path (e.g., "/embeddings" or provider-specific as
appropriate), and adjust any code around lines 167-175 that constructs the
request URL to branch on provider and use "/api/embeddings" for Ollama.
utils/embedding_adapter.py

```python
    async def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """嵌入多个文本"""
        # 并发处理多个文本
        tasks = [self.embed_query(text) for text in texts]
        return await asyncio.gather(*tasks, return_exceptions=True)
```
|
Do not return exceptions from embed_documents
Returning exceptions violates the return type and leaks errors downstream. Let gather raise.
```diff
-        tasks = [self.embed_query(text) for text in texts]
-        return await asyncio.gather(*tasks, return_exceptions=True)
+        tasks = [self.embed_query(text) for text in texts]
+        return await asyncio.gather(*tasks, return_exceptions=False)
```
📝 Committable suggestion

```python
    async def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """嵌入多个文本"""
        # 并发处理多个文本
        tasks = [self.embed_query(text) for text in texts]
        return await asyncio.gather(*tasks, return_exceptions=False)
```
🤖 Prompt for AI Agents
In utils/embedding_adapter.py around lines 195 to 200, embed_documents currently
calls asyncio.gather(..., return_exceptions=True) which returns exceptions in
the result list and violates the declared return type; change the call to await
asyncio.gather(*tasks) (remove return_exceptions=True) so exceptions propagate
normally (or alternatively catch and re-raise specific errors if you need custom
handling) ensuring the function always returns List[List[float]] on success and
does not leak exceptions into the returned value.
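A runnable toy illustrating the behavior the prompt describes: with return_exceptions=True the exception object leaks into the result list, while the default call propagates it. The embed stub below is hypothetical and stands in for the adapter's embed_query:

import asyncio

async def embed(text: str) -> list[float]:
    # Hypothetical stub: fail on empty input, otherwise return a fake vector.
    if not text:
        raise ValueError("empty text")
    return [0.0, 1.0]

async def main() -> None:
    leaked = await asyncio.gather(*(embed(t) for t in ["ok", ""]), return_exceptions=True)
    print(leaked)  # [[0.0, 1.0], ValueError('empty text')] -- the exception leaks into the list
    try:
        await asyncio.gather(*(embed(t) for t in ["ok", ""]))  # default: first failure raises
    except ValueError as exc:
        print(f"propagated: {exc}")

asyncio.run(main())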
- Replace all print() statements with structured logging calls
- Fix async blocking issue in Ragas evaluator using asyncio.to_thread()
- Improve error handling in embedding adapter for missing api_key
- Add proper logging imports to evaluator modules

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
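A minimal sketch of the asyncio.to_thread() pattern the commit message refers to, using a stand-in blocking function rather than the real ragas.evaluate:

import asyncio
import time

def blocking_evaluate(questions: list[str]) -> dict:
    # Stand-in for a synchronous, long-running call such as ragas.evaluate.
    time.sleep(1)
    return {"relevancy": [0.9] * len(questions)}

async def evaluate_batch(questions: list[str]) -> dict:
    # Offload the blocking call to a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(blocking_evaluate, questions)

print(asyncio.run(evaluate_batch(["q1", "q2"])))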
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
evaluators/academic_evaluator.py (3)
82-105
: Batch result returns only 4 metrics but get_supported_metrics returns 6. This will break consumers expecting
coherence
andhelpfulness
. Track and return both.- relevancy_scores = [] - correctness_scores = [] - completeness_scores = [] - clarity_scores = [] + relevancy_scores = [] + correctness_scores = [] + completeness_scores = [] + clarity_scores = [] + coherence_scores = [] + helpfulness_scores = [] @@ - if isinstance(result, Exception): + if isinstance(result, Exception): logger.error(f"评估异常: {result}") relevancy_scores.append(0.0) correctness_scores.append(0.0) completeness_scores.append(0.0) - clarity_scores.append(0.0) + clarity_scores.append(0.0) + coherence_scores.append(0.0) + helpfulness_scores.append(0.0) else: relevancy_scores.append(result["relevancy"]) correctness_scores.append(result["correctness"]) completeness_scores.append(result["completeness"]) - clarity_scores.append(result["clarity"]) + clarity_scores.append(result["clarity"]) + coherence_scores.append(result.get("coherence", 0.0)) + helpfulness_scores.append(result.get("helpfulness", 0.0)) @@ - return { - "relevancy": relevancy_scores, - "correctness": correctness_scores, - "completeness": completeness_scores, - "clarity": clarity_scores - } + return { + "relevancy": relevancy_scores, + "correctness": correctness_scores, + "completeness": completeness_scores, + "clarity": clarity_scores, + "coherence": coherence_scores, + "helpfulness": helpfulness_scores, + }
207-213
: Fix f-string lint and guard base_url when posting to /chat/completions.Avoid F541 and
.rstrip()
on None.- logger.debug(f"🔍 增强异步评估请求发送中...") - async with session.post( - f"{self.config.get('chat_base_url', self.config.get('base_url')).rstrip('/')}/chat/completions", + logger.debug("🔍 增强异步评估请求发送中...") + base_url = (self.config.get("chat_base_url") + or self.config.get("base_url") + or "https://api.openai.com/v1").rstrip("/") + async with session.post( + f"{base_url}/chat/completions", headers=headers, json=payload ) as response:
312-316
: Same base_url guard for quality assessment path.- async with session.post( - f"{self.config.get('chat_base_url', self.config.get('base_url')).rstrip('/')}/chat/completions", + base_url = (self.config.get("chat_base_url") + or self.config.get("base_url") + or "https://api.openai.com/v1").rstrip("/") + async with session.post( + f"{base_url}/chat/completions", headers=headers, json=payload ) as response:
♻️ Duplicate comments (5)
evaluators/academic_evaluator.py (1)
6-9
: Drop sys.path hacks; rely on package imports.This is fragile and can mask import issues. Remove the path mutation and unused
detect_embedding_config
import.-import sys -import os -sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) -from utils.embedding_adapter import EmbeddingAdapterFactory, detect_embedding_config +from utils.embedding_adapter import EmbeddingAdapterFactoryutils/embedding_adapter.py (2)
146-166
: Ollama endpoint path is wrong; use provider-specific path and robust base_url.Current hardcoded “/embeddings” 404s on Ollama and
.rstrip()
may hit None.def _get_api_format(self) -> Dict[str, Any]: """获取API格式配置""" if self.provider == EmbeddingProvider.OLLAMA: return { + "path": "/api/embeddings", "input_field": "prompt", "response_path": ["embedding"], "headers": { "Content-Type": "application/json", "Authorization": f"Bearer {self.config.get('api_key', '')}" } } else: return { + "path": "/embeddings", "input_field": "input", "response_path": ["data", 0, "embedding"], "headers": { "Content-Type": "application/json", "Authorization": f"Bearer {self.config.get('api_key', '')}" } } @@ async def embed_query(self, text: str) -> List[float]: """嵌入单个文本""" - url = f"{self.config['base_url'].rstrip('/')}/embeddings" + base_url = (self.config.get("base_url") or + ("http://localhost:11434" if self.provider == EmbeddingProvider.OLLAMA else "https://api.openai.com/v1")).rstrip("/") + url = f"{base_url}{self.api_format['path']}" payload = { "model": self.config["model"], self.api_format["input_field"]: text } @@ - except Exception as e: - logger.error(f"直接嵌入失败: {e}") - raise Exception(f"嵌入失败: {str(e)}") + except Exception as e: + logger.exception(f"直接嵌入失败: {e}") + raiseAlso applies to: 167-175, 176-193
195-200
: Don’t return exceptions from embed_documents.Propagate errors; return type should be List[List[float]].
- tasks = [self.embed_query(text) for text in texts] - return await asyncio.gather(*tasks, return_exceptions=True) + tasks = [self.embed_query(text) for text in texts] + return await asyncio.gather(*tasks)evaluators/ragas_evaluator.py (2)
190-197
: Batch path blocks the event loop; run evaluate off-thread (or switch to aevaluate).Keep async non-blocking like the single-item path.
- # 使用Ragas批量评估(同步函数) - result = evaluate( - dataset, - metrics=self.metrics, - llm=self.eval_llm, - embeddings=self.embeddings, - raise_exceptions=False - ) + # 使用线程池避免阻塞事件循环 + result = await asyncio.to_thread( + evaluate, + dataset, + metrics=self.metrics, + llm=self.eval_llm, + embeddings=self.embeddings, + raise_exceptions=False + )
206-265
: Pad metric lists when ragas returns fewer rows than inputs.Prevents length mismatch for consumers.
if scores_dict and len(scores_dict) > 0: for i, item_scores in enumerate(scores_dict): if i < len(answers): @@ - else: - # 如果没有足够的评估结果,填充None - relevancy_scores.append(None) - correctness_scores.append(None) - faithfulness_scores.append(None) - context_precision_scores.append(None) - context_recall_scores.append(None) + # 结果条目不足时补齐 + missing = len(answers) - len(relevancy_scores) + if missing > 0: + relevancy_scores.extend([None] * missing) + correctness_scores.extend([None] * missing) + faithfulness_scores.extend([None] * missing) + context_precision_scores.extend([None] * missing) + context_recall_scores.extend([None] * missing)
🧹 Nitpick comments (14)
evaluators/academic_evaluator.py (2)
56-57
: Prefer logging.exception() in exception handlers.Improves traceability without changing behavior.
- logger.error(f"❌ {self.name}增强评估器初始化失败: {e}") + logger.exception(f"❌ {self.name}增强评估器初始化失败: {e}") @@ - logger.error(f"异步评估错误: {e}") + logger.exception(f"异步评估错误: {e}") @@ - logger.error(f"混合评估错误: {e}") + logger.exception(f"混合评估错误: {e}") @@ - logger.error(f"纯聊天模式评估错误: {e}") + logger.exception(f"纯聊天模式评估错误: {e}") @@ - logger.error(f"嵌入模型调用失败: {e}") + logger.exception(f"嵌入模型调用失败: {e}") @@ - logger.error(f"聊天模型质量评估失败: {e}") + logger.exception(f"聊天模型质量评估失败: {e}") @@ - logger.error(f"余弦相似度计算失败: {e}") + logger.exception(f"余弦相似度计算失败: {e}") @@ - logger.error(f"文本相似度计算失败: {e}") + logger.exception(f"文本相似度计算失败: {e}")Also applies to: 120-121, 150-151, 226-227, 265-266, 329-330, 449-450, 474-475
34-49
: Consider using detect_embedding_config to normalize adapter config.You import it but don’t use it; applying it would auto-detect provider and reduce misconfig.
- embedding_config = { - "api_key": config.get("embedding_api_key", ""), - "base_url": config.get("embedding_base_url"), - "model": config.get("embedding_model", "nomic-embed-text:latest"), - "timeout": config.get("embedding_timeout", 30) - } + raw_cfg = { + "api_key": config.get("embedding_api_key", ""), + "base_url": config.get("embedding_base_url", "") or "", + "model": config.get("embedding_model", "nomic-embed-text:latest"), + "timeout": config.get("embedding_timeout", 30) + } + embedding_config = detect_embedding_config(raw_cfg["base_url"], raw_cfg["model"], raw_cfg["api_key"]) + embedding_config["timeout"] = raw_cfg["timeout"]utils/embedding_adapter.py (2)
58-76
: Unify provider detection with port heuristics. Mirror 11434-port and model-name checks to reduce mis-detection.
Want a small helper to share detection logic between adapters and detect_embedding_config? Also applies to: 138-145
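One possible shape for such a shared helper — a sketch only; the port and model-name heuristics below are illustrative assumptions, not the adapter's current rules:

from urllib.parse import urlparse

def detect_provider(base_url: str, model: str) -> str:
    """Single source of truth for provider detection (illustrative heuristics)."""
    if urlparse(base_url).port == 11434 or "ollama" in base_url.lower():
        return "ollama"
    if model.startswith(("nomic-", "mxbai-")):  # typical Ollama embedding model names
        return "ollama"
    return "openai_compatible"

assert detect_provider("http://localhost:11434", "nomic-embed-text:latest") == "ollama"
assert detect_provider("https://api.openai.com/v1", "text-embedding-3-small") == "openai_compatible"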
114-121
: Use logging.exception and raise from original errors.Improves tracebacks.
- except Exception as e: - logger.error(f"嵌入失败: {e}") - raise Exception(f"嵌入失败: {str(e)}") + except Exception as e: + logger.exception("嵌入失败") + raise @@ - except Exception as e: - logger.error(f"批量嵌入失败: {e}") - raise Exception(f"批量嵌入失败: {str(e)}") + except Exception as e: + logger.exception("批量嵌入失败") + raiseAlso applies to: 122-129
evaluators/ragas_evaluator.py (2)
85-87
: Prefer logging.exception in catches.Keeps stack traces.
- logger.error(f"❌ {self.name}评估器初始化失败: {e}") + logger.exception(f"❌ {self.name}评估器初始化失败: {e}") @@ - logger.error(f"❌ {self.name}异步评估失败: {e}") + logger.exception(f"❌ {self.name}异步评估失败: {e}") @@ - logger.error(f" Ragas分数处理错误: {e}") + logger.exception(f" Ragas分数处理错误: {e}") @@ - logger.error(f"❌ {self.name}异步批量评估失败: {e}") + logger.exception(f"❌ {self.name}异步批量评估失败: {e}")Also applies to: 101-103, 161-169, 274-276
16-18
: Remove unused imports.
aiohttp
andOptional
are unused.-from typing import Dict, List, Any, Optional +from typing import Dict, List, Any @@ -import aiohttp +Also applies to: 21-21
evaluators/factory.py (8)
15-23
: Annotate class variables with ClassVar and fix typing.Satisfies RUF012 and improves type clarity.
-from typing import Dict, List, Any, Optional +from typing import Any, Dict, List, Optional, ClassVar, Type @@ - EVALUATOR_TYPES = { + EVALUATOR_TYPES: ClassVar[Dict[str, Type[BaseEvaluator]]] = { @@ - DEFAULT_PRIORITY = ["ragas", "academic"] + DEFAULT_PRIORITY: ClassVar[List[str]] = ["ragas", "academic"]
39-41
: Log full trace on factory creation failures.Use logging.exception to preserve stack traces (TRY400/BLE001).
- except Exception as e: - logger.error(f"❌ {evaluator_type}评估器创建失败: {e}") + except Exception: + logger.exception(f"❌ {evaluator_type}评估器创建失败")
52-64
: Make evaluator creation truly concurrent.Current code awaits coroutines sequentially. Create tasks and gather with exception handling.
- # 并发创建所有评估器 - tasks = [] - for evaluator_type in types: - task = cls.create_evaluator_async(evaluator_type, config) - tasks.append((evaluator_type, task)) - - # 等待所有评估器创建完成 - for evaluator_type, task in tasks: - evaluator = await task - if evaluator: - evaluators[evaluator_type] = evaluator + # 并发创建所有评估器 + tasks = {et: asyncio.create_task(cls.create_evaluator_async(et, config)) for et in types} + results = await asyncio.gather(*tasks.values(), return_exceptions=True) + for et, res in zip(tasks.keys(), results): + if isinstance(res, Exception): + logger.exception(f"❌ {et}评估器创建失败(并发)") + continue + if res: + evaluators[et] = res
71-87
: Avoid heavyweight instantiation in get_evaluator_info.Constructing evaluators here can trigger network calls (e.g., embeddings test) and slow or fail this metadata path. Prefer class-level metadata (e.g., a classmethod supported_metrics()/description) or a local static map.
100-104
: Normalize punctuation to ASCII to appease linters.Silences RUF001/RUF003 without altering meaning.
- "academic": "增强学术评估器 - 支持6维度质量评估(相关性、正确性、完整性、清晰度、连贯性、有用性)", - "ragas": "Ragas框架评估器 - 完整的RAG评估指标集(相关性、正确性、忠实性、上下文精度、上下文召回率)" + "academic": "增强学术评估器 - 支持6维度质量评估(相关性、正确性、完整性、清晰度、连贯性、有用性)", + "ragas": "Ragas框架评估器 - 完整的RAG评估指标集(相关性、正确性、忠实性、上下文精度、上下文召回率)"- "evaluation_mode": "hybrid" # 使用混合模式:embedding计算相关性,聊天模型评估质量 + "evaluation_mode": "hybrid" # 使用混合模式: embedding计算相关性, 聊天模型评估质量Also applies to: 128-128
115-115
: Remove f-strings without placeholders.Silences F541.
- logger.info(f"🔧 评估器管理器初始化完成") + logger.info("🔧 评估器管理器初始化完成")- logger.debug(f" ✅ 完成") + logger.debug(" ✅ 完成")Also applies to: 152-152
138-141
: Annotate contexts as Optional to match default.Silences RUF013 and improves API clarity.
- async def evaluate_all_async(self, questions: List[str], answers: List[str], - ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, Dict[str, List[float]]]: + async def evaluate_all_async(self, questions: List[str], answers: List[str], + ground_truths: List[str], contexts: Optional[List[List[str]]] = None) -> Dict[str, Dict[str, List[float]]]:
143-159
: Run all evaluators concurrently for throughput.Keeps per-evaluator error isolation; also switches to logging.exception.
- for evaluator_name, evaluator in self.evaluators.items(): - logger.info(f"\n📊 使用{evaluator_name}评估器评估中...") - - try: - # 使用带超时的异步评估 - metrics = await evaluator.evaluate_with_timeout( - questions, answers, ground_truths, contexts - ) - all_results[evaluator_name] = metrics - logger.debug(f" ✅ 完成") - except Exception as e: - logger.error(f" ❌ 失败: {e}") - # 使用默认值填充 - default_metrics = {metric: [None] * len(answers) - for metric in evaluator.get_supported_metrics()} - all_results[evaluator_name] = default_metrics + # 并发执行所有评估器以减少总耗时 + tasks = { + name: asyncio.create_task( + evaluator.evaluate_with_timeout(questions, answers, ground_truths, contexts) + ) + for name, evaluator in self.evaluators.items() + } + for name in tasks: + logger.info(f"\n📊 使用{name}评估器评估中...") + results = await asyncio.gather(*tasks.values(), return_exceptions=True) + for (name, evaluator), res in zip(self.evaluators.items(), results): + if isinstance(res, Exception): + logger.exception(f" ❌ 失败({name})") + all_results[name] = {metric: [None] * len(answers) for metric in evaluator.get_supported_metrics()} + else: + all_results[name] = res + logger.debug(" ✅ 完成")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
evaluators/academic_evaluator.py (15 hunks)
evaluators/factory.py (3 hunks)
evaluators/ragas_evaluator.py (1 hunks)
utils/embedding_adapter.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
evaluators/factory.py (3)
  evaluators/base.py (4): BaseEvaluator (10-184), get_evaluator_info (171-184), evaluate_with_timeout (62-94), get_supported_metrics (153-160)
  evaluators/academic_evaluator.py (2): AcademicEvaluator (18-503), get_supported_metrics (481-487)
  evaluators/ragas_evaluator.py (3): RagasEvaluator (24-294), get_evaluator_info (286-294), get_supported_metrics (278-280)
evaluators/ragas_evaluator.py (2)
  evaluators/base.py (6): BaseEvaluator (10-184), evaluate_single_answer_async (46-60), evaluate_answers_async (29-43), get_supported_metrics (153-160), is_available (162-169), get_evaluator_info (171-184)
  evaluators/factory.py (1): get_evaluator_info (67-94)
evaluators/academic_evaluator.py (2)
  evaluators/base.py (1): BaseEvaluator (10-184)
  utils/embedding_adapter.py (6): EmbeddingAdapterFactory (201-229), detect_embedding_config (240-265), create_adapter (205-214), embed_query (29-31), embed_query (114-120), embed_query (167-193)
🪛 Ruff (0.12.2)
evaluators/factory.py
16-19: Mutable class attributes should be annotated with typing.ClassVar (RUF012)
22-22: Mutable class attributes should be annotated with typing.ClassVar (RUF012)
28-28: Avoid specifying long messages outside the exception class (TRY003)
39-39: Do not catch blind exception: Exception (BLE001)
40-40: Use logging.exception instead of logging.error. Replace with exception (TRY400)
100-100: String contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)? (RUF001)
100-100: String contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)? (RUF001)
101-101: String contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)? (RUF001)
101-101: String contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)? (RUF001)
115-115: f-string without any placeholders. Remove extraneous f prefix (F541)
128-128: Comment contains ambiguous : (FULLWIDTH COLON). Did you mean : (COLON)? (RUF003)
128-128: Comment contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)? (RUF003)
139-139: PEP 484 prohibits implicit Optional. Convert to Optional[T] (RUF013)
152-152: f-string without any placeholders. Remove extraneous f prefix (F541)
153-153: Do not catch blind exception: Exception (BLE001)
154-154: Use logging.exception instead of logging.error. Replace with exception (TRY400)
evaluators/ragas_evaluator.py
78-78: Abstract raise to an inner function (TRY301)
78-78: Avoid specifying long messages outside the exception class (TRY003)
85-85: Do not catch blind exception: Exception (BLE001)
86-86: Use logging.exception instead of logging.error. Replace with exception (TRY400)
89-89: PEP 484 prohibits implicit Optional. Convert to Optional[T] (RUF013)
101-101: Do not catch blind exception: Exception (BLE001)
102-102: Use logging.exception instead of logging.error. Replace with exception (TRY400)
105-105: PEP 484 prohibits implicit Optional. Convert to Optional[T] (RUF013)
120-120: Comment contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)? (RUF003)
120-120: Comment contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)? (RUF003)
135-135: Comment contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)? (RUF003)
161-161: Do not catch blind exception: Exception (BLE001)
162-162: Use logging.exception instead of logging.error. Replace with exception (TRY400)
165-165: Consider moving this statement to an else block (TRY300)
167-167: Do not catch blind exception: Exception (BLE001)
168-168: Use logging.exception instead of logging.error. Replace with exception (TRY400)
172-172: PEP 484 prohibits implicit Optional. Convert to Optional[T] (RUF013)
190-190: Comment contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)? (RUF003)
190-190: Comment contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)? (RUF003)
209-209: Comment contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)? (RUF003)
248-248: Comment contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)? (RUF003)
255-255: String contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)? (RUF001)
257-257: Do not catch blind exception: Exception (BLE001)
258-258: Use logging.exception instead of logging.error. Replace with exception (TRY400)
266-272: Consider moving this statement to an else block (TRY300)
274-274: Do not catch blind exception: Exception (BLE001)
275-275: Use logging.exception instead of logging.error. Replace with exception (TRY400)
evaluators/academic_evaluator.py
46-46: Do not catch blind exception: Exception (BLE001)
47-47: String contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)? (RUF001)
50-50: Comment contains ambiguous : (FULLWIDTH COLON). Did you mean : (COLON)? (RUF003)
50-50: Comment contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)? (RUF003)
50-50: Comment contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)? (RUF003)
50-50: Comment contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)? (RUF003)
50-50: Comment contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)? (RUF003)
55-55: Do not catch blind exception: Exception (BLE001)
56-56: Use logging.exception instead of logging.error. Replace with exception (TRY400)
113-113: Comment contains ambiguous : (FULLWIDTH COLON). Did you mean : (COLON)? (RUF003)
113-113: Comment contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)? (RUF003)
116-116: Comment contains ambiguous : (FULLWIDTH COLON). Did you mean : (COLON)? (RUF003)
119-119: Do not catch blind exception: Exception (BLE001)
120-120: Use logging.exception instead of logging.error. Replace with exception (TRY400)
150-150: Use logging.exception instead of logging.error. Replace with exception (TRY400)
207-207: f-string without any placeholders. Remove extraneous f prefix (F541)
225-225: Do not catch blind exception: Exception (BLE001)
226-226: Use logging.exception instead of logging.error. Replace with exception (TRY400)
230-230: Docstring contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)? (RUF002)
230-230: Docstring contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)? (RUF002)
233-233: Comment contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)? (RUF003)
235-235: String contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)? (RUF001)
264-264: Do not catch blind exception: Exception (BLE001)
265-265: Use logging.exception instead of logging.error. Replace with exception (TRY400)
328-328: Do not catch blind exception: Exception (BLE001)
329-329: Use logging.exception instead of logging.error. Replace with exception (TRY400)
449-449: Use logging.exception instead of logging.error. Replace with exception (TRY400)
474-474: Use logging.exception instead of logging.error. Replace with exception (TRY400)
utils/embedding_adapter.py
47-47: Use logging.exception instead of logging.error. Replace with exception (TRY400)
48-48: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
48-48: Create your own exception (TRY002)
48-48: Avoid specifying long messages outside the exception class (TRY003)
64-64: Comment contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)? (RUF003)
64-64: Comment contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)? (RUF003)
111-111: Use logging.exception instead of logging.error. Replace with exception (TRY400)
112-112: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
112-112: Create your own exception (TRY002)
112-112: Avoid specifying long messages outside the exception class (TRY003)
118-118: Do not catch blind exception: Exception (BLE001)
119-119: Use logging.exception instead of logging.error. Replace with exception (TRY400)
120-120: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
120-120: Create your own exception (TRY002)
120-120: Avoid specifying long messages outside the exception class (TRY003)
120-120: Use explicit conversion flag. Replace with conversion flag (RUF010)
126-126: Do not catch blind exception: Exception (BLE001)
127-127: Use logging.exception instead of logging.error. Replace with exception (TRY400)
128-128: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
128-128: Create your own exception (TRY002)
128-128: Avoid specifying long messages outside the exception class (TRY003)
128-128: Use explicit conversion flag. Replace with conversion flag (RUF010)
190-190: Abstract raise to an inner function (TRY301)
190-190: Create your own exception (TRY002)
190-190: Avoid specifying long messages outside the exception class (TRY003)
191-191: Do not catch blind exception: Exception (BLE001)
192-192: Use logging.exception instead of logging.error. Replace with exception (TRY400)
193-193: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
193-193: Create your own exception (TRY002)
193-193: Avoid specifying long messages outside the exception class (TRY003)
193-193: Use explicit conversion flag. Replace with conversion flag (RUF010)
210-210: Do not catch blind exception: Exception (BLE001)
211-211: String contains ambiguous , (FULLWIDTH COMMA). Did you mean , (COMMA)? (RUF001)
225-225: f-string without any placeholders. Remove extraneous f prefix (F541)
227-227: Do not catch blind exception: Exception (BLE001)
228-228: Use logging.exception instead of logging.error. Replace with exception (TRY400)
Summary by cubic
Unifies the evaluator system under a fully async API, removes legacy sync code, and standardizes names (no “async” in class names). Ragas and Academic evaluators are updated, factories/managers are simplified, and result output keys reflect the new names.
Refactors
Migration
Summary by CodeRabbit
New Features
Refactor
Chores