
Conversation

boma086 (Owner) commented Sep 6, 2025

Summary by cubic

Unifies the evaluator system under a fully async API, removes legacy sync code, and standardizes names (no “async” in class names). Ragas and Academic evaluators are updated, factories/managers are simplified, and result output keys reflect the new names.

  • Refactors

    • Replace Async* evaluators with AcademicEvaluator and RagasEvaluator built on BaseEvaluator (async API).
    • Rename AsyncEvaluatorFactory/Manager to EvaluatorFactory/Manager; keep legacy factory/manager as stubs.
    • Remove RagasOllamaEvaluator and other sync paths.
    • RagasEvaluator uses the full Ragas metrics set; AcademicEvaluator updates embedding calls to POST /embeddings with input and falls back to text similarity when no embedding config.
    • Update exports/imports (init, main import) and add optims.md with an optimization plan.
  • Migration

    • Update imports to evaluators.evaluator_factory.{EvaluatorFactory, EvaluatorManager} and evaluators.{academic_evaluator, ragas_evaluator}; drop async_* names.
    • Replace any RagasOllamaEvaluator usage with RagasEvaluator.
    • Ensure your embedding API supports POST /embeddings with the input field; otherwise use pure_chat or omit embedding API key for fallback.
    • Downstream consumers: results keys/CSV columns changed from async_ragas/async_academic to ragas/academic.
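For downstream consumers, the key rename can be applied mechanically. A hedged sketch (the key names come from the migration notes above; the surrounding dict shape is an assumption for illustration):

```python
# Hypothetical helper: remap legacy per-evaluator result keys to the new names.
KEY_RENAMES = {"async_ragas": "ragas", "async_academic": "academic"}

def migrate_result_keys(results: dict) -> dict:
    """Return a copy of a per-evaluator results dict with legacy keys renamed."""
    return {KEY_RENAMES.get(key, key): value for key, value in results.items()}
```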

Summary by CodeRabbit

  • New Features

    • Concurrent async evaluation across multiple evaluators with timeouts.
    • Added Academic and Ragas evaluators with expanded metrics.
    • Universal embedding adapter with provider auto-detection and fallback.
    • Dify connector supports streaming responses.
    • Added a new Japanese iPhone 13 Pro Max test case.
  • Refactor

    • Standardized evaluator names, public API and default priorities; evaluation APIs moved to async-first.
  • Chores

    • Removed legacy Ollama/old async artifacts and suppressed a noisy warning.
    • Updated saved evaluation results and summaries.


coderabbitai bot commented Sep 6, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The change set replaces legacy async-specific evaluators with unified evaluator classes, migrates the evaluation pipeline to native async, removes the Ollama-specific Ragas evaluator, and adds a universal embedding adapter framework. It also consolidates connectors around a single async base with streaming support, reworks the factory/manager into async multi-evaluator orchestration, updates main usage, and refreshes test data and result fields.

Changes

Cohort / File(s) Summary
Evaluators API surface
evaluators/__init__.py, evaluators/base.py, evaluators/academic_evaluator.py, evaluators/ragas_evaluator.py, evaluators/factory.py
Public exports simplified to AcademicEvaluator, RagasEvaluator, EvaluatorFactory, EvaluatorManager. Core evaluator API moved to async (evaluate_*_async, timeout helpers, evaluate_with_timeout, get_evaluator_info). AcademicEvaluator replaces the async-named class and uses BaseEvaluator with optional embedding-adapter integration. RagasEvaluator adopts native async Ragas evaluation and batch Dataset handling. Factory/Manager reworked to async, multi-evaluator creation and concurrent evaluation flows.
Removed legacy evaluator
evaluators/ragas_ollama.py
Deleted RagasOllamaEvaluator and its Ollama/OpenRouter/LangChain wiring.
Main entry adjustment
main_multi_eval_async.py
Replaces AsyncEvaluatorManager import/instantiation with EvaluatorManager from evaluators.factory.
Connectors unification (async base)
connectors/base.py, connectors/async_base.py
Consolidated to a single async base: AsyncBaseRAGConnector is now the primary base in connectors/base.py (with BaseRAGConnector aliased to it); the separate connectors/async_base.py file was removed. Added async query/test methods and timeout wrappers.
Connectors updates
connectors/dify.py, connectors/ragflow.py
DifyConnector updated to use the unified base, adds constructor and streaming-mode request/response handling with SSE parsing. RagFlowConnector import updated to the unified base.
Embedding framework
utils/embedding_adapter.py
New universal embedding adapter framework: EmbeddingProvider enum, UniversalEmbeddingAdapter ABC, LangChainEmbeddingAdapter, DirectEmbeddingAdapter, EmbeddingAdapterFactory, plus helpers create_and_test_embedding and detect_embedding_config (with timeout and error handling).
Data and results
data/test_cases_jp.json, results/detailed_evaluation_results.json
Added a JP test case (iPhone 13 Pro Max). Renamed result fields from async_ragas/async_academic to ragas/academic and updated metrics and content values.
Config
config.py
Suppresses a specific UserWarning about protected namespace model_name.
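The adapter framework summarized above could be shaped roughly like this. Everything beyond the class names listed in the table (method signatures, the placeholder vector, the single-branch "auto-detection") is an assumption for illustration, not the repository's actual code:

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List

class UniversalEmbeddingAdapter(ABC):
    """Minimal stand-in for the adapter ABC named in the summary."""

    @abstractmethod
    async def embed_query(self, text: str) -> List[float]:
        ...

class DirectEmbeddingAdapter(UniversalEmbeddingAdapter):
    """Illustrative adapter; a real one would POST to the provider's /embeddings endpoint."""

    def __init__(self, config: Dict[str, str]):
        self.model = config.get("model", "nomic-embed-text:latest")

    async def embed_query(self, text: str) -> List[float]:
        # Placeholder: return a dummy vector instead of calling an API.
        return [float(len(text)), 0.0]

class EmbeddingAdapterFactory:
    """Provider auto-detection is collapsed to a single branch here."""

    @staticmethod
    def create_adapter(config: Dict[str, str]) -> UniversalEmbeddingAdapter:
        return DirectEmbeddingAdapter(config)
```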

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant App as App
  participant EM as EvaluatorManager
  participant EF as EvaluatorFactory
  participant AE as AcademicEvaluator
  participant RE as RagasEvaluator

  App->>EM: initialize_async(config)
  EM->>EF: create_all_evaluators_async(config, types)
  EF-->>EM: { "ragas": RE, "academic": AE }
  EM-->>App: evaluators ready

  note over EM,EF: Async multi-evaluator initialization
sequenceDiagram
  autonumber
  participant EM as EvaluatorManager
  participant BE as BaseEvaluator
  participant RE as RagasEvaluator
  participant AE as AcademicEvaluator
  participant RG as ragas.evaluate
  participant EA as EmbeddingAdapter

  EM->>RE: evaluate_with_timeout(questions,...)
  RE->>RG: evaluate(Dataset, metrics, llm, embeddings)
  RG-->>RE: scores[]
  RE-->>EM: {relevancy[], correctness[], faithfulness[], context_precision[], context_recall[]}

  EM->>AE: evaluate_with_timeout(questions,...)
  AE->>EA: embed_query(answer), embed_query(ground_truth)
  EA-->>AE: vectors
  AE-->>EM: {relevancy[], correctness[], completeness[], clarity[]}

  note over EM: Aggregates per-evaluator results concurrently
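The AcademicEvaluator's semantic-similarity step reduces to cosine similarity between the two embedding vectors. A standalone sketch of that computation (a `_calculate_cosine_similarity` helper is referenced later in this thread; this self-contained version is illustrative):

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate vector: treat as no similarity
    return dot / (norm_a * norm_b)
```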
sequenceDiagram
  autonumber
  participant Client as Client
  participant Dify as DifyConnector
  participant API as "Dify API (stream)"

  Client->>Dify: query_with_timeout(question)
  Dify->>API: POST /chat-messages (response_mode=streaming)
  API-->>Dify: SSE events (message, message_end)
  Dify-->>Client: Aggregated answer + metadata

  note over Dify: Parses streaming events and finalizes result
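The streaming flow above can be sketched as a minimal SSE aggregator. The `message`/`message_end` event names come from the diagram; the exact payload fields (`answer`, the `data: ` prefix framing) are assumptions about the Dify wire format:

```python
import json
from typing import Iterable

def aggregate_sse_answer(lines: Iterable[str]) -> str:
    """Collect the answer text from Dify-style SSE lines.

    Assumes each event arrives as 'data: {"event": ..., "answer": ...}'
    and that a 'message_end' event terminates the stream.
    """
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments, blank keep-alives, etc.
        payload = json.loads(line[len("data: "):])
        if payload.get("event") == "message":
            parts.append(payload.get("answer", ""))
        elif payload.get("event") == "message_end":
            break
    return "".join(parts)
```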

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

I thump my paws: async at last!
Two judges hop where four once passed.
Streams of words, embeddings bloom,
Ragas sings in native tune.
Old Ollama fades from view—
New adapters chew the dew. 🥕🐇



qodo-merge-pro bot commented Sep 6, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Logging/Stdout

The new async evaluator prints status and error messages with print instead of using the module logger; this can spam stdout and bypass log levels. Consider replacing with structured logging and honoring the configured logger.

    print(f"✅ {self.name} enhanced evaluator initialized successfully (mode: {self.evaluation_mode})")
except Exception as e:
    print(f"❌ {self.name} enhanced evaluator initialization failed: {e}")
    self._available = False
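The suggested switch from print() to the module logger could look roughly like this; the helper name and message wording are illustrative, not the repository's actual code:

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

def report_init(name: str, mode: str, error: Optional[Exception] = None) -> bool:
    """Route evaluator init status through the module logger instead of print()."""
    if error is None:
        logger.info("%s evaluator initialized (mode: %s)", name, mode)
        return True
    logger.error("%s evaluator failed to initialize: %s", name, error)
    return False
```

Log level and formatting then follow the application's logging configuration instead of always writing to stdout.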
Async Claim

The code labels Ragas usage as "native async" but still calls evaluate synchronously from ragas; confirm whether ragas.evaluate supports true async here or if this blocks the event loop. If sync, consider offloading or using per-metric async APIs.

result = evaluate(
    dataset, 
    metrics=self.metrics,
    llm=self.eval_llm,
    embeddings=self.embeddings,
    raise_exceptions=False
)

# Process the result - Ragas 0.3.2+ returns an EvaluationResult object
scores = {}
try:
    scores_dict = result.scores

    # scores_dict is a list; each element is a dict
    if scores_dict and len(scores_dict) > 0:
        item_scores = scores_dict[0]  # first evaluation result
Embeddings API Shape

The embeddings endpoint and payload switched to /embeddings with input field and flexible response parsing; verify compatibility with all targeted providers and handle missing api_key gracefully before building headers.

# Fetch embeddings for the two texts concurrently
async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=30)) as session:
    headers = {
        "Authorization": f"Bearer {self.embedding_config['api_key']}",
        "Content-Type": "application/json"
    }

    # Embedding for the answer
    answer_payload = {
        "model": self.embedding_config["model"],
        "input": answer  # use the standard input field
    }

    # Embedding for the ground truth
    ground_truth_payload = {
        "model": self.embedding_config["model"],
        "input": ground_truth
    }
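Handling a missing api_key before building headers, as this focus area suggests, might look like the following hedged sketch (the function name and the None-means-fallback convention are assumptions):

```python
from typing import Dict, Optional

def build_embedding_headers(config: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return request headers for POST /embeddings, or None to signal
    that the caller should fall back to text similarity (no api_key set)."""
    api_key = config.get("api_key")
    if not api_key:
        return None
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

The caller checks for None before opening an HTTP session, so no request is ever attempted with an empty bearer token.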

gemini-code-assist bot left a comment

Summary of Changes

Hello @boma086, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the evaluation system by transitioning it to a fully asynchronous architecture. The changes streamline the codebase by removing deprecated synchronous components and adopting native asynchronous capabilities, particularly for Ragas evaluations. This refactoring aims to enhance performance, simplify the system's design, and lay the groundwork for future optimizations outlined in the newly added optimization roadmap.

Highlights

  • Unified Asynchronous Evaluator System: The evaluation framework has been refactored to be exclusively asynchronous, removing all legacy synchronous evaluator classes and their associated factories and managers.
  • Native Ragas Async Integration: The Ragas evaluator (ragas_evaluator.py) now leverages Ragas's native asynchronous API for both single and batch evaluations, eliminating the need for asyncio.run_in_executor and improving performance.
  • Simplified Evaluator Naming: Evaluator class and file names have been streamlined by removing the async_ prefix (e.g., async_academic_evaluator.py is now academic_evaluator.py), as all evaluators are now inherently asynchronous.
  • Improved Embedding API Compatibility: The academic evaluator (academic_evaluator.py) now supports multiple embedding API response formats and includes a fallback to text similarity if embedding API keys are not configured.
  • Optimization Roadmap: A new optims.md file has been added, detailing a comprehensive plan for future architectural, performance, and code quality improvements, including further async optimizations, unified evaluator management, and enhanced monitoring.

gemini-code-assist bot left a comment

Code Review

This pull request does a great job of unifying the evaluator architecture around an asynchronous-first approach. Removing the synchronous evaluators and factories simplifies the codebase significantly. The enhancements to the AcademicEvaluator, such as better embedding API compatibility and more robust response parsing, are also valuable improvements.

However, I've found a few issues that should be addressed:

  • There's a critical performance issue in RagasEvaluator where a synchronous, blocking call is used within an async method, which will freeze the event loop.
  • The EvaluatorFactory can be improved to initialize evaluators concurrently for faster startup.
  • The renaming of classes and modules has resulted in some confusing aliases and inconsistent naming, which impacts code maintainability.

My detailed comments provide specific suggestions to resolve these points. After addressing these, the PR will be in excellent shape.

Comment on lines +187 to +193
result = evaluate(
    dataset,
    metrics=self.metrics,
    llm=self.eval_llm,
    embeddings=self.embeddings,
    raise_exceptions=False
)


critical

The call to ragas.evaluate() is synchronous and will block the event loop. This is a critical issue in an async application, as it negates the performance benefits of asyncio. The ragas library provides an async alternative, aevaluate, which should be used for non-blocking evaluation.

To fix this, you should:

  1. Change the import on line 4 from from ragas import evaluate to from ragas import aevaluate.
  2. Use await aevaluate(...) here.
Suggested change
-result = evaluate(
+result = await aevaluate(
     dataset,
     metrics=self.metrics,
     llm=self.eval_llm,
     embeddings=self.embeddings,
     raise_exceptions=False
 )

from connectors.universal import UniversalRAGConnector
from evaluators.factory import EvaluatorManager
from evaluators.async_factory import AsyncEvaluatorManager
from evaluators.evaluator_factory import EvaluatorManager as AsyncEvaluatorManager


medium

The import from evaluators.evaluator_factory import EvaluatorManager as AsyncEvaluatorManager is confusing. The class is named EvaluatorManager in evaluator_factory.py, but it's being aliased to AsyncEvaluatorManager here. This refactoring has led to inconsistent naming that can make the code harder to understand and maintain.

Consider renaming the classes in evaluator_factory.py back to AsyncEvaluatorFactory and AsyncEvaluatorManager, or updating the consumer code to use the new names directly without aliasing. A consistent naming scheme across the project would be beneficial. For example, if all evaluators are now async, the Async prefix might be redundant and could be removed everywhere for simplicity, but this should be done consistently.


qodo-merge-pro bot commented Sep 6, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Fix blocking synchronous call issue
Suggestion Impact:The commit changed the synchronous evaluate() invocation to run in a background thread using asyncio.to_thread, preventing event loop blocking. It also adjusted dataset field names and logging, but the key impact matches the suggestion's intent.

code diff:

-            # Run the Ragas evaluation (synchronous function)
-            result = evaluate(
-                dataset, 
+            # Run the Ragas evaluation (synchronous function) in a separate thread to avoid blocking the event loop
+            result = await asyncio.to_thread(
+                evaluate,
+                dataset=dataset,
                 metrics=self.metrics,
                 llm=self.eval_llm,
                 embeddings=self.embeddings,

The method claims to use "Ragas native async API" but actually calls the
synchronous evaluate function. This should run in an executor to avoid blocking
the event loop.

evaluators/ragas_evaluator.py [102-124]

 async def _evaluate_ragas_native_async(self, question: str, answer: str, ground_truth: str, context: List[str] = None) -> Dict[str, float]:
-    """Evaluate using the Ragas native async API"""
+    """Asynchronous evaluation via Ragas"""
     try:
         # Prepare the evaluation data
         eval_contexts = context if context else ['相关上下文']
         
         # Create the dataset
         from datasets import Dataset
         dataset = Dataset.from_dict({
             'question': [question],
             'answer': [answer],
             'ground_truth': [ground_truth],
             'contexts': [eval_contexts]
         })
         
-        # Run the Ragas evaluation (synchronous function)
-        result = evaluate(
-            dataset, 
-            metrics=self.metrics,
-            llm=self.eval_llm,
-            embeddings=self.embeddings,
-            raise_exceptions=False
+        # Run the synchronous Ragas evaluation in a thread pool
+        loop = asyncio.get_event_loop()
+        result = await loop.run_in_executor(
+            None,
+            lambda: evaluate(
+                dataset, 
+                metrics=self.metrics,
+                llm=self.eval_llm,
+                embeddings=self.embeddings,
+                raise_exceptions=False
+            )
         )

[Suggestion processed]

Suggestion importance[1-10]: 10


Why: This suggestion identifies a critical performance bug where a synchronous function evaluate blocks the async event loop, and correctly proposes wrapping it in run_in_executor to restore non-blocking behavior.

High
High-level
Avoid blocking event loop

RagasEvaluator claims native async but calls ragas.evaluate (sync) directly
inside async methods, blocking the event loop and defeating the unified async
architecture under load. Switch to truly asynchronous scoring (e.g., per-metric
async ascore APIs or an evaluate_async equivalent); if unavailable, wrap
evaluate in run_in_executor to keep the event loop responsive.

Examples:

evaluators/ragas_evaluator.py [102-124]
    async def _evaluate_ragas_native_async(self, question: str, answer: str, ground_truth: str, context: List[str] = None) -> Dict[str, float]:
        """Evaluate using the Ragas native async API"""
        try:
            # Prepare the evaluation data
            eval_contexts = context if context else ['相关上下文']
            
            # Create the dataset
            from datasets import Dataset
            dataset = Dataset.from_dict({
                'question': [question],

 ... (clipped 13 lines)
evaluators/ragas_evaluator.py [167-193]
    async def evaluate_answers_async(self, questions: List[str], answers: List[str], 
                                  ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, List[float]]:
        """Batch-evaluate multiple answers using the Ragas native async API"""
        if not self._available:
            return {"relevancy": [None] * len(answers), "correctness": [None] * len(answers), "faithfulness": [None] * len(answers), "context_precision": [None] * len(answers), "context_recall": [None] * len(answers)}
        
        try:
            # Prepare the evaluation data
            eval_contexts = contexts if contexts else [['相关上下文'] for _ in range(len(questions))]
            

 ... (clipped 17 lines)

Solution Walkthrough:

Before:

# In evaluators/ragas_evaluator.py
class RagasEvaluator(BaseEvaluator):
    async def _evaluate_ragas_native_async(self, question, answer, ground_truth, context):
        # ...
        dataset = Dataset.from_dict(...)
        
        # This is a synchronous, blocking call inside an async method.
        # It blocks the entire event loop.
        result = evaluate(
            dataset, 
            metrics=self.metrics,
            ...
        )
        return scores

    async def evaluate_answers_async(self, questions, ...):
        # ...
        # The batch evaluation also uses a blocking call.
        result = evaluate(dataset, ...)
        # ...
        return scores_dict

After:

# In evaluators/ragas_evaluator.py
class RagasEvaluator(BaseEvaluator):
    async def _evaluate_ragas_native_async(self, question, answer, ground_truth, context):
        # ...
        dataset = Dataset.from_dict(...)
        
        # Wrap the blocking call in run_in_executor to avoid blocking the event loop.
        loop = asyncio.get_event_loop()
        result = await loop.run_in_executor(
            None,
            evaluate, # The blocking function
            dataset, 
            metrics=self.metrics,
            ...
        )
        return scores

    async def evaluate_answers_async(self, questions, ...):
        # The batch version should also be run in an executor.
        loop = asyncio.get_event_loop()
        result = await loop.run_in_executor(None, evaluate, dataset, ...)
        # ...
        return scores_dict
Suggestion importance[1-10]: 9


Why: This suggestion correctly identifies a critical flaw where a synchronous call ragas.evaluate blocks the async event loop, which negates the PR's goal of a unified async architecture and introduces a severe performance regression.

High
General
Add retry logic for embedding requests
Suggestion Impact:The commit refactored embedding calls to use a generalized embedding adapter, replacing direct aiohttp calls. While it doesn't show explicit retry code in this diff, moving to an adapter likely centralizes error handling, timeouts (timeout added in config), and potentially retries within the adapter, thereby addressing the robustness concern.

code diff:

+            embedding_config = {
                 "api_key": config.get("embedding_api_key", ""),
                 "base_url": config.get("embedding_base_url"),
-                "model": config.get("embedding_model", "nomic-embed-text:latest")
+                "model": config.get("embedding_model", "nomic-embed-text:latest"),
+                "timeout": config.get("embedding_timeout", 30)
             }
+            
+            # Create the universal embedding adapter
+            try:
+                self.embedding_adapter = EmbeddingAdapterFactory.create_adapter(embedding_config)
+                print(f"✅ Universal embedding adapter initialized: {embedding_config['model']}")
+            except Exception as e:
+                print(f"⚠️  Embedding adapter initialization failed, falling back to text similarity: {e}")
+                self.embedding_adapter = None
             
             # Evaluation mode: pure_chat (chat model only) or hybrid (mixed mode)
             self.evaluation_mode = config.get("evaluation_mode", "pure_chat")
@@ -96,7 +109,7 @@
        """Evaluate a single answer asynchronously - supports multiple evaluation modes and quality metrics"""
         
         try:
-            if self.evaluation_mode == "hybrid" and self.embedding_config["api_key"]:
+            if self.evaluation_mode == "hybrid" and self.embedding_adapter:
                # Hybrid mode: embedding model for relevancy, chat model for quality metrics
                 return await self._evaluate_hybrid_mode(question, answer, ground_truth, context)
             else:
@@ -214,82 +227,39 @@
             return self._get_enhanced_default_scores()
     
     async def _calculate_semantic_similarity(self, answer: str, ground_truth: str) -> float:
-        """Compute semantic similarity with the embedding model (hybrid mode) - optimized version"""
-        
-        try:
-            # Without embedding config, fall back to text similarity directly
-            if not self.embedding_config.get("api_key"):
+        """Compute semantic similarity with the embedding model (hybrid mode) - via the universal adapter"""
+        
+        try:
+            # Without an embedding adapter, fall back to text similarity directly
+            if not self.embedding_adapter:
+                print("🔍 Embedding adapter unavailable, using text similarity")
                 return self._calculate_text_similarity(answer, ground_truth)
             
            # Fetch embeddings for the two texts concurrently
-            async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=30)) as session:
-                headers = {
-                    "Authorization": f"Bearer {self.embedding_config['api_key']}",
-                    "Content-Type": "application/json"
-                }
-                
-                # Embedding for the answer
-                answer_payload = {
-                    "model": self.embedding_config["model"],
-                    "input": answer  # use the standard input field
-                }
-                
-                # Embedding for the ground truth
-                ground_truth_payload = {
-                    "model": self.embedding_config["model"],
-                    "input": ground_truth
-                }
-                
-                # Request both embeddings concurrently
-                answer_task = session.post(
-                    f"{self.embedding_config['base_url'].rstrip('/')}/embeddings",
-                    headers=headers,
-                    json=answer_payload
-                )
-                
-                ground_truth_task = session.post(
-                    f"{self.embedding_config['base_url'].rstrip('/')}/embeddings",
-                    headers=headers,
-                    json=ground_truth_payload
-                )
-                
-                answer_response, ground_truth_response = await asyncio.gather(
-                    answer_task, ground_truth_task, return_exceptions=True
-                )
-                
-                # Handle the answer embedding
-                if isinstance(answer_response, Exception):
-                    print(f"❌ Failed to fetch answer embedding: {answer_response}")
-                    return self._calculate_text_similarity(answer, ground_truth)
-                
-                if answer_response.status != 200:
-                    print(f"❌ Answer embedding request failed: {answer_response.status}")
-                    return self._calculate_text_similarity(answer, ground_truth)
-                
-                answer_result = await answer_response.json()
-                # Support multiple API response formats
-                answer_embedding = answer_result.get("data", [{}])[0].get("embedding", []) if "data" in answer_result else answer_result.get("embedding", [])
-                
-                # Handle the ground-truth embedding
-                if isinstance(ground_truth_response, Exception):
-                    print(f"❌ Failed to fetch ground-truth embedding: {ground_truth_response}")
-                    return self._calculate_text_similarity(answer, ground_truth)
-                
-                if ground_truth_response.status != 200:
-                    print(f"❌ Ground-truth embedding request failed: {ground_truth_response.status}")
-                    return self._calculate_text_similarity(answer, ground_truth)
-                
-                ground_truth_result = await ground_truth_response.json()
-                ground_truth_embedding = ground_truth_result.get("data", [{}])[0].get("embedding", []) if "data" in ground_truth_result else ground_truth_result.get("embedding", [])
-                
-                # Compute cosine similarity
-                if len(answer_embedding) > 0 and len(ground_truth_embedding) > 0:
-                    similarity = self._calculate_cosine_similarity(answer_embedding, ground_truth_embedding)
-                    print(f"🔍 Embedding semantic similarity: {similarity:.4f}")
-                    return similarity
-                else:
-                    print(f"❌ Empty embedding vector - answer: {len(answer_embedding)}, ground_truth: {len(ground_truth_embedding)}")
-                    return self._calculate_text_similarity(answer, ground_truth)
+            answer_task = self.embedding_adapter.embed_query(answer)
+            ground_truth_task = self.embedding_adapter.embed_query(ground_truth)
+            
+            answer_embedding, ground_truth_embedding = await asyncio.gather(
+                answer_task, ground_truth_task, return_exceptions=True
+            )
+            
+            # Handle exceptions
+            if isinstance(answer_embedding, Exception):
+                print(f"❌ Failed to fetch answer embedding: {answer_embedding}")
+                return self._calculate_text_similarity(answer, ground_truth)
+            
+            if isinstance(ground_truth_embedding, Exception):
+                print(f"❌ Failed to fetch ground-truth embedding: {ground_truth_embedding}")
+                return self._calculate_text_similarity(answer, ground_truth)
+            
+            # Compute cosine similarity
+            if len(answer_embedding) > 0 and len(ground_truth_embedding) > 0:
+                similarity = self._calculate_cosine_similarity(answer_embedding, ground_truth_embedding)
+                print(f"🔍 Embedding semantic similarity: {similarity:.4f}")
+                return similarity
+            else:
+                print(f"❌ Empty embedding vector - answer: {len(answer_embedding)}, ground_truth: {len(ground_truth_embedding)}")
+                return self._calculate_text_similarity(answer, ground_truth)

The embedding API calls should include proper error handling for network
failures and invalid responses. Add timeout and retry logic for robustness.

evaluators/academic_evaluator.py [232-248]

 answer_payload = {
     "model": self.embedding_config["model"],
-    "input": answer  # use the standard input field
+    "input": answer
 }
 
 ground_truth_payload = {
     "model": self.embedding_config["model"],
     "input": ground_truth
 }
 
-# Request both embeddings concurrently
-answer_task = session.post(
-    f"{self.embedding_config['base_url'].rstrip('/')}/embeddings",
-    headers=headers,
-    json=answer_payload
-)
+# Request both embeddings concurrently, with retry logic
+async def make_embedding_request(payload):
+    for attempt in range(3):  # retry up to 3 times
+        try:
+            response = await session.post(
+                f"{self.embedding_config['base_url'].rstrip('/')}/embeddings",
+                headers=headers,
+                json=payload,
+                timeout=aiohttp.ClientTimeout(total=10)
+            )
+            return response
+        except (aiohttp.ClientError, asyncio.TimeoutError) as e:
+            if attempt == 2:  # last attempt
+                raise e
+            await asyncio.sleep(0.5 * (attempt + 1))  # exponential backoff
 
+answer_task = make_embedding_request(answer_payload)
+ground_truth_task = make_embedding_request(ground_truth_payload)
+

[Suggestion processed]

Suggestion importance[1-10]: 8


Why: The suggestion correctly points out the lack of robust error handling for network requests and proposes adding a retry mechanism with exponential backoff, which significantly improves the reliability of the embedding API calls.

Medium
Raise deprecation errors instead of silent failures

Instead of returning None/empty dict, these legacy methods should raise a clear
deprecation error to guide users to the correct async implementation.

evaluators/factory.py [18-28]

 def create_evaluator(cls, evaluator_type: str, config: Dict[str, Any]) -> Optional[BaseEvaluator]:
    """Create an evaluator of the specified type"""
-    print(f"⚠️  The synchronous evaluator {evaluator_type} has been removed. Please use the async version.")
-    return None
+    raise DeprecationWarning(f"The synchronous evaluator {evaluator_type} has been removed. Use evaluators.evaluator_factory.EvaluatorFactory instead.")

 @classmethod
 def create_all_evaluators(cls, config: Dict[str, Any], 
                         types: Optional[List[str]] = None) -> Dict[str, BaseEvaluator]:
    """Create all available evaluators"""
-    print("⚠️  All synchronous evaluators have been removed. Please use the async versions.")
-    return {}
+    raise DeprecationWarning("All synchronous evaluators have been removed. Use evaluators.evaluator_factory.EvaluatorFactory instead.")
Suggestion importance[1-10]: 7


Why: The suggestion improves developer experience by replacing silent print statements in legacy methods with explicit DeprecationWarning exceptions, providing clearer guidance and preventing silent failures.
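For context on the two failure modes: raising DeprecationWarning aborts the call outright, while warnings.warn keeps the legacy soft-fail but makes it visible. A sketch of both options; the function names are illustrative, not the repo's actual API:

```python
import warnings

def create_evaluator_legacy(evaluator_type: str):
    # Hard failure: callers cannot miss it, but existing code breaks immediately.
    raise DeprecationWarning(
        f"Sync evaluator {evaluator_type!r} was removed; "
        "use evaluators.evaluator_factory.EvaluatorFactory instead."
    )

def create_evaluator_soft(evaluator_type: str):
    # Soft path: emit a DeprecationWarning (filtered by default outside __main__)
    # and return None so old call sites keep limping along.
    warnings.warn(
        f"Sync evaluator {evaluator_type!r} was removed; "
        "use evaluators.evaluator_factory.EvaluatorFactory instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return None
```

DeprecationWarning is a Warning subclass, so it is raisable as an exception, but most codebases reserve it for warnings.warn and raise RuntimeError (or a custom error) for hard removals.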

Medium
Remove confusing async alias
Suggestion Impact: The commit removed the alias import of BaseEvaluator as AsyncBaseEvaluator and also dropped 'AsyncBaseEvaluator' from __all__, aligning with the suggestion to eliminate the confusing alias.

code diff:

 from .base import BaseEvaluator
-from .base_evaluator import BaseEvaluator as AsyncBaseEvaluator
 from .academic_evaluator import AcademicEvaluator
 from .ragas_evaluator import RagasEvaluator
-from .factory import EvaluatorFactory, EvaluatorManager  # Legacy classes
-from .evaluator_factory import EvaluatorFactory as AsyncEvaluatorFactory, EvaluatorManager as AsyncEvaluatorManager
+from .factory import EvaluatorFactory, EvaluatorManager
 
-# 注: すべての評価器は非同期APIをサポートしています
+# 注意: 所有评估器都支持异步API
 
 __all__ = [
     'BaseEvaluator',
-    'AsyncBaseEvaluator',
     'AcademicEvaluator',

The alias creates confusion between sync and async evaluators. Since all
evaluators now support async APIs, the alias should be removed to maintain
clarity.

evaluators/__init__.py [4]

-from .base_evaluator import BaseEvaluator as AsyncBaseEvaluator
+from .base_evaluator import BaseEvaluator

[Suggestion processed]

Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies that the alias AsyncBaseEvaluator is confusing and contradicts the PR's goal of unifying evaluator names, so removing it improves code clarity and maintainability.

Low


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
evaluators/evaluator_factory.py (1)

52-64: Create the evaluators concurrently as the comment claims: the current implementation still awaits serially

Wrap the coroutines into tasks and gather them.

-        # Create all evaluators concurrently
-        tasks = []
-        for evaluator_type in types:
-            task = cls.create_evaluator_async(evaluator_type, config)
-            tasks.append((evaluator_type, task))
-        
-        # Wait for all evaluators to finish creating
-        for evaluator_type, task in tasks:
-            evaluator = await task
-            if evaluator:
-                evaluators[evaluator_type] = evaluator
+        # Create all evaluators concurrently
+        task_map = {
+            evaluator_type: asyncio.create_task(cls.create_evaluator_async(evaluator_type, config))
+            for evaluator_type in types
+        }
+        results = await asyncio.gather(*task_map.values(), return_exceptions=True)
+        for evaluator_type, result in zip(task_map.keys(), results):
+            if isinstance(result, BaseEvaluator):
+                evaluators[evaluator_type] = result
evaluators/ragas_evaluator.py (1)

117-124: Do not block the event loop: run ragas.evaluate in a thread.

evaluate(...) is synchronous; calling it directly inside async code will block. Use asyncio.to_thread to offload.

-            # Evaluate with Ragas (synchronous function)
-            result = evaluate(
-                dataset, 
-                metrics=self.metrics,
-                llm=self.eval_llm,
-                embeddings=self.embeddings,
-                raise_exceptions=False
-            )
+            # Run in a thread pool to avoid blocking the event loop
+            result = await asyncio.to_thread(
+                evaluate,
+                dataset,
+                metrics=self.metrics,
+                llm=self.eval_llm,
+                embeddings=self.embeddings,
+                raise_exceptions=False
+            )
🧹 Nitpick comments (35)
evaluators/base_evaluator.py (1)

26-26: Consider lowering init log level to debug to reduce noise.

Initialization logs can be verbose across many evaluators. Suggest logger.debug or gating by config flag.

-        logger.info(f"Evaluator initialized: {name}")
+        logger.debug("Evaluator initialized: %s", name)
evaluators/factory.py (1)

9-16: Tidy up legacy placeholder: fix lint warnings, typing, and logging.

Address ClassVar typing for mutable class attrs, remove f-strings without placeholders, mark optional types, and use logging over print. Also avoid “unused arg” warnings without breaking the public signature.

-from typing import Dict, List, Any, Optional
+from typing import Dict, List, Any, Optional, ClassVar, Type
+import logging
+logger = logging.getLogger(__name__)

 class EvaluatorFactory:
@@
-    EVALUATOR_TYPES = {
-        # 注: 同期評価器は削除されました。非同期バージョンを使用してください。
-    }
+    EVALUATOR_TYPES: ClassVar[Dict[str, Type[BaseEvaluator]]] = {}
@@
-    DEFAULT_PRIORITY = []  # 同期評価器は使用されません
+    DEFAULT_PRIORITY: ClassVar[List[str]] = []  # 同期評価器は使用されません
@@
-    def create_evaluator(cls, evaluator_type: str, config: Dict[str, Any]) -> Optional[BaseEvaluator]:
+    def create_evaluator(cls, evaluator_type: str, config: Dict[str, Any]) -> Optional[BaseEvaluator]:
         """指定されたタイプの評価器を作成"""
-        print(f"⚠️  {evaluator_type} 同期評価器は削除されました。非同期バージョンを使用してください。")
+        logger.warning("⚠️  %s 同期評価器は削除されました。非同期バージョンを使用してください。", evaluator_type)
         return None
@@
-    def create_all_evaluators(cls, config: Dict[str, Any], 
-                            types: Optional[List[str]] = None) -> Dict[str, BaseEvaluator]:
+    def create_all_evaluators(cls, config: Dict[str, Any], 
+                            types: Optional[List[str]] = None) -> Dict[str, BaseEvaluator]:
         """全ての利用可能な評価器を作成"""
-        print("⚠️  同期評価器はすべて削除されました。非同期バージョンを使用してください。")
+        logger.warning("⚠️  同期評価器はすべて削除されました。非同期バージョンを使用してください。")
         return {}
@@
     def get_evaluator_info(cls) -> Dict[str, Dict[str, Any]]:
         """全評価器の情報を取得"""
-        print("⚠️  同期評価器はすべて削除されました。非同期バージョンを使用してください。")
+        logger.warning("⚠️  同期評価器はすべて削除されました。非同期バージョンを使用してください。")
         return {}
@@
-        return f"{evaluator_type} 同期評価器は削除されました。非同期バージョンを使用してください。"
+        return f"{evaluator_type} 同期評価器は削除されました。非同期バージョンを使用してください。"
@@
-        print("⚠️  EvaluatorManager はレガシークラスです。AsyncEvaluatorManager を使用してください。")
-        self.evaluators = {}
-        print(f"🔧 同期評価器はすべて削除されました。非同期バージョンを使用してください。")
+        logger.warning("⚠️  EvaluatorManager はレガシークラスです。AsyncEvaluatorManager を使用してください。")
+        # 保持して将来の互換性と静的解析の警告解消
+        self.chat_config = chat_config
+        self.embedding_config = embedding_config
+        self.evaluators = {}
+        logger.info("🔧 同期評価器はすべて削除されました。非同期バージョンを使用してください。")
@@
-    def evaluate_all(self, questions: List[str], answers: List[str], 
-                    ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, Dict[str, List[float]]]:
+    def evaluate_all(self, questions: List[str], answers: List[str], 
+                    ground_truths: List[str], contexts: Optional[List[List[str]]] = None) -> Dict[str, Dict[str, List[float]]]:
         """全評価器で評価を実行"""
-        print("⚠️  同期評価器はすべて削除されました。非同期バージョンを使用してください。")
+        logger.warning("⚠️  同期評価器はすべて削除されました。非同期バージョンを使用してください。")
         return {}
@@
-        print("⚠️  同期評価器はすべて削除されました。非同期バージョンを使用してください。")
+        logger.warning("⚠️  同期評価器はすべて削除されました。非同期バージョンを使用してください。")
         return {
             "total_evaluators": 0,
             "available_evaluators": [],
             "evaluator_details": {}
         }

Notes:

  • Kept parameter names to avoid breaking keyword callers; stored them on self to satisfy ARG002.
  • _get_evaluator_description legitimately uses an f-string with a placeholder; retained.

Also applies to: 18-35, 37-40, 44-55, 58-61, 48-48

results/detailed_evaluation_results.json (1)

1-134: Add JSON validation to CI
Validate the JSON output in your CI pipeline to catch formatting regressions early. For example, add a step running:

jq -e . results/detailed_evaluation_results.json >/dev/null && echo "JSON OK"
optims.md (4)

3-8: Avoid absolute paths and naming contradictions; use repo-relative paths and a unified naming policy

  • Local absolute paths in the document, such as "/Users/mabo/.../ragas_ollama.py", are not portable; use repo-relative paths instead (e.g. "evaluators/ragas_ollama.py").
  • The document requires artifact names to no longer contain "async", yet "main_multi_eval_async.py" still exists in the repo. Unify the naming once the migration completes, or spell out the transition policy in the document.

81-97: The unified manager interface is inconsistent with the existing BaseEvaluator API

The repo's BaseEvaluator is built around evaluate_answers_async/evaluate_with_timeout, while the document's UnifiedEvaluatorManager probes for evaluate/evaluate_async; the naming mismatch adds implementation and call-site burden. Align the document with the code (keep evaluate_with_timeout and the batch methods).


212-226: EvaluationCache is marked async but never awaits; make it synchronous, or add a lock for concurrency safety

get/set perform no async I/O and could simply be synchronous methods; if they stay async, at least protect writes to self.cache with an asyncio.Lock to avoid races with clear_expired (especially with many concurrent tasks).

Sketch (made synchronous):

-    async def get(...):
+    def get(...):
         ...
-    async def set(...):
+    def set(...):
         ...

Or keep async and add a lock:

self._lock = asyncio.Lock()
async with self._lock:
    self.cache[key] = {...}

350-360: The BaseEvaluator example does not match the repo implementation

The document's BaseEvaluator (evaluate/evaluate_batch) differs from the repo's BaseEvaluator (evaluate_answers_async/evaluate_with_timeout/...). Unify the terminology and method names to avoid ambiguity as the API evolves.

main_multi_eval_async.py (5)

13-15: Remove the unused import to avoid ambiguity

from evaluators.factory import EvaluatorManager is unused and coexists with the AsyncEvaluatorManager alias, which invites confusion. Delete it.

-from evaluators.factory import EvaluatorManager
 from evaluators.evaluator_factory import EvaluatorManager as AsyncEvaluatorManager

44-51: Read the JSON on a background thread to avoid blocking the event loop

load_test_cases currently does synchronous file I/O. Wrap the read in asyncio.to_thread.

 async def load_test_cases(self, file_path: str) -> list:
-        """加载测试用例"""
-        try:
-            with open(file_path, 'r', encoding='utf-8') as f:
-                return json.load(f)
+        """加载测试用例(在线程池中执行以避免阻塞事件循环)"""
+        def _read_json(fp: str):
+            with open(fp, 'r', encoding='utf-8') as f:
+                return json.load(f)
+        try:
+            return await asyncio.to_thread(_read_json, file_path)
         except Exception as e:
             raise ValueError(f"测试用例加载失败 {file_path}: {e}")

52-66: Test connections concurrently to speed up startup

test_connections awaits each connector serially. Issuing the connection tests concurrently shortens total time noticeably.

     async def test_connections(self) -> Dict[str, bool]:
         """测试所有连接"""
-        results = {}
-        
-        for system_name, connector in self.connectors.items():
-            try:
-                is_connected = await connector.test_connection_async()
-                results[system_name] = is_connected
-                print(f"{'✅' if is_connected else '❌'} {system_name} RAG系统连接{'成功' if is_connected else '失败'}")
-            except Exception as e:
-                results[system_name] = False
-                print(f"❌ {system_name} RAG系统连接测试失败: {e}")
-        
-        return results
+        tasks = {
+            name: asyncio.create_task(conn.test_connection_async())
+            for name, conn in self.connectors.items()
+        }
+        results: Dict[str, bool] = {}
+        for name, task in tasks.items():
+            try:
+                is_ok = await task
+                results[name] = bool(is_ok)
+                print(f"{'✅' if is_ok else '❌'} {name} RAG系统连接{'成功' if is_ok else '失败'}")
+            except Exception as e:
+                results[name] = False
+                print(f"❌ {name} RAG系统连接测试失败: {e}")
+        return results

71-88: Run the per-connector queries concurrently to cut per-question latency

query_rag_systems awaits one connector at a time; use gather (or tasks) to run them concurrently.

     async def query_rag_systems(self, question: str) -> Dict[str, Dict[str, Any]]:
         """查询所有RAG系统"""
-        results = {}
-        
-        for system_name, connector in self.connectors.items():
-            try:
-                result = await connector.query_with_timeout(
-                    question, 
-                    timeout=ASYNC_CONFIG["rag_query_timeout"]
-                )
-                results[system_name] = result
-                
-                if result.get("error"):
-                    print(f"  {system_name} 错误: {result['error']}")
-                else:
-                    print(f"  {system_name} 成功获取回答")
-                    
-            except Exception as e:
-                results[system_name] = {"answer": "", "contexts": [], "error": str(e)}
-                print(f"  {system_name} 查询失败: {e}")
-        
-        return results
+        tasks = {
+            name: asyncio.create_task(
+                connector.query_with_timeout(question, timeout=ASYNC_CONFIG["rag_query_timeout"])
+            )
+            for name, connector in self.connectors.items()
+        }
+        results: Dict[str, Dict[str, Any]] = {}
+        for name, task in tasks.items():
+            try:
+                result = await task
+                results[name] = result
+                print(f"  {name} {'错误: ' + result['error'] if result.get('error') else '成功获取回答'}")
+            except Exception as e:
+                results[name] = {"answer": "", "contexts": [], "error": str(e)}
+                print(f"  {name} 查询失败: {e}")
+        return results

1-2: Naming alignment plan

The document requires final artifacts to no longer contain "async". Once the unified manager/factory lands, rename this file to main_multi_eval.py (and update the entry point).

evaluators/evaluator_factory.py (6)

3-3: Extend the typing imports: use ClassVar/Type/Tuple for lint compliance and type safety

Annotate the class attributes with ClassVar and import Type/Tuple.

-from typing import Dict, List, Any, Optional
+from typing import Dict, List, Any, Optional, Type, ClassVar, Tuple

37-41: Route factory error messages through a logger and preserve context

A single logging sink makes production observability easier.

-                print(f"⚠️  {evaluator_type}评估器不可用")
+                logger.warning("评估器不可用: %s", evaluator_type)
                 return None
         except Exception as e:
-            print(f"❌ {evaluator_type}评估器创建失败: {e}")
+            logger.exception("评估器创建失败: %s", evaluator_type)
             return None

115-116: Remove the placeholder-free f-string, or switch to a logger

Avoids F541.

-        print(f"🔧 评估器管理器初始化完成")
+        logger.info("🔧 评估器管理器初始化完成")

138-140: PEP 484: make Optional explicit

Change contexts from an implicit default of None to Optional[...].

-    async def evaluate_all_async(self, questions: List[str], answers: List[str], 
-                               ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, Dict[str, List[float]]]:
+    async def evaluate_all_async(self, questions: List[str], answers: List[str],
+                                 ground_truths: List[str], contexts: Optional[List[List[str]]] = None
+                                 ) -> Dict[str, Dict[str, List[float]]]:

144-159: Replace print with logger throughout for structured logging and level control

Runtime output should go through the logger.

-            print(f"\n📊 使用{evaluator_name}评估器评估中...")
+            logger.info("📊 使用评估器评估: %s", evaluator_name)
 ...
-                print(f"    ✅ 完成")
+                logger.info("    ✅ 完成: %s", evaluator_name)
-            except Exception as e:
-                print(f"    ❌ 失败: {e}")
+            except Exception as e:
+                logger.exception("    ❌ 失败: %s", evaluator_name)
 ...
-        return all_results
+        return all_results

100-103: (Optional) Full-width parentheses trigger lint warnings

If Ruff RUF001 is enabled, change the full-width parentheses (()) in the description strings to ASCII (). No functional impact; this only silences the warning.

evaluators/ragas_evaluator.py (9)

86-90: Type hints allow None but signature says Dict[str, float].

You return None values when the evaluator is unavailable or the answer is empty; annotate the return type and the context parameter accordingly.

-    async def evaluate_single_answer_async(self, question: str, answer: str, ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def evaluate_single_answer_async(
+        self,
+        question: str,
+        answer: str,
+        ground_truth: str,
+        context: Optional[List[str]] = None
+    ) -> Dict[str, Optional[float]]:

102-104: Same Optional typing for helper.

Keep helper consistent with public method.

-    async def _evaluate_ragas_native_async(self, question: str, answer: str, ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def _evaluate_ragas_native_async(
+        self,
+        question: str,
+        answer: str,
+        ground_truth: str,
+        context: Optional[List[str]] = None
+    ) -> Dict[str, Optional[float]]:

167-175: Validate input lengths for batch.

Guard against mismatched list lengths to avoid silent misalignment.

     async def evaluate_answers_async(self, questions: List[str], 
                                   answers: List[str], 
                                   ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, List[float]]:
         """使用Ragas原生异步API批量评估多个答案"""
         if not self._available:
             return {"relevancy": [None] * len(answers), "correctness": [None] * len(answers), "faithfulness": [None] * len(answers), "context_precision": [None] * len(answers), "context_recall": [None] * len(answers)}
-        
+        if not (len(questions) == len(answers) == len(ground_truths)):
+            raise ValueError("questions, answers, and ground_truths must have the same length")
+        
         try:
             # 准备评估数据
-            eval_contexts = contexts if contexts else [['相关上下文'] for _ in range(len(questions))]
+            eval_contexts = contexts if contexts else [['相关上下文'] for _ in range(len(questions))]

30-37: Use BaseEvaluator timeout and avoid hard-coded values.

Propagate self.timeout to the client for consistency and configurability.

-            self.eval_llm = LangchainLLMWrapper(ChatOpenAI(
+            self.eval_llm = LangchainLLMWrapper(ChatOpenAI(
                 api_key=config["api_key"],
                 base_url=config["base_url"],
                 model=config.get("model", "gpt-3.5-turbo"),
                 temperature=0,
                 max_tokens=1000,
-                timeout=60  # 增加超时时间
+                timeout=self.timeout
             ))

55-60: Default OpenAI embedding model is outdated.

text-embedding-ada-002 is deprecated. Prefer text-embedding-3-small or -3-large.

-                self.embeddings = OpenAIEmbeddings(
+                self.embeddings = OpenAIEmbeddings(
                     api_key=config.get("embedding_api_key", config["api_key"]),
-                    model=embedding_config.get("model", "text-embedding-ada-002")
+                    model=embedding_config.get("model", "text-embedding-3-small")
                 )
-                embedding_name = f"{embedding_config.get('model', 'text-embedding-ada-002')} (OpenAI)"
+                embedding_name = f"{embedding_config.get('model', 'text-embedding-3-small')} (OpenAI)"

89-101: Inconsistent “missing score” semantics across evaluators.

This class returns None on failure/unavailable, while BaseEvaluator defaults to 0.0. Align one way to avoid downstream schema confusion.

Do you want to standardize on None for “not evaluated” and update BaseEvaluator._get_default_scores/_get_default_single_score accordingly?


71-83: Prefer logger over print for observability.

Route init diagnostics through logging to enable level control and structured sinks.


155-156: Avoid noisy prints in hot paths.

Consider logging at debug level instead of printing every sample’s scores.


253-272: Broad exception swallowing; tighten handlers.

Catching bare Exception repeatedly hides actionable errors. Narrow where possible or at least log with traceback.

evaluators/academic_evaluator.py (8)

95-105: Optional typing for context and return values.

Methods return None for metrics in several paths; reflect that in annotations.

-    async def evaluate_single_answer_async(self, question: str, answer: str, 
-                                         ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def evaluate_single_answer_async(
+        self,
+        question: str,
+        answer: str,
+        ground_truth: str,
+        context: Optional[List[str]] = None
+    ) -> Dict[str, float]:

110-117: Same Optional for helpers.

Keep signatures consistent for readability and type checkers.

-    async def _evaluate_hybrid_mode(self, question: str, answer: str, ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def _evaluate_hybrid_mode(self, question: str, answer: str, ground_truth: str, context: Optional[List[str]] = None) -> Dict[str, float]:

140-149: Same Optional for pure chat path.

-    async def _evaluate_pure_chat_mode(self, question: str, answer: str, ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def _evaluate_pure_chat_mode(self, question: str, answer: str, ground_truth: str, context: Optional[List[str]] = None) -> Dict[str, float]:

298-306: Same Optional for quality helper.

-    async def _assess_enhanced_quality_with_chat_model(self, question: str, answer: str, ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def _assess_enhanced_quality_with_chat_model(self, question: str, answer: str, ground_truth: str, context: Optional[List[str]] = None) -> Dict[str, float]:

382-384: Avoid bare except.

Use except Exception as e to prevent masking system-exiting exceptions and aid debugging.

-            except:
+            except Exception:
                 pass

419-421: Avoid bare except.

Same change here.

-            except:
+            except Exception:
                 pass

523-526: Sync wrappers may fail inside running event loops.

get_event_loop().run_until_complete will raise when a loop is already running. Consider an executor, or expose only the async API.
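A common escape hatch is to detect a running loop and offload to a private loop on a worker thread; a sketch under that assumption, where `_evaluate_async` is a stand-in for the evaluator's real coroutine:

```python
import asyncio
import concurrent.futures

async def _evaluate_async(question: str) -> float:
    # Stand-in for the evaluator's real coroutine.
    await asyncio.sleep(0)
    return 1.0

def evaluate(question: str) -> float:
    """Sync facade that works both outside and inside a running event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop running in this thread: safe to drive one ourselves.
        return asyncio.run(_evaluate_async(question))
    # A loop is already running here; run the coroutine on a private loop
    # in a worker thread instead of calling run_until_complete.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, _evaluate_async(question)).result()

print(evaluate("what is RAG?"))  # 1.0
```

Note the worker-thread path still blocks the calling coroutine at .result(), so exposing only the async API remains the cleaner option.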


24-29: Add timeout to ChatOpenAI for consistency with BaseEvaluator.

             self.chat_llm = ChatOpenAI(
                 api_key=config.get("chat_api_key", config.get("api_key")),
                 base_url=config.get("chat_base_url", config.get("base_url")),
                 model=config.get("chat_model", config.get("model", "gpt-3.5-turbo")),
-                temperature=0
+                temperature=0,
+                timeout=self.timeout
             )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71851a9 and 967af91.

⛔ Files ignored due to path filters (1)
  • results/multi_evaluation_results.csv is excluded by !**/*.csv
📒 Files selected for processing (10)
  • evaluators/__init__.py (1 hunks)
  • evaluators/academic_evaluator.py (7 hunks)
  • evaluators/base_evaluator.py (3 hunks)
  • evaluators/evaluator_factory.py (6 hunks)
  • evaluators/factory.py (1 hunks)
  • evaluators/ragas_evaluator.py (6 hunks)
  • evaluators/ragas_ollama.py (0 hunks)
  • main_multi_eval_async.py (1 hunks)
  • optims.md (1 hunks)
  • results/detailed_evaluation_results.json (3 hunks)
💤 Files with no reviewable changes (1)
  • evaluators/ragas_ollama.py
🧰 Additional context used
🧬 Code graph analysis (6)
main_multi_eval_async.py (2)
evaluators/evaluator_factory.py (1)
  • EvaluatorManager (105-173)
evaluators/factory.py (1)
  • EvaluatorManager (41-63)
evaluators/factory.py (3)
evaluators/base_evaluator.py (2)
  • BaseEvaluator (10-184)
  • get_evaluator_info (171-184)
evaluators/evaluator_factory.py (5)
  • EvaluatorFactory (12-103)
  • get_evaluator_info (67-94)
  • _get_evaluator_description (97-103)
  • EvaluatorManager (105-173)
  • get_evaluator_summary (162-173)
evaluators/ragas_evaluator.py (1)
  • get_evaluator_info (282-290)
evaluators/__init__.py (5)
evaluators/base_evaluator.py (1)
  • BaseEvaluator (10-184)
evaluators/academic_evaluator.py (1)
  • AcademicEvaluator (14-533)
evaluators/ragas_evaluator.py (1)
  • RagasEvaluator (21-290)
evaluators/evaluator_factory.py (2)
  • EvaluatorFactory (12-103)
  • EvaluatorManager (105-173)
evaluators/factory.py (2)
  • EvaluatorFactory (6-39)
  • EvaluatorManager (41-63)
evaluators/academic_evaluator.py (1)
evaluators/base_evaluator.py (1)
  • BaseEvaluator (10-184)
evaluators/evaluator_factory.py (3)
evaluators/base_evaluator.py (1)
  • BaseEvaluator (10-184)
evaluators/academic_evaluator.py (1)
  • AcademicEvaluator (14-533)
evaluators/ragas_evaluator.py (1)
  • RagasEvaluator (21-290)
evaluators/ragas_evaluator.py (2)
evaluators/base_evaluator.py (2)
  • BaseEvaluator (10-184)
  • evaluate_answers_async (29-43)
evaluators/academic_evaluator.py (1)
  • evaluate_answers_async (46-92)
🪛 LanguageTool
optims.md

[uncategorized] ~19-~19: 动词的修饰一般为‘形容词(副词)+地+动词’。您的意思是否是:不必要"地"同步
Context: ...cutor` 包装同步代码,而非利用 Ragas 的原生异步能力 - 存在不必要的同步依赖,影响性能和架构清晰度 - 异步实现本质上是"异步外壳 + 同步内核"的...

(wb4)

🪛 Ruff (0.12.2)
evaluators/factory.py

10-12: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


15-15: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


18-18: Unused class method argument: config

(ARG003)


24-24: Unused class method argument: config

(ARG003)


25-25: Unused class method argument: types

(ARG003)


44-44: Unused method argument: chat_config

(ARG002)


44-44: Unused method argument: embedding_config

(ARG002)


48-48: f-string without any placeholders

Remove extraneous f prefix

(F541)


50-50: Unused method argument: questions

(ARG002)


50-50: Unused method argument: answers

(ARG002)


51-51: Unused method argument: ground_truths

(ARG002)


51-51: Unused method argument: contexts

(ARG002)


51-51: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)

evaluators/academic_evaluator.py

42-42: Do not catch blind exception: Exception

(BLE001)


217-217: Docstring contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF002)


217-217: Docstring contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF002)


220-220: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)

evaluators/evaluator_factory.py

16-19: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


22-22: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


39-39: Do not catch blind exception: Exception

(BLE001)


87-87: Do not use bare except

(E722)


100-100: String contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF001)


100-100: String contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF001)


101-101: String contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF001)


101-101: String contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF001)


115-115: f-string without any placeholders

Remove extraneous f prefix

(F541)


139-139: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)

evaluators/ragas_evaluator.py

98-98: Do not catch blind exception: Exception

(BLE001)


102-102: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


117-117: Comment contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


117-117: Comment contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


168-168: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


186-186: Comment contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


186-186: Comment contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


205-205: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


244-244: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


251-251: String contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF001)


253-253: Do not catch blind exception: Exception

(BLE001)


262-268: Consider moving this statement to an else block

(TRY300)


270-270: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (5)
evaluators/base_evaluator.py (1)

1-1: Header comment clarity improved.

No action required.

results/detailed_evaluation_results.json (1)

21-36: Confirm downstream handling of nulls in metric arrays.

faithfulness includes nulls; ensure any consumers ignore nulls or impute safely to avoid NaN in aggregates.
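Downstream consumers can guard aggregates with a null-safe mean; a pure-Python sketch, where `safe_mean` is an illustrative helper rather than part of the repo:

```python
from statistics import mean

def safe_mean(scores):
    """Average only the non-null entries; return None if nothing was scored."""
    valid = [s for s in scores if s is not None]
    return mean(valid) if valid else None

faithfulness = [1.0, None, 0.5, None]  # nulls as loaded from the JSON results
print(safe_mean(faithfulness))   # 0.75
print(safe_mean([None, None]))   # None
```

Treating "not evaluated" as None (rather than imputing 0.0) keeps the aggregate from being dragged down by samples that were never scored.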

optims.md (2)

127-135: Update the settings style for Pydantic v2 (if the project uses v2)

The document's example uses the v1 nested Config class. If the project has moved to Pydantic v2, switch to the SettingsConfigDict form to avoid migration pitfalls.

For v2, adjust along these lines (sketch):

from pydantic_settings import BaseSettings, SettingsConfigDict

class AppConfig(BaseSettings):
    database: DatabaseConfig = DatabaseConfig()
    llm: LLMConfig = LLMConfig()
    evaluation: EvaluationConfig = EvaluationConfig()
    model_config = SettingsConfigDict(env_file=".env", env_nested_delimiter="__")

31-44: Confirm Ragas' native async interface is available
Ragas metrics officially document the async methods single_turn_ascore(sample, callbacks=None, timeout=None) and multi_turn_ascore(sample, callbacks=None, timeout=None), so the example's await metric.single_turn_ascore(sample) is valid (docs.ragas.io); the top-level evaluate(..., is_async=True) supports batched async execution.

evaluators/__init__.py (1)

12-21: __all__ matches the re-exports.

After the import fix, __all__ looks good.

Comment on lines 3 to 9
from .base import BaseEvaluator
from .ragas_ollama import RagasOllamaEvaluator # Re-enabled with compatible versions
# from .academic_evaluator import AcademicEvaluator # Removed in favor of async version
from .factory import EvaluatorFactory, EvaluatorManager
from .async_base import AsyncBaseEvaluator
from .async_academic_evaluator import AsyncAcademicEvaluator
from .async_ragas_evaluator import AsyncRagasEvaluator
from .async_factory import AsyncEvaluatorFactory, AsyncEvaluatorManager
from .base_evaluator import BaseEvaluator as AsyncBaseEvaluator
from .academic_evaluator import AcademicEvaluator
from .ragas_evaluator import RagasEvaluator
from .factory import EvaluatorFactory, EvaluatorManager # Legacy classes
from .evaluator_factory import EvaluatorFactory as AsyncEvaluatorFactory, EvaluatorManager as AsyncEvaluatorManager


⚠️ Potential issue

Broken import: .base does not exist.

from .base import BaseEvaluator will raise ImportError; the module also imports BaseEvaluator twice under two names. Import solely from base_evaluator and alias it.

-from .base import BaseEvaluator
-from .base_evaluator import BaseEvaluator as AsyncBaseEvaluator
+from .base_evaluator import BaseEvaluator
+AsyncBaseEvaluator = BaseEvaluator
 from .academic_evaluator import AcademicEvaluator
 from .ragas_evaluator import RagasEvaluator
-from .factory import EvaluatorFactory, EvaluatorManager  # Legacy classes
-from .evaluator_factory import EvaluatorFactory as AsyncEvaluatorFactory, EvaluatorManager as AsyncEvaluatorManager
+from .factory import EvaluatorFactory, EvaluatorManager  # Legacy classes
+from .evaluator_factory import EvaluatorFactory as AsyncEvaluatorFactory, EvaluatorManager as AsyncEvaluatorManager
🤖 Prompt for AI Agents
In evaluators/__init__.py around lines 3 to 9, remove the broken import "from
.base import BaseEvaluator" and instead import BaseEvaluator only from
.base_evaluator (aliasing if you need an async name), ensuring you don't
re-export the same symbol twice; keep the other evaluator and factory imports
as-is and export a single BaseEvaluator symbol (plus any intentional aliases)
from base_evaluator.

Comment on lines 15 to 23
# 可用的评估器类型
EVALUATOR_TYPES = {
"async_academic": AsyncAcademicEvaluator,
"async_ragas": AsyncRagasEvaluator
"academic": AcademicEvaluator,
"ragas": RagasEvaluator
}

# 默认评估器优先级
DEFAULT_PRIORITY = ["async_ragas", "async_academic"]
DEFAULT_PRIORITY = ["ragas", "academic"]


🛠️ Refactor suggestion

Class attributes should be immutable and explicitly annotated as ClassVar

This prevents a mutable class attribute from being shared and mutated across instances; annotate with Type[BaseEvaluator] as well.

-    # 可用的评估器类型
-    EVALUATOR_TYPES = {
-        "academic": AcademicEvaluator,
-        "ragas": RagasEvaluator
-    }
+    # 可用的评估器类型
+    EVALUATOR_TYPES: ClassVar[Dict[str, Type[BaseEvaluator]]] = {
+        "academic": AcademicEvaluator,
+        "ragas": RagasEvaluator,
+    }
 ...
-    # 默认评估器优先级
-    DEFAULT_PRIORITY = ["ragas", "academic"]
+    # 默认评估器优先级(使用不可变元组)
+    DEFAULT_PRIORITY: ClassVar[Tuple[str, ...]] = ("ragas", "academic")
📝 Committable suggestion


Suggested change
# 可用的评估器类型
EVALUATOR_TYPES = {
"async_academic": AsyncAcademicEvaluator,
"async_ragas": AsyncRagasEvaluator
"academic": AcademicEvaluator,
"ragas": RagasEvaluator
}
# 默认评估器优先级
DEFAULT_PRIORITY = ["async_ragas", "async_academic"]
DEFAULT_PRIORITY = ["ragas", "academic"]
# 可用的评估器类型
EVALUATOR_TYPES: ClassVar[Dict[str, Type[BaseEvaluator]]] = {
"academic": AcademicEvaluator,
"ragas": RagasEvaluator,
}
# 默认评估器优先级(使用不可变元组)
DEFAULT_PRIORITY: ClassVar[Tuple[str, ...]] = ("ragas", "academic")
🧰 Tools
🪛 Ruff (0.12.2)

16-19: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


22-22: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🤖 Prompt for AI Agents
In evaluators/evaluator_factory.py around lines 15 to 23, the class-level
EVALUATOR_TYPES and DEFAULT_PRIORITY are currently mutable and lack proper
typing; change their declarations to use typing.ClassVar with explicit types
(EVALUATOR_TYPES: ClassVar[Mapping[str, Type[BaseEvaluator]]] and
DEFAULT_PRIORITY: ClassVar[Tuple[str, ...]]), replace the mutable dict with an
immutable Mapping (or wrap with types.MappingProxyType) and use a tuple (or
frozenset/tuple) for DEFAULT_PRIORITY to prevent instance-level mutation and
ensure correct static typing.
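The immutable-class-attribute variant can be demonstrated in isolation with `MappingProxyType` and a tuple (the evaluator classes below are stand-ins for the real ones):

```python
from types import MappingProxyType
from typing import ClassVar, Mapping, Tuple, Type

class BaseEvaluator:  # stand-in for the real base class
    pass

class AcademicEvaluator(BaseEvaluator):
    pass

class RagasEvaluator(BaseEvaluator):
    pass

class EvaluatorFactory:
    # Read-only mapping: callers cannot add or replace entries at runtime.
    EVALUATOR_TYPES: ClassVar[Mapping[str, Type[BaseEvaluator]]] = MappingProxyType({
        "academic": AcademicEvaluator,
        "ragas": RagasEvaluator,
    })
    # Immutable tuple instead of a shared mutable list.
    DEFAULT_PRIORITY: ClassVar[Tuple[str, ...]] = ("ragas", "academic")
```

Any attempted item assignment on `EVALUATOR_TYPES` raises `TypeError`, which is exactly the mutation the review wants to rule out.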

Comment on lines 87 to 92
info[name] = {
"name": name,
"supported_metrics": [],
"description": cls._get_evaluator_description(name),
"async": True
"description": cls._get_evaluator_description(name)
}


🛠️ Refactor suggestion

Avoid bare except; log the exception context

A bare except also swallows system-exiting exceptions. Catch Exception instead and log it.

-            except:
-                info[name] = {
+            except Exception as e:
+                logger.debug("获取评估器信息失败: %s (%s)", name, e)
+                info[name] = {
📝 Committable suggestion


Suggested change
except:
info[name] = {
"name": name,
"supported_metrics": [],
"description": cls._get_evaluator_description(name),
"async": True
"description": cls._get_evaluator_description(name)
}
except Exception as e:
logger.debug("获取评估器信息失败: %s (%s)", name, e)
info[name] = {
"name": name,
"supported_metrics": [],
"description": cls._get_evaluator_description(name)
}
🧰 Tools
🪛 Ruff (0.12.2)

87-87: Do not use bare except

(E722)

🤖 Prompt for AI Agents
In evaluators/evaluator_factory.py around lines 87 to 92, replace the bare
"except:" with "except Exception as e" and record the exception context before
continuing to populate info[name]; use the module logger (or import logging) and
call logger.exception or logging.exception with a message that includes the
evaluator name so the stack trace and error message are preserved, then keep the
existing info[name] assignment unchanged.
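The catch-log-fallback shape can be exercised on its own; `collect_info` and `get_metrics` below are hypothetical stand-ins for the factory's info-gathering loop:

```python
import logging

logger = logging.getLogger(__name__)

def collect_info(names, get_metrics):
    """Build the evaluator-info dict; on failure, log the context and fall
    back to a minimal entry instead of silently swallowing everything."""
    info = {}
    for name in names:
        try:
            info[name] = {
                "name": name,
                "supported_metrics": list(get_metrics(name)),  # may raise
                "description": f"{name} evaluator",
            }
        except Exception as e:  # deliberate catch-all, but with logged context
            logger.debug("failed to fetch evaluator info: %s (%s)", name, e)
            info[name] = {
                "name": name,
                "supported_metrics": [],
                "description": f"{name} evaluator",
            }
    return info
```

The fallback entry is identical in shape to the success entry, so consumers never need to special-case failures.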

# 評価器ファクトリー - Factory Pattern実装 (Legacy - すべて非同期バージョンを使用)

from typing import Dict, List, Any, Optional
from .base import BaseEvaluator

⚠️ Potential issue

Broken import: module name mismatch.

This will raise ImportError at import time. The base class lives in base_evaluator.py.

-from .base import BaseEvaluator
+from .base_evaluator import BaseEvaluator
📝 Committable suggestion


Suggested change
from .base import BaseEvaluator
from .base_evaluator import BaseEvaluator
🤖 Prompt for AI Agents
In evaluators/factory.py around line 4 the import "from .base import
BaseEvaluator" is incorrect because the base class is defined in
base_evaluator.py; change the import to reference the correct module name (e.g.,
from .base_evaluator import BaseEvaluator) so the module loads without
ImportError.

optims.md Outdated
Comment on lines 278 to 289
tasks = [self.process_batch(evaluator, batch) for batch in batches]
results = await asyncio.gather(*tasks, return_exceptions=True)

# 展平结果
final_results = []
for result in results:
if isinstance(result, Exception):
final_results.extend([{"error": str(result)} for _ in batch])
else:
final_results.extend(result)

return final_results

⚠️ Potential issue

Bug: the exception branch references an undefined batch, causing a NameError and the wrong number of placeholders

In the flattening phase of process_all, the exception branch uses the undefined variable batch. Use the length of batches[idx] (keyed by batch index) to generate the placeholders, and create the tasks so they actually run concurrently.

Apply the following fix:

-        tasks = [self.process_batch(evaluator, batch) for batch in batches]
-        results = await asyncio.gather(*tasks, return_exceptions=True)
+        tasks = [asyncio.create_task(self.process_batch(evaluator, batch)) for batch in batches]
+        results = await asyncio.gather(*tasks, return_exceptions=True)

         # 展平结果
         final_results = []
-        for result in results:
-            if isinstance(result, Exception):
-                final_results.extend([{"error": str(result)} for _ in batch])
+        for idx, result in enumerate(results):
+            if isinstance(result, Exception):
+                final_results.extend([{"error": str(result)} for _ in batches[idx]])
             else:
                 final_results.extend(result)
📝 Committable suggestion


Suggested change
tasks = [self.process_batch(evaluator, batch) for batch in batches]
results = await asyncio.gather(*tasks, return_exceptions=True)
# 展平结果
final_results = []
for result in results:
if isinstance(result, Exception):
final_results.extend([{"error": str(result)} for _ in batch])
else:
final_results.extend(result)
return final_results
tasks = [asyncio.create_task(self.process_batch(evaluator, batch)) for batch in batches]
results = await asyncio.gather(*tasks, return_exceptions=True)
# 展平结果
final_results = []
for idx, result in enumerate(results):
if isinstance(result, Exception):
final_results.extend([{"error": str(result)} for _ in batches[idx]])
else:
final_results.extend(result)
return final_results
🤖 Prompt for AI Agents
In optims.md around lines 278 to 289, the exception branch in the
result-flattening loop references an undefined variable batch and will mis-fill
placeholders; change the code to create tasks concurrently (build tasks =
[self.process_batch(evaluator, batch) for batch in batches] before awaiting) and
when iterating results use enumerate so on exception you can use batches[idx]
(or its length) to generate the correct number of error placeholder entries
(e.g., final_results.extend([{"error": str(result)} for _ in batches[idx]]));
ensure non-exception results are extended as before.
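The corrected flattening logic can be verified end-to-end with a small driver; `process_batch` here is a stand-in for the method on the real class:

```python
import asyncio

async def process_batch(batch):
    # Stand-in for self.process_batch(evaluator, batch): fail on a marker item.
    if "bad" in batch:
        raise RuntimeError("batch failed")
    return [{"ok": item} for item in batch]

async def process_all(batches):
    tasks = [asyncio.create_task(process_batch(b)) for b in batches]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Flatten, emitting one error placeholder per item of a failed batch,
    # indexed via batches[idx] instead of a stale loop variable.
    final_results = []
    for idx, result in enumerate(results):
        if isinstance(result, Exception):
            final_results.extend({"error": str(result)} for _ in batches[idx])
        else:
            final_results.extend(result)
    return final_results
```

Because `asyncio.gather` preserves input order, `batches[idx]` is guaranteed to be the batch that produced `results[idx]`.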

@boma086
Owner Author

boma086 commented Sep 7, 2025

@claude fix it.


agentfarmx bot commented Sep 7, 2025

No operation ID found for this PR

@boma086
Owner Author

boma086 commented Sep 7, 2025

@claude fix 1


agentfarmx bot commented Sep 7, 2025

No operation ID found for this PR

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
connectors/dify.py (3)

27-48: Add Accept header for SSE when using streaming mode.

Dify streaming uses text/event-stream. Without Accept, some gateways may downgrade to buffered JSON.

         return {
             "method": "POST",
             "url": f"{self.config['base_url']}/chat-messages",
             "headers": {
                 "Authorization": f"Bearer {self.config['api_key']}",
-                "Content-Type": "application/json"
+                "Content-Type": "application/json",
+                "Accept": "text/event-stream"
             },
             "body": {
                 "inputs": {
                     "background": "I am a software developer working on Japanese apps",
                     "instruction": "Please help with software development questions"
                 },
                 "query": question,
                 "response_mode": "streaming",
                 "auto_generate_name": True,
                 "user": user_id
             }
         }

50-71: Propagate timeouts and raise-for-status; use typed exceptions.

Currently no client timeout is set and HTTP errors rely on manual status checks. Use aiohttp.ClientTimeout and raise_for_status; convert to a custom exception and chain original errors.

     async def send_request_async(self, request_data: Dict[str, Any]) -> Dict[str, Any]:
         """异步发送HTTP请求到Dify API"""
         headers = request_data["headers"]
         url = request_data["url"]
         body = request_data["body"]
 
         try:
-            async with aiohttp.ClientSession() as session:
-                async with session.post(url, headers=headers, json=body) as response:
-                    if response.status == 200:
-                        # 检查是否是streaming模式
-                        if body.get("response_mode") == "streaming":
-                            return await self._parse_streaming_response(response)
-                        else:
-                            return await response.json()
-                    else:
-                        error_text = await response.text()
-                        raise Exception(f"Dify API error: {response.status} - {error_text}")
-        except asyncio.TimeoutError:
-            raise Exception("Dify API请求超时")
-        except Exception as e:
-            raise Exception(f"Dify API请求失败: {str(e)}")
+            timeout = aiohttp.ClientTimeout(total=self.timeout)
+            async with aiohttp.ClientSession(timeout=timeout) as session:
+                async with session.post(url, headers=headers, json=body) as response:
+                    response.raise_for_status()
+                    if body.get("response_mode") == "streaming":
+                        return await self._parse_streaming_response(response)
+                    return await response.json()
+        except asyncio.TimeoutError as e:
+            raise DifyAPIError("Dify API请求超时") from e
+        except aiohttp.ClientError as e:
+            raise DifyAPIError(f"Dify API请求失败: {e!s}") from e
+        except Exception as e:
+            raise DifyAPIError(f"Dify API未知错误: {e!s}") from e

Add once at module scope:

class DifyAPIError(Exception):
    pass

Also applies to: 60-65


115-121: Remove sys.path hacking; use normal imports.

sys.path mutation is brittle and can mask packaging issues.

-        import sys
-        import os
-        sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-        from utils.async_utils import AsyncUtils
+        from utils.async_utils import AsyncUtils

Optionally move the import to module scope if import cost is small.

connectors/ragflow.py (2)

45-47: Timeouts not actually enforced in HTTP calls.

You catch asyncio.TimeoutError but aiohttp calls have no timeout configured; requests may hang indefinitely. Pass ClientTimeout.

-            async with aiohttp.ClientSession() as session:
+            timeout = aiohttp.ClientTimeout(total=self.timeout)
+            async with aiohttp.ClientSession(timeout=timeout) as session:
                 async with session.post(url, headers=headers, json=body) as response:
                     if response.status == 200:
                         return await response.json()
                     else:
                         error_text = await response.text()
-                        raise Exception(f"RagFlow API error: {response.status} - {error_text}")
+                        raise Exception(f"RagFlow API error: {response.status} - {error_text}")

59-63: Remove sys.path hacks; use proper imports.

Manipulating sys.path inside a hot path is brittle and slows calls. Import AsyncUtils at module scope via the package path.

-        import sys
-        import os
-        sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-        from utils.async_utils import AsyncUtils
+        from utils.async_utils import AsyncUtils

If package layout requires relative import, switch to:

from ..utils.async_utils import AsyncUtils
evaluators/ragas_evaluator.py (1)

117-124: Blocking call: wrap ragas.evaluate in a thread or use aevaluate

This blocks the event loop. Use asyncio.to_thread (or ragas.aevaluate if available).

-            # 使用Ragas评估(同步函数)
-            result = evaluate(
-                dataset, 
-                metrics=self.metrics,
-                llm=self.eval_llm,
-                embeddings=self.embeddings,
-                raise_exceptions=False
-            )
+            # 非阻塞执行同步evaluate
+            result = await asyncio.to_thread(
+                evaluate,
+                dataset,
+                metrics=self.metrics,
+                llm=self.eval_llm,
+                embeddings=self.embeddings,
+                raise_exceptions=False
+            )
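The thread-pool hand-off can be illustrated standalone; `blocking_evaluate` is a stand-in for the synchronous `ragas.evaluate` call:

```python
import asyncio
import time

def blocking_evaluate(dataset):
    # Stand-in for ragas.evaluate: a synchronous, potentially long call.
    time.sleep(0.05)
    return {"answer_relevancy": 0.9, "items": len(dataset)}

async def evaluate_async(dataset):
    # Run the sync function in the default thread pool so the event
    # loop stays free to service other coroutines in the meantime.
    return await asyncio.to_thread(blocking_evaluate, dataset)
```

`asyncio.to_thread` forwards positional and keyword arguments unchanged, so the real call site only swaps `evaluate(...)` for `await asyncio.to_thread(evaluate, ...)`.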
♻️ Duplicate comments (5)
results/detailed_evaluation_results.json (1)

11-12: Fix invalid JSON: unterminated/truncated dify_answer strings.

The value near Line 11 ends mid-SQL and lacks closing quotes/backticks; the API-performance answer also appears truncated. This breaks JSON parsers.

-      "dify_answer": "語彙データベースを設計する際は、以下の構造が適切です:\n\n1. **基本テーブル構造**\n   - **words**(単語)\n     ...snip...\n**実装例(SQL)**:\n```sql\nCREATE TABLE words (\n  id INTEGER PRIMARY KEY,"
+      "dify_answer": "語彙データベースを設計する際は、以下の構造が適切です:\n\n1. **基本テーブル構造**\n   - **words**(単語)\n     ...snip...\n**実装例(SQL)**:\n```sql\nCREATE TABLE words (\n  id INTEGER PRIMARY KEY\n);\n```\n"
@@
-      "dify_answer": "To optimize API performance for delivering lesson content in your Japanese app, consider the following strategies:\n\n### 1. **Caching Mechanisms**\n   ...snip...\n### 7. **Load Balancing & Auto-Scaling**\n   - Distribute traffic across multiple servers using Nginx, HAProxy, or cloud-native solutions (e.g., AWS ALB).\n   - Enable auto-scaling to handle traffic spikes during peak learning hours in Japan.\n\n###"
+      "dify_answer": "To optimize API performance for delivering lesson content in your Japanese app, consider the following strategies:\n\n### 1. **Caching Mechanisms**\n   ...snip...\n### 7. **Load Balancing & Auto-Scaling**\n   - Distribute traffic across multiple servers using Nginx, HAProxy, or cloud-native solutions (e.g., AWS ALB).\n   - Enable auto-scaling to handle traffic spikes during peak learning hours in Japan.\n"

Also applies to: 21-21

evaluators/ragas_evaluator.py (1)

186-193: Batch path still blocks the loop; mirror the to_thread change

Same issue as single-item path.

-            # 使用Ragas批量评估(同步函数)
-            result = evaluate(
-                dataset, 
-                metrics=self.metrics,
-                llm=self.eval_llm,
-                embeddings=self.embeddings,
-                raise_exceptions=False
-            )
+            # 使用线程池避免阻塞事件循环
+            result = await asyncio.to_thread(
+                evaluate,
+                dataset,
+                metrics=self.metrics,
+                llm=self.eval_llm,
+                embeddings=self.embeddings,
+                raise_exceptions=False
+            )
evaluators/academic_evaluator.py (1)

35-40: Add validation for embedding_base_url when embedding_api_key is provided.

The code might attempt to call .rstrip('/') on None if embedding_base_url is not provided but embedding_api_key is set.

Based on the code in utils/embedding_adapter.py, the EmbeddingAdapterFactory.create_adapter expects a valid base_url. Add validation:

 # 初始化通用嵌入适配器
 embedding_config = {
     "api_key": config.get("embedding_api_key", ""),
     "base_url": config.get("embedding_base_url"),
     "model": config.get("embedding_model", "nomic-embed-text:latest"),
     "timeout": config.get("embedding_timeout", 30)
 }
+
+# Ensure base_url is set if api_key is provided
+if embedding_config["api_key"] and not embedding_config["base_url"]:
+    # Try fallback URLs
+    embedding_config["base_url"] = (config.get("chat_base_url") or 
+                                   config.get("base_url") or 
+                                   "https://api.openai.com/v1")
evaluators/__init__.py (1)

3-3: Fix broken import path.

The import from .base import BaseEvaluator will fail because the base class is in base_evaluator.py, not base.py.

-from .base import BaseEvaluator
+from .base_evaluator import BaseEvaluator
evaluators/factory.py (1)

4-4: Fix broken import path.

The import will fail as the base class is in base_evaluator.py, not base.py.

-from .base import BaseEvaluator
+from .base_evaluator import BaseEvaluator
🧹 Nitpick comments (33)
config.py (1)

5-11: Scope the warning suppression to Pydantic (avoid global mute).

A global ignore can hide unrelated, useful warnings. Narrow it to the emitting module and keep the regex literal.

-import warnings
+import warnings

-# 过滤 Pydantic 命名空间冲突警告
-warnings.filterwarnings("ignore", category=UserWarning, message="Field \"model_name\" has conflict with protected namespace")
+# 仅忽略来自 Pydantic 的该条提示
+warnings.filterwarnings(
+    "ignore",
+    message=r'Field "model_name" has conflict with protected namespace',
+    category=UserWarning,
+    module=r"pydantic(\.|$)"
+)
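The narrowing can be checked in isolation by recording warnings; this sketch omits the `module=` argument (which additionally keys the filter on the emitting module) because there is no pydantic module in the demo:

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Narrow filter: only this exact UserWarning message is ignored;
    # unrelated warnings still surface. filterwarnings() prepends, so it
    # takes precedence over the "always" baseline for matching messages.
    warnings.filterwarnings(
        "ignore",
        message=r'Field "model_name" has conflict with protected namespace',
        category=UserWarning,
    )
    warnings.warn('Field "model_name" has conflict with protected namespace', UserWarning)
    warnings.warn("something unrelated", UserWarning)
```

Only the unrelated warning is recorded, confirming the filter does not mute other diagnostics.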
connectors/dify.py (2)

143-151: Make connection test assert non-empty answer.

Current check passes even when answer is empty.

-            return result.get("error") is None
+            return not result.get("error") and bool(result.get("answer"))

152-169: Harden context extraction for alternate Dify payload shapes.

Some payloads nest metadata under message. Fall back to that path.

-        contexts = []
-        
-        # 提取上下文信息
-        if "metadata" in response_data and "retriever_resources" in response_data["metadata"]:
-            contexts = [
-                resource.get("content", "") 
-                for resource in response_data["metadata"]["retriever_resources"]
-                if resource.get("content")
-            ]
+        contexts: List[str] = []
+        metadata = (
+            response_data.get("metadata")
+            or response_data.get("message", {}).get("metadata")
+            or {}
+        )
+        resources = metadata.get("retriever_resources") or []
+        contexts = [
+            r.get("content", "")
+            for r in resources
+            if r.get("content")
+        ]
connectors/base.py (3)

63-78: Timeout None-check and structured logging.

  • Use None-check to allow explicit 0 if ever intended.
  • Prefer logger.exception/parameterized logging to satisfy linters and keep tracebacks.
-        timeout = timeout or self.timeout
+        timeout = self.timeout if timeout is None else timeout
@@
-            logger.warning(f"{self.system_name} {error_msg}")
+            logger.warning("%s %s", self.system_name, error_msg)
@@
-            error_msg = f"查询失败: {str(e)}"
-            logger.error(f"{self.system_name} {error_msg}")
+            error_msg = f"查询失败: {e!s}"
+            logger.exception("%s %s", self.system_name, error_msg)
-        timeout = timeout or self.timeout
+        timeout = self.timeout if timeout is None else timeout
@@
-            logger.warning(f"{self.system_name} 连接测试超时({timeout}秒)")
+            logger.warning("%s 连接测试超时(%s秒)", self.system_name, timeout)
@@
-        except Exception as e:
-            logger.error(f"{self.system_name} 连接测试失败: {str(e)}")
+        except Exception as e:
+            logger.exception("%s 连接测试失败: %s", self.system_name, e)

Also applies to: 90-102


21-23: Avoid external mutation of config.

Make a shallow copy to prevent callers mutating internal state accidentally.

-        self.config = config
+        self.config = dict(config)

55-61: Fix fullwidth punctuation in docstrings/strings (Ruff RUF001/RUF002).

Standardize to ASCII punctuation for consistency and lint compliance.

-            timeout: 超时时间(秒)
+            timeout: 超时时间(秒)
@@
-            error_msg = f"查询超时({timeout}秒)"
+            error_msg = f"查询超时({timeout}秒)"
@@
-            timeout: 超时时间(秒)
+            timeout: 超时时间(秒)
@@
-            logger.warning(f"{self.system_name} 连接测试超时({timeout}秒)")
+            logger.warning(f"{self.system_name} 连接测试超时({timeout}秒)")
@@
-            错误信息列表,空列表表示验证通过
+            错误信息列表, 空列表表示验证通过

Also applies to: 71-74, 84-89, 97-99, 109-111

connectors/ragflow.py (2)

88-89: Leverage base timeout wrapper for connectivity test.

Reuse query_with_timeout for consistent timeout handling.

-            result = await self.query_async("test connection", max_retries=1)
+            result = await self.query_with_timeout("test connection", timeout=self.timeout)

91-91: Log with traceback on connection test failures.

Use logger.exception to retain stack trace.

-            logger.error(f"RagFlow连接测试失败: {e}")
+            logger.exception("RagFlow连接测试失败: %s", e)
results/detailed_evaluation_results.json (2)

14-14: Minor: remove duplicated question text.

The question string repeats the same sentence twice.

-      "question": "iPhone 13 Pro Maxのスペックを教えてください iPhone 13 Pro Maxのスペックを教えてください ",
+      "question": "iPhone 13 Pro Maxのスペックを教えてください",

1-148: Add JSON validity check to CI
Include a pipeline step that runs:

jq . results/detailed_evaluation_results.json >/dev/null

to fail on malformed JSON and prevent regressions.
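If the pipeline is Python-based rather than jq-based, an equivalent validity gate is a few lines of stdlib `json` (`check_json_text` is a hypothetical helper name, not part of the repo):

```python
import json

def check_json_text(text):
    """Return None if text parses as JSON, else a short error description."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as e:
        return f"line {e.lineno}, col {e.colno}: {e.msg}"
```

A truncated string value, like the one flagged in `dify_answer`, fails with an "unterminated string" error instead of silently passing.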

evaluators/base.py (2)

91-94: Use logger.exception and avoid stringifying exceptions

Keeps tracebacks and cleaner logs.

-        except Exception as e:
-            error_msg = f"评价失败: {str(e)}"
-            logger.error(f"{self.name} {error_msg}")
+        except Exception:
+            logger.exception(f"{self.name} 评价失败")
             return self._get_default_scores(len(answers))
...
-        except Exception as e:
-            error_msg = f"单个评价失败: {str(e)}"
-            logger.error(f"{self.name} {error_msg}")
+        except Exception:
+            logger.exception(f"{self.name} 单个评价失败")
             return self._get_default_single_score()

Also applies to: 125-127


32-58: Docstring punctuation: replace full-width parens

Replace Chinese full-width parentheses with ASCII to appease linters; content can remain Chinese.

Also applies to: 66-76, 100-111, 154-159

utils/embedding_adapter.py (2)

110-113: Preserve tracebacks with logger.exception and re-raise

Improves debuggability; avoid wrapping with broad Exception.

-        except ImportError as e:
-            logger.error(f"LangChain导入失败: {e}")
-            raise Exception(f"不支持的嵌入模型配置: {self.config}")
+        except ImportError:
+            logger.exception("LangChain导入失败")
+            raise
...
-        except Exception as e:
-            logger.error(f"嵌入失败: {e}")
-            raise Exception(f"嵌入失败: {str(e)}")
+        except Exception:
+            logger.exception("嵌入失败")
+            raise
...
-        except Exception as e:
-            logger.error(f"批量嵌入失败: {e}")
-            raise Exception(f"批量嵌入失败: {str(e)}")
+        except Exception:
+            logger.exception("批量嵌入失败")
+            raise
...
-        except Exception as e:
-            logger.error(f"直接嵌入失败: {e}")
-            raise Exception(f"嵌入失败: {str(e)}")
+        except Exception:
+            logger.exception("直接嵌入失败")
+            raise

Also applies to: 118-121, 126-129, 191-194


176-190: Reuse a single ClientSession for performance

Creating a session per request is expensive; consider a per-adapter session lifecycle.

evaluators/ragas_evaluator.py (3)

30-37: Respect configured timeout

Use BaseEvaluator.timeout instead of a hardcoded 60s.

-            self.eval_llm = LangchainLLMWrapper(ChatOpenAI(
+            self.eval_llm = LangchainLLMWrapper(ChatOpenAI(
                 api_key=config["api_key"],
                 base_url=config["base_url"],
                 model=config.get("model", "gpt-3.5-turbo"),
                 temperature=0,
                 max_tokens=1000,
-                timeout=60  # 增加超时时间
+                timeout=self.timeout
             ))

86-94: Type hints: make context(s) Optional to match usage

Prevents implicit-Optional violations and matches Base change.

-    async def evaluate_single_answer_async(self, question: str, answer: str, ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def evaluate_single_answer_async(self, question: str, answer: str, ground_truth: str,
+                                           context: Optional[List[str]] = None) -> Dict[str, Optional[float]]:
...
-    async def evaluate_answers_async(self, questions: List[str], 
-                                  answers: List[str], ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, List[float]]:
+    async def evaluate_answers_async(self, questions: List[str],
+                                     answers: List[str], ground_truths: List[str],
+                                     contexts: Optional[List[List[str]]] = None) -> Dict[str, List[Optional[float]]]:

Also applies to: 167-169


71-85: Prefer logger over print for structured logs

Unify on logging like BaseEvaluator for consistency and level control.

Also applies to: 98-101, 155-165, 251-272

evaluators/academic_evaluator.py (5)

1-1: Comment formatting: Replace fullwidth characters with standard ASCII.

The comment contains fullwidth characters that should be replaced with standard ASCII equivalents.

-# 增强学术评估器 - 合并学术和混合模型优势
+# Enhanced academic evaluator - combines academic and hybrid model advantages

43-49: Avoid catching bare Exception.

Catching bare Exception can mask unexpected errors. Be more specific about the exceptions you expect.

 try:
     self.embedding_adapter = EmbeddingAdapterFactory.create_adapter(embedding_config)
     print(f"✅ 通用嵌入适配器初始化成功: {embedding_config['model']}")
-except Exception as e:
+except (ValueError, ConnectionError, KeyError) as e:
     print(f"⚠️  嵌入适配器初始化失败,将使用文本相似度: {e}")
     self.embedding_adapter = None

50-50: Fix fullwidth characters in comments.

Multiple fullwidth characters detected in the comment that should be replaced.

-# 评估模式:pure_chat(纯聊天模型)或 hybrid(混合模式)
+# 评估模式: pure_chat(纯聊天模型)或 hybrid(混合模式)

230-230: Fix fullwidth parentheses in docstring.

The docstring contains fullwidth parentheses that should be standard ASCII.

-"""使用嵌入模型计算语义相似度(混合模式用)- 使用通用适配器"""
+"""使用嵌入模型计算语义相似度(混合模式用) - 使用通用适配器"""

233-237: Fix fullwidth comma in string and comment.

Replace fullwidth comma with standard ASCII comma.

-# 如果没有嵌入适配器,直接使用文本相似度
+# 如果没有嵌入适配器,直接使用文本相似度
 if not self.embedding_adapter:
-    print("🔍 嵌入适配器不可用,使用文本相似度")
+    print("🔍 嵌入适配器不可用,使用文本相似度")
     return self._calculate_text_similarity(answer, ground_truth)
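The body of `_calculate_text_similarity` is not shown in the diff; a hypothetical stand-in using stdlib `difflib` illustrates the kind of dependency-free fallback the adapter-unavailable branch needs:

```python
from difflib import SequenceMatcher

def calculate_text_similarity(answer: str, ground_truth: str) -> float:
    """Character-level similarity in [0, 1]; a crude but dependency-free
    stand-in for when no embedding adapter is configured."""
    if not answer or not ground_truth:
        return 0.0
    return SequenceMatcher(None, answer, ground_truth).ratio()
```

`SequenceMatcher.ratio()` already returns a value in [0, 1], so the score plugs into the same range the embedding-based path produces.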
evaluators/__init__.py (2)

1-1: Fix fullwidth parentheses in comment.

Replace fullwidth parentheses with ASCII equivalents.

-# 评估器模块 - 统一接口 (全部使用异步版本)
+# 评估器模块 - 统一接口 (全部使用异步版本)

8-8: Improve comment accuracy.

The comment states all evaluators support async API, but based on the code, the evaluators also provide sync methods for backward compatibility.

-# 注意: 所有评估器都支持异步API
+# 注意: 所有评估器都支持异步API,并提供同步方法以保持向后兼容
evaluators/factory.py (9)

16-19: Add ClassVar annotation for mutable class attributes.

Mutable class attributes should be annotated with typing.ClassVar to indicate they're shared across all instances.

+from typing import Dict, List, Any, Optional, ClassVar
 from .base import BaseEvaluator

 class EvaluatorFactory:
     """评估器工厂类"""
     
     # 可用的评估器类型
-    EVALUATOR_TYPES = {
+    EVALUATOR_TYPES: ClassVar[Dict[str, type[BaseEvaluator]]] = {
         "academic": AcademicEvaluator,
         "ragas": RagasEvaluator
     }

22-22: Add ClassVar annotation for DEFAULT_PRIORITY.

Add type annotation for the mutable class attribute.

 # 默认评估器优先级
-DEFAULT_PRIORITY = ["ragas", "academic"]
+DEFAULT_PRIORITY: ClassVar[List[str]] = ["ragas", "academic"]

39-41: Avoid catching bare Exception.

Be more specific about the exceptions you expect to handle.

-except Exception as e:
+except (ValueError, AttributeError, TypeError) as e:
     print(f"❌ {evaluator_type}评估器创建失败: {e}")
     return None

100-101: Fix fullwidth parentheses in strings.

Replace fullwidth parentheses with ASCII equivalents.

-"academic": "增强学术评估器 - 支持6维度质量评估(相关性、正确性、完整性、清晰度、连贯性、有用性)",
-"ragas": "Ragas框架评估器 - 完整的RAG评估指标集(相关性、正确性、忠实性、上下文精度、上下文召回率)"
+"academic": "增强学术评估器 - 支持6维度质量评估(相关性、正确性、完整性、清晰度、连贯性、有用性)",
+"ragas": "Ragas框架评估器 - 完整的RAG评估指标集(相关性、正确性、忠实性、上下文精度、上下文召回率)"

115-115: Remove unnecessary f-string prefix.

The string doesn't contain any placeholders.

-print(f"🔧 评估器管理器初始化完成")
+print("🔧 评估器管理器初始化完成")

128-128: Fix fullwidth characters in comment.

Replace fullwidth colon and comma with ASCII equivalents.

-"evaluation_mode": "hybrid"  # 使用混合模式:embedding计算相关性,聊天模型评估质量
+"evaluation_mode": "hybrid"  # 使用混合模式: embedding计算相关性,聊天模型评估质量

152-152: Remove unnecessary f-string prefix.

The string doesn't contain any placeholders.

-print(f"    ✅ 完成")
+print("    ✅ 完成")

153-153: Avoid catching bare Exception.

Be more specific about expected exceptions.

-except Exception as e:
+except (asyncio.TimeoutError, ValueError, AttributeError) as e:
     print(f"    ❌ 失败: {e}")

139-139: Add explicit Optional annotation.

PEP 484 prohibits implicit Optional. Add explicit type annotation.

-async def evaluate_all_async(self, questions: List[str], answers: List[str], 
-                           ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, Dict[str, List[float]]]:
+async def evaluate_all_async(self, questions: List[str], answers: List[str], 
+                           ground_truths: List[str], contexts: Optional[List[List[str]]] = None) -> Dict[str, Dict[str, List[float]]]:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 967af91 and 2d675c4.

⛔ Files ignored due to path filters (1)
  • results/multi_evaluation_results.csv is excluded by !**/*.csv
📒 Files selected for processing (14)
  • config.py (1 hunks)
  • connectors/async_base.py (0 hunks)
  • connectors/base.py (1 hunks)
  • connectors/dify.py (3 hunks)
  • connectors/ragflow.py (1 hunks)
  • data/test_cases_jp.json (1 hunks)
  • evaluators/__init__.py (1 hunks)
  • evaluators/academic_evaluator.py (5 hunks)
  • evaluators/base.py (1 hunks)
  • evaluators/factory.py (3 hunks)
  • evaluators/ragas_evaluator.py (6 hunks)
  • main_multi_eval_async.py (1 hunks)
  • results/detailed_evaluation_results.json (2 hunks)
  • utils/embedding_adapter.py (1 hunks)
💤 Files with no reviewable changes (1)
  • connectors/async_base.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • main_multi_eval_async.py
🧰 Additional context used
🧬 Code graph analysis (8)
connectors/ragflow.py (1)
connectors/base.py (1)
  • AsyncBaseRAGConnector (10-152)
connectors/dify.py (1)
connectors/base.py (1)
  • AsyncBaseRAGConnector (10-152)
evaluators/academic_evaluator.py (2)
evaluators/base.py (1)
  • BaseEvaluator (10-184)
utils/embedding_adapter.py (6)
  • EmbeddingAdapterFactory (201-229)
  • detect_embedding_config (240-265)
  • create_adapter (205-214)
  • embed_query (29-31)
  • embed_query (114-120)
  • embed_query (167-193)
connectors/base.py (4)
connectors/dify.py (5)
  • query_async (115-141)
  • test_connection_async (143-150)
  • validate_config (18-25)
  • build_request (27-48)
  • parse_response (152-169)
connectors/ragflow.py (5)
  • query_async (57-83)
  • test_connection_async (85-92)
  • validate_config (14-21)
  • build_request (23-36)
  • parse_response (94-109)
connectors/universal.py (4)
  • query_async (59-83)
  • test_connection_async (134-141)
  • query_with_timeout (85-105)
  • get_system_info (161-163)
connectors/factory.py (1)
  • get_system_info (56-69)
evaluators/ragas_evaluator.py (2)
evaluators/base.py (2)
  • BaseEvaluator (10-184)
  • evaluate_answers_async (29-43)
evaluators/academic_evaluator.py (1)
  • evaluate_answers_async (59-105)
evaluators/__init__.py (4)
evaluators/base.py (1)
  • BaseEvaluator (10-184)
evaluators/academic_evaluator.py (1)
  • AcademicEvaluator (18-503)
evaluators/ragas_evaluator.py (1)
  • RagasEvaluator (21-290)
evaluators/factory.py (2)
  • EvaluatorFactory (12-103)
  • EvaluatorManager (105-173)
evaluators/base.py (3)
evaluators/academic_evaluator.py (3)
  • evaluate_answers_async (59-105)
  • evaluate_single_answer_async (107-121)
  • get_supported_metrics (481-487)
evaluators/ragas_evaluator.py (5)
  • evaluate_answers_async (167-272)
  • evaluate_single_answer_async (86-100)
  • get_supported_metrics (274-276)
  • is_available (278-280)
  • get_evaluator_info (282-290)
evaluators/factory.py (1)
  • get_evaluator_info (67-94)
evaluators/factory.py (3)
evaluators/base.py (4)
  • BaseEvaluator (10-184)
  • get_evaluator_info (171-184)
  • evaluate_with_timeout (62-94)
  • get_supported_metrics (153-160)
evaluators/academic_evaluator.py (2)
  • AcademicEvaluator (18-503)
  • get_supported_metrics (481-487)
evaluators/ragas_evaluator.py (3)
  • RagasEvaluator (21-290)
  • get_evaluator_info (282-290)
  • get_supported_metrics (274-276)
🪛 Ruff (0.12.2)
connectors/dify.py

67-67: Abstract raise to an inner function

(TRY301)


67-67: Create your own exception

(TRY002)


67-67: Avoid specifying long messages outside the exception class

(TRY003)


69-69: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


69-69: Create your own exception

(TRY002)


69-69: Avoid specifying long messages outside the exception class

(TRY003)


70-70: Do not catch blind exception: Exception

(BLE001)


71-71: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


71-71: Create your own exception

(TRY002)


71-71: Avoid specifying long messages outside the exception class

(TRY003)


71-71: Use explicit conversion flag

Replace with conversion flag

(RUF010)


91-91: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


102-102: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


108-108: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


113-113: Create your own exception

(TRY002)

utils/embedding_adapter.py

47-47: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


48-48: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


48-48: Create your own exception

(TRY002)


48-48: Avoid specifying long messages outside the exception class

(TRY003)


64-64: Comment contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


64-64: Comment contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


111-111: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


112-112: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


112-112: Create your own exception

(TRY002)


112-112: Avoid specifying long messages outside the exception class

(TRY003)


118-118: Do not catch blind exception: Exception

(BLE001)


119-119: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


120-120: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


120-120: Create your own exception

(TRY002)


120-120: Avoid specifying long messages outside the exception class

(TRY003)


120-120: Use explicit conversion flag

Replace with conversion flag

(RUF010)


126-126: Do not catch blind exception: Exception

(BLE001)


127-127: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


128-128: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


128-128: Create your own exception

(TRY002)


128-128: Avoid specifying long messages outside the exception class

(TRY003)


128-128: Use explicit conversion flag

Replace with conversion flag

(RUF010)


190-190: Abstract raise to an inner function

(TRY301)


190-190: Create your own exception

(TRY002)


190-190: Avoid specifying long messages outside the exception class

(TRY003)


191-191: Do not catch blind exception: Exception

(BLE001)


192-192: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


193-193: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


193-193: Create your own exception

(TRY002)


193-193: Avoid specifying long messages outside the exception class

(TRY003)


193-193: Use explicit conversion flag

Replace with conversion flag

(RUF010)


210-210: Do not catch blind exception: Exception

(BLE001)


211-211: String contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF001)


225-225: f-string without any placeholders

Remove extraneous f prefix

(F541)


227-227: Do not catch blind exception: Exception

(BLE001)


228-228: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

evaluators/academic_evaluator.py

46-46: Do not catch blind exception: Exception

(BLE001)


47-47: String contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF001)


50-50: Comment contains ambiguous (FULLWIDTH COLON). Did you mean : (COLON)?

(RUF003)


50-50: Comment contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


50-50: Comment contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


50-50: Comment contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


50-50: Comment contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


55-55: Do not catch blind exception: Exception

(BLE001)


230-230: Docstring contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF002)


230-230: Docstring contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF002)


233-233: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


235-235: String contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF001)

connectors/base.py

57-57: Docstring contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF002)


57-57: Docstring contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF002)


70-70: Consider moving this statement to an else block

(TRY300)


72-72: String contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF001)


72-72: String contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF001)


75-75: Do not catch blind exception: Exception

(BLE001)


76-76: Use explicit conversion flag

Replace with conversion flag

(RUF010)


77-77: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


85-85: Docstring contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF002)


85-85: Docstring contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF002)


98-98: String contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF001)


98-98: String contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF001)


100-100: Do not catch blind exception: Exception

(BLE001)


101-101: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


101-101: Use explicit conversion flag

Replace with conversion flag

(RUF010)


110-110: Docstring contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF002)

evaluators/ragas_evaluator.py

98-98: Do not catch blind exception: Exception

(BLE001)


102-102: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


117-117: Comment contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


117-117: Comment contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


168-168: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


186-186: Comment contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


186-186: Comment contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


205-205: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


244-244: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


251-251: String contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF001)


253-253: Do not catch blind exception: Exception

(BLE001)


262-268: Consider moving this statement to an else block

(TRY300)


270-270: Do not catch blind exception: Exception

(BLE001)

evaluators/base.py

30-30: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


38-38: Docstring contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF002)


38-38: Docstring contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF002)


47-47: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


55-55: Docstring contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF002)


55-55: Docstring contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF002)


63-63: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


72-72: Docstring contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF002)


72-72: Docstring contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF002)


73-73: Docstring contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF002)


73-73: Docstring contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF002)


85-85: Consider moving this statement to an else block

(TRY300)


87-87: String contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF001)


87-87: String contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF001)


91-91: Do not catch blind exception: Exception

(BLE001)


92-92: Use explicit conversion flag

Replace with conversion flag

(RUF010)


93-93: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


97-97: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


106-106: Docstring contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF002)


106-106: Docstring contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF002)


107-107: Docstring contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF002)


107-107: Docstring contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF002)


119-119: Consider moving this statement to an else block

(TRY300)


121-121: String contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF001)


121-121: String contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF001)


124-124: Do not catch blind exception: Exception

(BLE001)


125-125: Use explicit conversion flag

Replace with conversion flag

(RUF010)


126-126: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

evaluators/factory.py

16-19: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


22-22: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


28-28: Avoid specifying long messages outside the exception class

(TRY003)


39-39: Do not catch blind exception: Exception

(BLE001)


100-100: String contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF001)


100-100: String contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF001)


101-101: String contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF001)


101-101: String contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF001)


115-115: f-string without any placeholders

Remove extraneous f prefix

(F541)


128-128: Comment contains ambiguous (FULLWIDTH COLON). Did you mean : (COLON)?

(RUF003)


128-128: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


139-139: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


152-152: f-string without any placeholders

Remove extraneous f prefix

(F541)


153-153: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (4)
connectors/dify.py (2)

7-7: Import path consolidation looks correct.

Aligned with base class relocation.


14-17: Constructor wiring LGTM.

System name and config passed to base correctly.

results/detailed_evaluation_results.json (1)

16-16: Verify device facts; remove inaccurate claims.

iPhone 13 Pro Max does not ship with Always‑On Display or Micro‑LED. Please correct or drop that bullet to avoid penalizing models incorrectly.

-その他の特徴として、Always-Onディスプレイ、Micro-LEDディスプレイ技術、およびAR機能強化が含まれます。
+その他の特徴として、AR機能の活用が挙げられます。

Run a quick check and adjust OS-upgrade wording (avoid pinning to a specific current iOS in static data).

evaluators/ragas_evaluator.py (1)

4-15: Import aevaluate when available

If using ragas>=0.2 with aevaluate, prefer it over to_thread for clarity.

Would you like me to switch to aevaluate if present and fall back to to_thread otherwise?

Comment on lines 147 to 152
return {
"name": self.system_name,
"description": self.__class__.__doc__ or "",
"config_valid": len(self.validate_config()) == 0
}
"system_name": self.system_name,
"timeout": self.timeout,
"config": {k: v for k, v in self.config.items() if k != 'api_key'}
}

🛠️ Refactor suggestion

Harden secret redaction in get_system_info.

Only removing api_key risks leaking other secrets (token, password, auth headers). Redact a broader set of sensitive keys.

-        return {
-            "system_name": self.system_name,
-            "timeout": self.timeout,
-            "config": {k: v for k, v in self.config.items() if k != 'api_key'}
-        }
+        SENSITIVE_KEYS = ("api_key", "key", "token", "secret", "password", "authorization", "cookie")
+        return {
+            "system_name": self.system_name,
+            "timeout": self.timeout,
+            "config": {
+                k: ("***" if any(s in k.lower() for s in SENSITIVE_KEYS) else v)
+                for k, v in self.config.items()
+            },
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return {
"name": self.system_name,
"description": self.__class__.__doc__ or "",
"config_valid": len(self.validate_config()) == 0
}
"system_name": self.system_name,
"timeout": self.timeout,
"config": {k: v for k, v in self.config.items() if k != 'api_key'}
}
SENSITIVE_KEYS = ("api_key", "key", "token", "secret", "password", "authorization", "cookie")
return {
"system_name": self.system_name,
"timeout": self.timeout,
"config": {
k: ("***" if any(s in k.lower() for s in SENSITIVE_KEYS) else v)
for k, v in self.config.items()
},
}
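The same redaction logic could live in a small standalone helper so every connector shares it; a sketch along the lines of the suggestion above (the key list mirrors it and is deliberately not exhaustive):

```python
SENSITIVE_KEYS = ("api_key", "key", "token", "secret", "password", "authorization", "cookie")

def redact_config(config: dict) -> dict:
    """Mask values whose key name contains a sensitive substring (case-insensitive).

    Keys are kept (rather than dropped) so the output shape stays predictable.
    """
    return {
        k: ("***" if any(s in k.lower() for s in SENSITIVE_KEYS) else v)
        for k, v in config.items()
    }

print(redact_config({"api_key": "sk-123", "base_url": "http://localhost", "Auth-Token": "t"}))
```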
🤖 Prompt for AI Agents
In connectors/base.py around lines 148 to 152, get_system_info currently only
filters out the 'api_key' key which can leak other secrets; update the logic to
redact a predefined set of sensitive keys (e.g., api_key, apiToken, token,
password, secret, auth, authorization, bearer) in a case-insensitive way and
return those keys with their values replaced by a constant like "REDACTED"
(rather than omitting them) so the shape remains predictable; ensure the filter
handles nested dicts at the top level only and does not modify non-sensitive
entries.

Comment on lines +73 to 114
async def _parse_streaming_response(self, response) -> Dict[str, Any]:
"""解析streaming响应"""
import json

message_events = []
message_end_result = None

async for line in response.content:
line = line.decode('utf-8').strip()
if line.startswith('data: '):
try:
data = json.loads(line[6:]) # 移除 'data: ' 前缀
event_type = data.get("event")

if event_type == "message":
message_events.append(data)
elif event_type == "message_end":
message_end_result = data
break # 找到message_end后停止,这是最完整的响应

except json.JSONDecodeError:
continue

# 合并所有message事件的答案
full_answer = ""
for msg in message_events:
if "answer" in msg:
full_answer += msg["answer"]

# 使用message_end作为基础,它包含完整的metadata
if message_end_result:
final_result = message_end_result
final_result["answer"] = full_answer
return final_result
elif message_events:
# 如果没有message_end,使用最后一个message事件
last_message = message_events[-1]
last_message["answer"] = full_answer
return last_message
else:
raise Exception("无法从streaming响应中解析出有效消息")


⚠️ Potential issue

SSE parser is chunk-fragile; switch to line-based read and handle [DONE]/error events.

Iterating response.content yields arbitrary chunks, so startswith('data: ') can break across boundaries.

-    async def _parse_streaming_response(self, response) -> Dict[str, Any]:
-        """解析streaming响应"""
-        import json
-        
-        message_events = []
-        message_end_result = None
-        
-        async for line in response.content:
-            line = line.decode('utf-8').strip()
-            if line.startswith('data: '):
-                try:
-                    data = json.loads(line[6:])  # 移除 'data: ' 前缀
-                    event_type = data.get("event")
-                    
-                    if event_type == "message":
-                        message_events.append(data)
-                    elif event_type == "message_end":
-                        message_end_result = data
-                        break  # 找到message_end后停止,这是最完整的响应
-                        
-                except json.JSONDecodeError:
-                    continue
-        
-        # 合并所有message事件的答案
-        full_answer = ""
-        for msg in message_events:
-            if "answer" in msg:
-                full_answer += msg["answer"]
-        
-        # 使用message_end作为基础,它包含完整的metadata
-        if message_end_result:
-            final_result = message_end_result
-            final_result["answer"] = full_answer
-            return final_result
-        elif message_events:
-            # 如果没有message_end,使用最后一个message事件
-            last_message = message_events[-1]
-            last_message["answer"] = full_answer
-            return last_message
-        else:
-            raise Exception("无法从streaming响应中解析出有效消息")
+    async def _parse_streaming_response(self, response) -> Dict[str, Any]:
+        """解析streaming响应(逐行SSE解析)"""
+        import json
+
+        full_answer_parts: List[str] = []
+        message_end_result: Dict[str, Any] | None = None
+
+        while True:
+            raw = await response.content.readline()
+            if not raw:
+                break
+            line = raw.decode("utf-8", errors="ignore").strip()
+            if not line or line.startswith(":"):
+                continue
+            if not line.startswith("data:"):
+                continue
+            payload = line[5:].strip()
+            if payload in ("", "[DONE]"):
+                continue
+            try:
+                data = json.loads(payload)
+            except json.JSONDecodeError:
+                continue
+
+            event_type = data.get("event")
+            if event_type == "message":
+                ans = data.get("answer") or data.get("message", {}).get("content", "")
+                if ans:
+                    full_answer_parts.append(ans)
+            elif event_type == "message_end":
+                message_end_result = data
+                break
+            elif event_type == "error":
+                raise DifyAPIError(f"Streaming error: {data.get('error') or data!s}")
+
+        full_answer = "".join(full_answer_parts)
+        if message_end_result:
+            final = dict(message_end_result)
+            final["answer"] = full_answer
+            return final
+        if full_answer:
+            return {"answer": full_answer}
+        raise DifyAPIError("无法从streaming响应中解析出有效消息")
📝 Committable suggestion


Suggested change
async def _parse_streaming_response(self, response) -> Dict[str, Any]:
"""解析streaming响应"""
import json
message_events = []
message_end_result = None
async for line in response.content:
line = line.decode('utf-8').strip()
if line.startswith('data: '):
try:
data = json.loads(line[6:]) # 移除 'data: ' 前缀
event_type = data.get("event")
if event_type == "message":
message_events.append(data)
elif event_type == "message_end":
message_end_result = data
break # 找到message_end后停止,这是最完整的响应
except json.JSONDecodeError:
continue
# 合并所有message事件的答案
full_answer = ""
for msg in message_events:
if "answer" in msg:
full_answer += msg["answer"]
# 使用message_end作为基础,它包含完整的metadata
if message_end_result:
final_result = message_end_result
final_result["answer"] = full_answer
return final_result
elif message_events:
# 如果没有message_end,使用最后一个message事件
last_message = message_events[-1]
last_message["answer"] = full_answer
return last_message
else:
raise Exception("无法从streaming响应中解析出有效消息")
async def _parse_streaming_response(self, response) -> Dict[str, Any]:
"""解析streaming响应(逐行SSE解析)"""
import json
full_answer_parts: List[str] = []
message_end_result: Dict[str, Any] | None = None
while True:
raw = await response.content.readline()
if not raw:
break
line = raw.decode("utf-8", errors="ignore").strip()
if not line or line.startswith(":"):
continue
if not line.startswith("data:"):
continue
payload = line[5:].strip()
if payload in ("", "[DONE]"):
continue
try:
data = json.loads(payload)
except json.JSONDecodeError:
continue
event_type = data.get("event")
if event_type == "message":
ans = data.get("answer") or data.get("message", {}).get("content", "")
if ans:
full_answer_parts.append(ans)
elif event_type == "message_end":
message_end_result = data
break
elif event_type == "error":
raise DifyAPIError(f"Streaming error: {data.get('error') or data!s}")
full_answer = "".join(full_answer_parts)
if message_end_result:
final = dict(message_end_result)
final["answer"] = full_answer
return final
if full_answer:
return {"answer": full_answer}
raise DifyAPIError("无法从streaming响应中解析出有效消息")
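The parsing rules in the suggestion can be unit-tested without a live connection by extracting them into a pure function over lines; a sketch (event names follow the `message`/`message_end` convention described above):

```python
import json
from typing import Any, Dict, List

def parse_sse(lines: List[str]) -> Dict[str, Any]:
    """Accumulate 'message' answers; 'message_end' carries the final metadata."""
    parts: List[str] = []
    end: Dict[str, Any] = {}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blanks, comments, non-data SSE fields
        payload = line[5:].strip()
        if payload in ("", "[DONE]"):
            continue
        try:
            data = json.loads(payload)
        except json.JSONDecodeError:
            continue
        if data.get("event") == "message":
            parts.append(data.get("answer", ""))
        elif data.get("event") == "message_end":
            end = data
            break
    return {**end, "answer": "".join(parts)}

sample = [
    'data: {"event": "message", "answer": "Hel"}',
    'data: {"event": "message", "answer": "lo"}',
    'data: {"event": "message_end", "metadata": {"usage": 1}}',
    "data: [DONE]",
]
print(parse_sse(sample))  # merged answer "Hello" plus message_end metadata
```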
🧰 Tools
🪛 Ruff (0.12.2)

91-91: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


102-102: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


108-108: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


113-113: Create your own exception

(TRY002)

Comment on lines +10 to +13
{
"question": "iPhone 13 Pro Maxのスペックを教えてください iPhone 13 Pro Maxのスペックを教えてください ",
"ground_truth": "iPhone 13 Pro Maxの主な仕様は以下の通りです:\n\n- **ディスプレイ**: 6.7インチ Super Retina XDR(解像度2778×1284ピクセル) \n- **プロセッサ**: A15 Bionicチップ \n- **RAM**: 6GB \n- **ストレージ**: 128GB / 256GB / 512GB / 1TB(日本モデルは一部容量が異なる場合あり) \n- **カメラ**: \n - トリプル12MPシステム(超広角・広角・望遠) \n - ライダー(LiDAR)スキャナー搭載 \n- **バッテリー**: 約21時間の動画再生(再生時間は使用状況により変化) \n- **通信**: 5G対応、Wi-Fi 6 \n- **OS**: 発売時はiOS 15を搭載 \n\nアプリ開発において特に重要なのは画面解像度やカメラ機能、5G対応の有無です。日本向けアプリの場合、キャリアや地域固有の仕様(例: 防水性能)も考慮が必要な場合があります。何か特定の機能や開発上の懸念があれば、詳しく教えてください!"
},

🛠️ Refactor suggestion

⚠️ Potential issue

Fix duplicated question text and incorrect battery spec in JP test case.

  • Question string is duplicated and has a trailing space.
  • iPhone 13 Pro Max video playback is up to 28 hours (not ~21). Ambiguous note about JP storage variants is likely incorrect.
-    {
-        "question": "iPhone 13 Pro Maxのスペックを教えてください iPhone 13 Pro Maxのスペックを教えてください ",
-        "ground_truth": "iPhone 13 Pro Maxの主な仕様は以下の通りです:\n\n- **ディスプレイ**: 6.7インチ Super Retina XDR(解像度2778×1284ピクセル)  \n- **プロセッサ**: A15 Bionicチップ  \n- **RAM**: 6GB  \n- **ストレージ**: 128GB / 256GB / 512GB / 1TB(日本モデルは一部容量が異なる場合あり)  \n- **カメラ**:  \n  - トリプル12MPシステム(超広角・広角・望遠)  \n  - ライダー(LiDAR)スキャナー搭載  \n- **バッテリー**: 約21時間の動画再生(再生時間は使用状況により変化)  \n- **通信**: 5G対応、Wi-Fi 6  \n- **OS**: 発売時はiOS 15を搭載  \n\nアプリ開発において特に重要なのは画面解像度やカメラ機能、5G対応の有無です。日本向けアプリの場合、キャリアや地域固有の仕様(例: 防水性能)も考慮が必要な場合があります。何か特定の機能や開発上の懸念があれば、詳しく教えてください!"
-    },
+    {
+        "question": "iPhone 13 Pro Maxのスペックを教えてください",
+        "ground_truth": "iPhone 13 Pro Maxの主な仕様は以下の通りです:\n\n- **ディスプレイ**: 6.7インチ Super Retina XDR(2778×1284)\n- **プロセッサ**: A15 Bionic\n- **RAM**: 6GB\n- **ストレージ**: 128GB / 256GB / 512GB / 1TB\n- **カメラ**: トリプル12MP(超広角・広角・望遠)+LiDARスキャナ\n- **バッテリー**: 最大28時間のビデオ再生(使用状況により変動)\n- **通信**: 5G、Wi‑Fi 6\n- **OS**: 発売時はiOS 15\n\n開発観点では画面解像度、ProMotion(120Hz)、カメラ機能、5G対応が重要です。特定の要件があれば教えてください。"
+    },
📝 Committable suggestion


Suggested change
{
"question": "iPhone 13 Pro Maxのスペックを教えてください iPhone 13 Pro Maxのスペックを教えてください ",
"ground_truth": "iPhone 13 Pro Maxの主な仕様は以下の通りです:\n\n- **ディスプレイ**: 6.7インチ Super Retina XDR(解像度2778×1284ピクセル) \n- **プロセッサ**: A15 Bionicチップ \n- **RAM**: 6GB \n- **ストレージ**: 128GB / 256GB / 512GB / 1TB(日本モデルは一部容量が異なる場合あり) \n- **カメラ**: \n - トリプル12MPシステム(超広角・広角・望遠) \n - ライダー(LiDAR)スキャナー搭載 \n- **バッテリー**: 約21時間の動画再生(再生時間は使用状況により変化) \n- **通信**: 5G対応、Wi-Fi 6 \n- **OS**: 発売時はiOS 15を搭載 \n\nアプリ開発において特に重要なのは画面解像度やカメラ機能、5G対応の有無です。日本向けアプリの場合、キャリアや地域固有の仕様(例: 防水性能)も考慮が必要な場合があります。何か特定の機能や開発上の懸念があれば、詳しく教えてください!"
},
{
"question": "iPhone 13 Pro Maxのスペックを教えてください",
"ground_truth": "iPhone 13 Pro Maxの主な仕様は以下の通りです:\n\n- **ディスプレイ**: 6.7インチ Super Retina XDR(2778×1284)\n- **プロセッサ**: A15 Bionic\n- **RAM**: 6GB\n- **ストレージ**: 128GB / 256GB / 512GB / 1TB\n- **カメラ**: トリプル12MP(超広角・広角・望遠)+LiDARスキャナ\n- **バッテリー**: 最大28時間のビデオ再生(使用状況により変動)\n- **通信**: 5G、Wi-Fi 6\n- **OS**: 発売時はiOS 15\n\n開発観点では画面解像度、ProMotion(120Hz)、カメラ機能、5G対応が重要です。特定の要件があれば教えてください。"
},
🤖 Prompt for AI Agents
In data/test_cases_jp.json around lines 10 to 13, the "question" value repeats
the same text and includes an extra trailing space, and the "ground_truth" has
an incorrect battery playback spec and an unnecessary/ambiguous note about
Japanese storage variants; fix by keeping a single non-duplicated question
string with no trailing space, update the battery line to state video playback
up to 28 hours, and remove or rephrase the parenthetical about Japan-specific
storage variants so the spec block is accurate and unambiguous.

Comment on lines +6 to +9
import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from utils.embedding_adapter import EmbeddingAdapterFactory, detect_embedding_config

🛠️ Refactor suggestion

Remove unnecessary sys.path manipulation.

Adding parent directory to sys.path is an anti-pattern that can cause import issues. The import should work without path manipulation since utils is at the repository root.

-import sys
-import os
-sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from utils.embedding_adapter import EmbeddingAdapterFactory, detect_embedding_config
+from utils.embedding_adapter import EmbeddingAdapterFactory, detect_embedding_config
📝 Committable suggestion


Suggested change
import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from utils.embedding_adapter import EmbeddingAdapterFactory, detect_embedding_config
from utils.embedding_adapter import EmbeddingAdapterFactory, detect_embedding_config
🤖 Prompt for AI Agents
In evaluators/academic_evaluator.py around lines 6 to 9, remove the sys.path
manipulation (sys.path.append(...)) and its os import; simply rely on the
repository root being on PYTHONPATH so the line "from utils.embedding_adapter
import EmbeddingAdapterFactory, detect_embedding_config" works. Delete the
unnecessary imports and sys.path modification, ensure any test or runtime
entrypoint sets the project root on PYTHONPATH or uses a proper package entry
(e.g., run as a module) so imports resolve without manually mutating sys.path.

Comment on lines +29 to 31
async def evaluate_answers_async(self, questions: List[str], answers: List[str],
ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, List[float]]:
"""

🛠️ Refactor suggestion

Fix typing: use Optional for contexts and allow None scores

Align with callers that return None on failure/unavailable. Update signatures to Optional and return types to include Optional[float].

-    async def evaluate_answers_async(self, questions: List[str], answers: List[str], 
-                                  ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, List[float]]:
+    async def evaluate_answers_async(self, questions: List[str], answers: List[str],
+                                     ground_truths: List[str],
+                                     contexts: Optional[List[List[str]]] = None
+                                     ) -> Dict[str, List[Optional[float]]]:
...
-    async def evaluate_single_answer_async(self, question: str, answer: str, 
-                                        ground_truth: str, context: List[str] = None) -> Dict[str, float]:
+    async def evaluate_single_answer_async(self, question: str, answer: str,
+                                           ground_truth: str, context: Optional[List[str]] = None
+                                           ) -> Dict[str, Optional[float]]:
...
-    async def evaluate_with_timeout(self, questions: List[str], answers: List[str], 
-                                  ground_truths: List[str], contexts: List[List[str]] = None,
-                                  timeout: Optional[int] = None) -> Dict[str, List[float]]:
+    async def evaluate_with_timeout(self, questions: List[str], answers: List[str],
+                                    ground_truths: List[str], contexts: Optional[List[List[str]]] = None,
+                                    timeout: Optional[int] = None) -> Dict[str, List[Optional[float]]]:
...
-    async def evaluate_single_with_timeout(self, question: str, answer: str, 
-                                         ground_truth: str, context: List[str] = None,
-                                         timeout: Optional[int] = None) -> Dict[str, float]:
+    async def evaluate_single_with_timeout(self, question: str, answer: str,
+                                           ground_truth: str, context: Optional[List[str]] = None,
+                                           timeout: Optional[int] = None) -> Dict[str, Optional[float]]:

Also applies to: 46-47, 62-65, 96-99

🧰 Tools
🪛 Ruff (0.12.2)

30-30: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)

🤖 Prompt for AI Agents
In evaluators/base.py around lines 29-31, change the signature to use Optional
for the contexts parameter and allow None scores in the return type (e.g.,
contexts: Optional[List[List[str]]] = None and return Dict[str,
List[Optional[float]]]); import Optional from typing. Apply the same typing
changes to the other affected method signatures at lines 46-47, 62-65, and 96-99
so that any lists of scores are List[Optional[float]] and optional contexts use
Optional; update docstrings/comments to reflect that evaluators may return None
for individual scores.
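The signature change above can be sketched as follows — a minimal stub (the body is hypothetical; a real evaluator scores each answer via an LLM) showing `Optional` contexts and `Optional[float]` scores, where `None` means "no score available":

```python
import asyncio
from typing import Dict, List, Optional

async def evaluate_answers_async(
    questions: List[str],
    answers: List[str],
    ground_truths: List[str],
    contexts: Optional[List[List[str]]] = None,
) -> Dict[str, List[Optional[float]]]:
    # Hypothetical stub: a real evaluator would score each answer here.
    # None marks "no score available", distinct from a genuine 0.0.
    contexts = contexts or [[] for _ in questions]
    return {"relevancy": [None] * len(questions)}

result = asyncio.run(evaluate_answers_async(["q1"], ["a1"], ["gt1"]))
```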

Comment on lines +139 to +151
return {metric: [0.0] * count for metric in metrics}

def _get_default_single_score(self) -> Dict[str, float]:
"""
获取默认单个评分
Returns:
默认单个评分字典
"""
metrics = self.get_supported_metrics()
return {metric: 0.0 for metric in metrics}

@abstractmethod

🛠️ Refactor suggestion

Default fallbacks should be “missing” (None), not 0.0

0.0 conflates “bad score” with “no score”. Return None consistently.

-        return {metric: [0.0] * count for metric in metrics}
+        return {metric: [None] * count for metric in metrics}
...
-        return {metric: 0.0 for metric in metrics}
+        return {metric: None for metric in metrics}
🤖 Prompt for AI Agents
In evaluators/base.py around lines 139 to 151, the default fallbacks currently
return 0.0 which conflates “bad score” with “missing”; update the two functions
so they return None for missing scores instead of 0.0, adjust return type hints
to use Optional[float] (e.g., Dict[str, List[Optional[float]]] for the
multi-score method and Dict[str, Optional[float]] for the single-score method),
and add/import typing.Optional at the top of the module; keep function behavior
otherwise the same so callers receive None to represent missing scores.
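As a standalone sketch of the suggested fallback semantics (helper names are illustrative, not the module's actual methods):

```python
from typing import Dict, List, Optional

def default_batch_scores(metrics: List[str], count: int) -> Dict[str, List[Optional[float]]]:
    # None means "missing score"; 0.0 would be indistinguishable from a real zero
    return {metric: [None] * count for metric in metrics}

def default_single_score(metrics: List[str]) -> Dict[str, Optional[float]]:
    return {metric: None for metric in metrics}

batch = default_batch_scores(["relevancy", "faithfulness"], 3)
single = default_single_score(["clarity"])
```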

Comment on lines 202 to 251
try:
scores_dict = result.scores

# scores_dict 是列表格式,每个元素是一个字典
if scores_dict and len(scores_dict) > 0:
for i, item_scores in enumerate(scores_dict):
if i < len(answers):
# Answer Relevancy
if 'answer_relevancy' in item_scores:
rel_score = item_scores['answer_relevancy']
relevancy_scores.append(float(rel_score) if rel_score is not None and not math.isnan(rel_score) else None)
else:
relevancy_scores.append(None)

# Answer Correctness
if 'answer_correctness' in item_scores:
cor_score = item_scores['answer_correctness']
correctness_scores.append(float(cor_score) if cor_score is not None and not math.isnan(cor_score) else None)
else:
correctness_scores.append(None)

# Faithfulness
if 'faithfulness' in item_scores:
faith_score = item_scores['faithfulness']
faithfulness_scores.append(float(faith_score) if faith_score is not None and not math.isnan(faith_score) else None)
else:
faithfulness_scores.append(None)

# Context Precision
if 'context_precision' in item_scores:
ctx_prec_score = item_scores['context_precision']
context_precision_scores.append(float(ctx_prec_score) if ctx_prec_score is not None and not math.isnan(ctx_prec_score) else None)
else:
context_precision_scores.append(None)

# Context Recall
if 'context_recall' in item_scores:
ctx_rec_score = item_scores['context_recall']
context_recall_scores.append(float(ctx_rec_score) if ctx_rec_score is not None and not math.isnan(ctx_rec_score) else None)
else:
context_recall_scores.append(None)
else:
# 如果没有足够的评估结果,填充None
relevancy_scores.append(None)
correctness_scores.append(None)
faithfulness_scores.append(None)
context_precision_scores.append(None)
context_recall_scores.append(None)

print(f" Ragas原生异步批量评估完成,处理了 {len(relevancy_scores)} 个样本")

⚠️ Potential issue

Ensure metric lists match len(answers) when ragas returns fewer rows

Currently you don’t pad trailing entries; consumers may break on length mismatch.

                 if scores_dict and len(scores_dict) > 0:
                     for i, item_scores in enumerate(scores_dict):
                         if i < len(answers):
                             ...
-                        else:
-                            # 如果没有足够的评估结果,填充None
-                            relevancy_scores.append(None)
-                            correctness_scores.append(None)
-                            faithfulness_scores.append(None)
-                            context_precision_scores.append(None)
-                            context_recall_scores.append(None)
+                    # 结果条目不足时补齐
+                    missing = len(answers) - len(relevancy_scores)
+                    if missing > 0:
+                        relevancy_scores.extend([None] * missing)
+                        correctness_scores.extend([None] * missing)
+                        faithfulness_scores.extend([None] * missing)
+                        context_precision_scores.extend([None] * missing)
+                        context_recall_scores.extend([None] * missing)
🧰 Tools
🪛 Ruff (0.12.2)

205-205: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


244-244: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


251-251: String contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF001)

🤖 Prompt for AI Agents
In evaluators/ragas_evaluator.py around lines 202 to 251, the metric lists
(relevancy_scores, correctness_scores, faithfulness_scores,
context_precision_scores, context_recall_scores) can be shorter than
len(answers) when ragas returns fewer rows or no scores; ensure each metric list
is padded with trailing None entries so their lengths equal len(answers). After
the existing loop (and also when scores_dict is falsy), append None repeatedly
to each metric list until each list's length == len(answers) to guarantee
consumers won't break on length mismatch.
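The padding step can be isolated into a small helper (a sketch; the function name is illustrative):

```python
from typing import List, Optional

def pad_to_length(scores: List[Optional[float]], n: int) -> List[Optional[float]]:
    # Pad trailing entries with None so each metric list matches len(answers)
    missing = n - len(scores)
    return scores + [None] * missing if missing > 0 else scores

padded = pad_to_length([0.8, 0.9], 4)
```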

Comment on lines +146 to +165
def _get_api_format(self) -> Dict[str, Any]:
"""获取API格式配置"""
if self.provider == EmbeddingProvider.OLLAMA:
return {
"input_field": "prompt",
"response_path": ["embedding"],
"headers": {
"Content-Type": "application/json",
"Authorization": f"Bearer {self.config.get('api_key', '')}"
}
}
else:
return {
"input_field": "input",
"response_path": ["data", 0, "embedding"],
"headers": {
"Content-Type": "application/json",
"Authorization": f"Bearer {self.config.get('api_key', '')}"
}
}

⚠️ Potential issue

Fix Ollama endpoint; use provider-specific path

Ollama expects /api/embeddings. Current hardcoded /embeddings will 404.

     def _get_api_format(self) -> Dict[str, Any]:
         """获取API格式配置"""
         if self.provider == EmbeddingProvider.OLLAMA:
             return {
+                "path": "/api/embeddings",
                 "input_field": "prompt",
                 "response_path": ["embedding"],
                 "headers": {
                     "Content-Type": "application/json",
                     "Authorization": f"Bearer {self.config.get('api_key', '')}"
                 }
             }
         else:
             return {
+                "path": "/embeddings",
                 "input_field": "input",
                 "response_path": ["data", 0, "embedding"],
                 "headers": {
                     "Content-Type": "application/json",
                     "Authorization": f"Bearer {self.config.get('api_key', '')}"
                 }
             }
...
-        url = f"{self.config['base_url'].rstrip('/')}/embeddings"
+        url = f"{self.config['base_url'].rstrip('/')}{self.api_format['path']}"

Also applies to: 167-175

🤖 Prompt for AI Agents
In utils/embedding_adapter.py around lines 146-165 (and also check 167-175), the
Ollama configuration is still using the generic/hardcoded embeddings path which
causes 404s; update the adapter to use the Ollama-specific endpoint path
"/api/embeddings" when provider == EmbeddingProvider.OLLAMA and ensure other
providers keep their expected path (e.g., "/embeddings" or provider-specific as
appropriate), and adjust any code around lines 167-175 that constructs the
request URL to branch on provider and use "/api/embeddings" for Ollama.
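A minimal sketch of the provider-specific URL construction (function names are illustrative; it assumes Ollama's `/api/embeddings` route and the OpenAI-compatible `/embeddings` route):

```python
def embeddings_path(provider: str) -> str:
    # Ollama serves embeddings at /api/embeddings; OpenAI-compatible
    # servers use /embeddings
    return "/api/embeddings" if provider == "ollama" else "/embeddings"

def build_embeddings_url(base_url: str, provider: str) -> str:
    return base_url.rstrip("/") + embeddings_path(provider)

url = build_embeddings_url("http://localhost:11434/", "ollama")
```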

Comment on lines +195 to +200
async def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""嵌入多个文本"""
# 并发处理多个文本
tasks = [self.embed_query(text) for text in texts]
return await asyncio.gather(*tasks, return_exceptions=True)


⚠️ Potential issue

Do not return exceptions from embed_documents

Returning exceptions violates the return type and leaks errors downstream. Let gather raise.

-        tasks = [self.embed_query(text) for text in texts]
-        return await asyncio.gather(*tasks, return_exceptions=True)
+        tasks = [self.embed_query(text) for text in texts]
+        return await asyncio.gather(*tasks, return_exceptions=False)
🤖 Prompt for AI Agents
In utils/embedding_adapter.py around lines 195 to 200, embed_documents currently
calls asyncio.gather(..., return_exceptions=True) which returns exceptions in
the result list and violates the declared return type; change the call to await
asyncio.gather(*tasks) (remove return_exceptions=True) so exceptions propagate
normally (or alternatively catch and re-raise specific errors if you need custom
handling) ensuring the function always returns List[List[float]] on success and
does not leak exceptions into the returned value.
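The difference is easy to demonstrate with a stand-in embedder (the HTTP call is faked; a real `embed_query` would POST to the embedding API):

```python
import asyncio
from typing import List

async def embed_query(text: str) -> List[float]:
    # Stand-in for a real HTTP embedding call; raises on bad input
    if not text:
        raise ValueError("empty text")
    return [float(len(text))]

async def embed_documents(texts: List[str]) -> List[List[float]]:
    # Without return_exceptions=True, a failure in any task propagates
    # instead of leaking an Exception object into the returned list
    return await asyncio.gather(*(embed_query(t) for t in texts))

vectors = asyncio.run(embed_documents(["ab", "cde"]))
```

On success the return type is honored; on failure the caller sees the original exception rather than a mixed list.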

- Replace all print() statements with structured logging calls
- Fix async blocking issue in Ragas evaluator using asyncio.to_thread()
- Improve error handling in embedding adapter for missing api_key
- Add proper logging imports to evaluator modules

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@boma086 boma086 merged commit e784549 into main Sep 7, 2025
1 of 2 checks passed

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
evaluators/academic_evaluator.py (3)

82-105: Batch result returns only 4 metrics but get_supported_metrics returns 6.

This will break consumers expecting coherence and helpfulness. Track and return both.

-        relevancy_scores = []
-        correctness_scores = []
-        completeness_scores = []
-        clarity_scores = []
+        relevancy_scores = []
+        correctness_scores = []
+        completeness_scores = []
+        clarity_scores = []
+        coherence_scores = []
+        helpfulness_scores = []
@@
-            if isinstance(result, Exception):
+            if isinstance(result, Exception):
                 logger.error(f"评估异常: {result}")
                 relevancy_scores.append(0.0)
                 correctness_scores.append(0.0)
                 completeness_scores.append(0.0)
-                clarity_scores.append(0.0)
+                clarity_scores.append(0.0)
+                coherence_scores.append(0.0)
+                helpfulness_scores.append(0.0)
             else:
                 relevancy_scores.append(result["relevancy"])
                 correctness_scores.append(result["correctness"])
                 completeness_scores.append(result["completeness"])
-                clarity_scores.append(result["clarity"])
+                clarity_scores.append(result["clarity"])
+                coherence_scores.append(result.get("coherence", 0.0))
+                helpfulness_scores.append(result.get("helpfulness", 0.0))
@@
-        return {
-            "relevancy": relevancy_scores,
-            "correctness": correctness_scores,
-            "completeness": completeness_scores,
-            "clarity": clarity_scores
-        }
+        return {
+            "relevancy": relevancy_scores,
+            "correctness": correctness_scores,
+            "completeness": completeness_scores,
+            "clarity": clarity_scores,
+            "coherence": coherence_scores,
+            "helpfulness": helpfulness_scores,
+        }

207-213: Fix f-string lint and guard base_url when posting to /chat/completions.

Avoid F541 and .rstrip() on None.

-                logger.debug(f"🔍 增强异步评估请求发送中...")
-                async with session.post(
-                    f"{self.config.get('chat_base_url', self.config.get('base_url')).rstrip('/')}/chat/completions",
+                logger.debug("🔍 增强异步评估请求发送中...")
+                base_url = (self.config.get("chat_base_url")
+                            or self.config.get("base_url")
+                            or "https://api.openai.com/v1").rstrip("/")
+                async with session.post(
+                    f"{base_url}/chat/completions",
                     headers=headers,
                     json=payload
                 ) as response:

312-316: Same base_url guard for quality assessment path.

-                async with session.post(
-                    f"{self.config.get('chat_base_url', self.config.get('base_url')).rstrip('/')}/chat/completions",
+                base_url = (self.config.get("chat_base_url")
+                            or self.config.get("base_url")
+                            or "https://api.openai.com/v1").rstrip("/")
+                async with session.post(
+                    f"{base_url}/chat/completions",
                     headers=headers,
                     json=payload
                 ) as response:
♻️ Duplicate comments (5)
evaluators/academic_evaluator.py (1)

6-9: Drop sys.path hacks; rely on package imports.

This is fragile and can mask import issues. Remove the path mutation and unused detect_embedding_config import.

-import sys
-import os
-sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from utils.embedding_adapter import EmbeddingAdapterFactory, detect_embedding_config
+from utils.embedding_adapter import EmbeddingAdapterFactory
utils/embedding_adapter.py (2)

146-166: Ollama endpoint path is wrong; use provider-specific path and robust base_url.

Current hardcoded “/embeddings” 404s on Ollama and .rstrip() may hit None.

     def _get_api_format(self) -> Dict[str, Any]:
         """获取API格式配置"""
         if self.provider == EmbeddingProvider.OLLAMA:
             return {
+                "path": "/api/embeddings",
                 "input_field": "prompt",
                 "response_path": ["embedding"],
                 "headers": {
                     "Content-Type": "application/json",
                     "Authorization": f"Bearer {self.config.get('api_key', '')}"
                 }
             }
         else:
             return {
+                "path": "/embeddings",
                 "input_field": "input",
                 "response_path": ["data", 0, "embedding"],
                 "headers": {
                     "Content-Type": "application/json",
                     "Authorization": f"Bearer {self.config.get('api_key', '')}"
                 }
             }
@@
     async def embed_query(self, text: str) -> List[float]:
         """嵌入单个文本"""
-        url = f"{self.config['base_url'].rstrip('/')}/embeddings"
+        base_url = (self.config.get("base_url") or
+                    ("http://localhost:11434" if self.provider == EmbeddingProvider.OLLAMA else "https://api.openai.com/v1")).rstrip("/")
+        url = f"{base_url}{self.api_format['path']}"
         
         payload = {
             "model": self.config["model"],
             self.api_format["input_field"]: text
         }
@@
-        except Exception as e:
-            logger.error(f"直接嵌入失败: {e}")
-            raise Exception(f"嵌入失败: {str(e)}")
+        except Exception as e:
+            logger.exception(f"直接嵌入失败: {e}")
+            raise

Also applies to: 167-175, 176-193


195-200: Don’t return exceptions from embed_documents.

Propagate errors; return type should be List[List[float]].

-        tasks = [self.embed_query(text) for text in texts]
-        return await asyncio.gather(*tasks, return_exceptions=True)
+        tasks = [self.embed_query(text) for text in texts]
+        return await asyncio.gather(*tasks)
evaluators/ragas_evaluator.py (2)

190-197: Batch path blocks the event loop; run evaluate off-thread (or switch to aevaluate).

Keep async non-blocking like the single-item path.

-            # 使用Ragas批量评估(同步函数)
-            result = evaluate(
-                dataset, 
-                metrics=self.metrics,
-                llm=self.eval_llm,
-                embeddings=self.embeddings,
-                raise_exceptions=False
-            )
+            # 使用线程池避免阻塞事件循环
+            result = await asyncio.to_thread(
+                evaluate,
+                dataset,
+                metrics=self.metrics,
+                llm=self.eval_llm,
+                embeddings=self.embeddings,
+                raise_exceptions=False
+            )
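
The `asyncio.to_thread` pattern in isolation (the blocking function is a stand-in for ragas' synchronous `evaluate`):

```python
import asyncio
import time

def blocking_evaluate(n: int) -> int:
    # Stand-in for ragas' synchronous evaluate(); blocks the calling thread
    time.sleep(0.01)
    return n * 2

async def evaluate_async(n: int) -> int:
    # asyncio.to_thread runs the sync call in a worker thread,
    # keeping the event loop free for other coroutines
    return await asyncio.to_thread(blocking_evaluate, n)

result = asyncio.run(evaluate_async(21))
```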

206-265: Pad metric lists when ragas returns fewer rows than inputs.

Prevents length mismatch for consumers.

                 if scores_dict and len(scores_dict) > 0:
                     for i, item_scores in enumerate(scores_dict):
                         if i < len(answers):
@@
-                        else:
-                            # 如果没有足够的评估结果,填充None
-                            relevancy_scores.append(None)
-                            correctness_scores.append(None)
-                            faithfulness_scores.append(None)
-                            context_precision_scores.append(None)
-                            context_recall_scores.append(None)
+                    # 结果条目不足时补齐
+                    missing = len(answers) - len(relevancy_scores)
+                    if missing > 0:
+                        relevancy_scores.extend([None] * missing)
+                        correctness_scores.extend([None] * missing)
+                        faithfulness_scores.extend([None] * missing)
+                        context_precision_scores.extend([None] * missing)
+                        context_recall_scores.extend([None] * missing)
🧹 Nitpick comments (14)
evaluators/academic_evaluator.py (2)

56-57: Prefer logging.exception() in exception handlers.

Improves traceability without changing behavior.

-            logger.error(f"❌ {self.name}增强评估器初始化失败: {e}")
+            logger.exception(f"❌ {self.name}增强评估器初始化失败: {e}")
@@
-            logger.error(f"异步评估错误: {e}")
+            logger.exception(f"异步评估错误: {e}")
@@
-            logger.error(f"混合评估错误: {e}")
+            logger.exception(f"混合评估错误: {e}")
@@
-            logger.error(f"纯聊天模式评估错误: {e}")
+            logger.exception(f"纯聊天模式评估错误: {e}")
@@
-            logger.error(f"嵌入模型调用失败: {e}")
+            logger.exception(f"嵌入模型调用失败: {e}")
@@
-            logger.error(f"聊天模型质量评估失败: {e}")
+            logger.exception(f"聊天模型质量评估失败: {e}")
@@
-            logger.error(f"余弦相似度计算失败: {e}")
+            logger.exception(f"余弦相似度计算失败: {e}")
@@
-            logger.error(f"文本相似度计算失败: {e}")
+            logger.exception(f"文本相似度计算失败: {e}")

Also applies to: 120-121, 150-151, 226-227, 265-266, 329-330, 449-450, 474-475
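
Why `logging.exception` is preferred inside handlers — it captures the traceback that `logger.error(f"... {e}")` drops:

```python
import io
import logging

logger = logging.getLogger("evaluator_demo")
stream = io.StringIO()
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.ERROR)

try:
    raise ValueError("boom")
except Exception:
    # logging.exception logs at ERROR level and appends the full
    # traceback, which logger.error(f"... {e}") would drop
    logger.exception("evaluation failed")

output = stream.getvalue()
```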


34-49: Consider using detect_embedding_config to normalize adapter config.

You import it but don’t use it; applying it would auto-detect provider and reduce misconfig.

-            embedding_config = {
-                "api_key": config.get("embedding_api_key", ""),
-                "base_url": config.get("embedding_base_url"),
-                "model": config.get("embedding_model", "nomic-embed-text:latest"),
-                "timeout": config.get("embedding_timeout", 30)
-            }
+            raw_cfg = {
+                "api_key": config.get("embedding_api_key", ""),
+                "base_url": config.get("embedding_base_url", "") or "",
+                "model": config.get("embedding_model", "nomic-embed-text:latest"),
+                "timeout": config.get("embedding_timeout", 30)
+            }
+            embedding_config = detect_embedding_config(raw_cfg["base_url"], raw_cfg["model"], raw_cfg["api_key"])
+            embedding_config["timeout"] = raw_cfg["timeout"]
utils/embedding_adapter.py (2)

58-76: Unify provider detection with port heuristics.

Mirror 11434-port and model-name checks to reduce mis-detection.

Want a small helper to share detection logic between adapters and detect_embedding_config?

Also applies to: 138-145


114-121: Use logging.exception and raise from original errors.

Improves tracebacks.

-        except Exception as e:
-            logger.error(f"嵌入失败: {e}")
-            raise Exception(f"嵌入失败: {str(e)}")
+        except Exception as e:
+            logger.exception("嵌入失败")
+            raise
@@
-        except Exception as e:
-            logger.error(f"批量嵌入失败: {e}")
-            raise Exception(f"批量嵌入失败: {str(e)}")
+        except Exception as e:
+            logger.exception("批量嵌入失败")
+            raise

Also applies to: 122-129

evaluators/ragas_evaluator.py (2)

85-87: Prefer logging.exception in catches.

Keeps stack traces.

-            logger.error(f"❌ {self.name}评估器初始化失败: {e}")
+            logger.exception(f"❌ {self.name}评估器初始化失败: {e}")
@@
-            logger.error(f"❌ {self.name}异步评估失败: {e}")
+            logger.exception(f"❌ {self.name}异步评估失败: {e}")
@@
-                logger.error(f"    Ragas分数处理错误: {e}")
+                logger.exception(f"    Ragas分数处理错误: {e}")
@@
-            logger.error(f"❌ {self.name}异步批量评估失败: {e}")
+            logger.exception(f"❌ {self.name}异步批量评估失败: {e}")

Also applies to: 101-103, 161-169, 274-276


16-18: Remove unused imports.

aiohttp and Optional are unused.

-from typing import Dict, List, Any, Optional
+from typing import Dict, List, Any
@@
-import aiohttp
+

Also applies to: 21-21

evaluators/factory.py (8)

15-23: Annotate class variables with ClassVar and fix typing.

Satisfies RUF012 and improves type clarity.

-from typing import Dict, List, Any, Optional
+from typing import Any, Dict, List, Optional, ClassVar, Type
@@
-    EVALUATOR_TYPES = {
+    EVALUATOR_TYPES: ClassVar[Dict[str, Type[BaseEvaluator]]] = {
@@
-    DEFAULT_PRIORITY = ["ragas", "academic"]
+    DEFAULT_PRIORITY: ClassVar[List[str]] = ["ragas", "academic"]
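
A minimal sketch of the `ClassVar` annotation (the class body is reduced to the two attributes in question):

```python
from typing import ClassVar, Dict, List

class EvaluatorFactory:
    # ClassVar marks these as class-level constants for type checkers,
    # addressing Ruff's RUF012 mutable-class-attribute warning
    EVALUATOR_TYPES: ClassVar[Dict[str, type]] = {}
    DEFAULT_PRIORITY: ClassVar[List[str]] = ["ragas", "academic"]
```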

39-41: Log full trace on factory creation failures.

Use logging.exception to preserve stack traces (TRY400/BLE001).

-        except Exception as e:
-            logger.error(f"❌ {evaluator_type}评估器创建失败: {e}")
+        except Exception:
+            logger.exception(f"❌ {evaluator_type}评估器创建失败")

52-64: Make evaluator creation truly concurrent.

Current code awaits coroutines sequentially. Create tasks and gather with exception handling.

-        # 并发创建所有评估器
-        tasks = []
-        for evaluator_type in types:
-            task = cls.create_evaluator_async(evaluator_type, config)
-            tasks.append((evaluator_type, task))
-        
-        # 等待所有评估器创建完成
-        for evaluator_type, task in tasks:
-            evaluator = await task
-            if evaluator:
-                evaluators[evaluator_type] = evaluator
+        # 并发创建所有评估器
+        tasks = {et: asyncio.create_task(cls.create_evaluator_async(et, config)) for et in types}
+        results = await asyncio.gather(*tasks.values(), return_exceptions=True)
+        for et, res in zip(tasks.keys(), results):
+            if isinstance(res, Exception):
+                logger.error(f"❌ {et}评估器创建失败(并发)", exc_info=res)
+                continue
+            if res:
+                evaluators[et] = res
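The gather-with-`return_exceptions` pattern suggested above can be sketched in isolation (the evaluator names and the simulated init failure below are made up for the demo):

```python
import asyncio

async def create_evaluator(name: str):
    # Stand-in for create_evaluator_async; "ragas" fails to simulate an init error.
    if name == "ragas":
        raise RuntimeError("init failed")
    return f"{name}-evaluator"

async def create_all(types):
    # Schedule all creations up front so they run concurrently.
    tasks = {t: asyncio.create_task(create_evaluator(t)) for t in types}
    # return_exceptions=True keeps one failure from cancelling the rest.
    results = await asyncio.gather(*tasks.values(), return_exceptions=True)
    evaluators = {}
    for name, res in zip(tasks.keys(), results):
        if isinstance(res, Exception):
            continue  # a real manager would log here, e.g. with exc_info=res
        evaluators[name] = res
    return evaluators

evaluators = asyncio.run(create_all(["ragas", "academic"]))
print(evaluators)  # {'academic': 'academic-evaluator'}
```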

71-87: Avoid heavyweight instantiation in get_evaluator_info.

Constructing evaluators here can trigger network calls (e.g., embeddings test) and slow or fail this metadata path. Prefer class-level metadata (e.g., a classmethod supported_metrics()/description) or a local static map.
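One way to realize the class-level-metadata alternative is sketched below; the `SUPPORTED_METRICS`, `DESCRIPTION`, and `info()` names are hypothetical, not part of this codebase:

```python
from typing import ClassVar, Dict, List, Type

class BaseEvaluator:
    # Class-level metadata: readable without __init__, so no network calls.
    SUPPORTED_METRICS: ClassVar[List[str]] = []
    DESCRIPTION: ClassVar[str] = ""

    @classmethod
    def info(cls) -> Dict[str, object]:
        return {
            "name": cls.__name__,
            "metrics": cls.SUPPORTED_METRICS,
            "description": cls.DESCRIPTION,
        }

class RagasEvaluator(BaseEvaluator):
    SUPPORTED_METRICS = ["answer_relevancy", "answer_correctness", "faithfulness"]
    DESCRIPTION = "Ragas framework evaluator"

# get_evaluator_info can now report metadata without constructing evaluators.
EVALUATOR_TYPES: Dict[str, Type[BaseEvaluator]] = {"ragas": RagasEvaluator}
info = {name: cls.info() for name, cls in EVALUATOR_TYPES.items()}
print(info["ragas"]["metrics"])  # ['answer_relevancy', 'answer_correctness', 'faithfulness']
```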


100-104: Normalize punctuation to ASCII to appease linters.

Silences RUF001/RUF003 without altering meaning.

-            "academic": "增强学术评估器 - 支持6维度质量评估(相关性、正确性、完整性、清晰度、连贯性、有用性)",
-            "ragas": "Ragas框架评估器 - 完整的RAG评估指标集(相关性、正确性、忠实性、上下文精度、上下文召回率)"
+            "academic": "增强学术评估器 - 支持6维度质量评估(相关性、正确性、完整性、清晰度、连贯性、有用性)",
+            "ragas": "Ragas框架评估器 - 完整的RAG评估指标集(相关性、正确性、忠实性、上下文精度、上下文召回率)"
-            "evaluation_mode": "hybrid"  # 使用混合模式:embedding计算相关性,聊天模型评估质量
+            "evaluation_mode": "hybrid"  # 使用混合模式: embedding计算相关性, 聊天模型评估质量

Also applies to: 128-128


115-115: Remove f-strings without placeholders.

Silences F541.

-        logger.info(f"🔧 评估器管理器初始化完成")
+        logger.info("🔧 评估器管理器初始化完成")
-                logger.debug(f"    ✅ 完成")
+                logger.debug("    ✅ 完成")

Also applies to: 152-152


138-141: Annotate contexts as Optional to match default.

Silences RUF013 and improves API clarity.

-    async def evaluate_all_async(self, questions: List[str], answers: List[str], 
-                               ground_truths: List[str], contexts: List[List[str]] = None) -> Dict[str, Dict[str, List[float]]]:
+    async def evaluate_all_async(self, questions: List[str], answers: List[str], 
+                               ground_truths: List[str], contexts: Optional[List[List[str]]] = None) -> Dict[str, Dict[str, List[float]]]:

143-159: Run all evaluators concurrently for throughput.

Keeps per-evaluator error isolation and preserves full tracebacks on failure.

-        for evaluator_name, evaluator in self.evaluators.items():
-            logger.info(f"\n📊 使用{evaluator_name}评估器评估中...")
-            
-            try:
-                # 使用带超时的异步评估
-                metrics = await evaluator.evaluate_with_timeout(
-                    questions, answers, ground_truths, contexts
-                )
-                all_results[evaluator_name] = metrics
-                logger.debug(f"    ✅ 完成")
-            except Exception as e:
-                logger.error(f"    ❌ 失败: {e}")
-                # 使用默认值填充
-                default_metrics = {metric: [None] * len(answers) 
-                                 for metric in evaluator.get_supported_metrics()}
-                all_results[evaluator_name] = default_metrics
+        # 并发执行所有评估器以减少总耗时
+        tasks = {
+            name: asyncio.create_task(
+                evaluator.evaluate_with_timeout(questions, answers, ground_truths, contexts)
+            )
+            for name, evaluator in self.evaluators.items()
+        }
+        for name in tasks:
+            logger.info(f"\n📊 使用{name}评估器评估中...")
+        results = await asyncio.gather(*tasks.values(), return_exceptions=True)
+        for (name, evaluator), res in zip(self.evaluators.items(), results):
+            if isinstance(res, Exception):
+                logger.error(f"    ❌ 失败({name})", exc_info=res)
+                all_results[name] = {metric: [None] * len(answers) for metric in evaluator.get_supported_metrics()}
+            else:
+                all_results[name] = res
+                logger.debug("    ✅ 完成")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d675c4 and b703f00.

📒 Files selected for processing (4)
  • evaluators/academic_evaluator.py (15 hunks)
  • evaluators/factory.py (3 hunks)
  • evaluators/ragas_evaluator.py (1 hunks)
  • utils/embedding_adapter.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
evaluators/factory.py (3)
evaluators/base.py (4)
  • BaseEvaluator (10-184)
  • get_evaluator_info (171-184)
  • evaluate_with_timeout (62-94)
  • get_supported_metrics (153-160)
evaluators/academic_evaluator.py (2)
  • AcademicEvaluator (18-503)
  • get_supported_metrics (481-487)
evaluators/ragas_evaluator.py (3)
  • RagasEvaluator (24-294)
  • get_evaluator_info (286-294)
  • get_supported_metrics (278-280)
evaluators/ragas_evaluator.py (2)
evaluators/base.py (6)
  • BaseEvaluator (10-184)
  • evaluate_single_answer_async (46-60)
  • evaluate_answers_async (29-43)
  • get_supported_metrics (153-160)
  • is_available (162-169)
  • get_evaluator_info (171-184)
evaluators/factory.py (1)
  • get_evaluator_info (67-94)
evaluators/academic_evaluator.py (2)
evaluators/base.py (1)
  • BaseEvaluator (10-184)
utils/embedding_adapter.py (6)
  • EmbeddingAdapterFactory (201-229)
  • detect_embedding_config (240-265)
  • create_adapter (205-214)
  • embed_query (29-31)
  • embed_query (114-120)
  • embed_query (167-193)
🪛 Ruff (0.12.2)
evaluators/factory.py

16-19: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


22-22: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


28-28: Avoid specifying long messages outside the exception class

(TRY003)


39-39: Do not catch blind exception: Exception

(BLE001)


40-40: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


100-100: String contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF001)


100-100: String contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF001)


101-101: String contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF001)


101-101: String contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF001)


115-115: f-string without any placeholders

Remove extraneous f prefix

(F541)


128-128: Comment contains ambiguous (FULLWIDTH COLON). Did you mean : (COLON)?

(RUF003)


128-128: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


139-139: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


152-152: f-string without any placeholders

Remove extraneous f prefix

(F541)


153-153: Do not catch blind exception: Exception

(BLE001)


154-154: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

evaluators/ragas_evaluator.py

78-78: Abstract raise to an inner function

(TRY301)


78-78: Avoid specifying long messages outside the exception class

(TRY003)


85-85: Do not catch blind exception: Exception

(BLE001)


86-86: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


89-89: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


101-101: Do not catch blind exception: Exception

(BLE001)


102-102: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


105-105: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


120-120: Comment contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


120-120: Comment contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


135-135: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


161-161: Do not catch blind exception: Exception

(BLE001)


162-162: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


165-165: Consider moving this statement to an else block

(TRY300)


167-167: Do not catch blind exception: Exception

(BLE001)


168-168: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


172-172: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


190-190: Comment contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


190-190: Comment contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


209-209: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


248-248: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


255-255: String contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF001)


257-257: Do not catch blind exception: Exception

(BLE001)


258-258: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


266-272: Consider moving this statement to an else block

(TRY300)


274-274: Do not catch blind exception: Exception

(BLE001)


275-275: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

evaluators/academic_evaluator.py

46-46: Do not catch blind exception: Exception

(BLE001)


47-47: String contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF001)


50-50: Comment contains ambiguous (FULLWIDTH COLON). Did you mean : (COLON)?

(RUF003)


50-50: Comment contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


50-50: Comment contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


50-50: Comment contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


50-50: Comment contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


55-55: Do not catch blind exception: Exception

(BLE001)


56-56: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


113-113: Comment contains ambiguous (FULLWIDTH COLON). Did you mean : (COLON)?

(RUF003)


113-113: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


116-116: Comment contains ambiguous (FULLWIDTH COLON). Did you mean : (COLON)?

(RUF003)


119-119: Do not catch blind exception: Exception

(BLE001)


120-120: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


150-150: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


207-207: f-string without any placeholders

Remove extraneous f prefix

(F541)


225-225: Do not catch blind exception: Exception

(BLE001)


226-226: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


230-230: Docstring contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF002)


230-230: Docstring contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF002)


233-233: Comment contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


235-235: String contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF001)


264-264: Do not catch blind exception: Exception

(BLE001)


265-265: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


328-328: Do not catch blind exception: Exception

(BLE001)


329-329: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


449-449: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


474-474: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

utils/embedding_adapter.py

47-47: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


48-48: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


48-48: Create your own exception

(TRY002)


48-48: Avoid specifying long messages outside the exception class

(TRY003)


64-64: Comment contains ambiguous (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?

(RUF003)


64-64: Comment contains ambiguous (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?

(RUF003)


111-111: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


112-112: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


112-112: Create your own exception

(TRY002)


112-112: Avoid specifying long messages outside the exception class

(TRY003)


118-118: Do not catch blind exception: Exception

(BLE001)


119-119: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


120-120: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


120-120: Create your own exception

(TRY002)


120-120: Avoid specifying long messages outside the exception class

(TRY003)


120-120: Use explicit conversion flag

Replace with conversion flag

(RUF010)


126-126: Do not catch blind exception: Exception

(BLE001)


127-127: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


128-128: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


128-128: Create your own exception

(TRY002)


128-128: Avoid specifying long messages outside the exception class

(TRY003)


128-128: Use explicit conversion flag

Replace with conversion flag

(RUF010)


190-190: Abstract raise to an inner function

(TRY301)


190-190: Create your own exception

(TRY002)


190-190: Avoid specifying long messages outside the exception class

(TRY003)


191-191: Do not catch blind exception: Exception

(BLE001)


192-192: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


193-193: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


193-193: Create your own exception

(TRY002)


193-193: Avoid specifying long messages outside the exception class

(TRY003)


193-193: Use explicit conversion flag

Replace with conversion flag

(RUF010)


210-210: Do not catch blind exception: Exception

(BLE001)


211-211: String contains ambiguous (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF001)


225-225: f-string without any placeholders

Remove extraneous f prefix

(F541)


227-227: Do not catch blind exception: Exception

(BLE001)


228-228: Use logging.exception instead of logging.error

Replace with exception

(TRY400)
