Merge branch 'main' into main

ZV-Liu · web-flow · commit b7738621d4c7 · 2025-06-04T17:21:04.000+08:00
diff --git a/README.md b/README.md
@@ -37,9 +37,11 @@ The system adopts a two-layer structure:
 
 ## Features
 
-* Hierarchical agent collaboration for complex and dynamic task scenarios
-* Extensible agent system, allowing easy integration of additional specialized agents
-* Automated information analysis, research, and web interaction capabilities
+- Hierarchical agent collaboration for complex and dynamic task scenarios
+- Extensible agent system, allowing easy integration of additional specialized agents
+- Automated information analysis, research, and web interaction capabilities
+- Sandboxed Python code execution for tools, with configurable import controls, restricted built-ins, attribute access limits, and resource limits (see [PythonInterpreterTool Sandboxing](./docs/python_interpreter_sandbox.md) for details)
+
 
 ## Updates
 
diff --git a/docs/python_interpreter_sandbox.md b/docs/python_interpreter_sandbox.md
@@ -0,0 +1,34 @@
+# PythonInterpreterTool Sandboxing
+
+The `PythonInterpreterTool` lets the agent execute Python code in a controlled environment. To reduce the risk of malicious or unintended actions, the tool combines several sandboxing mechanisms.
+
+## Core Sandboxing Principles
+
+1.  **Custom AST Evaluator:** Instead of passing user code to Python's `eval()` or `exec()`, the tool parses it into an Abstract Syntax Tree (AST) and walks the tree, evaluating each node with a dedicated handler. Node types without a handler are rejected, which gives fine-grained control over which operations are permitted.
+
+2.  **Import Control:**
+    *   **Allowlist:** The tool uses an allowlist (`authorized_imports`) to specify which Python modules can be imported. Imports of modules not on this list are blocked.
+    *   **Granular Control:** The allowlist can authorize specific submodules (e.g., `os.path`) without authorizing the entire parent module (e.g., `os`). An entry ending in `.*` authorizes that module and all of its submodules.
+    *   **Safe Module Copying:** When a module is imported, the tool builds a "safe copy". This is a new module object that references the original's non-module attributes and includes a submodule attribute only if its dotted path is authorized. Unauthorized submodules are left out, so disallowed code cannot be reached indirectly through an allowed module.
+    *   **Configuration:** The `authorized_imports` list is configurable when an instance of `PythonInterpreterTool` is created. The default list includes common safe modules like `math`.
+
+3.  **Restricted Built-ins and Functions:**
+    *   Only a curated set of Python built-in functions (`BASE_PYTHON_TOOLS`) is available by default (e.g., `len()`, `str()`, `range()`, math functions).
+    *   Known dangerous functions (e.g., `eval`, `exec`, `open`, `os.system`, `subprocess.call`) are listed in `DANGEROUS_FUNCTIONS` and cannot be called even if they belong to an allowed module. The `safer_eval` wrapper enforces this as a second line of defense.
+
+4.  **Attribute Access Control:**
+    *   Direct access to "dunder" attributes (e.g., `object.__dict__`, `function.__globals__`, `object.__subclasses__`) is blocked. This helps prevent introspection and manipulation of internal state, a common route to sandbox escapes.
+
+5.  **Resource Limits:**
+    *   The interpreter caps the total number of operations (`MAX_OPERATIONS`) and the number of `while`-loop iterations (`MAX_WHILE_ITERATIONS`). This prevents denial of service through infinite loops or overly expensive computations.
+
+6.  **Unsupported Operations:**
+    *   Some Python features are hard to sandbox or pose security risks, for example the `global` and `nonlocal` keywords, or direct memory manipulation via `ctypes` unless it is explicitly allowed. The custom AST evaluator generally does not support them and raises an error.
+
+## Security Considerations for Developers
+
+*   **Review `authorized_imports`:** When using the `PythonInterpreterTool`, carefully consider which modules are truly necessary for the intended tasks and restrict the `authorized_imports` list accordingly.
+*   **Least Privilege:** Grant only the minimum necessary permissions. Entries in `authorized_imports` name modules and submodules, not individual functions. A task that only needs `math.sqrt` therefore still needs `math` authorized. This is usually acceptable for standard library modules like `math`, whose functions are pure computations.
+*   **Tool Output:** Be mindful that the output of the executed code (both return values and printed text) can contain sensitive information if the code handles such data.
+
+By combining these mechanisms, the `PythonInterpreterTool` aims to provide a reasonably safe environment for executing Python code generated by LLMs or other sources, while still offering significant computational capabilities.
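Several of the sandboxing principles above can be illustrated with a minimal sketch of an AST-walking evaluator. The sketch rejects unhandled node types, resolves names only from an allowlist, blocks dunder attributes, and counts operations and `while`-loop iterations, all in one evaluation loop.

This is not the `local_python_executor.py` implementation. `TinyEvaluator`, `SandboxError`, `SAFE_BUILTINS` and the default limits are hypothetical names and values chosen for illustration. Only a handful of node types are handled, and every import is rejected outright rather than checked against an allowlist.

```python
import ast
import operator
from typing import Any, Callable, Dict


class SandboxError(Exception):
    """Raised when evaluated code uses something this sketch does not allow."""


# Hypothetical stand-in for BASE_PYTHON_TOOLS: the only callables code can reach.
SAFE_BUILTINS: Dict[str, Callable[..., Any]] = {"len": len, "range": range, "str": str, "abs": abs}
BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
CMP_OPS = {ast.Lt: operator.lt, ast.Gt: operator.gt, ast.Eq: operator.eq}


class TinyEvaluator(ast.NodeVisitor):
    """Evaluates code node by node; node types without a visit_* method are rejected."""

    def __init__(self, max_ops: int = 1_000, max_while: int = 100) -> None:
        self.max_ops = max_ops      # plays the role of MAX_OPERATIONS (value assumed)
        self.max_while = max_while  # plays the role of MAX_WHILE_ITERATIONS (value assumed)
        self.ops = 0
        self.env: Dict[str, Any] = {}

    def run(self, source: str) -> Any:
        result = None
        for stmt in ast.parse(source).body:
            result = self.visit(stmt)
        return result  # value of the last statement, None for non-expressions

    def visit(self, node: ast.AST) -> Any:
        # Every evaluated node counts against the operation budget.
        self.ops += 1
        if self.ops > self.max_ops:
            raise SandboxError("operation limit exceeded")
        return super().visit(node)

    def generic_visit(self, node: ast.AST) -> Any:
        # Import, Global, Nonlocal, Lambda, For, ... all land here and are refused.
        raise SandboxError(f"{type(node).__name__} is not supported")

    def visit_Expr(self, node: ast.Expr) -> Any:
        return self.visit(node.value)

    def visit_Constant(self, node: ast.Constant) -> Any:
        return node.value

    def visit_Name(self, node: ast.Name) -> Any:
        if node.id in self.env:
            return self.env[node.id]
        if node.id in SAFE_BUILTINS:
            return SAFE_BUILTINS[node.id]
        # open, eval, exec, __import__, ... are simply not reachable.
        raise SandboxError(f"name {node.id!r} is not allowed")

    def visit_Attribute(self, node: ast.Attribute) -> Any:
        if node.attr.startswith("__") and node.attr.endswith("__"):
            raise SandboxError(f"dunder attribute {node.attr!r} is blocked")
        return getattr(self.visit(node.value), node.attr)

    def visit_BinOp(self, node: ast.BinOp) -> Any:
        op = BIN_OPS.get(type(node.op))
        if op is None:
            raise SandboxError(f"{type(node.op).__name__} is not supported")
        return op(self.visit(node.left), self.visit(node.right))

    def visit_Compare(self, node: ast.Compare) -> Any:
        op = CMP_OPS.get(type(node.ops[0])) if len(node.ops) == 1 else None
        if op is None:
            raise SandboxError("only single <, > or == comparisons are supported")
        return op(self.visit(node.left), self.visit(node.comparators[0]))

    def visit_Call(self, node: ast.Call) -> Any:
        if node.keywords:
            raise SandboxError("keyword arguments are not supported")
        func = self.visit(node.func)
        return func(*[self.visit(arg) for arg in node.args])

    def visit_Assign(self, node: ast.Assign) -> None:
        if len(node.targets) != 1 or not isinstance(node.targets[0], ast.Name):
            raise SandboxError("only `name = value` assignments are supported")
        self.env[node.targets[0].id] = self.visit(node.value)

    def visit_While(self, node: ast.While) -> None:
        if node.orelse:
            raise SandboxError("while/else is not supported")
        iterations = 0
        while self.visit(node.test):
            iterations += 1
            if iterations > self.max_while:
                raise SandboxError("while-loop iteration limit exceeded")
            for stmt in node.body:
                self.visit(stmt)
```

Subclassing `ast.NodeVisitor` keeps dispatch simple. `visit()` looks up `visit_<NodeType>` and falls back to `generic_visit()`, which this sketch overrides to refuse the node instead of recursing into its children. For example, `TinyEvaluator().run("x = 2\nx * 3 + 1")` returns `7`, while `TinyEvaluator().run("(1).__class__")` raises `SandboxError`.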
diff --git a/src/tools/executor/local_python_executor.py b/src/tools/executor/local_python_executor.py
@@ -129,6 +129,27 @@ def nodunder_getattr(obj, name, default=None):
     "socket",
     "subprocess",
     "sys",
+    "ctypes",
+    "fcntl",
+    "grp",
+    "pwd",
+    # "resource",  # Potentially too restrictive, can be added if specific issues arise
+    "signal",
+    # "syslog",    # Application-specific, might be needed
+    "termios",
+    "tty",
+    "select",
+    "gc",        # Added (can be used to inspect arbitrary objects)
+    "_thread",   # Added (low-level threading)
+    "asyncio",   # Added (can be used for I/O, networking, subprocesses)
+    "marshal",   # Added (can be used to create code objects)
+    "msvcrt",    # Added (Windows specific low-level routines)
+    "pickle",    # Added (can execute arbitrary code)
+    "pipes",     # Added (shell command pipelines)
+    "posix",     # Added (alias for os functions)
+    "threading", # Added (while less dangerous than _thread, still needs caution)
+    "wsgiref",   # Added (can start web servers)
+    "xmlrpc",    # Added (can make network requests)
 ]
 
 DANGEROUS_FUNCTIONS = [
@@ -138,9 +159,29 @@ def nodunder_getattr(obj, name, default=None):
     "builtins.globals",
     "builtins.locals",
     "builtins.__import__",
+    "builtins.open",
+    "builtins.getattr",
+    "builtins.setattr",
+    "builtins.delattr",
+    "builtins.vars",
     "os.popen",
     "os.system",
+    "os.execl", "os.execle", "os.execlp", "os.execlpe", "os.execv", "os.execve", "os.execvp", "os.execvpe",
+    "os.fork", "os.forkpty",
+    "os.kill", "os.killpg",
+    "os.plock",
+    "os.putenv", "os.unsetenv",
+    "os.spawnl", "os.spawnle", "os.spawnlp", "os.spawnlpe", "os.spawnv", "os.spawnve", "os.spawnvp", "os.spawnvpe",
     "posix.system",
+    "subprocess.call", "subprocess.check_call", "subprocess.check_output", "subprocess.Popen", "subprocess.run",
+    "sys.exit", "sys.gettrace", "sys.settrace", "sys.meta_path", "sys.path_hooks", "sys.path_importer_cache",
+    "shutil.copy", "shutil.copy2", "shutil.copyfile", "shutil.copyfileobj", "shutil.copymode", "shutil.copystat", "shutil.copytree",
+    "shutil.move", "shutil.rmtree",
+    "socket.socket",
+    "pickle.load", "pickle.loads",
+    "ctypes.CDLL", "ctypes.PyDLL", "ctypes.WinDLL",
+    "gc.get_objects", "gc.get_referrers", "gc.get_referents",
+    # "object.__subclasses__", # Relies on nodunder_getattr
 ]
 
 
@@ -233,14 +274,17 @@ def build_import_tree(authorized_imports: List[str]) -> Dict[str, Any]:
 
 
 def check_import_authorized(import_to_check: str, authorized_imports: list[str]) -> bool:
-    current_node = build_import_tree(authorized_imports)
-    for part in import_to_check.split("."):
+    tree = build_import_tree(authorized_imports)
+    current_node = tree
+    parts = import_to_check.split(".")
+    for part in parts:
         if "*" in current_node:
             return True
         if part not in current_node:
             return False
         current_node = current_node[part]
-    return True
+
+    return import_to_check in authorized_imports or not current_node or "*" in current_node
 
 
 def safer_eval(func: Callable):
@@ -1097,39 +1141,42 @@ def evaluate_with(
 
 
 def get_safe_module(raw_module, authorized_imports, visited=None):
-    """Creates a safe copy of a module or returns the original if it's a function"""
-    # If it's a function or non-module object, return it directly
     if not isinstance(raw_module, ModuleType):
         return raw_module
 
-    # Handle circular references: Initialize visited set for the first call
     if visited is None:
         visited = set()
 
     module_id = id(raw_module)
     if module_id in visited:
-        return raw_module  # Return original for circular refs
+        return raw_module
 
     visited.add(module_id)
 
-    # Create new module for actual modules
+    # Check authorization for the module itself before proceeding
+    if not check_import_authorized(raw_module.__name__, authorized_imports):
+        raise InterpreterError(f"Import of module {raw_module.__name__} is not allowed.")
+
     safe_module = ModuleType(raw_module.__name__)
 
-    # Copy all attributes by reference, recursively checking modules
     for attr_name in dir(raw_module):
         try:
             attr_value = getattr(raw_module, attr_name)
         except (ImportError, AttributeError) as e:
-            # lazy / dynamic loading module -> INFO log and skip
             logger.info(
                 f"Skipping import error while copying {raw_module.__name__}.{attr_name}: {type(e).__name__} - {e}"
             )
             continue
-        # Recursively process nested modules, passing visited set
-        if isinstance(attr_value, ModuleType):
-            attr_value = get_safe_module(attr_value, authorized_imports, visited=visited)
 
-        setattr(safe_module, attr_name, attr_value)
+        if isinstance(attr_value, ModuleType):
+            submodule_full_name = f"{raw_module.__name__}.{attr_name}"
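+            # Only add submodules whose attribute path and real module name are both authorized
+            if check_import_authorized(submodule_full_name, authorized_imports) and check_import_authorized(attr_value.__name__, authorized_imports):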
+                processed_attr_value = get_safe_module(attr_value, authorized_imports, visited=visited)
+                setattr(safe_module, attr_name, processed_attr_value)
+            # Else: unauthorized submodule, so we don't add it to safe_module
+        else:
+            setattr(safe_module, attr_name, attr_value)
 
     return safe_module
 
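The import-authorization and module-copying logic changed above can be exercised outside the interpreter. The sketch below is a from-scratch re-implementation for illustration, not the committed code:

- `build_import_tree` is not shown in this diff, so its nested-dict shape is an assumption.
- `safe_copy` is a hypothetical, simplified stand-in for `get_safe_module`. It omits the top-level authorization check and the error logging.
- The sketch accepts a name if it is an exact allowlist entry, or if the walk through the tree ends on a leaf or passes a `*` node.

```python
from types import ModuleType
from typing import Any, Dict, List, Optional


def build_import_tree(authorized_imports: List[str]) -> Dict[str, Any]:
    # Assumed shape: ["os.path", "numpy.*"] -> {"os": {"path": {}}, "numpy": {"*": {}}}
    tree: Dict[str, Any] = {}
    for entry in authorized_imports:
        node = tree
        for part in entry.split("."):
            node = node.setdefault(part, {})
    return tree


def check_import_authorized(import_to_check: str, authorized_imports: List[str]) -> bool:
    if import_to_check in authorized_imports:
        return True  # an exact allowlist entry always counts
    node = build_import_tree(authorized_imports)
    for part in import_to_check.split("."):
        if "*" in node:
            return True  # a wildcard authorizes everything below it
        if part not in node:
            return False
        node = node[part]
    # The walk must end on a leaf (a complete entry) or on a wildcard node.
    return not node or "*" in node


def safe_copy(module: ModuleType, authorized_imports: List[str],
              memo: Optional[Dict[int, ModuleType]] = None) -> ModuleType:
    """Copy `module`, keeping submodules only if their dotted path is authorized."""
    memo = {} if memo is None else memo
    if id(module) in memo:
        return memo[id(module)]  # cycle: reuse the copy that is already being built
    clone = ModuleType(module.__name__)
    memo[id(module)] = clone
    for name, value in vars(module).items():
        if isinstance(value, ModuleType):
            if check_import_authorized(f"{module.__name__}.{name}", authorized_imports):
                setattr(clone, name, safe_copy(value, authorized_imports, memo))
            # Unauthorized submodules are simply left out of the copy.
        else:
            setattr(clone, name, value)  # functions, classes, constants: by reference
    return clone
```

The sketch keeps a memo of already-copied modules instead of returning the original module when a cycle is detected. A cycle therefore resolves to the pruned copy rather than handing back an unpruned module.

Example: with `["pkg", "pkg.sub"]` authorized, the copy of a module `pkg` keeps `pkg.sub` and its plain attributes. It drops any other submodule, such as `pkg.secret`.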
diff --git a/tests/test_local_python_executor.py b/tests/test_local_python_executor.py