Merge pull request #24 from imnotfancy/feat/strengthen-python-sandbox

DVampire · web-flow · commit 58c4e4731b70 · 2025-06-04T10:56:26.000+08:00
feat: Strengthen PythonInterpreterTool sandboxing
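
The stricter return value in `check_import_authorized` is the core behavioral change here, so a minimal standalone sketch of it may help reviewers. `build_tree` and `is_authorized` are hypothetical names, and the nested-dict tree shape is an assumption, since `build_import_tree` itself is not part of this diff:

```python
from typing import Any, Dict, List


def build_tree(authorized: List[str]) -> Dict[str, Any]:
    """Hypothetical stand-in for build_import_tree (assumed shape: nested dicts
    keyed by the dot-separated parts of each authorized import)."""
    tree: Dict[str, Any] = {}
    for name in authorized:
        node = tree
        for part in name.split("."):
            node = node.setdefault(part, {})
    return tree


def is_authorized(name: str, authorized: List[str]) -> bool:
    """Mirrors the updated check: walk the tree part by part, then accept only
    if the walk ended on a leaf (exact entry) or on a node holding a wildcard."""
    node = build_tree(authorized)
    for part in name.split("."):
        if "*" in node:
            return True  # a wildcard at this level authorizes everything below
        if part not in node:
            return False
        node = node[part]
    # Previously the check returned True here unconditionally, so a parent such
    # as "os" passed whenever any "os.<submodule>" entry was authorized.
    return not node or "*" in node
```

Under these assumptions, `is_authorized("os.path", ["os.path"])` is true while `is_authorized("os", ["os.path"])` is false, which is the behavior the new submodule tests rely on. Note that in this sketch `is_authorized("collections", ["collections", "collections.abc"])` is also false, because the `collections` node is no longer a leaf once a child entry exists; whether the real tree behaves the same way depends on `build_import_tree`.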
diff --git a/README.md b/README.md
@@ -32,6 +32,7 @@ The system adopts a two-layer structure:
 - Hierarchical agent collaboration for complex and dynamic task scenarios
 - Extensible agent system, allowing easy integration of additional specialized agents
 - Automated information analysis, research, and web interaction capabilities
+- Secure Python code execution environment for tools, featuring configurable import controls, restricted built-ins, attribute access limitations, and resource limits (see [PythonInterpreterTool Sandboxing](./docs/python_interpreter_sandbox.md) for details).
   
 
 ## Updates
diff --git a/docs/python_interpreter_sandbox.md b/docs/python_interpreter_sandbox.md
@@ -0,0 +1,34 @@
+# PythonInterpreterTool Sandboxing
+
+The `PythonInterpreterTool` allows the agent to execute Python code in a controlled environment. To ensure safety and prevent malicious or unintended actions, the tool employs several sandboxing mechanisms.
+
+## Core Sandboxing Principles
+
+1.  **Custom AST Evaluator:** Instead of using Python's `eval()` or `exec()` directly on user code, the tool parses the code into an Abstract Syntax Tree (AST) and then walks through this tree, evaluating nodes one by one in a controlled manner. This allows fine-grained interception and control over what operations are permitted.
+
+2.  **Import Control:**
+    *   **Allowlist:** The tool uses an allowlist (`authorized_imports`) to specify which Python modules can be imported. Attempts to import modules not on this list will be blocked.
+    *   **Granular Control:** The allowlist can be configured to allow specific submodules (e.g., `os.path`) without allowing the entire parent module (e.g., `os`).
+    *   **Safe Module Copying:** When a module is imported, a "safe copy" is created. This process inspects the module and its submodules, pruning any parts that are not explicitly authorized, to prevent indirect importing of disallowed code.
+    *   **Configuration:** The `authorized_imports` list is configurable when an instance of `PythonInterpreterTool` is created. The default list includes common safe modules like `math`.
+
+3.  **Restricted Built-ins and Functions:**
+    *   Only a curated list of Python built-in functions (`BASE_PYTHON_TOOLS`) is available by default (e.g., `len()`, `str()`, `range()`, math functions).
+    *   Known dangerous functions (e.g., `eval`, `exec`, `open`, `os.system`, `subprocess.call`) are explicitly blacklisted (`DANGEROUS_FUNCTIONS`) and cannot be called even if they are part of an allowed module (defense in depth via the `safer_eval` mechanism).
+
+4.  **Attribute Access Control:**
+    *   Direct access to "dunder" attributes (e.g., `object.__dict__`, `function.__globals__`, `object.__subclasses__`) is blocked. This helps prevent introspection and manipulation of internal states that could lead to sandbox escapes.
+
+5.  **Resource Limits:**
+    *   The interpreter imposes limits on the number of operations (`MAX_OPERATIONS`) and loop iterations (`MAX_WHILE_ITERATIONS`) to prevent denial-of-service attacks through infinite loops or overly complex computations.
+
+6.  **Unsupported Operations:**
+    *   Python features that are complex to sandbox or pose security risks (e.g., the `global` and `nonlocal` keywords, or direct memory manipulation via `ctypes` unless explicitly allowed) are generally not supported by the custom AST evaluator and will result in errors.
+
+## Security Considerations for Developers
+
+*   **Review `authorized_imports`:** When using the `PythonInterpreterTool`, carefully consider which modules are truly necessary for the intended tasks and restrict the `authorized_imports` list accordingly.
+*   **Least Privilege:** Grant only the minimum necessary permissions. If a task only needs `math.sqrt`, consider whether authorizing the entire `math` module is acceptable or whether more granular control is needed (though typically, standard library modules like `math` are safe if the functions themselves are not dangerous).
+*   **Tool Output:** Be mindful that the output of the executed code (both return values and print statements) could potentially contain sensitive information if the code handles such data.
+
+By combining these mechanisms, the `PythonInterpreterTool` aims to provide a reasonably safe environment for executing Python code generated by LLMs or other sources, while still offering significant computational capabilities.
diff --git a/src/tools/executor/local_python_executor.py b/src/tools/executor/local_python_executor.py
@@ -129,6 +129,27 @@ def nodunder_getattr(obj, name, default=None):
     "socket",
     "subprocess",
     "sys",
+    "ctypes",
+    "fcntl",
+    "grp",
+    "pwd",
+    # "resource",  # Potentially too restrictive, can be added if specific issues arise
+    "signal",
+    # "syslog",    # Application-specific, might be needed
+    "termios",
+    "tty",
+    "select",
+    "gc",        # Added (can be used to inspect arbitrary objects)
+    "_thread",   # Added (low-level threading)
+    "asyncio",   # Added (can be used for I/O, networking, subprocesses)
+    "marshal",   # Added (can be used to create code objects)
+    "msvcrt",    # Added (Windows specific low-level routines)
+    "pickle",    # Added (can execute arbitrary code)
+    "pipes",     # Added (shell command pipelines)
+    "posix",     # Added (alias for os functions)
+    "threading", # Added (while less dangerous than _thread, still needs caution)
+    "wsgiref",   # Added (can start web servers)
+    "xmlrpc",    # Added (can make network requests)
 ]
 
 DANGEROUS_FUNCTIONS = [
@@ -138,9 +159,29 @@ def nodunder_getattr(obj, name, default=None):
     "builtins.globals",
     "builtins.locals",
     "builtins.__import__",
+    "builtins.open",
+    "builtins.getattr",
+    "builtins.setattr",
+    "builtins.delattr",
+    "builtins.vars",
     "os.popen",
     "os.system",
+    "os.execl", "os.execle", "os.execlp", "os.execlpe", "os.execv", "os.execve", "os.execvp", "os.execvpe",
+    "os.fork", "os.forkpty",
+    "os.kill", "os.killpg",
+    "os.plock",
+    "os.putenv", "os.unsetenv",
+    "os.spawnl", "os.spawnle", "os.spawnlp", "os.spawnlpe", "os.spawnv", "os.spawnve", "os.spawnvp", "os.spawnvpe",
     "posix.system",
+    "subprocess.call", "subprocess.check_call", "subprocess.check_output", "subprocess.Popen", "subprocess.run",
+    "sys.exit", "sys.gettrace", "sys.settrace", "sys.meta_path", "sys.path_hooks", "sys.path_importer_cache",
+    "shutil.copy", "shutil.copy2", "shutil.copyfile", "shutil.copyfileobj", "shutil.copymode", "shutil.copystat", "shutil.copytree",
+    "shutil.move", "shutil.rmtree",
+    "socket.socket",
+    "pickle.load", "pickle.loads",
+    "ctypes.CDLL", "ctypes.PyDLL", "ctypes.WinDLL",
+    "gc.get_objects", "gc.get_referrers", "gc.get_referents",
+    # "object.__subclasses__", # Relies on nodunder_getattr
 ]
 
 
@@ -233,14 +274,17 @@ def build_import_tree(authorized_imports: List[str]) -> Dict[str, Any]:
 
 
 def check_import_authorized(import_to_check: str, authorized_imports: list[str]) -> bool:
-    current_node = build_import_tree(authorized_imports)
-    for part in import_to_check.split("."):
+    tree = build_import_tree(authorized_imports)
+    current_node = tree
+    parts = import_to_check.split(".")
+    for part in parts:
         if "*" in current_node:
             return True
         if part not in current_node:
             return False
         current_node = current_node[part]
-    return True
+
+    return not current_node or "*" in current_node
 
 
 def safer_eval(func: Callable):
@@ -1097,39 +1141,42 @@ def evaluate_with(
 
 
 def get_safe_module(raw_module, authorized_imports, visited=None):
-    """Creates a safe copy of a module or returns the original if it's a function"""
-    # If it's a function or non-module object, return it directly
     if not isinstance(raw_module, ModuleType):
         return raw_module
 
-    # Handle circular references: Initialize visited set for the first call
     if visited is None:
         visited = set()
 
     module_id = id(raw_module)
     if module_id in visited:
-        return raw_module  # Return original for circular refs
+        return raw_module
 
     visited.add(module_id)
 
-    # Create new module for actual modules
+    # Check authorization for the module itself before proceeding
+    if not check_import_authorized(raw_module.__name__, authorized_imports):
+        raise InterpreterError(f"Import of module {raw_module.__name__} is not allowed.")
+
     safe_module = ModuleType(raw_module.__name__)
 
-    # Copy all attributes by reference, recursively checking modules
     for attr_name in dir(raw_module):
         try:
             attr_value = getattr(raw_module, attr_name)
         except (ImportError, AttributeError) as e:
-            # lazy / dynamic loading module -> INFO log and skip
             logger.info(
                 f"Skipping import error while copying {raw_module.__name__}.{attr_name}: {type(e).__name__} - {e}"
             )
             continue
-        # Recursively process nested modules, passing visited set
-        if isinstance(attr_value, ModuleType):
-            attr_value = get_safe_module(attr_value, authorized_imports, visited=visited)
 
-        setattr(safe_module, attr_name, attr_value)
+        if isinstance(attr_value, ModuleType):
+            submodule_full_name = f"{raw_module.__name__}.{attr_name}"
+            # Only add authorized submodules
+            if check_import_authorized(submodule_full_name, authorized_imports):
+                processed_attr_value = get_safe_module(attr_value, authorized_imports, visited=visited)
+                setattr(safe_module, attr_name, processed_attr_value)
+            # Else: unauthorized submodule, so we don't add it to safe_module
+        else:
+            setattr(safe_module, attr_name, attr_value)
 
     return safe_module
 
diff --git a/tests/test_local_python_executor.py b/tests/test_local_python_executor.py
@@ -0,0 +1,219 @@
+import unittest
+from src.tools.executor.local_python_executor import evaluate_python_code, InterpreterError, BASE_PYTHON_TOOLS, BASE_BUILTIN_MODULES, DEFAULT_MAX_LEN_OUTPUT
+
+# It's good practice to define a small, fixed list for default authorized_imports in tests
+# unless a test specifically needs to modify it.
+TEST_DEFAULT_AUTHORIZED_IMPORTS = ["math"] # Example, can be empty if preferred for stricter tests
+
+class TestPythonInterpreterSandbox(unittest.TestCase):
+
+    def setUp(self):
+        # These are defaults for the tools/state available during evaluation.
+        # Tests can override state or custom_tools if needed.
+        self.static_tools = BASE_PYTHON_TOOLS.copy()
+        self.custom_tools = {}
+        # self.state is not defined here, as evaluate_python_code takes state as an argument
+        # and it's better to pass a fresh state for each test call to avoid interference.
+
+    def _evaluate(self, code, authorized_imports=None, state=None):
+        if authorized_imports is None:
+            authorized_imports = list(TEST_DEFAULT_AUTHORIZED_IMPORTS) # Use a copy
+
+        current_state = state if state is not None else {}
+
+        # evaluate_python_code returns (result, is_final_answer)
+        return evaluate_python_code(
+            code,
+            static_tools=self.static_tools,
+            custom_tools=self.custom_tools, # Pass along self.custom_tools
+            state=current_state,            # Pass along current_state
+            authorized_imports=authorized_imports,
+            max_print_outputs_length=DEFAULT_MAX_LEN_OUTPUT
+        )
+
+    # === Import Tests ===
+    def test_import_disallowed_module_direct(self):
+        with self.assertRaisesRegex(InterpreterError, "Import of os is not allowed"):
+            self._evaluate("import os", authorized_imports=[])
+
+    def test_import_disallowed_module_from(self):
+        with self.assertRaisesRegex(InterpreterError, "Import from os is not allowed"):
+            self._evaluate("from os import path", authorized_imports=[])
+
+    def test_import_allowed_module(self):
+        result, _ = self._evaluate("import math; x = math.sqrt(4)", authorized_imports=["math"])
+        self.assertEqual(result, 2.0)
+
+    def test_import_submodule_allowed_implicitly(self):
+        # If 'collections' is allowed, 'collections.abc' should be usable via attribute access.
+        # The get_safe_module ensures submodules are also checked if they were explicitly imported.
+        # This test checks if 'collections.abc' can be accessed if 'collections' is authorized.
+        # The updated get_safe_module will try to check 'collections.abc' when 'collections' is processed.
+        # So, 'collections.abc' must also be in authorized_imports or match a wildcard like 'collections.*'
+        # For this test, let's authorize both specifically.
+        result, _ = self._evaluate("import collections; c = collections.abc.Callable", authorized_imports=["collections", "collections.abc"])
+        # Check that 'c' is indeed the Callable type from collections.abc
+        import collections.abc as abc_module
+        self.assertIs(result, abc_module.Callable)
+
+
+    def test_import_only_specific_submodule_denies_parent_access(self):
+        # Authorizing only "os.path" must not implicitly authorize the parent module "os".
+        # This exercises the precision of check_import_authorized:
+        # with only "os.path" authorized, "import os" itself should fail.
+        with self.assertRaisesRegex(InterpreterError, "Import of os is not allowed"):
+            self._evaluate("import os; os.listdir('.')", authorized_imports=["os.path"])
+
+    def test_import_authorized_submodule_directly(self):
+        result, _ = self._evaluate("import os.path; x = os.path.basename('/a/b')", authorized_imports=["os.path"])
+        self.assertEqual(result, "b")
+
+    def test_import_from_authorized_submodule(self):
+        result, _ = self._evaluate("from os.path import basename; x = basename('/a/b')", authorized_imports=["os.path"])
+        self.assertEqual(result, "b")
+
+    # === Dangerous Function Call Tests ===
+    def test_call_dangerous_builtin_function_eval(self):
+        with self.assertRaisesRegex(InterpreterError, "Forbidden access to function: eval"):
+            self._evaluate("eval('1+1')")
+
+    def test_call_dangerous_builtin_function_exec(self):
+        with self.assertRaisesRegex(InterpreterError, "Forbidden access to function: exec"):
+            self._evaluate("exec('a=1')")
+
+    def test_call_dangerous_os_function_system_via_import(self):
+        # This relies on 'os' module itself being blocked from import.
+        with self.assertRaisesRegex(InterpreterError, "Import of os is not allowed"):
+            self._evaluate("import os; os.system('echo hello')")
+
+    def test_call_dangerous_function_if_module_was_somehow_allowed(self):
+        # If 'os' was authorized, safer_eval should still block 'os.system' if it's in DANGEROUS_FUNCTIONS
+        # This tests the defense in depth of safer_eval.
+        with self.assertRaisesRegex(InterpreterError, "Forbidden access to function: system"):
+            self._evaluate("import os; os.system('echo hello')", authorized_imports=["os"])
+
+
+    def test_call_allowed_builtin_function(self):
+        result, _ = self._evaluate("len([1,2,3])")
+        self.assertEqual(result, 3)
+
+    def test_call_function_returned_by_tool_if_dangerous(self):
+        # Mocking state to contain a dangerous function
+        current_state = {"my_dangerous_func": eval}
+        with self.assertRaisesRegex(InterpreterError, "Forbidden access to function: eval"):
+            self._evaluate("my_dangerous_func('1+1')", state=current_state, authorized_imports=[])
+
+
+    # === Dunder Attribute Access Tests ===
+    def test_access_disallowed_dunder_directly_on_dict(self):
+        with self.assertRaisesRegex(InterpreterError, "Forbidden access to dunder attribute: __dict__"):
+            self._evaluate("x = {}; x.__dict__")
+
+    def test_access_disallowed_dunder_directly_on_module(self):
+        # math.__loader__ is an example.
+        with self.assertRaisesRegex(InterpreterError, "Forbidden access to dunder attribute: __loader__"):
+            self._evaluate("import math; math.__loader__", authorized_imports=["math"])
+
+
+    def test_access_disallowed_dunder_via_getattr(self):
+        # getattr is nodunder_getattr in BASE_PYTHON_TOOLS
+        with self.assertRaisesRegex(InterpreterError, "Forbidden access to dunder attribute: __subclasses__"):
+            self._evaluate("x = type(0); getattr(x, '__subclasses__')")
+
+    def test_allowed_dunder_method_indirectly_len(self):
+        result, _ = self._evaluate("x = [1,2]; len(x)")
+        self.assertEqual(result, 2)
+
+    def test_allowed_dunder_method_indirectly_getitem(self):
+        result, _ = self._evaluate("x = [10,20]; x[1]")
+        self.assertEqual(result, 20)
+
+    # === AST Node Behavior Tests ===
+    def test_assign_to_static_tool_name_blocked(self):
+        with self.assertRaisesRegex(InterpreterError, "Cannot assign to name 'len'"):
+            self._evaluate("len = lambda x: x")
+
+    def test_lambda_executes_in_sandbox_blocks_import(self):
+        with self.assertRaisesRegex(InterpreterError, "Import of sys is not allowed"):
+            self._evaluate("f = lambda: __import__('sys'); f()", authorized_imports=[])
+
+    def test_def_function_executes_in_sandbox_blocks_import(self):
+        code = """
+def my_func():
+    import shutil # Disallowed
+    return shutil.disk_usage('.')
+my_func()
+"""
+        with self.assertRaisesRegex(InterpreterError, "Import of shutil is not allowed"):
+            self._evaluate(code, authorized_imports=[])
+
+    def test_class_def_executes_in_sandbox_blocks_import_in_init(self):
+        code = """
+class MyClass:
+    def __init__(self):
+        import subprocess # Disallowed
+        self.name = subprocess.call('echo')
+    def get_name(self):
+        return self.name
+x = MyClass()
+x.get_name()
+"""
+        with self.assertRaisesRegex(InterpreterError, "Import of subprocess is not allowed"):
+            self._evaluate(code, authorized_imports=[])
+
+    def test_class_def_executes_in_sandbox_blocks_import_in_method(self):
+        code = """
+class MyClassMethod:
+    def do_bad_stuff(self):
+        import _thread # Disallowed
+        return _thread.get_ident()
+x = MyClassMethod()
+x.do_bad_stuff()
+"""
+        with self.assertRaisesRegex(InterpreterError, "Import of _thread is not allowed"):
+            self._evaluate(code, authorized_imports=[])
+
+    def test_unsupported_ast_node_global_keyword(self):
+        code = """
+x = 0
+def f():
+    global x # ast.Global node
+    x = 1
+"""
+        with self.assertRaisesRegex(InterpreterError, "Global is not supported"):
+            self._evaluate(code)
+
+    def test_unsupported_ast_node_nonlocal_keyword(self):
+        code = """
+def f():
+    x = 1
+    def g():
+        nonlocal x # ast.Nonlocal node
+        x = 2
+    g()
+"""
+        with self.assertRaisesRegex(InterpreterError, "Nonlocal is not supported"):
+            self._evaluate(code)
+
+    def test_comprehension_sandbox_import(self):
+        with self.assertRaisesRegex(InterpreterError, "Import of os is not allowed"):
+            self._evaluate("[__import__('os') for i in range(1)]", authorized_imports=[])
+
+    def test_try_except_sandbox_import(self):
+        code = """
+try:
+    x = 1
+except Exception:
+    import os
+else:
+    import sys
+finally:
+    import subprocess
+"""
+        # The first import attempt (os) should be caught.
+        with self.assertRaisesRegex(InterpreterError, "Import of os is not allowed"):
+            self._evaluate(code, authorized_imports=[])
+
+
+if __name__ == "__main__":
+    unittest.main()