[Feature] XMLAdapter fails on nested models & repeated tags due to one-level regex parser #8481

Open
@Bhuvanesh09

What feature would you like to see?

TL;DR

  • XMLAdapter only handles flat key/value XML.
  • Nested models, lists or any repeated tags raise AdapterParseError.
  • Root cause: a non-greedy, one-level regex (<(\w+)>(.*?)</\1>) with no recursion, so the content of a nested tag is captured as one raw string.
  • Effect: structured-data extraction use cases that rely on XMLAdapter are broken in DSPy.
  • I am preparing a PR that rewrites the parser to perform proper XML traversal; details & full analysis are linked below.
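
To make the root cause concrete, here is a minimal sketch that applies the quoted pattern with Python's re module. It is not DSPy's actual code: flat_parse is a hypothetical stand-in for a one-level parser, and the re.DOTALL flag is an assumption.

```python
import re

# Pattern quoted in the issue. re.DOTALL is an assumption so "." can span newlines.
ONE_LEVEL = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def flat_parse(text: str) -> dict[str, str]:
    """Hypothetical one-level parser: maps each top-level tag to its raw inner text."""
    return {tag: body.strip() for tag, body in ONE_LEVEL.findall(text)}


nested = "<person><name>John Doe</name><age>30</age></person>"
repeated = "<tag>a</tag><tag>b</tag>"

# The inner tags are never visited; they come back as one unparsed string.
nested_result = flat_parse(nested)

# Repeated tags collide on the same dict key, so only the last value survives.
repeated_result = flat_parse(repeated)

print(nested_result)
print(repeated_result)
```

In this sketch the nested case yields a raw XML string where a Person is expected, and the repeated case silently drops "a". Either outcome would make a later type conversion fail.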

Minimal Reproduction

from typing import List
from pydantic import BaseModel
import dspy

class Person(BaseModel):
    name: str
    age: int

class PersonExtraction(dspy.Signature):
    text: str = dspy.InputField()
    person: Person = dspy.OutputField()

lm_output = """
<person>
  <name>John Doe</name>
  <age>30</age>
</person>
"""

# parse() takes the signature class and the raw completion string.
dspy.XMLAdapter().parse(PersonExtraction, lm_output)   # Raises AdapterParseError

Expected result

{"person": Person(name="John Doe", age=30)}

Actual result

AdapterParseError: Failed to parse LM response …

Environment
DSPy            : 3.0.0b1
Python          : 3.11.
OS              : macOS (Apple Silicon)

Impact

  • Any signature with nested Pydantic models or List[...] cannot be parsed.
  • Few-shot examples generated by format_field_with_value teach the LM an incorrect JSON-in-XML format, compounding errors.
  • Users are pushed to JSONAdapter even when XML would be preferable.
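
The difference between the two few-shot formats can be sketched as follows. Both helpers are hypothetical illustrations of the shapes described above, not the current or proposed format_field_with_value implementation, and a plain dict stands in for the Pydantic model.

```python
import json
from typing import Any
from xml.sax.saxutils import escape


def to_json_in_xml(name: str, value: Any) -> str:
    """JSON-in-XML shape: a JSON blob wrapped in a single tag."""
    return f"<{name}>{json.dumps(value)}</{name}>"


def to_canonical_xml(name: str, value: Any) -> str:
    """Hypothetical canonical shape: dicts become child tags, lists repeat the tag."""
    if isinstance(value, dict):
        inner = "".join(to_canonical_xml(key, item) for key, item in value.items())
        return f"<{name}>{inner}</{name}>"
    if isinstance(value, list):
        return "".join(to_canonical_xml(name, item) for item in value)
    # Scalars are stringified and XML-escaped.
    return f"<{name}>{escape(str(value))}</{name}>"


person = {"name": "John Doe", "age": 30}
json_style = to_json_in_xml("person", person)
xml_style = to_canonical_xml("person", person)

print(json_style)
print(xml_style)
```

A demonstration in the first shape shows the LM JSON inside a tag, while the second shape matches the nested XML that the parser is expected to read back.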

Proposed Fix

Replace the one-level regex with an XML traversal (e.g., xml.etree.ElementTree) that:

  • Recursively walks the tree,
  • Builds Python containers (dict, list) matching the signature,
  • Delegates scalar conversion to parse_value.

Update format_field_with_value to emit canonical XML rather than JSON-in-XML.
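
A rough sketch of the traversal described above, using xml.etree.ElementTree from the standard library. The function names element_to_python and parse_xml_fields are hypothetical and this is not the implementation from the planned PR. It assumes well-formed XML and leaves every leaf as a string, so scalar conversion (for example "30" to int) would still be a separate step.

```python
import xml.etree.ElementTree as ET
from typing import Any


def element_to_python(elem: ET.Element) -> Any:
    """Leaf elements become stripped text, elements with children become dicts,
    and sibling tags that repeat are collected into a list."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    out: dict[str, Any] = {}
    for child in children:
        value = element_to_python(child)
        if child.tag not in out:
            out[child.tag] = value
        elif isinstance(out[child.tag], list):
            out[child.tag].append(value)
        else:
            # Second occurrence of the same tag: promote the entry to a list.
            out[child.tag] = [out[child.tag], value]
    return out


def parse_xml_fields(text: str) -> Any:
    # Wrap in a synthetic root because the LM output may hold several top-level fields.
    # ET.fromstring raises ET.ParseError if the text is not well-formed XML.
    root = ET.fromstring(f"<root>{text}</root>")
    return element_to_python(root)


lm_output = """
<person>
  <name>John Doe</name>
  <age>30</age>
</person>
<tags><tag>a</tag><tag>b</tag></tags>
"""
parsed = parse_xml_fields(lm_output)
print(parsed)
```

One limitation of this sketch: a list with a single element is indistinguishable from a scalar without consulting the signature's type annotations, which is why the proposal builds containers that match the signature.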

Add unit tests covering:

  • Nested models,
  • Repeated tags → lists,
  • Mixed scalar & container fields.

I have an implementation that passes these tests and will open a draft PR once this issue is acknowledged.

Would you like to contribute?

  • Yes, I'd like to help implement this.
  • No, I just want to request it.

Additional Context

No response
