feat(adapter): rewrite XMLAdapter for nested-data support #8482

Open
wants to merge 3 commits into
base: main
from

Conversation

Bhuvanesh09

@Bhuvanesh09 Bhuvanesh09 commented Jul 2, 2025

Closes #8481

TL;DR

  • Replaces regex parsing with xml.etree.ElementTree.
  • Supports nested Pydantic models, repeated tags → List, mixed data types.
  • Keeps all existing flat-structure behaviour (no breaking changes expected).

Motivation

XMLAdapter failed on any hierarchical XML (see #8481). Users were forced to switch to JSONAdapter, losing the readability benefits of XML. This PR brings feature-parity with JSONAdapter.


What changed

1. Parsing & Formatting

| Area | Old | New |
| --- | --- | --- |
| Parsing engine | Regex <(\w+)>(.*?)</\1> | ElementTree traversal |
| Formatting | JSON-in-XML | Canonical nested XML |
| Error handling | Bare exceptions | Explicit AdapterParseError with context |
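
To illustrate the parsing-engine row, here is a minimal, standalone sketch (not code from this PR) of how the flat regex behaves on nested input compared with ElementTree. The `re.DOTALL` flag and the variable names are assumptions made for the illustration.

```python
import re
import xml.etree.ElementTree as ET

# The one-level pattern quoted in the table; re.DOTALL is an assumption here.
FLAT_TAG = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

xml_out = "<person><name>John</name><address><city>NYC</city></address></person>"

# The regex matches only the outermost pair, so the nested markup
# comes back as one opaque string instead of structured data.
flat_matches = FLAT_TAG.findall(xml_out)
print(flat_matches)

# ElementTree exposes the hierarchy, so nested fields are directly addressable.
root = ET.fromstring(xml_out)
city = root.find("address/city").text
print(city)
```

With the regex, any nested value still has to be parsed a second time; with ElementTree the tree is available after a single call.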

2. New helpers

  • _xml_to_dict(element) → Any
  • _dict_to_xml(data, tag) → str

3. Removed

  • _parse_field_value() – superseded by full XML mapping.

Backwards compatibility

  • Flat structures behave exactly as before (all original tests still pass).
  • No API signature changes; only internal behaviour differs.

Example (was failing, now passes)

import dspy
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str

class Person(BaseModel):
    name: str
    age: int
    address: Address

class Sig(dspy.Signature):
    text: str = dspy.InputField()
    person: Person = dspy.OutputField()

xml_out = """
<person><name>John</name><age>30</age>
  <address><street>Main</street><city>NYC</city></address>
</person>
"""

# parse() takes the signature and the raw completion and returns a dict of output fields.
assert dspy.XMLAdapter().parse(Sig, xml_out)["person"].name == "John"

Tests added

  • Tests have been added in tests/adapters/test_xml_adapter.py.

Risks / limitations

  • ElementTree does not preserve attribute order; irrelevant for our use-case but worth noting.
  • Doesn’t yet support XML attributes (<tag attr="…">)

Happy to engage in conversations, and grateful for this chance to contribute to DSPy.

Bhuvanesh09 and others added 2 commits July 1, 2025 00:49
Refactors the XMLAdapter to provide robust support for complex data structures, including nested Pydantic models and lists.

The original adapter was limited to flat key-value pairs and used a brittle regex-based parsing approach. This commit replaces that implementation with a more resilient one based on Python's xml.etree.ElementTree.

Key changes:
- **Recursive XML Parsing:** Implemented _xml_to_dict to recursively parse nested XML into a Python dictionary, correctly handling repeated tags for lists.
- **Recursive XML Formatting:** Implemented _dict_to_xml to serialize nested dictionaries and Pydantic models into well-formed XML strings, ensuring correct formatting for few-shot examples.
- **Pydantic Validation:** The parse method now relies on Pydantic for robust validation and type casting of the parsed XML against the signature.
- **Comprehensive Testing:** Added new unit tests for deeply nested models, empty lists, and malformed XML, plus a corrected end-to-end test that validates the full workflow.
Collaborator

@chenmoneygithub chenmoneygithub left a comment

Thanks for the PR!

This essentially proposes a different approach to getting structured output with XML. Right now we only wrap fields in XML tags, while the field values are still JSON. The parse logic you wrote here handles the case where the values themselves are structured as XML.

There are two problems here:

  • The PR doesn't include instructions that prompt the LM to generate XML for nested fields.
  • I don't really know whether LMs are good at following detailed XML rules, but I am fairly sure they do a better job of producing a JSON value that follows a JSON schema than of producing XML, because that's what they are trained for.

With that, I don't feel we should proceed down the full XML path, but I will let @okhat make the call since he has done some research on this before.

@Bhuvanesh09
Author

@chenmoneygithub : Thanks for your quick response!

Regarding performance of XML vs JSON

  • In many of our internal tests, we have found that nesting XML and adhering to a fully XML output structure actually works better than JSON in many contexts, especially when we want to add a thinking/CoT process to an existing problem. This idea came from following Anthropic's prompting guide, and we were pleasantly surprised by the results.
  • Even if we assume that JSON is strictly superior to XML, we probably don't want to mix XML and JSON as is happening now in XMLAdapter. This would only confuse the model. People who want JSON can already use JSONAdapter. If we are offering an option for XML parsing, we might as well provide a proper implementation.
  • We wanted to port to DSPy to get a more structured approach for our GenAI application, and a recurring pattern across our use cases is that our prompts are XML-based. This is why we wanted to implement these changes: to unblock the rest of our teams.

Regarding instructions that prompt the LM to generate nested XML

  • In our use case this is solved as soon as you include even one example input/output pair in few-shot prompting.
  • I understand that this might not be the case for everyone, and thank you for the feedback. Including a dynamic prompt that is built from the output field and recursively explains the nested structure should be straightforward. I'll make the necessary changes either tonight after work or by tomorrow EOD.
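
As a rough idea of what "recursively explains the nested structure" could look like, here is a hedged sketch that renders a nested XML skeleton for the prompt. It assumes the output field has already been reduced to a plain nested dict of field names and type names; the function name `xml_template` and the placeholder syntax are hypothetical and not taken from the PR.

```python
def xml_template(name, schema, indent=0):
    """Render a nested XML skeleton for a prompt.

    `schema` is either a type-name string (leaf field) or a dict mapping
    child field names to their own schemas (nested model).
    """
    pad = "  " * indent
    if isinstance(schema, dict):
        lines = [f"{pad}<{name}>"]
        for child_name, child_schema in schema.items():
            # Recurse so that every nesting level is spelled out explicitly.
            lines.append(xml_template(child_name, child_schema, indent + 1))
        lines.append(f"{pad}</{name}>")
        return "\n".join(lines)
    # Leaf: show a placeholder with the field name and its expected type.
    return f"{pad}<{name}>{{{name}: {schema}}}</{name}>"


person_schema = {"name": "str", "age": "int", "address": {"street": "str", "city": "str"}}
template = xml_template("person", person_schema)
print(template)
```

In a real adapter the schema would more likely be derived from the Pydantic model of the output field; the plain dict keeps the sketch free of dependencies.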

Thanks once again for your time in going through the PR and recommending changes.


@okhat : Could you please give a green light for me to work further on this PR? I believe that a proper XML handling adapter would be a great value add for many usecases including ours.

Thank you for DSPy!

@chenmoneygithub
Collaborator

@Bhuvanesh09 This is pretty interesting, though complex. If you are interested in pursuing this path, here are some guidelines:

  1. Pick 1-2 datasets and 3-5 models, and report benchmark scores for the current XMLAdapter and your proposed XMLAdapter.
  2. Share a link to a GitHub gist, Colab, or Databricks notebook containing your benchmark script.
  3. Provide a few screenshots or examples showing that the LM can produce responses in XML format correctly.

We have seen that mixing XML and JSON works all right, so to avoid causing a regression we need to collect numerical evidence that strict XML is beneficial.

@okhat
Collaborator

okhat commented Jul 5, 2025

This is very interesting, @Bhuvanesh09 . Thanks @chenmoneygithub for discussing it with @Bhuvanesh09 .

Q: How will this handle Lists?

@Bhuvanesh09
Author

Hi @okhat!

I'm currently collecting evidence to support my claims by curating a small custom dataset. I'm aiming for minimal problems that focus on extracting structured information from natural language.

How lists are handled

Our XMLAdapter handles lists through repeated XML elements, which is the standard XML approach:

1. List Definition in Signature

class TaskList(dspy.Signature):
    topic: str = dspy.InputField(desc="Topic to generate tasks for")
    tasks: list[str] = dspy.OutputField(desc="List of 5 specific tasks")

2. Expected XML Output Structure

<tasks>Task 1 description</tasks>
<tasks>Task 2 description</tasks>
<tasks>Task 3 description</tasks>
...
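
A small sketch of how such repeated top-level tags can be collected into a Python list with ElementTree. Wrapping the completion in a synthetic `<root>` element is an assumption of this example (several top-level elements are not a well-formed XML document on their own), not a confirmed detail of the PR.

```python
import xml.etree.ElementTree as ET

completion = """
<tasks>Task 1 description</tasks>
<tasks>Task 2 description</tasks>
<tasks>Task 3 description</tasks>
"""

# Wrap the fragments in a synthetic root so the parser sees a single document.
root = ET.fromstring(f"<root>{completion}</root>")

# Every repeated <tasks> element becomes one list item, in document order.
tasks = [element.text for element in root.findall("tasks")]
print(tasks)
```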

3. XML Attributes Don't Affect Parsing

Our parser also correctly handles XML attributes like id="1", id="2" and ignores them during parsing:

<tasks id="1">Task 1 description</tasks>
<tasks id="2">Task 2 description</tasks>
<tasks id="3">Task 3 description</tasks>
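
The reason attributes are easy to ignore is that ElementTree keeps them separate from the element text. A minimal sketch with the same synthetic `<root>` wrapper assumption as before:

```python
import xml.etree.ElementTree as ET

completion = '<tasks id="1">First</tasks><tasks id="2">Second</tasks>'
root = ET.fromstring(f"<root>{completion}</root>")

# Element text is unaffected by attributes, so a text-only parser can skip them.
texts = [element.text for element in root.iter("tasks")]

# The attributes remain available through .get() if a parser ever wants them.
ids = [element.get("id") for element in root.iter("tasks")]
```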

Side note: before we used DSPy, having the LLM add attributes to the tags as shown above actually helped the model stick to the number of outputs we asked for, since it can implicitly keep count. For instance, when we ask for 5 summary points, even weaker models produce exactly 5 more consistently. A possible future extension is to have the adapter's prompt suggest that models do the same within DSPy.

Since the fruitful discussion with @chenmoneygithub, I've changed the code and included better instructions for output formatting. In my very early experiments the results look good for the new parser, but I have yet to compare it with the older one.

Hoping to wrap these experiments this weekend and share updates soon!

@Bhuvanesh09
Author

Hi @chenmoneygithub and @okhat,

Thank you for the discussion on this PR. I've conducted a series of experiments to provide data-driven evidence for the proposed changes, focusing on how different models interact with the adapters.

TL;DR: The current XMLAdapter uses a complex "JSON-in-XML" prompting strategy that is only consistently understood by larger, more capable models (e.g., Qwen 4B). My ImprovedXMLParser uses a simpler, direct prompt for pure nested XML, making it friendlier and more reliable for smaller models. This results in a 100% parsing rate for the improved adapter across all tested models, while the legacy adapter's parsing rate was as low as 0-10% for models under 4B parameters.

The full experiment notebook and dataset are available for complete reproducibility: <gist_link>


The Experiment: Testing Adapter Robustness

My experiment was designed to test how effectively each adapter could elicit correct, structured output from various language models.

  • Dataset: The experiment uses the person_dataset.csv dataset.
  • Task: The task is to extract a person's name and their nested address (containing city and country) from a natural language sentence and format it into a consistent, nested XML structure.

The design rationale was to decouple the model's core NLU capabilities from its ability to adhere to a specific formatting schema. This allows us to see if a failure is due to the model not understanding the text or the adapter not providing clear instructions.


Experimental Results

The results clearly show that the ImprovedXMLParser is more reliable across a wider range of model sizes.

Parsing Accuracy (%)

The ImprovedXMLParser's direct prompting leads to a 100% parsing rate. The legacy adapter's more complex prompts are only consistently parsed when used with the 4B parameter model.

| Model | Legacy XMLAdapter | ImprovedXMLParser |
| --- | --- | --- |
| Qwen 3: 0.6B | 10.00% | 100.00% |
| Llama 3.2: 3B | 0.00% | 100.00% |
| Qwen 3: 1.7B | 10.00% | 100.00% |
| Qwen 3: 4B | 100.00% | 100.00% |

Exact Accuracy (%)

The improved adapter's clarity also leads to higher final accuracy for the smaller models.

| Model | Legacy XMLAdapter | ImprovedXMLParser |
| --- | --- | --- |
| Qwen 3: 0.6B | 10.00% | 85.00% |
| Llama 3.2: 3B | 0.00% | 90.00% |
| Qwen 3: 1.7B | 10.00% | 100.00% |
| Qwen 3: 4B | 100.00% | 95.00% |
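
The two metrics can differ because a completion may parse cleanly and still contain a wrong value, which is how a model can show 100% parsing accuracy with a lower exact accuracy. The sketch below shows one way such metrics could be computed; the definitions, the `score` and `parse_person` helpers, and the sample data are all assumptions for illustration and are not taken from the linked notebook.

```python
import xml.etree.ElementTree as ET


def parse_person(raw):
    """Hypothetical parser for the name / nested-address task."""
    root = ET.fromstring(raw)
    return {"name": root.findtext("name"), "city": root.findtext("address/city")}


def score(completions, expected, parse):
    """Return (parsing accuracy %, exact accuracy %) over paired samples."""
    parsed_ok = 0
    exact = 0
    for raw, gold in zip(completions, expected):
        try:
            value = parse(raw)
        except ET.ParseError:
            continue  # Malformed output counts against both metrics.
        parsed_ok += 1
        if value == gold:
            exact += 1
    total = len(completions)
    return 100.0 * parsed_ok / total, 100.0 * exact / total


# Made-up sample data: two correct, one parseable but wrong, one malformed.
completions = [
    "<person><name>Ann</name><address><city>Oslo</city></address></person>",
    "<person><name>Bob</name><address><city>Rome</city></address></person>",
    "<person><name>Cy</name><address><city>Lima</city></address></person>",
    "<person><name>Di",
]
expected = [
    {"name": "Ann", "city": "Oslo"},
    {"name": "Bob", "city": "Paris"},
    {"name": "Cy", "city": "Lima"},
    {"name": "Di", "city": "Bonn"},
]
parsing_accuracy, exact_accuracy = score(completions, expected, parse_person)
```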

The "Why": Prompt Complexity vs. Clarity

The difference in performance comes down to prompt complexity.

Current Legacy Adapter's behaviour:

The legacy adapter requires a high level of instruction-following capability by asking for a JSON object inside an XML tag. Only the strongest model tested (Qwen 4B) could handle this reliably.

<address>
{address}        # note: the value you produce must adhere to the JSON schema: {"type": "object", ...}
</address>
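
To make the two-layer requirement concrete, here is a hedged sketch of what consuming a JSON-in-XML completion involves: first locate the tag, then decode its body as JSON. The function `parse_json_in_xml` and both sample completions are hypothetical; this is not the legacy adapter's code, and the second sample is only one plausible way a model could miss the format.

```python
import json
import re


def parse_json_in_xml(completion, field):
    """Extract <field>...</field> and decode its body as JSON."""
    pattern = rf"<{re.escape(field)}>(.*?)</{re.escape(field)}>"
    match = re.search(pattern, completion, re.DOTALL)
    if match is None:
        raise ValueError(f"missing <{field}> tag")
    # The tag body must itself be valid JSON, which is the second layer.
    return json.loads(match.group(1))


# A completion that follows both layers of the format.
good = '<address>{"city": "NYC", "country": "USA"}</address>'
address = parse_json_in_xml(good, "address")

# A completion that answers in pure nested XML instead: the tag is found,
# but the body is not JSON, so decoding fails.
bad = "<address><city>NYC</city><country>USA</country></address>"
try:
    parse_json_in_xml(bad, "address")
    failed = False
except json.JSONDecodeError:
    failed = True
```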

Example Failure:

[Image: example failure, taken from the notebook linked in the gist]

The prompt sent to the legacy adapter was of the form:
[Image: legacy adapter prompt]

New Adapter's behaviour:

The ImprovedXMLParser lowers the barrier to entry. It provides simple, direct instructions to generate pure, nested XML, a task that smaller models can easily accomplish.

Example inspect_history output showing that even the weakest model tested, Qwen 0.6B, is able to follow it:
[Image: inspect_history output for Qwen 0.6B]

Conclusion

The ImprovedXMLParser makes the XML feature more robust and accessible, especially for developers using smaller, more efficient models. By simplifying the instructions, it gives reliable structured output without requiring a 4B+ parameter model. This change solves a key usability issue and makes the feature work as expected across a broader ecosystem of LLMs.

Note that none of these models were explicitly trained to produce nested XML. Even if they had not performed this well, a consistent nested XML structure would still let people fine-tune focused models for it. It then becomes a matter of choice whether to use nested XML or structured JSON. Offering a clear, single-format approach for each may be more straightforward for both fine-tuning and zero-shot prompting, especially for smaller LMs.

I'm happy to discuss this further and make any additional changes.

Thanks a lot for taking the time to go through my PR.

Development

Successfully merging this pull request may close these issues.

[Feature] XMLAdapter fails on nested models & repeated tags due to one-level regex parser
3 participants