feat(adapter): rewrite XMLAdapter for nested-data support #8482

Bhuvanesh09 · 2025-07-02T18:42:47Z

TL;DR

Replaces regex parsing with xml.etree.ElementTree.
Supports nested Pydantic models, repeated tags → List, mixed data types.
Keeps all existing flat-structure behaviour (no breaking changes expected).

Motivation

XMLAdapter failed on any hierarchical XML (see #8481). Users were forced to switch to JSONAdapter, losing the readability benefits of XML. This PR brings feature-parity with JSONAdapter.

What changed

1. Parsing & Formatting

Area	Old	New
Parsing engine	Regex `<(\w+)>(.*?)</\1>`	`ElementTree` traversal
Formatting	JSON-in-XML	Canonical nested XML
Error handling	Bare exceptions	Explicit `AdapterParseError` with context
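
To make the first row of the table concrete, here is a minimal sketch comparing the two engines on a nested payload. The regex is the one quoted in the table; the `re.DOTALL` flag, the sample XML, and the variable names are assumptions for illustration and are not taken from the PR diff.

```python
import re
import xml.etree.ElementTree as ET

# Regex from the "Old" column; the DOTALL flag is an assumption.
FLAT_FIELD = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

xml_out = "<person><name>John</name><address><city>NYC</city></address></person>"

# The regex sees only the outermost tag; everything inside stays an unparsed string.
regex_fields = dict(FLAT_FIELD.findall(xml_out))

# ElementTree exposes the hierarchy, so nested values can be reached directly.
root = ET.fromstring(xml_out)
city = root.find("address/city").text

# Malformed input raises ET.ParseError, which an adapter can catch and
# re-raise with more context.
try:
    ET.fromstring("<person><name>John</person>")
    malformed_error = None
except ET.ParseError as exc:
    malformed_error = str(exc)
```

With the regex, `regex_fields["person"]` is the raw inner markup, whereas the tree traversal yields `"NYC"` for the nested city.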

2. New helpers

_xml_to_dict(element) → Any
_dict_to_xml(data, tag) → str
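
As a rough illustration of what helpers with these two signatures could look like, here is a sketch using only the standard library. The functions `xml_to_dict` and `dict_to_xml` below are hypothetical stand-ins named after the PR's helpers; their bodies are assumptions, not the code from the diff.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def xml_to_dict(element):
    """Convert an element into a str (leaf) or a dict (has children)."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = xml_to_dict(child)
        if child.tag in result:
            # A repeated tag turns the entry into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def dict_to_xml(data, tag):
    """Serialize scalar, dict, and list values into nested XML under `tag`."""
    if isinstance(data, dict):
        inner = "".join(dict_to_xml(v, k) for k, v in data.items())
        return f"<{tag}>{inner}</{tag}>"
    if isinstance(data, list):
        # Lists become repeated sibling tags.
        return "".join(dict_to_xml(item, tag) for item in data)
    return f"<{tag}>{escape(str(data))}</{tag}>"


person = {
    "name": "John",
    "age": 30,
    "address": {"street": "Main", "city": "NYC"},
    "tags": ["a", "b"],
}
xml = dict_to_xml(person, "person")
round_trip = xml_to_dict(ET.fromstring(xml))
```

Note that in this sketch every leaf comes back as a string (`"30"`, not `30`) and a one-element list would come back as a scalar, which is why a later validation step against the declared types is still needed.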

3. Removed

_parse_field_value() – superseded by full XML mapping.

Backwards compatibility

Flat structures behave exactly as before (all original tests still pass).
No API signature changes; only internal behaviour differs.

Example (was failing, now passes)

import dspy
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str

class Person(BaseModel):
    name: str
    age: int
    address: Address

class Sig(dspy.Signature):
    text: str = dspy.InputField()
    person: Person = dspy.OutputField()

xml_out = """
<person><name>John</name><age>30</age>
  <address><street>Main</street><city>NYC</city></address>
</person>
"""

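# Adapter.parse takes the signature class and the completion, and returns a dict of output fields.
assert dspy.XMLAdapter().parse(Sig, xml_out)["person"].name == "John"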

Tests added

Tests have been added in tests/adapters/test_xml_adapter.py.

Risks / limitations

ElementTree does not preserve attribute order; irrelevant for our use-case but worth noting.
XML attributes (<tag attr="…">) are not yet supported.

Happy to engage in conversations, and grateful for this chance to contribute to DSPy.

Refactors the XMLAdapter to provide robust support for complex data structures, including nested Pydantic models and lists. The original adapter was limited to flat key-value pairs and used a brittle regex-based parsing approach. This commit replaces that implementation with a more resilient one based on Python's xml.etree.ElementTree.

Key changes:

- **Recursive XML Parsing:** Implemented _xml_to_dict to recursively parse nested XML into a Python dictionary, correctly handling repeated tags for lists.
- **Recursive XML Formatting:** Implemented _dict_to_xml to serialize nested dictionaries and Pydantic models into well-formed XML strings, ensuring correct formatting for few-shot examples.
- **Pydantic Validation:** The parse method now uses Pydantic for robust validation and type casting of the parsed XML against the signature.
- **Comprehensive Testing:** Added new unit tests for deeply nested models, empty lists, malformed XML, and a corrected end-to-end test to validate the full workflow.

chenmoneygithub

Thanks for the PR!

This essentially proposes a different approach to using XML for structured output. Right now we just wrap fields in XML tags, while the field values are still in JSON. The parse logic you wrote here deals with the case where the values themselves are structured as XML.

There are two problems here:

The PR doesn't include the instruction that prompts the LM to generate XML for nested fields.
I don't really know whether LMs are good at following detailed XML rules, but I am fairly sure they do a better job at producing a JSON value according to a JSON schema than at producing XML, because that's what they are trained for.

With that, I don't feel we should proceed down the full-XML path, but I will let @okhat make the call since he has done some research on this before.

Bhuvanesh09 · 2025-07-03T04:48:58Z

@chenmoneygithub : Thanks for your quick response!

Regarding performance of XML vs JSON

In many of our internal tests, we have found that nesting XML and adhering to a fully XML output structure actually works better than JSON in many contexts, especially when we want to add a thinking/CoT process to an existing problem. This idea came from following Anthropic's prompting guide, and we were pleasantly surprised by the results.
Even if we assume that JSON is strictly superior to XML, we probably don't want to mix XML and JSON as is happening now in XMLAdapter; this would only confuse the model. People who want JSON can already use JSONAdapter. If we are offering an option for XML parsing, we might as well give it a proper implementation.
We wanted to port our GenAI application to DSPy for a more structured approach, and a recurring pattern across our use cases is XML-based prompts. This is why we wanted to implement these changes, to unblock the rest of our teams.

Regarding Instruction to Prompt LM to generate nested XML.

In our use case this is solved as soon as you include even one example input/output in few-shot prompting.
I understand that this might not be the use case for everyone, and thank you for the feedback. Including a dynamic prompt, built from the output field, that recursively explains the nested structure should be straightforward. I'll make the necessary changes either tonight after work or by EOD tomorrow.
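
One way such a recursive format instruction could be generated is sketched below. The function `xml_skeleton` and the mini schema format it consumes (a type for a leaf, a dict for a nested object, a one-element list for a repeated tag) are hypothetical and exist only for this illustration; they are not the PR's implementation.

```python
def xml_skeleton(name, schema, indent=0):
    """Render a placeholder XML template for a nested field schema.

    `schema` is a type (leaf), a dict of sub-fields, or a one-element list
    marking a repeated tag. This mini schema format is an assumption.
    """
    pad = "  " * indent
    if isinstance(schema, dict):
        body = "\n".join(xml_skeleton(k, v, indent + 1) for k, v in schema.items())
        return f"{pad}<{name}>\n{body}\n{pad}</{name}>"
    if isinstance(schema, list):
        item = xml_skeleton(name, schema[0], indent)
        return f"{item}\n{pad}... (repeat <{name}> once per item)"
    # Leaf: show the field name and its expected type as a placeholder.
    return f"{pad}<{name}>{{{name}: {schema.__name__}}}</{name}>"


person_schema = {
    "name": str,
    "age": int,
    "address": {"street": str, "city": str},
}
template = xml_skeleton("person", person_schema)
print(template)
```

The printed template mirrors the nesting of the output type, so the model sees the expected tag hierarchy even without a few-shot example.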

Thanks once again for your time in going through the PR and recommending changes.

@okhat : Could you please give me a green light to work further on this PR? I believe that a proper XML-handling adapter would add great value for many use cases, including ours.

Thank you for DSPy!

chenmoneygithub · 2025-07-03T20:37:26Z

@Bhuvanesh09 This is pretty interesting, though complex. If you are interested in this path, here are some guidelines:

Pick 1-2 datasets and 3-5 models, and report the benchmark score between the current XMLAdapter and your proposed XMLAdapter.
Share a github gist/colab/databricks notebook link to your benchmark script.
Include a few screenshots or examples showing that the LM can output responses in XML format correctly.

We have seen that mixing XML and JSON does all right, so to avoid causing a regression we need to collect numerical evidence that strict XML is beneficial.

okhat · 2025-07-05T14:21:14Z

This is very interesting, @Bhuvanesh09 . Thanks @chenmoneygithub for discussing it with @Bhuvanesh09 .

Q: How will this handle Lists?

Bhuvanesh09 · 2025-07-05T15:00:33Z

> This is very interesting, @Bhuvanesh09 . Thanks @chenmoneygithub for discussing it with @Bhuvanesh09 .
>
> Q: How will this handle Lists?

Hi @okhat!

I'm currently working on collecting evidence to support my claims by curating a small custom dataset. I'm aiming for minimal problems that focus on extracting structured information from natural language.

How lists are handled in this case.

Our XMLAdapter handles lists through repeated XML elements, which is the standard XML approach:

1. List Definition in Signature

class TaskList(dspy.Signature):
    topic: str = dspy.InputField(desc="Topic to generate tasks for")
    tasks: list[str] = dspy.OutputField(desc="List of 5 specific tasks")

2. Expected XML Output Structure

<tasks>Task 1 description</tasks>
<tasks>Task 2 description</tasks>
<tasks>Task 3 description</tasks>
...

3. XML Attributes Don't Affect Parsing

Our parser also correctly handles XML attributes like id="1", id="2" and ignores them during parsing:

<tasks id="1">Task 1 description</tasks>
<tasks id="2">Task 2 description</tasks>
<tasks id="3">Task 3 description</tasks>
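
The list handling described above can be sketched with the standard library as follows. Wrapping the completion in a synthetic `<root>` element and the `fields` dictionary are assumptions made for this example; the sample task texts are invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

completion = """
<tasks id="1">Draft the outline</tasks>
<tasks id="2">Collect references</tasks>
<tasks id="3">Write the summary</tasks>
"""

# A completion may hold several top-level tags, so wrap it in a synthetic
# root before parsing; an XML document allows only one root element.
root = ET.fromstring(f"<root>{completion}</root>")

fields = defaultdict(list)
for child in root:
    # Only the tag and text are read; child.attrib (the id values) is ignored.
    fields[child.tag].append((child.text or "").strip())

tasks = fields["tasks"]
```

Because only `child.tag` and `child.text` are consulted, the same code yields the same list whether or not the `id` attributes are present.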

Side note: Before we used DSPy, having the LLM add attributes to the tags as above actually helped the model adhere to the number of outputs we asked for, since it can implicitly keep track. For instance, when we ask for 5 summary points, even weaker models are more consistent in giving exactly 5. Suggesting that models do the same could be future scope for the parser's prompt within DSPy.

Since the fruitful discussion with @chenmoneygithub, I've made changes to the code and included better instructions for output formatting. In my very early experiments, the results look good for this new parser but I'm yet to compare it with the older one.

Hoping to wrap these experiments this weekend and share updates soon!

Bhuvanesh09 · 2025-07-06T19:38:01Z

Hi @chenmoneygithub and @okhat,

Thank you for the discussion on this PR. I've conducted a series of experiments to provide data-driven evidence for the proposed changes, focusing on how different models interact with the adapters.

TL;DR: The current XMLAdapter uses a complex "JSON-in-XML" prompting strategy that is only consistently understood by larger, more capable models (e.g., Qwen 4B). My ImprovedXMLParser uses a simpler, direct prompt for pure nested XML, making it friendlier and more reliable for smaller models. This results in a 100% parsing rate for the improved adapter across all tested models, while the legacy adapter's parsing rate was as low as 0-10% for models under 4B parameters.

The full experiment notebook and dataset are available for complete reproducibility: <gist_link>

The Experiment: Testing Adapter Robustness

My experiment was designed to test how effectively each adapter could elicit correct, structured output from various language models.

Dataset: The experiment uses the person_dataset.csv dataset.
Task: The task is to extract a person's name and their nested address (containing city and country) from a natural language sentence and format it into a consistent, nested XML structure.

The design rationale was to decouple the model's core NLU capabilities from its ability to adhere to a specific formatting schema. This allows us to see if a failure is due to the model not understanding the text or the adapter not providing clear instructions.

Experimental Results

The results clearly show that the ImprovedXMLParser is more reliable across a wider range of model sizes.

Parsing Accuracy (%)

The ImprovedXMLParser's direct prompting leads to a 100% parsing rate. The legacy adapter's more complex prompts are only consistently parsed when used with the 4B parameter model.

Model	Legacy XMLAdapter	ImprovedXMLParser
Qwen 3: 0.6B	10.00%	100.00%
Llama 3.2: 3B	0.00%	100.00%
Qwen 3: 1.7B	10.00%	100.00%
Qwen 3: 4B	100.00%	100.00%

Exact Accuracy (%)

The improved adapter's clarity also leads to higher final accuracy for the smaller models.

Model	Legacy XMLAdapter	ImprovedXMLParser
Qwen 3: 0.6B	10.00%	85.00%
Llama 3.2: 3B	0.00%	90.00%
Qwen 3: 1.7B	10.00%	100.00%
Qwen 3: 4B	100.00%	95.00%
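
For readers who want to see how two metrics of this kind can be computed, here is a minimal sketch. It assumes that "parsing accuracy" means the share of completions that parse into the expected structure and "exact accuracy" means the share whose parsed fields all match the gold record; the notebook's actual definitions may differ, and `parse_person`, `score`, and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_person(completion):
    """Return {'name', 'city', 'country'} or None if the XML cannot be parsed."""
    try:
        root = ET.fromstring(completion.strip())
        return {
            "name": root.findtext("name").strip(),
            "city": root.findtext("address/city").strip(),
            "country": root.findtext("address/country").strip(),
        }
    except (ET.ParseError, AttributeError):
        # AttributeError: an expected tag is missing, so findtext returned None.
        return None


def score(completions, gold):
    """Return (parse rate %, exact-match %) over paired completions and gold records."""
    parsed = [parse_person(c) for c in completions]
    parse_rate = 100 * sum(p is not None for p in parsed) / len(parsed)
    exact = 100 * sum(p == g for p, g in zip(parsed, gold)) / len(gold)
    return parse_rate, exact


ok = "<person><name>Ana</name><address><city>Lima</city><country>Peru</country></address></person>"
wrong_city = ok.replace("Lima", "Cusco")
not_xml = '{"name": "Ana", "address": {"city": "Lima", "country": "Peru"}}'
gold = [{"name": "Ana", "city": "Lima", "country": "Peru"}] * 4

parse_rate, exact = score([ok, ok, wrong_city, not_xml], gold)
```

In this toy run three of four completions parse and two of four match exactly, which shows how the two tables can diverge for the same model.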

The "Why": Prompt Complexity vs. Clarity

The difference in performance comes down to prompt complexity.

Current Legacy Adapter's behaviour:

The legacy adapter requires a high level of instruction-following capability by asking for a JSON object inside an XML tag. Only the strongest model tested (Qwen 4B) could handle this reliably.

<address>
{address}        # note: the value you produce must adhere to the JSON schema: {"type": "object", ...}
</address>
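
To illustrate why this format is demanding, here is a sketch of how a JSON-in-XML completion might be decoded and what happens when a model answers in pure XML instead. It reuses the regex quoted in the PR description; `parse_json_in_xml`, the `re.DOTALL` flag, the fallback behaviour, and the sample completions are assumptions and not the legacy adapter's actual code.

```python
import json
import re

FIELD = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_json_in_xml(completion):
    """Extract each top-level tag and decode its body as JSON where possible."""
    fields = {}
    for name, raw in FIELD.findall(completion):
        try:
            fields[name] = json.loads(raw)
        except json.JSONDecodeError:
            fields[name] = raw.strip()  # fall back to the raw string
    return fields


# The model followed the instruction: a JSON object inside the XML tag.
expected = parse_json_in_xml('<address>{"city": "Lima", "country": "Peru"}</address>')

# The model drifted to nested XML: the body is no longer valid JSON.
drifted = parse_json_in_xml("<address><city>Lima</city><country>Peru</country></address>")
```

In the second case the field holds a raw markup string where an object is expected, so any later validation against the field's schema would reject it.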

Example Failure:

(image taken from the notebook linked in the gist)

The prompt itself to the Old Adapter was of the form:

New Adapter's behaviour:

The ImprovedXMLParser lowers the barrier to entry. It provides simple, direct instructions to generate pure, nested XML, a task that smaller models can easily accomplish.

Example inspect_history of even the weakest model i.e. Qwen 0.6B being able to follow it:

Conclusion

The ImprovedXMLParser makes the XML feature more robust and accessible, especially for developers using smaller, more efficient models. By simplifying the instructions, it ensures reliable structured output without requiring a 4B+ parameter model. This change solves a key usability issue and makes the feature work as expected across a broader ecosystem of LLMs.
Note that none of these models were explicitly trained to produce nested XML. Even if they had not performed as well as they did, one could argue that a consistent nested XML structure would let people fine-tune focused models with this ability. It then becomes a matter of choice whether to use nested XML or structured JSON. Providing a clear, single-format approach for either one may be more straightforward for both fine-tuning and zero-shot prompting, especially for smaller LMs.

I'm happy to discuss this further and make any additional changes.

Thanks a lot for taking the time to go through my PR.

Bhuvanesh09 and others added 2 commits July 1, 2025 00:49

chore: removed llm plans and discussions

2af167c

chenmoneygithub reviewed Jul 2, 2025

feat: improved parser with better format instructions

3d3a127

Bhuvanesh09 requested a review from chenmoneygithub July 8, 2025 04:26
