Clean extracted schema json #432

martinohanlon · 2025-10-14T11:15:11Z

Description

I am experimenting with different models and approaches.

An area I am struggling with is schema extraction, specifically models returning malformed JSON that cannot be parsed, which causes the whole process to fail. This feels particularly brittle.

Regardless of the JSON-related rules in the prompt, models still return badly formatted JSON, most often by wrapping it in a markdown code fence, e.g.

'''json
{
}
'''

# ' = backtick

This is particularly prevalent with gpt-4o and the OpenAI open source models e.g. gpt-oss-20b.

Depending on the model, I can include a response parameter such as "response_format": {"type": "json_object"}, but not all models support it.
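As a rough illustration of the point above, the parameter can be added only for models known to accept it. This is a minimal sketch: the allow-list `JSON_MODE_MODELS` and the helper `build_model_params` are hypothetical names made up for this example, and which models belong in the set is an assumption to verify per provider.

```python
from typing import Any, Dict, Optional

# Hypothetical allow-list: which models accept JSON mode is an assumption
# and should be checked against each provider's documentation.
JSON_MODE_MODELS = {"gpt-4o"}


def build_model_params(
    model_name: str, base_params: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Return request parameters, adding JSON mode only where it is supported."""
    params = dict(base_params or {})  # copy so the caller's dict is not mutated
    if model_name in JSON_MODE_MODELS:
        params["response_format"] = {"type": "json_object"}
    return params
```

Models outside the allow-list get their parameters back unchanged, so their output still needs to be cleaned before parsing.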

I have experimented with adding a step to the schema extraction that cleans the response, removing known or common problems (e.g. the markdown wrapping) from the JSON before loading it. This has worked really well.
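The kind of cleaning step described above could look roughly like the following. This is a sketch under stated assumptions, not the code in this PR: `clean_json_text` and `load_schema_json` are hypothetical names, and the only problems handled are a markdown fence around the whole response and stray prose around the JSON.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # a markdown code fence (three backticks)

# Matches a response that is entirely wrapped in a code fence, with an
# optional language tag such as "json" after the opening fence.
_FENCE_RE = re.compile(
    r"^\s*" + re.escape(FENCE) + r"[a-zA-Z0-9_-]*\s*(.*?)\s*" + re.escape(FENCE) + r"\s*$",
    re.DOTALL,
)


def clean_json_text(raw: str) -> str:
    """Strip common wrappers from an LLM response (hypothetical helper)."""
    text = raw.strip()
    match = _FENCE_RE.match(text)
    if match:
        text = match.group(1).strip()
    # Fallback: keep only the span from the first opening bracket to the
    # last closing bracket, dropping any prose around the JSON.
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    end = max(text.rfind("}"), text.rfind("]"))
    if starts and min(starts) < end:
        text = text[min(starts) : end + 1]
    return text


def load_schema_json(raw: str) -> Any:
    """Clean the response, then parse it; still raises on invalid JSON."""
    return json.loads(clean_json_text(raw))
```

For example, `load_schema_json(FENCE + 'json\n{"node_types": []}\n' + FENCE)` parses the fenced payload as if the fence were absent. Anything that is still invalid after cleaning raises `json.JSONDecodeError`, so the caller keeps its existing error handling.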

Type of Change

Complexity

Complexity: Low

How Has This Been Tested?

Unit tests
E2E tests
Manual tests

Checklist

The following requirements should have been met (depending on the changes in the branch):

Documentation has been updated
Unit tests have been updated
E2E tests have been updated
Examples have been updated
New files have copyright header
CLA (https://neo4j.com/developer/cla/) has been signed
CHANGELOG.md updated if appropriate

tests/unit/experimental/components/test_schema.py

schema extract clean json

cd4fcd9

martinohanlon requested a review from a team as a code owner October 14, 2025 11:15

stellasia reviewed Oct 15, 2025

tests/unit/experimental/components/test_schema.py (outdated, resolved)

martinohanlon added 2 commits October 15, 2025 18:03

remove fixtures

5d7524f

updated CHANGELOG

c7eb2f6
