
martinohanlon

Description

I am experimenting with different models and approaches.

An area I am struggling with is schema extraction, specifically models returning malformed JSON that cannot be parsed, which makes the whole process fail. This feels particularly brittle.

Regardless of the JSON rules in the prompt, models still return badly formatted JSON, most often wrapped in a markdown code fence, e.g.:

~~~
```json
{
}
```
~~~
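To show why this is brittle, here is a minimal Python sketch. The standard library's `json.loads` rejects a fenced response outright, so a single stray fence aborts the parse. The `raw_response` string and its `node_types` key are made-up examples of the output described above, and `try_parse` is a hypothetical helper used only for illustration.

```python
import json

# Hypothetical model output: valid JSON wrapped in a markdown code fence.
raw_response = '```json\n{"node_types": ["Person"]}\n```'


def try_parse(text: str):
    """Return the parsed object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print(try_parse(raw_response))                 # None: the fence makes it invalid JSON
print(try_parse('{"node_types": ["Person"]}'))  # {'node_types': ['Person']}
```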

This is particularly prevalent with gpt-4o and the OpenAI open-source models, e.g. gpt-oss-20b.

Depending on the model, I can pass a response parameter such as `"response_format": {"type": "json_object"}`, but not all models support it.
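A minimal sketch of making that parameter conditional, assuming the caller keeps its own list of models that accept JSON mode. `JSON_MODE_MODELS` and `build_model_params` are hypothetical names for illustration; only the `"response_format": {"type": "json_object"}` parameter comes from the description above.

```python
from typing import Optional

# Hypothetical allow-list of models known to accept JSON mode; maintain it per provider.
JSON_MODE_MODELS = {"gpt-4o"}


def build_model_params(model_name: str, base_params: Optional[dict] = None) -> dict:
    """Return request parameters, adding JSON mode only for models on the allow-list."""
    params = dict(base_params or {})  # copy so the caller's dict is not mutated
    if model_name in JSON_MODE_MODELS:
        params["response_format"] = {"type": "json_object"}
    return params


print(build_model_params("gpt-4o", {"temperature": 0}))
print(build_model_params("some-other-model", {"temperature": 0}))
```

Because the parameter can only be added for some models, the others still need a fallback on the parsing side.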

I have experimented with adding a step to the schema extraction (the change in this PR) that cleanses the response, removing known or common problems (e.g. the markdown fence) from the JSON before loading it. This has worked really well.
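A minimal sketch of such a cleansing step in Python. `clean_json_response` and `parse_json_response` are hypothetical names, and this is not necessarily how the PR implements it. It assumes the payload is a single top-level JSON object or array, strips a surrounding markdown fence, and otherwise falls back to the outermost `{...}` span when the model adds prose around the object.

```python
import json
import re

# Matches a whole response wrapped in ```json ... ``` (or a bare ``` ... ``` fence).
_FENCE_RE = re.compile(r"^\s*```(?:json)?\s*\n?(.*?)\n?\s*```\s*$", re.DOTALL | re.IGNORECASE)


def clean_json_response(text: str) -> str:
    """Strip common wrappers that models add around JSON output."""
    cleaned = text.strip()
    match = _FENCE_RE.match(cleaned)
    if match:
        cleaned = match.group(1).strip()
    # Fallback: if prose still surrounds the payload, keep the outermost {...} span.
    if not cleaned.startswith(("{", "[")):
        start, end = cleaned.find("{"), cleaned.rfind("}")
        if start != -1 and end > start:
            cleaned = cleaned[start : end + 1]
    return cleaned


def parse_json_response(text: str):
    """Cleanse, then parse; still raises json.JSONDecodeError if unrecoverable."""
    return json.loads(clean_json_response(text))


print(parse_json_response('```json\n{"node_types": ["Person"]}\n```'))
```

The cleanup is deliberately conservative: it only removes wrappers and never edits the JSON itself, so a genuinely malformed payload still raises `json.JSONDecodeError` instead of being silently "repaired".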

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Documentation update
  • Project configuration change

Complexity

Complexity: Low

How Has This Been Tested?

  • Unit tests
  • E2E tests
  • Manual tests

Checklist

The following requirements should have been met (depending on the changes in the branch):

  • Documentation has been updated
  • Unit tests have been updated
  • E2E tests have been updated
  • Examples have been updated
  • New files have copyright header
  • CLA (https://neo4j.com/developer/cla/) has been signed
  • CHANGELOG.md updated if appropriate

@martinohanlon martinohanlon requested a review from a team as a code owner October 14, 2025 11:15