@dan-and commented Sep 27, 2025

Add PDF detection to skip processing PDF files in fetch and playwright scrapers. This prevents raw PDF binary data from being dumped into HTML/markdown fields.

Fixes devflowinc#28
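
The diff itself is not shown in this conversation, so the following is only a minimal sketch of what such PDF detection might look like, not the code from this PR. It is written in Python for illustration; the helper name `looks_like_pdf` and its three checks (Content-Type header, `%PDF-` magic bytes, `.pdf` URL suffix) are assumptions about a reasonable approach rather than what the scrapers actually do.

```python
from typing import Optional
from urllib.parse import urlparse

# PDF files begin with this signature, followed by a version such as "1.7".
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(
    url: str,
    content_type: Optional[str] = None,
    body: Optional[bytes] = None,
) -> bool:
    """Hypothetical helper: return True if a response appears to be a PDF."""
    # 1. Content-Type header, ignoring parameters such as "; charset=...".
    if content_type:
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type == "application/pdf":
            return True

    # 2. Magic bytes at the start of the response body.
    if body is not None and body.startswith(PDF_MAGIC):
        return True

    # 3. URL path ending in ".pdf" (query string and fragment are ignored).
    return urlparse(url).path.lower().endswith(".pdf")
```

A scraper could call a check like this before converting a response and return an empty or skipped result instead of filling the HTML/markdown fields. Checking the header and magic bytes in addition to the URL covers PDFs served from paths that do not end in `.pdf`.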

Development

Successfully merging this pull request may close these issues.

[Bug] PDF Content Incorrectly Dumped into HTML/Markdown Fields During Web Craw