Open
Labels
help wanted (needs help from the community), model (model specific)
Description
Request: Nougat OCR Integration
I suggest integrating Nougat OCR into llama.cpp so that it can process scientific PDF documents.
This could also serve as a first step toward supporting multimodal models in this project!
Implementation:
It seems that Nougat is built from standard transformer components (a Swin Transformer encoder and a BART-style decoder), so most of the work would likely be figuring out how to add the image processing.
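To make the image-processing part a bit more concrete, here is a rough sketch of the kind of preprocessing a Swin-style encoder expects: normalize the page image, then cut it into non-overlapping patches that become the input tokens. This is not Nougat's actual pipeline. The patch size of 4, the ImageNet mean/std values, and the helper names `normalize` and `patchify` are all assumptions for illustration, and resizing/padding the rendered PDF page is left out.

```python
from typing import List

# Image layout: height x width x channels, pixel values in [0, 255].
Image = List[List[List[float]]]

# Assumed values: ImageNet statistics commonly used by vision transformers.
# Nougat's real preprocessing configuration should be checked before reuse.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize(image: Image) -> Image:
    """Scale pixels to [0, 1], then standardize each channel."""
    return [
        [
            [(px[c] / 255.0 - MEAN[c]) / STD[c] for c in range(len(px))]
            for px in row
        ]
        for row in image
    ]


def patchify(image: Image, patch: int = 4) -> List[List[float]]:
    """Split an image into non-overlapping patch x patch tiles.

    Each tile is flattened in row-major order into one vector, which a
    patch-embedding layer would then project to the model dimension.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            vec: List[float] = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    vec.extend(image[y][x])
            tokens.append(vec)
    return tokens


# Usage: a tiny 8x8 all-white RGB "page" (hypothetical input).
page = [[[255.0, 255.0, 255.0] for _ in range(8)] for _ in range(8)]
tokens = patchify(normalize(page))
print(len(tokens), len(tokens[0]))  # 4 patches of 4*4*3 = 48 values each
```

In llama.cpp this step would presumably live in C/C++ next to the model loading code; the Python above is only meant to show the data flow, not a proposed implementation.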
Let me know what you think!
P.S.: Love this repo! I hope to add my own retrieval-pretrained transformer to it at some point.