Open
Labels
enhancement (New feature or request)
Description
The document (and note) types now support storing a plain-text representation of a document. This is intended for three things:
- FTS
- OCR
- faster access
OGo Obj/C stores the document blobs in the filesystem. The storage is technically pluggable, though, so we might be able to refer out to e.g. a WebDAV server (or other document providers).
Columns:
- text_content
- text_content_type (SMALLINT, 0=plain?, 1=markdown, 2=html, ..., should that be a MIME type?)
- text_content_object_version (the version of the document the content relates to)
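To make the SMALLINT vs. MIME type question more concrete, here is a minimal Python sketch of how the integer codes could map to MIME types, plus a staleness check based on text_content_object_version. The names `TextContentType`, `MIME_TYPES`, `mime_type_for` and `is_stale` are made up for illustration, and the codes simply follow the tentative numbering above (0=plain is still marked with a question mark).

```python
from enum import IntEnum


class TextContentType(IntEnum):
    """Hypothetical codes for text_content_type, following the tentative numbering."""
    PLAIN = 0
    MARKDOWN = 1
    HTML = 2


# MIME types for the three codes; text/markdown is registered by RFC 7763.
MIME_TYPES = {
    TextContentType.PLAIN: "text/plain",
    TextContentType.MARKDOWN: "text/markdown",
    TextContentType.HTML: "text/html",
}


def mime_type_for(code: int) -> str:
    """Translate a stored SMALLINT into a MIME type; raises ValueError for unknown codes."""
    return MIME_TYPES[TextContentType(code)]


def is_stale(object_version: int, text_content_object_version) -> bool:
    """True if no text was extracted yet, or it belongs to another document version."""
    return (
        text_content_object_version is None
        or text_content_object_version != object_version
    )
```

A SMALLINT keeps the column compact and the mapping can live in code like this. Storing the MIME type directly would avoid the lookup and allow new formats without a code change.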
Those fields should be filled asynchronously, either using a queue or simply by a job started from cron. That job could do various things:
- OCR PDFs and images, e.g. using Tesseract or MarkItDown
- Transcribe audio attachments, e.g. using Whisper
- Generate document thumbnails (where would we store them, as sub-documents, own column?)
- It could also create the tsvector as part of the update of text_content; the related column would have to be created
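The cron variant could look roughly like the following Python sketch. It is not the actual implementation. It uses an in-process SQLite database as a stand-in, and the table name `document`, the columns `document_id` and `object_version`, and the `extract` callback are all assumptions; only the three text_content* columns come from this issue. The tsvector update is left out because it is PostgreSQL-specific.

```python
import sqlite3
from typing import Callable

# Hypothetical table; only the text_content* columns are from the issue.
SCHEMA = """
CREATE TABLE document (
    document_id INTEGER PRIMARY KEY,
    object_version INTEGER NOT NULL,
    text_content TEXT,
    text_content_type SMALLINT,
    text_content_object_version INTEGER
)
"""

# Documents whose text is missing or was extracted from another version.
STALE = """
SELECT document_id, object_version FROM document
WHERE text_content_object_version IS NULL
   OR text_content_object_version <> object_version
ORDER BY document_id
LIMIT ?
"""

# The object_version guard skips rows that changed while we were extracting.
UPDATE = """
UPDATE document
SET text_content = ?, text_content_type = ?, text_content_object_version = ?
WHERE document_id = ? AND object_version = ?
"""


def refresh_text_content(
    db: sqlite3.Connection,
    extract: Callable[[int], str],
    batch_size: int = 100,
) -> int:
    """Fill text_content for one batch of stale documents; returns rows updated."""
    updated = 0
    rows = db.execute(STALE, (batch_size,)).fetchall()
    for document_id, version in rows:
        # OCR, transcription etc. would happen inside this (assumed) callback.
        text = extract(document_id)
        cur = db.execute(UPDATE, (text, 0, version, document_id, version))  # 0 = plain
        updated += cur.rowcount
    db.commit()
    return updated
```

Because staleness is derived from text_content_object_version, a run that is interrupted or overlaps with an edit just leaves the row for the next cron invocation. A queue-based variant could reuse the same update statement and replace only the stale-row query.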