Output base64-encoded image URLs (strings) #183
Replies: 2 comments 17 replies
-
I really don't understand that: You can either embed images as base64 strings in the MD text, or store them separately as image files in a folder of your choice. All that could be offered is a callback function (to be provided by the programmer, i.e. you) to which the image (and its associated metadata) is handed over. To keep your process streamlined, I would never recommend accessing the internet every time that hypothetical callback function is invoked.
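To make the callback idea concrete, here is a rough sketch of what such a function could look like on the user's side. Everything in it is hypothetical: the name `collect_image`, its signature and the metadata keys `ext` and `page` are assumptions for illustration and not part of any library API. It only collects data URLs in memory and does no network access.

```python
import base64

# Hypothetical user-side callback: receives the raw image bytes plus
# metadata and stores a data URL in a local list. No network access.
collected = []

def collect_image(image_bytes: bytes, metadata: dict) -> None:
    ext = metadata.get("ext", "png")  # assumed metadata key
    encoded = base64.b64encode(image_bytes).decode("ascii")
    collected.append({
        "page": metadata.get("page"),  # assumed metadata key
        "url": f"data:image/{ext};base64,{encoded}",
    })

# Stand-in bytes, not a real image.
collect_image(b"\x89PNG fake bytes", {"ext": "png", "page": 0})
```

Any upload or API call would then happen once, after extraction has finished, instead of inside the callback.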
-
But you can use |
-
Hello,
First, thanks a lot for the amazing work! One suggestion (I'm kind of a beginner here, so hopefully I didn't miss something; if I did, this becomes a question of how to do it):
I'm using pymupdf4llm to extract text with images from PDF files. I would like to avoid saving the images to disk and then processing and encoding them before adding them to my DB or sending them to the LMM (I'm using OpenAI GPT-4 via the AzureOpenAI API).
I'm aware that there is an 'embed_images' boolean which can be used to embed images as base64-encoded strings (URLs) in the extracted text. I see that when 'save_image' is called, it returns data containing the URL (data = f"data:image/{IMG_EXTENSION};base64," + data), which is then added to the MD text.
However, this is usually problematic and expensive, because the URLs are usually very long; in my case at least, even a small image causes the max number of tokens to be hit.
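To illustrate the size problem, here is a small sketch that builds a data URL the same way as the expression quoted above and compares its length with the raw image size. The helper name `to_data_url` and the 30 kB zero-filled stand-in are assumptions for illustration only. Base64 turns every 3 bytes into 4 characters, so the string is about a third longer than the image itself.

```python
import base64

def to_data_url(image_bytes: bytes, ext: str = "png") -> str:
    """Build a data URL in the same shape as the one embedded in the MD text."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/{ext};base64," + encoded

raw = bytes(30_000)  # stand-in for a 30 kB image (all zero bytes)
url = to_data_url(raw)

# The URL is roughly 4/3 the size of the raw bytes, plus the short prefix.
print(len(raw), len(url))
```

When that whole string sits inside the text part of a prompt, it is tokenized like ordinary text, which is how even a small image can exhaust the token limit.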
Generally, I think that according to the MS AzureOpenAI service, it is better and much less expensive to send the encoded image URLs using the type "image_url", alongside the MD text using the type "text".
Therefore, my suggestion is to add an option to return the (encoded) image URLs along with the extracted text as separate outputs (e.g. pymupdf4llm.to_markdown returns a list containing the MD text and the image URLs).
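Until something like that exists, a possible workaround is to post-process the MD text produced with 'embed_images' enabled. The sketch below assumes that embedded images show up as Markdown image links whose target is a data URL (`![](data:image/...;base64,...)`); the helper names `split_markdown` and `build_user_content` are made up for this example. It pulls the data URLs out of the text and arranges both as "text" and "image_url" parts of a chat message.

```python
import re

# Assumption: embedded images appear as Markdown image links with a data URL target.
DATA_URL_IMAGE = re.compile(
    r"!\[[^\]]*\]\((data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/=]+)\)"
)

def split_markdown(md_text: str) -> tuple[str, list[str]]:
    """Return the MD text without embedded images, plus the extracted data URLs."""
    urls = DATA_URL_IMAGE.findall(md_text)
    text = DATA_URL_IMAGE.sub("", md_text)
    return text, urls

def build_user_content(md_text: str) -> list[dict]:
    """Arrange text and images as separate parts of one chat message."""
    text, urls = split_markdown(md_text)
    content = [{"type": "text", "text": text}]
    content += [{"type": "image_url", "image_url": {"url": u}} for u in urls]
    return content

# Tiny stand-in document; the base64 payload is not a complete image.
sample = "Intro\n\n![](data:image/png;base64,iVBORw0KGgo=)\n\nOutro"
content = build_user_content(sample)
```

The resulting list can be used as the `content` of a user message, so the images are sent as image parts instead of being tokenized as text.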
Thanks!