Integration with Triton inference server #955
JaimeArboleda started this conversation in Ideas
Replies: 2 comments
-
@JaimeArboleda Thank you, it is always great to hear from the community. Two comments:
-
Thanks a lot, Peter. I will take a look at docling-serve; I did not know about it, and it looks very promising. With respect to the second point, I was thinking about something similar, but for models like TableFormer, the layout detector, EasyOCR and so on, in this case served via a Triton inference server. Does that make sense? However, maybe if docling-serve is well optimized, it will be enough for us.
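To make the idea concrete, here is a rough sketch of how docling-side code could query one of the heavy models if it were served by Triton, using the tritonclient Python package. The model name, tensor names, shapes and port are assumptions; they would have to match however the model is actually exported and configured in the Triton model repository:

```python
# Sketch: query a layout-detection model deployed on a Triton inference server.
# "layout_model", the tensor names and the input shape are placeholders and
# must match the model's config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# One preprocessed page image in NCHW float32 layout (assumed input format).
page = np.random.rand(1, 3, 1024, 768).astype(np.float32)

inputs = [httpclient.InferInput("input", list(page.shape), "FP32")]
inputs[0].set_data_from_numpy(page)
outputs = [httpclient.InferRequestedOutput("output")]

result = client.infer(model_name="layout_model", inputs=inputs, outputs=outputs)
predictions = result.as_numpy("output")  # layout boxes/scores, model-specific
print(predictions.shape)
```

The same pattern would apply to TableFormer or an OCR model; the expected win would come from Triton's dynamic batching and from keeping the models resident on the GPU across requests.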
-
Question
First of all, thank you for this great, great package. Kudos to IBM for creating and open-sourcing it.
We are planning to run docling as a kind of service for converting every document in our organization, and we need to handle this efficiently because we expect a lot of requests. I think there is now GPU support through vanilla PyTorch. Would it be possible to, for example, serve the models that do the heavy work on a Triton inference server to increase conversion speed?
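For reference, this is roughly what I mean by GPU support through vanilla PyTorch today, as a minimal sketch assuming a docling release that exposes AcceleratorOptions (exact imports and option names may differ between versions, and the file path is just a placeholder):

```python
# Sketch: enable CUDA acceleration for docling's PDF pipeline.
# AcceleratorOptions/AcceleratorDevice are assumed to exist in the installed
# docling version; adjust the imports if they do not.
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    AcceleratorDevice,
    AcceleratorOptions,
    PdfPipelineOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.accelerator_options = AcceleratorOptions(
    num_threads=8,
    device=AcceleratorDevice.CUDA,  # or AcceleratorDevice.AUTO to auto-detect
)

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)

# "sample.pdf" is a placeholder document.
result = converter.convert("sample.pdf")
print(result.document.export_to_markdown()[:500])
```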
I tried searching for Triton and found nothing, so I guess the answer is no, but I would like to know whether this idea is at least on the roadmap.
Thanks in advance!