Best practices for extending/customizing the Haystack REST API? #3206
-
Hi @nickchomey, thanks for this discussion. There are plenty of interesting pointers here, so let's start with the simple ones.

### Forking vs. Extending

This is not peculiar to Haystack, or even to Python: when you want to extend an existing project, either the project is flexible enough to let you do that (as in WordPress), or you have to fork it and change the core components yourself, taking care to pull new code from upstream and to make sure your changes keep working. While forking can be handy for a POC or a quick-and-dirty fix, I would generally leave it as a last resort. In this case, for example, FastAPI can be of great help, as it natively supports adding new endpoints to existing applications (the support is limited, though, so it might not work for you). If FastAPI is not enough for your use case, you can consider contributing the feature yourself, since Haystack is an open-source project. Forking should really come last, in my opinion :)

### Adding new endpoints to rest_api

Disclaimer: I didn't try this code; if this looks interesting, we can dig deeper. You could create a standalone FastAPI application in a different repo, like this:

```python
# file: myapi.py
import uvicorn
from fastapi import FastAPI
from rest_api.utils import get_app

app = FastAPI()

# Mount the stock Haystack REST API under /haystack
haystack_app = get_app()
app.mount("/haystack", haystack_app)


# Your own endpoints live on the root of the outer app
@app.get("/search")
def index(pipeline_id: int = 0):
    return "Hello from custom search!"


if __name__ == "__main__":
    uvicorn.run("myapi:app", host="127.0.0.1", port=8000)
```

With your endpoints mounted on the root and the ones from `rest_api` under `/haystack`, you can reach both:

```
$ curl http://localhost:8000/search
"Hello from custom search!"
$ curl http://localhost:8000/haystack/health
{"version":"1.6.1rc0","cpu":{"used":0.0},"memory":{"used":1.79},"gpus":[]}
```

### Serving multiple pipelines

Now this is interesting, and I believe the method above would only bring you so far. Let me know if these points are enough to unblock you!
-
P.S. I just stumbled upon a tool that might be worth considering for any REST API work you do (https://pinferencia.underneathall.app/0.2/). It uses FastAPI and uvicorn, so it's the same foundation, and it seems to be focused on serving inference models, so perhaps it would let you outsource this non-core feature to a purpose-built tool that you don't need to develop or maintain. Or perhaps it's too simple or doesn't fit with Haystack pipelines. I'm not knowledgeable enough to answer that, but I figured I'd at least share it with you folks.
-
I would like to use the default REST API as a foundation for my application. By following tutorials, reading the documentation and just exploring the code, I more or less understand how to create custom endpoints, nodes, pipelines, etc. in isolation. But what I am uncertain about is how to actually insert them into the REST API in a way that avoids or minimizes conflicts when merging core Haystack code updates into my project code (the tutorials don't really seem to work or interface with the REST API).
I come from WordPress, where you NEVER touch any core code and instead use "hooks" to call/retrieve your custom code from the appropriate places within the core code, but I haven't yet found any analogue for this in Haystack's REST API. Therefore, I can't figure out how to achieve my goals without modifying core Haystack files, which will invite continual merge conflicts going forward...
I suspect that, at the very least, I could/should create my own `my_application.py` file that gets launched via `gunicorn my_application:app`...

I could easily add and modify environment variables from there, prior to calling `get_app()`, `get_pipeline()`, etc., e.g.
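For reference, here is a minimal sketch of such an entry point. It assumes that Haystack 1.x's `rest_api` reads `QUERY_PIPELINE_NAME` (and, presumably, `PIPELINE_YAML_PATH`) from the environment when its modules are imported; `MY_DEFAULTS`, `apply_env_defaults` and the YAML path are hypothetical names chosen for illustration. Using `os.environ.setdefault` means variables already set at deploy time take precedence over the file's own defaults.

```python
# my_application.py: a hypothetical entry point that wraps the stock REST API
# without editing it. Assumed launch command (uvicorn worker for an ASGI app):
#   gunicorn my_application:app -k uvicorn.workers.UvicornWorker
import os
from typing import Dict

# Hypothetical defaults for this deployment. QUERY_PIPELINE_NAME comes from the
# question; PIPELINE_YAML_PATH and the file path are assumptions/placeholders.
MY_DEFAULTS: Dict[str, str] = {
    "PIPELINE_YAML_PATH": "pipelines/my_pipelines.haystack-pipeline.yml",
    "QUERY_PIPELINE_NAME": "query",
}


def apply_env_defaults(defaults: Dict[str, str]) -> Dict[str, str]:
    """Set each variable only if it is not already defined, so values passed
    at deploy time win. Returns the effective value of every key."""
    for key, value in defaults.items():
        os.environ.setdefault(key, value)
    return {key: os.environ[key] for key in defaults}


apply_env_defaults(MY_DEFAULTS)

# Assumption: rest_api reads its settings at import time, so import it only
# after the environment has been prepared. Uncomment with Haystack installed:
# from rest_api.utils import get_app
# app = get_app()
```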
But it isn't clear to me how I can avoid modifying or altogether replacing other Core files.
For example, let's say I want to have two search endpoints that use different query pipelines. Currently there's just one `/query` endpoint, which loads the pipeline that is set by `QUERY_PIPELINE_NAME`. In order to have two query endpoints, do I add an additional endpoint directly to `rest_api/controller/search.py` and have that point to a corresponding environment variable, e.g. `QUERY_PIPELINE_NAME_2`? Or do I create a file such as `rest_api/controller/search2.py` and then add that to `rest_api/utils.py` with `router.include_router(search2.router...)`? Either way, I'm modifying a Core Haystack file.
I suppose I could do something in `my_application.py` like

But that just seems clunky. It would be best if I could just "hook into", or otherwise insert, `search2` into the existing `utils.py` module without modifying the file.

Am I missing something very fundamental about how to work with a Python project in general, or Haystack in particular? Should I just be modifying the "core" files? Or should I be treating some (or any/all) of the core files, namely `application.py` and `utils.py`, as templates from which to create my own versions?

I expect that the answer to all of this is extremely basic/simple, so I would very much appreciate it if someone could take a few minutes to point me in the right direction here. Once I understand these conceptual/architectural things, I should be able to start making rapid progress with my application, as well as developing a Node for spaCy that I can contribute back to the Haystack project.
Thanks!