-
If the server is creating a thread for every request and reusing the same llama objects in all of them (…
-
I found and fixed this bug in my local fork. The problem is that the mutex handling in the chunked-response path is wrong.

My fix was to detach the mutex from the unique_lock using ::release and manually unlock it when the chunked content provider finishes. set_chunked_content_provider takes an optional callback that fires when the stream finishes for any reason, so I hooked into that. This fix isn't technically safe, since one should generally unlock a mutex from the thread that locked it. It works because httplib processes a request from start to finish on the same thread, but it could break horribly if that behavior changes upstream.

I'm seeing all manner of odd behavior in the llama.cpp server code, so I'm going to investigate further and get to the bottom of it before I submit a PR cleaning things up. I need a production-quality server implementation wrapping llama.cpp, and I'm on the fence about whether to refactor what's here or design a custom protocol so I can more easily communicate between llama.cpp instances and a host application in Golang.

But, I digress... the most troubling issue I'm seeing is that token-generation behavior seemingly changes based on network performance. I have a server deployed and, during testing, I connected from my mobile phone over the cell network. A significant percentage of the generated tokens were corrupted or lost for just that client. I'm not convinced it's a network issue; it smells like another race condition. I'll do a deep dive today and figure out how best to resolve it.
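A minimal sketch of the pattern described above, assuming a recent cpp-httplib where set_chunked_content_provider(content_type, provider, resource_releaser) accepts a releaser invoked once the stream ends. The mutex name and handler are illustrative, not llama.cpp's actual code:

```cpp
// Sketch: keep the mutex locked across the streamed response, unlock it
// in the stream-finished callback. `llama_mutex` is illustrative.
#include <mutex>
#include "httplib.h"

std::mutex llama_mutex;  // guards the shared llama state

void handle_completion(const httplib::Request &req, httplib::Response &res) {
    std::unique_lock<std::mutex> lock(llama_mutex);

    // ... set up generation state while holding the lock ...

    // Detach the mutex from the RAII guard so it stays locked after this
    // handler returns and streaming begins.
    lock.release();

    res.set_chunked_content_provider(
        "text/event-stream",
        [](size_t /*offset*/, httplib::DataSink &sink) {
            // ... generate the next token and write it to `sink` ...
            // return false (or call sink.done()) once generation ends
            return true;
        },
        // Resource releaser: runs when the stream finishes for any reason.
        // Unlocking here is technically unsafe (the unlocking thread must
        // own the mutex), but httplib currently runs the whole request on
        // one thread, so it holds in practice.
        [](bool /*success*/) { llama_mutex.unlock(); });
}
```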
-
My PR with the fixes: #2391
-
Hi!
I opened two browser tabs and started two concurrent requests, and ./server crashes 100% of the time. Here is the gdb backtrace from the generated core dump:
Is this happening only for me? I'm running the latest code.
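In case it helps others reproduce it, here is a minimal sketch of the two-tab scenario as a standalone client; the host, port, endpoint, and JSON payload are assumptions based on the default server setup, not the exact test:

```cpp
// Sketch: issue two overlapping completion requests, like two browser tabs.
#include <thread>
#include "httplib.h"

int main() {
    auto fire = [] {
        httplib::Client cli("localhost", 8080);
        cli.set_read_timeout(600, 0);  // generation can take a while
        cli.Post("/completion",
                 R"({"prompt": "Hello", "n_predict": 64})",
                 "application/json");
    };
    std::thread t1(fire), t2(fire);  // two concurrent requests
    t1.join();
    t2.join();
}
```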
Thanks!