Taking in large prompts (5000 characters and up) #3704
Replies: 3 comments 10 replies
-
I regularly use large prompts, like 10,000 characters. You should describe the issues that you are encountering in more detail.
6 replies
-
I will try that!
Will setting the batch size to 512 still work through all of the tokens? I am a little confused about how that works under the hood!
On Oct 20, 2023, shibe2 wrote:
Try batch size 512. When you get it working, experiment with different batch sizes.
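For context on the under-the-hood question: the batch size does not skip any tokens. The whole prompt is tokenized once, and the tokens are then evaluated n_batch at a time, so a batch size of 512 simply processes the prompt in 512-token chunks. Below is a minimal sketch of that loop, assuming an already-initialized context and the llama_decode / llama_batch_get_one API from late 2023 (older trees use llama_eval instead); exact signatures may differ in your version.

```cpp
// Sketch: evaluate an already-tokenized prompt in chunks of n_batch tokens.
// Assumes `ctx` is an initialized llama_context and `tokens` holds the full prompt.
// Uses the late-2023 llama_decode / llama_batch_get_one API; older versions use llama_eval.
#include <algorithm>
#include <cstdio>
#include <vector>
#include "llama.h"

static bool eval_prompt(llama_context * ctx, std::vector<llama_token> & tokens, int n_batch) {
    for (int i = 0; i < (int) tokens.size(); i += n_batch) {
        // Every token is still evaluated; n_batch only controls how many go per call.
        const int n_eval = std::min(n_batch, (int) tokens.size() - i);
        llama_batch batch = llama_batch_get_one(tokens.data() + i, n_eval, /*pos_0=*/i, /*seq_id=*/0);
        if (llama_decode(ctx, batch) != 0) {
            fprintf(stderr, "llama_decode failed at token %d\n", i);
            return false;
        }
    }
    return true; // logits for the last prompt token are now available for sampling
}
```

Note that n_batch only affects how much work is done per call; the context size (n_ctx) must still be large enough to hold the entire prompt.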
2 replies
-
I partially solved this issue by reimplementing the main.cpp example as a Swift implementation. This handles prompts of up to 4096 tokens; anything past that context limit causes the LLM to fail and return garbage data. But for the purposes of this library it has been solved.
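The hard stop at 4096 tokens is the context window (n_ctx) rather than anything batching-related: past it the model is asked to attend over positions it was not trained for, and the output degrades into garbage. A small guard before decoding can catch this early. This is a sketch that builds on the loop above and assumes the same llama.cpp C API (llama_n_ctx is part of it); the handling options are left as comments.

```cpp
// Sketch: check the prompt against the context window before decoding,
// since tokens past n_ctx are what produce the garbage output described above.
// Assumes `ctx` and `tokens` as in the earlier sketch.
#include <cstdio>
#include <vector>
#include "llama.h"

static bool prompt_fits(llama_context * ctx, const std::vector<llama_token> & tokens) {
    const int n_ctx = (int) llama_n_ctx(ctx);
    if ((int) tokens.size() > n_ctx) {
        // Options: raise n_ctx (with RoPE scaling for models trained on less),
        // truncate the prompt, or summarize/split it at a higher level.
        fprintf(stderr, "prompt has %zu tokens but n_ctx is only %d\n", tokens.size(), n_ctx);
        return false;
    }
    return true;
}
```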
2 replies
-
I am working on a project that requires running text generation on prompts of over 5000 characters. Currently, loading them into a single tokenization call results in the failure described in #1881. I believe this is because I am attempting to load significantly more tokens than are supposed to be loaded at once.
I have looked over the examples provided, but I have not found any that load really large prompts. I have done a little research into loading large prompts, but I am admittedly very new to this field and was wondering if I could get some guidance specific to this project.
Through my research I have found techniques like chunking my input into smaller fragments; however, I am not sure how I would implement this using the llama.cpp API. I already have the actual string fragmentation completed, but my question lies in how the fragments would be sent to the model. Is this the right step forward, or are there other resources / techniques that I should explore?
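One note on the fragmentation question: you generally do not need to split the string at all. The usual pattern is to tokenize the whole prompt once and then feed the resulting tokens to the model in n_batch-sized chunks, as in the batching sketch earlier in this thread. Below is a hedged sketch of the tokenization step using the raw C API; depending on your llama.cpp version, llama_tokenize may take a trailing `special` flag, so check llama.h in your checkout.

```cpp
// Sketch: tokenize the whole prompt once instead of fragmenting the string.
// The exact llama_tokenize signature varies between llama.cpp versions; verify against llama.h.
#include <algorithm>
#include <string>
#include <vector>
#include "llama.h"

static std::vector<llama_token> tokenize_prompt(const llama_model * model, const std::string & prompt) {
    // Generous upper bound: a token never encodes less than one byte, plus room for BOS.
    std::vector<llama_token> tokens(prompt.size() + 8);
    int n = llama_tokenize(model, prompt.c_str(), (int) prompt.size(),
                           tokens.data(), (int) tokens.size(), /*add_bos=*/true);
    if (n < 0) {
        // A negative return means the buffer was too small; -n is the required size.
        tokens.resize(-n);
        n = llama_tokenize(model, prompt.c_str(), (int) prompt.size(),
                           tokens.data(), (int) tokens.size(), /*add_bos=*/true);
    }
    tokens.resize(std::max(n, 0));
    return tokens; // feed these to llama_decode in n_batch-sized chunks
}
```

From there, the batching loop sketched earlier handles the actual evaluation.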