Training data format #3805
Unanswered
chiefMarlin asked this question in Q&A
Replies: 2 comments
-
I think you can use the train.cpp
-
Sadly, just adding |
-
Hi,
Can someone give me some pointers on how the training data gets broken up?
The example at https://github.com/ggerganov/llama.cpp/tree/master/examples/finetune reads in a full blob of text from
https://raw.githubusercontent.com/brunoklein99/deep-learning-notes/master/shakespeare.txt
But looking at the fine-tuning datasets on Hugging Face, training data is typically supplied as JSON, which is then split into objects that contain input/output pairs.
In my case I am trying to train a Mistral 7B model. Does anyone know how I should store the data in the training file?
Would separating individual examples with a newline be sufficient?
Eg:
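For anyone landing here with the same question, this is a rough sketch (not the asker's own example, and not taken from the llama.cpp source) of one way to flatten JSON input/output objects into a single text file. A plain newline is ambiguous as a separator once an example contains a newline itself, so the sketch puts an explicit marker at the start of each example. The marker string `<s>`, the `input`/`output` field names, and the helper names are all assumptions for illustration. I believe the finetune example has a `--sample-start` option meant for this kind of marker, but treat that as an assumption too and check its `--help` output.

```python
import json

# Hypothetical marker placed at the start of every example; pick a string
# that never occurs inside your data.
SAMPLE_START = "<s>"


def records_to_text(records, sample_start=SAMPLE_START):
    """Flatten input/output records into one text blob, one marker per example."""
    parts = []
    for rec in records:
        # Assumed field names; adjust to match your dataset.
        prompt = rec["input"].strip()
        answer = rec["output"].strip()
        parts.append(f"{sample_start}{prompt}\n{answer}\n")
    return "".join(parts)


def split_samples(text, sample_start=SAMPLE_START):
    """Split the blob back into examples, to check the marker is unambiguous."""
    return [chunk for chunk in text.split(sample_start) if chunk]


records = json.loads(
    '[{"input": "What is 2+2?", "output": "4"},'
    ' {"input": "Capital of France?", "output": "Paris"}]'
)
blob = records_to_text(records)
samples = split_samples(blob)
```

Writing `blob` to a file gives the same kind of plain-text input as shakespeare.txt, with example boundaries that can still be recovered even when an example spans several lines.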