In the autotrain-advanced video at 11:20, Abhishek states that for LLM training, the name of the CSV file with the training data is the train split.
So I thought I could name my file training-data.csv and supply --train-split training-data.
This DID create an autotrain folder with some files in it, but it then crashed with this error:
  File "/Users/heiko/.pyenv/versions/3.10.13/lib/python3.10/site-packages/autotrain/trainers/common.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/Users/heiko/.pyenv/versions/3.10.13/lib/python3.10/site-packages/autotrain/trainers/clm/__main__.py", line 105, in train
    train_data, valid_data = process_input_data(config)
  File "/Users/heiko/.pyenv/versions/3.10.13/lib/python3.10/site-packages/autotrain/trainers/clm/__main__.py", line 42, in process_input_data
    train_data = load_from_disk(config.data_path)[config.train_split]
  File "/Users/heiko/.pyenv/versions/3.10.13/lib/python3.10/site-packages/datasets/dataset_dict.py", line 59, in __getitem__
    return super().__getitem__(k)
KeyError: 'training-data'
The only way I could start autotrain was by not providing --train-split and naming my file train.csv.
I'm not sure whether I misunderstand what --train-split is supposed to do, or whether it is broken somehow.
It currently seems to be impossible to train an LLM when providing --train-split. Is that expected behaviour?
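For what it's worth, the failing line in the traceback is a plain key lookup on the loaded dataset, so the error just means the saved dataset has no split with the name I passed. Here is a minimal sketch of that behaviour. It uses a plain dict as a stand-in for the DatasetDict and assumes the saved dataset only contains a split called train (my guess, since train.csv works). The helper select_split is hypothetical and not part of autotrain.

```python
# Stand-in for the DatasetDict that load_from_disk() returns in the traceback.
# Assumption: the dataset saved to disk only has a split named "train".
saved_splits = {"train": ["row 1", "row 2"]}


def select_split(splits, train_split):
    """Hypothetical helper mimicking
    `load_from_disk(config.data_path)[config.train_split]`,
    but with an error message that lists the available splits."""
    try:
        return splits[train_split]
    except KeyError:
        available = ", ".join(sorted(splits))
        raise KeyError(
            f"split {train_split!r} not found; available splits: {available}"
        ) from None


# Works: the default split name matches a key in the saved dataset.
rows = select_split(saved_splits, "train")

# Fails the same way as the traceback: no key called "training-data".
try:
    select_split(saved_splits, "training-data")
    failed = False
    message = ""
except KeyError as err:
    failed = True
    message = str(err)

print(failed, message)
```

If that assumption holds, the value given to --train-split has to match a split name that actually exists in the saved dataset, which would explain why only train.csv works for me.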