How should I create a hierarchical dataset? #4854
Hello everybody, I'm trying to create a hierarchical dataset and I'd like to hear some opinions on how to do it best. Here's a tree structure diagram that shows what it looks like:
The dataset comprises a collection of EEG recordings from over fifty different people (subjects). Each subject in the dataset has two types of recordings: resting state recordings and motor imagery recordings. The resting state recordings are just a large 2D array, whereas the motor imagery recordings comprise many labeled EEG segments.

What I've considered

At first, I thought about segmenting the dataset using dataset configurations. However, I'd need to create a configuration for each

Question

How should I go about creating a dataset like this with Huggingface
I'd like to emphasize that, during the training loop, I'll need to fetch the corresponding resting state recordings for any given motor imagery recording I access. In other words, whenever I fetch some motor imagery recording from
Replies: 1 comment
Hi! IMO it makes the most sense to have one row per subject and, therefore, define the features as follows:

from datasets import Array2D, Array3D, ClassLabel, Features, Sequence, Value

features = Features(
    {
        "subject_name": Value("string"),
        "rest": Array2D(...),
        # 3D to represent a list of 2D arrays; Sequence(Array2D(...)) is not supported at the moment
        "motor_imagery_arrays": Array3D(shape=(None, ...), ...),
        "motor_imagery_labels": Sequence(ClassLabel(...)),
    }
)

@lhoestq This is one instance where support for Sequence(ArrayXD(...)) would be handy.
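
To illustrate why one row per subject fits the requirement from the question (fetching the resting state recording that belongs to any given motor imagery segment), here is a minimal, library-independent sketch. It uses plain nested Python lists in place of real EEG arrays and does not call `datasets` at all. The toy values, the subject names, and the helper `iter_trials` are hypothetical; only the field names are taken from the features suggested above.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical toy rows, one per subject, using the field names suggested above.
# Nested lists stand in for the real EEG arrays.
rows: List[Dict[str, Any]] = [
    {
        "subject_name": "S01",
        "rest": [[0.1, 0.2], [0.3, 0.4]],
        "motor_imagery_arrays": [
            [[1.0, 1.1], [1.2, 1.3]],
            [[2.0, 2.1], [2.2, 2.3]],
        ],
        "motor_imagery_labels": [0, 1],
    },
    {
        "subject_name": "S02",
        "rest": [[0.5, 0.6], [0.7, 0.8]],
        "motor_imagery_arrays": [[[3.0, 3.1], [3.2, 3.3]]],
        "motor_imagery_labels": [1],
    },
]


def iter_trials(
    subject_rows: List[Dict[str, Any]],
) -> Iterator[Tuple[str, Any, int, Any]]:
    """Yield (subject_name, segment, label, rest) for every motor imagery segment."""
    for row in subject_rows:
        segments = row["motor_imagery_arrays"]
        labels = row["motor_imagery_labels"]
        if len(segments) != len(labels):
            raise ValueError(
                f"{row['subject_name']}: {len(segments)} segments but {len(labels)} labels"
            )
        for segment, label in zip(segments, labels):
            # The resting state recording lives in the same row as the segment,
            # so no separate lookup by subject is needed.
            yield row["subject_name"], segment, label, row["rest"]


trials = list(iter_trials(rows))
```

Because each segment is stored next to its subject's resting state recording, pairing the two needs no extra index or join. The trade-off is that every row carries all of a subject's segments, so a training loop that samples individual segments would have to flatten rows along these lines first.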