Create term hold-out scenario #498

Currently, when constructing an `OBNBDataset` object, the `splitter` lets the user specify the _gene_ hold-out method, using different [partitions](https://github.com/krishnanlab/obnb/blob/main/src/obnb/label/split/partition.py) to create training/validation/testing splits. We would also like to be able to hold out all labels associated with a _term_, so that we can evaluate zero-shot learning.

For now, we envision the two splitting strategies below:

1. Terms alone define the training, validation, and testing splits. The first feature we would like is the ability to specify a list of term IDs for each split.

2. The user first specifies the lists of term IDs for the training and testing splits; the existing gene partitioning methods (`RandomRatioHoldout` or `RatioHoldout`) then determine which genes are used for training and validation.

![IMG_50FED136EA16-1](https://github.com/user-attachments/assets/e7d22e75-53e6-45f3-9fe5-3b05845cb447)
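To make the two strategies concrete, here is a rough sketch in plain Python. It does not use the obnb API: `split_by_terms`, `split_terms_then_genes`, the toy `annotations` dict, and the `val_ratio`/`seed` parameters are all hypothetical names for illustration. The second function reflects one reading of strategy 2 (test terms are held out entirely, and the genes of the training terms are randomly divided into training and validation), with a plain random shuffle standing in for the existing gene partitions.

```python
import random

# Hypothetical toy annotations: term ID -> set of gene IDs (not obnb's data model).
annotations = {
    "T1": {"g1", "g2", "g3"},
    "T2": {"g2", "g4"},
    "T3": {"g1", "g5"},
    "T4": {"g3", "g4", "g5"},
}


def split_by_terms(annotations, term_splits):
    """Strategy 1: assign whole terms (with all their labels) to named splits."""
    seen = set()
    out = {}
    for name, term_ids in term_splits.items():
        overlap = seen.intersection(term_ids)
        if overlap:
            raise ValueError(f"Terms in more than one split: {sorted(overlap)}")
        missing = set(term_ids) - set(annotations)
        if missing:
            raise ValueError(f"Unknown term IDs: {sorted(missing)}")
        seen.update(term_ids)
        out[name] = {t: set(annotations[t]) for t in term_ids}
    return out


def split_terms_then_genes(annotations, train_terms, test_terms, val_ratio=0.2, seed=0):
    """Strategy 2 (one interpretation): hold out test terms entirely, then
    randomly split the genes of the training terms into train and validation."""
    by_term = split_by_terms(annotations, {"train": train_terms, "test": test_terms})
    # Pool and shuffle the genes annotated to any training term.
    genes = sorted(set().union(*by_term["train"].values()))
    random.Random(seed).shuffle(genes)
    n_val = round(len(genes) * val_ratio)
    val_genes, train_genes = set(genes[:n_val]), set(genes[n_val:])
    return {
        "train": {t: g & train_genes for t, g in by_term["train"].items()},
        "val": {t: g & val_genes for t, g in by_term["train"].items()},
        "test": by_term["test"],
    }


term_only = split_by_terms(
    annotations, {"train": ["T1", "T2"], "val": ["T3"], "test": ["T4"]}
)
terms_then_genes = split_terms_then_genes(
    annotations, ["T1", "T2", "T3"], ["T4"], val_ratio=0.4
)
```

In this sketch the zero-shot property comes from the test terms never appearing in the training or validation splits; whether test genes should also be excluded from training is left open here.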

Eventually, we would also like an analogue of `RandomRatioHoldout` that partitions terms, and ultimately a `RatioHoldout` driven by term properties (similar to using `PubMedCount` to partition genes according to study bias).
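As a rough illustration of what ratio-based term partitioning might look like, the sketch below splits a list of term IDs by ratio, either at random or ordered by a term property. `ratio_term_holdout`, its parameters, and the example property (number of annotated genes per term) are hypothetical and are not part of obnb.

```python
import random


def ratio_term_holdout(term_ids, ratios, key=None, seed=0):
    """Partition term IDs into consecutive chunks sized by `ratios`.

    If `key` is None the terms are shuffled (random ratio hold-out);
    otherwise they are sorted by `key` in descending order first.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    terms = sorted(term_ids)  # deterministic starting order
    if key is None:
        random.Random(seed).shuffle(terms)
    else:
        terms.sort(key=key, reverse=True)
    n = len(terms)
    splits, start, cum = [], 0, 0.0
    for i, r in enumerate(ratios):
        cum += r
        # The last chunk always runs to the end so no term is dropped.
        end = n if i == len(ratios) - 1 else round(cum * n)
        splits.append(terms[start:end])
        start = end
    return splits


# Hypothetical term property: number of genes annotated to each term.
term_sizes = {"T1": 3, "T2": 2, "T3": 2, "T4": 3, "T5": 1}

random_splits = ratio_term_holdout(term_sizes, (0.6, 0.2, 0.2), seed=42)
by_size_splits = ratio_term_holdout(term_sizes, (0.6, 0.2, 0.2), key=term_sizes.get)
```

With the property-based ordering, the largest terms land in the first split and the smallest in the last, mirroring how a gene-level property separates well-studied from under-studied genes.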

My priority is to get both strategies working with a user-specified list of term IDs for each split before moving on to `RandomRatioHoldout` for terms.

