Create term hold-out scenario #498

@kmanpearl

Currently, when creating an OBNBDataset object, the splitter lets the user choose a gene hold-out method that partitions genes into training/validation/testing splits. We would also like to be able to hold out all labels associated with a term, so that we can evaluate zero-shot learning.

For now, we are envisioning the two splitting strategies below:

  1. Terms alone define the training, validation and testing splits. The first feature we would like is the ability to specify a list of term IDs for each split.

  2. Terms and genes are both split. First we specify the list of term IDs for the training and testing splits, then use the existing gene partitioning methods (RandomRatioHoldout or RatioHoldout) to determine which genes are used for training and which for validation.
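To make the two strategies concrete, here is a rough sketch in plain Python. It does not use the obnb API: the function names `split_terms_by_id` and `split_terms_then_genes`, the `val_ratio` and `seed` parameters, and the choice to evaluate test terms on all genes are assumptions made for illustration only. The random gene shuffle merely stands in for RandomRatioHoldout.

```python
import random


def split_terms_by_id(term_ids, train_terms, val_terms, test_terms):
    """Strategy 1 (sketch): assign whole terms (label columns) to splits."""
    splits = {
        "train": set(train_terms),
        "val": set(val_terms),
        "test": set(test_terms),
    }
    # A held-out term must not appear in more than one split.
    names = list(splits)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            overlap = splits[first] & splits[second]
            if overlap:
                raise ValueError(
                    f"Terms in both {first} and {second}: {sorted(overlap)}"
                )
    # Every requested term must exist in the label collection.
    unknown = set().union(*splits.values()) - set(term_ids)
    if unknown:
        raise ValueError(f"Unknown term IDs: {sorted(unknown)}")
    # Keep the original term order within each split.
    return {
        name: [t for t in term_ids if t in ids] for name, ids in splits.items()
    }


def split_terms_then_genes(term_ids, gene_ids, train_terms, test_terms,
                           val_ratio=0.2, seed=0):
    """Strategy 2 (sketch): fix train/test terms, then split genes.

    Assumption: validation reuses the training terms on held-out genes,
    and test terms are evaluated on all genes.
    """
    term_split = split_terms_by_id(term_ids, train_terms, [], test_terms)
    genes = list(gene_ids)
    random.Random(seed).shuffle(genes)  # stand-in for a random gene holdout
    n_val = int(len(genes) * val_ratio)
    val_genes = sorted(genes[:n_val])
    train_genes = sorted(genes[n_val:])
    return {
        "train": (train_genes, term_split["train"]),
        "val": (val_genes, term_split["train"]),
        "test": (list(gene_ids), term_split["test"]),
    }
```

Each split in the second function is a `(genes, terms)` pair, so a downstream loader could build its label mask from the cross product of the two lists.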

[Image: IMG_50FED136EA16-1]

Eventually we would also like an analogue of RandomRatioHoldout that partitions terms, and ultimately an analogue of RatioHoldout that partitions terms by some term property (similar to using PubMedCount to partition genes according to study bias).
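As a rough idea of what the term-level versions could look like, here is a plain-Python sketch. Again, none of this is the obnb API: `random_ratio_term_holdout`, `ratio_term_holdout`, the `term_property` mapping and the default ratios are all hypothetical, and putting the terms with the largest property values into training is an assumption.

```python
import random


def _cut(ordered_terms, ratios):
    """Cut an ordered list of terms into train/val/test by ratio."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    n = len(ordered_terms)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return {
        "train": ordered_terms[:n_train],
        "val": ordered_terms[n_train:n_train + n_val],
        "test": ordered_terms[n_train + n_val:],  # whatever remains
    }


def random_ratio_term_holdout(term_ids, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle terms with a fixed seed, then cut by ratio."""
    terms = list(term_ids)
    random.Random(seed).shuffle(terms)
    return _cut(terms, ratios)


def ratio_term_holdout(term_ids, term_property, ratios=(0.6, 0.2, 0.2),
                       largest_first=True):
    """Order terms by a property value (hypothetical mapping), then cut."""
    terms = sorted(term_ids, key=lambda t: term_property[t],
                   reverse=largest_first)
    return _cut(terms, ratios)
```

With `largest_first=True`, terms with the highest property values land in training and the lowest in testing, which mirrors the study-bias style of split but on the term axis.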

My priority would be to get both strategies working with a user-specified list of term IDs for each split before moving on to a RandomRatioHoldout for terms.
