Add losslessmegacode dataset #1

@rombodawg

I have created a fairly extensive dataset that is currently missing from bagel, which is notable considering bagel is supposed to have "everything".

The filtered version is here:
https://huggingface.co/datasets/rombodawg/LosslessMegaCodeTrainingV3_Tiny

If you want to filter and dedupe it yourself, use the full unfiltered version:
https://huggingface.co/datasets/rombodawg/LosslessMegaCodeTrainingV3_1.6m_Evol
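
For anyone going the "dedupe it yourself" route, here is a rough sketch of exact-match deduplication on normalized text. It is not how the Tiny version was produced, and the field names `instruction` and `output` are assumptions for illustration only; check the actual column names in the dataset before using it.

```python
import hashlib


def normalize(text: str) -> str:
    # Collapse whitespace runs and lowercase so trivial variations hash identically.
    return " ".join(text.split()).lower()


def dedupe(records, keys=("instruction", "output")):
    """Keep the first record for each distinct normalized combination of `keys`.

    `keys` are hypothetical field names; adjust them to the real schema.
    """
    seen = set()
    unique = []
    for rec in records:
        # Join the normalized fields with a separator unlikely to occur in text.
        joined = "\x1f".join(normalize(str(rec.get(k, ""))) for k in keys)
        digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)
    return unique


# Made-up sample rows, only to show the behavior.
sample = [
    {"instruction": "Reverse a string", "output": "s[::-1]"},
    {"instruction": "reverse  a string", "output": "s[::-1]"},
    {"instruction": "Sum a list", "output": "sum(xs)"},
]
deduped = dedupe(sample)
print(len(deduped))  # 2
```

This only catches exact duplicates after whitespace and case normalization. Near-duplicates would need something fuzzier, such as MinHash, which is not shown here.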
