generated from opentensor/bittensor-subnet-template
Labels: discussion (In Discussion)
Description
The dataset currently used for evaluation has the following properties:
- Fetch 2048 samples from the historical synthetic dataset
- Fetch 2048 samples from the latest 48 hours of synthetic data
- Evaluate on the 4096 samples gathered
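The sampling scheme above can be sketched roughly as follows. The function name, signature, and data representation are illustrative assumptions, not the subnet's actual API:

```python
import random

def build_eval_set(historical, recent_48h, n_each=2048, seed=None):
    """Sketch of the current scheme: draw n_each samples from the
    historical pool and n_each from the last-48-hour pool, then
    evaluate on their union. `historical` and `recent_48h` are
    assumed to be lists of samples; names are hypothetical."""
    rng = random.Random(seed)
    old = rng.sample(historical, min(n_each, len(historical)))
    new = rng.sample(recent_48h, min(n_each, len(recent_48h)))
    # 4096 samples total when both pools are large enough
    return old + new
```

Because the 48-hour pool changes between evaluation runs, two validators (or two runs of the same validator) score models against different data, which is one source of the variance discussed below.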
While this setup has been effective at preventing blatant overfitting, it has some limitations. Chiefly, the constantly changing dataset introduces a higher degree of variance in scoring.
This issue aims to discuss potential solutions and the details of the implementation.
Some suggestions from miners:
- Use a fixed, epoch-based dataset. A fixed dataset would reduce variance, but requires safeguards against overfitting
- Re-evaluate existing models on some fixed cycle. The implementation here also depends on how re-evaluation interacts with emissions, and on how models are re-scored in the case of "fluke" scores
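One way the epoch-based suggestion could work is to derive the evaluation set deterministically from the epoch number, so that every validator scores against the same fixed subset within an epoch, while the subset rotates across epochs to limit overfitting to any single snapshot. This is a hypothetical sketch, not an agreed-upon design:

```python
import hashlib
import random

def epoch_dataset(pool, epoch, n=4096):
    """Hypothetical epoch-based selection: hash the epoch number into
    a seed so all validators independently derive the same fixed
    subset of `pool` for a given epoch. The subset changes each
    epoch, so miners cannot overfit one static dataset."""
    digest = hashlib.sha256(f"epoch-{epoch}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return rng.sample(pool, min(n, len(pool)))
```

Within an epoch, scores would be directly comparable across validators and re-runs, addressing the variance issue; the rotation across epochs provides the overfitting safeguard mentioned above.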
More recently, the sample size for evaluation has been adjusted (as seen here) to help reduce variance, but other approaches may also be effective.