Description
Feature Summary
Allow declaration of constraints within columns or across columns that are adhered to in synthesis.
Problem and Solution
Currently, I believe it is probabilistically possible that during data synthesis:
- an end_time gets sampled before a start_time
- a country + city combination does not match
- increments are not adhering to a standard (e.g. increments of 10m)
- a number is outside of a sensible range (e.g. negative age)
I have moved from SDV to MOSTLY AI because of the permissive license and promising performance. SDV offers predefined column constraints for cases like these:
https://docs.sdv.dev/sdv/concepts/constraint-augmented-generation-cag/predefined-constraints
I think some of these are less troublesome than others and can generally be avoided by oversampling and keeping only the rows that adhere to the constraints. However, it would be nice to offer this as convenience functionality in the MOSTLY AI SDK, and maybe there is an even more elegant way to let these constraints flow into the modeling itself, via the loss function or a similar approach.
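To illustrate the oversample-and-filter workaround, here is a minimal sketch in plain pandas. It is not MOSTLY AI SDK code: `generate` is a hypothetical stand-in for whatever call produces a batch of synthetic rows, `fake_generate` only exists to make the example runnable, and the column names (`start_time`, `end_time`, `country`, `city`, `duration_min`, `age`) and the list of valid country/city pairs are assumptions made up for this example.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical lookup of allowed (country, city) combinations.
VALID_PAIRS = {("DE", "Berlin"), ("AT", "Vienna")}


def is_valid(df: pd.DataFrame) -> pd.Series:
    """Boolean mask of rows that satisfy all four example constraints."""
    pair_ok = pd.Series(
        [(c, t) in VALID_PAIRS for c, t in zip(df["country"], df["city"])],
        index=df.index,
        dtype=bool,
    )
    return (
        (df["end_time"] >= df["start_time"])  # end not before start
        & pair_ok                             # country and city match
        & (df["duration_min"] % 10 == 0)      # increments of 10
        & df["age"].between(0, 120)           # sensible age range
    )


def sample_with_constraints(generate, n, oversample=2.0, max_rounds=10):
    """Call generate(k) repeatedly and keep only valid rows until n are collected."""
    if n <= 0:
        raise ValueError("n must be positive")
    kept, n_kept = [], 0
    for _ in range(max_rounds):
        need = n - n_kept
        if need <= 0:
            break
        batch = generate(math.ceil(need * oversample))
        batch = batch[is_valid(batch)]
        kept.append(batch)
        n_kept += len(batch)
    if n_kept < n:
        raise RuntimeError(f"only {n_kept} of {n} valid rows after {max_rounds} rounds")
    return pd.concat(kept, ignore_index=True).head(n)


def fake_generate(k, _rng=np.random.default_rng(0)):
    """Stand-in generator that deliberately produces some invalid rows."""
    start = pd.Timestamp("2024-01-01") + pd.to_timedelta(
        _rng.integers(0, 1000, k), unit="min"
    )
    end = start + pd.to_timedelta(_rng.integers(-60, 600, k), unit="min")
    return pd.DataFrame(
        {
            "start_time": start,
            "end_time": end,
            "country": _rng.choice(["DE", "AT"], k),
            "city": _rng.choice(["Berlin", "Vienna"], k),
            "duration_min": _rng.choice([10, 20, 30, 35], k),
            "age": _rng.integers(-5, 100, k),
        }
    )


synthetic = sample_with_constraints(fake_generate, n=50, oversample=5.0, max_rounds=20)
```

This works, but it gets expensive when constraints are rarely satisfied, and every user has to re-implement it, which is why built-in support (or handling constraints during training) would be preferable.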
Thanks for all the beautiful work and for offering it under the current license!
Potential Alternatives
No response
Additional Context
No response