
[FEATURE]: Constraints support between columns #532

@spreeni

Description


Feature Summary

Allow users to declare constraints, either within a single column or across columns, that are respected during synthesis.

Problem and Solution

Currently, I believe that during data synthesis it is possible, with some probability, that:

  • an end_time is sampled before a start_time
  • a country + city combination does not match
  • values do not follow a required increment (e.g. steps of 10m)
  • a number falls outside a sensible range (e.g. a negative age)
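Each of the four cases above can be phrased as a row-level predicate. The following is a minimal sketch in plain Python, assuming rows are dicts and using hypothetical column names (`start_time`, `end_time`, `country`, `city`, `duration_min`, `age`) and a hypothetical table of allowed country/city pairs. It is only an illustration of the constraint kinds, not an API of MOSTLY AI or SDV.

```python
from datetime import datetime

# Hypothetical allowed (country, city) pairs; in practice these could be
# derived from the training data or a reference table.
ALLOWED_LOCATIONS = {("Austria", "Vienna"), ("France", "Paris")}


def violations(row):
    """Return the names of the constraints a single synthetic row violates."""
    failed = []
    # Inequality across columns: the end must not precede the start.
    if row["end_time"] < row["start_time"]:
        failed.append("end_before_start")
    # Fixed combination across columns: (country, city) must be a known pair.
    if (row["country"], row["city"]) not in ALLOWED_LOCATIONS:
        failed.append("country_city_mismatch")
    # Increment within a column: durations must be multiples of 10 minutes.
    if row["duration_min"] % 10 != 0:
        failed.append("bad_increment")
    # Range within a column: age must lie in a plausible interval.
    if not 0 <= row["age"] <= 120:
        failed.append("age_out_of_range")
    return failed


bad_row = {
    "start_time": datetime(2024, 1, 1, 12, 0),
    "end_time": datetime(2024, 1, 1, 11, 0),
    "country": "Austria",
    "city": "Paris",
    "duration_min": 25,
    "age": -3,
}
print(violations(bad_row))
# ['end_before_start', 'country_city_mismatch', 'bad_increment', 'age_out_of_range']
```

A row that satisfies all four checks returns an empty list, so `not violations(row)` can serve as a validity filter.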

I have moved from SDV to MOSTLY AI because of its permissive license and promising performance. SDV offers column constraints for exactly this:
https://docs.sdv.dev/sdv/concepts/constraint-augmented-generation-cag/predefined-constraints

Some of these are less troublesome than others and can generally be worked around by oversampling and keeping only the rows that satisfy the constraints. However, it would be nice to offer this as a convenience feature in the MOSTLY AI SDK itself, and there may even be a more elegant way to build the constraints into the modeling, e.g. via the loss function or similar approaches.
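The oversample-and-filter workaround is essentially rejection sampling. A minimal sketch, assuming a hypothetical `sample_fn(k)` that returns `k` synthetic rows as dicts (a placeholder for whatever generator is used, not a MOSTLY AI SDK call) and a row-level predicate `is_valid`:

```python
import random


def sample_with_constraints(sample_fn, is_valid, n, oversample=2.0, max_rounds=10):
    """Draw synthetic rows in batches and keep only those passing is_valid.

    sample_fn(k) is a hypothetical stand-in for any generator that returns
    a list of k synthetic rows.
    """
    kept = []
    for _ in range(max_rounds):
        missing = n - len(kept)
        if missing <= 0:
            break
        # Oversample the shortfall so that a round usually fills it.
        batch = sample_fn(max(1, int(missing * oversample)))
        kept.extend(row for row in batch if is_valid(row))
    if len(kept) < n:
        raise RuntimeError(f"only {len(kept)} of {n} rows satisfied the constraints")
    return kept[:n]


# Toy generator: start and end are drawn independently, so roughly half of
# the rows have end < start and get rejected.
rng = random.Random(0)


def toy_sampler(k):
    return [{"start": rng.randint(0, 100), "end": rng.randint(0, 100)} for _ in range(k)]


rows = sample_with_constraints(toy_sampler, lambda r: r["start"] <= r["end"], n=50)
print(len(rows), all(r["start"] <= r["end"] for r in rows))
```

The `max_rounds` guard matters: if the generator rarely produces valid rows, rejection sampling becomes expensive or never finishes, which is one argument for building constraints into the model itself rather than only filtering afterwards.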

Thanks for all the beautiful work and for offering it under the current license!

Potential Alternatives

No response

Additional Context

No response


Labels: enhancement (New feature or request)
