Open
Labels
bug (Something isn't working), feature:sampling (Related to generating synthetic data after a model is built)
Description
Problem Description
If a user tries to conditionally sample before fitting the model, the error message surfaced to the user isn't very helpful.
We improved the error message for this case in regular sampling, but not for conditional sampling.
Current error message:
# Scenario 1 (output_file_path not provided)
NotFittedError: Error: Sampling terminated. No results were saved due to unspecified "output_file_path".
# Scenario 2 (output_file_path provided)
sdv.data_processing.errors.NotFittedError: Error: Sampling terminated. Partial results are stored in C:\User\GaussianCopulaSample.csv.
Code to Reproduce
from sdv.datasets.demo import download_demo
from sdv.single_table import GaussianCopulaSynthesizer
from sdv.sampling import Condition
data, metadata = download_demo(
modality='single_table',
dataset_name='fake_hotel_guests'
)
synthesizer = GaussianCopulaSynthesizer(metadata)
condition0 = Condition(num_rows=454, column_values={"room_type":"BASIC"})
condition1 = Condition(num_rows=455, column_values={"room_type": "DELUXE"})
synthesizer.sample_from_conditions(max_tries_per_batch=100000, batch_size=1000, conditions=[condition0, condition1])
Expected Behavior
Instead of attempting to sample, the single table synthesizer should first check whether it has been fitted. If it has not been fitted, we should proactively raise a SamplingError
explaining to the user what they must do.
SamplingError: This synthesizer has not been fitted. Please fit your synthesizer first before
conditionally sampling synthetic data.
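A minimal sketch of the kind of guard described above, assuming a simplified stand-in class rather than SDV's actual synthesizer. The names `SketchSynthesizer`, `_fitted`, and `_check_fitted` are hypothetical, `SamplingError` is defined locally for illustration, and conditions are plain dicts instead of `Condition` objects:

```python
class SamplingError(Exception):
    """Local stand-in for the SamplingError named in this issue."""


class SketchSynthesizer:
    """Hypothetical, simplified synthesizer; not SDV's real implementation."""

    def __init__(self):
        self._fitted = False  # assumed flag name, set to True by fit()
        self._data = []

    def fit(self, data):
        # Placeholder for model fitting: just remember the rows.
        self._data = list(data)
        self._fitted = True

    def _check_fitted(self):
        # Fail fast with an actionable message before any sampling work starts.
        if not self._fitted:
            raise SamplingError(
                'This synthesizer has not been fitted. Please fit your synthesizer '
                'first before conditionally sampling synthetic data.'
            )

    def sample_from_conditions(self, conditions):
        self._check_fitted()
        # Placeholder for conditional sampling: return the stored rows that
        # match at least one condition (each condition is a dict of column values).
        return [
            row for row in self._data
            if any(all(row.get(k) == v for k, v in cond.items()) for cond in conditions)
        ]
```

Because the check runs at the top of `sample_from_conditions`, the user sees the fitting message directly and never reaches the "Sampling terminated" path.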
npatki