Problem Description
While working with synthetic data, I may create several different SDV synthesizers. Each synthesizer could use a different AI algorithm, for example, GaussianCopulaSynthesizer, CTGANSynthesizer, etc. At a later point, I may want to load the saved synthesizer files back into SDV in order to sample synthetic data from them.
When loading a synthesizer, I'm expected to call the load function of the synthesizer's class with the file path. For example:
synthesizer = GaussianCopulaSynthesizer.load(filepath='synthesizer/synthesizer_v1.pkl')
But the problem is that I may not know the exact class that was used to create/fit the synthesizer -- either because (a) I don't remember/haven't kept track of it, or (b) it was provided to me by someone else.
It would be nice if there were a single, streamlined way to load an already-saved synthesizer that does not require me to know its exact synthesizer class.
Expected behavior
Create a function called load_synthesizer in the utils module. This function should behave exactly like the <synthesizer_class>.load function does today, except that it should work with any SDV synthesizer class.
from sdv.utils import load_synthesizer
synthesizer = load_synthesizer(filepath='synthesizer/synthesizer_v1.pkl')
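One way the proposed function could look is sketched below. This is only an illustration under assumptions, not SDV's implementation: it assumes the saved file is a pickle that cloudpickle can read (as in the workaround in this issue), it falls back to the standard pickle module if cloudpickle is not installed, and the missing-file check is my own addition.

```python
import pickle
from pathlib import Path

try:
    # cloudpickle is what the workaround in this issue uses to read the file.
    import cloudpickle as _backend
except ImportError:
    # Assumption: fall back to the standard library so the sketch still runs.
    _backend = pickle


def load_synthesizer(filepath):
    """Load a saved synthesizer without naming its class (hypothetical sketch)."""
    path = Path(filepath)
    if not path.is_file():
        raise FileNotFoundError(f"No synthesizer file found at '{path}'.")

    with path.open('rb') as f:
        # The pickle stream records the object's class, so the caller
        # does not have to supply it.
        return _backend.load(f)
```

Because the class is stored in the file itself, the function needs no class argument, which is the point of the request.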
Backwards compatibility: We should keep the individual load functions for each synthesizer, but raise a FutureWarning that recommends using the utils.load_synthesizer function instead.
FutureWarning: The 'load' function will be deprecated in future versions of SDV. Please use 'utils.load_synthesizer' instead.
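The backwards-compatible path could be sketched roughly as follows. SynthesizerSketch is a hypothetical stand-in, not SDV's real base class, and the module-level load_synthesizer here is a minimal placeholder that uses the standard pickle module so the example runs on its own.

```python
import pickle
import warnings


def load_synthesizer(filepath):
    """Stand-in for the proposed utils.load_synthesizer (hypothetical)."""
    with open(filepath, 'rb') as f:
        return pickle.load(f)


class SynthesizerSketch:
    """Hypothetical base class; not SDV's real synthesizer base class."""

    @classmethod
    def load(cls, filepath):
        warnings.warn(
            "The 'load' function will be deprecated in future versions of SDV. "
            "Please use 'utils.load_synthesizer' instead.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        # Delegate so both entry points share one loading code path.
        return load_synthesizer(filepath)
```

Delegating to the shared function keeps the old and new entry points from drifting apart while the per-class load functions are still supported.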
How do I know what kind of synthesizer it is? If the user needs to know which type of synthesizer this is, they can always print it out.
print(synthesizer)
<sdv.single_table.copulas.GaussianCopulaSynthesizer object at 0x7bd96997d0e0>
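If the class needs to be checked in code rather than read off the printed output, a small helper along these lines might do. The function name synthesizer_class_path is made up for illustration; it relies only on standard Python attributes and works for any object.

```python
def synthesizer_class_path(obj):
    """Return the fully qualified class name of obj, e.g. 'package.module.Class'."""
    cls = type(obj)
    return f'{cls.__module__}.{cls.__qualname__}'
```

For the synthesizer printed above, this would presumably return the same dotted path that appears in the printed output.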
Workaround
If someone is stuck on this, you can use cloudpickle to load any synthesizer object right now. Once loaded, sampling should work no matter what kind of synthesizer it is.
import cloudpickle

filepath = 'synthesizer/synthesizer_v1.pkl'

# Unpickle the saved object; its class is restored from the file itself
with open(filepath, 'rb') as f:
    synthesizer = cloudpickle.load(f)

synthetic_data = synthesizer.sample(num_rows=10)
Additional Context
Confusingly, all the load functions currently do the same thing regardless of the class name. So you could technically use any class to load any synthesizer, and it should still work.
from sdv.multi_table import HMASynthesizer
synthesizer = HMASynthesizer.load('synthesizer/synthesizer_v1.pkl')
print(synthesizer)
<sdv.single_table.copulas.GaussianCopulaSynthesizer object at 0x7bd96997d0e0>
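The behavior above is what one would expect if load is a class method that never consults the class it is called on. The sketch below reproduces that pattern with hypothetical classes and the standard pickle module. It is a guess at the mechanism, not a copy of SDV's code.

```python
import pickle


class SharedLoadSketch:
    """Hypothetical base class whose load never consults cls."""

    @classmethod
    def load(cls, filepath):
        with open(filepath, 'rb') as f:
            # cls is unused: the result's type is whatever the file contains.
            return pickle.load(f)


class SingleTableSketch(SharedLoadSketch):
    """Hypothetical stand-in for a single-table synthesizer class."""


class MultiTableSketch(SharedLoadSketch):
    """Hypothetical stand-in for a multi-table synthesizer class."""
```

In this sketch, calling MultiTableSketch.load on a file returns exactly what SingleTableSketch.load would, because the type of the result is determined by the file contents alone.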