Error when sampling using HMA after changing the default distributions and grandparent - parent - child table relationship #2606

@rwedge

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDV version: 1.24
  • Python version: 3.12
  • Operating System: darwin

Error Description

When an HMASynthesizer is created for a dataset that includes a grandparent-parent-child relationship, changing the default distribution of the table synthesizers to 'norm' or 'uniform' causes a TypeError when sampling.

Steps to reproduce

Example

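import numpy as np
import pandas as pd

from sdv.metadata import Metadata
from sdv.multi_table import HMASynthesizer

data = {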
    'child': pd.DataFrame({'id': range(10), 'parent_id': range(10)}),
    'parent': pd.DataFrame({
        'id': range(10),
        'grandparent_id': range(10),
        'categories': list(np.random.choice(['T', 'F'], size=10)),
    }),
    'grandparent': pd.DataFrame({'id': range(10)}),
}
metadata = Metadata.load_from_dict({
    'tables': {
        'child': {
            'primary_key': 'id',
            'columns': {'id': {'sdtype': 'id'}, 'parent_id': {'sdtype': 'id'}},
        },
        'parent': {
            'primary_key': 'id',
            'columns': {
                'id': {'sdtype': 'id'},
                'grandparent_id': {'sdtype': 'id'},
                'categories': {'sdtype': 'categorical'},
            },
        },
        'grandparent': {'primary_key': 'id', 'columns': {'id': {'sdtype': 'id'}}},
    },
    'relationships': [
        {
            'parent_table_name': 'parent',
            'child_table_name': 'child',
            'parent_primary_key': 'id',
            'child_foreign_key': 'parent_id',
        },
        {
            'parent_table_name': 'grandparent',
            'child_table_name': 'parent',
            'parent_primary_key': 'id',
            'child_foreign_key': 'grandparent_id',
        },
    ],
    'METADATA_SPEC_VERSION': 'V1',
})
synthesizer = HMASynthesizer(metadata)

synthesizer.set_table_parameters('parent', {'default_distribution': 'norm'})
synthesizer.fit(data)
synthesizer.sample(1)

Traceback

sdv/multi_table/base.py:675: in sample
    sampled_data = self._sample(scale=scale)
sdv/sampling/hierarchical_sampler.py:324: in _sample
    self._sample_children(table_name=table, sampled_data=sampled_data, scale=scale)
sdv/sampling/hierarchical_sampler.py:211: in _sample_children
    self._add_child_rows(
sdv/sampling/hierarchical_sampler.py:101: in _add_child_rows
    sampled_rows = self._sample_rows(child_synthesizer, num_rows)
sdv/sampling/hierarchical_sampler.py:76: in _sample_rows
    return synthesizer._sample_batch(round(num_rows), keep_extra_columns=True)
sdv/single_table/base.py:960: in _sample_batch
    sampled, num_valid = self._sample_rows(
sdv/single_table/base.py:863: in _sample_rows
    raw_sampled = self._sample(num_rows)
sdv/single_table/copulas.py:194: in _sample
    return self._model.sample(num_rows, conditions=conditions)
../Copulas/copulas/utils.py:50: in wrapper
    return function(self, *args, **kwargs)
../Copulas/copulas/multivariate/gaussian.py:299: in sample
    output[column_name] = univariate.percent_point(cdf)
../Copulas/copulas/univariate/base.py:593: in percent_point
    return self.MODEL_CLASS.ppf(U, **self._params)
>       args, loc, scale = self._parse_args(*args, **kwds)
E       TypeError: _parse_args() got an unexpected keyword argument 'a'

../miniconda3/envs/sdv/lib/python3.12/site-packages/scipy/stats/_distn_infrastructure.py:2293: TypeError
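
The last frame of the traceback can be reproduced with SciPy alone, which may help narrow down where the bug lives. The sketch below assumes the stray 'a' keyword is a shape parameter left over from a different distribution such as beta; the traceback does not confirm where it comes from, and the `beta_style_params` dict is a hypothetical stand-in for whatever ends up in `self._params`.

```python
from scipy import stats

# Hypothetical parameter set: 'a' and 'b' are shape parameters of
# distributions such as beta, while norm only takes loc and scale.
beta_style_params = {'a': 1.0, 'b': 1.0, 'loc': 0.0, 'scale': 1.0}

# beta accepts these keywords, so this call succeeds.
beta_value = stats.beta.ppf(0.5, **beta_style_params)

# norm rejects the extra shape keywords, which raises the same kind of
# TypeError as the final frame of the traceback.
message = None
try:
    stats.norm.ppf(0.5, **beta_style_params)
except TypeError as error:
    message = str(error)

print(beta_value)
print(message)
```

If this assumption holds, the parameters passed to `percent_point` for the 'norm' univariate do not match the distribution that was requested via `default_distribution`.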

Labels: bug
