Patsy dmatrices compatibility #5323

Open
@kavvkon

A DataFrame whose columns are generated by patsy's dmatrices may have feature names containing disallowed characters, which results in a ValueError.

MRE, modified from the patsy documentation:

>>> from patsy import dmatrices, demo_data
>>> data = demo_data("a", "b", "x1", "x2", "y", "z column")

>>> y, X = dmatrices("y ~ x1 + a:x2", data, return_type='dataframe')
>>> X
   Intercept        x1  a[a1]:x2  a[a2]:x2
0        1.0  1.764052 -0.103219 -0.000000
1        1.0  0.400157  0.410599  0.000000
2        1.0  0.978738  0.000000  0.144044
3        1.0  2.240893  0.000000  1.454274
4        1.0  1.867558  0.761038  0.000000
5        1.0 -0.977278  0.121675  0.000000
6        1.0  0.950088  0.000000  0.443863
7        1.0 -0.151357  0.000000  0.333674

Fitting a model on this DataFrame results in:

import xgboost as xgb

model = xgb.XGBRegressor()
model = model.fit(X, y)

ValueError: feature_names may not contain [, ] or <

What is the actual reason these characters are not allowed? The other scikit-learn compatible packages do not have this restriction.
