Description
Building a DataFrame with patsy's `dmatrices` can produce column (feature) names containing characters that XGBoost does not allow, which results in a ValueError.
MRE (modified from the patsy documentation):
>>> from patsy import dmatrices, demo_data
>>> data = demo_data("a", "b", "x1", "x2", "y", "z column")
>>> y, X = dmatrices("y ~ x1 + a:x2", data, return_type='dataframe')
>>> X
   Intercept         x1   a[a1]:x2   a[a2]:x2
0        1.0   1.764052  -0.103219  -0.000000
1        1.0   0.400157   0.410599   0.000000
2        1.0   0.978738   0.000000   0.144044
3        1.0   2.240893   0.000000   1.454274
4        1.0   1.867558   0.761038   0.000000
5        1.0  -0.977278   0.121675   0.000000
6        1.0   0.950088   0.000000   0.443863
7        1.0  -0.151357   0.000000   0.333674
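patsy encodes categorical levels in square brackets (e.g. `a[a1]:x2`), so any formula with a categorical term yields such names. As a quick, dependency-free check, the sketch below lists which generated column names would trip XGBoost's validation. It assumes the forbidden set is exactly `[`, `]` and `<`, as stated in the ValueError quoted further down; `offending_columns` is a hypothetical helper, not part of patsy or XGBoost.

```python
# Characters XGBoost rejects in feature names (assumed from the error message).
FORBIDDEN = ("[", "]", "<")


def offending_columns(columns):
    """Return the column names that contain any forbidden character."""
    return [name for name in columns if any(ch in name for ch in FORBIDDEN)]


# Column names produced by dmatrices("y ~ x1 + a:x2", ...) in the MRE above.
columns = ["Intercept", "x1", "a[a1]:x2", "a[a2]:x2"]
print(offending_columns(columns))  # ['a[a1]:x2', 'a[a2]:x2']
```

With a real design matrix, the same call works on `list(X.columns)`.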
Fitting an XGBoost model on this design matrix then fails:
>>> import xgboost as xgb
>>> model = xgb.XGBRegressor()
>>> model = model.fit(X, y)
ValueError: feature_names may not contain [, ] or <
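Until the restriction is explained or lifted, one possible workaround is to rename the offending columns before fitting. The sketch below is a hypothetical helper (`sanitize_feature_names` is not an XGBoost or patsy API); it assumes only `[`, `]` and `<` are rejected, replaces each run of them with `_`, and appends a numeric suffix if two different names would collapse to the same result.

```python
import re

# Hypothetical helper: runs of '[', ']' or '<' (assumed to be the only
# characters XGBoost rejects) are replaced with a single underscore.
_FORBIDDEN_RE = re.compile(r"[\[\]<]+")


def sanitize_feature_names(columns):
    """Map each original column name to an XGBoost-safe, unique name."""
    mapping = {}
    used = set()
    for name in columns:
        base = _FORBIDDEN_RE.sub("_", name)
        candidate = base
        suffix = 1
        # Disambiguate when two different originals collapse to the same name.
        while candidate in used:
            candidate = f"{base}_{suffix}"
            suffix += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping


mapping = sanitize_feature_names(["Intercept", "x1", "a[a1]:x2", "a[a2]:x2"])
print(mapping)
# With pandas, apply it before fitting:
#   X = X.rename(columns=mapping)
#   model = model.fit(X, y)
```

Returning a mapping instead of renaming in place keeps the original patsy names recoverable, e.g. for relabelling feature importances afterwards.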
What is the actual reason these characters are not allowed? Other scikit-learn-compatible packages do not impose this restriction.