Discussion: is get_feature_names_out missing by oversight or on purpose? #1540

@MarieSacksick


Hello :) !

I am trying to get the output feature names at the end of a pipeline that uses a GapEncoder and a DropCols, because I need them to label a feature importance plot.

My question is: is get_feature_names_out meant to be implemented for these transformers at some point, or should I use a workaround instead?

Code snippets, first for GapEncoder:

from skrub.datasets import fetch_employee_salaries
from sklearn.compose import make_column_transformer
from skrub import GapEncoder

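# Keep a small subset of the dataset for a quick reproduction.
datasets = fetch_employee_salaries()
X, y = datasets.X[:500], datasets.y[:500]

# Encode the "division" column with a GapEncoder inside a ColumnTransformer.
preprocessing = make_column_transformer(
    (GapEncoder(n_components=100), "division"),
)

preprocessing.fit(X)
preprocessing.get_feature_names_out()  # raises an error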

And for DropCols:

from skrub.datasets import fetch_employee_salaries
from skrub import DropCols
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Keep a small subset of the dataset for a quick reproduction.
datasets = fetch_employee_salaries()
X, y = datasets.X[:500], datasets.y[:500]

# Drop the listed columns, then fit a linear model on what remains.
cols_to_drop = [
    "gender", "department", "department_name", "division",
    "assignment_category", "employee_position_title", "date_first_hired",
]
model = make_pipeline(DropCols(cols_to_drop), LinearRegression())

model.fit(X, y)
model.get_feature_names_out()  # raises an error

Both calls to get_feature_names_out() raise an error.
Thank you :)!
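
A possible workaround, sketched under assumptions rather than taken from skrub itself: scikit-learn's Pipeline.get_feature_names_out needs every step to implement the method, and LinearRegression does not, so the final estimator has to be sliced off with pipeline[:-1] before asking for names. If a preprocessing step still lacks the method, the names can be read from the columns of the transformed DataFrame. ColumnDropper and output_names below are hypothetical names invented for this illustration (ColumnDropper is a toy stand-in, not skrub's DropCols), and the fallback assumes the preprocessing steps return a pandas DataFrame.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline


class ColumnDropper(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for a column-dropping step.

    It deliberately defines no get_feature_names_out, to mimic the error case.
    """

    def __init__(self, cols):
        self.cols = cols

    def fit(self, X, y=None):
        # Record a fitted attribute so scikit-learn treats the step as fitted.
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        return X.drop(columns=self.cols)


def output_names(pipeline, X):
    """Return the column names seen by the final estimator of a fitted pipeline.

    Assumes X is a DataFrame and the preprocessing steps return a DataFrame.
    """
    preprocessing = pipeline[:-1]  # every step except the final estimator
    try:
        return list(preprocessing.get_feature_names_out())
    except AttributeError:
        # Fallback: transform a few rows and read the DataFrame columns.
        return list(preprocessing.transform(X.head(5)).columns)


# Tiny made-up data, only to exercise the helper.
X = pd.DataFrame(
    {
        "gender": ["F", "M", "F", "M"],
        "year": [2001, 2005, 2010, 2012],
        "hours": [35.0, 40.0, 38.0, 20.0],
    }
)
y = [1.0, 2.0, 3.0, 4.0]

model = make_pipeline(ColumnDropper(["gender"]), LinearRegression())
model.fit(X, y)

names = output_names(model, X)
# Pair each surviving column name with its fitted coefficient.
importance = dict(zip(names, model[-1].coef_))
print(importance)
```

The resulting name-to-coefficient mapping is what a feature importance plot would need for its axis labels. Whether the same fallback works unchanged with GapEncoder or DropCols depends on those transformers returning DataFrames, which is assumed here rather than confirmed.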

Labels: bug (Something isn't working)
