Handling NaNs from ElementProperty

When using an `ElementProperty` featurizer, some elemental data may be missing. For example, the bulk modulus of Ga is not available in pymatgen's `Element` for Ga. In that case, the featurizer returns a NaN.
There are several ways to deal with this, and for now it is left to the user to pick one. The possible approaches are:
- ignore the feature entirely, which is a pity if only a small fraction of the dataset has a NaN for this feature
- replace the NaN with a constant value, either one that lies completely outside the range of possible values, or the mean of the feature over the dataset. The former is simple to set up. For the latter, the mean has to be stored somewhere so it can be reused when the feature is NaN for a new prediction (which may not have access to the original dataset).
I believe these two possibilities could be implemented in matminer as an optional post-processing step. Whether that belongs in matminer is debatable, since users can already handle both cases themselves.
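For reference, the two user-side options above can be sketched with plain pandas on the featurized DataFrame. This is only an illustration: the column names and values are made up, and nothing here is an existing matminer API.

```python
import numpy as np
import pandas as pd

# Toy feature table standing in for featurizer output.
# Column names and values are made up for illustration.
train = pd.DataFrame({
    "mean_bulk_modulus": [100.0, np.nan, 140.0],
    "mean_atomic_mass": [10.0, 20.0, 30.0],
})

# Option 1: drop every feature column that contains at least one NaN.
dropped = train.dropna(axis=1)

# Option 2a: fill NaNs with a sentinel outside the range of possible values.
sentinel_filled = train.fillna(-1.0)

# Option 2b: fill NaNs with the column means of the training set.
# DataFrame.mean() skips NaNs by default; the means must be kept around.
train_means = train.mean()
mean_filled = train.fillna(train_means)

# At prediction time the original dataset may be gone,
# so the stored means are reused for the new sample.
new = pd.DataFrame({"mean_bulk_modulus": [np.nan], "mean_atomic_mass": [25.0]})
new_filled = new.fillna(train_means)
```

Option 2b is the only one that needs extra state (`train_means`) to be persisted alongside the model.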

I see another possibility that could be implemented in matminer and that the user has no easy way to do themselves. When a value for an element is not found in the data, `ElementProperty` could replace it with the mean of the values for all other elements. This differs from using the mean of the feature over the whole dataset in two ways: it is not biased by the dataset (the user could want one or the other), and nothing has to be done for new predictions, since the treatment is internal to the featurizer. The `ElementProperty` featurizer would then never return NaNs because of missing data. This could be enabled with an optional argument, e.g., `ElementProperty(missing_is_mean=True)`.
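A minimal sketch of the idea in plain Python, independent of matminer internals. The function names and the lookup-table layout are hypothetical, and the numbers are illustrative rather than taken from pymatgen.

```python
import math


def fill_missing_with_element_mean(table):
    """Return a copy of `table` where missing values (None or NaN) are
    replaced by the mean over all elements that do have a value.

    Hypothetical helper, not an existing matminer function.
    """
    known = [v for v in table.values() if v is not None and not math.isnan(v)]
    if not known:
        raise ValueError("no element has a value for this property")
    mean = sum(known) / len(known)
    return {
        el: (mean if v is None or math.isnan(v) else v)
        for el, v in table.items()
    }


def weighted_mean(fractions, table):
    """Composition-weighted mean of a property (fractions sum to 1)."""
    return sum(frac * table[el] for el, frac in fractions.items())


# Illustrative lookup table: Ga has no bulk modulus entry.
bulk_modulus = {"Al": 76.0, "Cu": 140.0, "Ga": None}
filled = fill_missing_with_element_mean(bulk_modulus)

# Ga now carries the mean of the known elements, so the feature is finite.
feature = weighted_mean({"Ga": 0.5, "Cu": 0.5}, filled)
```

Because the replacement value depends only on the elemental data table, the same composition always yields the same feature, whatever dataset it appears in.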

If you think this is a good addition to matminer, I would be happy to submit a PR with whichever solution you think is best. I would actually favor implementing all of them, to leave the choice to the user while making the user's life easier.

Handling NaNs from ElementProperty #898
