Explaining variable importance to a physicist #458
Hi all, I'm a spatial ecologist on a multidisciplinary project with some atmospheric scientists, where I'm using biomod2 to bring in niche models to look at insect response to weather systems. I've read the CRAN documentation and Thuiller et al. (2009) and I'm pretty sure I understand what it's doing, but I want to check my understanding with the community before I send this paper for review.

As I understand it, the variable importance metric checks the response of predictions (probability of presence) to the variable of interest, after fixing every other variable at a constant value (e.g. median or mean). Then another random variable is 'shuffled' with the variable of interest, and the response of predictions to this variable is correlated with the response to the original, using Pearson's correlation. The correlation coefficient is then subtracted from one to give the var.imp. So the more unique the response to the independent variable, the higher its importance. But what does 'shuffle' mean in this case? Thank you for any help you can provide.
Replies: 2 comments 4 replies
Hello Samuel,
Yes 🙂 Let's say your data was a
The idea is, if looking at variable C, to randomize the values contained in column
When calculating response curves, it is a bit different: you don't keep the exact values of your variables, only their ranges and their mean (or median, for example) values.
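To make the shuffling step concrete, here is a minimal Python sketch of permutation-based variable importance as described in this thread: randomize one column, predict again, and take one minus the Pearson correlation between the reference and the shuffled predictions. It is an illustration only, not biomod2 code (biomod2 is an R package). The `toy_model` function, its coefficients, the three variables A, B, C and the simulated data are all hypothetical.

```python
import math
import random

def toy_model(row):
    """Hypothetical fitted model: presence probability from variables (A, B, C)."""
    a, b, c = row
    # Assumed coefficients: A matters a little, B not at all, C a lot.
    return 1.0 / (1.0 + math.exp(-(0.2 * a + 0.0 * b + 2.0 * c)))

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

def variable_importance(model, data, col, rng):
    """Return 1 - cor(reference predictions, predictions with column `col` shuffled)."""
    reference = [model(row) for row in data]
    # Shuffle only the column of interest; every other column keeps its exact values.
    shuffled_col = [row[col] for row in data]
    rng.shuffle(shuffled_col)
    shuffled_data = [row[:col] + (v,) + row[col + 1:]
                     for row, v in zip(data, shuffled_col)]
    shuffled = [model(row) for row in shuffled_data]
    return 1.0 - pearson(reference, shuffled)

rng = random.Random(42)
# Simulated data set: 500 rows, three independent standard-normal variables.
data = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]

importance = {name: variable_importance(toy_model, data, i, rng)
              for i, name in enumerate("ABC")}
for name, value in importance.items():
    print(name, round(value, 3))
```

In this toy setup, shuffling B leaves the predictions unchanged, so its importance is essentially zero, while shuffling C breaks most of the correlation with the reference predictions and gives the largest score.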
Is it clearer that way? 👀 Maya
Hello Samuel,
I'll try and help make things clearer 🙂
For response curves, you want to check how the predicted value changes as a function of one variable.
So when moving along the range of your variable of interest, the other variables are fixed at a mean or median value, so the variation you see in the predicted value is due only to the variation in your variable of interest.
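The response-curve procedure just described can be sketched in a few lines of Python. This is a hedged illustration, not biomod2's implementation: the `toy_model` function, its coefficients, the variable names and the small data set are hypothetical, and the median is used as the fixed value purely as an example.

```python
import math
import statistics

def toy_model(A, B, C):
    """Hypothetical fitted model returning a presence probability."""
    return 1.0 / (1.0 + math.exp(-(0.5 * A - 0.1 * B + 3.0 * C)))

# Hypothetical observed values for three explanatory variables.
data = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [10.0, 12.0, 11.0, 13.0],
    "C": [0.1, 0.5, 0.9, 0.3],
}

def response_curve(model, data, focal, n_points=5):
    """Predict along the range of `focal` with all other variables held at their median."""
    fixed = {name: statistics.median(values) for name, values in data.items()}
    lo, hi = min(data[focal]), max(data[focal])
    curve = []
    for i in range(n_points):
        # Evenly spaced values across the observed range of the focal variable.
        x = lo + (hi - lo) * i / (n_points - 1)
        inputs = dict(fixed)
        inputs[focal] = x
        curve.append((x, model(**inputs)))
    return curve

curve_c = response_curve(toy_model, data, "C")
for x, p in curve_c:
    print(round(x, 2), round(p, 3))
```

Only the focal variable moves here, so any change in the predicted probability along the curve comes from that variable alone.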
➡️ For variable importance, the principle is the same only in the sense that the goal is to check the effect of one variable on the predicted values.
➡️ But what we want to measure here is rather how much impact a variable has on the predictions.
Here is a …