Replies: 6 comments
-
Thanks for the suggestions! I will incorporate them into my code.
- Metrics: MAPE. Because MAPE is calculated on the original data, it's more straightforward (10% means the model prediction fluctuates 10% above or below the real sales); there's a small sketch of this at the end of this reply.
- Impact of control variables: plug in the coefficients of the control variables, calculate the contribution of each factor, and scale it to the real data.
- If a media channel is all 0's, I would drop it instead of passing it to the model. You know this channel will contribute 0, why not save some effort :) In my code, it's okay for some weeks' cum_effect to be 0, just not all of them. In the dataset, the first few weeks of mdip_so are 0's, and the log1p transformation works. You mentioned you were doing MMM at the campaign level; I would leave a flag on that. MMM is usually done at the nationwide, weekly level, and requires 2-3 years of consecutive data.
- It's okay to have control variables with both positive and negative values: the control model is additive, so no log1p transformation is needed. Media variables are non-negative in nature. For normalization, I use mean centralization because (1) I want the model to focus on the trend, not the absolute numbers, and (2) it avoids negative values for log1p. BTW, seasonality variables don't have negative values, they are 0 or 1; it's their effects that may be either positive or negative.
- Please see the model specification in section 1.1. I'm building a multiplicative MMM, assuming media effects are multiplicative. To turn the multiplicative formula into a linear regression problem, take the log on both sides.
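To make the MAPE and normalization points concrete, here is a minimal Python sketch; the array names and numbers are made up for illustration and are not from the repo or the dataset:

```python
import numpy as np

def mape(y_true, y_pred):
    # mean absolute percentage error on the original (untransformed) scale:
    # 0.10 means predictions fluctuate roughly 10% above or below real sales
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true))

def mean_center_log1p(x):
    # divide by the column mean so the model sees the trend, not the absolute level,
    # then log1p; media x is non-negative, so the ratio is never negative
    x = np.asarray(x, dtype=float)
    return np.log1p(x / x.mean())

# hypothetical example values, not from the dataset
sales_true = np.array([100.0, 120.0, 90.0])
sales_pred = np.array([110.0, 115.0, 95.0])
print(mape(sales_true, sales_pred))                      # ~0.066
print(mean_center_log1p(np.array([0.0, 50.0, 100.0])))   # leading zeros are fine, log1p(0) = 0
```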
-
Hi Sibyl! Again, thanks a lot Sibyl!
-
Thanks a lot Sibyl!!! "I don't quite get what you mean by 'don't want to extend the lag to cover this issue'; what do you want your transformed data to look like?" -> In this case, if you want to avoid ending up with 0 because of a long sequence of 0s in the real media data vector, I think you can set the lag to "∞" (or better, to the length of your df): you then get a cumulative effect that is always > 0 and just gets smaller and smaller from week to week, but I don't think it is a good approach.
-
I agree it's not a good approach. The "Adstock with Varying Length" plot shows that the impact of the length is minor: setting it to 8 weeks, 12 weeks, or infinite makes little difference. If a channel's spending/impressions are trivial, its model result is not trustworthy and needs to be further tested.
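For intuition, here is a rough standalone sketch of that comparison, not the exact repo function, using the same delayed-decay weighting (decay^((lag - peak)^2)) with made-up spend and parameter values:

```python
import numpy as np

def adstock(x, L, P, D):
    # weight for spend l weeks ago: D ** ((l - P) ** 2), normalized over an L-week window
    x_pad = np.concatenate([np.zeros(L - 1), np.asarray(x, dtype=float)])
    w = np.array([D ** ((l - P) ** 2) for l in range(L)])   # w[l] = weight at lag l
    return np.array([
        np.dot(x_pad[i - L + 1:i + 1], w[::-1]) / w.sum()   # window ordered oldest -> newest
        for i in range(L - 1, len(x_pad))
    ])

# hypothetical weekly spend with long gaps between campaigns
spend = [0, 0, 100, 0, 0, 0, 0, 0, 0, 0, 80, 0, 0, 0]
for L in (8, 12, len(spend)):
    print(L, np.round(adstock(spend, L=L, P=1, D=0.8), 2))
```

Whatever exact numbers come out, the point is that with a decay below 1 the tail weights decay^((lag - peak)^2) shrink very fast, so extending the window adds very little.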
-
Moved this to discussion. The conversation remains open.
-
Hi Sibyl,
Hope you are well.
I have some theoretical questions about the model and some suggestions for the code.
About the Marketing Mix Model (model 2):
If I have to benchmark different models, which metrics do you suggest using? Every time I check, the MAPE and RMSE are always pretty similar.
How can we evaluate the impact of each control variable on the baseline?
In your dataset, the media is often ON. What happens if we have media values interspersed with many 0s from campaign to campaign? In that case, if our lag_effect is not sufficient to cover the gap, the cumulative effect will be 0.
For this last problem, I decided to change the Stan code. It works, but I am not sure if it is the right approach.
Here you can find the changed code:
```stan
// adstock, mean-center, log1p transformation
// (snippet from the transformed parameters block of model 2)
row_vector[max_lag] lag_weights;
for (nn in 1:N) {
  for (media in 1:num_media) {
    // delayed-adstock weights: decay[media]^((lag - 1 - peak[media])^2)
    for (lag in 1:max_lag) {
      lag_weights[max_lag - lag + 1] <- pow(decay[media], (lag - 1 - peak[media]) ^ 2);
    }
    cum_effect <- Adstock(sub_col(X_media, nn, media, max_lag), lag_weights);
    if (cum_effect == 0) {
      // a long run of zero spend can leave the whole adstock window empty;
      // map it to 0 explicitly instead of computing log1p(0 / mu_mdip[media])
      X_media_adstocked[nn, media] <- 0;
    } else {
      X_media_adstocked[nn, media] <- log1p(cum_effect / mu_mdip[media]);
    }
  }
  X <- append_col(X_media_adstocked, X_ctrl);
}
```
Instead of this modification, do you suggest other solutions? For example, would it be effective to create a vector of cumulative effects, transform it to avoid 0, and then pass it through the formula log1p(cum_effect/mu_mdip[media])?
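Something like this, sketched in Python rather than Stan just to show what I mean (the epsilon value and the helper name are placeholders I made up):

```python
import numpy as np

def safe_log_mean_center(cum_effect, mu, eps=1e-6):
    # cum_effect: adstocked media vector for one channel (may contain exact zeros)
    # mu: the channel mean used for mean-centralization
    # clip the ratio away from 0 before log1p, so long gaps in spend
    # never produce a hard 0 in the transformed design matrix
    ratio = np.asarray(cum_effect, dtype=float) / mu
    return np.log1p(np.clip(ratio, eps, None))

# made-up example: a channel that is dark for several weeks
cum = np.array([0.0, 0.0, 35.2, 12.9, 0.0, 0.0])
print(safe_log_mean_center(cum, mu=cum.mean()))
```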
How can we deal with variables that take both negative and positive values, e.g., seasonality (varying from negative to positive)? Do you think a transformation such as (X - X.min()) / (X.max() - X.min()) could be the right approach, or do you suggest keeping the mean transformation?
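For concreteness, the two candidates side by side on made-up values (not from the dataset):

```python
import numpy as np

x = np.array([-3.0, -1.0, 0.0, 2.0, 5.0])       # hypothetical mixed-sign control variable

min_max = (x - x.min()) / (x.max() - x.min())   # rescales to [0, 1], but shifts the zero point
mean_scaled = x / x.mean()                      # keeps the sign pattern; unstable if the mean is near 0

print(min_max)        # -> 0, 0.25, 0.375, 0.625, 1
print(mean_scaled)    # -> -5, -1.67, 0, 3.33, 8.33
```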
Why do we need to apply a log_mean_center transformation in the second model?
I think the changes below can improve the code and avoid some errors.
Control Model (model 1)
I think this change helps when only one variable is passed for a specific beta; otherwise shape problems can arise.
I suggest transforming every control-variable block into matrix form so that its shape is always (N, K).
For example:
```python
# pos_vars holds more than one variable
pos_vars = [col for col in base_vars if col not in seas_cols]
X1 = np.matrix(df_ctrl[pos_vars].values).reshape(
    len(df), 1 if isinstance(pos_vars, str) else len(pos_vars))

# pn_vars holds only one control variable (a string, not a list)
pn_vars = seas_cols[1]
X2 = np.matrix(df_ctrl[pn_vars].values).reshape(
    len(df), 1 if isinstance(pn_vars, str) else len(pn_vars))
# reshaping keeps the shapes consistent with the ctrl_data dictionary below

ctrl_data = {
    'N': len(df_ctrl),
    'K1': X1.shape[1],  # instead of len(pos_vars)
    'K2': X2.shape[1],  # instead of len(pn_vars)
    'X1': X1,
    'X2': X2,
    'y': df_ctrl['base_sales'].values,
    'max_intercept': min(df_ctrl['total_volume'])
}
```
In addition, in every sm.sampling() call I would add n_jobs=-1 to run the chains in parallel and speed up the code (if that can be helpful).
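e.g. a minimal call sketch (the data/iter/chains values here are placeholders, not the notebook's exact settings; n_jobs is PyStan 2's argument for parallel chains):

```python
# run the chains in parallel on all available cores (PyStan 2)
fit = sm.sampling(data=ctrl_data, iter=2000, chains=4, n_jobs=-1)
```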
As always, Sibyl, thank you very much for your help and for the code you published. You are a big help to everyone who needs it.
Best regards