-
Notifications
You must be signed in to change notification settings - Fork 1
Edits round 1 #12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Edits round 1 #12
Conversation
Changed base branch to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @tgerke - thanks a lot for the initial rounds of edits! See comments inline.
library(risks) # provides riskratio(), riskdiff(), postestimation functions | ||
library(dplyr) # For data handling | ||
library(broom) # For tidy() model summaries | ||
data(breastcancer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we say somewhere what package breastcancer
comes from?
```{r allmodels2} | ||
tidy(fit_all) %>% | ||
select(-statistic, -p.value) %>% | ||
tidy(fit_all) |> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about introducing a dependency on R 4.1. I still see scientists use 4.0 and earlier.
data <- data |> | ||
dplyr::mutate(.clusterid = dplyr::row_number()) |> | ||
dplyr::rename(outc = dplyr::one_of(!!yvar)) |> | ||
tidyr::uncount(outc + 1) |> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The see the elegance in this approach! However, in the spirit of my comment in #10, it would be nice not to rely on tidyverse functions down the road. Let us focus on testing and bug fixes for now.
dplyr::mutate(outc = 0) %>% | ||
dplyr::rename(!!yvar := "outc")) | ||
|
||
data <- data |> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see earlier comment about base R pipe
yvar <- as.character(all.vars(formula)[1]) | ||
|
||
#TODO the following assumes that outc is coded as 0/1. We should throw an | ||
# error when this is not true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100%! How about here:
Lines 216 to 217 in f48f0a0
...) { | |
implausible <- 0.99999 |
... | ||
) | ||
#TODO seeing a pattern: let's make returning a non-converged object | ||
# an internal function that can be used in all cases like this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right - such a function just exists under different names because it returns slightly different object types depending on what model was supposed to be fit.
risks/R/zestimate_risk-utils.R
Lines 68 to 117 in f48f0a0
# Exception handlers | |
possibly_estimate_poisson <- ext_possibly( | |
.f = estimate_poisson, | |
otherwise = return_failure(family = list(family = "poisson"), | |
classname = "robpoisson")) | |
possibly_estimate_duplicate <- ext_possibly( | |
.f = estimate_duplicate, | |
otherwise = return_failure(family = list(family = "binomial"), | |
classname = "duplicate")) | |
possibly_estimate_glm <- ext_possibly( | |
.f = estimate_glm, | |
otherwise = return_failure(family = list(family = "binomial"), | |
classname = NULL)) | |
possibly_estimate_glm_startp <- ext_possibly( | |
.f = estimate_glm, | |
otherwise = return_failure(family = list(family = "binomial"), | |
classname = "glm_startp")) | |
possibly_estimate_glm_startd <- ext_possibly( | |
.f = estimate_glm, | |
otherwise = return_failure(family = list(family = "binomial"), | |
classname = "glm_startd")) | |
possibly_estimate_logbin <- ext_possibly( | |
.f = estimate_logbin, | |
otherwise = return_failure(family = list(family = "binomial", link = "log"), | |
classname = "logbin")) | |
possibly_estimate_addreg <- ext_possibly( | |
.f = estimate_addreg, | |
otherwise = return_failure(family = list(family = "binomial", link = "identity"), | |
classname = "addreg")) | |
possibly_estimate_logistic <- ext_possibly( | |
.f = estimate_logistic, | |
otherwise = return_failure(family = list(family = "binomial", link = "logit"), | |
classname = "logistic")) | |
possibly_estimate_margstd_boot <- ext_possibly( | |
.f = estimate_margstd_boot, | |
otherwise = return_failure(family = list(family = "binomial", link = "logit"), | |
classname = "margstd_boot")) | |
possibly_estimate_margstd_delta <- ext_possibly( | |
.f = estimate_margstd_delta, | |
otherwise = return_failure(family = list(family = "binomial", link = "logit"), | |
classname = "margstd_delta")) |
# TODO the below else is a great example of why they are best avoided | ||
# in favor of explicit "if" conditions. I needed to jump way up to find | ||
# that this is the "else" that happens when estimand is not equal to "rr". | ||
# What else can it be? Let's just spell it clearly here instead of else. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Either "rr"
or "rd"
.
) | ||
} | ||
|
||
#TODO I think duplicate is only valid for RRs? If so we should add a check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happening here already:
Lines 182 to 195 in f48f0a0
riskdiff <- function( | |
formula, | |
data, | |
approach = c( | |
"auto", | |
"all", | |
"robpoisson", | |
"glm", | |
"glm_startp", | |
"glm_cem", | |
"glm_cem_startp", | |
"margstd_boot", | |
"margstd_delta", | |
"legacy"), |
and
Lines 223 to 234 in f48f0a0
if(link == "log") | |
possible_approaches <- as.character(as.list( | |
args(risks::riskratio))$approach)[-1] | |
else | |
possible_approaches <- as.character(as.list( | |
args(risks::riskdiff))$approach)[-1] | |
if(!(approach[1] %in% possible_approaches)) | |
stop(paste0( | |
"Approach '", approach[1], "' is not implemented. ", | |
"Available are: ", | |
paste(possible_approaches, sep = ", ", collapse = ", "), | |
".")) |
#TODO I think duplicate is only valid for RRs? If so we should add a check | ||
if(approach[1] == "duplicate") { | ||
#TODO I think this is supposed to be assigning to the object "fit" or did I | ||
# lose something in the refactor? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 236 to 237 in f48f0a0
fit <- switch( | |
EXPR = approach[1], |
if(approach[1] == "glm_cem") { | ||
#TODO why not just require logbin and addreg as imports? They seem pretty | ||
# fundamental to key functions in this package. Then these error messages could | ||
# be removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally making way through the code base. I'll open the PR and do this iteratively, so that you can review changes per file as we go.