The aim of mut2
is to provide functions that make mutation models
reversible and evaluate and exemplify the resulting mutation models
To get the last version, install from GitHub as follows:
# First install devtools if needed
if(!require(devtools)) install.packages("devtools")
# Install mut2 from GitHub
devtools::install_github("thoree/mut2")
We first load some libraries:
library(mut2, quietly = T)
library(forrel, quietly = T)
library(pedmut, quietly = T)
library(norSTR, quietly = T)
A simulation experiment was performed to check if the recommendation of
PR transformation could be verified. We used the function
mut::distLRR()
. Allele frequencies and mutation parameters were
simulated as explained in the documentation with specific values shown
below.
nA = 5 #No of alleles. Allele freqs simulated uniformly
verb = F
log = T #log2 transformation of LRR
nsim = 1000
rate = c(0.0001, 0.01) #Value simulated uniformly on this interval, also below
rate2 = 10^{c(-7,-5)}
range = c(0.05,0.5)
mutMod = "equal"
seed = 1729
BA = distLRR(nAls = nA, verbose = verb, method = "BA", mutMod = mutMod,
log2 = log, nsim = nsim, rate = rate, seed = seed)
MH = distLRR(nAls = nA, verbose = verb, method = "MH", mutMod = mutMod,
log2 = log, nsim = nsim, rate = rate, seed = seed)
PR = distLRR(nAls = nA, verbose = verb, method = "PR", mutMod = mutMod,
log2 = log, nsim = nsim, rate = rate, seed = seed)
tab1 = rbind(BA,MH, PR)
mutMod = "stepwise"
BA = distLRR(nAls = nA, verbose = verb, method = "BA", mutMod = mutMod,
log2 = log, nsim = nsim, rate = rate, rate2 = rate2,
range = range, seed = seed)
MH = distLRR(nAls = nA, verbose = verb, method = "MH", mutMod = mutMod,
log2 = log, nsim = nsim, rate = rate, rate2 = rate2,
range = range, seed = seed)
PR = distLRR(nAls = nA, verbose = verb, method = "PR", mutMod = mutMod,
log2 = log, nsim = nsim, rate = rate, rate2 = rate2,
range = range, seed = seed)
tab2 = rbind(BA,MH, PR)
tab = rbind(tab1, tab2)
mutMod | transform | mu | sd | min | pMin | max | pMax |
---|---|---|---|---|---|---|---|
equal | BA | 0.0000420 | 0.0109724 | -0.3472051 | 1.52e-05 | 0.3154393 | 2.32e-05 |
equal | MH | 0.0000487 | 0.0118155 | -0.3432213 | 1.53e-05 | 0.3259195 | 2.32e-05 |
equal | PR | 0.0000303 | 0.0093190 | -0.3632738 | 1.30e-05 | 0.2930584 | 2.05e-05 |
stepwise | BA | 0.0000996 | 0.0170487 | -0.7842868 | 1.35e-05 | 0.7072154 | 1.13e-05 |
stepwise | MH | 0.0001064 | 0.0177835 | -0.7603802 | 1.35e-05 | 0.7845368 | 1.13e-05 |
stepwise | PR | 0.0000460 | 0.0113343 | -0.9413081 | 2.00e-06 | 0.5677689 | 5.70e-06 |
Summary statistics for log2(LRR)
THe PR transformations is superior in most respects: The expected log2(LRR), mu, is closest to zero, the standard deviation SD and the max is smallest. However, the minimum values is also smallest and this is the only point disfavoring PR. However, overall, the exoeriment indicates that PR remains the transformation of choice.
We state that, for the expected heterozygosity, “typical values for multi-allelic forensic markers are in the range (0.60, 0.95)”. This is based on the updated version of the database NorwegianFrequencies:
db = norSTR::norwayDB
H = lapply(db, function(x) 1 - sum(x^2))
range(H)
#> [1] 0.6180022 0.9490541
The mutation matrix is seen not to be stationary from
p = c("1" = 0.2, "2" = 0.8)
M = mutationMatrix("equal", alleles = names(p), rate = 0.003, afreq = p)
M
#> 1 2
#> 1 0.997 0.003
#> 2 0.003 0.997
#>
#> Model: Equal
#> Rate: 0.003
#> Frequencies: 0.2, 0.8
#>
#> Bounded: Yes
#> Stationary: No
#> Reversible: No
#> Lumpable: Always
#> Overall rate: 0.003
The `PR’ transformations preserves the expected mutation rate as claimed
MPR = makeReversible(M, method = 'PR')
MPR
#> 1 2
#> 1 0.992500 0.007500
#> 2 0.001875 0.998125
#>
#> Model: Custom
#> Rate: 0.003
#> Frequencies: 0.2, 0.8
#>
#> Bounded: Yes
#> Stationary: Yes
#> Reversible: Yes
#> Lumpable: Always
#> Overall rate: 0.003
The f value 0.0075/0.003 = 2.5 can alternatively be found from
fratio(M, MPR)
#> [1] 2.5
The mutation rates for the two other transformations are smaller
MMH = makeReversible(M, method = 'MH')
MBA = makeReversible(M, method = 'BA')
c("rateMH2" = attributes(MMH)$rate, "rateBA" = attributes(MBA)$rate)
#> rateMH2 rateBA
#> 0.003 0.003
The last two transformations can be adjusted to have the original mutation rate (output omitted)
MMHstar = adjustRate(MMH, newrate = 0.003)
MBAstar = adjustRate(MBA, newrate = 0.003)
For a SNP marker, the PR-reversible model coincides with the stationary model and therefore we can also find the transformation from
pedmut:::makeStationary(M)
#> 1 2
#> 1 0.992500 0.007500
#> 2 0.001875 0.998125
#>
#> Model: Custom
#> Rate: 0.003
#> Frequencies: 0.2, 0.8
#>
#> Bounded: Yes
#> Stationary: Yes
#> Reversible: Yes
#> Lumpable: Always
#> Overall rate: 0.003
which is an implementation of the PM transformation in Familias (output omitted)
MPM = Familias:::FamiliasLocus(frequencies = p,
allelenames = names(p),
MutationModel = "Equal",
MutationRate = 0.003,
Stabilization = "PM")
We next consider the tables label{tab:comparison1} - label{tab:comparison3} , referred to in Section `Comparing mutation…’.
Table1 = tabfRatio(db = db, rate = 0.001, mutmodel = "equal", relabel = F,
stationary = T, nr = 1)
Table2 = tabfRatio(db = db, rate = 0.001, mutmodel = "onestep", relabel = T,
stationary = F, nr = 2)
Table3 = tabfRatio(db = db, rate = 0.001, rate2 = 0.00001, range = 0.1,
mutmodel = "stepwise", relabel = F, stationary = T, nr = 3)
Regarding “the transformation BA has smallest f for all markers”
table(apply(Table1[,1:3], 1, function(x) (1:3)[which.min(x)]))
Regarding minimum value on diagonal
apply(Table1[,4:6], 2, function(x) min(x))
Regarding “All transformed matrices are bounded”, this is generally true for PR, but not necessarily for the others as adjustment is performed. Below we extract the columns indicating boundedness and show that all are indeed bounded in this case:
foo = tabfRatio(db = db, rate = 0.001, mutmodel = "equal", relabel = F,
stationary = T, nr = 0)
foo = foo[, c("bMH", "bBA")]
table(foo)
Regarding f-values
table(apply(Table2[,1:3], 1, function(x) (1:3)[which.min(x)]))
Regarding minimum diagonal values
apply(Table2[,4:6], 2, function(x) min(x))
Regarding boundedness
foo = tabfRatio(db = db, rate = 0.001, mutmodel = "onestep", relabel = T,
stationary = F, nr = 0)
foo = foo[foo$int == "Y",]
foo = foo[, c("bMH", "bBA")]
table(foo)
foo[foo[,1] == "N",]
int = Table3[Table3$int == "Y",] #integer
f-values and minima on diagonal:
table(apply(int[,1:4], 1, function(x) (1:4)[which.min(x)]))
apply(int[,5:8], 2, function(x) min(x))
Regarding boundedness, here’s a list with at least one of three unbounded:
foo = tabfRatio(db = db, rate = 0.001, rate2 = 0.00001, range = 0.1,
mutmodel = "stepwise", relabel = F, stationary = T, nr = 0)
foo = foo[foo$int == "Y",]
foo = foo[, c("bMH", "bBA", "bDA")]
foo[foo[,1] == "N" | foo[,2] == "N" | foo[,3] == "N" ,]
Code for Figure:
M = lapply(db, function(x){
mat = mutationMatrix("equal",
alleles = names(x),
rate = 0.01,
rate2 = NULL,
range = NULL,
afreq = x)
mat
})
R = lapply(M, function(x){
mat = findReversible(x, method = "MH")
mat = adjustReversible(x, mat)
mat
})
#
db = getFreqDatabase(KLINK::halfsib[[1]])
# Increase nsim
TableLRR = tabLRR(db = db, nsim = 10, seed = NULL, maks = FALSE)
saveRDS(TableLRR, "TableLRR.rds")
navn = c("MH|M", "MH|R", "BA|M", "BA|R", "PR|M", "PR|R","MH|M", "MH|R", "BA|M", "BA|R", "PR|M", "PR|R")
q99 = apply(log2(TableLRR), 2, function(x) quantile(x, probs = 0.99))
q01 = apply(log2(TableLRR) , 2, function(x) quantile(x, probs = 0.01))
pdf("C:\\Users\\theg\\OneDrive - Norwegian University of Life Sciences\\manuscripts\\mutation\\v6\\figures\\TableLRR.pdf", width = 6, height = 6)
boxplot(log2(TableLRR), horizontal = F, names = navn, ylab = "log2(D)")
abline(v = 6.5)
text(1:12, q99, "-", col = "red", cex = 2)
text(1:12, q01, "-", col = "red", cex = 2)
dev.off()