# 5 Lab 1 (with solution)

## 5.1 Exercise

Suppose
$$Y_{im} \stackrel{\text{ind}}{\sim} \mathop{\mathrm{Poisson}}\left( \mu^*(x_{im}) \right) \quad (i = 1, \ldots, n;\; m = 1, \ldots, M)\,,$$
where
$$\begin{aligned} \mu^*(x_{im}) & = 8 \exp \left(w(x_{im})\right) \,, \\ x_{im} & = x_i = -10 + 20 \, \frac{i-1}{n-1} \,, \\ w(x) & = 0.001 \left( 100 + x + x^2 + x^3 \right) \,. \end{aligned}$$

Consider the following simulation study. For $$b = 1, \ldots, B$$:

• Generate $Y_{im} \stackrel{\text{ind}}{\sim}\mathop{\mathrm{Poisson}}(\mu^*(x_{im})) \quad (i=1,\ldots,n;\; m = 1, \ldots, M)$

• Compute the AIC and BIC of the candidate models $Y_{im} \stackrel{\text{ind}}{\sim}\mathop{\mathrm{Poisson}}(\mu(x_{im})), \quad \mu(x_{im}) = \exp \left(\sum_{j=1}^p \beta_j x^{j-1}_{im} \right) \,,$ for $$p = 1, \ldots, p_{\max}$$.

You can carry out the simulation study for $$n = 200$$, $$M = 3$$, $$p_{\max} = 20$$, and $$B = 100$$ with the following code:

```r
B <- 100
n <- 200
M <- 3
pmax <- 20

w <- function(x) {
  0.001 * (100 + x + x^2 + x^3)
}

mu <- function(x) {
  8 * exp(w(x))
}

## Covariates
x <- rep(seq(from = -10, to = 10, length.out = n), each = M)

## Objects to hold AICs and BICs
aics <- bics <- matrix(NA, nrow = B, ncol = pmax)

set.seed(20240416)
for (b in 1:B) {
  ## Simulate responses
  y <- rpois(n = M * n, lambda = mu(x))
  ## Fit intercept-only model and compute AIC, BIC
  mod <- glm(y ~ 1, family = poisson)
  aics[b, 1] <- AIC(mod)
  bics[b, 1] <- BIC(mod)
  ## Fit remaining models and compute AIC, BIC
  for (p in 2:pmax) {
    modp <- glm(y ~ poly(x, p - 1), family = poisson)
    aics[b, p] <- AIC(modp)
    bics[b, p] <- BIC(modp)
  }
}
```

The number of times each $$p$$ has been selected by AIC and BIC is

```r
AICorder <- apply(aics, 1, which.min)
BICorder <- apply(bics, 1, which.min)
table(AICorder)
```

```
AICorder
 4  5  6  7  8  9 10 13 
78 10  3  5  1  1  1  1 
```

```r
table(BICorder)
```

```
BICorder
 4  5 
98  2 
```

The AIC and BIC values for each sample, as a function of $$p$$, can be plotted with

```r
par(mfrow = c(1, 2))
matplot(x = 1:pmax, t(aics), xlab = "p", ylab = "AIC", type = "l", lty = 1, col = gray(0.9))
matplot(x = 1:pmax, t(bics), xlab = "p", ylab = "BIC", type = "l", lty = 1, col = gray(0.9))
```
1. Modify the code above to investigate the performance of AIC and BIC as model selection criteria for $$n \in \{25, 50, 100, 1000\}$$.

[Suggestion: write a function that carries out the simulation for user-supplied $$n$$, $$M$$, $$p_{\max}$$, $$B$$, and the function $$w(\cdot)$$.]

2. Use $w(x) = \frac{1.2}{1 + \exp(-x)}$ in the model we simulate from. How do AIC and BIC perform when the candidate models do not include the simulation model?

3. What information criterion would you use to estimate $$\Delta$$ in (1.2) and why? Use the simulation results to obtain a simulation-based estimate of $$\Delta$$ as a function of $$p$$, for $$n \in \{25, 50, 100, 1000\}$$, and each choice of $$w(\cdot)$$.

Then, estimate $$\Delta$$ directly by its definition using out-of-sample log-likelihood.

[Hint: for the out-of-sample log-likelihood you can do `sum(dpois(y_plus, mu_hat, log = TRUE))`, where `mu_hat` are the fitted means from a training sample and `y_plus` is an independent sample of the same size and at exactly the same $$x$$ values as the training sample.]
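To make the hint concrete, here is a minimal sketch for a single fitted model, reusing `mu()`, `x`, `n`, and `M` from the code above; `mod` stands for any one of the fitted candidate models:

```r
## Sketch of the hint: out-of-sample log-likelihood for a fitted Poisson
## model `mod`, reusing mu(), x, n, and M from the code above
mu_hat <- fitted(mod)                       # fitted means from the training sample
y_plus <- rpois(n = M * n, lambda = mu(x))  # independent sample at the same x values
sum(dpois(y_plus, mu_hat, log = TRUE))      # out-of-sample log-likelihood
```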

## 5.2 Solution

Since we are planning to explore various experimental conditions for the simulation study, it is a good idea to write a general function that allows us to vary $$n$$, $$M$$, $$p_{\max}$$, $$B$$, and the function $$w(\cdot)$$.

```r
run_simulation <- function(n, M = 3, pmax = 20, B = 100,
                           w = function(x) 0.001 * (100 + x + x^2 + x^3)) {
  mu <- function(x) {
    8 * exp(w(x))
  }
  ## Covariates
  x <- rep(seq(from = -10, to = 10, length.out = n), each = M)
  ## Objects to hold AICs and BICs
  aics <- bics <- matrix(NA, nrow = B, ncol = pmax)
  for (b in 1:B) {
    ## Simulate responses
    y <- rpois(n = M * n, lambda = mu(x))
    ## Fit intercept-only model and compute AIC, BIC
    mod <- glm(y ~ 1, family = poisson)
    aics[b, 1] <- AIC(mod)
    bics[b, 1] <- BIC(mod)
    ## Fit remaining models and compute AIC, BIC
    for (p in 2:pmax) {
      modp <- glm(y ~ poly(x, p - 1), family = poisson)
      aics[b, p] <- AIC(modp)
      bics[b, p] <- BIC(modp)
    }
  }
  list(AIC = aics, BIC = bics)
}
```

We can also write functions that take as input the output of run_simulation() and summarize the results, as is done in the code provided.

```r
## Returns the number of times each p has been selected by the
## criterion whose values are in ics
get_selection_counts <- function(ics) {
  ic_order <- apply(ics, 1, which.min)
  table(ic_order)
}

## Plots the information criterion values in ics for each sample as a
## function of p
plot_ic <- function(ics, ylab = "IC", main = NULL) {
  matplot(x = 1:ncol(ics), t(ics), xlab = "p", ylab = ylab, main = main,
          type = "l", lty = 1, col = gray(0.9))
}
```

So, for the simulation study with $$n = 200$$, $$M = 3$$, $$p_{\max} = 20$$, and $$B = 100$$, we can now simply do (using the same seed)

```r
set.seed(20240416)

res <- run_simulation(n = 200, M = 3, pmax = 20, B = 100,
                      w = function(x) 0.001 * (100 + x + x^2 + x^3))
```

Then, for example, the results for AIC are

```r
get_selection_counts(res$AIC)
```

```
ic_order
 4  5  6  7  8  9 10 13 
78 10  3  5  1  1  1  1 
```

```r
plot_ic(res$AIC, ylab = "AIC")
```

1. The code chunk below will run the required simulations and store the results in the list `res_true`.

```r
set.seed(1)

ns <- c(25, 50, 100, 1000)
res_true <- vector("list", length(ns))
for (j in seq_along(ns)) {
  res_true[[j]] <- run_simulation(n = ns[j], M = 3, pmax = 20, B = 100,
                                  w = function(x) 0.001 * (100 + x + x^2 + x^3))
}
names(res_true) <- paste("n =", ns)
```

Let’s investigate the behaviour of AIC.

```r
sapply(res_true, function(x) get_selection_counts(x$AIC))
```

```
$`n = 25`
ic_order
 4  5  6  7  8  9 10 18 
67 11 11  2  5  1  1  2 

$`n = 50`
ic_order
 4  5  6  7  8  9 11 13 
70  7 10  5  3  1  3  1 

$`n = 100`
ic_order
 4  5  6  7  8  9 18 
72 11  8  5  2  1  1 

$`n = 1000`
ic_order
 4  5  6  7  8  9 10 11 
69 11 10  4  2  2  1  1 
```

AIC behaves similarly for all $$n$$. In all cases, the correct (cubic) model is preferred most of the time, but the probability of it being selected does not tend to one as $$n$$ grows. This is also apparent in the AIC vs $$p$$ plots.

```r
par(mfrow = c(2, 2))
for (j in seq_along(ns)) {
  plot_ic(res_true[[j]]$AIC, ylab = "AIC", main = paste("n =", ns[j]))
  abline(v = 4, lty = 3)
}
```
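The lack of consistency can also be checked directly from the simulation output; a quick sketch using the objects above:

```r
## Proportion of replications in which AIC selects the correct model (p = 4)
sapply(res_true, function(x) mean(apply(x$AIC, 1, which.min) == 4))
```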

Let’s explore the behaviour of BIC.

```r
sapply(res_true, function(x) get_selection_counts(x$BIC))
```

```
$`n = 25`
ic_order
 2  3  4  5  6 
 1  1 93  4  1 

$`n = 50`
ic_order
 4  5 
99  1 

$`n = 100`
ic_order
 4  5 
97  3 

$`n = 1000`
ic_order
  4 
100 
```

In this example, the set of candidate models includes the model we simulate from, and BIC selects the correct model more and more often as $$n$$ increases. This is also apparent in the BIC vs $$p$$ plots.

```r
par(mfrow = c(2, 2))
for (j in seq_along(ns)) {
  plot_ic(res_true[[j]]$BIC, ylab = "BIC", main = paste("n =", ns[j]))
  abline(v = 4, lty = 3)
}
```
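The apparent consistency of BIC can be quantified in the same way; a quick sketch using the objects above:

```r
## Proportion of replications in which BIC selects the correct model (p = 4)
sapply(res_true, function(x) mean(apply(x$BIC, 1, which.min) == 4))
```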
2. The code chunk below will run the required simulations and store the results in the list `res_other`.
```r
set.seed(1)

ns <- c(25, 50, 100, 1000)
res_other <- vector("list", length(ns))
for (j in seq_along(ns)) {
  res_other[[j]] <- run_simulation(n = ns[j], M = 3, pmax = 20, B = 100,
                                   w = function(x) 1.2 / (1 + exp(-x)))
}
names(res_other) <- paste("n =", ns)
```

Let’s investigate the behaviour of AIC.

```r
sapply(res_other, function(x) get_selection_counts(x$AIC))
```

```
$`n = 25`
ic_order
 4  5  6  7  8  9 10 11 12 13 14 15 16 20 
 9  2 33  8 19  5  7  3  4  2  2  4  1  1 

$`n = 50`
ic_order
 4  6  7  8  9 10 11 12 13 14 15 16 17 18 20 
 1 31  5 22 10  8  5  4  1  3  3  3  1  2  1 

$`n = 100`
ic_order
 6  7  8  9 10 11 12 13 16 17 18 
15  3 33 10 28  2  3  1  1  2  2 

$`n = 1000`
ic_order
10 11 12 13 14 15 16 17 18 20 
13  5 33  6 21  5  8  1  7  1 
```

As $$n$$ increases, AIC tends to select increasingly complex models, which provide better approximations to the true data-generating distribution, which is not a polynomial model. BIC, on the other hand, prefers simpler models than AIC does, although in this case it too tends to select more complex models as $$n$$ increases.

```r
sapply(res_other, function(x) get_selection_counts(x$BIC))
```

```
$`n = 25`
ic_order
 4  5  6  7  8 10 13 
41  4 43  5  5  1  1 

$`n = 50`
ic_order
 4  5  6  7  8  9 10 
18  2 59  9 10  1  1 

$`n = 100`
ic_order
 6  7  8 10 
55  6 35  4 

$`n = 1000`
ic_order
 8 10 11 12 13 
13 76  3  7  1 
```
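The drift towards more complex models can be summarized numerically; for instance, a quick sketch computing the average selected $$p$$ under each criterion:

```r
## Average selected p under AIC and BIC, per n, when no candidate model is correct
sapply(res_other, function(x) {
  c(AIC = mean(apply(x$AIC, 1, which.min)),
    BIC = mean(apply(x$BIC, 1, which.min)))
})
```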
3. We have shown that $\bar\ell(\hat\theta) - p / (n M) = -\frac{1}{2nM} \left\{2 \left(p - \ell(\hat\theta)\right)\right\} = -\frac{1}{2nM} \mathrm{AIC}$ is a bias-corrected estimator of $$\Delta$$, so AIC is the natural criterion to use here.

So, $\tilde{\Delta} = - \frac{1}{2nM} \mathop{\mathrm{E}}(\mathrm{AIC})$ should approach $$\Delta$$ as $$n$$ increases. We can estimate $$\tilde{\Delta}$$ empirically by transforming the AIC values we obtained in the simulations and taking averages for every $$p$$.
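For instance, a minimal sketch using the AIC matrices in `res_true` computed earlier (the name `Delta_tilde` is just for illustration):

```r
## Empirical estimates of Delta-tilde as a function of p, one column per n
Delta_tilde <- sapply(seq_along(ns), function(j) {
  colMeans(-res_true[[j]]$AIC / (2 * ns[j] * M))
})
colnames(Delta_tilde) <- names(res_true)
```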

A direct, simulation-based estimator of $$\Delta$$ can be obtained from its definition, using samples independent of the ones used for estimation. The code chunk below provides a function that returns a simulation-based estimator of $$\Delta$$ as a function of $$p$$.

```r
estimate_Delta <- function(n, M = 3, pmax = 20, B = 100,
                           w = function(x) 0.001 * (100 + x + x^2 + x^3)) {
  mu <- function(x) {
    8 * exp(w(x))
  }
  x <- rep(seq(from = -10, to = 10, length.out = n), each = M)
  ## Object to hold out-of-sample log-likelihoods
  oo_ll <- matrix(NA, nrow = B, ncol = pmax)
  for (b in 1:B) {
    ## Simulate training and out-of-sample responses
    y <- rpois(n = M * n, lambda = mu(x))
    y_plus <- rpois(n = M * n, lambda = mu(x))
    ## Fit intercept-only model and compute out-of-sample log-likelihood
    mod <- glm(y ~ 1, family = poisson)
    oo_ll[b, 1] <- sum(dpois(y_plus, fitted(mod), log = TRUE))
    ## Fit remaining models and compute out-of-sample log-likelihood
    for (p in 2:pmax) {
      modp <- glm(y ~ poly(x, p - 1), family = poisson)
      oo_ll[b, p] <- sum(dpois(y_plus, fitted(modp), log = TRUE))
    }
  }
  colMeans(oo_ll / (n * M))
}
```

The estimates of $$\Delta$$ for $$w(x) = 0.001 \left( 100 + x + x^2 + x^3 \right)$$ and $$w(x) = 1.2/\{1 + \exp(-x)\}$$ are

```r
set.seed(1)

ns <- c(25, 50, 100, 1000)
Delta_true <- Delta_other <- vector("list", length(ns))
for (j in seq_along(ns)) {
  Delta_true[[j]] <- estimate_Delta(n = ns[j], M = 3, pmax = 20, B = 100,
                                    w = function(x) 0.001 * (100 + x + x^2 + x^3))
  Delta_other[[j]] <- estimate_Delta(n = ns[j], M = 3, pmax = 20, B = 100,
                                     w = function(x) 1.2 / (1 + exp(-x)))
}
names(Delta_true) <- names(Delta_other) <- paste("n =", ns)
```

For $$w(x) = 0.001 \left( 100 + x + x^2 + x^3 \right)$$, we get

```r
cols <- hcl.colors(2)
par(mfrow = c(2, 2))
for (j in seq_along(ns)) {
  Delta_t <- -res_true[[j]]$AIC / (2 * ns[j] * M)
  plot_ic(Delta_t, ylab = "-AIC / (2nM)", main = paste("n =", ns[j]))
  lines(1:pmax, colMeans(Delta_t), col = cols[1])
  lines(1:pmax, Delta_true[[j]], col = cols[2])
}
```

and for $$w(x) = 1.2/\{1 + \exp(-x)\}$$, we get

```r
par(mfrow = c(2, 2))
for (j in seq_along(ns)) {
  Delta_t <- -res_other[[j]]$AIC / (2 * ns[j] * M)
  plot_ic(Delta_t, ylab = "-AIC / (2nM)", main = paste("n =", ns[j]))
  lines(1:pmax, colMeans(Delta_t), col = cols[1])
  lines(1:pmax, Delta_other[[j]], col = cols[2])
}
```

The purple piecewise linear functions are the simulation-based estimates of $$\Delta$$ based on AIC, and the yellow piecewise linear functions are the simulation-based estimates of $$\Delta$$ based on its definition. In the current setting, the two simulation-based estimates are almost identical.
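The agreement can also be quantified directly; a quick sketch computing the largest absolute discrepancy between the two estimates for each $$n$$, under $$w(x) = 1.2/\{1 + \exp(-x)\}$$:

```r
## Maximum absolute discrepancy between the AIC-based and definition-based
## estimates of Delta, per n
sapply(seq_along(ns), function(j) {
  Delta_tilde <- colMeans(-res_other[[j]]$AIC / (2 * ns[j] * M))
  max(abs(Delta_tilde - Delta_other[[j]]))
})
```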

The fact that BIC does not estimate $$\Delta$$ is now directly apparent. For example, for $$w(x) = 1.2/\{1 + \exp(-x)\}$$,

```r
par(mfrow = c(2, 2))
for (j in seq_along(ns)) {
  plot_ic(-res_other[[j]]$BIC / (2 * ns[j] * M), ylab = "-BIC / (2nM)",
          main = paste("n =", ns[j]))
  lines(1:pmax, Delta_other[[j]], col = cols[1])
  abline(v = 4, lty = 3)
}
```