It is one of those times when I am really excited, and somewhat obsessed, with the research project I am currently working on. It is an old idea that, after going around my head for many years, seems to be coming close to fruition. That is not the reason I am writing, though.

Part of that research project involves heavy use of the Bartlett relations (or Bartlett identities). The Bartlett relations (see Bartlett 1953, sec. 2) are at the core of much of statistical theory and practice. They describe how the null expectations of products of log-likelihood derivatives are linked to each other.

For example, you might have heard or been taught that the variance of the maximum likelihood estimator (MLE) can be estimated using the inverse of the expected value of the negative second derivative of the log-likelihood, evaluated at the maximum likelihood estimates. This is simply due to the so-called second Bartlett identity, according to which that expectation is equal to the variance of the first derivative of the log-likelihood, whose inverse is, in turn, the variance to use based on standard results about the limiting distribution of the MLE.
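As a quick illustration of that recipe (the choice of an exponential model with rate $\theta$ here is purely for the sake of a simple example, and has nothing to do with the project), the expected information from $n$ independent observations is $n / \theta^2$, so the estimated standard error of the MLE $\hat\theta = 1 / \bar{y}$ is $\hat\theta / \sqrt{n}$:

## Illustration only: exponential sample with rate theta, where
## log f(y; theta) = log(theta) - theta * y, the expected information from
## n observations is n / theta^2, and the MLE is 1 / mean(y)
set.seed(1)
n <- 500
theta <- 2
y <- rexp(n, rate = theta)
theta_hat <- 1 / mean(y)
## variance estimate = inverse of the estimated expected information
c(estimate = theta_hat, std_error = sqrt(theta_hat^2 / n))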

Let’s introduce some notation. Suppose that $M_\theta$ is a continuous family of distributions indexed by a parameter $\theta$, and write $\log f(y; \theta)$ for the log-likelihood about $\theta$. Then, assuming that we can interchange integration over the sample space and differentiation with respect to the parameter (and a few other assumptions that are not essential to report here), taking derivatives on both sides of $\int f(y; \theta) \, dy = 1$ we obtain that
$$E_{M_\theta}\left(\frac{d}{d\theta} \log f(Y; \theta)\right) = 0 \,,$$
where $E_{M_\theta}$ denotes expectation with respect to $M_\theta$. This is the first Bartlett identity. Taking derivatives once more gives
$$E_{M_\theta}\left(\frac{d^2}{d\theta^2} \log f(Y; \theta)\right) + E_{M_\theta}\left\{\left(\frac{d}{d\theta} \log f(Y; \theta)\right)^2\right\} = 0 \,.$$
This is the second Bartlett identity and is exactly what justifies the use of the inverse of the expected value of the negative second derivative of the log-likelihood at the maximum likelihood estimates as an estimate of the variance of the MLE. We can keep differentiating to get higher-order Bartlett relations.
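As a quick numerical sanity check (again using the exponential model with rate $\theta$, purely as an illustrative assumption), both expectations above are easy to approximate by Monte Carlo:

## Monte Carlo check of the first two Bartlett identities for an
## exponential model with rate theta (illustrative choice only)
set.seed(123)
theta <- 2
y <- rexp(1e6, rate = theta)
## first and second derivatives of log f(y; theta) = log(theta) - theta * y
score <- 1 / theta - y
second_derivative <- rep(-1 / theta^2, length(y))
mean(score)                               ## approximately 0 (first identity)
mean(second_derivative) + mean(score^2)   ## approximately 0 (second identity)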

Denote
$$\nu_{rst\cdots} = E_{M_\theta}\left(\frac{d^r}{d\theta^r} \log f(Y; \theta) \, \frac{d^s}{d\theta^s} \log f(Y; \theta) \, \frac{d^t}{d\theta^t} \log f(Y; \theta) \cdots\right)\,, \quad r, s, t, \ldots \in \{1, 2, \ldots\}\,.$$
Then, the first Bartlett identity is $\nu_1 = 0$ and the second is $\nu_{11} + \nu_2 = 0$. You may be wondering now whether there is any general rule that allows computing higher-order Bartlett relations quickly. The answer is yes! Skovgaard (1986) showed that the derivative of any expectation of products of log-likelihood derivatives is simply the sum of all moments that result by incrementing each of the available indices by one, in turn, and then completing the set of available indices with a 1. For example,
$$\frac{d}{d\theta} \nu_{11} = \nu_{12} + \nu_{21} + \nu_{111} = 2\nu_{12} + \nu_{111}\,.$$
We can now compute Bartlett relations of arbitrary order! But the expressions become cumbersome quickly, and collecting terms soon turns into a daunting exercise (remember, I need to go to quite high orders for this project I am working on). There are books with those relations, but I wanted something that I can reproduce on my laptop and also extend.
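For instance, applying the rule to both sides of the second identity (the derivative of zero is zero) immediately gives the third Bartlett relation:
$$\frac{d}{d\theta}\left(\nu_{11} + \nu_2\right) = \left(2\nu_{12} + \nu_{111}\right) + \left(\nu_{12} + \nu_3\right) = \nu_{111} + 3\nu_{12} + \nu_3 = 0\,.$$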

So, I wrote an R function (differentiate_moment) that implements the differentiation rule I just mentioned. It takes as input the vector of indices and applies Skovgaard’s theorem to return a list with the indices of each term in the sum that forms the derivative. The following code chunk illustrates it.

differentiate_moment <- function(index = 1) {
    ## each term of the derivative increments one of the available indices by
    ## one; a final term appends a 1 to the set of indices
    n_terms <- length(index) + 1
    terms <- as.list(rep(NA, n_terms))
    for (j in seq.int(n_terms - 1)) {
        current_index <- index
        current_index[j] <- index[j] + 1
        terms[[j]] <- current_index
    }
    terms[[n_terms]] <- c(index, 1)
    ## sort each term's indices so that equivalent terms can later be collected
    lapply(terms, function(x) sort(x))
}

(der <- differentiate_moment(c(1,1)))
## [[1]]
## [1] 1 2
## 
## [[2]]
## [1] 1 2
## 
## [[3]]
## [1] 1 1 1
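The same function handles any set of indices. For instance, as the rule dictates, differentiating $\nu_2$ gives $\nu_3$ (incrementing the single available index) plus $\nu_{12}$ (appending a 1):

differentiate_moment(2)
## [[1]]
## [1] 3
## 
## [[2]]
## [1] 1 2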

I also wrote a function that gets the derivatives ready for input in a LaTeX-capable document.

prettify <- function(terms, equals = NA) {
    ## collapse each index vector into a string and count duplicate terms
    terms <- lapply(terms, paste, collapse = "")
    tab <- table(unlist(terms))
    freq <- as.character(unname(tab))
    ## drop multiplicities equal to one from the displayed coefficients
    freq[freq == "1"] <- ""
    str <- paste(paste0(freq, "\\nu_{", names(tab), "}"), collapse = " + ")
    if (is.na(equals)) {
        paste("$$", str, "$$", sep = "")
    }
    else {
        paste("$$", paste(str, "=", equals), "$$", sep = "")
    }
}
cat(prettify(der))

$$\nu_{111} + 2\nu_{12}$$
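And, since the first Bartlett identity states that $\nu_1 = 0$ for every $\theta$, differentiating it once and equating the result to zero recovers the second identity:

cat(prettify(differentiate_moment(1), equals = 0))

$$\nu_{11} + \nu_{2} = 0$$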

Now that we have the means to differentiate null moments, the Bartlett relations can be computed in an elegant way using recursion: each identity equates an expression to zero, so differentiating it delivers the next one. The code chunk below implements this recursion.

bartlett_identity <- function(order = 1, index_list = list(1)) {
    ## the first identity involves only nu_1, i.e. the index list list(1)
    if (order == 1) {
        return(index_list)
    }
    ## differentiate every term of the current identity and recurse
    out <- lapply(index_list, differentiate_moment)
    if (order > 1) {
        bartlett_identity(order = order - 1,
                          index_list = unlist(out, recursive = FALSE))
    }
    else {
        unlist(out, recursive = FALSE)
    }
}

With bartlett_identity we can now compute all Bartlett relations up to a pre-specified order. The chunk of code below does it up to order 6:

for (o in seq.int(6))
    cat(prettify(bartlett_identity(order = o), equals = 0), "\n")

$$\nu_1 = 0$$
$$\nu_{11} + \nu_2 = 0$$
$$\nu_{111} + 3\nu_{12} + \nu_3 = 0$$
$$\nu_{1111} + 6\nu_{112} + 4\nu_{13} + 3\nu_{22} + \nu_4 = 0$$
$$\nu_{11111} + 10\nu_{1112} + 10\nu_{113} + 15\nu_{122} + 5\nu_{14} + 10\nu_{23} + \nu_5 = 0$$
$$\nu_{111111} + 15\nu_{11112} + 20\nu_{1113} + 45\nu_{1122} + 15\nu_{114} + 60\nu_{123} + 6\nu_{15} + 15\nu_{222} + 15\nu_{24} + 10\nu_{33} + \nu_6 = 0$$

I am sure there are faster and more advanced ways of deriving a Bartlett relation of a given order (see, for example, the excellent book by McCullagh 1987), but the above was just neat to develop and… fun! The code above can also be further optimised, for example by grouping identical terms already at the level of differentiate_moment; a rough sketch of that idea follows below.
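Here is a minimal sketch of that grouping idea, assuming single-digit derivative orders (which covers everything above). The helper differentiate_grouped is hypothetical and not part of the code above; it carries each distinct index set together with its multiplicity, so duplicate terms are collapsed after every differentiation step.

## Sketch only (hypothetical helper): index sets are encoded as strings of
## single-digit derivative orders, with their multiplicities as values
differentiate_grouped <- function(terms) {
    out <- list()
    for (k in seq_along(terms)) {
        index <- as.integer(strsplit(names(terms)[k], "")[[1]])
        for (new_index in differentiate_moment(index)) {
            key <- paste(new_index, collapse = "")
            ## accumulate the multiplicity of this index set
            out[[key]] <- (if (is.null(out[[key]])) 0 else out[[key]]) + terms[[k]]
        }
    }
    unlist(out)
}
differentiate_grouped(c("1" = 1))                         ## nu_11 + nu_2
differentiate_grouped(differentiate_grouped(c("1" = 1)))  ## nu_111 + 3 nu_12 + nu_3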

References

Bartlett, M. S. 1953. “Approximate Confidence Intervals. II. More Than One Unknown Parameter.” Biometrika 40 (3/4): 306–17. http://www.jstor.org/stable/2333349.
McCullagh, P. 1987. Tensor Methods in Statistics. London: Chapman & Hall.
Skovgaard, Ib. 1986. “A Note on the Differentiation of Cumulants of Log Likelihood Derivatives.” International Statistical Review 54 (1): 29–32. http://www.jstor.org/stable/1403256.