Bibliography
This section provides a curated list of a few landmark papers and books on statistical modelling, covering topics in the current set of notes, and beyond (extra topics include missing data, EM algorithms, mixture models, learning under sparsity, causal inference, graphical models). Of course, given the breadth of the field and the speed at which it it developing, any list like the one below can hardly provide fair coverage, and will be missing many old and recent core developments. Its purpose it to be a helpful starting point for engaging more with key ideas and developments in statistical modelling.
Akaike, H. (1973). Information theory and an extension of the maximum
likelihood principle. In B. N. Petrov & F. Czáki (Eds.), Second
international symposium on information theory (pp. 267–281).
Akademiai Kiadó. https://doi.org/10.1007/978-1-4612-0919-5_38
Albert, J. H. (2007). Bayesian computation with r.
Springer-Verlag. https://doi.org/10.1007/978-0-387-92298-0
Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and
polychotomous response data. Journal of the American Statistical
Association, 88, 669–679. https://doi.org/10.2307/2290350
Best, N., & Thomas, A. (2000). Bayesian graphical models and
software for GLMs. In D. K. Dey, S. K. Ghosh, & B. K.
Mallick (Eds.), Generalized linear models: A Bayesian
perspective (pp. 387–406). Marcel Dekker. https://doi.org/10.1201/9781482293456
Breslow, N. E., & Clayton, D. G. (1993). Appproximate inference in
generalised linear mixed models. Journal of the American Statistical
Association, 88, 9–25. https://doi.org/10.2307/2290687
Burnham, K. P., & Anderson, D. R. (2002). Model selection and
multi-model inference: A practical information theoretic approach
(Second). Springer. https://doi.org/10.1007/978-1-4757-2917-7
Candès, E., & Tao, T. (2007). The Dantzig selector:
Statistical estimation when p
is much larger than n (with
discussion). Annals of Statistics, 35, 2313–2404. https://doi.org/10.1214/009053606000001523
Claeskens, G., & Hjort, N. L. (2008). Model selection and model
averaging. Cambridge University Press. https://doi.org/10.1017/CBO9780511790485
Cowell, R. G., Dawid, A. P., Lauritzen, S. L., & Spiegelhelter, D.
J. (1999). Probabilistic networks and expert systems.
Springer-Verlag. https://doi.org/10.1007/b97670
Crowder, M. J., & Hand, D. J. (1990). Analysis of repeated
measures. Chapman; Hall/CRC. https://doi.org/10.1201/9781315137421
Davison, A. C. (2003). Statistical models. Cambridge University
Press. https://doi.org/10.1017/CBO9780511815850
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum
likelihood from incomplete data via the EM algorithm (with
discussion). Journal of the Royal Statistical Society Series B,
39, 1–38. https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
Diggle, P. J., Heagerty, P., Liang, K.-Y., & Zeger, S. (2002).
Analysis of longitudinal data (2nd ed.). Oxford University
Press. https://global.oup.com/academic/product/analysis-of-longitudinal-data-9780199676750
Draper, D. (1995). Assessment and propagation of model uncertainty (with
discussion). Journal of the Royal Statistical Society Series B,
57, 45–97. https://doi.org/10.1111/j.2517-6161.1995.tb02015.x
Efron, B. (1975). Defining the curvature of a statistical problem (with
applications to second order efficiency). The Annals of
Statistics, 3(6), 1189–1242. https://doi.org/10.1214/aos/1176343282
Fahrmeir, L., Kneib, T., Lang, S., & Marx, B. (Eds.). (2013).
Regression: Models, methods and applications.
Springer. https://doi.org/10.1007/978-3-642-34333-9
Fan, J., & Li, R. (2001). Variable selection via nonconcave
penalized likelihood and its oracle properties. Journal of the
American Statistical Association, 96, 1348–1360.
Fan, J., & Lv, J. (2008). Sure independence screening for ultrahigh
dimensional feature space (with Discussion). Journal of
the Royal Statistical Society Series B, 70, 849–911. https://doi.org/10.1198/016214501753382273
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A.,
& Rubin, D. B. (2004). Bayesian data analysis (3rd ed.).
Chapman; Hall/CRC. https://doi.org/10.1201/b16018
Gelman, A., Hill, J., & Vehtari, A. (2020). Regression and other
stories. Cambridge University Press. https://doi.org/10.1017/9781139161879
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (Eds.). (1996).
Markov chain monte carlo in practice. Chapman & Hall.
Green, P. J., Hjørt, N. L., & Richardson, S. (Eds.). (2003).
Highly structured stochastic systems. Chapman & Hall/CRC.
10.1201/b14835
Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical
learning with sparsity: The Lasso and
generalizations. Chapman; Hall/CRC. https://doi.org/10.1201/b18401
Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T.
(1999). Bayesian model averaging: A tutorial (with discussion).
Statistical Science, 14, 382–417. https://doi.org/10.1214/ss/1009212519
Jamshidian, M., & Jennrich, R. I. (1997). Acceleration of the
EM algorithm by using quasi-Newton methods.
Journal of the Royal Statistical Society Series B, 59,
569–587. https://doi.org/10.1111/1467-9868.00083
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of
the American Statistical Association, 90, 773–795. https://doi.org/10.1080/01621459.1995.10476572
Liang, K., & Zeger, S. L. (1986). Longitudinal data analysis using
generalized linear models. Biometrika, 73(1), 13–22.
https://doi.org/10.1093/biomet/73.1.13
Linhart, H., & Zucchini, W. (1986). Model selection. Wiley.
https://doi.org/10.1007/978-3-642-04898-2_373
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis
with missing data (2nd ed.). Wiley. https://doi.org/10.1002/9781119013563
Marin, J.-M., & Robert, C. P. (2007). Bayesian core:
A practical approach to computational bayesian
statistics. Springer-Verlag. https://doi.org/10.1007/978-0-387-38983-7
McCullagh, P. (2002). What is a statistical model? The Annals of
Statistics, 30(5), 1225–1310. https://doi.org/10.1214/aos/1035844977
McCullagh, P., & Nelder, J. A. (1989). Generalized linear
models (2nd ed.). Chapman & Hall. https://doi.org/10.1201/9780203753736
McCulloch, C. E., Searle, S. R., & Neuhaus, J. M. (2008).
Generalized, linear, and mixed models (2nd ed.). Wiley. https://doi.org/10.1002/0471722073
McLachlan, G. J., & Krishnan, T. (2008). The EM
algorithm and extensions (2nd ed.). Wiley. https://doi.org/10.1002/9780470191613
McQuarrie, A. D. R., & Tsai, C.-L. (1998). Regression and time
series model selection. World Scientific. https://doi.org/10.1142/3573
Meng, X.-L., & van Dyk, D. (1997). The EM algorithm —
an old folk-song sung to a fast new tune (with discussion). Journal
of the Royal Statistical Society Series B, 59, 511–567. https://doi.org/10.1111/1467-9868.00082
Nelder, J. A., Lee, Y., & Pawitan, Y. (2017). Generalized linear
models with random effects: A unified approach via h-likelihood (2nd ed.).
Chapman; Hall/CRC. https://doi.org/10.1201/9781315119953
Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear
models. Journal of the Royal Statistical Society: Series A
(General), 135(3), 370–384. https://doi.org/10.2307/2344614
O’Hagan, A., & Forster, J. J. (2004). Kendall’s advanced theory
of statistics. Volume 2B:
Bayesian inference (Second). Hodder Arnold. https://www.wiley.com/en-us/Kendall%27s+Advanced+Theory+of+Statistic+2B-p-9780470685693
Oakes, D. (1999). Direct calculation of the information matrix via the
EM algorithm. Journal of the Royal Statistical Society
Series B, 61, 479–482. https://doi.org/10.1111/1467-9868.00188
Ogden, H. (2021). On the error in laplace approximations of
high-dimensional integrals. Stat, 10(1), e380. https://doi.org/10.1002/sta4.380
Ogden, H. E. (2017). On asymptotic validity of naive inference with an
approximate likelihood. Biometrika, 104(1), 153–164.
https://doi.org/10.1093/biomet/asx002
Peters, J., Janzing, D., & Schlkopf, B. (2017). Elements of
causal inference: Foundations and learning algorithms. The MIT
Press. https://mitpress.mit.edu/books/elements-causal-inference
Pinheiro, J., & Bates, D. M. (2002). Mixed effects models in
S and S-PLUS. New York:Springer-Verlag.
https://doi.org/10.1007/b98882
Raftery, A. E., Madigan, D., & Hoeting, J. A. (1997). Bayesian model
averaging for linear regression models. Journal of the American
Statistical Association, 92, 179–191. https://doi.org/10.1080/01621459.1997.10473615
Richardson, S., & Green, P. J. (1997). On bayesian analysis of
mixtures with an unknown number of components (with discussion).
Journal of the Royal Statistical Society Series B, 59,
731–792. https://doi.org/10.1111/1467-9868.00095
Rissanen, J. (1987). Stochastic complexity (with discussion).
Journal of the Royal Statistical Society, Series B,
49, 223–239. https://doi.org/10.1111/j.2517-6161.1987.tb01694.x
Schwartz, G. (1978). Estimating the dimension of a model. Annals of
Statistics, 6, 461–464. https://doi.org/10.1214/aos/1176344136
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Linde, A. van
der. (2002). Bayesian measures of model complexity and fit (with
discussion). Journal of the Royal Statistical Society, Series
B, 64, 583–639. https://doi.org/10.1111/1467-9868.00353
Tanner, M. A. (1996). Tools for statistical inference: Methods for
the exploration of posterior distributions and likelihood functions
(Third). Springer. https://doi.org/10.1007/978-1-4612-4024-2
Venables, W., & Ripley, B. D. (2002). Modern applied statistics
with S (4th ed.). Springer-Verlag. https://doi.org/10.1007/978-0-387-21706-2
Wedderburn, R. W. M. (1974). Quasi-Likelihood Functions,
Generalized Linear Models, and the Gauss-Newton
Method. Biometrika, 61(3), 439. https://doi.org/10.2307/2334725
Wood, S. N. (2017). Generalized additive models: An introduction
with r (2nd ed.). Chapman; Hall/CRC. https://doi.org/10.1201/9781315370279