class: center, middle, inverse, title-slide

.title[
# Asymptotics, Statistical computing, Clustering, Software
]
.author[
###
Ioannis Kosmidis
Professor in Statistics
ikosmidis.com
ikosmidis
ikosmidis_
]
.institute[
### University of Warwick
]
.date[
###
ACIIS, 11 November 2022
Slides at ikosmidis.com/talks/
]

---

<!-- 11 November 2022 -->

<style type="text/css">
.huge .remark-code { /*Change made here*/
  font-size: 200% !important;
}
.large .remark-code { /*Change made here*/
  font-size: 120% !important;
}
.small .remark-code { /*Change made here*/
  font-size: 70% !important;
}
.tiny .remark-code { /*Change made here*/
  font-size: 50% !important;
}
.tiny-tex { /*Change made here*/
  font-size: 50% !important;
}
.note {
  color: #478EC1;
  font-size: 120% !important;
  font-weight: bold;
}
</style>

## Theory and methods of learning

.note[Penalized- and pseudo-likelihood theory and methods]

bias reduction in M-estimation, variance correction, recovering asymptotic performance of learning tasks in realistic asymptotic settings

---

For `\(A(\theta) = O_p(k)\)`, define

$$
\tilde\theta \longleftarrow \psi(\theta) + A(\theta) = 0
$$

Find `\(A(\theta)\)` such that `\(\tilde\theta\)` has better asymptotic properties than `\(\hat\theta \longleftarrow \psi(\theta) = 0\)`, aiming to improve its finite sample properties

.note[Instances of better asymptotic properties]

.center[
| Property    | `\(\hat\theta\)`         | `\(\tilde\theta\)`      |
|-------------|--------------------------|-------------------------|
| Consistency | inconsistent             | consistent              |
| Mean bias   | `\(O(n^{-1})\)`          | `\(O(n^{-2})\)`         |
| Median bias | `\(1/2 + O(n^{-1/2})\)`  | `\(1/2 + O(n^{-3/2})\)` |
| Variance    | `\(i(\theta) + O(n^{-2})\)` | `\(i(\theta) + O(n^{-3})\)` |
| Existence   | possibly on the boundary | always exists           |
]

Recent works
[Sterzinger and K. (2022)](#bib-sterzingerkosmidis2022) <br>
[K. and Firth (2021)](#bib-kosmidisfirth2021) <br>
[K. and Lunardon (2021)](#bib-kosmidislunardon2021) <br>
[K., Kenne Pagui, and Sartori (2020)](#bib-kosmidiskennepaguisartori2020)

---

## Theory and methods of learning

.note[Penalized- and pseudo-likelihood theory and methods]

bias reduction in M-estimation, variance correction, recovering asymptotic performance of learning tasks in realistic asymptotic settings

.note[Statistical computing and algorithms for regression problems]

regression methods and scalability of algorithms

---

#### Mean and median bias reduction for GLMs in high-dimensional datasets

$$
\beta^{(j+1)} = (X^\top W^{(j)} X)^{-1} X^\top W^{(j)} (\underset{\text{median BR}}{\underbrace{\overset{\text{mean BR}}{\overbrace{\underset{\text{ML}}{\underbrace{z^{(j)}}} + \phi^{(j)}\xi^{(j)}}} + \phi^{(j)}X u^{(j)}}}) \label{zietkiewicz:update}
$$

#### Algorithms for consistent model selection in regression

<br>
e.g. <br>
[Zietkiewicz and K. (2022)](#bib-zietkiewiczkosmidis2022) <br>
[K., Kenne Pagui, and Sartori (2020)](#bib-kosmidiskennepaguisartori2020)

---

## Theory and methods of learning

.note[Penalized- and pseudo-likelihood theory and methods]

bias reduction in M-estimation, variance correction, recovering asymptotic performance of learning tasks in realistic asymptotic settings

.note[Statistical computing and algorithms for regression problems]

regression methods and scalability of algorithms

.note[Models for modern data-analytic challenges]

marked spatio-temporal point processes ([Narayanan, K., and Dellaportas, 2022](#bib-narayanankosmidisdellaportas2022)) <br>
time-varying networks ([Bartlett, K., and Silva, 2021](#bib-bartlettkosmidissilva2021)) <br>
rankings ([Turner, van Etten, Firth et al., 2020](#bib-turneretal2021))

---

## Theory and methods of learning

.note[Penalized- and pseudo-likelihood theory and methods]

bias reduction in M-estimation, variance correction, recovering asymptotic performance of learning tasks in realistic asymptotic settings

.note[Statistical computing and algorithms for regression problems]

regression methods and scalability of algorithms

.note[Models for modern data-analytic challenges]

marked spatio-temporal point processes ([Narayanan, K., and Dellaportas, 2022](#bib-narayanankosmidisdellaportas2022)) <br>
time-varying networks ([Bartlett, K., and Silva, 2021](#bib-bartlettkosmidissilva2021)) <br>
rankings ([Turner, van Etten, Firth et al., 2020](#bib-turneretal2021))

.note[Methods for clustering/classification]

copula-based mixture models for clustering and classification ([K. and Karlis, 2016](#bib-kosmidiskarlis2016))

.right[.huge[[ikosmidis.com/research](https://ikosmidis.com/research)]]

---

## Interdisciplinary applied work

.note[Sports science]

modelling of high-frequency in-game events in team sports <br>
uncovering the links between human behaviour, health, fitness and overall well-being

.note[Neuroimaging]

regression methods for brain lesions from MRI / summarization and visualization of effects

.note[Genetics]

inferring changes in genomic network structures

.note[Finance]

modelling the dynamics of financial indicators with structural dependencies

.note[Earthquake engineering]

assessment of the vulnerability of the built environment from post-hazard survey data

.right[.huge[[ikosmidis.com/research](https://ikosmidis.com/research)]]

---

## Software development
.right[.huge[[ikosmidis.com/software](https://ikosmidis.com/software)]]

---

## References I

<a name=bib-bartlettkosmidissilva2021></a>[Bartlett, T. E., I. K., and R. Silva](#cite-bartlettkosmidissilva2021) (2021). "Two-way sparsity for time-varying networks with applications in genomics". In: _The Annals of Applied Statistics_ 15.2, pp. 856-879. DOI: [10.1214/20-AOAS1416](https://doi.org/10.1214%2F20-AOAS1416). URL: [https://doi.org/10.1214/20-AOAS1416](https://doi.org/10.1214/20-AOAS1416).

<a name=bib-kosmidisfirth2021></a>[K., I. and D. Firth](#cite-kosmidisfirth2021) (2021). "Jeffreys-Prior Penalty, Finiteness and Shrinkage in Binomial-Response Generalized Linear Models". In: _Biometrika_ 108.1, pp. 71-82. DOI: [10.1093/biomet/asaa052](https://doi.org/10.1093%2Fbiomet%2Fasaa052).

<a name=bib-kosmidiskarlis2016></a>[K., I. and D. Karlis](#cite-kosmidiskarlis2016) (2016). "Model-based clustering using copulas with applications". In: _Statistics and Computing_, pp. 1079-1099. DOI: [10.1007/s11222-015-9590-5](https://doi.org/10.1007%2Fs11222-015-9590-5). URL: [http://dx.doi.org/10.1007/s11222-015-9590-5](http://dx.doi.org/10.1007/s11222-015-9590-5).

<a name=bib-kosmidiskennepaguisartori2020></a>[K., I., E. C. Kenne Pagui, and N. Sartori](#cite-kosmidiskennepaguisartori2020) (2020). "Mean and median bias reduction in generalized linear models". In: _Statistics and Computing_ 30, pp. 43-59.

<a name=bib-kosmidislunardon2021></a>[K., I. and N. Lunardon](#cite-kosmidislunardon2021) (2021). "Empirical bias-reducing adjustments to estimating functions". In: _arXiv:2001.03786_. URL: [https://arxiv.org/abs/2001.03786](https://arxiv.org/abs/2001.03786).

---

## References II

<a name=bib-narayanankosmidisdellaportas2022></a>[Narayanan, S., I. K., and P. Dellaportas](#cite-narayanankosmidisdellaportas2022) (2022). "Flexible marked spatio-temporal point processes with applications to event sequences from association football". In: _Discussion paper in the Journal of the Royal Statistical Society: Series C (Accepted)_. URL: [https://arxiv.org/abs/2103.04647](https://arxiv.org/abs/2103.04647).

<a name=bib-sterzingerkosmidis2022></a>[Sterzinger, P. and I. K.](#cite-sterzingerkosmidis2022) (2022). "Maximum softly-penalized likelihood for mixed effects logistic regression". In: _arXiv:2206.02561_. URL: [https://arxiv.org/abs/2206.02561](https://arxiv.org/abs/2206.02561).

<a name=bib-turneretal2021></a>[Turner, H. L., J. van Etten, D. Firth, et al.](#cite-turneretal2021) (2020). "Modelling rankings in R: the PlackettLuce package". In: _Computational Statistics_, pp. 1027-1057. URL: [https://doi.org/10.1007/s00180-020-00959-3](https://doi.org/10.1007/s00180-020-00959-3).

<a name=bib-zietkiewiczkosmidis2022></a>[Zietkiewicz, P. and I. K.](#cite-zietkiewiczkosmidis2022) (2022). "Mean and median bias reduction in generalized linear models with large data sets". In: _Proceedings of the 36th International Workshop on Statistical Modelling, July 18-22, 2022, Trieste, Italy_. Ed. by N. Torelli, R. Bellio and V. Muggeo. Heidelberg: EUT Edizioni Università di Trieste. URL: [http://hdl.handle.net/10077/33741](http://hdl.handle.net/10077/33741).
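
---

## Appendix: a toy adjusted score equation

A minimal sketch, not taken from the talk, of the construction `\(\psi(\theta) + A(\theta) = 0\)` from the "Theory and methods of learning" slides. It assumes the simplest possible setting: `\(y\)` successes in `\(m\)` Bernoulli trials, with `\(\theta\)` the log-odds, so that `\(\psi(\theta) = y - m\pi(\theta)\)`. The adjustment `\(A(\theta) = 1/2 - \pi(\theta)\)` is the derivative of the Jeffreys-prior penalty for this model and is used here as an assumption for illustration. The function names are hypothetical.

```python
import math


def ml_log_odds(y, m):
    """Solve psi(theta) = y - m * pi(theta) = 0; infinite when y is 0 or m."""
    if y == 0:
        return -math.inf
    if y == m:
        return math.inf
    return math.log(y / (m - y))


def adjusted_score(theta, y, m):
    """psi(theta) + A(theta), with the assumed adjustment A(theta) = 1/2 - pi(theta)."""
    pi = 1.0 / (1.0 + math.exp(-theta))
    return (y - m * pi) + (0.5 - pi)


def br_log_odds(y, m, tol=1e-12, max_iter=100):
    """Newton iterations on the adjusted score equation, started at theta = 0."""
    theta = 0.0
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + math.exp(-theta))
        # The derivative of the adjusted score in theta is -(m + 1) * pi * (1 - pi)
        step = adjusted_score(theta, y, m) / ((m + 1) * pi * (1.0 - pi))
        theta += step
        if abs(step) < tol:
            break
    return theta


# With no successes the ML estimate sits on the boundary; the adjusted one is finite
print(ml_log_odds(0, 10), br_log_odds(0, 10))
```

In this toy model the adjusted equation has the closed-form solution `\(\tilde\theta = \log\{(y + 1/2)/(m - y + 1/2)\}\)`, which is finite for every `\(y\)`, in line with the "Existence" row of the table of properties.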