# detectseparation R package is on CRAN Tweet

**detectseparation** provides pre-fit and post-fit methods for detecting separation and infinite maximum likelihood estimates in generalized linear models with categorical responses.

## Pre-fit methods

The pre-fit methods apply on binomial-response generalized liner models such as logit, probit and cloglog regression, and can be directly supplied as fitting methods to the `glm()`

function. They solve the linear programming problems for the detection of separation developed in Konis (2007), using **ROI** or **lpSolveAPI**.

For example, the code chunk below checks for data separation for the logistic regression model for the endometrial data set that has been considered in Heinze and Schemper (2002)

```
library("detectseparation")
data("endometrial", package = "detectseparation")
(endo_sep <- glm(HG ~ NV + PI + EH, data = endometrial,
family = binomial("logit"),
method = "detect_separation"))
## Implementation: ROI | Solver: lpsolve
## Separation: TRUE
## Existence of maximum likelihood estimates
## (Intercept) NV PI EH
## 0 Inf 0 0
## 0: finite value, Inf: infinity, -Inf: -infinity
```

The maximum likelihood estimate for the coefficient of `NV`

is \(+\infty\), while the maximum likelihood estimates for all other parameters are finite. In fact, the maximum likelihood estimates and the corresponding estimated standard errors are

```
endo_ml <- update(endo_sep, method = "glm.fit")
coef(summary(endo_ml))[, 1:2] + coef(endo_sep)
## Estimate Std. Error
## (Intercept) 4.3045178 1.63729861
## NV Inf Inf
## PI -0.0421834 0.04433196
## EH -2.9026056 0.84555156
```

## Post-fit methods

The post-fit methods apply to models with categorical responses, including binomial-response generalized linear models and multinomial-response models, such as baseline category logits and adjacent category logits models; for example, the models implemented in the **brglm2** package. The post-fit methods successively refit the model with increasing number of iteratively reweighted least squares iterations, and monitor the ratio of the estimated standard error for each parameter to what it has been in the first iteration. According to the results in Lesaffre and Albert (1989), divergence of those ratios indicates data separation. For example,

`plot(check_infinite_estimates(endo_ml))`

## More information

See the package vignettes and the package’s GitHub page for more information.

## References

Heinze, G., and M. Schemper. 2002. “A Solution to the Problem of Separation in Logistic Regression.” *Statistics in Medicine* 21: 2409–19.

Konis, Kjell. 2007. “Linear Programming Algorithms for Detecting Separated Data in Binary Logistic Regression Models.” PhD thesis, University of Oxford. https://ora.ox.ac.uk/objects/uuid:8f9ee0d0-d78e-4101-9ab4-f9cbceed2a2a.

Lesaffre, E., and A. Albert. 1989. “Partial Separation in Logistic Discrimination.” *Journal of the Royal Statistical Society. Series B (Methodological)* 51 (1): 109–16. http://www.jstor.org/stable/2345845.