library(tidyverse)
library(carData)
theme_set(theme_bw())
RAdelaide 2025
July 10, 2025
\[ \text{logit}(\pi) = \log \frac{\pi}{1-\pi} \]
carData
install.packages(c("car", "carData"))
car
is the Companion to Applied Regression (not broom brooms)titanic_tbl |>
count(survived, sex, passengerClass) |>
pivot_wider(names_from = survived, values_from = n) |>
mutate(
prob = yes / (no + yes)
)
# A tibble: 6 × 5
sex passengerClass no yes prob
<fct> <fct> <int> <int> <dbl>
1 female 1st 5 128 0.962
2 female 2nd 11 92 0.893
3 female 3rd 80 72 0.474
4 male 1st 98 53 0.351
5 male 2nd 135 23 0.146
6 male 3rd 290 59 0.169
family = binomial
glm
we are fitting binomial probabilities
Call:
glm(formula = survived ~ sex + age + passengerClass, family = binomial,
data = TitanicSurvival)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.522074 0.326702 10.781 < 2e-16 ***
sexmale -2.497845 0.166037 -15.044 < 2e-16 ***
age -0.034393 0.006331 -5.433 5.56e-08 ***
passengerClass2nd -1.280570 0.225538 -5.678 1.36e-08 ***
passengerClass3rd -2.289661 0.225802 -10.140 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1414.62 on 1045 degrees of freedom
Residual deviance: 982.45 on 1041 degrees of freedom
(263 observations deleted due to missingness)
AIC: 992.45
Number of Fisher Scoring iterations: 4
titanic_tbl |>
## Setting type = "response" will transform back to [0,1]
mutate(prob_survival = predict(titanic_glm, type = "response")) |>
arrange(desc(prob_survival))
# A tibble: 1,046 × 6
name survived sex age passengerClass prob_survival
<chr> <fct> <fct> <dbl> <fct> <dbl>
1 Allison, Miss. Helen Lorai… no fema… 2 1st 0.969
2 Carter, Miss. Lucile Polk yes fema… 14 1st 0.954
3 Madill, Miss. Georgette Al… yes fema… 15 1st 0.953
4 Hippach, Miss. Jean Gertru… yes fema… 16 1st 0.951
5 Lines, Miss. Mary Conover yes fema… 16 1st 0.951
6 Maioni, Miss. Roberta yes fema… 16 1st 0.951
7 Dick, Mrs. Albert Adrian (… yes fema… 17 1st 0.950
8 Penasco y Castellana, Mrs.… yes fema… 17 1st 0.950
9 Astor, Mrs. John Jacob (Ma… yes fema… 18 1st 0.948
10 Marvin, Mrs. Daniel Warner… yes fema… 18 1st 0.948
# ℹ 1,036 more rows
titanic_tbl |>
mutate(prob_survival = predict(titanic_glm, type = "response")) |>
arrange(desc(prob_survival)) |>
mutate(
## Calculate the True Positive Rate as we step through the predictions
TPR = cumsum(survived == "yes") / sum(survived == "yes"),
## And the False Positive Rate as we step through in order
FPR = cumsum(survived == "no") / sum(survived == "no")
) |>
ggplot(aes(FPR, TPR)) +
geom_line() +
geom_abline(slope = 1)
list(
base_model = glm(
survived ~ sex + age + passengerClass, data = titanic_tbl, family = binomial
),
interactions = glm(
survived ~ (sex + age + passengerClass)^2, data = titanic_tbl, family = binomial
)
) |>
lapply(\(x) mutate(titanic_tbl, pi = predict(x, type = "response"))) |>
bind_rows(.id = "model") |>
arrange(model, desc(pi)) |>
mutate(
TPR = cumsum(survived == "yes") / sum(survived == "yes"),
FPR = cumsum(survived == "no") / sum(survived == "no"),
.by = model
) |>
ggplot(aes(FPR, TPR, colour = model)) +
geom_line() +
geom_abline(slope = 1)
InsectSprays
provides counts of insects after applying insecticides A-F
Call:
glm(formula = count ~ spray, family = poisson, data = InsectSprays)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.67415 0.07581 35.274 < 2e-16 ***
sprayB 0.05588 0.10574 0.528 0.597
sprayC -1.94018 0.21389 -9.071 < 2e-16 ***
sprayD -1.08152 0.15065 -7.179 7.03e-13 ***
sprayE -1.42139 0.17192 -8.268 < 2e-16 ***
sprayF 0.13926 0.10367 1.343 0.179
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 409.041 on 71 degrees of freedom
Residual deviance: 98.329 on 66 degrees of freedom
AIC: 376.59
Number of Fisher Scoring iterations: 5
family = quasipoisson
offset
R
is super-clumsylme4
and the function lmer()
nlme
\(\implies\) no longer supported(1 | Worker)
Machine
is specified as per usual for fixed effectsLinear mixed model fit by REML ['lmerMod']
Formula: score ~ Machine + (1 | Worker)
Data: Machines
REML criterion at convergence: 286.9
Scaled residuals:
Min 1Q Median 3Q Max
-2.7249 -0.5233 0.1328 0.6513 1.7559
Random effects:
Groups Name Variance Std.Dev.
Worker (Intercept) 26.487 5.147
Residual 9.996 3.162
Number of obs: 54, groups: Worker, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 52.356 2.229 23.485
MachineB 7.967 1.054 7.559
MachineC 13.917 1.054 13.205
Correlation of Fixed Effects:
(Intr) MachnB
MachineB -0.236
MachineC -0.236 0.500
lmerTest
before fittinglibrary(lmerTest)
machines_lmer <- lmer(score ~ Machine + (1 | Worker) , data = Machines)
summary(machines_lmer) |> coef()
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 52.355556 2.229312 5.833185 23.48507 5.328296e-07
MachineB 7.966667 1.053882 46.000000 7.55935 1.329432e-09
MachineC 13.916667 1.053882 46.000000 13.20514 2.933231e-17
lme4
has implemented glmer()
glmmADMB
and glmmTMB
nlme
) was a member of R Core but has left the R
community