Distributions
RAdelaide 2024
July 10, 2024
R has its origins as a statistical analysis language (i.e. `S`)

R comes with nearly every distribution:

| Distribution | Density | Area Under Curve | Quantile | Random |
|---|---|---|---|---|
| Normal | `dnorm()` | `pnorm()` | `qnorm()` | `rnorm()` |
| T | `dt()` | `pt()` | `qt()` | `rt()` |
| Uniform | `dunif()` | `punif()` | `qunif()` | `runif()` |
| Exponential | `dexp()` | `pexp()` | `qexp()` | `rexp()` |
| \(\chi^2\) | `dchisq()` | `pchisq()` | `qchisq()` | `rchisq()` |
| Binomial | `dbinom()` | `pbinom()` | `qbinom()` | `rbinom()` |
| Poisson | `dpois()` | `ppois()` | `qpois()` | `rpois()` |
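As a quick sketch of how the four function types relate, using the Normal distribution:

```r
dnorm(1.96)                 # density of N(0, 1) at x = 1.96
pnorm(1.96)                 # area under the curve: P(X <= 1.96), ~0.975
qnorm(0.975)                # quantile: the 97.5th percentile, ~1.96
rnorm(5, mean = 0, sd = 1)  # five random values drawn from N(0, 1)
```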
We'll use the `pigs` dataset from earlier
When comparing the means of two vectors:

\[ H_0: \mu_{1} = \mu_{2} \\ H_A: \mu_{1} \neq \mu_{2} \]

We could use two vectors (i.e. `x` & `y`)
Is this a paired test?
The R formula method: `len ~ supp`
Did this give the same results?
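Both interfaces are sketched below, assuming `pigs` contains the columns `len` and `supp` seen in the formula, and that the two levels of `supp` are `VC` and `OJ`:

```r
## Two-vector interface: extract each group by hand
x <- pigs$len[pigs$supp == "VC"]
y <- pigs$len[pigs$supp == "OJ"]
t.test(x, y)

## Formula interface: the same comparison in one line
t.test(len ~ supp, data = pigs)
```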
\[ H_0: \text{No association between groups and outcome}\\ H_A: \text{Association between groups and outcome} \]
Can anyone remember when we shouldn’t use a \(\chi^2\) test?
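A sketch with hypothetical counts; when any expected cell count is small (conventionally < 5), the \(\chi^2\) approximation breaks down and `fisher.test()` is preferred:

```r
## Hypothetical 2x2 table of counts
tab <- matrix(
  c(12, 8, 5, 15), nrow = 2,
  dimnames = list(group = c("A", "B"), outcome = c("Yes", "No"))
)
chisq.test(tab)   # tests for association between group and outcome
fisher.test(tab)  # exact test; safer when expected counts are small
```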
Common hypothesis tests in R include:

- `t.test()`, `wilcox.test()`
- `chisq.test()`, `fisher.test()`
- `shapiro.test()`, `bartlett.test()`
- `binom.test()`, `poisson.test()`
- `kruskal.test()`, `ks.test()`
htest Objects

All of these tests return an object of class `htest`
- Results are displayed using `print.htest()`
- Use `names()` to see what other values are returned
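For example, a sketch reusing the earlier t-test:

```r
res <- t.test(len ~ supp, data = pigs)
class(res)    # "htest"
names(res)    # "statistic", "parameter", "p.value", "conf.int", "estimate", ...
res$p.value   # extract a single component directly
```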
We are trying to estimate a line with slope & intercept
\[ y = ax + b \]
Or
\[ y = \beta_0 + \beta_1 x \]
Linear Regression always uses the R formula syntax

- `y ~ x`: `y` is a function of `x`
- We fit the model using `lm()`
- The intercept is included by default, but can be removed (`~ 0 + ...`)

In the fitted model, `supp == VC` reduces the length of the teeth

An alternative way to write the above in R is:
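The original code isn't shown here, but as a sketch, one common equivalent form references the columns directly instead of using the `data` argument:

```r
## With a data argument
lm(len ~ supp, data = pigs)

## An equivalent call, referencing the columns directly
lm(pigs$len ~ pigs$supp)
```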
Which model should we choose?
Are we happy with our model assumptions?
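A minimal sketch of checking those assumptions with R's built-in diagnostic plots, assuming a model like `lm(len ~ supp, data = pigs)`:

```r
fit <- lm(len ~ supp, data = pigs)
par(mfrow = c(2, 2))  # show all four diagnostic plots in a 2x2 grid
plot(fit)             # residuals vs fitted, Q-Q, scale-location, leverage
par(mfrow = c(1, 1))  # reset the plotting layout
```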
`mfrow` stands for multi-frame row (an argument to `par()`)

For binomial data, \(\pi\) is the probability of success
- The response can be recorded as `TRUE/FALSE` or `0/1`
```
Rows: 400
Columns: 4
$ admit <dbl> 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1…
$ gre   <dbl> 380, 660, 800, 640, 520, 760, 560, 400, 540, 700, 800, 440, 760,…
$ gpa   <dbl> 3.61, 3.67, 4.00, 3.19, 2.93, 3.00, 2.98, 3.08, 3.39, 3.92, 4.00…
$ rank  <dbl> 3, 3, 1, 4, 4, 2, 1, 2, 3, 2, 4, 1, 1, 2, 1, 3, 4, 3, 2, 1, 3, 2…
```
```
Call:
glm(formula = admit ~ gre + gpa + rank, family = "binomial",
    data = admissions)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.449548   1.132846  -3.045  0.00233 **
gre          0.002294   0.001092   2.101  0.03564 *
gpa          0.777014   0.327484   2.373  0.01766 *
rank        -0.560031   0.127137  -4.405 1.06e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 499.98  on 399  degrees of freedom
Residual deviance: 459.44  on 396  degrees of freedom
AIC: 467.44

Number of Fisher Scoring iterations: 4
```
\[ \text{logit}(\pi) = \log(\frac{\pi}{1-\pi}) \]
R can remove terms as required using Akaike's Information Criterion (AIC), via `step()`
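A sketch, refitting the model from the summary output above:

```r
## The full model shown earlier
admit_glm <- glm(admit ~ gre + gpa + rank, family = "binomial", data = admissions)
## step() compares the AIC with each term removed, keeping the best model
step(admit_glm)
```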
Mixed effects models include both fixed and random effects
- We may need to nest results within a biological sample, include day effects etc.

Here we have the change in blood pressure within the same 5 rabbits, comparing a control against the MDL treatment

If fitting within one rabbit \(\implies\) use `lm()`
To nest within each rabbit we:
- use `lmer()` from `lme4`
- add a random intercept for each rabbit: `(1|Animal)` (see the sketch below)
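A sketch assuming the `Rabbit` blood-pressure data from `MASS`; the exact fixed effects used in the session may differ:

```r
library(MASS)   # assumed source of the Rabbit blood-pressure data
library(lme4)
## Fixed effects for treatment and dose; a random intercept per rabbit
fit <- lmer(BPchange ~ Treatment + log(Dose) + (1 | Animal), data = Rabbit)
summary(fit)    # note: t-statistics are reported without p-values
```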
This gives \(t\)-statistics, but no \(p\)-value
Why?
For generalised mixed-effects models, see the packages `glmmTMB` and `glmmADMB`
- `lm()` for standard regression models
- `glm()` for generalised linear models
- `lmer()` for linear mixed-effects models
- many more model types are available in other packages (e.g. `MASS`)
`p.adjust()` takes the argument `method = ...`

The package `multcomp` is also excellent, but challenging
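A sketch with hypothetical raw p-values:

```r
pvals <- c(0.001, 0.008, 0.02, 0.04, 0.3)
p.adjust(pvals, method = "bonferroni")  # strict family-wise error control
p.adjust(pvals, method = "fdr")         # Benjamini-Hochberg false discovery rate
p.adjust.methods                        # list all available methods
```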
We can transpose data with `t()` (a function inherited from `S` and kept for reverse compatibility)

The output of `prcomp()` is a list with class `prcomp`
- component scores are in `pca$x`
- variable loadings are in `pca$rotation`
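A sketch on a hypothetical matrix `mat` with variables in rows (hence the `t()`):

```r
set.seed(42)
mat <- matrix(rnorm(200), nrow = 10)  # hypothetical data: 10 variables x 20 samples
pca <- prcomp(t(mat), scale. = TRUE)  # transpose so that samples are rows
class(pca)          # "prcomp"
head(pca$x)         # component scores for each sample
head(pca$rotation)  # loadings for each original variable
summary(pca)        # variance explained by each component
```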
In this dataset:

[PCA plot; point labels added using `ggrepel::geom_label_repel()`]