If your model has a lot of predictor variables, you should consider using a global \( F \)-test before interpreting individual coefficients. Here is a toy example that shows why.
We start by choosing the number of observations and the number of parameters in our model, \( p \). We then generate \( p-1 \) independent covariates and prepend a column of 1s to form the design matrix.
nObs <- 1000
p <- 100
x <- matrix(rnorm(nObs * (p - 1)), nrow = nObs)
x <- data.frame(1, x)
colNames <- paste0("x", 1:p)
colnames(x) <- colNames
Now we will generate our \( y \) values completely independently of all of our covariates. None of our \( x \) variables is associated with our outcome!
y <- rnorm(nObs)
But if we fit a linear model that assumes that there ARE relationships between our outcome and all of our \( x \) variables, do we see any individually significant \( \beta \)s? If so, how many are significant, and are these indicative of real associations?
fmla <- formula(paste0("y ~ 0 +", paste(colNames, collapse = "+")))
mlr1 <- lm(fmla, data = x)
coefs <- summary(mlr1)$coef
sum(coefs[, "Pr(>|t|)"] < 0.05)
## [1] 2
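To put that count in context, here is a back-of-the-envelope sketch of how many "significant" coefficients we would expect when every true coefficient is zero. It treats the 100 \( t \)-tests as independent, which is only a rough approximation, and the variable names are illustrative rather than part of the example above.

```r
alpha <- 0.05   # significance level used for each t-test
nTests <- 100   # one t-test per coefficient in the model (p = 100)

# Expected number of false positives when every null hypothesis is true
expectedFalsePositives <- alpha * nTests

# Approximate chance of at least one false positive,
# assuming the tests are independent (a simplification)
probAtLeastOne <- 1 - (1 - alpha)^nTests

expectedFalsePositives
probAtLeastOne
```

Under these assumptions we expect about 5 false positives, and at least one is almost guaranteed, so a handful of small \( p \)-values is not, on its own, evidence of real associations.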
Alternatively, we could use a Global \( F \)-test to test whether any of the \( x \) variables add significant explanatory power to our model. What conclusion do we draw from this test?
mlr0 <- lm(y ~ x1, data = x)  # x1 is the column of 1s, so this is the intercept-only (null) model
anova(mlr0, mlr1)
## Analysis of Variance Table
##
## Model 1: y ~ x1
## Model 2: y ~ 0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 +
## x12 + x13 + x14 + x15 + x16 + x17 + x18 + x19 + x20 + x21 +
## x22 + x23 + x24 + x25 + x26 + x27 + x28 + x29 + x30 + x31 +
## x32 + x33 + x34 + x35 + x36 + x37 + x38 + x39 + x40 + x41 +
## x42 + x43 + x44 + x45 + x46 + x47 + x48 + x49 + x50 + x51 +
## x52 + x53 + x54 + x55 + x56 + x57 + x58 + x59 + x60 + x61 +
## x62 + x63 + x64 + x65 + x66 + x67 + x68 + x69 + x70 + x71 +
## x72 + x73 + x74 + x75 + x76 + x77 + x78 + x79 + x80 + x81 +
## x82 + x83 + x84 + x85 + x86 + x87 + x88 + x89 + x90 + x91 +
## x92 + x93 + x94 + x95 + x96 + x97 + x98 + x99 + x100
##   Res.Df RSS Df Sum of Sq    F Pr(>F)
## 1    999 998
## 2    900 894 99       104 1.06   0.34
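As a check on where the \( F \) statistic comes from, the sketch below recomputes it by hand from the residual sums of squares and residual degrees of freedom printed in the table. The inputs are the rounded values as displayed, so the result only matches to the printed precision; the variable names are illustrative.

```r
# Values read off the ANOVA table (rounded as printed)
rss0 <- 998; resDf0 <- 999   # null model
rss1 <- 894; resDf1 <- 900   # full model

# F = [(RSS0 - RSS1) / (resDf0 - resDf1)] / (RSS1 / resDf1)
numDf <- resDf0 - resDf1
fStat <- ((rss0 - rss1) / numDf) / (rss1 / resDf1)

# Upper-tail probability under the F distribution with (numDf, resDf1) df
pValue <- pf(fStat, df1 = numDf, df2 = resDf1, lower.tail = FALSE)

round(fStat, 2)
pValue
```

The numerator is the average reduction in RSS per added covariate, and the denominator is the full model's estimate of the error variance. A ratio close to 1 means the extra covariates explain about as much as pure noise would.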
Do the results from the Global \( F \)-test and the individual \( \beta \) \( t \)-tests agree? Why?
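One way to explore this question empirically is to repeat the whole experiment many times and record how often each approach raises a false alarm. The sketch below is a minimal simulation under assumed settings: the smaller `nObs` and `p`, the number of replications `nSim`, the seed, and the helper `oneRun()` are all chosen here for speed and illustration and are not part of the example above.

```r
set.seed(1)    # arbitrary seed, only for reproducibility
nSim <- 200    # number of simulated data sets (assumed)
nObs <- 200    # smaller than above so the loop runs quickly
p <- 20        # intercept plus 19 pure-noise covariates
alpha <- 0.05

oneRun <- function() {
  # Outcome and covariates are generated independently of each other
  dat <- data.frame(y = rnorm(nObs),
                    matrix(rnorm(nObs * (p - 1)), nrow = nObs))
  full <- lm(y ~ ., data = dat)
  null <- lm(y ~ 1, data = dat)
  # p-values of the individual t-tests, dropping the intercept row
  pT <- summary(full)$coefficients[-1, "Pr(>|t|)"]
  # p-value of the global F-test comparing null and full models
  pF <- anova(null, full)[2, "Pr(>F)"]
  c(anyT = any(pT < alpha), globalF = pF < alpha)
}

res <- replicate(nSim, oneRun())
falseAlarmRate <- rowMeans(res)
falseAlarmRate
```

Here `anyT` is the proportion of data sets with at least one individually significant coefficient, and `globalF` is the proportion in which the global \( F \)-test rejects. Since no covariate is truly associated with the outcome, both are false-alarm rates, and comparing them shows which procedure keeps its error rate near the nominal 5%.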