Global F tests and Family-Wise Error Rates

Code for Biostatistics Methods 2, UMass-Amherst, Spring 2014

by Nicholas Reich

When a model has many predictor variables, consider a global \( F \) test before interpreting individual coefficients. Here is a toy example that shows why.

We start by picking a number of observations and the number of parameters in our model, \( p \). We then generate \( p-1 \) independent covariates, plus a column of 1s for the design matrix.

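nObs <- 1000   # number of observations
p <- 100       # number of parameters, including the intercept
# p - 1 independent standard normal covariates
x <- matrix(rnorm(nObs * (p - 1)), nrow = nObs)
# prepend a column of 1s, so x1 plays the role of the intercept
x <- data.frame(1, x)
colNames <- paste0("x", 1:p)
colnames(x) <- colNames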

Now we will generate our \( y \) values completely independently of all of our covariates. None of our \( x \) variables are associated with our outcome!

y <- rnorm(nObs)

But if we fit a linear model that assumes that there ARE relationships between our outcome and all of our \( x \) variables, do we see any individually significant \( \beta \)s? If so, how many are significant, and are these indicative of real associations?

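# "0 +" suppresses lm()'s own intercept because x1 already supplies it
fmla <- formula(paste0("y ~ 0 +", paste(colNames, collapse = "+")))
mlr1 <- lm(fmla, data = x)
coefs <- summary(mlr1)$coef
# count the coefficients whose t-test p-value falls below 0.05
sum(coefs[, "Pr(>|t|)"] < 0.05)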
## [1] 2
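
To put that count in context, here is a rough back-of-the-envelope calculation. It assumes the 100 \( t \)-tests are independent and each is run at level 0.05; in this example the tests are only approximately independent, so treat the numbers as a sketch rather than an exact result. The variable names are made up for illustration.

```r
alpha <- 0.05    # per-test significance level
nTests <- 100    # one t-test per coefficient

# expected number of false positives when every null hypothesis is true
expectedFalsePos <- alpha * nTests

# family-wise error rate: chance of at least one false positive,
# assuming (as a simplification) that the tests are independent
fwer <- 1 - (1 - alpha)^nTests

expectedFalsePos
round(fwer, 3)
```

Under these assumptions we would expect about 5 "significant" coefficients by chance alone, and the chance of seeing at least one is above 99%.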

Alternatively, we could use a Global \( F \)-test to test whether any of the \( x \) variables add significant explanatory power to our model. What conclusion do we draw from this test?

# x1 is the column of 1s, aliased with lm()'s own intercept: effectively an intercept-only fit
mlr0 <- lm(y ~ x1, data = x)
anova(mlr0, mlr1)
## Analysis of Variance Table
## 
## Model 1: y ~ x1
## Model 2: y ~ 0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + 
##     x12 + x13 + x14 + x15 + x16 + x17 + x18 + x19 + x20 + x21 + 
##     x22 + x23 + x24 + x25 + x26 + x27 + x28 + x29 + x30 + x31 + 
##     x32 + x33 + x34 + x35 + x36 + x37 + x38 + x39 + x40 + x41 + 
##     x42 + x43 + x44 + x45 + x46 + x47 + x48 + x49 + x50 + x51 + 
##     x52 + x53 + x54 + x55 + x56 + x57 + x58 + x59 + x60 + x61 + 
##     x62 + x63 + x64 + x65 + x66 + x67 + x68 + x69 + x70 + x71 + 
##     x72 + x73 + x74 + x75 + x76 + x77 + x78 + x79 + x80 + x81 + 
##     x82 + x83 + x84 + x85 + x86 + x87 + x88 + x89 + x90 + x91 + 
##     x92 + x93 + x94 + x95 + x96 + x97 + x98 + x99 + x100
##   Res.Df RSS Df Sum of Sq    F Pr(>F)
## 1    999 998                         
## 2    900 894 99       104 1.06   0.34
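
As a check on how the table is built, the \( F \) statistic can be recomputed by hand from the printed values. This is a sketch that uses the rounded numbers shown in the table, so the result only matches to the printed precision; the variable names are made up for illustration.

```r
# values read off the anova() table (rounded as printed)
rss0 <- 998    # residual sum of squares, reduced model
rss1 <- 894    # residual sum of squares, full model
dfNum <- 99    # number of extra parameters in the full model
dfDen <- 900   # residual degrees of freedom, full model

# F = (drop in RSS per extra parameter) / (residual variance of the full model)
fStat <- ((rss0 - rss1) / dfNum) / (rss1 / dfDen)

# upper-tail probability under the F(dfNum, dfDen) distribution
pVal <- pf(fStat, dfNum, dfDen, lower.tail = FALSE)

round(fStat, 2)
pVal
```

The recomputed statistic rounds to 1.06, in line with the table. The p-value is far above 0.05, so the global test gives no evidence that the covariates jointly improve on the reduced model.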

Do the results from the Global \( F \)-test and the individual \( \beta \) \( t \)-tests agree? Why?
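
One way to explore this question is to repeat the experiment many times and record how often each approach raises a false alarm. The sketch below makes several assumptions that are not in the example above: smaller sizes (`nObsSim`, `pSim`) so that it runs quickly, an arbitrary seed, `lm()`'s built-in intercept instead of a column of 1s, and \( t \)-tests on the covariates only. All names are hypothetical.

```r
set.seed(1)      # arbitrary seed, only for reproducibility
nSim <- 200      # number of simulated data sets
nObsSim <- 200   # observations per data set
pSim <- 20       # parameters per model, including the intercept
alpha <- 0.05

oneRun <- function() {
  # covariates and outcome are generated independently: every null is true
  covs <- matrix(rnorm(nObsSim * (pSim - 1)), nrow = nObsSim)
  dat <- data.frame(y = rnorm(nObsSim), covs)
  full <- lm(y ~ ., data = dat)
  reduced <- lm(y ~ 1, data = dat)
  # p-values of the individual t-tests, dropping the intercept row
  pT <- summary(full)$coefficients[-1, "Pr(>|t|)"]
  # p-value of the global F test comparing the two nested models
  pF <- anova(reduced, full)[2, "Pr(>F)"]
  c(anyT = any(pT < alpha), globalF = pF < alpha)
}

res <- replicate(nSim, oneRun())
rates <- rowMeans(res)   # proportion of data sets with a false alarm
rates
```

Under these assumptions, `rates["anyT"]` estimates how often at least one individual \( t \)-test is significant, and `rates["globalF"]` estimates how often the global \( F \) test rejects. The first should be much larger than the second, which should sit near the nominal 0.05.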