January 27, 2016

Graphics in R

Choices of graphics platforms

You have three central choices for making graphics in R:

  • "Base graphics"
  • ggplot2
  • lattice

Let there be data….

library(NHANES)
data(NHANES)
str(NHANES)
## Classes 'tbl_df', 'tbl' and 'data.frame':    10000 obs. of  76 variables:
##  $ ID              : int  51624 51624 51624 51625 51630 51638 51646 51647 51647 51647 ...
##  $ SurveyYr        : Factor w/ 2 levels "2009_10","2011_12": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Gender          : Factor w/ 2 levels "female","male": 2 2 2 2 1 2 2 1 1 1 ...
##  $ Age             : int  34 34 34 4 49 9 8 45 45 45 ...
##  $ AgeDecade       : Factor w/ 8 levels " 0-9"," 10-19",..: 4 4 4 1 5 1 1 5 5 5 ...
##  $ AgeMonths       : int  409 409 409 49 596 115 101 541 541 541 ...
##  $ Race1           : Factor w/ 5 levels "Black","Hispanic",..: 4 4 4 5 4 4 4 4 4 4 ...
##  $ Race3           : Factor w/ 6 levels "Asian","Black",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ Education       : Factor w/ 5 levels "8th Grade","9 - 11th Grade",..: 3 3 3 NA 4 NA NA 5 5 5 ...
##  $ MaritalStatus   : Factor w/ 6 levels "Divorced","LivePartner",..: 3 3 3 NA 2 NA NA 3 3 3 ...
##  $ HHIncome        : Factor w/ 12 levels " 0-4999"," 5000-9999",..: 6 6 6 5 7 11 9 11 11 11 ...
##  $ HHIncomeMid     : int  30000 30000 30000 22500 40000 87500 60000 87500 87500 87500 ...
##  $ Poverty         : num  1.36 1.36 1.36 1.07 1.91 1.84 2.33 5 5 5 ...
##  $ HomeRooms       : int  6 6 6 9 5 6 7 6 6 6 ...
##  $ HomeOwn         : Factor w/ 3 levels "Own","Rent","Other": 1 1 1 1 2 2 1 1 1 1 ...
##  $ Work            : Factor w/ 3 levels "Looking","NotWorking",..: 2 2 2 NA 2 NA NA 3 3 3 ...
##  $ Weight          : num  87.4 87.4 87.4 17 86.7 29.8 35.2 75.7 75.7 75.7 ...
##  $ Length          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ HeadCirc        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Height          : num  165 165 165 105 168 ...
##  $ BMI             : num  32.2 32.2 32.2 15.3 30.6 ...
##  $ BMICatUnder20yrs: Factor w/ 4 levels "UnderWeight",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ BMI_WHO         : Factor w/ 4 levels "12.0_18.5","18.5_to_24.9",..: 4 4 4 1 4 1 2 3 3 3 ...
##  $ Pulse           : int  70 70 70 NA 86 82 72 62 62 62 ...
##  $ BPSysAve        : int  113 113 113 NA 112 86 107 118 118 118 ...
##  $ BPDiaAve        : int  85 85 85 NA 75 47 37 64 64 64 ...
##  $ BPSys1          : int  114 114 114 NA 118 84 114 106 106 106 ...
##  $ BPDia1          : int  88 88 88 NA 82 50 46 62 62 62 ...
##  $ BPSys2          : int  114 114 114 NA 108 84 108 118 118 118 ...
##  $ BPDia2          : int  88 88 88 NA 74 50 36 68 68 68 ...
##  $ BPSys3          : int  112 112 112 NA 116 88 106 118 118 118 ...
##  $ BPDia3          : int  82 82 82 NA 76 44 38 60 60 60 ...
##  $ Testosterone    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ DirectChol      : num  1.29 1.29 1.29 NA 1.16 1.34 1.55 2.12 2.12 2.12 ...
##  $ TotChol         : num  3.49 3.49 3.49 NA 6.7 4.86 4.09 5.82 5.82 5.82 ...
##  $ UrineVol1       : int  352 352 352 NA 77 123 238 106 106 106 ...
##  $ UrineFlow1      : num  NA NA NA NA 0.094 ...
##  $ UrineVol2       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ UrineFlow2      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Diabetes        : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ DiabetesAge     : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ HealthGen       : Factor w/ 5 levels "Excellent","Vgood",..: 3 3 3 NA 3 NA NA 2 2 2 ...
##  $ DaysPhysHlthBad : int  0 0 0 NA 0 NA NA 0 0 0 ...
##  $ DaysMentHlthBad : int  15 15 15 NA 10 NA NA 3 3 3 ...
##  $ LittleInterest  : Factor w/ 3 levels "None","Several",..: 3 3 3 NA 2 NA NA 1 1 1 ...
##  $ Depressed       : Factor w/ 3 levels "None","Several",..: 2 2 2 NA 2 NA NA 1 1 1 ...
##  $ nPregnancies    : int  NA NA NA NA 2 NA NA 1 1 1 ...
##  $ nBabies         : int  NA NA NA NA 2 NA NA NA NA NA ...
##  $ Age1stBaby      : int  NA NA NA NA 27 NA NA NA NA NA ...
##  $ SleepHrsNight   : int  4 4 4 NA 8 NA NA 8 8 8 ...
##  $ SleepTrouble    : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 1 1 1 ...
##  $ PhysActive      : Factor w/ 2 levels "No","Yes": 1 1 1 NA 1 NA NA 2 2 2 ...
##  $ PhysActiveDays  : int  NA NA NA NA NA NA NA 5 5 5 ...
##  $ TVHrsDay        : Factor w/ 7 levels "0_hrs","0_to_1_hr",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ CompHrsDay      : Factor w/ 7 levels "0_hrs","0_to_1_hr",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ TVHrsDayChild   : int  NA NA NA 4 NA 5 1 NA NA NA ...
##  $ CompHrsDayChild : int  NA NA NA 1 NA 0 6 NA NA NA ...
##  $ Alcohol12PlusYr : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 2 2 2 ...
##  $ AlcoholDay      : int  NA NA NA NA 2 NA NA 3 3 3 ...
##  $ AlcoholYear     : int  0 0 0 NA 20 NA NA 52 52 52 ...
##  $ SmokeNow        : Factor w/ 2 levels "No","Yes": 1 1 1 NA 2 NA NA NA NA NA ...
##  $ Smoke100        : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 1 1 1 ...
##  $ Smoke100n       : Factor w/ 2 levels "Non-Smoker","Smoker": 2 2 2 NA 2 NA NA 1 1 1 ...
##  $ SmokeAge        : int  18 18 18 NA 38 NA NA NA NA NA ...
##  $ Marijuana       : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 2 2 2 ...
##  $ AgeFirstMarij   : int  17 17 17 NA 18 NA NA 13 13 13 ...
##  $ RegularMarij    : Factor w/ 2 levels "No","Yes": 1 1 1 NA 1 NA NA 1 1 1 ...
##  $ AgeRegMarij     : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ HardDrugs       : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 1 1 1 ...
##  $ SexEver         : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 2 2 2 ...
##  $ SexAge          : int  16 16 16 NA 12 NA NA 13 13 13 ...
##  $ SexNumPartnLife : int  8 8 8 NA 10 NA NA 20 20 20 ...
##  $ SexNumPartYear  : int  1 1 1 NA 1 NA NA 0 0 0 ...
##  $ SameSex         : Factor w/ 2 levels "No","Yes": 1 1 1 NA 2 NA NA 2 2 2 ...
##  $ SexOrientation  : Factor w/ 3 levels "Bisexual","Heterosexual",..: 2 2 2 NA 2 NA NA 1 1 1 ...
##  $ PregnantNow     : Factor w/ 3 levels "Yes","No","Unknown": NA NA NA NA NA NA NA NA NA NA ...

Graphics in R: Base graphics

plot(NHANES$Weight, NHANES$Height)

A simple ggplot2 graphic

library(ggplot2)
qplot(Weight, Height, data=NHANES)

A simple lattice graphic

library(lattice)
xyplot(Height ~ Weight, data=NHANES) 

The mosaic package can translate between them

library(mosaic)
mplot(NHANES)

{mplot()} loads a very nice graphical user interface for creating graphics, and spits out the code for ggplot or lattice graphics.

Try it out! Practice saving a graphic from RStudio.

Why I use ggplot2

  • ``easy'' to create both simple and complex graphics
  • unified syntax
  • attractive defaults (except for colors)
  • actively developed and improved

Quick tour of ggplot2 syntax

## defaults to scatterplot
qplot(Weight, Height, data=NHANES) 

Quick tour of ggplot2 syntax

## defaults to scatterplot
## same as default
qplot(Weight, Height, data=NHANES, geom="point") 

Quick tour of ggplot2 syntax

qplot(Weight, Height, data=NHANES, geom="smooth") 

Quick tour of ggplot2 syntax

qplot(Weight, Height, data=NHANES, 
      geom=c("smooth", "point")) 

Quick tour of ggplot2 syntax

qplot(Weight, Height, data=NHANES, 
      geom=c("smooth", "point"), alpha=I(.01)) 

ggplot2 building blocks

What is a "geom"?

What are "aesthetics"?

Aesthetics define a mapping between data and the display.

Thinking about aesthetics

Each geom has a different set of aesthetics.

What aesthetics might we need for {geom_point}?

Aesthetics for geom_point

{geom_point} understands the following aesthetics

  • x (required)
  • y (required)
  • alpha
  • color
  • fill
  • shape
  • size

Aesthetics for geom_line

What aesthetics might we need for {geom_line}?

Aesthetics for geom_line

{geom_line} understands the following aesthetics

  • x (required)
  • y (required)
  • alpha
  • color
  • linetype
  • size

A more involved example

qplot(Weight, Height, data=NHANES, 
      geom="smooth", se=FALSE,
      color=Race1,
      facets=.~Gender) 

qplot() vs ggplot

qplot(Weight, Height, data=NHANES, 
      geom="smooth", se=FALSE,
      color=Race1,
      facets=.~Gender) 
ggplot(NHANES) +
        geom_smooth(aes(x=Weight, y=Height, color=Race1), se=FALSE) +
        geom_point(aes(x=Weight, y=Height, color=Race1), alpha=I(.01)) +
        facet_grid(.~Gender)

How to get help with ggplot

Try it yourself!

Experiment with mplot - VERY useful!

library(mosaic)
mplot(NHANES)

Work on creating a data visualization with a teammate using the NHANES dataset.

By 3:30pm, you should have posted a final version of your data graphic on Piazza, along with the code you used to create it. A few of them will be chosen and discussed by the whole class.