Unless explicitly stated otherwise, the assignments below will not be “collected”. However, you should always expect a quiz on the assigned homework.

Due week 13

Due week 12

Due week 10

  • Read Matloff chapter 3.3 (on the apply family of functions in R).
  • Complete two modules from the swirl R Programming course:
    • 7: lapply and sapply
    • 8: vapply and tapply
  • Submit a “final” version of your writeup via Piazza. A few specific requirements:
    • Submit a pdf file, no more than 4 pages.
    • Be sure to include your name and the date on the first page of the report.
    • Keep the report succinct. Remove unnecessary messages and output from the report using the echo=FALSE, warning=FALSE, and/or message=FALSE options in your code chunks. Do show the packages that you load (if any) and enough of your code that someone could reproduce your analysis.
    • Keep tidy section headers. You should have at least three sections, entitled “Introduction”, “Data Analysis” (including your discussion of tidy data, data visualizations, and summary statistics), and “Permutation Test” (see below).
    • Run a permutation test on one aspect of your data. It does not necessarily need to be a linear regression as shown in class. You could examine the correlation between two variables, a regression coefficient, a contingency table association, or another method of your choosing. You are welcome to post a question to the instructors on Piazza about the suitability of your test, although this is not required. The primary goal here is to gain experience running a simulation, but your statistical test must still be appropriate for your data (e.g. no linear regressions on a binary outcome variable and no correlations calculated on binary data). Your report should show (1) the code you used to run the simulation, (2) a graphic showing the distribution of permuted values, and (3) a p-value calculated by comparing the estimate from the real data with the permutation distribution.
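As a rough illustration of the three pieces the report should show, here is a minimal sketch of a permutation test for a correlation. It is not the example from class: the built-in mtcars data and the variables wt and mpg are stand-ins chosen only for illustration, and the number of permutations (2000) is an arbitrary assumption.

```r
# Permutation test for the correlation between two numeric variables.
# mtcars is only a stand-in; substitute the variables from your own dataset.
set.seed(1)  # make the shuffles reproducible

x <- mtcars$wt
y <- mtcars$mpg

# Test statistic computed on the real data
obs_cor <- cor(x, y)

# (1) Simulation: shuffle y to break any association with x,
# then recompute the statistic on each shuffled copy
n_perm <- 2000
perm_cor <- replicate(n_perm, cor(x, sample(y)))

# (2) Distribution of permuted values, with the observed estimate marked
hist(perm_cor, breaks = 40, xlim = range(c(perm_cor, obs_cor)),
     main = "Permutation distribution of cor(wt, mpg)",
     xlab = "Correlation under permutation")
abline(v = obs_cor, col = "red", lwd = 2)

# (3) Two-sided p-value: share of permuted statistics at least as extreme
# as the observed one. The +1 terms count the observed data as one
# permutation, so the p-value is never exactly zero.
p_value <- (sum(abs(perm_cor) >= abs(obs_cor)) + 1) / (n_perm + 1)
p_value
```

The same pattern carries over to other statistics: replace the two calls to cor() with whatever computes your estimate (for example, a regression coefficient) and keep the shuffling step unchanged.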

Due week 9

  • Read Hadley Wickham’s paper on tidy data.
  • Read Chapter 7.1, “Control Statements”, in Matloff. It introduces for and while loops, as well as if/else statements, in R.
  • Add the following two components to your writeup that you began last week and resubmit the new document via Piazza:
    • A discussion of whether your dataset is “tidy” or not. Define the variables (only the key ones if your dataset has many), and label each variable as fixed or measured. Also define what an “observation” means in the context of your dataset. Discuss which, if any, of the rules of tidy data are violated in your dataset.
    • A timed comparison of different ways to create a new column in your dataset. This kind of comparison is often referred to as code “profiling”. You should use this script as a starting point for the code to insert into your writeup. The main idea is to compare the time it takes to create a new column using mutate(), using a vectorized calculation outside of mutate(), and using a for loop that creates each element separately. When you have your results, include them in a new version of your writeup and also enter them into the Google Doc spreadsheet that I created for tabulating our results.
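The three approaches being timed can be sketched as follows. This is not the starter script mentioned above, only an illustration under stated assumptions: the data frame dat, its column height_cm, and the unit conversion are all made up, and the mutate() step is skipped if dplyr is not installed.

```r
# Toy data frame (hypothetical); replace with your own dataset and calculation.
set.seed(1)
n <- 1e5
dat <- data.frame(height_cm = rnorm(n, mean = 170, sd = 10))

# 1. Vectorized calculation, without mutate()
t_vec <- system.time({
  dat_vec <- dat
  dat_vec$height_m <- dat_vec$height_cm / 100
})

# 2. for loop that fills in one element at a time
t_loop <- system.time({
  dat_loop <- dat
  height_m <- numeric(n)  # preallocate the result vector
  for (i in seq_len(n)) {
    height_m[i] <- dat_loop$height_cm[i] / 100
  }
  dat_loop$height_m <- height_m
})

# 3. dplyr::mutate(), run only if dplyr is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  t_mut <- system.time({
    dat_mut <- dplyr::mutate(dat, height_m = height_cm / 100)
  })
} else {
  t_mut <- NULL
}

# Elapsed seconds for each approach (NA if mutate() was skipped)
timings <- c(vectorized = t_vec[["elapsed"]],
             for_loop   = t_loop[["elapsed"]],
             mutate     = if (is.null(t_mut)) NA_real_ else t_mut[["elapsed"]])
timings
```

A single run of system.time() can be noisy, so it may be worth repeating each timing a few times before recording a number.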

Due week 8

  • Install and complete the “Getting and Cleaning Data” course (four modules) from swirl. To install the course, you can run the following commands in R:

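library(swirl)                                   # load the swirl package
install_from_swirl("Getting_and_Cleaning_Data")  # download and install the course
swirl()                                          # start swirl, then choose the course from the menu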

  • Use your dataset from the previous week and create an RMarkdown writeup about your dataset. Include the description that you wrote from the previous week’s assignment, and also include some descriptive statistics calculated in your document, as well as two figures that illustrate key features of your dataset.
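For the descriptive statistics and the two figures, a code chunk in the writeup might look something like the sketch below. The built-in mtcars data and the chosen variables are stand-ins, and base graphics are used only to keep the example free of extra packages; ggplot2 would work equally well.

```r
# mtcars stands in for your own dataset
dat <- mtcars

# Descriptive statistics for a few key variables
summary(dat[, c("mpg", "wt", "hp")])

# Group-wise summary: mean mpg for each number of cylinders
mpg_by_cyl <- tapply(dat$mpg, dat$cyl, mean)
round(mpg_by_cyl, 1)

# Figure 1: distribution of a single variable
hist(dat$mpg, main = "Fuel efficiency", xlab = "Miles per gallon")

# Figure 2: relationship between two variables
plot(dat$wt, dat$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel efficiency versus weight")
```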

Due week 7

Due week 6

  • Watch the five ggplot2 videos at this playlist
  • Read the Framingham Heart Study dataset documentation.
  • Download the FHS dataset (instructions will be sent via email) and make a data graphic of your choosing from this dataset. Post the graphic on Piazza by 5pm on Monday, Oct 6th.

Due week 4

Due week 3

4: Missing Values

5: Subsetting Vectors

6: Matrices and Data Frames

Due week 2

1: Basic Building Blocks

2: Sequences of Numbers

3: Vectors