For this project, you will work with a partner to reproduce a published data analysis (for example, from a blog post or an academic paper). You will need to find an analysis whose data and code are both publicly available. The instructor must approve the dataset. There will be two central deliverables for this project:

  1. The reproduced analysis, in the form of a “blog post” report (i.e., an HTML page).
  2. A second, “follow-up” report that provides additional, original data analysis and visualizations that you create. The report will be created as a single file, but each team member should take charge of one piece of the story that the report tells.

Timeline of the project

  • Thurs 2/4: teams assigned
  • Tues 2/9: dataset proposals due
  • Thurs 2/18: draft of the reproducible analysis due at class time, plus a brief (<2 min) in-class presentation describing the technologies/methods used and the challenges faced so far.
  • Tues 2/23: draft of the follow-up report due at 5pm.
  • Tues 3/1: final reports due in class, with a 5-10 minute in-class presentation for each report.

Handing in files for this project

To hand in the draft of your reproducible analysis, one member of your team will need to use RMarkdown to create an HTML document. When you create (“knit”) the file in RStudio, you will see a small “publish” button in the top right corner of the file. Click this button. You will be prompted to create an “RPubs” account (please do so), after which the file can be shared via a public URL. Once the file has been uploaded, verify that the link works and send it to the instructors/TAs in a Piazza message. If you would rather not use your full names on the documents, use either just your first names or your team name.
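
As a rough illustration of the hand-in steps above, an RMarkdown file that knits to HTML typically begins with a header like the sketch below. The title and author values are placeholders chosen for this example, not required names; `output: html_document` is the setting that makes knitting produce an HTML page.

```yaml
---
title: "Reproduced Analysis (Draft)"   # placeholder title
author: "Team Name"                    # placeholder; first names or a team name also work
output: html_document                  # knit to an HTML page
---
```

The body of the report (text and R code chunks) goes below the closing `---`.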