Final Project

A data analyst is often assigned projects that are open-ended, that rely on a messy data set (or a collection of messy data sets), and that deal with content far outside the analyst’s comfort zone. The final project of the course is meant to simulate a real-life, on-the-job analysis project as closely as possible.

Project Description:

In 2021, a group of researchers published a report entitled “Maple Reproduction and Sap Flow at Harvard Forest since 2011.” Their project entailed collecting data from a group of maple trees in Harvard Forest in central Massachusetts over a several-year period. The data was meant to describe the trees’ seed production and other dynamics. Visit this web site for a full citation for their report, a summary of their project, a map of Harvard Forest, and (most importantly for you) their data.

The question you are to try to answer is the one posed in the research abstract: Does the non-masting red maple species exhibit muted dynamics compared to the masting sugar maple species?

Instructions:

Instead of a numbered list of objectives (like the ones given in previous projects), this project comes with a few suggestions to keep in mind as you work. This format gives you some flexibility in how you answer the question and a chance to fully display what you’ve learned in the course.

  • Spend some time learning about the context of the problem.
  • Familiarize yourself with each of the data sets, making sure to carefully read the “metadata,” which describes the content of the data sets.
  • Be ready to use the full range of R capabilities you’ve picked up during the course: visualization, transformation, cleaning, programming, and modeling. This is your chance to show what you can do with data. (But don’t try to use everything you’ve learned; use only what is relevant.)
  • Overall, approach this project as though you were a professional data analyst preparing a report for your manager.
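
As one possible starting point (not a required method), the sketch below shows how year-to-year variability might be compared between two species in base R. Everything in it is an assumption made for illustration: the toy values are invented, and the column names `species`, `year`, and `seed_count` are hypothetical, so check the metadata for the actual variable names. The coefficient of variation is only one of several reasonable ways to quantify “muted dynamics.”

```r
# Toy data only: the values and column names below are invented for
# illustration and are NOT taken from the Harvard Forest data sets.
toy <- data.frame(
  species    = rep(c("red", "sugar"), each = 4),
  year       = rep(2011:2014, times = 2),
  seed_count = c(10, 12, 11, 13, 2, 40, 5, 35)
)

# Coefficient of variation: standard deviation relative to the mean.
# A larger value means larger swings from year to year.
cv <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# One row per species, holding the variability of its yearly seed counts.
variability <- aggregate(seed_count ~ species, data = toy, FUN = cv)
names(variability)[2] <- "cv_seed_count"

print(variability)
```

With the real data, a summary like this would only be a first step. You would still need to decide which measurements best capture the trees’ dynamics, handle missing values deliberately, and support any comparison with suitable plots or models.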

Guidelines:

You are to prepare an R Markdown report that adheres to all of the guidelines listed in the projects from Section 1.10, Section 2.7, and Section 5.5. Also:

  • Per the request of the researchers, include the citation found on the web site in your introduction or as a footnote or end reference.

Grading Rubric:

  • Context: Does your report describe and explain the background context enough to make your analysis understandable? Is it clear that you know what your data is about? (10 points)
  • Narrative: Is it clear what you’re trying to accomplish with your project? Do you maintain a readable narrative throughout that tells the story of your analysis? (20 points)
  • R Knowledge: Do you display a strong command of R in your work? Do you use the right functions, techniques, and practices throughout? (30 points)
  • Learning Outcomes: Do you make a convincing case that you’ve achieved the learning outcomes for this course, as stated on the course syllabus? (30 points)
  • Professionalism: Does your report look nice? Do you provide insights based on your analysis? Is your code clear and readable? Do you adhere to the guidelines listed above? Would your manager be pleased with your work? (20 points)