Final Project

A data analyst is often assigned projects that are open-ended, rely on a messy data set (or a collection of messy data sets), and deal with content far outside the analyst’s comfort zone. The final project of the course is meant to simulate a real-life, on-the-job analysis project as closely as possible.

Project Description:

In 2021, a group of researchers published a report entitled “Maple Reproduction and Sap Flow at Harvard Forest since 2011.” Their project entailed collecting data from a group of maple trees in Harvard Forest in central Massachusetts over a several-year period. The data was meant to describe the trees’ seed production and other dynamics. Visit this web site for a full citation for their report, a summary of their project, a map of Harvard Forest, and (most importantly for you) their data.

The question you should try to answer is the one posed in the research abstract: Does the non-masting red maple species exhibit muted dynamics compared to the masting sugar maple species?


Instead of a numbered list of objectives like the ones given in previous projects, you are given a few suggestions below to keep in mind as you work. This gives you some flexibility in how you answer the question and a chance to fully display what you’ve learned in the course.

  • Spend some time learning about the context of the problem.
  • Familiarize yourself with each of the data sets, making sure to carefully read the “metadata,” which describes the content of the data sets.
  • Be ready to use the full range of R capabilities you’ve picked up during the course: visualization, transformation, cleaning, programming, and modeling. This is your chance to show what you can do with data. (But don’t try to use everything you’ve learned, only what is relevant.)
  • Overall, approach this project as though you were a professional data analyst preparing a report for your manager.


Report Requirements:

  • Include a title, your name, and the date in the heading of your R Markdown file.
  • Also include a preliminary code chunk in which you load the libraries you’ll need, and briefly say what each one is for.
  • Begin with an introduction containing, at a minimum, a description of the data sets (including what they contain, their size, and their source) and nicely displayed data tables produced with the datatable function. Also include the citation found on the web site as a footnote or end reference. (If a data set is too big to display when you knit your .Rmd file, you don’t have to display it.)
  • Clearly describe what you’ll be doing with the data, and include any questions you’ll be trying to answer.
  • Follow the introduction and problem statement with the body of your report which will contain your work, broken into relevant section headings.
  • Carefully describe any preliminary cleaning you do to prepare the data for your analysis.
  • The compiled report should show all of your R code and its output, but it should not show any warning or error messages.
  • The body should also include text that provides a running narrative that guides the reader through your work. This should be thorough and complete, but it should avoid large blocks of text.
  • All graphics should look nice, be easy to read, and be fully labeled.
  • You should include insightful statements about the meaning you’re extracting from your transformations, models, graphics, and statistical calculations.
  • End with an overall concluding statement which summarizes your findings. Be sure to disclose any shortcomings of your analysis.
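
As a rough illustration of the preliminary code chunk and the table display described above, here is a minimal sketch of what such a chunk might contain. The package choices (tidyverse and DT) and the small maple_example table are assumptions made only for illustration; the rows are made up and are not taken from the Harvard Forest data. The title, your name, and the date would go in the title, author, and date fields of the YAML header at the top of the .Rmd file, which is not shown here.

```r
# Sketch of a preliminary (setup) code chunk for the report.
# Package choices are assumptions; load only what your analysis uses.

# Keep code and output visible in the knitted report, but hide
# warnings and package start-up messages.
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

library(tidyverse)  # reading, cleaning, transforming data; ggplot2 graphics
library(DT)         # interactive, paged tables via datatable()

# Hypothetical stand-in data with made-up rows.
# Replace this with the data files you download from the project web site.
maple_example <- tibble(
  tree    = c("T1", "T2", "T3"),
  species = c("red maple", "sugar maple", "sugar maple"),
  year    = c(2012, 2012, 2013)
)

# Display the table in the knitted HTML report.
datatable(maple_example, caption = "Illustrative table (made-up rows)")
```

Setting warning = FALSE and message = FALSE only hides warnings and messages. By default, knitting stops when a chunk throws an error, so errors have to be fixed in the code rather than hidden.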

Grading Rubric:

  • Context: Does your report describe and explain the background context enough to make your analysis understandable? Is it clear that you know what your data is about? (10 points)
  • Narrative: Is it clear what you’re trying to accomplish with your model? Do you maintain a readable narrative throughout that tells the story of your analysis? (20 points)
  • R Knowledge: Do you display a strong command of R in your work? Do you use the right functions, techniques, and practices throughout? (30 points)
  • Learning Outcomes: Do you make a convincing case that you’ve achieved the learning outcomes for this course, as stated on the course syllabus? (30 points)
  • Professionalism: Does your report look nice? Do you provide insights based on your analysis? Is your code clear and readable? Are all graphics fully and correctly labeled? Would your manager be pleased with your work? (20 points)