Lab #02: Data Visualization

Due Sun, Jan 31, 11:59 PM

Goals

Getting started

Clone the repo & start a new RStudio project

Update the YAML header with your name and today’s date. Then, knit the document and make sure the resulting PDF file has the correct date. Stage, commit, and push your changes.

Car Talk

The dataset we will examine is called mpg. It ships with the ggplot2 package, so it is available as soon as you load the tidyverse. It contains fuel economy data and other characteristics of cars, collected by the Environmental Protection Agency (EPA) and published at http://fueleconomy.gov.

To begin, familiarize yourself with the dataset by reading the documentation. Remember, you can pull up the documentation by running ?mpg in the console.
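Alongside the documentation, a quick look at the data itself can help. The following is a minimal sketch, assuming ggplot2 is installed; the particular summaries shown are only suggestions, not a required part of the lab.

```r
library(ggplot2)  # mpg ships with ggplot2, which the tidyverse also loads

# Number of rows (cars) and columns (variables)
dim(mpg)

# Column names and types, with the first few values of each
str(mpg)

# How many cars fall into each class and drive train combination
table(mpg$class, mpg$drv)
```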

All plots should follow the best visualization practices we have discussed in lecture. Plots should include an informative title, axes should be labeled, and careful consideration should be given to aesthetic choices.

In addition, code and narrative should not exceed the 80-character limit. To help police this, add a vertical line at 80 characters by clicking “Tools” → “Global Options” → “Code” → “Display”, then set “Margin Column” to 80 and click “Apply”.
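If you would like to double-check line widths outside the editor, something like the sketch below may help. Note that find_long_lines is a hypothetical helper written for this illustration (it is not part of any package), and the file name in the usage comment is only a placeholder for your own document.

```r
# Return the line numbers in a file that are wider than `limit` characters.
# `find_long_lines` is a hypothetical helper, not part of any package.
find_long_lines <- function(path, limit = 80) {
  lines <- readLines(path, warn = FALSE)
  which(nchar(lines, type = "chars") > limit)
}

# Example usage (replace the placeholder with your own file name):
# find_long_lines("your-lab-file.Rmd")
```

An empty result means no line goes past the limit.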

Your assignment should have at least three meaningful commits and all code chunks should have meaningful names.

  1. Generate a scatterplot of city miles per gallon (cty) versus highway miles per gallon (hwy) with points colored by class.

  2. Note that highway and city miles per gallon take only a limited number of distinct values, so some of the points are on top of each other. Using geom_jitter() or a position = argument in geom_point(), add a small amount of random variation to each point. Briefly comment on the differences between the plots you constructed in Exercises 1 and 2. What are the advantages and disadvantages of each?

  3. Examine the relationship between city and highway miles per gallon, with a separate plot for each type of drive train (drv).

  4. Create side-by-side boxplots of city miles per gallon for each class. Briefly comment on what you notice.

  5. Create a segmented bar chart with one bar per class, each bar going from 0 to 1, with the fill determined by the type of drive train (drv). What do you notice?

  6. Recreate the plot below. The functions theme_bw() and labs() will be helpful. The size of the points is 0.50.
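The exercises above rely on a handful of ggplot2 techniques: jittering, faceting, proportional bars, and themes and labels. The sketch below shows each one on different variables (displ and cyl) than the exercises ask for, so treat it as a pattern to adapt rather than a solution. The jitter amounts and the titles are arbitrary illustrative choices.

```r
library(ggplot2)

# Jitter: add a little random noise so overlapping points become visible.
# width and height are in data units; 0.3 is an arbitrary choice.
p_jitter <- ggplot(mpg, aes(x = cyl, y = hwy)) +
  geom_jitter(width = 0.3, height = 0.3, size = 0.5) +
  labs(
    title = "Highway mileage by number of cylinders",
    x = "Cylinders",
    y = "Highway miles per gallon"
  ) +
  theme_bw()

# Facets: draw one panel per level of a categorical variable.
p_facet <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ drv) +
  labs(
    title = "Highway mileage versus engine displacement, by drive train",
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon"
  )

# Proportional bars: position = "fill" rescales every bar to span 0 to 1.
p_fill <- ggplot(mpg, aes(x = factor(cyl), fill = drv)) +
  geom_bar(position = "fill") +
  labs(
    title = "Drive train mix by number of cylinders",
    x = "Cylinders",
    y = "Proportion",
    fill = "Drive train"
  )
```

Printing any of these objects (for example, p_jitter) renders the plot. Because jittering is random, the points will land in slightly different places each time the chunk is run unless you set a seed.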

Submission

Once you are fully satisfied with your lab, Knit to PDF to create a PDF document.

Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. We will be checking these to make sure you have been practicing how to commit and push changes.

Remember – you must turn in a PDF file to the Gradescope page before the submission deadline for full credit.

Once your work is finalized in your GitHub repo, you will submit it to Gradescope. Your assignment must be submitted on Gradescope by the deadline to be considered “on time”.

When you submit, be sure to use Gradescope to mark which pages contain each problem.