1 Software installation
Welcome to the Data Challenge Lab. Our hope is that this will be a fine learning experience for you. Please complete the steps below before our first meeting. They will prepare you to use the latest versions of Git, R, and RStudio.
1.1 Mac computers
- Check that you have the program Applications > Utilities > XQuartz.
- If not, or if you have an older version, download it from the XQuartz download site and install it.
- Xcode command line tools
  - Run Applications > Utilities > Terminal.
  - At the prompt, enter the following command followed by a return:
    xcode-select --install
  - If the command line tools are not already installed, you will be prompted with a dialog box. Click on Install.
1.3 Git
Git is the software we will be using to access GitHub. Follow the instructions below to download and install the latest version. Note: to avoid problems, do not install any of the Git GUI clients.
- Mac
  - Download from the Git download site.
  - Click on the downloaded .dmg file.
  - Right-click on the .pkg file and select Open.
  - Follow the installation instructions.
- Windows
  - Download from the Git download site and install.
  - Installation should include the Git Bash emulator.
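Once R is installed (next section), one way to confirm that Git is reachable is to look for it from the R console. This is only a sketch: it assumes the Git executable is on your PATH, and the variable names are illustrative.

```r
# Look up the git executable on the PATH; Sys.which() returns "" if none is found.
git_path <- unname(Sys.which("git"))

if (nzchar(git_path)) {
  # Ask Git to report its version string.
  git_version <- system2(git_path, "--version", stdout = TRUE)
  message("Found Git at ", git_path, ": ", git_version)
} else {
  message("Git was not found on the PATH; revisit the installation steps above.")
}
```

If Git is reported as missing right after installation, restarting R or RStudio so that it picks up the updated PATH may help.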
1.4 R
We will be using some new R packages that require the latest version of R. Follow the instructions below to download and install this version.
- If you already have R installed:
  - Before installing a new version of R, save the names of your installed packages with the command
    saveRDS(rownames(installed.packages()), file = "~/packages.rds")
- Download the latest version of R from the R download site and install.
- If you already had R installed:
  - Restart R to launch the newly installed version.
  - Check to see if you now have the same packages with
    if (!require("dplyr")) install.packages("dplyr")
    dplyr::setequal(rownames(installed.packages()), readRDS("~/packages.rds"))
    - If TRUE, skip to the last step.
    - If FALSE, install the missing packages from CRAN with
      install.packages(dplyr::setdiff(readRDS("~/packages.rds"), rownames(installed.packages())))
  - Check again with
    dplyr::setequal(rownames(installed.packages()), readRDS("~/packages.rds"))
    - If TRUE, skip to the last step.
    - If FALSE, install the packages that did not come from CRAN. You can see the ones that are still missing with
      dplyr::setdiff(readRDS("~/packages.rds"), rownames(installed.packages()))
  - After you have installed all the packages you previously had, you can remove the file with the names of your installed packages with
    unlink("~/packages.rds")
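To make the comparison logic above concrete, here is a small sketch that uses base R's setequal() and setdiff() on two made-up vectors instead of your real package list. The package names, including "examplepkg", are hypothetical and stand in for the contents of ~/packages.rds and installed.packages().

```r
# Confirm which version of R is now running.
message("Running ", R.version.string)

# Hypothetical package names saved before the upgrade.
saved_names <- c("dplyr", "ggplot2", "examplepkg")

# Hypothetical package names present after the upgrade.
current_names <- c("dplyr", "ggplot2")

# FALSE here, because one saved package is not installed yet.
same_set <- setequal(saved_names, current_names)

# The names that were saved but are not currently installed: "examplepkg".
to_install <- setdiff(saved_names, current_names)
```

The order of the arguments to setdiff() matters: the saved names come first, so the result lists what you had before and still lack.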
1.5 RStudio
If you don’t have RStudio, or don’t have the latest version:
- Download and install the latest version of RStudio Desktop from its download site.
RStudio is constantly being updated with new features. It’s a good idea to update when RStudio indicates that a new version is available.
1.6 R packages
The power of R comes from its thousands of packages, such as the tidyverse. Follow the instructions below to install some of the packages we will be using.
- Open up RStudio.
- At the R prompt in the console in the lower-left corner, install the following packages with the command
install.packages(c("tidyverse", "assertthat", "babynames", "broom", "compare", "DBI", "dbplyr", "devtools", "fivethirtyeight", "flexdashboard", "fs", "ggrepel", "googlesheets4", "hexbin", "httr", "janitor", "jsonlite", "knitr", "leaflet", "nycflights13", "RPostgreSQL", "rvest", "scales", "tidycensus", "usethis", "vroom", "yaml"))
- Here’s a description of the packages installed:
  - tidyverse is a collection of packages, some of which we will study in depth.
  - assertthat is for making assertions within code.
  - babynames is a dataset of baby names.
  - broom is for working with models.
  - compare will be used in exercises and challenges to compare solutions.
  - DBI is for working with databases.
  - dbplyr is for working with databases.
  - devtools is a set of useful R development functions.
  - fivethirtyeight is a dataset from the 538 blog.
  - flexdashboard is for creating dashboards.
  - fs is for file system operations.
  - ggrepel is for automatically positioning labels in plots.
  - googlesheets4 is for Google Sheets data.
  - hexbin is for plots with hexagonal bins.
  - httr is for web APIs.
  - janitor is for examining and manipulating data.
  - jsonlite is for JSON data.
  - knitr is for document generation.
  - leaflet is for interactive maps.
  - nycflights13 is a dataset we will be using for the early exercises.
  - RPostgreSQL is for working with the PostgreSQL database.
  - rvest is for web scraping.
  - scales is for scaling functions for visualization.
  - tidycensus is for downloading U.S. Census data.
  - usethis is for workflow tasks such as configuring your computer.
  - vroom is for quickly reading and writing files.
  - yaml is for YAML data.
- Install the Data Challenge Lab (DCL) package.
remotes::install_github("stanford-datalab/dcl")
- Install a package with U.S. boundaries for maps.
remotes::install_github("dcl-docs/ussf")
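After the installation finishes, a quick check like the following can show whether any of the CRAN packages listed above failed to install. It is a sketch rather than part of the required steps, and the variable names are illustrative.

```r
# The CRAN packages requested in the install.packages() call above.
wanted <- c(
  "tidyverse", "assertthat", "babynames", "broom", "compare", "DBI",
  "dbplyr", "devtools", "fivethirtyeight", "flexdashboard", "fs", "ggrepel",
  "googlesheets4", "hexbin", "httr", "janitor", "jsonlite", "knitr",
  "leaflet", "nycflights13", "RPostgreSQL", "rvest", "scales", "tidycensus",
  "usethis", "vroom", "yaml"
)

# Names of all packages currently installed in your R libraries.
installed <- rownames(installed.packages())

# Requested packages that are not installed.
not_installed <- setdiff(wanted, installed)

if (length(not_installed) == 0) {
  message("All requested packages are installed.")
} else {
  message("Still missing: ", paste(not_installed, collapse = ", "))
}
```

Any names reported as missing can be passed to install.packages() again; the error messages from that second attempt usually point to the cause.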
1.7 Update R packages
- Open RStudio.
- Hit the Packages tab in the pane in the lower-right corner, and then the Update button.
- Hit the Select All button, and then the Install Updates button.
- If you are prompted to restart your R session, select Yes.
- After the installations complete, your R packages will be up to date.
- It’s a good idea to periodically repeat this procedure to update your packages.
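If you prefer the console to the Packages tab, base R's old.packages() and update.packages() can do roughly the same job. The sketch below wraps them in two helper functions; the helper names and the choice of the cloud.r-project.org CRAN mirror are assumptions made for illustration, and both helpers need an internet connection when called.

```r
# CRAN mirror to query (an assumption; any CRAN mirror should work).
cran <- "https://cloud.r-project.org"

# Return the names of installed packages that have a newer version available.
list_outdated <- function(repos = cran) {
  outdated <- old.packages(repos = repos)
  if (is.null(outdated)) character(0) else rownames(outdated)
}

# Update every outdated package without prompting for each one.
update_all <- function(repos = cran) {
  update.packages(ask = FALSE, repos = repos)
}

# Uncomment to run:
# list_outdated()
# update_all()
```

Calling list_outdated() first lets you see what would change before update_all() installs anything.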
1.8 Configure RStudio
- Carefully perform the following configuration steps:
  - Tools (on RStudio menu bar) > Global Options…
    - General > Basic
      - Workspace
        - Restore .RData into workspace at startup: unchecked
        - Save workspace to .RData on exit: Never
      - History
        - Always save history (even when not saving .RData): unchecked
      - Other
        - Automatically notify me of updates to RStudio: checked
    - General > Advanced
      - OS Integration
        - Rendering engine:
          - Mac: Auto-detect (recommended)
          - Windows: Desktop OpenGL
    - Code
      - Editing
        - Insert spaces for tab: checked
        - Tab width: 2
        - Use native pipe operator, |> (requires R 4.1+): checked
      - Display
        - General
          - Show margin: checked
          - Margin column: 80
      - Saving
        - General
          - Ensure that source files end with newline: checked
      - Diagnostics
        - R Diagnostics
          - Show diagnostics for R: checked
          - Enable diagnostics with R function calls: checked
          - Provide R style diagnostics (e.g. whitespace): checked
    - R Markdown
      - Basic
        - R Markdown
          - Show output inline for all R Markdown documents: checked
      - Advanced
        - Display
          - Show inline toolbar for R code chunks: checked
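The native pipe setting above controls which pipe operator RStudio inserts in the editor. As a brief illustration, assuming R 4.1 or later, the sketch below shows that a piped expression and the equivalent nested call give the same result. The variable names are illustrative.

```r
# Three perfect squares to work with.
squares <- c(1, 4, 9)

# With the native pipe, each step reads left to right.
total_piped <- squares |> sqrt() |> sum()

# The same computation written as nested function calls.
total_nested <- sum(sqrt(squares))

# Both forms compute 1 + 2 + 3.
identical(total_piped, total_nested)
```

On versions of R earlier than 4.1, the |> operator is a syntax error, which is one reason the steps above install the latest R.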