Intermediate R: Cleaning and Reshaping Data (2 Parts)

In-Person / Online

This 2-part workshop series for intermediate R programmers focuses on how to load and prepare data for analysis. We'll explore how to screen a data set for potential problems with its structure and data types, as well as how to correct these issues. For example, it is increasingly popular to use data sets loaded from text files or scraped from the web, but these data often have formatting quirks that require additional processing before they can be used. To handle them, we'll take a deep dive into R’s “stringr” package for text processing. Dates and times are another kind of data that can be difficult to work with, so we'll cover the basics of using the “lubridate” package for processing temporal data. You’ll also learn how to reshape data with structural problems and how to combine linked data sets.

This workshop is NOT an introduction to R and is intended for motivated intermediate to advanced learners from all domains at UC Davis who want to hone their R skills. Please make sure you meet the prerequisites before registering as we will be unable to answer introductory R questions during this session. (Want to brush up on R? Check out our R Basics 4-part introductory series in DataLab's workshop archive.)

The 2024 workshop dates are Tuesday, March 5 (Part 1) and Thursday, March 7 (Part 2). When you register, please select both dates.

After completing this workshop series, learners should be able to:

  • Inspect data files to determine how best to load them into R;  
  • Identify and convert features to appropriate data types;
  • Use the “stringr” package to clean and extract data from text;
  • Use regular expressions to describe patterns in text;
  • Use the “lubridate” package to parse dates and times;
  • Identify features of a "tidy" dataset;
  • Describe the advantages and disadvantages of tidy data;
  • Use the "tidyr" package to reshape data;
  • Describe what a join is;
  • Compare the differences between an inner join and a left join;
  • Use the "dplyr" package to join two data sets on a common column.

Prerequisites

Participants must have taken DataLab’s “R Basics” workshop series and/or have prior experience using R, be comfortable with basic R syntax, and have R installed and running on their laptops. This workshop involves live coding and requires active participation.

Can't make it to this training? Check out our upcoming workshop schedule. Recordings of similar past workshops are also available in DataLab's training archive.

Date:
Tuesday, March 5, 2024
Time:
10:00am - 12:00pm
Time Zone:
Pacific Time - US & Canada
Location:
DataLab Classroom (Shields Library room 360)
Campus:
Davis Campus
Categories:
  DataLab Workshop  
Registration has closed.

Event Organizer

UC Davis DataLab