| ResponseId | Gender | Age | Q1_Anxiety | Q2_Calm | Q3_Worry |
|---|---|---|---|---|---|
| R_123 | Male | 19 | Strongly Agree | Strongly Disagree | 5 |
| R_124 | Female | 21 | Agree | Disagree | 4 |
| R_125 | Non-binary | twenty | Neutral | Neutral | 3 |
| R_126 | Male | 22 | Disagree | Agree | 2 |
| R_127 | Female | 19 | Strongly Agree | Strongly Disagree | 5 |
Module 1: Data Wrangling
You Can’t Publish What You Can’t Clean
The “Raw Data” Reality
Much survey data in psychology is exported from platforms such as Qualtrics or SurveyMonkey. The export is usually “wide” (one row per respondent, one column per item), messy, and full of text labels where you want numbers.
Why is this hard to analyze?
- Metadata rows: Qualtrics exports include 2-3 header rows (not shown in the table above).
- String Likert scales: “Strongly Agree” instead of 5.
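- Reverse-scored items: On an anxiety scale, an item such as “I feel calm” (Q2_Calm) is worded in the opposite direction and must be flipped before scoring.
- Typos and free-text entries: Notice the “twenty” in the Age column, which forces R to read the whole column as text.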
1. Simulating the Data in R
To practice cleaning, let’s create this exact dataset in R:
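A minimal sketch of that simulation, using a base R data frame. The object name `raw_data` is an assumption made for illustration, and the course may build a tibble instead.

```r
# Recreate the five example responses as a plain data frame.
# Every column except Q3_Worry is stored as text, mirroring a raw survey export.
raw_data <- data.frame(
  ResponseId = c("R_123", "R_124", "R_125", "R_126", "R_127"),
  Gender     = c("Male", "Female", "Non-binary", "Male", "Female"),
  Age        = c("19", "21", "twenty", "22", "19"),  # "twenty" keeps this column as text
  Q1_Anxiety = c("Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Agree"),
  Q2_Calm    = c("Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Disagree"),
  Q3_Worry   = c(5, 4, 3, 2, 5),                     # already numeric in the export
  stringsAsFactors = FALSE
)

# Inspect the column types before cleaning.
str(raw_data)
```

Running `str()` first makes the problem visible: Age, Q1_Anxiety, and Q2_Calm are all character columns.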
2. Cleaning with Tidyverse
We use the tidyverse family of packages. Its pipe-based code reads step by step, like a sentence.
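As one possible illustration, the pipeline below repairs the Age column with dplyr. It is a sketch, not the course's own code: the object names `raw_ages` and `clean_ages` are hypothetical, and hard-coding the fix for “twenty” is an assumption that only suits a small, known dataset.

```r
library(dplyr)

# A cut-down version of the raw data: only the columns needed for this step.
raw_ages <- data.frame(
  ResponseId = c("R_123", "R_124", "R_125", "R_126", "R_127"),
  Age        = c("19", "21", "twenty", "22", "19"),
  stringsAsFactors = FALSE
)

# Read as a sentence: take the raw data, then replace the text entry,
# then convert the result to a number.
clean_ages <- raw_ages %>%
  mutate(
    Age_Num = as.numeric(if_else(Age == "twenty", "20", Age))
  )

clean_ages$Age_Num
```

Keeping the original Age column next to Age_Num makes it easy to audit which values were changed.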
3. The Reverse Conversion
Psychometric scoring requires converting text responses (“Strongly Agree”) to numbers (5) and, for reverse-worded items, flipping them (1 becomes 5).
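The text-to-number step can be sketched with a named lookup vector. The names `likert_map`, `survey`, and `survey_num` are hypothetical, and the 1 to 5 coding is assumed from the “Strongly Agree” = 5 example.

```r
library(dplyr)

# Lookup table: response label -> numeric score (assumed 1-5 coding).
likert_map <- c(
  "Strongly Disagree" = 1,
  "Disagree"          = 2,
  "Neutral"           = 3,
  "Agree"             = 4,
  "Strongly Agree"    = 5
)

survey <- data.frame(
  Q1_Anxiety = c("Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Agree"),
  Q2_Calm    = c("Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Disagree"),
  stringsAsFactors = FALSE
)

# Index the lookup vector by each response label; unname() drops the labels
# so the columns hold plain numbers.
survey_num <- survey %>%
  mutate(across(c(Q1_Anxiety, Q2_Calm), ~ unname(likert_map[.x])))

survey_num
```

Any label missing from the lookup (for example a misspelled response) becomes `NA`, which is a useful signal that the raw data needs another look.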
The Rule of 6
To reverse-code a 5-point scale scored 1 to 5: New_Score = 6 - Old_Score (1 becomes 5, 2 becomes 4, … 5 becomes 1). In general, for a scale scored 1 to k, New_Score = (k + 1) - Old_Score.
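A small sketch of the rule applied to the calm item. The helper `reverse_score()` and its `n_points` argument are hypothetical names, and the input is assumed to be already numeric with scores running from 1 to `n_points`.

```r
# Reverse-code a Likert item scored 1..n_points.
# For a 5-point scale this is the Rule of 6: 6 - old score.
reverse_score <- function(x, n_points = 5) {
  (n_points + 1) - x
}

# Q2_Calm after text-to-number conversion (1 = Strongly Disagree, 5 = Strongly Agree).
q2_calm <- c(1, 2, 3, 4, 1)

q2_calm_rev <- reverse_score(q2_calm)
q2_calm_rev
```

After reversal the calm item points in the same direction as the anxiety items, so a high score always means more anxiety.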
4. Labeling
For the flextable output in Module 2 to look good, we attach labels to the variables so that “Age_Num” prints as “Participant Age”.
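One way to attach such a label, sketched in base R. The `"label"` attribute is the convention used by packages such as haven and labelled. How Module 2 picks the label up is not shown in this module, so treat that link as an assumption. The object name `survey_labelled` is hypothetical.

```r
# Cleaned ages from the example dataset.
survey_labelled <- data.frame(Age_Num = c(19, 21, 20, 22, 19))

# Store a human-readable label on the column as a "label" attribute.
attr(survey_labelled$Age_Num, "label") <- "Participant Age"

# Read the label back to confirm it was attached.
attr(survey_labelled$Age_Num, "label")
```

The label is metadata only: the values in Age_Num stay numeric and can still be summarised as usual.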