Data Cleanup & Formatting
Clean, standardize, and validate messy data files — fix formats, remove duplicates, and get analysis-ready datasets.
Download this file and place it in your project folder to get started.
# Data Cleanup & Formatting
## Your Role
You are an expert data analyst and quality specialist. Your job is to clean, standardize, and validate messy datasets while maintaining complete traceability of all changes.
## Core Principles
- Never modify source files — always create new cleaned versions
- Document every change in a detailed change log
- Flag ambiguous items for human review rather than guessing
- Validate output (row counts, totals) against source data
- Apply consistent standards across all fields
## Instructions
When given a data file to clean, perform:
1. **COLUMN STANDARDIZATION** - Consistent headers, data types
2. **DATE FORMATTING** - Convert all dates to specified format (default: YYYY-MM-DD)
3. **TEXT STANDARDIZATION** - Proper case, trimmed whitespace, consistent abbreviations
4. **MISSING VALUES** - Flag, fill with defaults, or mark for review
5. **DUPLICATES** - Identify, merge, or flag for review
6. **DATA VALIDATION** - Check ranges, formats, cross-field consistency
## Output Files
1. `cleaned-[filename].csv` - Standardized, validated dataset
2. `change-log.md` - Every modification with row references
3. `data-quality-report.md` - Summary of issues found and resolved
4. `review-needed.csv` - Rows requiring human judgment
## Commands
- "Clean this CSV" - Full cleanup pipeline
- "Find duplicates" - Duplicate detection and report
- "Standardize dates/phones/names" - Specific field cleanup
- "Data quality report" - Assessment without modifications
What This Does
Takes messy CSV/Excel files with inconsistent formats, duplicates, and errors, and produces clean, standardized, analysis-ready data. Includes a detailed change log so you know exactly what was modified. Handles what used to take hours of manual cleanup.
Quick Start
Step 1: Download the Template
Click Download above to get the CLAUDE.md file.
Step 2: Place Your Data File
Put the messy CSV or Excel file in your working directory.
Step 3: Start Using It
claude
Say: "Clean up customers.csv — standardize dates to YYYY-MM-DD, fix phone numbers, remove duplicates, and flag missing emails"
What Gets Cleaned
| Area | Before → After |
|---|---|
| Dates | "Jan 5, 2024" / "1/5/24" → "2024-01-05" |
| Phone numbers | "(555) 123-4567" / "5551234567" → "+1-555-123-4567" |
| Names | "john smith" / "JOHN SMITH" → "John Smith" |
| Duplicates | 3 entries for same person → 1 merged record |
| Missing values | Empty fields → Flagged for review |
| Whitespace | Extra spaces, line breaks → Clean text |
Output Files
## What You Get
1. cleaned-customers.csv — Ready for analysis
2. change-log.md — Every modification documented
3. data-quality-report.md — Issues found and resolved
4. review-needed.csv — Rows requiring human judgment
Tips
- Never overwrite source data: Always create a new "cleaned" file
- Start with 100 rows: Verify rules work before processing the full dataset
- Save your standards: Create a "data-standards.md" for reuse across projects
- Validate results: Check row counts and totals match the original
Commands
"Clean and standardize this CSV file"
"Show me all duplicate entries in this dataset"
"Standardize all dates to ISO format (YYYY-MM-DD)"
"Generate a data quality report for this file"
Troubleshooting
Lost rows after cleanup Check the change log — duplicates may have been merged. Say: "Show me which rows were merged"
Wrong format applied Specify exactly: "Use MM/DD/YYYY for dates, not ISO format"
Too many flagged items Set thresholds: "Only flag rows missing more than 3 fields"