What It Does
This automated “Governance Engine” acts as a firewall for HR data: every record is validated before it enters the analytics warehouse, and records that fail are blocked. It takes a “Chaos Engineering” approach, intentionally injecting errors into the data to prove that the validation layer catches them.
The Problem It Solves
HRIS migrations and analytics projects often fail because of “dirty data”: human errors such as typos (e.g., negative ages), impossible tenure dates (e.g., an end date before the start date), or orphaned records. Manually auditing thousands of rows is slow, expensive, and itself error-prone.
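The three error classes above translate naturally into vectorized Pandas checks. This is a minimal sketch, not the project's code. It assumes a hypothetical extract with `employee_id`, `age`, `start_date`, `end_date`, and `manager_id` columns, and it interprets an “orphaned record” as a `manager_id` that matches no existing employee, which is only one possible reading:

```python
import pandas as pd


def find_dirty_records(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (employee_id, issue) found in an HR extract."""
    start = pd.to_datetime(df["start_date"])
    end = pd.to_datetime(df["end_date"])  # NaT for employees who are still active
    # Each check is a boolean mask over the rows; the column names are hypothetical.
    checks = {
        "negative_age": df["age"] < 0,
        "end_before_start": end.notna() & (end < start),
        # Assumed meaning of "orphaned": manager_id points at no existing employee.
        "orphaned_manager": df["manager_id"].notna()
        & ~df["manager_id"].isin(df["employee_id"]),
    }
    issues = [
        {"employee_id": emp_id, "issue": name}
        for name, mask in checks.items()
        for emp_id in df.loc[mask, "employee_id"]
    ]
    return pd.DataFrame(issues, columns=["employee_id", "issue"])


sample = pd.DataFrame({
    "employee_id": ["E1", "E2", "E3", "E4"],
    "age": [34, -29, 41, 25],
    "start_date": ["2019-03-01", "2020-06-15", "2015-01-10", "2022-09-01"],
    "end_date": [None, None, "2014-12-31", None],
    "manager_id": [None, "E1", "E1", "E9"],
})
print(find_dirty_records(sample))
```

On this sample the function flags E2 (negative age), E3 (end date before start date), and E4 (manager `E9` does not exist). Row-wise loops are avoided so the same checks scale to the “thousands of rows” that make manual review impractical.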
How It Works
The pipeline is built in Python using Pandas. It implements a two-stage process:
- Chaos Monkey: Intentionally corrupts a sample of clean data with common HR errors (typos, logic conflicts).
- Audit Engine: A strict validation layer that must catch 100% of the injected errors to pass the build.
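The two stages can be sketched as a single pytest-style test that fails the build unless the audit catches every injected error. This is an illustration under stated assumptions: `make_clean_data`, `chaos_monkey`, and `audit` are hypothetical names, the clean data is synthetic, and only two error types (sign-flipped ages and swapped start/end dates) are injected.

```python
import numpy as np
import pandas as pd


def make_clean_data(n: int = 200, seed: int = 0) -> pd.DataFrame:
    """Hypothetical clean HR extract: plausible ages, end date always after start."""
    rng = np.random.default_rng(seed)
    start = pd.Timestamp("2015-01-01") + pd.to_timedelta(rng.integers(0, 2000, n), unit="D")
    return pd.DataFrame({
        "employee_id": [f"E{i}" for i in range(n)],
        "age": rng.integers(22, 65, n),
        "start_date": start,
        "end_date": start + pd.to_timedelta(rng.integers(30, 1500, n), unit="D"),
    })


def chaos_monkey(df: pd.DataFrame, frac: float = 0.1, seed: int = 42):
    """Corrupt a random sample of rows; return the dirty copy and the corrupted row labels."""
    rng = np.random.default_rng(seed)
    dirty = df.copy()
    victims = rng.choice(dirty.index.to_numpy(), size=int(len(dirty) * frac), replace=False)
    for i, idx in enumerate(victims):
        if i % 2 == 0:
            # Typo-style error: the age loses its sign.
            dirty.at[idx, "age"] = -dirty.at[idx, "age"]
        else:
            # Logic conflict: swap the dates so the end precedes the start.
            start, end = dirty.at[idx, "start_date"], dirty.at[idx, "end_date"]
            dirty.at[idx, "start_date"], dirty.at[idx, "end_date"] = end, start
    return dirty, set(victims.tolist())


def audit(df: pd.DataFrame) -> set:
    """Return the labels of rows that break any validation rule."""
    bad = (df["age"] < 0) | (df["end_date"] < df["start_date"])
    return set(df.index[bad.to_numpy()].tolist())


def test_audit_catches_every_injected_error():
    dirty, injected = chaos_monkey(make_clean_data())
    flagged = audit(dirty)
    missed = injected - flagged
    assert not missed, f"audit missed {len(missed)} injected errors: {sorted(missed)}"
    # Untouched rows must not be flagged either (no false positives).
    assert flagged == injected


if __name__ == "__main__":
    test_audit_catches_every_injected_error()
```

Comparing the flagged set to the injected set in both directions is a deliberate choice: requiring only that nothing is missed would let an over-eager audit that flags every row pass the build.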
Key Features
- Chaos Monkey Simulation: Stresses the system by injecting random logic errors (e.g., “Start Date > End Date”).
- Strict Schema Validation: Enforces business logic rules (e.g., “Director level must have >5 years tenure”).
- Executive Health Scorecard: Automatically generates a “Data Health” report with actionable cleanup tasks.
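One way to express business rules such as the tenure requirement, and to roll their results into a scorecard, is a registry of named boolean checks. This is a minimal sketch: the `RULES` registry, the `level` and `tenure_years` columns, the rule names, and the Markdown layout are all assumptions, not the project's actual report format.

```python
import pandas as pd

# Hypothetical rule registry: rule name -> function returning a mask of violating rows.
RULES = {
    "Start date after end date": lambda d: d["end_date"].notna() & (d["start_date"] > d["end_date"]),
    # "Director level must have >5 years tenure", so 5 or fewer years is a violation.
    "Director with 5 or fewer years tenure": lambda d: (d["level"] == "Director") & (d["tenure_years"] <= 5),
}


def health_scorecard(df: pd.DataFrame) -> str:
    """Build a Markdown report: overall health score plus one cleanup task per failing rule."""
    violations = {name: rule(df) for name, rule in RULES.items()}
    any_bad = pd.DataFrame(violations).any(axis=1)
    score = 100 * (1 - any_bad.mean())  # share of records passing every rule
    lines = [
        "# Data Health Scorecard",
        "",
        f"Health score: {score:.1f}% of {len(df)} records pass all rules",
        "",
        "## Cleanup tasks",
    ]
    for name, mask in violations.items():
        if mask.any():
            ids = ", ".join(df.loc[mask, "employee_id"])
            lines.append(f"- [ ] {name}: fix {int(mask.sum())} record(s) ({ids})")
    if not any_bad.any():
        lines.append("- None")
    return "\n".join(lines)


staff = pd.DataFrame({
    "employee_id": ["E1", "E2", "E3", "E4"],
    "level": ["Director", "Director", "Analyst", "Manager"],
    "tenure_years": [12.0, 3.5, 1.0, 6.0],
    "start_date": pd.to_datetime(["2012-04-01", "2021-01-04", "2023-02-01", "2018-07-01"]),
    "end_date": pd.to_datetime([None, None, "2022-12-31", None]),
})
print(health_scorecard(staff))
```

Keeping each rule as a named entry means the same name appears in the report's cleanup task, so a failing check maps directly to an actionable item; adding a rule is a one-line change.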
Results / Impact
Gave the People Analytics team full confidence in its data by catching critical data quality issues before they reached the dashboards, saving an estimated 40 hours of manual cleanup per month.
Tech Stack
| Layer | Technology |
|---|---|
| Logic | Python / Pandas |
| Testing | pytest (Chaos Monkey tests) |
| Reporting | Markdown / Pandas Profiling |
| Deployment | Local Script / CI Pipeline |
