Many organizations running experiments on people (A/B tests, user studies, surveys, model evaluations) are making expensive decisions based on designs that don't hold up. Underpowered samples. Broken randomization. Metrics that measure the wrong thing. Results that are noise dressed as signal.
I'm a methodologist who teaches experimental methodology at the graduate level and writes the code to carry it out. I review your design, tell you whether the result is real, and show you how to fix what isn't. The same rigor that goes into peer-reviewed experimental research, applied to the decisions your team is making this quarter.
I also work at an intersection most teams can't staff: measurement for AI systems. Model evaluations that rely on human judgments are human-subjects experiments. They need someone who understands both the statistics of valid measurement and the machine learning context. That combination is rare. It's what I do.
A/B tests, pricing experiments, user studies. Power analysis, randomization integrity, construct validity, multiple-comparison correction. I tell you if the result is real.
Designing valid model evaluations, reducing bias in human rating protocols, and determining whether "Model A beat Model B" survives scrutiny. Measurement rigor for ML teams.
Qualtrics builds with proper scale construction, randomization, and skip logic. Prolific sampling that gives you clean, usable data the first time.
You have data and need answers. I run the models in Stata or R (mediation, moderation, multilevel, longitudinal) and write up results you can defend.
Dissertation and thesis methodology, R1 graduate-school applications, and the academic job market — research statements, teaching statements, writing samples.
Discovery calls are free. Project rates are quoted up front — no hourly surprises. Retainers available for ongoing work.
Tell me what you're trying to measure.