
In this piece
Synthetic respondents are AI-generated simulations of customer voices, built by prompting a large language model to respond as a defined persona. They produce plausible answers instantly, at no fielding cost, which is why they've attracted serious attention and serious criticism from the research community in the last two years.
Key Takeaways
- Synthetic respondents regenerate the patterns in their training data; they do not reflect any real person's actual opinions or experiences
- A 2025 Columbia Business School study across 19 pre-registered sub-studies found that feeding an LLM 500+ answers from a real person improved accuracy by only 1.4 percentage points over a prompt with zero personalization
- They are legitimate tools for study design, guide rehearsal, and red-teaming a screener before fielding, not for primary research findings
- A synthetic respondent cannot surface a genuine surprise, a contradiction, or a silence: the three things qualitative research depends on most
- Using synthetic outputs as evidence of customer opinion is a category error that serious buyers are beginning to recognize and penalize
- The right frame: synthetic respondents as rehearsal space, real respondents as the source of truth
What Synthetic Respondents Are (and Aren't)
A synthetic respondent doesn't have opinions. It has a statistical model of what someone matching a description might plausibly say, drawn from the text it was trained on. That is genuinely useful for some things: testing whether a discussion guide is confusing, pressure-testing a screener, or exploring whether a concept framing surfaces coherent responses before you spend budget on real fielding.
What it cannot do is surprise you. A real participant in a qualitative interview says something you didn't expect, contradicts their own earlier answer, or goes silent when you expected disclosure. Those ruptures are where qualitative insight lives. Synthetic respondents reproduce the expected. The expected was already in your head.
The category error most vendors are skating toward: presenting synthetic outputs as directional consumer insight. They are directional only in the sense that a mirror is directional: they reflect what you put in front of them. A synthetic respondent is not a person wearing a persona. It is the base model wearing a thin demographic costume.
What the Research Actually Shows
The empirical foundation for synthetic respondents just got stress-tested at scale. A 2025 Columbia Business School study (Peng, Gui, Brucks, Johnson, Toubia, et al.) ran 19 pre-registered sub-studies across 1,784 real humans and their AI twins, where each twin was built on 500-plus prior answers totaling roughly 128,000 characters of personal psychometric data. The headline finding is stark: individual-level accuracy for full-persona twins was 0.748, compared to 0.734 for an empty persona with no personalization at all. Adding a person's complete psychological profile improved accuracy by 1.4 percentage points over telling the model nothing.
The study identified five systematic distortions that explain why. Twins fail to deviate enough from the base LLM's prior, a problem the authors call insufficient individuation. When they do deviate, they lean on demographic stereotypes rather than individual nuance. They overrepresent higher-educated, higher-income, moderate respondents, amplifying the WEIRD-population skew familiar from decades of academic research. They carry ideological biases in consistent directions. And they are hyper-rational: too coherent, too consistent, missing the heuristics, contradictions, and emotional noise that characterize actual human decision-making. Twins were under-dispersed relative to real humans in 93.9 percent of outcomes. The market was being sold individuation; it was getting stereotyping at higher resolution.
Run your next study on Enumerate.
See how Enumerate works on a study like yours. Book a 30-minute demo and we'll walk you through it.
Book a demo. Tailored to your use case.
Where Synthetic Respondents Actually Help
The useful applications are narrower than the hype suggests, but they're real.
Guide design and rehearsal. Before fielding a depth interview study, running your discussion guide against a synthetic persona reveals structural problems: questions that confuse, sequences that don't build, probes that lead. This is the strongest legitimate use case.
Red-teaming a concept stimulus. If you're testing a product concept and want to know whether the stimulus is ambiguous before showing it to real participants, a synthetic run surfaces obvious interpretation problems cheaply.
Training new moderators. Practicing probing technique against a synthetic persona is safer than practicing on real participants. The AI won't file a complaint if the moderator stacks three questions in one sentence.
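The rehearsal workflow described above amounts to assembling a persona prompt and running each guide question past it. A minimal sketch, assuming an illustrative persona shape and a generic `ask_model` callable (these names are hypothetical, not a real vendor API; swap in your own LLM client):

```python
# Sketch of rehearsing a discussion guide against a synthetic persona.
# The persona fields and the `ask_model` stub are illustrative
# assumptions, not a real vendor API.

def build_persona_prompt(persona: dict, question: str) -> list:
    """Assemble chat messages that cast the model as a defined persona.

    Note the limitation discussed above: this is a thin demographic
    costume over the base model, so treat answers as rehearsal
    material, never as findings.
    """
    system = (
        f"You are role-playing a research participant: "
        f"a {persona['age']}-year-old {persona['occupation']} "
        f"in {persona['location']}. Answer in first person, briefly."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


def rehearse_guide(persona: dict, guide: list, ask_model) -> list:
    """Run each guide question past the synthetic persona.

    `ask_model` is any callable that takes a message list and returns
    a string; in practice it would wrap your LLM client of choice.
    Returns (question, answer) pairs for review by the guide's author.
    """
    return [(q, ask_model(build_persona_prompt(persona, q))) for q in guide]
```

The point of the output is not the answers themselves but what they reveal about the guide: questions the model misreads are questions a real participant may misread too.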
What these uses share: the synthetic respondent is a rehearsal tool, not a research instrument. The output informs process, not findings. Vendors who blur this line are selling fiction. Synthetic respondents are better understood as a complement to AI-moderated research at scale, not a replacement for it.
The Risk the Category Hasn't Priced In Yet
Synthetic respondents are smooth. LLMs write coherent, confident prose. That fluency is the danger. A synthetic research report reads better than a sloppy real one, which means the quality signal researchers rely on ("this sounds credible") no longer works as a filter.
The Columbia study's robustness checks ran across GPT-4.1, GPT-5, DeepSeek, and Gemini, including fine-tuned variants. Performance was similarly modest across all of them. This isn't a model-choice problem that a better LLM will solve next quarter. It's a structural limitation of simulating individuals from population-level training data. The field hasn't had its high-profile failure yet. When a brand makes a significant decision on synthetic "consumer insight" and the market behaves nothing like the simulated personas predicted, the category will have to answer for it publicly. As the debate around AI in research continues to sharpen, the vendors who drew the line early will be remembered for it.
For concept testing, message testing, or any study where findings drive real decisions, real respondents remain the only defensible source. Enumerate's AI-moderated asynchronous interviews give you the depth of qual with the reach you actually need, without asking you to trust a simulation.
See how AI-moderated research works in practice.
Related Reading

What High Accuracy in Transcription and Translation Actually Means
High accuracy in transcription and translation isn't just low error rates — it means your coding holds up, your themes are real, and your analysis travels across languages.
Vernacular Research Is an Architecture Problem
Vernacular research isn't solved by translation. Learn why the architecture of your research stack determines whether non-English insight is first-class or filtered.
Hybrid Product Research: Diary Studies + IDIs
Learn how combining diary studies with IDIs reveals both the texture of daily product use and the reasoning behind it. A practical guide for research teams.