Almost no one is building a synthetic user publicly and telling you what they're finding. So that's what we're doing, across four parts, start to finish.
Four editions · One live experiment
The vocabulary, three distinct build approaches from our team, an expert conversation, and the priors I'm taking into the experiment.
The data audit, the rigor framework I'm applying, the step-by-step build, and what I'm learning along the way.
Same study, two panels: a synthetic one and a real one we recruit through Great Question. The results, side by side.
When synthetic users earn a spot in your workflow. The full guide, and a Claude skill you can run yourself.
For the past few months, I've been reading every take on synthetic users I can find. NN/G's "If, When, and How" piece. The ACM Interactions article on the people-pleasing problem. Park et al.'s 85% accuracy paper. The same LinkedIn debate playing out on repeat. A pile of accuracy studies that sometimes contradict each other.
Most of it is people arguing whether synthetic users should exist.
Great Question has a front-row view of this debate. Every day, our customers run extensive research with their own users and panels they recruit and schedule inside our product. And every day, we watch them build AI workflows on top of our MCP that surprise us, including their own synthetic users.
We hold a strong opinion that nothing replaces watching a person fumble through a prototype, or the magic that happens during a conversation.
But those same customers are asking us how to use AI to move faster. A lot of them are sitting on years of interview transcripts, survey data, support tickets, and product usage data. They want to know if any of that can act as a stand-in for an interview when the real one isn't possible.
Our position going in: synthetic personas, synthetic users, and synthetic panels are becoming a core layer of research and product workflows over the next year or less. The synthetic layer makes research cheaper to iterate on and faster to validate, and at its best, surfaces the gaps where new research is still required.
The best ones build on real customer data. The ones you generate from a prompt alone, or pull from a synthetic-user tool that doesn't know your audience, are a statistical impression of a demographic an LLM read about online. They answer like it. For the sake of this series, I am not exploring prompt-only synthetic users spun up in ChatGPT or another general-purpose LLM with no customer data behind them, as I believe these have no place in any product workflow.
So we're building ours and telling you what we find:
Part 1 (this one): the terminology, the workflows I'm considering based on conversations with our product and engineering teams, and the hypothesis I have going into this live experiment.
Part 2: how I'm building a synthetic user, persona, or panel. The audit I'm running on our own data, the rigor framework I'm applying, the step-by-step build, and what I'm learning along the way.
Part 3: the head-to-head. Same study, two panels: a synthetic one and a real one we recruit through Great Question.
Part 4: the recap. A decision tree for when synthetic personas earn a spot in your workflow, the full guide, and a Claude skill you can run yourself.
First, the vocabulary
Synthetic user. Synthetic persona. Synthetic panel. Digital twin. They get used interchangeably across LinkedIn, academic papers, and product docs, but they describe completely different things, with completely different methods behind them.
Getting crisp on which is which is the first step.
Here are the three terms I'm using across the series:
Synthetic persona: An archetype or representation of a group of users grounded in real research evidence. Describes a type of user, not a specific person.
Synthetic user: A specific user with a name, attributes, voice, and behaviour. You either sample one from a persona, or clone a real user as a digital twin.
Synthetic panel: A group of synthetic users running through the same study together. The synthetic equivalent of a recruited panel.
Three distinct approaches
Once I've ruled out the bad versions, I sit down with two people inside Great Question who've been closer to this than anyone: Mark on our product team, and Jack, who built the MCP and has been talking to customers about it for months.
Between them, Mark and Jack name three distinct workflows for building a grounded synthetic user/persona/panel.

AI Product Manager @ Great Question

AI Product Manager @ Great Question
Take a real user you know well, strip out the personal stuff, store them as a synthetic user doc, and instruct the agent to play that role. Narrow but powerful.
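A minimal Python sketch of that first workflow, under loud assumptions: the field names, the `PERSONAL_FIELDS` set, and both helper functions are hypothetical, not anything Mark or Jack described and not part of Great Question's product. It only drops whole fields, so personal details buried in free-text notes would still need a separate scrub.

```python
# Hypothetical field names; swap in whatever your own user records contain.
PERSONAL_FIELDS = {"name", "email", "company", "phone"}


def build_synthetic_user_doc(record: dict, alias: str) -> str:
    """Drop personal fields and render what is left as a plain-text role doc."""
    kept = {k: v for k, v in record.items() if k not in PERSONAL_FIELDS}
    lines = [f"Synthetic user: {alias}"]
    lines += [f"{key}: {value}" for key, value in sorted(kept.items())]
    return "\n".join(lines)


def role_instruction(doc: str) -> str:
    """Wrap the doc in a role-play instruction for whichever agent you use."""
    return (
        "Play the user described below. Answer only from this document, "
        "and say so when it does not cover the question.\n\n" + doc
    )


record = {
    "name": "Jane Example",  # made-up record, for illustration only
    "email": "jane@example.com",
    "role": "Research ops lead",
    "goals": "Cut recruiting time for weekly interviews",
    "frustrations": "Scheduling back-and-forth",
}
doc = build_synthetic_user_doc(record, alias="User A")
print(role_instruction(doc))
```

Telling the agent to say when the document doesn't cover a question is one way to keep the twin narrow: it should go quiet rather than improvise.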

Aggregate 8-10+ real users into a synthetic persona document, creating a defensible archetype of a power user vs a casual user, or another segmentation. The aggregation itself protects privacy.
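One way the aggregation step could look in Python, as a sketch rather than the team's actual method. The `themes` field, the `build_persona` helper, and the decision to refuse groups smaller than eight are all assumptions made for illustration; the only figure taken from the text is the "8-10+" range.

```python
from collections import Counter

MIN_USERS = 8  # lower bound of the "8-10+" range mentioned above


def build_persona(segment: str, users: list[dict], top_n: int = 3) -> dict:
    """Aggregate per-user themes into one segment-level persona document."""
    if len(users) < MIN_USERS:
        raise ValueError(f"need at least {MIN_USERS} users, got {len(users)}")
    counts = Counter()
    for user in users:
        # Count each theme once per user so one talkative user cannot dominate.
        counts.update(sorted(set(user["themes"])))
    return {
        "segment": segment,
        "sample_size": len(users),
        "top_themes": [
            {"theme": theme, "users": n} for theme, n in counts.most_common(top_n)
        ],
    }


# Made-up data: nine users, with themes already coded from their transcripts.
users = [{"themes": ["exports", "scheduling"]} for _ in range(5)]
users += [{"themes": ["exports", "exports", "pricing"]} for _ in range(4)]
persona = build_persona("power user", users)
print(persona["top_themes"])
```

The persona keeps counts, not quotes or names, which is the sense in which the aggregation itself does some of the privacy work.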

The persona doesn't exist until you query it. It polls our MCP live, contextualises whatever it pulls based on the input you feed it (a PRD, a design file, a prototype) and comes back with customer evidence.
Calling in the expert
A week in, I called Caitlin Sullivan, a UX researcher who's been digging into AI workflows for the last few years, writes one of the sharpest newsletters on AI in research, and is running her own synthetic-user experiments on pricing and messaging. She's read the studies more closely than anyone I've spoken to, although she did note that she's hesitant to be seen as an expert, because no one is really defining this space yet.
What I love about Caitlin's experiments is that she approaches them through a pragmatic lens, which is perfect for product teams who want to understand the practical application of synthetic users, not just the academic view.
Five useful insights I documented from that conversation:

UX Researcher · AI workflow specialist
Author, AI Customer Research newsletter
Running her own synthetic-user experiments on pricing and messaging.
"If you dig deeply into the methodology, nobody is doing anything the same way. It's all just kind of winging it and putting together their own completely unique methodology for how to go about replicating human behaviours."
The famous Stanford / Google paper replicated 1,052 humans. Everyone quotes 85%.
Here's what's underneath that number: Humans only match their own survey answers ~81% of the time two weeks later.
Raw synthetic-user accuracy was 68–69%.
85% is roughly 69 divided by 81: raw accuracy normalised against the wobbly human baseline (68 divided by 81 comes out closer to 84%).
"They were wrong one out of three times. That doesn't sound particularly confidence-inspiring."
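The arithmetic behind the headline number, using only the rounded figures quoted above. The function name is mine; the paper's own calculation uses unrounded values, so treat this as a back-of-the-envelope check.

```python
def normalised_accuracy(raw: float, human_retest: float) -> float:
    """Raw agent accuracy as a share of how consistently humans repeat themselves."""
    return raw / human_retest


HUMAN_RETEST = 0.81  # humans matching their own answers two weeks later

for raw in (0.68, 0.69):
    score = normalised_accuracy(raw, HUMAN_RETEST)
    print(f"raw {raw:.0%} -> normalised {score:.0%}")
# raw 68% -> normalised 84%
# raw 69% -> normalised 85%
```

The quote is about the raw number: at 68 to 69%, roughly one answer in three misses, whatever the normalised figure says.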
Caitlin was most helpful in framing my understanding here. Academic studies design their methodology forward: collect interviews, then collect a separate "holdout" set of survey questions, then compare.
Working backwards from a pile of existing transcripts breaks the comparison.
Different questions get pulled from different interviews. Accuracy drops. Nothing is apples-to-apples.
"There isn't really a standardised way of measurement in general, and definitely not for starting with a pile."
"Using it to test a research study, it's one of the lowest risk use cases. What's the worst that could happen? It's just highlighting weaknesses or things you didn't think about."
Synthetic users can give you a rough read on whether users would prefer A or B, but they're unreliable at telling you by how much. Caitlin's caveat: we can only speak to the studies conducted so far, which aren't conclusive for the thousands of other potential cases, and synthetic users may eventually be able to predict magnitude too.
Ask whether you can raise prices — useful directional read.
Ask whether you can raise them by $5 or $10 — falls apart.
She added one more frame: synthetic users are better at logical decisions (budget fit, seat licensing, workflow constraints) than emotional ones. The more emotional the call, the less accurate the prediction.
While Mark and Jack come at building a synthetic user from different angles, they both land on the same recommendation: early directional feedback before you commit human time or budget.
Take any artifact (a survey, a PRD, a design, a concept) and run it past a synthetic panel first. What comes out:
If you set up the synthetic user skill to require evidence behind every claim, and the persona can't make a claim because the evidence isn't there, that's actually a really good thing. AI is flagging a research opportunity for you.
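A sketch of what "require evidence behind every claim" could mean in practice. The `Claim` shape, the reference IDs, and the `split_by_evidence` helper are hypothetical; this is not how the Great Question skill is built, just one way to express the rule.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    evidence: list[str] = field(default_factory=list)  # e.g. transcript references


def split_by_evidence(claims: list[Claim]) -> tuple[list[Claim], list[str]]:
    """Keep claims that cite evidence; turn the rest into research gaps."""
    supported = [c for c in claims if c.evidence]
    gaps = [f"No evidence yet: {c.text}" for c in claims if not c.evidence]
    return supported, gaps


# Made-up claims and reference IDs, for illustration only.
claims = [
    Claim("Admins want bulk export", ["interview-12", "ticket-88"]),
    Claim("Users would pay more for SSO"),
]
supported, gaps = split_by_evidence(claims)
print(gaps)
```

The second list is the useful one here: every unsupported claim becomes a candidate question for the next round of real research.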

Mark independently describes almost the same workflow, from the study-design angle: every study should start with a synthetic round before you go to humans. At worst, it stress-tests the questions and surfaces issues with the study design. At best, the AI panel comes back unanimous on a low-stakes question and you answer it without spending a dollar on recruitment. Most of the time it sits somewhere in the middle: cheaper iteration, and a much sharper study that you put in front of real people.
His framing on the economics: you've spent a little on tokens, but nothing on recruitment fees, incentives, or any of that overhead. It's a good way to raise the bar of quality across every study you run.
My priors going in
These aren't technically hypotheses. They're documented thoughts. The point of the next three editions is to find out which of these survive contact with actual data.

If you're already building something like this, I want to hear what you've tried and where you've hit walls.
See you in part 2!
- Tania, PMM @ Great Question
Part 2 lands soon. It includes the data audit I'm running, the step-by-step build, and what I'm learning as I go.