LIVE EXPERIMENT - PART 1 of 4

We're building a synthetic user
in public.
This is our live experiment.

Almost no one is building a synthetic user publicly and telling you what they're finding. So that's what we're doing, across four parts, start to finish.

Tania Clarke
PMM · Great Question
June 2026 · ~12 min read

Four editions · One live experiment

We're building in public,
start to finish.

YOU ARE HERE
01

Terminology, workflows & hypothesis

The vocabulary, three distinct build approaches from our team, an expert conversation, and the priors I'm taking into the experiment.

02

The build: audit, framework & step-by-step

The data audit, the rigor framework I'm applying, the step-by-step build, and what I'm learning along the way.

03

The head-to-head

Same study, two panels: a synthetic one and a real one we recruit through Great Question. The results, side by side.

04

The recap & decision tree

When synthetic users earn a spot in your workflow. The full guide, and a Claude skill you can run yourself.

For the past few months, I've been reading every take on synthetic users I can find. NN/G's "If, When, and How" piece. The ACM Interactions article on the people-pleasing problem. Park et al.'s 85% accuracy paper. The same LinkedIn debate playing out on repeat. A pile of accuracy studies that sometimes contradict each other.

Most of it is people arguing whether synthetic users should exist.

What we're seeing from where we sit

Great Question has a front-row view of this debate. Every day, our customers run extensive research with their own users and panels they recruit and schedule inside our product. And every day, we watch them build AI workflows on top of our MCP that surprise us, including their own synthetic users.

We hold a strong opinion that nothing replaces watching a person fumble through a prototype, or the magic that happens during a conversation.

But those same customers are asking us how to use AI to move faster. A lot of them are sitting on years of interview transcripts, survey data, support tickets, and product usage data. They want to know if any of that can act as a stand-in for an interview when the real one isn't possible.

Our position going in: synthetic personas, synthetic users, and synthetic panels are becoming a core layer of research and product workflows over the next year or less. The synthetic layer makes research cheaper to iterate on and faster to validate, and at its best, surfaces the gaps where new research is still required.

The best ones build on real customer data. The ones you generate from a prompt alone, or pull from a synthetic-user tool that doesn't know your audience, are a statistical impression of a demographic an LLM read about online. They answer like it. For the sake of this series, I am not exploring prompt-only synthetic users generated by ChatGPT or another LLM with no customer data behind them, as I believe these have no place in any product workflow.

So we're building ours and telling you what we find:

Part 1 (this one): the terminology, the workflows I'm considering based on conversations with our product and engineering teams, and the hypothesis I have going into this live experiment.

Part 2: how I'm building a synthetic user, persona, or panel. The audit I'm running on our own data, the rigor framework I'm applying, the step-by-step build, and what I'm learning along the way.

Part 3: the head-to-head. Same study, two panels: a synthetic one and a real one we recruit through Great Question.

Part 4: the recap. A decision tree for when synthetic personas earn a spot in your workflow, the full guide, and a Claude skill you can run yourself.

First, the vocabulary

Synthetic user. Persona. Panel.
They're not the same thing.

Synthetic user. Synthetic persona. Synthetic panel. Digital twin. They get used interchangeably across LinkedIn, academic papers, and product docs, but they describe completely different things, with completely different methods behind them.
Getting crisp on which is which is the first step.

Here are the three terms I'm using across the series:

Term 01

Synthetic Persona

An archetype or representation of a group of users grounded in real research evidence. Describes a type of user, not a specific person.

Example: "The senior UX researcher at a B2B SaaS company with 500-5,000 employees."
Term 02

Synthetic User

A specific user with a name, attributes, voice, and behaviour. You either sample one from a persona, or clone a real user as a digital twin.

Example: "Sarah, 38, lead UXR at a 2,000-person SaaS, frustrated by tool sprawl."
Term 03

Synthetic Panel

A group of synthetic users running through the same study together. The synthetic equivalent of a recruited panel.

Example: "10 synthetic users completing the same survey."
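
One way to picture how the three terms relate is as a small data model. This is an illustrative sketch only, not how Great Question stores anything: the class names, the `traits` and `evidence_ids` fields, and the sampling rule are all assumptions made for the example. It covers the "sample one from a persona" route; a digital twin would be built from a single real user instead.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SyntheticPersona:
    """An archetype for a group of users, grounded in research evidence."""
    label: str
    traits: dict[str, list[str]]  # attribute -> values seen across real users
    evidence_ids: list[str] = field(default_factory=list)  # hypothetical source IDs


@dataclass
class SyntheticUser:
    """One specific, named user sampled from a persona."""
    name: str
    persona_label: str
    attributes: dict[str, str]


def sample_user(persona: SyntheticPersona, name: str, rng: random.Random) -> SyntheticUser:
    """Pick one value per trait to turn the archetype into a specific user."""
    attributes = {trait: rng.choice(values) for trait, values in persona.traits.items()}
    return SyntheticUser(name, persona.label, attributes)


def build_panel(persona: SyntheticPersona, names: list[str], seed: int = 0) -> list[SyntheticUser]:
    """A panel is a group of synthetic users who all run through the same study."""
    rng = random.Random(seed)
    return [sample_user(persona, name, rng) for name in names]


persona = SyntheticPersona(
    label="Senior UX researcher, B2B SaaS, 500-5,000 employees",
    traits={
        "role": ["lead UXR", "senior UXR"],
        "frustration": ["tool sprawl", "slow recruitment"],  # illustrative values
    },
)
panel = build_panel(persona, ["Sarah", "Dev", "Amira"])
for user in panel:
    print(user.name, user.attributes)
```

The direction of the arrows is the point: a persona describes a type, a user is one draw from that type, and a panel is several users put through the same study.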

Three distinct approaches

Three ways to build, and they're very different.

Once I've ruled out the bad versions, I sit down with two people inside Great Question who've been closer to this than anyone: Mark on our product team, and Jack, who built the MCP and has been talking to customers about it for months.

Between them, Mark and Jack name three distinct workflows for building a grounded synthetic user/persona/panel.

Mark

AI Product Manager @ Great Question

Jack

AI Product Manager @ Great Question

1

Digital twin.

Take a real user you know well, strip out the personal stuff, store them as a synthetic user doc, and instruct the agent to play that role. Narrow but powerful.

Jack's framing: "Take a real person and complete a study as them." Works well any time you need to replay a specific person's perspective on a product direction.
2

Persona-generated.

Aggregate 8-10+ real users into a synthetic persona document, creating a defensible archetype of a power user vs a casual user, or another segmentation. The aggregation itself protects privacy.

Mark's example: "Generate me a power user and it will make up Bob, who's 53, does this, thinks that." Best when the panel needs to be shareable across a team.
3

Live synthetic retrieval via skill.

The persona doesn't exist until you query it. It polls our MCP live, contextualises whatever it pulls based on the input you feed it (a PRD, a design file, a prototype) and comes back with customer evidence.

The interesting one: no persistent persona. Works in flow with what you're already working on, instead of treating research as a separate exercise. One customer is already running this in production.

Calling in the expert

Five insights from someone who's
read every study.

A week in, I called Caitlin Sullivan, a UX researcher who's been digging into AI workflows for the last few years, writes one of the sharpest newsletters on AI in research, and is running her own synthetic-user experiments on pricing and messaging. She's read the studies more closely than anyone I've spoken to, although she did note that she’s hesitant to be seen as an expert, because no one is really defining this space yet.

What I love about Caitlin’s experiments, though, is that she approaches them through a pragmatic lens, which is perfect for product teams who want to understand the practical application of synthetic users, not just the academic view.

Five useful insights I documented from that conversation:

Caitlin Sullivan

UX Researcher · AI workflow specialist
Author, AI Customer Research newsletter
· Running her own synthetic-user experiments on pricing and messaging.

1

Nobody has a settled methodology. Not even the academics.

"If you dig deeply into the methodology, nobody is doing anything the same way. It's all just kind of winging it and putting together their own completely unique methodology for how to go about replicating human behaviours."
Something works in one study and you can't compare it to the next one. Different measurement, different results, every time.
2

The "85% accuracy" headline is misleading.

The famous Stanford / Google paper replicated 1,052 humans. Everyone quotes 85%.
Here's what's underneath that number: Humans only match their own survey answers ~81% of the time two weeks later.

Raw synthetic-user accuracy was 68–69%.

85% is the raw 68–69% divided by the ~81% human baseline: a normalised score measured against a wobbly benchmark, not a raw hit rate.
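
The arithmetic is quick to check. A minimal sketch, using only the rounded figures quoted above; the function name is mine, not the paper's:

```python
def normalised_accuracy(raw_accuracy: float, human_self_consistency: float) -> float:
    """Raw agreement divided by how often humans agree with their own earlier answers."""
    if not 0 < human_self_consistency <= 1:
        raise ValueError("human_self_consistency must be in (0, 1]")
    return raw_accuracy / human_self_consistency


# Rounded figures from the text: raw accuracy 68-69%, human baseline ~81%.
for raw in (0.68, 0.69):
    print(f"raw {raw:.0%} -> normalised {normalised_accuracy(raw, 0.81):.1%}")
# raw 68% -> normalised 84.0%
# raw 69% -> normalised 85.2%
```

With these rounded inputs, the headline 85% only appears at the top of the 68–69% range, and the raw figure still means roughly one answer in three was wrong.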

"They were wrong one out of three times. That doesn't sound particularly confidence-inspiring."

3

Building from an existing pile is harder to measure.

Caitlin was most helpful in framing my understanding here. Academic studies design their methodology forward: collect interviews, then collect a separate "holdout" set of survey questions, then compare.

Working backwards from a pile of existing transcripts breaks the comparison.

Different questions get pulled from different interviews. Accuracy drops. Nothing is apples-to-apples.

"There isn't really a standardised way of measurement in general, and definitely not for starting with a pile."

Her advice: run small experiments, predict what real users will do, then track whether predictions came true. Longitudinal validation, not lab measurement.
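
Caitlin's predict-then-track advice can be kept as a simple running log. This is a sketch under my own assumptions: the `Prediction` record, its field names, and the example questions are made up for illustration, and "hit rate" here only counts predictions that real users have already resolved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    question: str
    synthetic_choice: str               # what the synthetic panel predicted
    real_choice: Optional[str] = None   # filled in once real users have answered


def hit_rate(log: list[Prediction]) -> Optional[float]:
    """Share of resolved predictions where the synthetic and real answers match."""
    resolved = [p for p in log if p.real_choice is not None]
    if not resolved:
        return None  # nothing to validate against yet
    hits = sum(p.synthetic_choice == p.real_choice for p in resolved)
    return hits / len(resolved)


# Illustrative entries only, not real results.
log = [
    Prediction("Headline A or B?", "A", real_choice="A"),
    Prediction("Annual or monthly default?", "annual", real_choice="monthly"),
    Prediction("Onboarding flow 1 or 2?", "2", real_choice="2"),
    Prediction("Pricing page layout X or Y?", "X"),  # still waiting on real users
]
print(hit_rate(log))
```

The number only becomes meaningful as the log grows, which is the longitudinal part: each small experiment adds a row, and the trend tells you how far to trust the next prediction.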
4

The lowest-risk use case is the right place to start.

"Using it to test a research study, it's one of the lowest risk use cases. What's the worst that could happen? It's just highlighting weaknesses or things you didn't think about."
Same answer my own team gave me. Stress-test a study or artifact before humans see it.
5

Directional yes. Magnitude not yet.

Synthetic users can give you a reasonable read on whether users would prefer A or B. They are much less reliable at telling you by how much. Caitlin's caveat: we can only speak to the studies conducted so far, which aren’t conclusive for the thousands of other potential cases, and synthetic users may be able to predict magnitude too at some point.

Ask whether you can raise prices — useful directional read.
Ask whether you can raise them by $5 or $10 — falls apart.

She added one more frame: synthetic users are better at logical decisions (budget fit, seat licensing, workflow constraints) than emotional ones. The more emotional the call, the less accurate the prediction.

Two of my priors reshaped. Two others confirmed.

An early insight into where synthetic users fit

While Mark and Jack come at building a synthetic user from different angles, they both land on the same recommendation: early directional feedback before you commit human time or budget.

Take any artifact (a survey, a PRD, a design, a concept) and run it past a synthetic panel first. What comes out:

  • Directional answers in minutes, and a clear view of where the synthetic panel and real users would actually diverge.
  • The gaps where the panel can't answer with confidence. These become your next research brief.

"If you set up the synthetic user skill to require evidence behind every claim, and the persona can't make a claim because the evidence isn't there, that's actually a really good thing. AI is flagging a research opportunity for you."
Jack · Product Manager, Great Question MCP

Mark independently describes almost the same workflow, from the study-design angle: every study should start with a synthetic round before you go to humans. At worst, it stress-tests the questions and surfaces issues with the study design. At best, the AI panel comes back unanimous on a low-stakes question and you answer it without spending a dollar on recruitment. Most of the time it sits somewhere in the middle: cheaper iteration, and a much sharper study that you put in front of real people.

His framing on the economics: you've spent a little on tokens, but nothing on recruitment fees, incentives, or any of that overhead. It's a good way to raise the bar of quality across every study you run.
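
Jack's evidence rule is easy to picture as a gate that sorts claims into "answer now" and "research this next". A minimal sketch, assuming each claim carries the IDs of the interviews behind it; the `Claim` type, the ID format, the example claims, and the default threshold of 8 (borrowed from the 8–10 interview floor mentioned elsewhere in this piece) are illustrative assumptions, not the actual skill.

```python
from dataclasses import dataclass, field

MIN_INTERVIEWS = 8  # assumption: a tunable floor, not a validated number


@dataclass
class Claim:
    text: str
    interview_ids: list[str] = field(default_factory=list)  # hypothetical transcript IDs


def split_by_evidence(claims: list[Claim], min_interviews: int = MIN_INTERVIEWS):
    """Return (supported, gaps): claims with enough distinct interviews, and the rest."""
    supported, gaps = [], []
    for claim in claims:
        backing = len(set(claim.interview_ids))  # count distinct interviews only
        (supported if backing >= min_interviews else gaps).append(claim)
    return supported, gaps


# Made-up claims for illustration.
claims = [
    Claim("Tool sprawl slows down synthesis", [f"int-{i}" for i in range(9)]),
    Claim("Teams would pay more for seat-based pricing", ["int-2", "int-2", "int-7"]),
]
supported, gaps = split_by_evidence(claims)
print("Answer with evidence:", [c.text for c in supported])
print("Next research brief:", [c.text for c in gaps])
```

The second list is the useful by-product: anything the panel cannot back up becomes a candidate question for real users.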

My priors going in

Four beliefs I'm taking into the experiment.

These aren't technically hypotheses. They're documented thoughts. The point of the next three editions is to find out which of these survive contact with actual data.

1. Can synthetic users be trusted?
Prior
Trustworthy enough for early, directional signal. Not for high-stakes product decisions.
Risk
False confidence and wasted time.
Updated after Caitlin: her framing of the magnitude problem makes me more confident this is right for directional questions and wrong for any "by how much" question.
2. What's the best mix of underlying data?
Prior
Right now, I believe that interview transcripts plus survey data plus behavioural data is the right blend underneath. Interview transcripts will do the heavy lifting on language and pain points, and they will be the closest we get to replicating a real user. Jack also pointed to a volume threshold I think matters: at least 8–10 interviews to back any meaningful claim. Below that, the persona is probably guessing and the skill should flag it as low-confidence. This is the foundation of the rigor framework I'll build out in Edition 2.
Risk
We might miss a key category (there's a world where I'd love to add in sales-call themes or support tickets too, but at this stage I'm not sure it's feasible).
Updated after Caitlin: the richer the data, the better the results, but the format matters too. Synthetic users get numerical and rating-scale answers roughly right far more often than they get the reasoning behind them. Interviews are still the heavy lift, but a satisfaction score and the reason for that score are not the same prediction problem.
3. Where and when should you use them?
Prior
Synthetic personas earn their keep when used early in the product building process, but only as a testing mechanism.
Risk
Almost certainly true based on what Mark and Jack are both seeing in practice.
4. Static or dynamic? Does consistency matter?
Prior
Genuinely unsure. It can't be ideal that everyone gets a slightly different answer if they're relying on live retrieval. If team consistency matters (and in enterprise contexts it usually does), you probably need persona and user docs as a consistent data source, stored somewhere queryable like a GitHub repo or a research repository.
Risk
Live retrieval drifts across teammates. Static docs go stale and stop matching reality. Neither is clearly better yet.

If you're already building something like this, I want to hear what you've tried and where you've hit walls.

See you in part 2!

- Tania, PMM @ Great Question

The build continues.
Follow along.

Part 2 lands soon, with the data audit I'm running, the step-by-step build, and what I'm learning as I go.
