How to measure whether your research actually ships: the Recommendation Adoption Score

By Carly Hartshorn
Published March 29, 2026

You've seen this play out. The research readout goes well. People nod. Someone says "we should really fix that." Then the roadmap stays locked, sprint cycles roll forward, and six months later you're back in the same room, presenting the same findings to the same people about the same problems.

That's research breakage, and it's quietly undermining research teams everywhere.

Brian Utesch, Head of Intelligent Experience Research at Cisco, and Tammi Fitzwater, who works in UX Research and Data Analytics, built a framework to make this invisible problem visible. It's called the Recommendation Adoption Score (RAS), and they published it as a multi-part series with Nielsen Norman Group. We sat down with both of them (they each hold PhDs and bring decades of combined experience from organizations like IBM and Cisco) to walk through how it works, why it matters, and how you can start using it tomorrow.

Here's the core idea: insights don't fix anything. They describe the problem. Recommendations fix things, but only if someone actually builds them into the product. RAS measures that last mile, the gap between "great research" and "shipped improvement."

Why insights alone aren't enough

There's a perspective in the research community that stops at insights: we provide the data, and it's not our job to provide recommendations. Brian and Tammi push past that boundary, and for good reason.

When recommendations don't get adopted, the cost isn't just abstract. Users keep struggling with the same problems. Product quality stagnates. And researchers burn out from the Groundhog Day cycle of presenting findings that never turn into fixes. As Tammi put it during our conversation: "You feel like you're back in the same room telling the same people about the same problems, and it's defeating, and it's demoralizing."

The pressure is only increasing. With AI enabling more people to run research faster, the volume of insights is growing. But volume without adoption is just noise. If you're producing more research but not tracking whether any of it ships, you're scaling the wrong thing.

This is why the shift from insights to recommendations to tracked adoption matters. It's the difference between measuring research impact through activity (studies completed, reports delivered) and measuring it through outcomes (user problems actually fixed).

What is research breakage?

Brian and Tammi borrow the term "breakage" from retail, where it refers to inventory that gets lost, damaged, or stolen between the warehouse and the shelf. Research breakage works the same way: recommendations get lost, deprioritized, or quietly abandoned between the readout and the product.

The tricky part? It's almost never intentional. As Tammi explained, everyone comes to the table with good intentions. Researchers want to make life better for users. Product owners want to create good products. Engineers want to build things that work well. On the surface, everything feels aligned.

But breakage happens in the gaps. The roadmap is already locked for the next several sprint cycles. Other priorities have already claimed the resources. The recommendation everyone nodded along to just quietly falls out of the system. No one flat-out rejected it. Nothing dramatic happened. It simply didn't make it into the plan.

"Breakage is really those small insidious leaks between the good intention and the outcome," says Tammi.

Nielsen Norman Group's article, Insights Aren't Outcomes: Research Recommendation Breakage, goes deeper into this phenomenon and the patterns behind it. It's worth reading alongside this piece.

Warning signs that your recommendations are breaking

You don't need a score to spot the early symptoms. Tammi identified three patterns that signal breakage is happening:

Repetition. You do a study, identify a problem, present it, and then six months later you run another study and the same user problem still exists. If you're presenting the same findings repeatedly, your recommendations aren't making it into the product. That cycle is the clearest signal.

Organizational fog. There's communication, there's head-nodding, but there's no clear owner. Tickets might get created, but they sit in a backlog with no one taking responsibility and no timeline attached. The fog of unclear ownership means nothing moves, and recommendations die on the vine.

Enthusiasm for insights, not recommendations. People love hearing about the problems. The readout gets attention and engagement. But when the conversation shifts to "what needs to change and who's going to change it," the energy drops. If your audience lights up at findings but goes quiet at recommendations, that's a warning sign.

We added a fourth: if the presentations about what your team is working on — the strategy decks, the roadmap reviews — don't lead with or embed research recommendations, that's another signal. Research should be woven into how teams communicate and plan, not treated as a separate input that gets filed away.

How the Recommendation Adoption Score works

RAS is simple math. It's a percentage: how many of your communicated recommendations have actually made it to adoption?

But Brian and Tammi added important nuance, because not all recommendations are created equal.

The weighting system

The framework assigns multipliers based on the value a recommendation delivers to the user:

High value (3x multiplier): Recommendations that would noticeably change the user's experience. Think higher task success rates, greater retention, meaningfully improved satisfaction. These are the ones where users would feel the difference.

Medium value (2x multiplier): Recommendations that still matter but might not affect the user's day-to-day core tasks. When a user does encounter the improvement, they'll notice it, but it's not reshaping their primary workflow.

Low value (1x, face value): The bells and whistles. Low-hanging fruit that polishes the product but won't noticeably change how well users can accomplish their goals.

An important design decision: the value classification is based on impact to the user, not on how the researcher or stakeholder feels about the recommendation. As Brian noted, keeping the user as the focal point eliminates a lot of the arguments about what counts as "high" versus "low."

"By using a weighting system, you get to take your opinion about how you feel about that particular recommendation and its importance from your perspective off the table, and just focus on what it's going to do for the user."

The formula

The numerator adds up the weighted value of adopted recommendations, plus a partial credit for committed ones (more on that below). The denominator captures the total potential value of all communicated recommendations, using the same multipliers.

So if a high-value recommendation gets adopted, it contributes 3 points to the numerator and 3 points to the denominator. If it doesn't get adopted, it still contributes 3 to the denominator, pulling the score down. This means ignoring your highest-impact recommendations hurts the score more than ignoring the small polish items — which is exactly the right incentive.
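
Spelled out, one way to read that math (our reconstruction from the description in this piece, not NN/g's exact notation) looks like this:

```latex
\mathrm{RAS} = 100 \times
  \frac{\sum_{r \in \mathrm{adopted}} w_r \;+\; \tfrac{2}{3}\,\lvert \mathrm{committed} \rvert}
       {\sum_{r \in \mathrm{active}} w_r}
```

Here, w_r is the 1x, 2x, or 3x value multiplier, committed recommendations earn two-thirds credit at face value, and "active" means every communicated recommendation that hasn't been deferred or canceled (those statuses are covered below).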

The final score lands on a 0–100 scale. Higher is better. Nielsen Norman Group's second article, Tracking Adoption of Research Recommendations: The Recommendation-Adoption Score, includes the full mathematical breakdown and worked examples.

The committed bonus

Here's a smart touch: recommendations that are committed (scoped, resourced, and in the roadmap) get a two-thirds credit in the numerator, but only at face value, not at their weighted multiplier. The logic is that a committed recommendation shows real progress and should be rewarded. But it hasn't shipped yet, so it doesn't get the full value boost until the user actually experiences the fix.

This rewards the researcher–product owner relationship for getting recommendations into the plan, without inflating the score before the work actually reaches users.
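
To make that concrete, here is a minimal sketch of the calculation in code, assuming 3x/2x/1x weights, full weighted credit for adopted recommendations, a two-thirds face-value credit for committed ones, and deferred or canceled items dropped from the pool. The function name and data shape are ours for illustration; the downloadable template mentioned later is the authoritative calculator.

```python
# Minimal RAS sketch, reconstructed from the description above (not the official template).

WEIGHTS = {"high": 3, "medium": 2, "low": 1}  # value multipliers based on user impact
COMMITTED_CREDIT = 2 / 3  # committed work earns partial credit at face value only


def recommendation_adoption_score(recommendations):
    """recommendations: list of dicts with 'value' and 'status' keys."""
    numerator = 0.0
    denominator = 0.0
    for rec in recommendations:
        if rec["status"] in ("deferred", "canceled"):
            continue  # out of the active pool, so it neither helps nor hurts the score
        weight = WEIGHTS[rec["value"]]
        denominator += weight  # every live, communicated recommendation adds potential value
        if rec["status"] == "adopted":
            numerator += weight  # shipped: full weighted credit
        elif rec["status"] == "committed":
            numerator += COMMITTED_CREDIT  # scoped and resourced, but users haven't felt it yet
    return round(100 * numerator / denominator, 1) if denominator else None
```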

Understanding recommendation statuses

The framework tracks five statuses, and each one matters:

Communicated: The recommendation has been delivered to the product team through a report, deck, readout, or conversation. This is the starting line, and these recommendations form the denominator (the pool of potential value).

Committed: The recommendation has been scoped and resourced. It's in the roadmap with resources allocated. This earns the two-thirds partial credit.

Adopted: The recommendation has shipped. It's in the product. Full weighted credit in the numerator.

Deferred: The recommendation is valid and agreed upon, but the team knows it's further out. It gets temporarily removed from the denominator so it doesn't drag the score down unfairly. But once the deferral window expires, it must be revisited: does it go back into the active pool, get committed, or get canceled?

Canceled: Both the researcher and product owner agree the recommendation won't happen, for valid reasons (resources aren't available, an executive decision was made, the feature is being sunsetted). This gets removed from the denominator entirely, with documented reasoning. There are receipts.

One of Tammi's strongest pieces of advice: avoid the status "in progress." It's one of the vaguest ways to track a recommendation, and it obscures whether real movement is actually happening.
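
To see the statuses in action, here's how the sketch from the previous section would score a small, hypothetical pool of recommendations (all of the items and values below are invented for illustration):

```python
recommendations = [
    {"value": "high", "status": "adopted"},       # shipped: contributes 3 / 3
    {"value": "high", "status": "communicated"},  # delivered, no movement yet: 0 / 3
    {"value": "medium", "status": "committed"},   # in the roadmap: 2/3 / 2
    {"value": "low", "status": "adopted"},        # small polish, shipped: 1 / 1
    {"value": "medium", "status": "canceled"},    # agreed it won't happen: excluded
]

# numerator = 3 + 0 + 2/3 + 1 ≈ 4.67; denominator = 3 + 3 + 2 + 1 = 9
print(recommendation_adoption_score(recommendations))  # -> 51.9
```

The ignored high-value recommendation is what hurts most here: it sits in the denominator at full weight, which is exactly the incentive the framework intends.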

How to avoid gaming the system

An obvious question: can researchers game RAS to look good? Could you just make lots of easy, low-value recommendations that are simple for product teams to ship?

The weighting system handles most of this. Low-value recommendations only count at 1x, while high-value ones count at 3x. Padding the pool with easy, low-value recommendations adds just one point apiece to the numerator and denominator, while every high-value recommendation that goes unadopted still sits in the denominator at three points, dragging the score down. The math rewards impact, not volume.

Could a researcher inflate values to make their recommendations seem more important? That's where the user-centric definition matters. High, medium, and low are defined by the impact on the user's experience, not by the researcher's opinion of the work. If the team aligns on that definition upfront, the subjectivity is manageable.

And Brian made an important point about incentives: RAS is not designed to be a performance metric for individual researchers. It's a diagnostic tool for the system. It tells you where research is being adopted and where it isn't, which teams are converting recommendations into shipped improvements and which ones aren't, and where to invest your effort next.

Getting started: practical steps for tomorrow

Tammi compared getting started to running. People who don't run ask distance runners, "How did you get there?" The answer: you go outside, put one foot in front of the other, and start somewhere.

Here's what that looks like for RAS:

Start tracking recommendations. Before you worry about formulas and multipliers, just start writing down your recommendations in a way that's explicit and trackable. Each recommendation needs to be specific enough that you can objectively verify whether it shipped or not. "Improve the onboarding flow" is too vague. "Add a progress indicator to the three-step setup wizard" is trackable.

Define your statuses. Use the framework's statuses (Communicated, Committed, Adopted, Deferred, Canceled) or adapt them to your organization. The key is having clear definitions that everyone understands. Just please avoid "in progress."

Assign owners. Every recommendation needs someone responsible for tracking its status. Without ownership, recommendations drift into the fog Tammi described.

Groom consistently. Set a regular cadence for reviewing and updating your recommendation statuses. Brian's team at Cisco tried a Trello board first, but it got too overwhelming. They moved to a simple Excel spreadsheet and it worked. They've since created a template you can download (check Brian's LinkedIn for the link) that auto-calculates your RAS.

Set thresholds that fit your org. Brian and Tammi created threshold bands for interpreting RAS scores (poor, fair, good, great), and they're included in the NN/g article and the template. But they emphasized that thresholds should be calibrated to your organization's maturity. A team just starting out might set more lenient lines between "fair" and "good" than a mature research org with deep product partnerships.

Where AI fits in

The conversation naturally turned to AI, and two threads emerged.

First, can AI help automate the tracking? Tammi was enthusiastic about this possibility. If AI agents could connect Jira tickets to a recommendation tracker, auto-updating statuses when tickets move through sprints, that would eliminate a lot of the manual grooming labor that makes RAS feel burdensome. The framework is conceptually simple, but the operational overhead of keeping statuses current is real. Automation could solve that.

Second, does RAS apply to research conducted by non-researchers? When product managers or designers run their own studies (increasingly common with AI-powered research tools), should those recommendations be tracked too? Brian suggested the answer is yes, and that it could actually help bring more rigor to democratized research. If a PM runs a study and generates recommendations, tracking adoption creates accountability for the quality of those recommendations, not just their existence.

There's also an interesting question about what the delta in RAS scores between professional researchers and non-researchers might reveal. Are trained researchers getting higher adoption rates? If so, that's a compelling data point for the value of research expertise.

This connects to a broader shift in how we think about research value

RAS fits into a larger movement in the research field: the shift from measuring research by outputs (number of studies, number of reports, stakeholder satisfaction) to measuring it by outcomes (decisions enabled, user problems solved, product improvements shipped).

We've discussed this shift through the lens of the Researcher Effort Score, which measures how easy it is for product teams to work with research. RAS complements that by measuring whether the work actually converted into product changes. Together, they give you a fuller picture: is research easy to engage with (RES), and is it actually getting built (RAS)?

This outcome-oriented mindset also connects to how teams communicate research findings. If you know your recommendations will be tracked, you write them differently. They become more specific, more actionable, more tied to what product teams can actually build. The act of measurement changes the behavior upstream.

For teams thinking about building a research repository, RAS adds another dimension to what you store. It's not just about making insights findable; it's about tracking what happened after they were found. A repository that shows "insight surfaced → recommendation made → shipped in Q3" tells a fundamentally different story than one that just catalogs findings.

And for research leaders making the case for budget and headcount, RAS provides exactly the kind of metric that finance and executive teams understand: here's what we recommended, here's what shipped, here's the value that was delivered versus the value that was left on the table.

How to use the Recommendation Adoption Score with your team

If you're ready to go deeper, here's where to start:

Read the NN/g articles. Part one, Insights Aren't Outcomes: Research Recommendation Breakage, explains the problem. Part two, Tracking Adoption of Research Recommendations: The Recommendation-Adoption Score, gives you the full framework with worked examples and threshold guidance.

Start small. Pick one product team you work with closely. Track recommendations from your next three studies. Calculate RAS after 90 days. Use that baseline to understand your starting point before expanding to other teams.

Share the score, not the blame. RAS works best when it's positioned as a diagnostic for the system, not a performance scorecard for individuals. Frame it as: "Here's where our research is getting traction, and here's where it isn't. How do we fix the gaps?"
