How to Run UX Research for AI Products
What's Different About AI UX Research
Standard usability testing assumes the system under test behaves deterministically. The researcher knows what will happen when a user clicks a button. The question being studied is whether users understand how to achieve their goals.
AI products behave probabilistically. The same action can produce different outputs. The quality of the output varies. The user may or may not trust what they see. The experience of using the product changes as the AI "learns" from interactions (or appears to).
These properties require adapted research methods. Standard usability testing misses the most interesting and important questions about AI product UX.
What You're Actually Trying to Learn
Before designing your research, be explicit about the questions you're trying to answer. For AI products, the most important questions tend to be:
Trust calibration: Do users trust the AI appropriately — neither over-trusting outputs they should verify nor under-trusting outputs that are correct?
Mental model accuracy: Do users have an accurate understanding of what the AI can and can't do? Inaccurate mental models lead to misuse of the product and to disappointment.
Correction behavior: When the AI is wrong, what do users do? Do they correct it, dismiss it, or accept it anyway?
Latency perception: How do users experience waiting for AI responses? Does the loading state feel appropriate for the task?
Error recovery: When the AI fails completely or produces a clearly wrong output, how do users recover? Is the experience recoverable?
Standard usability testing studies task completion and efficiency. AI UX research must also study trust, mental models, and error recovery.
Methods That Work for AI Products
Contextual inquiry: Observe users performing their real work tasks in their real environment. This is more revealing than lab sessions because you see how users integrate (or fail to integrate) the AI into their actual workflow.
Contextual inquiry for AI products reveals things that lab sessions miss: the frequency of verification behaviors ("let me double-check that"), the use of complementary tools alongside the AI, the moments where users stop trusting the AI and return to manual processes.
Think-aloud protocol with explicit trust probing: Standard think-aloud protocol (ask users to verbalize their thoughts while completing tasks) is effective but misses trust dynamics. Augment it with explicit probes:
- "How confident are you in that output?"
- "Would you act on this without checking it?"
- "How did you decide that was good enough?"
These probes surface the calibration questions that standard think-aloud misses.
Diary studies: Have participants use the product in their real lives for 1-2 weeks, logging their interactions. Particularly valuable for AI products because they capture the evolution of trust over time. Users who are skeptical on day 1 may become over-trusting by day 14 — or vice versa. Diary studies capture this arc.
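One way to make that arc visible is to aggregate the diary entries' self-reported trust by day. The sketch below is a minimal illustration, assuming each entry records a day number and a 1-7 trust rating; the field names and rating scale are assumptions about your diary instrument, not a standard format.

```python
# Hypothetical sketch: tracking the trust arc across a diary study.
# Each entry is assumed to carry a "day" number and a 1-7 "trust" rating.
from collections import defaultdict

def trust_by_day(entries):
    """Mean self-reported trust per study day, in day order."""
    by_day = defaultdict(list)
    for e in entries:
        by_day[e["day"]].append(e["trust"])
    return {day: sum(v) / len(v) for day, v in sorted(by_day.items())}

entries = [
    {"day": 1, "trust": 2}, {"day": 1, "trust": 3},
    {"day": 7, "trust": 4}, {"day": 7, "trust": 5},
    {"day": 14, "trust": 6}, {"day": 14, "trust": 7},
]
print(trust_by_day(entries))  # day-by-day means show whether trust drifts up or down
```

A rising curve from a skeptical baseline may be healthy trust-building; a curve that climbs past the product's actual accuracy is the over-trust drift the diary study is there to catch.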
AI-specific usability heuristics: Nielsen's 10 usability heuristics were developed long before modern AI products existed. Augmented heuristic evaluations for AI should also assess:
- Does the system communicate its confidence appropriately?
- Can users easily correct AI outputs?
- Are AI limitations communicated proactively?
- Does the system explain its reasoning when relevant?
- Is it clear what data the AI is using?
Recruiting the Right Participants
AI products require deliberate attention to the spectrum from AI-skeptical to AI-enthusiastic users.
AI-skeptical users: Users who are hesitant about AI, skeptical of automated outputs, or who have had bad experiences with AI in the past. These users often identify the trust and transparency gaps that enthusiastic users overlook. They represent a significant portion of the eventual user base for most B2B products.
AI-enthusiastic users: Users who are excited about AI, early adopters, and power users. These users identify advanced use cases and feature gaps. They're easy to recruit but overrepresented in most research panels.
AI-neutral users: The majority of eventual users. They have no strong feelings about AI and will use whatever tool solves their problem. These users are hardest to recruit specifically but most important to represent.
Aim for roughly equal thirds in your participant pool. The AI-skeptical users will tell you things you don't want to hear and need to hear.
Measuring Trust
Trust is both the most important and the hardest-to-measure dimension of AI product UX.
Behavioral indicators of trust:
- Does the user verify AI outputs, and if so how?
- Does the user edit AI outputs before using them?
- Does the user abandon the AI and do tasks manually?
- Does the user recommend the AI to colleagues?
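If your product logs interactions, the first three indicators above can be turned into simple per-user rates. The sketch below is illustrative only: the event names ("verified", "edited", "abandoned", "accepted") are assumptions about your instrumentation, not a real schema.

```python
# Hypothetical sketch: deriving behavioral trust indicators from an
# interaction log. Event names are assumed, not a standard taxonomy.
from collections import Counter

def trust_indicators(events):
    """Rates of verification, editing, and abandonment over all logged actions."""
    counts = Counter(e["action"] for e in events)
    total = sum(counts.values()) or 1  # avoid dividing by zero on an empty log
    return {
        "verification_rate": counts["verified"] / total,
        "edit_rate": counts["edited"] / total,
        "abandonment_rate": counts["abandoned"] / total,
    }

log = [
    {"action": "accepted"}, {"action": "verified"}, {"action": "edited"},
    {"action": "verified"}, {"action": "abandoned"}, {"action": "accepted"},
]
print(trust_indicators(log))
```

No single rate is "good" in isolation: a high verification rate may signal healthy caution in a high-stakes task or broken trust in a low-stakes one, so interpret these alongside the qualitative methods above.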
Self-reported trust: Use validated trust-in-automation scales (e.g., Jian, Bisantz & Drury's Trust in Automation scale). These are standardized questionnaires with known psychometric properties, more reliable than ad-hoc "how much do you trust this?" questions.
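Scoring such a scale is mechanical once responses are collected. The sketch below assumes a 12-item, 7-point instrument whose first five items are negatively worded and therefore reverse-coded before averaging; verify the item count and reversed items against the published version of whatever scale you use.

```python
# Hedged sketch: scoring a 12-item, 7-point trust-in-automation questionnaire.
# Which items are reverse-coded (here: the first five) is an assumption to
# check against the published instrument you actually administer.

def score_trust(responses, reversed_items=range(5), scale_max=7):
    """Average of the 12 items after reverse-coding negatively worded ones.

    Returns a score from 1 (low trust) to 7 (high trust)."""
    if len(responses) != 12:
        raise ValueError("expected 12 item responses")
    adjusted = [
        (scale_max + 1 - r) if i in reversed_items else r
        for i, r in enumerate(responses)
    ]
    return sum(adjusted) / len(adjusted)

# One participant's raw item responses (1-7 each).
raw = [2, 3, 1, 2, 2, 6, 5, 6, 7, 5, 6, 6]
print(score_trust(raw))
```

Administering the same scale at multiple points in a diary study lets you compare self-reported trust against the behavioral indicators, which often diverge.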
Calibration accuracy: The most precise measure compares users' confidence ratings of AI outputs with the actual correctness of those outputs. Users who express high confidence in incorrect outputs and low confidence in correct outputs are poorly calibrated. This can be measured in a structured study where you know which outputs are correct.
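In such a structured study, calibration reduces to arithmetic over (confidence, correctness) pairs. The sketch below shows two common summaries — this is one reasonable analysis, not the only one: the Brier score (mean squared gap between confidence and outcome) and a signed calibration gap whose sign distinguishes over-trust from under-trust.

```python
# Illustrative sketch: scoring trust calibration from a structured study.
# Each observation pairs a participant's confidence in an AI output (0-1)
# with whether that output was actually correct (1) or not (0).

def brier_score(observations):
    """Mean squared gap between confidence and correctness; lower = better calibrated."""
    return sum((conf - correct) ** 2 for conf, correct in observations) / len(observations)

def calibration_gap(observations):
    """Mean confidence minus actual accuracy: positive = over-trust, negative = under-trust."""
    mean_conf = sum(conf for conf, _ in observations) / len(observations)
    accuracy = sum(correct for _, correct in observations) / len(observations)
    return mean_conf - accuracy

# One participant's confidence ratings vs. ground-truth correctness.
obs = [(0.9, 1), (0.8, 0), (0.6, 1), (0.95, 0), (0.4, 1)]
print(f"Brier score: {brier_score(obs):.3f}")
print(f"Calibration gap: {calibration_gap(obs):+.3f}")  # positive sign → over-trusting
```

If participants rate confidence on a Likert scale rather than 0-1, normalize the ratings before computing either measure so the gap is interpretable.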
How to Use What You Learn
The most common outcome of AI UX research is discovering that users either over-trust or under-trust the system. The design interventions for each are different:
Over-trust interventions: Add confidence indicators. Surface uncertainty explicitly. Add prompts to verify in high-stakes contexts. Show the sources the AI used. Make it easy to report errors.
Under-trust interventions: Provide accurate comparisons to the manual baseline. Show track record of accuracy. Let the AI explain its reasoning. Start with small, low-stakes tasks where trust can be built through demonstrated accuracy.
Mental model gaps: If users have consistently wrong expectations about AI capabilities, address this in onboarding — not with a feature list, but with task-based examples that demonstrate actual capability.
The goal of AI UX research is not to make the product more usable in the conventional sense — it's to produce users who are well-calibrated in their trust, accurate in their mental models, and effective in their collaboration with the AI. That's a different outcome than standard usability, and it requires the adapted methods described here.