
A History of Product Testing: When Behavioral Economics Changed Everything
In this piece
- The Moment "Do You Like It?" Stopped Being Enough
- How Qualitative Product Research Absorbed the Shock
- Frequently Asked Questions
- How did Kahneman's behavioral economics reshape product testing methodology?
- What's the difference between traditional satisfaction ratings and peak-end rule assessment?
- Why did hedonic adaptation force researchers to abandon simple preference questions?
- How should product researchers account for peak-end bias in longitudinal studies?
For decades, product testing rested on a single flawed assumption: that people can accurately report how much they like something. That assumption went largely unchallenged. Then Daniel Kahneman's work on memory, peak moments, and endings reached commercial research, and the field spent the better part of the 2000s scrambling to catch up.
Key Takeaways
- Kahneman's peak-end rule showed that satisfaction ratings reflect memory of an experience, not the experience itself, a direct challenge to standard product preference questions.
- Hedonic adaptation means repeated exposure to a product flattens emotional response, making early-use ratings unreliable predictors of long-term repurchase.
- Traditional satisfaction scales miss the temporal structure of product experience; longitudinal diary methods were adopted to capture the arc, not just a snapshot.
- Agencies and in-house teams now need qualitative product research that probes emotional texture over time, not just a single moment of evaluation.
The Moment "Do You Like It?" Stopped Being Enough
Before Kahneman's 2002 Nobel Prize put behavioral economics into every MBA syllabus, product testing ran on a stable protocol. You exposed a participant to a product, asked them to rate it, and aggregated the ratings. Central location tests (CLTs), home-use tests (HUTs), and paired comparisons were all built on the premise that stated preference was a reliable proxy for future behavior. The protocol worked well enough that no one questioned the architecture.
Yet Kahneman and his collaborators had already documented the peak-end rule in the 1990s: people judge an experience largely by its most intense moment and by how it ended, not by an integrated average of the whole experience. Applied to product testing, this was disruptive. A cleaning product that delivered a sharp sensory peak at first use and a satisfying "clean smell" finish could outscore a more consistently effective product in a single-session evaluation, even if long-term performance clearly favored the latter. Researchers who had built careers on hedonic batteries and nine-point scales suddenly had a validity problem.
The follow-on problem was hedonic adaptation, documented extensively by researchers at Cornell and Columbia in the early 2000s. Repeated exposure to almost any stimulus, pleasant or unpleasant, pulls the emotional response back toward a neutral baseline. A product that scores a 7.8 in week one might score closer to a 6.5 by week four, not because the product changed but because the respondent stopped noticing it. Single-exposure product tests were producing optimism bias at scale, and launch models built on that data were systematically overestimating how durable preference would be.
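To make the peak-end gap concrete, here is a minimal Python sketch. It assumes one common way of operationalizing the rule (the simple average of the best moment and the final moment), and it treats higher ratings as more intense, which only holds for pleasant experiences. The function name `peak_end_score` and both sets of moment-by-moment ratings are hypothetical, invented purely for illustration.

```python
from statistics import mean


def peak_end_score(moments):
    """Approximate remembered satisfaction as the average of the peak
    moment and the final moment (an assumed, simplified form of the rule)."""
    if not moments:
        raise ValueError("need at least one moment rating")
    peak = max(moments)   # assumption: a higher rating is the more intense moment
    end = moments[-1]     # how the experience finished
    return (peak + end) / 2


# Hypothetical in-the-moment ratings on a nine-point scale, one per use step.
flashy = [8.5, 5.0, 4.5, 5.0, 8.0]   # strong first impression, weak middle, strong finish
steady = [7.0, 7.0, 7.0, 7.0, 7.0]   # consistently good, no standout moment

for name, moments in [("flashy", flashy), ("steady", steady)]:
    print(f"{name}: experienced mean = {mean(moments):.2f}, "
          f"peak-end = {peak_end_score(moments):.2f}")
```

In this toy data the "flashy" product averages 6.2 across its moments but scores 8.25 on the peak-end measure, while the "steady" product sits at 7.0 on both. A single end-of-session rating that tracks memory would rank them the opposite way from the experience itself.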
How Qualitative Product Research Absorbed the Shock
The quantitative side of product testing responded with more sophisticated longitudinal designs: repeated measures, sequential monadic tests, and panel tracking. The qualitative response was slower but ultimately more interesting. Diary methods, which had existed at the margins of consumer research since the 1970s, moved toward the center. If peak-end dynamics mean that a single evaluation captures memory rather than experience, the obvious fix is to capture experience in real time, across multiple moments, before memory has a chance to reshape it.
What changed wasn't just method. It was the questions researchers started asking. "Do you like it?" became "What were you doing when you reached for it?" and "What made you stop using it?" Depth interview design shifted toward emotional texture and behavioral context. Agencies running studies for CPG and personal care clients started building discussion guides that mapped the arc of product experience: first use, habitual use, and the moment satisfaction either solidified or eroded.
The harder problem was scale. A diary study that captures weekly entries over four weeks, followed by a depth interview, is methodologically sound but expensive. A senior qualitative researcher managing eight diarists simultaneously is already near capacity. As a result, most teams either ran these studies with small samples and accepted the limits, or skipped the longitudinal design entirely and returned to the single-session evaluation they knew was inadequate. Automated AI analysis of diary studies has started to change that calculus by handling the transcription, coding, and pattern detection that made longitudinal qual unsustainable at meaningful sample sizes.
The behavioral economics intrusion didn't kill product testing. It forced a more honest accounting of what product experience actually is: something that unfolds, adapts, and is remembered imperfectly. Research designs that ignore that arc are still common. They just have a harder time explaining why the product that aced the CLT stalled in reorder data six months later.
Want to see how Enumerate's AI-moderated diaries can capture the full arc of product experience without the coordination overhead? Book a demo with Enumerate.
Frequently Asked Questions
How did Kahneman's behavioral economics reshape product testing methodology?
Kahneman's peak-end rule revealed that product satisfaction ratings reflect memory of an experience rather than the experience itself. That finding invalidated single-session hedonic batteries as standalone predictors of long-term preference, pushing teams toward longitudinal designs that capture how experience evolves across multiple use occasions.
What's the difference between traditional satisfaction ratings and peak-end rule assessment?
Traditional ratings ask respondents to summarize how much they liked a product at a single moment. Peak-end assessment recognizes that recall is dominated by the most emotionally intense moment and the ending, so a product with a weak middle but a strong finish can outperform a more consistently effective competitor in standard testing.
Why did hedonic adaptation force researchers to abandon simple preference questions?
Hedonic adaptation means emotional response to a repeated stimulus reverts toward neutral over time, regardless of actual product quality. A single-exposure preference question captures novelty response, not durable preference, which is why products that score well in CLT studies sometimes underperform in reorder and repeat-purchase data.
How should product researchers account for peak-end bias in longitudinal studies?
Capture experience in the moment rather than retrospectively. Diary entries logged at or near the point of use are less vulnerable to peak-end distortion than a single end-of-study interview. When a depth interview closes out the study, focus it on the emotional arc and the specific moments that shifted perception, not a global satisfaction summary.