How Do You Actually Study Beauty in a Lab?

Here's a question that kept coming up as I studied the science of beauty: if beauty is this deeply subjective, personal, emotional experience — how do you put it in a lab and measure it?

You can't stick a ruler on beauty. You can't weigh it. You can't draw blood and test for it. And yet researchers have been trying to study it empirically, and some of the results are genuinely illuminating.

But the methods matter. A lot. Because the distance between what beauty is and what a researcher can actually measure is where things get interesting — and where good science separates from hand-waving.

The core challenge is this: every study of beauty has to translate abstract concepts like "the feeling of beauty" or "thought" into concrete measurements. And that translation always loses something.

The gap between what you feel and what you can test

In psychology and cognitive science, there's a fundamental distinction between two things:

Theoretical constructs — the abstract ideas you're actually interested in. Intelligence. Happiness. Beauty. Thought. These are real human experiences, but you can't observe them directly.

Operational definitions — the specific, measurable things you use as proxies. An IQ score. A self-report scale from 1 to 10. A reaction time. These are what you actually collect data on.

The key question is construct validity — does the thing you're measuring actually represent the thing you're interested in?

Some pairings are tight. Height and a ruler. Temperature and a thermometer. The theoretical construct and the operational definition are so close they're almost the same thing.

But in psychology, these pairs drift apart. Sometimes way apart.

How far apart can they get?

Consider intelligence. The theoretical construct is this rich, multidimensional capacity for reasoning, learning, and problem-solving. Now consider some operational definitions:

An IQ test score — reasonable proxy, widely used, but narrow and culturally biased
Reading speed — loosely correlated at best, misses the entire construct
Shoe size — completely unrelated, absurd as a measure

This matters because when you read a study claiming "beauty requires thought," what you really need to ask is: how did they measure "beauty"? How did they measure "thought"? And are those measurements actually capturing what they claim to capture?

The paper that tested Kant

If you wanted one representative paper that gathers all of these questions into a single place, it would be Brielmann and Pelli (2017), published in Current Biology — a well-respected journal. Denis Pelli is a prominent vision researcher. This isn't fringe work.

They set out to test two of Kant's claims empirically:

Beauty requires thought (the "harmonious free play of the imagination")
Sensuous pleasures cannot count as beauty, because they involve desire to possess rather than a form of contemplation detached from possession

The experimental design

Here's what they did. Participants experienced three types of stimuli:

Images — three subcategories: self-selected beautiful images (ones they brought from home), high-positive-valence images from a standardized database, and neutral images (Ikea furniture)
Candy — Jolly Ranchers (taste pleasure)
Teddy bear — soft texture (touch pleasure)

While experiencing these stimuli, participants did one of two things:

No secondary task — just experience the stimulus freely (thought available)
Two-back task — monitor a stream of letters and press a button whenever the current letter matched the one from two positions back (thought reduced)

The two-back task is a standard tool in cognitive psychology. It's demanding enough to consume executive function resources without being impossible. You have to hold recent items in working memory while monitoring for matches — it genuinely occupies your cognitive bandwidth.
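To make the matching rule concrete, here's a minimal Python sketch. The function name and the letter stream are my own illustrative choices, not anything taken from the paper.

```python
def two_back_targets(letters):
    """Return the positions where a letter matches the one two positions back."""
    # The first two letters can never be targets: nothing sits two back from them.
    return [i for i in range(2, len(letters)) if letters[i] == letters[i - 2]]


# Hypothetical letter stream, made up for illustration.
stream = list("ABABCDCE")
print(two_back_targets(stream))  # [2, 3, 6]
```

Even in this tiny stream, you have to keep updating which two letters you're holding in mind on every step, which is why the task eats up working memory.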

What they measured

Two dependent variables:

Pleasure — rated continuously while experiencing the stimulus, using an iPad where participants adjusted the distance between their index and middle fingers. Further apart = more pleasure. Closer together = less.
Beauty — rated after the stimulus on a four-point scale: "Definitely not," "Perhaps not," "Perhaps yes," "Definitely yes"

That continuous pleasure measure is easy to miss — it's described in about one line of the paper. But it's crucial for understanding the results.
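Here's one way to picture the two measures as data. This is a sketch under my own assumptions: the 0 to 3 coding, the centimeter units, and the 0 to 10 cm range are hypothetical, and the paper may have scaled things differently.

```python
# The four response labels, in order from lowest to highest.
BEAUTY_SCALE = ("Definitely not", "Perhaps not", "Perhaps yes", "Definitely yes")


def beauty_code(response):
    """Map a four-point beauty response to an integer from 0 to 3 (assumed coding)."""
    return BEAUTY_SCALE.index(response)


def pleasure_from_spread(spread_cm, min_cm=0.0, max_cm=10.0):
    """Rescale finger spread to 0..1, where wider means more pleasure.

    min_cm and max_cm are hypothetical calibration limits; values outside
    the range are clamped to it.
    """
    clamped = max(min_cm, min(max_cm, spread_cm))
    return (clamped - min_cm) / (max_cm - min_cm)


print(beauty_code("Perhaps yes"))   # 2
print(pleasure_from_spread(7.5))    # 0.75
```

The point of writing it out is that both numbers are proxies: one is a finger position, the other is a word choice.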

The results

The core finding is this: when people were doing the two-back task, meaning available thought was reduced, their pleasure and beauty ratings dropped, but only for stimuli they found beautiful. For neutral stimuli, the task made no difference.
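Statistically, that pattern is an interaction: the effect of the task depends on the type of stimulus. The sketch below shows the arithmetic with numbers I made up purely for illustration. They are not data from the paper, and the 0 to 3 rating scale is my assumption.

```python
# HYPOTHETICAL mean beauty ratings on an assumed 0-3 scale. Not data from the paper.
ratings = {
    ("beautiful", "no_task"): 2.4,
    ("beautiful", "two_back"): 1.8,
    ("neutral", "no_task"): 0.9,
    ("neutral", "two_back"): 0.9,
}


def task_effect(stimulus):
    """Change in mean rating when the two-back task is added."""
    return ratings[(stimulus, "two_back")] - ratings[(stimulus, "no_task")]


# The interaction is the difference between the two task effects.
interaction = task_effect("beautiful") - task_effect("neutral")

print(round(task_effect("beautiful"), 2))
print(round(task_effect("neutral"), 2))
print(round(interaction, 2))
```

A nonzero interaction is what "only for stimuli they found beautiful" means in numbers: the task lowers one rating and leaves the other where it was.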

The researchers' conclusion was simple: beauty requires thought. When you reduce someone's cognitive resources, their experience of beauty diminishes. But their experience of non-beautiful stimuli stays the same.

The second finding: some proportion of participants did report experiencing beauty from sensuous pleasures — the Jolly Rancher candy, the teddy bear. Not everyone, but enough to conclude that sensuous pleasures can occasionally be beautiful, contradicting Kant's second claim.

So: Kant was right that beauty requires thought. But wrong that sensuous pleasures can never cross the beauty threshold.

Why this paper deserves scrutiny

The results are interesting. But the construct validity questions are where it gets really instructive.

Is the two-back task actually measuring "thought"?

This is the biggest stretch in the paper. Kant talked about the "harmonious free play of the imagination." That's a rich, poetic, deliberately vague concept about how the mind plays with ideas, forms, and feelings in an unconstrained way.

The researchers operationalized this as: executive function, measured by whether or not you're doing a two-back task.

There's a long game of telephone here. From Kant's original concept to the operational definition, each step loses something. The two-back task doesn't reduce "harmonious imagination" — it reduces attentional bandwidth. Those might overlap, but they're not the same thing.

As one student pointed out in discussion: "I feel like the operational definitions are really far removed. It might be more of an attention thing than a thought thing." That's a legitimate critique. The task manipulates attention and working memory, which are components of executive function, which is one interpretation of "thought," which is in turn one interpretation of Kant's original phrase.

Is the iPad measure actually measuring pleasure?

The continuous finger-distance measure is creative but unusual. People adjust how far apart their fingers are while simultaneously experiencing a stimulus and (in one condition) doing a demanding cognitive task. There are reasonable concerns:

When distracted by the two-back task, are people just defaulting to a middle position because they're too busy to adjust?
Is the reduced pleasure score a genuine reduction in pleasure, or just reduced attention to the measurement task?
Could the average reflect a blend of task displeasure and stimulus pleasure rather than pure stimulus pleasure?

Stephen Palmer — a prominent vision researcher whose paper on visual aesthetics we'll encounter later in this series — raised a version of this concern: maybe the ratings are just an average of the unpleasant task experience and the pleasant stimulus experience.
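The averaging concern is easy to express as a toy model. In this sketch the reported rating is a weighted blend of how the stimulus feels and how the task feels. The function, the 0 to 1 pleasure scale, the weight, and every number are hypothetical, chosen only to show the arithmetic.

```python
def blended_rating(stimulus_pleasure, task_pleasure, task_weight):
    """Weighted average of stimulus and task pleasure (toy model, not from the paper)."""
    if not 0.0 <= task_weight <= 1.0:
        raise ValueError("task_weight must be between 0 and 1")
    return (1 - task_weight) * stimulus_pleasure + task_weight * task_pleasure


# HYPOTHETICAL values on an assumed 0-1 pleasure scale.
beautiful, neutral, task = 0.8, 0.5, 0.5

print(round(blended_rating(beautiful, task, 0.5), 2))  # beautiful stimulus, with task
print(round(blended_rating(neutral, task, 0.5), 2))    # neutral stimulus, with task
```

In this toy setup the blended rating for the beautiful stimulus falls below 0.8, while the neutral rating stays at 0.5. But the neutral rating is unchanged only because I set the task's value equal to the neutral stimulus; a lower task value would pull both ratings down.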

Is a four-point scale measuring beauty?

Self-report on a four-point scale ("Definitely not" to "Definitely yes") is a common and generally accepted method. But when you're asking someone to evaluate whether they experienced beauty from a Jolly Rancher versus a painting they brought from home, the word "beauty" might mean very different things in those two contexts.

When comparing within a category (which painting is more beautiful?), self-report works fine. When comparing across categories (is this candy as beautiful as this sunset?), it gets murkier. What does "beauty" mean when applied to candy?

What we can and can't conclude

Set the construct validity concerns aside for a moment, and the paper still shows something important:

When cognitive resources are reduced, the positive experience associated with stimuli people find beautiful is diminished, while the experience of non-beautiful stimuli is unaffected.

That's a more careful statement than "beauty requires thought." But it's still interesting. It suggests that whatever beauty is, it's not a passive sensory reflex — it requires some form of active processing that can be disrupted.

The sensuous pleasure finding is also noteworthy. Even if the proportion of people calling a Jolly Rancher "beautiful" is small, it isn't zero. And it often seemed tied to personal meaning — a candy that triggers a childhood memory, a texture that feels like comfort. In other words, beauty can enter sensory experience through that layer of personal significance.

Why music hits different

Across aesthetic domains, music seems to be the place where people most strongly feel that their judgment is more than just personal preference.

When people were asked about aesthetic disagreements that genuinely bothered them, music dominated the responses. Not paintings, not fashion, not architecture. Music.

Why? A few possibilities came up:

Identity signaling: music preferences signal who you are more strongly than other aesthetic domains. Band T-shirts, playlists, and concert attendance are social markers, so disagreement about music feels like disagreement about identity.
Stronger objectivity illusion: people seem to feel more confident that their music taste is "correct" than they do about, say, their taste in paintings. This might fade with age and exposure to diverse genres.
Cultural cohesion: shared music creates a shared tribe. Threatening someone's musical preferences threatens their group identity.

This isn't well-studied, but it's an interesting empirical pattern that emerges from actual classroom data. When you poll a room, music consistently produces the strongest responses of "this isn't just my taste; this feels actually right."

The liking/wanting distinction keeps coming back

When I step back from the discussion responses, one pattern stands out: most people intuitively feel that liking has to come before wanting. It's hard to think of examples where you want something you don't like.

But the neuroscience tells a different story. Liking and wanting are served by distinct neural pathways: opioid-sensitive hedonic hotspots for liking, and a dopamine-driven system for wanting.


Drug addiction is the clearest case of wanting without liking. As addiction progresses, the hedonic hotspots can go nearly quiet, meaning the person barely experiences pleasure from the drug anymore, while the dopamine-driven wanting pathway grows stronger than ever. The craving intensifies while the pleasure fades.

The mirror image is what Kant described as liking without wanting: standing before a sunset and feeling moved without any desire to possess, consume, or acquire it. The hedonic system is active; the dopamine drive system is quiet.

Neuroscience suggests these two systems not only differ in brain circuitry but can also be separated chemically:

Naloxone (an opioid receptor blocker): reduces the pleasure of eating but doesn't reduce the desire to eat. Liking down, wanting preserved.
Dopamine cell destruction (in animal models): rats stop seeking food but still show pleasure responses when food touches their tongue. Wanting down, liking preserved.

The intuition that liking must precede wanting is understandable — in everyday life, they usually travel together. But the neuroscience shows they're separate systems that can uncouple. And that uncoupling is precisely what aesthetic experience might be.

A few things I'm taking away

The gap between theoretical constructs (beauty, thought, pleasure) and operational definitions (self-report scales, two-back tasks, finger distance) is where the hardest questions in aesthetics research live
Construct validity — whether your measurement actually captures what you think it does — should be the first thing you evaluate when reading any study on beauty
Brielmann and Pelli's paper is methodologically solid, but the chain from Kant's "harmonious free play of the imagination" to "no two-back task" involves several interpretive leaps
The core finding holds up: reducing cognitive resources reduces the positive experience of beautiful stimuli without affecting neutral ones — beauty involves active processing
Sensuous pleasures can occasionally be rated as beautiful, especially when personal meaning is involved — the boundary between pleasure and beauty is blurrier than Kant thought
Music is the aesthetic domain where people seem least likely to experience their judgment as "just personal taste," possibly because music is so tightly tied to identity
Liking and wanting are neurally distinct systems that usually travel together but can uncouple — addiction is wanting without liking, and aesthetic appreciation may be liking without wanting
The best behavioral experiments in cognitive science don't always need brain scans — fMRI can introduce its own problems with spurious correlations and presupposes we know what brain areas represent beauty
When a study claims to measure "beauty" or "thought," always ask: what did they actually measure, and how far is that from the thing they claim to have studied?

And the meta-lesson: studying beauty scientifically is hard. Not because beauty isn't real — it clearly is — but because the instruments we have to measure it are blunt compared to the richness of the experience. That doesn't mean we shouldn't try. It means we should be honest about what our measurements can and can't tell us.

The ruler measures height perfectly. The four-point beauty scale measures... something. And the gap between those two is where intellectual humility lives.

Sources:

Brielmann, A. A., & Pelli, D. G. (2017). Beauty requires thought. Current Biology, 27(10), 1506-1513.
Kant, I. (1790). Critique of Judgment. Trans. Werner Pluhar.
Berridge, K. C., & Robinson, T. E. (2003). Parsing reward. Trends in Neurosciences, 26(9), 507-513.
Palmer, S. E., Schloss, K. B., & Sammartino, J. (2013). Visual aesthetics and human preference. Annual Review of Psychology, 64, 77-107.
Chatterjee, A. (2014). The Aesthetic Brain: How We Evolved to Desire Beauty and Enjoy Art. Oxford University Press.
