Art vs Science: UX research in the age of the reproducibility crisis

In tech in recent years, it feels a bit like everybody and their dog has hopped on the ‘scientific method’ bandwagon. Design Thinking, Agile and UX Research methodologies are all modeled on this approach. But science itself is in the grips of an existential crisis, embodied by a series of failed attempts to reproduce existing research (see ‘The Reproducibility Project’ for context). Against this backdrop, it’s a timely moment to ask: should UX research be reproducible?

John’s intro covered the discussions he and Laura had been having around the “reproducibility crisis” in science, where researchers are unable to recreate the results of published experiments.

This story begins with Amy Cuddy’s TED talk about body language, with its very bold claims about ‘power poses’ and the impact the technique could have on people’s lives. The video blew up massively; people went crazy for it.

Meanwhile, people were asking whether enough work was being done to reproduce the results of studies – or, to put it another way, to cross-check them. The Reproducibility Project was born, and in one high-profile example it could not replicate Cuddy’s work. Her TED talk was torn down, becoming shorthand for flashy social psychology work that could not be replicated.

In science there are some key problems that lead to the reproducibility crisis:

  • publish or perish – the pressure to publish leads to a very high volume of studies and papers
  • no replication studies – nobody wants to fund replication studies over new research
  • clickbait – surprising, headline-friendly results are more likely to be published and shared

P-hacking describes researchers manipulating their work so that results come in just under the threshold where they count as “statistically significant” (conventionally p < 0.05) – whether or not they even realise they’re doing it.
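To make that concrete, here’s a minimal simulation (mine, not from the talk) of one common form of p-hacking: ‘optional stopping’, where you peek at the p-value as data comes in and stop the moment it dips under 0.05. The two groups are drawn from the same distribution, so any ‘significant’ result is a false positive – yet peeking pushes the false-positive rate well above the nominal 5%.

```python
# Simulation sketch: optional stopping, one common form of p-hacking.
# Both groups come from the same distribution, so there is no real effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
runs, false_positives = 1000, 0

for _ in range(runs):
    a, b = [], []
    for _ in range(20):                  # up to 20 "peeks" at the data
        a.extend(rng.normal(0, 1, 10))   # 10 more samples per group per peek
        b.extend(rng.normal(0, 1, 10))   # same distribution: no true effect
        if stats.ttest_ind(a, b).pvalue < 0.05:
            false_positives += 1         # stop as soon as it looks "significant"
            break

# A single fixed-size test would be wrong ~5% of the time; peeking
# inflates that considerably.
print(f"false-positive rate with peeking: {false_positives / runs:.0%}")
```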

So what does this mean for UX research?

  • “prove me right!”
  • demanding shortcuts or early results
  • preferring quant data over qual data

UX research is trying to reduce business risk – so does reproducibility even matter? Well, we come from science: we are science lite (said with love!). The double diamond, for example, is a lift straight from the scientific method.

We also have to remember that scientific ideas get debunked but live on. Myers-Briggs is a classic case – it is astrology for people who should be too smart to rely on astrology.

Method/problem mismatches – people use the wrong method for the problem, eg. using UX testing to try to work out product/market fit. We have to question what we’re doing and why we’re doing it.

UX testing works out whether people can use the tool – not whether they will, or whether they want to.

Should UX research be reproducible? Sometimes, it depends!

Reproducibility is impacted by…

  1. Experiment design – model and methodology, sample size and selection (see the sample-size sketch after this list)
  2. Data capture – would the same study get the same results if run again?
  3. Interpretation – if I looked at the same data as an earlier study, would I reach the same conclusions?
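To make point 1 concrete, here’s a sketch (my own, with made-up numbers) of the kind of power calculation that ties sample size to experiment design, using statsmodels. The scenario – lifting a 10% conversion rate to 12% – is purely illustrative.

```python
# Sketch with hypothetical numbers: how many users per variant would we
# need to detect a lift from 10% to 12% conversion, at alpha = 0.05 and
# 80% power?
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect = proportion_effectsize(0.12, 0.10)   # Cohen's h for 10% -> 12%

n = NormalIndPower().solve_power(effect_size=effect, alpha=0.05,
                                 power=0.8, alternative="two-sided")
print(f"~{n:,.0f} users per variant")        # roughly 3,800 per group
```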

UX has a tension between qual and quant work. Qual is almost never reproducible, but it’s still valid. It’s unlikely multiple rounds of UX research would produce the same data; nor would people always draw the same conclusions.

So what can we do?

  • If you’re crunching numbers, practice research design hygiene and understand the strength of your signal. The slides link to lots of tools to help with this, like AB Testguide – there’s a minimal sketch of that kind of check after this list.
  • Embrace uncertainty, just as scientists do. Don’t speak in absolutes. Cultivate your curiosity. Reward “I don’t know” – as people become more senior, they stop feeling comfortable saying they don’t know something.
  • Define your study before you start (in science this is called pre-registration, and it enables people to get feedback before investing time and resources).
  • Plan for no conclusion – sometimes you don’t get clear results. You have to be able to admit it.
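On the first point, the kind of check tools like AB Testguide automate is essentially a two-proportion z-test. Here’s a minimal sketch with invented counts (the statsmodels call is real; the numbers are not from any study):

```python
# Sketch of the significance check behind A/B test calculators: a
# two-proportion z-test on made-up conversion counts.
from statsmodels.stats.proportion import proportions_ztest

conversions = [130, 165]     # variant A, variant B (illustrative numbers)
visitors    = [2400, 2400]

z, p = proportions_ztest(conversions, visitors)
print(f"z = {z:.2f}, p = {p:.3f}")
# A small p-value only helps if sample size and stopping rule were fixed
# in advance -- otherwise we're back to p-hacking.
```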
When is a study a good candidate for replication?

Good:

  • you expect the effect to be consistent over time
  • the hypothesis is important to business model functioning
  • the product has been consistent since the last study

Poor:

  • you don’t expect the effect to be consistent
  • low risk to the business model, or low business priority
  • UX or UI in flux; too many variables to control for

Get fresh eyes on your data – see what comparing interpretations can bring.

Power ups:

  • fact check – if someone claims that a study shows something, go have a look at it. Even just reading the abstract will often be enough. Develop your sniff test.
  • open UX – perhaps we need open data, open results, FOSS style? Should we be doing meta research using data sets from different companies and the research they are doing?
  • continuous discovery – doing one-shot studies leaves value behind. What would it look like if we ran the same research across a timeline and started looking at long-term trends? (A toy sketch follows.)
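As a toy illustration of that last idea (invented data, not the speaker’s), the same metric tracked across repeated rounds of research can be read as a trend rather than as isolated snapshots:

```python
# Toy sketch with invented data: the same task-success metric measured in
# six consecutive research rounds, fitted as a trend instead of read as
# one-off snapshots.
from scipy import stats

round_no     = [1, 2, 3, 4, 5, 6]
task_success = [0.62, 0.60, 0.68, 0.71, 0.70, 0.76]  # hypothetical rates

trend = stats.linregress(round_no, task_success)
print(f"trend: {trend.slope:+.3f} per round (p = {trend.pvalue:.3f})")
```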

Recommendations

  • Build culture for continuous learning
  • Embrace uncertainty
  • Replication studies (maybe)
  • Spend more time on research design & analysis
  • Uncertain results are ok

In short, be sceptical, pick a good question, and try to answer it in many ways. It takes many numbers to get close to the truth. – source

@summerscope | slides