Oct 03 2016

Doubt About Power Poses

In 2012 psychologist Amy Cuddy gave a TED talk about “power poses.” This was the result of her recent research, which found that adopting “expansive” postures, such as standing with your hands on your hips, makes you feel more powerful, and this feeling translates into action, such as taking more risks.

Cuddy and her coauthors are serious psychological researchers, but the result was pop-psychology and self-help gold. The self-help industry in particular loves tricks it can claim will help people succeed at some goal, because the public wants such tricks – the easy shortcut that reaches a goal without all the hard work, or that gives you an edge over others.

But of course we have to ask the hard question: is the effect real?

Power Poses and P-Hacking

Recently one of Cuddy’s coauthors, Dana Carney, published a statement in which she details why she no longer believes that the power pose effect is real. (There are a number of individual effects here, including internal psychological changes, behavioral changes, and outcome changes, but for brevity I will simply refer to power poses, also called expansive postures, and the power pose effect.)

This is a remarkable admission, and seems to reflect changing standards in psychological research over the last 5 years. It is clear from her statement that Carney read and internalized the 2011 paper by Simmons, Nelson, and Simonsohn in which they describe researcher degrees of freedom and how they result in p-hacking.

Briefly, researchers need to make choices about the details of their research: how many subjects to study, which variables to consider, which comparisons to make, and which statistical analysis to use. If they look at any data prior to making these choices, or revise their choices in response to the data, that is likely to result in p-hacking.

The term “p-hacking” refers to the p-value, which is the probability of obtaining data at least as extreme as what was observed, assuming the null hypothesis is true. If, for example, a p-value threshold of 0.05 is used for statistical significance, a researcher may keep collecting data until the p-value dips below 0.05, and then stop. That is p-hacking.
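To make the inflation concrete, here is a minimal Python sketch – my own illustration, not the analysis from any of the actual studies. Both groups are drawn from the same distribution, so there is no real effect and every “significant” result is a false positive; the batch size, number of batches, and 0.05 threshold are arbitrary assumptions.

```python
# Sketch of optional stopping ("peeking") under the null hypothesis.
# Any "significant" result here is a false positive by construction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 5_000
batch_size, n_batches, alpha = 10, 5, 0.05

def false_positive_rate(peek: bool) -> float:
    hits = 0
    for _ in range(n_experiments):
        a = np.empty(0)
        b = np.empty(0)
        significant = False
        for _ in range(n_batches):
            a = np.concatenate([a, rng.normal(size=batch_size)])
            b = np.concatenate([b, rng.normal(size=batch_size)])
            if peek and stats.ttest_ind(a, b).pvalue < alpha:
                significant = True      # stop as soon as p < .05
                break
        if not peek:                    # analyze only once, at the final N
            significant = stats.ttest_ind(a, b).pvalue < alpha
        if significant:
            hits += 1
    return hits / n_experiments

print(f"test once at the final N : {false_positive_rate(peek=False):.3f}")  # ~0.05
print(f"peek after every batch   : {false_positive_rate(peek=True):.3f}")   # well above 0.05
```

Running this, the fixed-N strategy produces false positives at roughly the nominal 5 percent rate, while peeking after every batch pushes the rate well above it – and the more often you peek, the worse it gets.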

To give a simple analogy, consider the odds of flipping heads on a fair coin at least 60% of the time. You can test this honestly by doing a predetermined 10 flips in a row and checking the result. What if, instead, you kept flipping until you got heads 60% of the time? That would dramatically increase your odds. You are just taking a drunken walk until you happen to stumble over the finish line, and then declaring victory.
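A quick simulation makes the point (the 200-flip cap on the “keep flipping” strategy is an arbitrary assumption on my part):

```python
# Sketch of the coin-flip analogy: a fair coin and the goal of "at least 60% heads."
import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000
max_flips = 200                 # how long the "keep flipping" strategy persists

fixed_wins = 0                  # >= 60% heads after exactly 10 predetermined flips
keep_flipping_wins = 0          # declare victory the first time the running rate hits 60%

for _ in range(n_trials):
    flips = rng.integers(0, 2, size=max_flips)   # 1 = heads, 0 = tails
    if flips[:10].sum() >= 6:
        fixed_wins += 1
    running_rate = np.cumsum(flips) / np.arange(1, max_flips + 1)
    if np.any(running_rate[9:] >= 0.6):          # check at every n >= 10
        keep_flipping_wins += 1

print(f"predetermined 10 flips : {fixed_wins / n_trials:.2f}")          # ~0.38
print(f"flip until 60% heads   : {keep_flipping_wins / n_trials:.2f}")  # noticeably higher
```

The fixed test succeeds about 38 percent of the time; letting yourself stop whenever the running proportion crosses 60 percent succeeds noticeably more often, and the longer you are willing to keep flipping, the higher it climbs.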

Carney is essentially arguing that she and her co-authors were engaging in p-hacking in the original research without realizing it. She writes:

Initially, the primary DV of interest was risk-taking. We ran subjects in chunks and checked the effect along the way. It was something like 25 subjects run, then 10, then 7, then 5. Back then this did not seem like p-hacking. It seemed like saving money (assuming your effect size was big enough and p-value was the only issue).

Yep, that’s p-hacking. Checking the data along the way is basically cheating. As Carney says, however, it is often done naively, as a way to save money. You might think this is a way to demonstrate your effect with the smallest number of subjects – you can just stop once you reach statistical significance.

Carney also admits to looking at multiple variables and then publishing only the ones that “work” and also using multiple statistical methods and choosing the one that was positive, even though the other method was more appropriate. Further, the sample sizes they used were “tiny” and the effect sizes were small. Those are all the ingredients of a false positive result.
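As a rough illustration of the multiple-variables problem (the five outcome measures and the 15 subjects per group are invented numbers, not the parameters of the actual studies), testing several outcomes under the null and reporting whichever one “works” inflates the false-positive rate far beyond 5 percent:

```python
# Sketch of selective reporting: several outcome measures, none with a real effect,
# but only the best p-value gets published.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 5_000
n_per_group, n_outcomes, alpha = 15, 5, 0.05

false_positives = 0
for _ in range(n_experiments):
    # each outcome is an independent measure with no true group difference
    pvals = [
        stats.ttest_ind(rng.normal(size=n_per_group),
                        rng.normal(size=n_per_group)).pvalue
        for _ in range(n_outcomes)
    ]
    if min(pvals) < alpha:          # report whichever outcome "worked"
        false_positives += 1

print(f"false-positive rate: {false_positives / n_experiments:.2f}")  # ~0.23, not 0.05
```

With five independent tests the expected rate is 1 - 0.95^5 ≈ 0.23; combine that with optional stopping, flexible analysis choices, and tiny samples, and the false-positive rate climbs much higher still.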

Looking back, she says:

“I do not have any faith in the embodied effects of “power poses.” I do not think the effect is real.”

It certainly seems from reading her statement that she had an epiphany (probably from reading the Simmons, Nelson, and Simonsohn paper) and realized the mistakes she had been making in the past.

Her conversion to power pose skeptic was completed after multiple attempts at replicating the effect failed. Replication is often the ultimate arbiter. Replications have less p-hacking because many, if not all, of the decisions are already made.

The controversy, however, is not over. NPR asked Amy Cuddy for a response to Carney’s admissions. You can read her full response here, in which she stands by the conclusions of her research and argues that it has been replicated. I don’t have access to a lot of this research myself, and both sides refer to papers not yet published, but I find that Carney makes the more compelling case. It seems that the older studies were methodologically weaker, and the more recent and more rigorous studies are trending negative.

NPR also interviewed Simonsohn (of the researcher degrees of freedom paper) who agrees with Carney that the research on power poses is turning negative.

He makes a few points I have made here often. Specifically, that researchers should not be talking to the public about an effect until it reaches the textbook level of confirmation. Otherwise the public is treated to a host of claims, presented as solid scientific findings, that will later be abandoned. This confuses the public and erodes their confidence in scientists.

Conclusion

The entire power poses episode is fascinating. It is a real-life case of p-hacking, and of the power that comes from understanding what p-hacking really is.

Dana Carney individually has gone through a process that I hope reflects the broader research community. She began by naively engaging in p-hacking without even realizing it, and later was able to see the flaws in her own research. According to Simonsohn, standards in psychological research methods have become more rigorous over just the last 5-6 years as a result. Researchers avoid p-hacking and want larger sample sizes and larger effect sizes.

If this is true, that is a fantastic trend.

It is also interesting to see the parallels between this research and dubious fields, such as ESP. ESP research is also plagued with p-hacking, small effect sizes, and failures to replicate. The one major difference is that the ESP researcher community has not had an epiphany. They are still wallowing in p-hacking and fake effects.

 
