The Social Science Prediction Platform was just analyzed to understand how good researchers are at predicting study effect sizes. They don't do a good job of it 🧵 Researchers routinely overestimate how large their effects will turn out!
When you compare what researchers predict (b) and what they find (a), the predictions are simply much larger than what actually gets found. And the chart below may even oversell prediction accuracy, since the correlation between predictions and results is a sizable, but not confidence-inspiring, 0.453.
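To make the comparison concrete, here's a minimal Python sketch of the two quantities involved, mean overestimation and the prediction-result correlation. The numbers are made up for illustration, not the paper's data:

```python
import numpy as np

# Hypothetical paired predicted and observed standardized effect sizes
# (illustrative values only, not taken from the paper).
predicted = np.array([0.45, 0.30, 0.60, 0.25, 0.50, 0.35])
observed  = np.array([0.15, 0.10, 0.40, 0.05, 0.20, 0.30])

# Mean overestimation: how much larger predictions are than results, on average.
print("mean overestimation:", np.mean(predicted - observed))

# Pearson correlation between predictions and results;
# in the paper this comes out around 0.453, i.e. predictions track results only loosely.
print("correlation:", np.corrcoef(predicted, observed)[0, 1])
```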
As a qualifier on that result, there's relatively less misestimation for RCT results and relatively more for non-RCT results. But, interestingly, the absolute degree of misestimation is the same for both.
What factors modified prediction accuracy? The most powerful was the wisdom of crowds: groups of people decisively outperformed individuals! Also, academics beat non-academics, paid predictor panelists beat non-panelists, and confidence was nonlinearly related to worse accuracy.
Confident people are, in general, less accurate. But comparing the unconfident to those at median confidence, there's no difference; the pattern only shows up once you get into high confidence.
The reason is that the highly confident predict larger effect sizes, though it's not clear why.
More interestingly, it's between-person confidence that's correlated with lower accuracy; within-person confidence is correlated with higher accuracy. That is, when you look at a given person over time, their more confident predictions are their better ones!
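For anyone who wants to see what that between/within split means operationally, here's a minimal sketch of person-mean centering, the standard way to separate the two components. The data, variable names, and model are hypothetical stand-ins, not the paper's:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per prediction, with the predictor's id,
# their stated confidence, and the absolute prediction error.
rng = np.random.default_rng(0)
n_people, n_preds = 200, 5
df = pd.DataFrame({
    "person": np.repeat(np.arange(n_people), n_preds),
    "confidence": rng.uniform(0, 10, n_people * n_preds),
})
# The error column would come from real data; here it's just noise.
df["abs_error"] = rng.uniform(0, 1, len(df))

# Person-mean centering: split confidence into a between-person part
# (each person's average confidence) and a within-person part
# (how far a given prediction departs from that person's own average).
df["conf_between"] = df.groupby("person")["confidence"].transform("mean")
df["conf_within"] = df["confidence"] - df["conf_between"]

# Regress error on both components. The pattern described above would show up
# as a positive coefficient on conf_between (more confident people err more)
# and a negative coefficient on conf_within (a person's more confident
# predictions err less).
model = smf.ols("abs_error ~ conf_between + conf_within", data=df).fit()
print(model.params)
```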
Lots of other factors played small but notable roles in prediction accuracy, and I definitely recommend reading the paper to learn more. But the main takeaway is that, overall, people still aren't very good at predicting science.
In a sense, this is a good thing: if everything could be perfectly predicted, we wouldn't need to do research in the first place. In another sense, it's a bad thing, largely because of the specifics: researchers are overconfident and they seem to overhype results.
In yet another sense, it's really informative, and it supports points I've made elsewhere. For example, one argument offered in defense of the excess of p-values in the literature sitting right at the edge of significance is that researchers "predicted" their results would land there. Not true!
The argument goes that researchers did a power analysis, which requires picking some hopefully realistic effect size, and thus their results are expected to be just barely significant. But they're not: if you have 80% power, most of your p-values land far from the significance threshold.
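A quick simulation makes the point. The design here (two-sample t-test, true d = 0.5, n = 64 per group) is an illustrative choice that gives roughly 80% power; it isn't anything from the paper:

```python
import numpy as np
from scipy import stats

# Simulate many studies run at ~80% power and look at where the p-values fall.
rng = np.random.default_rng(0)
d, n, sims = 0.5, 64, 10_000

pvals = np.empty(sims)
for i in range(sims):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(d, 1.0, n)
    pvals[i] = stats.ttest_ind(treated, control).pvalue

print(f"power (p < .05):          {np.mean(pvals < 0.05):.2f}")   # ~0.80
print(f"median p-value:           {np.median(pvals):.4f}")        # ~0.005, far below .05
print(f"share of p in [.01, .05): {np.mean((pvals >= 0.01) & (pvals < 0.05)):.2f}")  # a modest minority
```

With these settings the median p-value comes out around 0.005, an order of magnitude below the threshold, and only about a fifth of p-values land in the just-significant band, which is why a pile-up of results just under .05 can't be explained by well-powered, well-predicted designs.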
No one can predict where a p-value will land without much more precise knowledge of treatment effects, variances, and so on, and that knowledge simply isn't available. People making this argument are defending the impossible, and the fact that researchers predict treatment effects poorly underscores the point.