Artifice or Intelligence?

Report your statistical analysis plan before seeing any data

photo by Karen Laårk Boshoff at

Every time you HARK [hypothesize after the results are known, then portray these hypotheses as a priori], you risk claiming the implausible: that you reproduced a real signal where there was in fact only noise.


In Upstate New York, my college roommate set up a dartboard in our dorm room. A good friend — let’s call them “James in NY” — would often stop by. James loved to show off as a would-be darts ace.

James would square up at the line of tape on the floor. They’d carefully assess the dartboard, the dart in their hand, and the trajectory between the two. Finally, with a dramatic wind up, James would begin, “Watch, as I deftly hit …”

James would then pitch the dart into the air with a theatrical flourish, maybe even a muffled athletic grunt. Upon seeing it land decisively within the triangular band stretching from the bullseye to the outer arc marked “16”, James would triumphantly conclude, “… the six!” (This is slang for “16” in the dart game “cricket”.)

If you walked in just then, you might marvel at James’ dart-throwing prowess and accuracy. I mean, they called the shot fair-and-square, right? “Watch, as I deftly hit the six!”

HARK — The Texas Sharpshooter

In science, research, and statistics, James is called the Texas Sharpshooter. This trickster is known for “hypothesizing after the results are known” (HARKing) while claiming that the hypotheses were formed before the results were known. HARKing often arises from “researcher degrees of freedom”, and commonly goes hand in hand with p-hacking and data fishing (also called data snooping).

This is bad because it makes it seem like you’re simply finding evidence that supports your original hypothesis, when in fact you’re finding possible evidence for a new hypothesis, one that needs fresh evidence from a separate study before it can be considered supported: “I hit the six. Maybe I’m good at hitting sixes? Let me try again.”
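To make the cost of calling the shot after the throw concrete, here is a minimal simulation sketch in Python. It is an illustration under stated assumptions, not anything from the original study of HARKing: every "study" measures 20 outcomes that are pure noise, each outcome is checked with a simple two-sided z-test at the 5% level (known standard deviation of 1), and the names `z_statistic` and `simulate` are made up for this example. The "planned" analyst tests only the outcome chosen in advance; the "HARKed" analyst reports whichever outcome looks best as if it had been the plan all along.

```python
import random
from math import sqrt


def z_statistic(sample):
    """z statistic for H0: mean == 0, assuming a known standard deviation of 1."""
    n = len(sample)
    return (sum(sample) / n) * sqrt(n)


def simulate(n_studies=2000, n_outcomes=20, n_per_outcome=30, z_crit=1.96, seed=0):
    """Return the false-positive rates of a planned test and a HARKed test.

    All outcomes are pure noise, so every 'significant' result is a false alarm.
    """
    rng = random.Random(seed)
    planned_hits = 0
    harked_hits = 0
    for _ in range(n_studies):
        abs_z = [
            abs(z_statistic([rng.gauss(0.0, 1.0) for _ in range(n_per_outcome)]))
            for _ in range(n_outcomes)
        ]
        # Planned: the hypothesis (outcome 0) was fixed before seeing any data.
        if abs_z[0] > z_crit:
            planned_hits += 1
        # HARKed: pick the most striking outcome afterwards and present it
        # as though it had been the hypothesis from the start.
        if max(abs_z) > z_crit:
            harked_hits += 1
    return planned_hits / n_studies, harked_hits / n_studies


planned_rate, harked_rate = simulate()
print(f"planned false-positive rate: {planned_rate:.3f}")
print(f"HARKed false-positive rate:  {harked_rate:.3f}")
```

Under these assumptions the planned rate should hover near the nominal 5%, while the HARKed rate should land near 1 - 0.95^20, or roughly 64%: most noise-only studies yield a "six" to brag about if you name the target after the dart lands.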

You talk about insights you just discovered as if you already knew them to begin with, and are now simply reproducing them.

This common confusion makes a scientific finding seem more real and reproducible than it actually is. You may unwittingly do this, for example, whenever you write your statistical analysis plan (SAP) or methods section after you’ve already started examining or analyzing your data.

That is, even in the best cases, it’s easy to forget or fail to notice how you changed your initial hypotheses simply by checking your preliminary findings, or by viewing working plots or data visualizations. This “hypothesis creep” means you were slowly but surely tailoring your ideas to fit your study data, rather than using your study data to test or confirm your original ideas.

And these empirically updated hypotheses are what you wound up presenting in your SAP or methods section as your initial hypotheses! You tried to “let the data speak for themselves” — but they took a yard after you gave them that inch.

In machine learning, a version of HARKing can produce a subtle type of statistical overfitting (i.e., finding structure in variation where there is none, or signal where there is only noise).

  • “Watch, as I deftly hit …” Suppose you fit, cross-validate, and select a model with your training data. You then test your model on your holdout data, and calculate its holdout performance.
  • “… the six!” You feel you can improve this holdout performance, and decide to adjust your original modeling and cross-validation parameters. You then re-fit, cross-validate, and select a new model with your training data. You test your new model on that same holdout data, notice that its holdout performance is improved, and are now satisfied.

Unfortunately, you’ve just optimized your model to fit both your training and holdout data. The holdout set no longer plays the role of unseen data, so its score overstates the model’s true out-of-sample performance (i.e., its ability to generalize to new data), which may even have worsened. You forget you did this, and in summarizing your results, you fail to report that you “tuned your model (hypothesized) after the holdout fit (result) was known”.
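The two steps above can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the labels are fair coin flips unrelated to any feature, a "model" is a one-feature rule, "tuning" means trying a different feature, and the helper names (`make_data`, `fit`, `accuracy`, `run_tuning`) are hypothetical. Because there is no signal at all, any holdout accuracy noticeably above 50% comes purely from reusing the holdout set to choose among models.

```python
import random


def make_data(rng, n_rows, n_features):
    """Binary features and binary labels, all independent coin flips (no signal)."""
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
    y = [rng.randint(0, 1) for _ in range(n_rows)]
    return X, y


def fit(X, y, feature):
    """'Train' a one-feature rule: predict the feature as-is, or flipped,
    whichever agrees with the training labels more often."""
    agree = sum(1 for row, label in zip(X, y) if row[feature] == label)
    flip = agree < len(y) / 2
    return feature, flip


def accuracy(model, X, y):
    """Fraction of rows where the rule's prediction matches the label."""
    feature, flip = model
    correct = sum(1 for row, label in zip(X, y) if (row[feature] ^ flip) == label)
    return correct / len(y)


def run_tuning(n_features=50, n_train=200, n_holdout=100, n_fresh=5000, seed=1):
    rng = random.Random(seed)
    X_train, y_train = make_data(rng, n_train, n_features)
    X_hold, y_hold = make_data(rng, n_holdout, n_features)
    X_new, y_new = make_data(rng, n_fresh, n_features)  # truly unseen data

    # "Watch, as I deftly hit ...": one pre-chosen model, holdout used once.
    first_model = fit(X_train, y_train, feature=0)
    first_holdout = accuracy(first_model, X_hold, y_hold)

    # "... the six!": keep adjusting, re-fitting, and re-checking the SAME holdout.
    best_model, best_holdout = first_model, first_holdout
    for feature in range(1, n_features):
        model = fit(X_train, y_train, feature)
        score = accuracy(model, X_hold, y_hold)
        if score > best_holdout:
            best_model, best_holdout = model, score

    fresh_accuracy = accuracy(best_model, X_new, y_new)
    return first_holdout, best_holdout, fresh_accuracy


first_holdout, best_holdout, fresh_accuracy = run_tuning()
print(f"holdout accuracy, first model:    {first_holdout:.3f}")
print(f"holdout accuracy, after tuning:   {best_holdout:.3f}")
print(f"accuracy of tuned model, new data: {fresh_accuracy:.3f}")
```

The comparison to watch is between the last two numbers: the tuned model's holdout accuracy is the score you would be tempted to report, while its accuracy on fresh data is what a reader would actually experience. In this noise-only setup the latter should stay near 50% however good the reused holdout score looks.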

