Notes on: Statistics as principled argument - Abelson

Abelson’s Laws

  1. Chance is lumpy
  2. Overconfidence abhors uncertainty
  3. Never flout convention just once
  4. Don’t talk Greek if you don’t know the English translation
  5. If you have nothing to say, don’t say anything
  6. There is no free hunch
  7. You can’t see the dust if you don’t move the couch
  8. Criticism generates methodology

Note: It is OK to find a translator who will help you.

Note: What about a null (non-significant) result?

Misunderstandings of Statistics

At some point in the development of 20th-century statistics, we got lost in the minutiae of inferential statistics: calculating P-values via null-hypothesis significance testing (NHST) and over-emphasizing the word significance.

Can we not use that word and still talk about our data?

While statistics can be viewed as a game, it can also be regarded as a narrative or an argument.

By argument, I do not mean two folks yelling at each other in a bar. Rather, a reasoned, evidence-based discussion occurs among Investigators and Experimenters within a paradigm where assumptions are made, and the truth or falsity of statistical claims is challenged.

Stand-alone statistics or Naked Numbers

Vision scientist and psychophysicist Stan Klein once referred to a single statistical number without error bars as “naked”—any measurement in science has uncertainty. Standard errors, deviations, variances, and confidence intervals are all examples of “clothes” for naked numbers.
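As a minimal sketch of how to "clothe" a naked number (the measurements below are invented for illustration), a single mean can be reported with a standard error and an approximate 95% interval:

```python
import math

# Hypothetical repeated measurements of some quantity (invented numbers).
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]

n = len(data)
mean = sum(data) / n
# Sample standard deviation (n - 1 in the denominator).
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
sem = sd / math.sqrt(n)  # standard error of the mean

# Rough 95% interval using the normal approximation
# (a t critical value would be slightly wider for n = 10).
lo, hi = mean - 1.96 * sem, mean + 1.96 * sem

print(f"mean = {mean:.2f}, SEM = {sem:.3f}, 95% CI ~ ({lo:.2f}, {hi:.2f})")
```

Reporting "12.10 ± 0.08" already says far more than "12.10" alone.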

Comparison

A number without comparison is less informative than one with context. For example, if I note that my guitar gear buddy can solder a circuit project in an hour, does that mean anything? Without context—implicit (e.g., how much soldering the project requires) or explicit (e.g., another friend can do the same task in half an hour)—there is no way to determine whether my first buddy is an electronics expert.

An example closer to your student experience: do you have a standard for how long it takes you to do a routine eye exam?

Our goal with statistical methods is to make claims backed by evidence. Abelson introduces the concept of ticks (claims) and buts (qualifiers based on design and methodology assumptions).

For Experimenters and Investigators, the goal is to create or manipulate factors thought to be causal. For example, lenses of positive or negative power can influence eye growth rate, making claims about factors relevant to emmetropization.

Chance: Sometimes your results are just a fluke, yet they still reach P < 0.05.

Systematic: Experiments can reveal causal factors, but without replication there is always a chance the result was a fluke.

Abelson notes that chance is underestimated, particularly in athletic contests. One of my favorite pieces of writing is Bill James’ “Underestimating the Fog”. The key sentence: “We ran astray because we have assumed that random data is proof of nothingness when, in reality, random data proves nothing.”
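"Chance is lumpy" is easy to demonstrate: fair coin flips routinely produce streaks far longer than intuition expects, which is exactly why athletic "hot streaks" are so hard to distinguish from noise. A small sketch (my own illustration, not from the book):

```python
import random

random.seed(0)

def longest_run(flips):
    """Length of the longest streak of identical consecutive outcomes."""
    best = run = 1
    for prev, cur in zip(flips, flips[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

# 200 fair coin flips: streaks of 6-8 identical outcomes in a row are
# routine, even though they look "non-random" to the eye.
flips = [random.choice("HT") for _ in range(200)]
print("longest streak:", longest_run(flips))
```

A long streak by itself is not evidence of a hot hand; it is what lumpy chance looks like.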

Abelson asserts that NHST might be best considered a game called “Devil’s Advocacy” because it begins by assuming the null hypothesis is true. Note that NHST cannot claim support for no difference; that claim is beyond its scope.

Despite the emphasis placed on them in statistical education, significance tests provide very little information. Do you agree or disagree with this statement, and why?

MAGIC

  1. Magnitude
  2. Articulation
  3. Generality
  4. Interestingness
  5. Credibility

Pages 12–14 are essential to read and understand. Abelson's prose is lucid and precise.

Pages 36–38: “In cases of inconclusive or messy data in an uncharted area, experienced researchers will usually withhold publication until further research clarifies the situation or the research is abandoned entirely.” This does not describe the current state of scientific publishing.

Pages 45–53 discuss effect size and introduce the idea of expected effect size as Bayesian priors.
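One way to make "expected effect size as a prior" concrete is a conjugate normal-normal update: a prior belief about the effect size, combined with a new study's estimate. This is my sketch with invented numbers, not a calculation from the book:

```python
# Prior belief about an effect size, updated with one new study.
# All numbers are invented for illustration.

prior_mean, prior_var = 0.3, 0.2 ** 2   # expect a smallish effect
study_mean, study_var = 0.8, 0.3 ** 2   # new study: larger, but noisier

# Precision-weighted average: the posterior sits between prior and data,
# pulled toward whichever source is more precise.
post_var = 1 / (1 / prior_var + 1 / study_var)
post_mean = post_var * (prior_mean / prior_var + study_mean / study_var)

print(f"posterior effect ~ {post_mean:.2f} +/- {post_var ** 0.5:.2f}")
```

The posterior lands between 0.3 and 0.8, and its variance is smaller than either input's: a surprising result gets taken seriously, but tempered by expectation.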

Think of weather forecasting: experience forms an implicit prior—e.g., carrying an umbrella based on your expectation of rain.

The Milgram Experiment is foundational for psychology. Hollywood attempted to capture it in a movie.

Confidence or credible intervals express measurement uncertainty and are worth including in your work.
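When the analytic formula feels opaque, a percentile bootstrap gives an interval with almost no theory: resample the data with replacement and look at the spread of the resampled means. A sketch with invented values:

```python
import random
import statistics

random.seed(2)

# Hypothetical sample (invented values).
sample = [4.2, 5.1, 3.8, 4.9, 5.5, 4.0, 4.7, 5.2, 4.4, 4.8]

# Percentile bootstrap: resample with replacement, record the mean each time.
boot_means = sorted(
    statistics.mean(random.choices(sample, k=len(sample)))
    for _ in range(2000)
)
# Take the middle 95% of the 2000 resampled means.
lo, hi = boot_means[50], boot_means[1949]

print(f"mean = {statistics.mean(sample):.2f}, 95% bootstrap CI ~ ({lo:.2f}, {hi:.2f})")
```

Either flavor of interval, analytic or bootstrap, keeps your numbers from going out naked.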

Chapter 4 introduces research styles: Brash, Stuffy, Liberal, and Conservative. Be aware of these in literature, peer review, and conferences.

Replicability and Power | Fishiness and Bias

Modern research focuses on power and replicability. Some projects replicate questionable, highly cited studies.
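Power itself can be estimated by simulation: repeat a fake experiment many times with a true effect built in, and count how often it is detected. A minimal sketch (crude z-test with known sd = 1; all parameters are my own choices):

```python
import random
import statistics

random.seed(3)

def detects(n, effect, crit_z=1.96):
    """One simulated two-group experiment: True if the z statistic
    clears the two-sided 5% cutoff (crude z-test, known sd = 1)."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(effect, 1) for _ in range(n)]
    se = (2 / n) ** 0.5  # standard error of the difference in means
    z = (statistics.mean(b) - statistics.mean(a)) / se
    return abs(z) > crit_z

# Power = fraction of repeated experiments that detect a true 0.5 SD effect.
powers = {}
for n in (20, 100):
    powers[n] = sum(detects(n, 0.5) for _ in range(500)) / 500
    print(f"n = {n:3d} per group -> estimated power ~ {powers[n]:.2f}")
```

With 20 per group the true effect is missed most of the time, which is one reason an unreplicated literature full of small studies should worry us.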

Adversarial collaborations pair researchers to resolve literature inconsistencies (e.g., emmetropization data in chicken vs. tree-shrew models).

Beware overemphasis on P-values rather than argumentation based on evidence.

Be on the lookout for strange test statistics, excessively large effect sizes, vanishingly small P-values, or flawed methodology.

Page 103: Be your own best critic!

Page 124: “Can I do that?”—clinical in the worst sense.

Page 130: Plan rigorously, tidy data, be adaptable.

Page 153 presents a thought experiment.

Pages 173–194: Some of the most important passages in this course.