Introduction
While the p value has been used across almost all fields, including biomedical research, debates over its uses, misuses, and abuses have never ceased (Amrhein, et al. [1-3]). The published views have come mostly from mathematical statisticians, the group that produced the tool. Users of the tool, however, have developed, often unknowingly, functions that the producers did not design. Acknowledging the multiple functions of the p value, designed and de facto, legitimate and illegitimate, may be a necessary step toward a more comprehensive understanding of this quotidian tool (Zhao [4]).
Function 1
Probabilitizing Observations: P was designed to probabilitize observation, i.e., to indicate the probability of obtaining an effect equal to or more extreme than the one observed in a random sample, assuming the observed effect does not exist in the population (Pearson [5]).
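This designed function can be illustrated with a small simulation. The sketch below uses hypothetical two-group data, not figures from any cited study, and a permutation test: assuming the effect does not exist in the population, group labels are exchangeable, so p is estimated as the share of label shuffles that produce a difference at least as extreme as the one observed.

```python
import random

random.seed(0)

# Hypothetical outcome scores in two small groups (assumed for illustration).
treatment = [5.1, 6.3, 5.8, 7.0, 6.5]
control = [4.8, 5.2, 4.9, 5.5, 5.0]

observed = sum(treatment) / len(treatment) - sum(control) / len(control)

# Under the null hypothesis the labels are exchangeable: shuffle them many
# times and count how often the shuffled difference is at least as extreme
# as the observed one (two-sided).
pooled = treatment + control
n_iter = 10_000
n_extreme = 0
for _ in range(n_iter):
    random.shuffle(pooled)
    diff = sum(pooled[:5]) / 5 - sum(pooled[5:]) / 5
    if abs(diff) >= abs(observed):
        n_extreme += 1

p_value = n_extreme / n_iter
print(round(p_value, 3))
```

The printed number approximates the exact permutation p value for these made-up data; it probabilitizes the observation without saying anything directly about the population effect itself.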
Function 2
Preferring Hypothesis: Function 1 serves Function 2. If p is smaller than a predetermined threshold α, i.e., p < α, the alternative hypothesis, that the effect exists in the population, is preferred over the null hypothesis, that the effect does not exist. Otherwise, if p ≥ α, the null hypothesis is preferred.
Functions 3 & 4
Projecting Population or Proxying Population: In practice, many have used p to indicate the probability that the effect in the population is in the same direction as observed in the sample (pp) (Hunter [6,7]). The probability of A (effect in the population) given B (effect in a sample) is not equal to the probability of B given A. Equating the two is therefore considered an illegitimate traverse, and it has been cited as a main justification for one journal's ban on p values (Nuzzo [8-12]). Users, however, need an indicator to proxy for, i.e., to project approximately, the population. Given the strong and positive correlation between p and pp, p appears to be the best available for the task (BAT) (Colquhoun [4,13-15]).
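The correlation between p and pp can be illustrated with a hedged simulation. The setup below is an assumption for illustration only, not a computation from any cited source: true effects are drawn from a standard normal prior, each study observes the effect with standard error 1, and a two-sided z-test p value is computed. Grouping studies by p then shows how often the population effect shares the direction of the sample effect.

```python
import math
import random

random.seed(1)

# Assumed setup: true effect delta ~ Normal(0, 1); each study observes
# estimate = delta + Normal(0, 1) noise, i.e., standard error 1.
n_studies = 50_000
bins = {"p < 0.05": [0, 0], "0.05 <= p < 0.5": [0, 0], "p >= 0.5": [0, 0]}

for _ in range(n_studies):
    delta = random.gauss(0.0, 1.0)        # true population effect
    estimate = random.gauss(delta, 1.0)   # observed sample effect
    z = estimate / 1.0                    # z statistic (SE = 1)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    same_sign = (delta > 0) == (estimate > 0)
    if p < 0.05:
        key = "p < 0.05"
    elif p < 0.5:
        key = "0.05 <= p < 0.5"
    else:
        key = "p >= 0.5"
    bins[key][0] += same_sign
    bins[key][1] += 1

agreement = {k: a / t for k, (a, t) in bins.items()}
for key, rate in agreement.items():
    print(key, round(rate, 3))
```

Under this assumed prior, the smaller the p value, the more often the population effect points the same way as the sample effect, which is the sense in which p can proxy for pp even though it does not equal it.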
Function 5
Prescreening Effect: As a pretest index, p represents the propensity, but not a precisely defined probability, that the direction of the observed effect contradicts the direction of the effect in a vaguely defined target population. Therefore, if p < α, we acknowledge that the observed direction of the effect is unlikely to be a fluke, and we are sufficiently confident that we may legitimately interpret the observed size of the effect. The pretest function of p value relies on
1) Negative correlation between p and effect size.
2) Negative correlation between p and data size.
3) Positive correlation between p and measurement variation.
These are desirable features. Users need an index that possesses them, yet no index has been designed to serve the need; the p value is the best available for the task. Researchers need to screen out trivial relations and fortuitous occurrences before they take a closer look at the data. Editors and reviewers need to screen out hopeless manuscripts. The test p < 0.05 fills these needs. After decades of interactions among researchers, reviewers, editors, and readers of science across disciplines, p < 0.05 now regularly serves as a relatively low hurdle that a researcher must clear before more serious investigation begins. This function of p does not assume that the observed data are a random sample, or that variable distributions are normal or independent and identically distributed (IID). This explains why users still find p values useful when
1) Observed data are not a random sample.
2) Observed data constitute the entire operational population.
3) Observed data are a substantial part of an operational population.
4) While the observed data and the operational population are from the past, the ultimate target population of most studies projects into the future.
For this function, p value has nothing to do with “significance” or “statistical significance.” Instead, p<α indicates “statistical acknowledgement” (Liu, et al. [16-18]). The pretest function of p value urges users to focus more on effect size and less on p value (Zhao, et al. [15]). It also implies that a main task of replication studies is to replicate effect direction, but not p<α.
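The three correlations underlying the pretest function can be checked directly with a two-sided z-test calculation. The numbers below are arbitrary illustrative values, not data from any cited study:

```python
import math

def z_test_p(effect, sd, n):
    """Two-sided p value for an observed mean effect, given measurement SD and n."""
    z = effect / (sd / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

# 1) p falls as effect size grows (sd and n held fixed).
p_small_effect = z_test_p(0.1, 1.0, 25)
p_large_effect = z_test_p(0.5, 1.0, 25)

# 2) p falls as data size grows (effect and sd held fixed).
p_small_n = z_test_p(0.2, 1.0, 25)
p_large_n = z_test_p(0.2, 1.0, 400)

# 3) p rises as measurement variation grows (effect and n held fixed).
p_low_sd = z_test_p(0.2, 0.5, 25)
p_high_sd = z_test_p(0.2, 2.0, 25)

print(round(p_small_effect, 3), round(p_large_effect, 3))
print(round(p_small_n, 3), round(p_large_n, 5))
print(round(p_low_sd, 3), round(p_high_sd, 3))
```

Because the z statistic is effect × √n / sd, each of the three monotone relations holds by construction, which is exactly what makes p serviceable as a crude prescreening index.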
Function 6
Pretending Significance: “Statistically significant,” or simply “significant,” has become a synonym of “p < 0.05,” which many experts argue is a consequential misnomer (Amrhein, et al. [1]). Indicating or pretending “significance” is a misfunction of p value. One consequence of this misfunction is to mistake p < 0.05 for a main indicator that must be replicated if a study is to be replicable (Tackett, et al. [19-21]). P values vary with data size, measurement variation, and effect size, all of which expectedly vary across studies. Technically, therefore, p values are not meant to be replicated. More importantly, when proxying population (F4) or pretesting effects (F5), the main research findings are about effect direction and effect size, not about p value. Across studies, it is the effect direction that needs to be replicated, and the effect sizes that need to be averaged.
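A brief simulation shows why p is a poor replication target: with one and the same true effect, hypothetical replication studies that differ only in sample size produce widely varying p values, while the estimated effect direction is far more stable. All numbers below are assumed for illustration.

```python
import math
import random

random.seed(2)

# Hypothetical replication scenario: the same true effect (0.3, with SD 1)
# is studied repeatedly by labs that happen to use different sample sizes.
true_effect, sd = 0.3, 1.0
p_values, directions = [], []
for n in [20, 50, 100, 200, 400]:
    sample = [random.gauss(true_effect, sd) for _ in range(n)]
    mean = sum(sample) / n
    z = mean / (sd / math.sqrt(n))           # z statistic for the sample mean
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p value
    p_values.append(p)
    directions.append(mean > 0)

print([round(p, 3) for p in p_values])  # varies widely across "replications"
print(directions)                        # direction is the stabler finding
```

Even though every simulated study measures the identical population effect, the p values spread across orders of magnitude purely because n differs, while the sign of the estimate is what tends to recur.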
Conclusion
It’s time to acknowledge the multiple functions, especially the pretest function, of the p value and the statistical acknowledgement test, aka the significance test. It’s time to acknowledge that the p value plays a reduced but still important role in scientific enquiries, including biomedical research. It’s time to call for research aimed at developing better tools to serve each legitimate need.
References
1) Amrhein V, Greenland S, McShane B (2019) Retire statistical significance. Nature 567: 305-307.
2) Benjamini Y, De Veaux R, Efron B, Evans S, Glickman M, et al. (2021) ASA President’s Task Force Statement on Statistical Significance and Replicability. Harvard Data Science Review 3(3).
3) Editorial (2019) It’s time to talk about ditching statistical significance. Nature 567: 283.
4) Zhao X (2016) Four functions of statistical significance tests [Presentation at the School of Statistics and Center for Data Sciences, Beijing Normal University, 25th].
5) Pearson K (1900) X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 50(302): 157-175.
6) Hunter JE (1997) Needed: A ban on the significance test. Psychological Science 8(1): 3-7.
7) Szucs D, Ioannidis JPA (2017) When null hypothesis significance testing is unsuitable for research: A reassessment. Frontiers in Human Neuroscience 11: 390.
8) Nuzzo R (2014) Statistical errors: P values, the “gold standard” of statistical validity, are not as reliable as many scientists assume. Nature 506(7487): 150-152.
9) Siegfried T (2015) P value ban: Small step for a journal, giant leap for science. Editors reject flawed system of null hypothesis testing. Science News.
10) Thompson B (1999) Journal editorial policies regarding statistical significance tests: Heat is to fire as p is to importance. Educational Psychology Review 11(2): 157-169.
11) Trafimow D, Marks M (2015) Editorial. Basic and Applied Social Psychology 37(1): 1-2.
12) Woolston C (2015) Psychology journal bans P values. Nature 519(7541): 9.
13) Colquhoun D (2014) An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science 1(3): 140216.
14) Zhao X, Liu JS, Deng K (2013) Assumptions behind intercoder reliability indices. Annals of the International Communication Association 36(1): 419-480.
15) Zhao X, Zhang XJ (2014) Emerging methodological issues in quantitative communication research. In: Hong J (Ed.), New Trends in Communication Studies, pp. 953-978.
16) Liu PL, Zhao X, Wan B (2021) COVID-19 information exposure and vaccine hesitancy: The influence of trust in government and vaccine confidence. Psychology, Health and Medicine 7: 1-10.
17) Liu PL, Ao SH, Zhao X, Zhang L (2022) Associations between COVID-19 information acquisition and vaccination intention: The roles of anticipated regret and collective responsibility. Health Communication, Advance online publication.
18) Zhao X, Ye J, Sun S, Zhen Y, Zhang Z, et al. (2022) Best Title Lengths of Online Postings for Highest Read and Relay. Journalism and Communication Review 75(3): 5-20.
19) Tackett JL, Brandes CM, King KM, Markon KE (2019) Psychology’s Replication Crisis and Clinical Psychological Science. Annual Review of Clinical Psychology 15: 579-604.
20) Trafimow D (2018) An a priori solution to the replication crisis. Philosophical Psychology 31(8): 1188-1214.
21) Wiggins BJ, Christopherson CD (2019) The replication crisis in psychology: An overview for theoretical and philosophical psychology. Journal of Theoretical and Philosophical Psychology 39(4): 202-217.