Statistical Reporting Recommendations

Sound statistical analyses are necessary to draw inferences from biomedical data. JAH recommends that authors adhere to the following established guidelines for reporting scientific results, as applicable:

In addition, JAH recommends that authors adhere to the following general statistical reporting guidelines:

  • Methods used to analyze data should be described in sufficient detail so that the work can be replicated.
  • Demographic information should be given for human subjects.
  • Quantitative results should be presented in the following order: the estimated effect size (point estimate), the confidence interval (typically 95%), and then the associated actual P-value. P-values should be reported to a minimum of two significant digits. Scientific notation is recommended for small P-values. Designations of “not significant” (NS) should be avoided.
  • Adjustment of P-values for multiple comparisons is at the discretion of the authors. However, all manuscripts should state whether or not P-values were adjusted for multiple comparisons, and authors should provide an explanation for the approach used. Authors should state the total number of statistical tests performed and, when applicable, the number of tests that were adjusted for.
  • Authors should state whether the primary analyses were pre-planned or post hoc. Authors should explain whether the primary goals of the study were to confirm an existing hypothesis (and state the hypothesis) or to generate a new hypothesis.
  • Study conclusions should reflect the nature of the hypothesis tested and not overstate the certainty of findings. Findings based on post-hoc comparisons should be framed as exploratory and should emphasize the importance of future replication and confirmation.
  • Study conclusions should emphasize the clinical or biological significance of the findings rather than just their statistical significance. Conclusions should not be based only on whether a P-value passes a specific threshold.
  • Exact sample size must be clearly stated for every statistical test conducted and every sub-group compared.
  • Power calculations and the assumptions they rely on, including effect size and significance threshold, should be stated explicitly. If power calculations were not performed, explain why. Power calculations can be performed with most statistical software packages, many of which are available online.
  • All studies must consider sex as a biological variable. For studies that include only one sex by design, authors must provide sufficient scientific justification for the approach.
  • Combining or pooling data across sexes is encouraged to improve power; however, the analysis should include covariate adjustment for sex if it is a confounder. Results should routinely be presented disaggregated by sex. If the sample size is large enough, authors should present a statistical evaluation of interaction effects involving sex. Omitting sex from the analysis should only occur after it has been robustly and explicitly demonstrated that sex was not acting as a confounder or effect modifier in the experiment.
  • Data should be presented in a way that emphasizes completeness, informativeness, and truthfulness. In addition to adhering to the figure guidelines, please consider the following recommendations:
    • To better display data density and distribution, dot plots or violin plots are preferred in lieu of bar charts.
    • Error bars should be shown in both directions.
    • For correlations, showing the scatter plot helps display the full relationship and may reveal outliers that are driving the correlation.
  • When presenting “representative” images to illustrate a key finding, authors should describe the process of selecting the representative image from multiple choices. Images from all replicates should be included as supplementary material.
  • Units should be clearly stated in the text, tables, and figures. Arbitrary and relative units must be clearly defined by stating the normalization procedures used.
  • Please note these commonly misapplied tests:
    • Using parametric tests (e.g. t-test, ANOVA) when population mean normality has not been established and the central limit theorem (CLT) cannot be assumed to apply. In such scenarios a non-parametric test should be used (e.g. Mann-Whitney, Wilcoxon rank sum). Note that CLT often applies in the case of many experimental replicates (such as in cell line studies).
    • Common tests of normality are not well powered to detect departures from normality when n is small. In these cases, normality should be supported by external information from larger sample sizes in the literature. In the absence of this information, non-parametric tests should be used.
    • Applying a t-test to proportions should be avoided, as it is preferable to use exact count values. If one or more individual cell sizes are n<20, Fisher’s exact test should be used. If n>20 for all cells, then chi-square should be used. When exact count values are not available, a test designed for analyzing proportions (e.g. Z-test, Mantel-Haenszel test) should be used.
    • Pearson correlations should be accompanied by a scatter plot and tests should be done to identify the presence of outliers. If the correlation is being driven by outliers, use a non-parametric correlation test (e.g., Spearman’s rho or Kendall’s tau) instead.
    • If the assumption of independent sampling is not met (e.g. data points are multiple samples from the same organism), the statistical model or test should account for this. Appropriate models in this scenario could include repeated measures ANOVA, mixed models, averaging or taking the median value from each organism, or paired t-test.
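
As an illustration only, and not a JAH requirement, a few of the recommendations above could be sketched in Python using just the standard library: the reporting order (effect estimate, confidence interval, actual P-value), one possible multiple-comparison adjustment, and the cell-count rule of thumb for choosing between Fisher's exact test and chi-square. The function names, the 0.001 cutoff for switching to scientific notation, the choice of the Holm procedure, and the handling of a cell count of exactly 20 are all assumptions made for this sketch; the guidelines leave those choices to the authors.

```python
def format_p(p, small=0.001):
    """Format a P-value with two significant digits.

    Values below `small` (an assumed cutoff) use scientific notation.
    """
    if p < small:
        return f"{p:.1e}"   # one digit before and one after the point
    return f"{p:#.2g}"      # '#' keeps trailing zeros, so 0.5 becomes 0.50


def format_result(estimate, ci_low, ci_high, p, level=95):
    """Effect estimate first, then the confidence interval, then the P-value."""
    return (f"{estimate:.2f} ({level}% CI {ci_low:.2f} to {ci_high:.2f}), "
            f"P = {format_p(p)}")


def holm_adjust(p_values):
    """Holm step-down adjusted P-values, returned in the original order.

    Holm is only one possible adjustment, chosen here as an assumption.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest P-value by (m - k + 1), then enforce
        # that adjusted values never decrease along the sorted order.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def choose_count_test(table):
    """Apply the cell-count rule of thumb to a contingency table of counts.

    Any cell below 20 suggests Fisher's exact test. A count of exactly 20 is
    not covered by the text; treating it as chi-square is an assumption.
    """
    smallest = min(min(row) for row in table)
    return "Fisher's exact test" if smallest < 20 else "chi-square test"
```

For example, `format_result(1.8, 1.2, 2.7, 0.0032)` builds a single string with the estimate, interval, and P-value in the recommended order, and `len(p_values)` passed to `holm_adjust` is the number of tests adjusted for, which the manuscript should state alongside the total number of tests performed.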