Clinical overview

Every clinical decision a registrar makes — to give magnesium sulphate, to offer caesarean after a previous caesarean, to start anti-D — rests on a chain of evidence that someone analysed statistically. You cannot read a guideline, critique a journal club paper, defend a management plan in the FCOG(SA) oral, or design the research dissertation required for MMed without a working command of biomedical statistics. This is not a niche academic skill: it is the literacy that lets you tell a real treatment effect from noise, a useful test from a misleading one, and a trustworthy trial from a fragile one.

The FCOG(SA) examiners do not expect you to derive formulae. They expect interpretation under uncertainty: given a 2×2 table, a confidence interval, a forest plot, or a Kaplan–Meier curve, can you say what it means for the patient in front of you and the population behind her? This objective is weighted towards higher-order thinking: you will be asked to apply, analyse and judge, not merely define. The discipline runs through the whole curriculum: the sensitivity of the HPV test in cervical screening, the number needed to treat for aspirin prophylaxis against pre-eclampsia, the relative risk reduction in the WOMAN trial of tranexamic acid for postpartum haemorrhage. Statistics is the connective tissue of evidence-based obstetrics and gynaecology.

Core knowledge

Types of data and descriptive statistics

Know your variable types, because they dictate the choice of test. Categorical data are nominal (blood group, mode of delivery) or ordinal (Apgar score, cancer stage). Numerical data are discrete (parity, number of antenatal visits) or continuous (birthweight, blood pressure). Continuous data are summarised by a measure of central tendency and one of spread: use mean and standard deviation (SD) for symmetrical (approximately normal) distributions, and median and interquartile range (IQR) for skewed data. Birthweight is roughly normal, but length of hospital stay and serum βhCG are right-skewed and demand the median. The standard error of the mean (SEM = SD/√n) is not a measure of the spread of the data; it measures the precision of the estimated mean and shrinks as the sample grows. Confusing SD with SEM is a classic error: because the SEM is always smaller than the SD (for n > 1), reporting it in place of the SD makes the data look less variable and the result more precise than it is.
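To make the mean/median and SD/SEM distinctions concrete, here is a minimal sketch using Python's standard `statistics` module. The length-of-stay figures are invented for illustration, and the helper `describe` is a hypothetical name, not part of any library. Note how a single long admission pulls the mean above the median, and how the SEM comes out much smaller than the SD.

```python
import math
import statistics

# Hypothetical length-of-stay data (days) for 11 women, invented for illustration.
# One long admission (21 days) produces the right skew typical of length of stay.
length_of_stay = [2, 2, 3, 3, 3, 4, 4, 5, 6, 9, 21]

def describe(values):
    """Return n, mean, SD, SEM, median and IQR for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)       # sample SD (n - 1 denominator): spread of the data
    sem = sd / math.sqrt(n)             # SEM = SD / sqrt(n): precision of the mean
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default 'exclusive' method
    return {"n": n, "mean": mean, "sd": sd, "sem": sem,
            "median": median, "q1": q1, "q3": q3, "iqr": q3 - q1}

summary = describe(length_of_stay)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

For these skewed data the median and IQR are the honest summary; quoting "mean ± SEM" would hide both the skew and the true spread.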

Distributions, the null hypothesis and p-values

Much of frequentist inference assumes an underlying normal (Gaussian) distribution, in which about 68% of observations lie within 1 SD of the mean and about 95% within 1.96 SD. We test a null hypothesis (H₀), typically "no difference between groups", against an alternative. The p-value is the probability of observing data as extreme as, or more extreme than, those obtained if the null hypothesis were true. By convention p < 0.05 is called "statistically significant", but this threshold is arbitrary and much abused. A p-value is not the probability that the null hypothesis is true, nor the probability that the result arose by chance alone, and it tells you nothing about the size or clinical importance of an effect. A trivial difference can be highly significant in a huge sample; a clinically important difference can be non-significant in a small one.
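The sketch below, using `statistics.NormalDist` from the Python standard library, checks the 68%/95% figures and shows how the same small difference moves from "non-significant" to "highly significant" purely because the sample grows. It assumes a simple two-sample z-test with a common known SD and equal group sizes (a simplification; real analyses would usually use a t-test), and the 20 g birthweight difference with SD 500 g is hypothetical. The helper names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution lying within k SD of the mean.
within_1_sd = std_normal.cdf(1) - std_normal.cdf(-1)
within_196_sd = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

def two_sided_p_from_z(z):
    """Two-sided p-value for a z statistic, assuming it is standard normal under H0."""
    return 2 * (1 - std_normal.cdf(abs(z)))

def z_test_two_means(mean_a, mean_b, sd, n_per_group):
    """z statistic for a difference in means, assuming a common known SD
    and equal group sizes (hypothetical simplification)."""
    se_diff = sd * (2 / n_per_group) ** 0.5
    return (mean_a - mean_b) / se_diff

print(f"Within 1 SD: {within_1_sd:.3f}; within 1.96 SD: {within_196_sd:.3f}")

# Hypothetical: a clinically trivial 20 g birthweight difference (SD 500 g).
for n in (100, 20_000):
    z = z_test_two_means(3220, 3200, sd=500, n_per_group=n)
    print(f"n per group = {n}: z = {z:.2f}, p = {two_sided_p_from_z(z):.4f}")
```

The effect size is identical in both runs; only the precision changes, which is exactly why a p-value alone cannot tell you whether a result matters clinically.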

Confidence intervals

Prefer the confidence interval (CI) to the bare p-value. A 95% CI comes from a procedure that, on repeated sampling, would capture the true population value 95% of the time; pragmatically, treat it as the plausible range for the true effect. Its width reflects precision: a narrow interval is precise and usually comes from a large sample. The CI carries the significance test within it:

For a difference (mean difference, risk difference), a 95% CI that crosses 0 is not statistically significant at the 5% level (p ≥ 0.05).
For a ratio (relative risk, odds ratio, hazard ratio), a 95% CI that crosses 1 is not statistically significant at the same level.

This single rule lets you read most results in a paper without the p-value at all.
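The "crosses 0 / crosses 1" rule can be tried out on a 2×2 table. This is a minimal sketch, assuming large-sample Wald intervals: a log-scale (Katz) interval for the relative risk and a simple Wald interval for the risk difference. Both break down with zero cells or very small counts, where exact or other methods are preferred. The two trials below are hypothetical, with identical event rates (6% vs 9%) but a tenfold difference in size; the function names are illustrative, not from any statistics package.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96

def risk_ratio_ci(events_tx, n_tx, events_ctl, n_ctl, z=Z_95):
    """Relative risk with a 95% CI computed on the log scale (Katz method)."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    se_log_rr = math.sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

def risk_difference_ci(events_tx, n_tx, events_ctl, n_ctl, z=Z_95):
    """Risk difference (treatment minus control) with a simple Wald 95% CI."""
    p_tx, p_ctl = events_tx / n_tx, events_ctl / n_ctl
    rd = p_tx - p_ctl
    se = math.sqrt(p_tx * (1 - p_tx) / n_tx + p_ctl * (1 - p_ctl) / n_ctl)
    return rd, rd - z * se, rd + z * se

def excludes_null(lower, upper, null_value):
    """True if the CI excludes the null value, i.e. significant at the 5% level."""
    return not (lower <= null_value <= upper)

# Hypothetical trials: (events_tx, n_tx, events_ctl, n_ctl).
trials = {"small trial": (30, 500, 45, 500), "large trial": (300, 5000, 450, 5000)}
for label, counts in trials.items():
    rr, rr_lo, rr_hi = risk_ratio_ci(*counts)
    rd, rd_lo, rd_hi = risk_difference_ci(*counts)
    print(f"{label}: RR {rr:.2f} (95% CI {rr_lo:.2f} to {rr_hi:.2f}), "
          f"significant: {excludes_null(rr_lo, rr_hi, 1)}; "
          f"RD {rd:.3f} (95% CI {rd_lo:.3f} to {rd_hi:.3f}), "
          f"significant: {excludes_null(rd_lo, rd_hi, 0)}")
```

Both trials estimate the same one-third relative reduction; only the large trial's interval is narrow enough to exclude 1 (and, correspondingly, to exclude 0 on the difference scale).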

Type I and Type II error, power

Two ways to be wrong. A Type I (α) error is a false positive: rejecting a true null hypothesis, "finding" an effect that is not real; α is conventionally set at 0.05. A Type II (β) error is a false negative: failing to detect a real effect, usually because the study is too small. Power (1 − β) is the probability of detecting an effect of a specified size if it genuinely exists; trials are typically powered at 80–90%. Underpowered studies are a common reason a real benefit is "not significant", which is why "no significant difference" must never be read as "no difference": absence of evidence is not evidence of absence. Multiple testing inflates the Type I error: test twenty independent hypotheses at α = 0.05 when every null is true and you expect one false positive by chance, with roughly a 64% probability of at least one. This is the rationale for caution about unplanned subgroup analyses and for corrections such as Bonferroni, which tests each hypothesis at α divided by the number of tests.
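Power, sample size and multiple testing reduce to short arithmetic. The sketch below assumes the common unpooled normal-approximation formula for comparing two proportions, n per arm = (z₁₋α/₂ + z₁₋β)² × [p₁(1 − p₁) + p₂(1 − p₂)] / (p₁ − p₂)²; published calculators may use pooled variances or continuity corrections and give slightly different numbers. The 10% versus 5% event rates are hypothetical, and all function names are illustrative.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

def n_per_group_two_proportions(p_ctl, p_tx, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided comparison of two
    proportions (unpooled normal-approximation formula)."""
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    variance = p_ctl * (1 - p_ctl) + p_tx * (1 - p_tx)
    n = (z_alpha + z_beta) ** 2 * variance / (p_ctl - p_tx) ** 2
    return math.ceil(n)

def approx_power(p_ctl, p_tx, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test with n per arm
    (ignores the negligible chance of rejecting in the wrong direction)."""
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    se = math.sqrt((p_ctl * (1 - p_ctl) + p_tx * (1 - p_tx)) / n_per_group)
    return std_normal.cdf(abs(p_ctl - p_tx) / se - z_alpha)

def family_wise_error(alpha, k):
    """P(at least one false positive) over k independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** k

def bonferroni_threshold(alpha, k):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / k

# Hypothetical event rates: 10% in controls, 5% with treatment.
n80 = n_per_group_two_proportions(0.10, 0.05, power=0.80)
n90 = n_per_group_two_proportions(0.10, 0.05, power=0.90)
print(f"Per arm for 80% power: {n80}; for 90% power: {n90}")
print(f"Power with only 100 per arm: {approx_power(0.10, 0.05, 100):.0%}")

k = 20
print(f"{k} tests: expected false positives {k * 0.05:.1f}, "
      f"P(at least one) {family_wise_error(0.05, k):.0%}, "
      f"Bonferroni threshold {bonferroni_threshold(0.05, k)}")
```

A trial of 100 per arm chasing this halving of risk would miss it most of the time, so its "no significant difference" says little about whether the treatment works.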

Completion of basic biomedical statistics course (if offered or required uniformly for MMed)
