Power for a cluster-randomized trial

You ran a trial where whole clinics were randomized to a treatment, and you measured cholesterol on the patients inside them. Patients in the same clinic resemble each other, so a plain regression would treat correlated observations as if they were independent. A random intercept per clinic absorbs that between-clinic variation: cholesterol ~ treatment + (1|clinic), fit by maximum likelihood (the MLE estimator for family="lme").

Variations

Smaller or larger treatment effect. The effect is on the binary benchmark scale — swap treatment=0.50 (medium) for treatment=0.20 (small) or treatment=0.80 (large) to see how the expected group gap moves power.
More or stronger clustering. Move set_cluster("clinic", ICC=0.10, …) to a higher ICC (e.g. 0.20) when clinics differ more from each other — stronger clustering costs power for the same total N. The ICC is the within-clinic correlation that survives after the treatment is accounted for.
Trade clusters against cluster size. Keep N fixed but swap n_clusters=40 for n_clusters=20 (now 20 patients per clinic) — for a cluster-randomized design more clinics usually buys more power than more patients per clinic.
Add a covariate. Append a pre-treatment continuous control (e.g. cholesterol ~ treatment + baseline_bp + (1|clinic), baseline_bp=0.25) to adjust the treatment estimate and recover power.
Solve for N instead. Replace find_power(sample_size=400, …) with find_sample_size(target_test="treatment", from_size=120, to_size=600, by=40) to get the minimum N that reaches 80% power.
Same design, other fields:
- biomass ~ treatment + (1|tank) — one tank per treatment arm; test whether plant biomass differs across tanks (ecology).
- wage ~ training + (1|sector) — one sector per arm; test whether a training programme shifts wages across sectors (social science).

Not this setup?

If you'd rather have…

A treatment effect that varies across sites — let the treatment effect vary across clusters/sites (random treatment slope) instead of a single fixed effect.
Repeated measures on patients — random intercept grouping units that are repeatedly measured (patients) rather than randomized clusters.
The same contrast without clustering — estimate the same two-group treatment contrast without any clustering / random effect.

Copy-paste setup

from mcpower import MCPower

# Cluster-randomized trial: does the treatment lower cholesterol, once we
# account for patients in the same clinic being correlated? A random intercept
# per clinic soaks up that between-clinic variation.
model = MCPower("cholesterol = treatment + (1|clinic)", family="lme")

# treatment is a binary two-level predictor (0 = control, 1 = treatment).
model.set_variable_type("treatment=binary")

# Expected effect on the binary benchmark scale: 0.50 = a medium treatment gap.
model.set_effects("treatment=0.50")

# Clustering: ICC=0.10 (10% of variance is between-clinic) across 40 clinics.
# At N=400 that is 10 patients per clinic.
model.set_cluster("clinic", ICC=0.10, n_clusters=40)

model.find_power(sample_size=400, target_test="treatment")

suppressMessages(library(mcpower))

# Cluster-randomized trial: does the treatment lower cholesterol, once we
# account for patients in the same clinic being correlated? A random intercept
# per clinic soaks up that between-clinic variation.
model <- MCPower$new("cholesterol ~ treatment + (1|clinic)", family = "lme")

# treatment is a binary two-level predictor (0 = control, 1 = treatment).
model$set_variable_type("treatment=binary")

# Expected effect on the binary benchmark scale: 0.50 = a medium treatment gap.
model$set_effects("treatment=0.50")

# Clustering: ICC=0.10 (10% of variance is between-clinic) across 40 clinics.
# At N=400 that is 10 patients per clinic.
model$set_cluster("clinic", ICC = 0.10, n_clusters = 40)

invisible(model$find_power(sample_size = 400, target_test = "treatment"))