Placebo tests for the measurement pipeline — audit

Negative controls for the audit. A placebo asks whether the pipeline can produce the reported number when, by construction, it should not: when the link between labels and units is broken, or when the measured texts do not contain the construct. A pipeline that "finds" the effect under either condition is measuring its own instrument, not the construct.

Usage

audit_placebo(
  audit,
  type = c("label_permutation", "irrelevant_text"),
  reps = 200L,
  texts = NULL,
  .runner = NULL,
  ...
)

Arguments

audit: An audit returned by audit_run().
type: Placebo construction. "label_permutation" permutes labels within each audited cell, with no new model calls. "irrelevant_text" replaces the plan's text column with researcher-supplied construct-free texts and re-runs the grid.
reps: Number of label permutations per cell for "label_permutation".
texts: For "irrelevant_text": a character vector of texts in which the construct is absent by design (weather reports for a partisanship estimand, say). Choosing them is a research decision the package cannot make for you. The vector is recycled deterministically (rep_len()) to the number of units.
.runner: Offline runner injected in place of live model calls; passed to audit_run() for the irrelevant-text rerun.
...: Passed to audit_run() for the irrelevant-text rerun.
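
The recycling described for texts can be illustrated with base R alone. This is a minimal sketch, not the package's internal code: n_units is a hypothetical unit count chosen for the illustration, and only rep_len() is taken from the description above.

```r
# Two construct-free texts supplied by the researcher.
texts <- c("Rain is likely this afternoon.",
           "Winds stay light overnight.")

# Hypothetical number of units in the plan (assumption for illustration).
n_units <- 5L

# rep_len() repeats the vector in order until it reaches the target length,
# so the assignment of placebo texts to units involves no randomness.
placebo_texts <- rep_len(texts, n_units)
placebo_texts
```

With two texts and five units, units 1, 3, and 5 receive the first text and units 2 and 4 the second. Supplying at least as many texts as units avoids repeats altogether.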

Value

An audit_placebo object: a list with type, cells, reps, and n_units, with a print method. For "label_permutation", cells has one row per audit cell with the observed estimate, the centered permutation p-value (p), the null interval (null_lo and null_hi, the 2.5% and 97.5% permutation quantiles), a degenerate flag, and permutation counts. For "irrelevant_text", cells has the real estimate, estimate_placebo, and parse_failures_placebo per cell; the comparison is descriptive, with no p-values.

Details

The permutation placebo holds each cell's label marginal fixed and shuffles which unit got which label, recomputing the estimator each time. Estimators that use only the marginal (a share, say) are permutation-invariant: every permuted estimate equals the observed one, the cell is flagged degenerate = TRUE with p = NA, and the print method says the placebo is uninformative for that estimand. For association estimands the permutation distribution is the no-association null, and the p-value is centered on its median: (1 + #(|perm - m| >= |obs - m|)) / (#valid + 1), where obs is the observed estimate, perm ranges over the valid permuted estimates, m is their median, and #valid is their count. Estimator errors inside a permutation become NA, are excluded from the p-value, and are counted.
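
The procedure in the paragraph above can be sketched in base R for a single cell. This is an illustration under stated assumptions, not the package's implementation: centered_perm_p() and the toy data frame are hypothetical, the data are assumed to carry a label column, and reporting p = NA when no permutation returns a value is a choice made for this sketch.

```r
# Hypothetical helper (not part of the package): centered permutation
# p-value for one cell, following the formula in the text.
centered_perm_p <- function(d, estimator, reps = 200L) {
  obs <- as.numeric(estimator(d))
  perm <- vapply(seq_len(reps), function(i) {
    d_perm <- d
    d_perm$label <- sample(d$label)  # shuffle labels; the marginal is unchanged
    # An estimator error inside a permutation becomes NA.
    tryCatch(as.numeric(estimator(d_perm)), error = function(e) NA_real_)
  }, numeric(1))
  valid <- perm[!is.na(perm)]
  # Degenerate: every valid permuted estimate equals the observed one.
  degenerate <- length(valid) > 0L && isTRUE(all(valid == obs))
  if (degenerate || length(valid) == 0L) {
    p <- NA_real_
    null <- c(NA_real_, NA_real_)
  } else {
    m <- stats::median(valid)
    p <- (1 + sum(abs(valid - m) >= abs(obs - m))) / (length(valid) + 1)
    null <- stats::quantile(valid, c(0.025, 0.975), names = FALSE)
  }
  list(estimate = obs, p = p, null_lo = null[1], null_hi = null[2],
       degenerate = degenerate,
       n_valid = length(valid), n_failed = reps - length(valid))
}

# Toy cell: labels line up perfectly with the two halves.
d <- data.frame(
  label = rep(c("conservative", "progressive"), each = 10),
  half  = rep(c("first", "second"), each = 10))

# Association estimand: difference in conservative share between halves.
diff_est <- function(d) {
  mean(d$label[d$half == "first"] == "conservative") -
    mean(d$label[d$half == "second"] == "conservative")
}
# Marginal-only estimand: overall conservative share.
share_est <- function(d) mean(d$label == "conservative")

set.seed(110)
res_assoc <- centered_perm_p(d, diff_est, reps = 199L)
res_share <- centered_perm_p(d, share_est, reps = 199L)
```

In this toy cell the difference estimator yields a small p because shuffling destroys the alignment between labels and halves, while the share estimator is flagged degenerate because no shuffle can change a marginal share.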

Permutations use the current RNG state; the function never sets a seed. Set one beforehand when the draw must be reproducible.
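
A small base R illustration of why the seed matters. It uses sample() directly as a stand-in for a label shuffle; whether the package calls sample() internally is an assumption, but any draw from R's RNG behaves the same way.

```r
labels <- c("conservative", "conservative", "progressive", "progressive",
            "conservative", "progressive")

# Same seed, same RNG state, same shuffle.
set.seed(110)
first_draw <- sample(labels)
set.seed(110)
second_draw <- sample(labels)
identical(first_draw, second_draw)

# Shuffling only reorders labels: the label marginal is unchanged.
identical(table(first_draw)[["conservative"]], table(labels)[["conservative"]])
```

Both comparisons return TRUE. Without the second set.seed() call, the second shuffle would continue from wherever the RNG state was left and would generally differ.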

The irrelevant-text placebo re-runs the full grid and therefore costs calls unless a .runner is injected.

Examples

if (FALSE) { # \dontrun{
speeches <- data.frame(
  text = c("cut taxes now", "deregulate markets",
           "fund the schools", "expand care"),
  half = c("first", "first", "second", "second"))
plan <- audit_plan(
  data = speeches, text = "text",
  estimator = function(d) {
    mean(d$label[d$half == "first"] == "conservative", na.rm = TRUE) -
      mean(d$label[d$half == "second"] == "conservative", na.rm = TRUE)
  },
  labels = c("conservative", "progressive"),
  prompt = "Classify as one of: {labels}.\n\n{text}\n\nLabel:")
plan <- audit_add_models(plan,
  list(oss = LLMR::llm_config("groq", "openai/gpt-oss-20b", temperature = 0)))
audit <- audit_run(plan)

set.seed(110)            # permutations are drawn locally from R's RNG; seed for reproducibility
audit_placebo(audit, reps = 199L)

audit_placebo(audit, type = "irrelevant_text",
              texts = c("Rain is likely this afternoon.",
                        "Winds stay light overnight."))
} # }