
Crosses prompts x models x label orders x temperatures, measures every unit under every resulting cell, recomputes the estimator within each cell, and returns the audit frame that the stability functions consume. Cell 1 (baseline prompt, first model, "as_given" label order, first temperature) is the reference cell for audit_fragility().
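The size of an audit follows directly from this crossing. The sketch below is a hypothetical illustration, not audit_run()'s internals: it builds a grid with base R's expand.grid(), where the level names "variant_a", "other", and "reversed" are invented for the example, and counts how many unit-level measurements the grid implies. Because expand.grid() varies its first argument fastest, row 1 holds the first level of every factor, which matches the description of cell 1 as the reference.

```r
# Hypothetical levels for each audited dimension (names are illustrative only).
prompts      <- c("baseline", "variant_a")
models       <- c("oss", "other")
label_orders <- c("as_given", "reversed")
temperatures <- c(0, 0.7)

# Full crossing: one row per cell. expand.grid() varies the first argument
# fastest, so row 1 combines the first level of every dimension.
cells <- expand.grid(
  prompt = prompts, model = models,
  label_order = label_orders, temperature = temperatures,
  stringsAsFactors = FALSE)
cells$cell <- seq_len(nrow(cells))

n_units <- 4                      # e.g. the four speeches in the examples
n_calls <- n_units * nrow(cells)  # every unit is measured under every cell

nrow(cells)  # 16 cells (2 x 2 x 2 x 2)
n_calls      # 64 unit-level measurements
cells[1, ]   # the reference cell
```

Adding one more temperature or prompt multiplies, rather than adds to, the number of provider calls, so small grids are worth trying first.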

Usage

audit_run(plan, .runner = NULL, ...)

Arguments

plan

An audit_plan() with at least one model.

.runner

Internal seam for tests, or for plugging in a deterministic or external coder: a function(experiments, ...) that returns the experiments with a response_text column added. Defaults to LLMR::call_llm_par().

...

Passed to the runner (e.g. tries, progress).
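A minimal sketch of the .runner contract, assuming only what is documented above: the runner receives the experiments, may receive extra arguments through ..., and must return the experiments with a response_text column. The names constant_runner and check_runner are hypothetical, and the mock experiments table (a messages list column with a "user" element) mirrors the keyword_coder example rather than any documented schema.

```r
# Hypothetical stub runner: answers every experiment with the same label.
# The `...` argument absorbs options such as `tries` that audit_run() forwards.
constant_runner <- function(experiments, ...) {
  experiments$response_text <- rep("progressive", nrow(experiments))
  experiments
}

# Hypothetical helper: run a runner and check its output against the contract.
check_runner <- function(runner, experiments, ...) {
  out <- runner(experiments, ...)
  stopifnot(
    is.data.frame(out),
    nrow(out) == nrow(experiments),     # one response per experiment
    "response_text" %in% names(out),
    is.character(out$response_text))
  invisible(out)
}

# Mock experiments table (assumed shape): a list column of named messages.
experiments <- data.frame(id = 1:2)
experiments$messages <- list(c(user = "cut taxes now"),
                             c(user = "fund the schools"))

out <- check_runner(constant_runner, experiments, tries = 3)
out$response_text
```

Checking the row count guards against a runner that silently drops failed calls, which would otherwise misalign responses with units.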

Value

An audit: a tibble with one row per cell and the columns cell, prompt, model, label_order, temperature, estimate, parse_failures, and tokens (the last only when the runner reports usage). The plan and the unit-level table (see audit_units()) are attached as attributes. If the estimator throws an error inside a cell, that cell gets estimate = NA.
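The per-cell recomputation and its NA-on-error behaviour can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: the toy unit-level table with cell and label columns is invented, and safe_estimate is a hypothetical helper that wraps the estimator in tryCatch().

```r
# Hypothetical helper: an estimator error becomes NA for that cell only.
safe_estimate <- function(estimator, d) {
  tryCatch(as.numeric(estimator(d)), error = function(e) NA_real_)
}

# Toy unit-level table (assumed shape): one row per unit x cell.
units <- data.frame(
  cell  = rep(1:2, each = 4),
  label = c("conservative", "conservative", "progressive", "progressive",
            "conservative", "progressive", "progressive", "progressive"))

# Same estimator as in the examples: share of units labelled conservative.
estimator <- function(d) mean(d$label == "conservative", na.rm = TRUE)

# Recompute the estimator separately within each cell.
by_cell   <- split(units, units$cell)
estimates <- vapply(by_cell, function(d) safe_estimate(estimator, d), numeric(1))
estimates  # cell 1: 0.5, cell 2: 0.25

# A failing estimator yields NA per cell instead of aborting the whole run.
failing   <- function(d) stop("estimator failed")
na_frame  <- vapply(by_cell, function(d) safe_estimate(failing, d), numeric(1))
na_frame   # NA NA
```

Isolating failures per cell keeps one degenerate cell (for example, one where every response failed to parse) from discarding the estimates of all the others.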

Examples

if (FALSE) { # \dontrun{
speeches <- data.frame(
  text = c("cut taxes now", "deregulate markets",
           "fund the schools", "expand care"))
plan <- audit_plan(
  data = speeches, text = "text",
  estimator = function(d) mean(d$label == "conservative", na.rm = TRUE),
  labels = c("conservative", "progressive"),
  prompt = "Classify as one of: {labels}.\n\n{text}\n\nLabel:")
plan <- audit_add_models(plan,
  list(oss = LLMR::llm_config("groq", "openai/gpt-oss-20b", temperature = 0)))
audit <- audit_run(plan)
audit
audit_stability(audit)
audit_fragility(audit)
head(audit_units(audit))
} # }

# The `.runner` seam answers the whole grid without calling a provider, which
# suits tests or a deterministic or external coder. The same plan, scored offline:
speeches <- data.frame(
  text = c("cut taxes now", "deregulate markets",
           "fund the schools", "expand care"))
plan <- audit_plan(
  data = speeches, text = "text",
  estimator = function(d) mean(d$label == "conservative", na.rm = TRUE),
  labels = c("conservative", "progressive"),
  prompt = "Classify as one of: {labels}.\n\n{text}\n\nLabel:")
plan <- audit_add_models(plan,
  list(oss = LLMR::llm_config("groq", "openai/gpt-oss-20b", temperature = 0)))
keyword_coder <- function(experiments, ...) {
  msg <- vapply(experiments$messages, `[[`, "", "user")
  experiments$response_text <- ifelse(grepl("taxes|deregulate", msg),
                                      "conservative", "progressive")
  experiments
}
audit <- audit_run(plan, .runner = keyword_coder)
audit_stability(audit)