Run the audit grid — audit_run • LLMRcontent

Crosses prompts x models x label orders x temperatures, measures every unit under every cell, recomputes the estimator per cell, and returns the audit object that the stability functions consume. Cell 1 (baseline prompt, first model, label order "as_given", first temperature) is the reference cell for audit_fragility().
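To make the crossing concrete, here is a minimal base-R sketch of such a grid. It is not the package's internal code: the prompt names, the second model, the "reversed" label order, and the second temperature are hypothetical placeholders, and the package's actual row ordering beyond cell 1 is not assumed.

```r
# Hypothetical grid dimensions; only "as_given" and the first-level-is-reference
# rule come from the description above.
prompts      <- c("baseline", "paraphrase")
models       <- c("oss", "other")
label_orders <- c("as_given", "reversed")
temperatures <- c(0, 0.7)

# Full factorial crossing: one row per combination of the four dimensions.
grid <- expand.grid(
  prompt      = prompts,
  model       = models,
  label_order = label_orders,
  temperature = temperatures,
  stringsAsFactors = FALSE
)
grid$cell <- seq_len(nrow(grid))

nrow(grid)  # 2 * 2 * 2 * 2 combinations
grid[1, ]   # first level of every dimension, i.e. the reference cell
```

Because every dimension multiplies the number of cells, and every unit is measured under every cell, adding a single extra temperature or label order scales the number of model calls accordingly.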

Usage

audit_run(plan, .runner = NULL, ...)

Arguments

plan: An audit_plan() with at least one model.
.runner: Offline runner seam: a function(experiments, ...) that receives a data frame with config and messages list-columns and returns those rows with at least a response_text column. If it also returns a success column, every row must be successful. When NULL (the default), LLMR::call_llm_par() is used.
...: Passed to the runner (e.g. tries, progress).
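
The .runner contract described above can be illustrated with a small base-R sketch. The helper check_runner_output() and the toy experiments table are hypothetical, written only to mirror the stated rules (same rows back, a response_text column, and an all-successful success column if one is present); the package's own validation may differ.

```r
# Hypothetical helper, NOT part of the package: checks a runner's return value
# against the contract stated in the .runner argument description.
check_runner_output <- function(input, output) {
  stopifnot(is.data.frame(output), nrow(output) == nrow(input))
  if (!"response_text" %in% names(output)) {
    stop("runner must return a `response_text` column")
  }
  # A `success` column is optional, but if present every row must be TRUE.
  if ("success" %in% names(output) && !all(output$success %in% TRUE)) {
    stop("runner returned unsuccessful rows")
  }
  invisible(output)
}

# Toy stand-in for the table a runner receives (two list-columns).
experiments <- data.frame(id = 1:2)
experiments$config   <- list(list(), list())
experiments$messages <- list(c(user = "cut taxes now"),
                             c(user = "fund the schools"))

# A trivial runner that labels everything the same way.
constant_runner <- function(experiments, ...) {
  experiments$response_text <- rep("progressive", nrow(experiments))
  experiments$success <- TRUE
  experiments
}

out <- check_runner_output(experiments, constant_runner(experiments))
out$response_text
```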

Value

An audit object with cells (one row per grid cell), units (the unit-level trail; see audit_units()), and plan. The cells table contains cell, prompt, model, label_order, temperature, estimate, parse_failures, and tokens. Estimator errors inside a cell yield estimate = NA for that cell.
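
As an illustration of the "estimate = NA on estimator error" behaviour, the sketch below recomputes an estimator per cell with base R and traps errors. The safe_estimate() wrapper and the toy units table are hypothetical; this shows one plausible way to get that behaviour, not the package's implementation.

```r
# Same estimator as in the Examples: share of units labelled "conservative".
estimator <- function(d) mean(d$label == "conservative", na.rm = TRUE)

# Hypothetical wrapper: an error inside the estimator becomes NA for that cell.
safe_estimate <- function(d, estimator) {
  tryCatch(estimator(d), error = function(e) NA_real_)
}

# Toy unit-level trail: two cells, two units each.
units <- data.frame(
  cell  = c(1, 1, 2, 2),
  label = c("conservative", "progressive", "conservative", "conservative")
)

# One estimate per cell.
by_cell   <- split(units, units$cell)
estimates <- vapply(by_cell, safe_estimate, numeric(1), estimator = estimator)
estimates

# An estimator that always fails yields NA in every cell instead of aborting.
failing <- function(d) stop("estimator failed")
broken  <- vapply(by_cell, safe_estimate, numeric(1), estimator = failing)
broken
```

Trapping the error per cell keeps the rest of the grid usable, so one bad cell does not discard the other cells' results.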

Examples

if (FALSE) { # \dontrun{
speeches <- data.frame(
  text = c("cut taxes now", "deregulate markets",
           "fund the schools", "expand care"))
plan <- audit_plan(
  data = speeches, text = "text",
  estimator = function(d) mean(d$label == "conservative", na.rm = TRUE),
  labels = c("conservative", "progressive"),
  prompt = "Classify as one of: {labels}.\n\n{text}\n\nLabel:")
plan <- audit_add_models(plan,
  list(oss = LLMR::llm_config("groq", "openai/gpt-oss-20b", temperature = 0)))
audit <- audit_run(plan)
audit
audit_stability(audit)
audit_fragility(audit)
head(audit_units(audit))
} # }

# The `.runner` seam answers the grid without a provider, for tests or for
# a deterministic or external coder. The same plan, scored offline:
speeches <- data.frame(
  text = c("cut taxes now", "deregulate markets",
           "fund the schools", "expand care"))
plan <- audit_plan(
  data = speeches, text = "text",
  estimator = function(d) mean(d$label == "conservative", na.rm = TRUE),
  labels = c("conservative", "progressive"),
  prompt = "Classify as one of: {labels}.\n\n{text}\n\nLabel:")
plan <- audit_add_models(plan,
  list(oss = LLMR::llm_config("groq", "openai/gpt-oss-20b", temperature = 0)))
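keyword_coder <- function(experiments, ...) {
  # Pull the user turn out of each row's messages.
  msg <- vapply(experiments$messages, `[[`, "", "user")
  # Label by keyword; anything without a match falls to "progressive".
  experiments$response_text <- ifelse(grepl("taxes|deregulate", msg),
                                      "conservative", "progressive")
  experiments  # same rows, now with response_text
}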
audit <- audit_run(plan, .runner = keyword_coder)
audit_stability(audit)