Crosses prompts x models x label orders x temperatures, measures every
unit under every cell, recomputes the estimator per cell, and returns the
audit frame the stability functions consume. Cell 1 – baseline prompt,
first model, "as_given", first temperature – is the reference for
audit_fragility().
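As a rough picture of the crossing described above, the grid can be sketched with base R's expand.grid(). This is an illustration only, not the package's implementation: the prompt names, the second model, the "reversed" label order and the temperature values are hypothetical, and the row order that audit_run() uses beyond cell 1 is an assumption.

```r
# Hypothetical design values; only "as_given" and the model name "oss"
# appear in this documentation.
prompts      <- c("baseline", "variant")
models       <- c("oss", "other")
label_orders <- c("as_given", "reversed")
temperatures <- c(0, 0.7)

# expand.grid() varies its first argument fastest, so row 1 pairs the
# first level of every factor.
grid <- expand.grid(
  prompt = prompts, model = models,
  label_order = label_orders, temperature = temperatures,
  stringsAsFactors = FALSE
)
grid$cell <- seq_len(nrow(grid))

nrow(grid)  # 2 * 2 * 2 * 2 = 16 cells
grid[1, ]   # baseline prompt, first model, "as_given", first temperature
```

Whatever order the remaining cells take, the first row of such a crossing matches the reference cell described above.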
Arguments
- plan
An audit_plan() with at least one model.
- .runner
Internal seam for tests: function(experiments, ...) returning the experiments with a response_text column. Default LLMR::call_llm_par().
- ...
Passed to the runner (e.g. tries, progress).
Value
An audit: a tibble with one row per cell – cell,
prompt, model, label_order, temperature, estimate,
parse_failures, tokens (when the runner reports usage) – with the
plan and the unit-level table (see audit_units()) as attributes.
Estimator errors inside a cell yield estimate = NA for that cell.
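The per-cell error handling described above can be sketched in base R with tryCatch(). This shows the general pattern under stated assumptions, not the package's own code: unit_table, estimate_cell() and the deliberately failing estimator are hypothetical names and data used only for illustration.

```r
# Hypothetical unit-level table: one row per unit per cell.
unit_table <- data.frame(
  cell  = c(1, 1, 2, 2),
  label = c("conservative", "progressive", "conservative", "conservative"),
  stringsAsFactors = FALSE
)

# The share-of-conservative estimator from the examples, made to fail on
# cell 2 so the NA path is visible.
estimator <- function(d) {
  if (d$cell[1] == 2) stop("estimator failed")
  mean(d$label == "conservative", na.rm = TRUE)
}

# Recompute the estimator for one cell; an error becomes NA for that cell only.
estimate_cell <- function(d) {
  tryCatch(estimator(d), error = function(e) NA_real_)
}

estimates <- vapply(split(unit_table, unit_table$cell), estimate_cell, numeric(1))
estimates
```

Catching the error per cell keeps one failing cell from discarding the estimates already computed for the others.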
Examples
if (FALSE) { # \dontrun{
  speeches <- data.frame(
    text = c("cut taxes now", "deregulate markets",
             "fund the schools", "expand care"))
  plan <- audit_plan(
    data = speeches, text = "text",
    estimator = function(d) mean(d$label == "conservative", na.rm = TRUE),
    labels = c("conservative", "progressive"),
    prompt = "Classify as one of: {labels}.\n\n{text}\n\nLabel:")
  plan <- audit_add_models(plan,
    list(oss = LLMR::llm_config("groq", "openai/gpt-oss-20b", temperature = 0)))
  audit <- audit_run(plan)
  audit
  audit_stability(audit)
  audit_fragility(audit)
  head(audit_units(audit))
} # }
# The `.runner` seam answers the grid without a provider, for tests or for
# a deterministic or external coder. The same plan, scored offline:
speeches <- data.frame(
  text = c("cut taxes now", "deregulate markets",
           "fund the schools", "expand care"))
plan <- audit_plan(
  data = speeches, text = "text",
  estimator = function(d) mean(d$label == "conservative", na.rm = TRUE),
  labels = c("conservative", "progressive"),
  prompt = "Classify as one of: {labels}.\n\n{text}\n\nLabel:")
plan <- audit_add_models(plan,
  list(oss = LLMR::llm_config("groq", "openai/gpt-oss-20b", temperature = 0)))
keyword_coder <- function(experiments, ...) {
  # Pull the "user" element out of each entry of the messages column.
  msg <- vapply(experiments$messages, `[[`, "", "user")
  # Label by keyword match and return the experiments with response_text added.
  experiments$response_text <- ifelse(grepl("taxes|deregulate", msg),
                                      "conservative", "progressive")
  experiments
}
audit <- audit_run(plan, .runner = keyword_coder)
audit_stability(audit)