Summarize execution failures, parse failures, and first-option sensitivity

Counts execution failures and parse failures for each item. For closed items administered with randomized option order, it also runs a chi-squared test of association between the chosen response and the option shown first. The test uses only the first-shown option, not the full option permutation.
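As a rough illustration of the first-option check, the base R sketch below cross-tabulates the option shown first against the chosen response and runs chisq.test() on that table. The data and the names first_shown and chosen are made up for illustration, and correct = FALSE is an assumption; panel_bias_audit() may build the table or configure the test differently.

```r
# Hypothetical data for one closed item with two options, "A" and "B".
# 40 respondents: 20 saw "A" first, 20 saw "B" first.
first_shown <- rep(c("A", "B"), each = 20)
chosen <- c(
  rep("A", 15), rep("B", 5),   # "A" shown first: 15 chose A, 5 chose B
  rep("A", 6),  rep("B", 14)   # "B" shown first: 6 chose A, 14 chose B
)

# Cross-tabulate first-shown option (rows) against chosen response (columns).
tab <- table(first_shown = first_shown, chosen = chosen)

# Chi-squared test of association on the 2 x 2 table.
# correct = FALSE (no continuity correction) is an assumption of this sketch.
test <- chisq.test(tab, correct = FALSE)
p_value <- test$p.value

print(tab)
print(p_value)
```

A small p-value here would suggest that responses depend on which option was presented first. Only the first position enters the table, so any effects of the remaining ordering are not examined.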

Usage

panel_bias_audit(responses)

Arguments

responses: A panel_administer() result.

Value

A tibble with one row per item and the columns item_id, n, parse_failures, execution_failures, and order_effect_p. order_effect_p is the first-option chi-squared p-value; it is NA when option order was not randomized or when cells are too sparse.

Examples

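# Build a small panel (n = 4) from a single margin.
panel <- panel_from_margins(list(group = c(A = 1)), n = 4)

# One closed item with two options; no items are listed in `randomize`.
instrument <- panel_instrument(
  item_choice("pick", "Choose one.", c("A", "B")),
  randomize = character(0))
config <- LLMR::llm_config("groq", "example-model")

# Stub runner: returns "A" for every row and flags each call as successful.
runner <- function(experiments, ...) {
  experiments$response_text <- "A"
  experiments$success <- TRUE
  experiments
}
responses <- panel_administer(panel, instrument, config, .runner = runner)

# Option order is not randomized here, so order_effect_p is expected to be NA.
panel_bias_audit(responses)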