Compares the panel's response marginals, item by item, to human benchmark marginals you supply (for example from ANES, GSS, Pew, or your own fielded study). Calibration here means reporting deviation from the benchmark, not adjusting the underlying estimates: deviations are reported as found, and the comparison is restricted to items the benchmark actually covers. Coverage is partial when only some items have a benchmark, and the print banner reflects it: a benchmark touching one of five items yields PARTIALLY CALIBRATED (1/5). Nonresponse (parse failures, refusals) is recorded per item alongside the deviations, since shares computed only over valid responses flatter an instrument that the model often refuses to answer.
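The comparison described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the `silicon` data frame, its `NA` coding for nonresponse, and the banner string format are all hypothetical stand-ins.

```r
# Illustrative sketch only; not the package's internal code.
# Hypothetical raw answers for two items; NA marks a parse failure or refusal.
silicon <- data.frame(
  item_id  = rep(c("plan", "trust"), each = 6),
  response = c("A", "A", "B", "A", NA, "B",
               "yes", NA, NA, "no", "yes", NA)
)
# The benchmark covers only the "plan" item.
bench <- data.frame(item_id = "plan", response = c("A", "B"),
                    share = c(0.5, 0.5))

# Nonresponse rate per item, kept separately from the shares.
nonresponse <- vapply(split(is.na(silicon$response), silicon$item_id),
                      mean, numeric(1))

# Shares computed over valid responses only.
valid  <- silicon[!is.na(silicon$response), ]
counts <- as.data.frame(table(item_id = valid$item_id,
                              response = valid$response),
                        stringsAsFactors = FALSE)
counts$share_silicon <- counts$Freq / ave(counts$Freq, counts$item_id, FUN = sum)

# Inner join: only item/response pairs the benchmark covers survive.
tab <- merge(counts[, c("item_id", "response", "share_silicon")], bench,
             by = c("item_id", "response"))
names(tab)[names(tab) == "share"] <- "share_human"
tab$deviation <- tab$share_silicon - tab$share_human

# Coverage: items with a benchmark out of all items administered.
items_total   <- length(unique(silicon$item_id))
items_covered <- length(unique(tab$item_id))
banner <- sprintf("PARTIALLY CALIBRATED (%d/%d)", items_covered, items_total)

print(tab)
print(nonresponse)
print(banner)
```

In this toy data the "trust" item has half its responses missing, which the deviation table alone would not reveal; that is the reason for reporting nonresponse next to the shares.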
Arguments
- responses
  A panel_administer() result.
- benchmark
  A data frame with columns item_id, response, and share (human marginal proportions). Shares within an item should sum to 1; a deviation beyond rounding draws a warning.
- benchmark_name
  How the source should be cited in reports (e.g. "ANES 2024 pilot").
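Before passing a benchmark, it may help to check the sum-to-1 condition yourself. The helper below is a hypothetical sketch, not a function exported by the package; `check_benchmark` and its tolerance of 0.01 are assumptions chosen for illustration, and the package's own rounding threshold is not documented here.

```r
# Hypothetical helper (not part of the package): flags items whose benchmark
# shares do not sum to 1 within an assumed rounding tolerance.
check_benchmark <- function(benchmark, tol = 0.01) {
  stopifnot(all(c("item_id", "response", "share") %in% names(benchmark)))
  totals <- vapply(split(benchmark$share, benchmark$item_id), sum, numeric(1))
  bad <- names(totals)[abs(totals - 1) > tol]
  if (length(bad) > 0) {
    warning("Benchmark shares do not sum to 1 for: ",
            paste(bad, collapse = ", "))
  }
  invisible(bad)
}

# Toy benchmark: "plan" sums to 1, "trust" sums to 0.95.
bench_check <- data.frame(
  item_id  = c("plan", "plan", "trust", "trust"),
  response = c("A", "B", "yes", "no"),
  share    = c(0.5, 0.5, 0.55, 0.40)
)
bad_items <- check_benchmark(bench_check)  # warns about "trust"
```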
Value
responses with the calibration attribute set:
$table (per covered item and response: share_silicon,
share_human, deviation), $nonresponse (per item),
$items_covered / $items_total, $mad, $max_dev.
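The summary fields are not defined above, so the following sketch rests on an assumption: that `$mad` is the mean absolute deviation and `$max_dev` the largest absolute deviation across the rows of `$table`. The data frame here is hand-built with made-up values, not output from the package.

```r
# Assumption: mad = mean absolute deviation, max_dev = largest absolute
# deviation over the covered rows. Values below are invented for illustration.
tab_example <- data.frame(
  item_id       = c("plan", "plan", "trust", "trust"),
  response      = c("A", "B", "yes", "no"),
  share_silicon = c(0.60, 0.40, 0.54, 0.46),
  share_human   = c(0.50, 0.50, 0.50, 0.50)
)
tab_example$deviation <- tab_example$share_silicon - tab_example$share_human

mad_example     <- mean(abs(tab_example$deviation))
max_dev_example <- max(abs(tab_example$deviation))

print(c(mad = mad_example, max_dev = max_dev_example))
```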
Examples
if (FALSE) { # \dontrun{
set.seed(110)
panel <- panel_from_margins(list(party = c(left = .5, right = .5)), n = 12)
instr <- panel_instrument(item_choice("plan", "Which plan do you prefer?",
                                      c("A", "B")))
cfg <- LLMR::llm_config("groq", "openai/gpt-oss-20b")
r <- panel_administer(panel, instr, cfg)
r # UNCALIBRATED banner
bench <- data.frame(item_id = "plan", response = c("A", "B"),
                    share = c(.5, .5))
panel_calibrate(r, bench, "toy human study")
} # }