Applies the locked protocol to every text, with protocol$replicates
codings per unit. With replicates, it returns the modal label and the
share of replicates agreeing with it; that share is the unit-level
stability diagnostic reviewers should ask for.
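As an illustration of the two summary quantities described above, here is a minimal base-R sketch for a single unit. The replicate labels are made up, and the tie-breaking rule shown (first maximum in table order) is an assumption; the package may resolve ties and parse failures differently.

```r
# Hypothetical replicate labels for one unit (not produced by the package).
reps <- c("positive", "positive", "negative")

counts <- table(reps)

# which.max() picks the first maximum, so a tie would resolve to the first
# label in table order. This tie rule is an assumption for the sketch.
modal_label <- names(counts)[which.max(counts)]

# Share of replicates that agree with the modal label.
label_share <- max(counts) / length(reps)

modal_label
label_share
```

A share of 1 means every replicate agreed; lower values mark units whose coding is unstable under the locked protocol.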
Arguments
- corpus
A data frame.
- protocol
A locked protocol().
- text
Name of the text column in corpus.
- .runner
Internal seam for tests: a function (experiments, ...) returning the experiments with a response_text column. Default LLMR::call_llm_par().
- id
Optional name of a stable unit-identifier column in corpus. Carry it through so gold_correct() can link audit units to corpus rows by id, the only way to disambiguate rows that share identical text. Use the same id here as in gold_set().
- ...
Passed to the runner (e.g. tries, progress).
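The point of the id argument can be seen without calling the package at all. The sketch below uses a hypothetical corpus with a made-up doc_id column and shows that matching on text is ambiguous when two rows share the same text, while matching on the id is not.

```r
# Hypothetical corpus: rows a1 and a2 share identical text.
corpus <- data.frame(
  doc_id = c("a1", "a2", "a3"),
  text = c("no comment", "no comment", "clear progress"),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when all values are distinct.
text_is_unique <- anyDuplicated(corpus$text) == 0
id_is_unique <- anyDuplicated(corpus$doc_id) == 0

# Looking up a unit by text matches two rows; by id, exactly one.
rows_by_text <- which(corpus$text == "no comment")
rows_by_id <- which(corpus$doc_id == "a2")
```

Here rows_by_text has two entries and rows_by_id has one, which is why a stable identifier is worth carrying through when texts can repeat.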
Value
corpus plus label (modal label), label_share (share of
replicates agreeing with it), parse_failures per unit, a .text_hash
linkage column, and when protocol$replicates > 1 the individual replicate
columns label_rep1, label_rep2, ....
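One way to use these columns is to pull out units that deserve a second look. The sketch below builds a hypothetical data frame shaped like the documented return value (the rows and values are invented) and filters it with base R.

```r
# Hypothetical result with the documented summary columns; values are made up.
coded <- data.frame(
  text = c("clear progress", "serious problem", "mixed signals"),
  label = c("positive", "negative", "positive"),
  label_share = c(1, 1, 0.5),
  parse_failures = c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Flag units where replicates disagreed or a response failed to parse.
needs_review <- coded[coded$label_share < 1 | coded$parse_failures > 0, ]
needs_review$text
```

The cutoff of 1 is only a choice for this sketch; with many replicates a looser threshold may be more practical.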
Details
Execution is live and parallel by default (LLMR::call_llm_par()); the
.runner seam accepts any function with the same contract, including
the replayer that archive_replay() returns.
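The runner contract can be sketched as follows: take the experiments, add a response_text column, and return them. The helper canned_runner() is hypothetical and is not part of the package. It assumes experiments behaves like a data frame with one row per call and that responses are supplied in row order.

```r
# Hypothetical helper: builds a runner that returns pre-recorded responses
# in row order instead of calling a provider.
canned_runner <- function(responses) {
  function(experiments, ...) {
    # One recorded response per requested call.
    stopifnot(length(responses) == nrow(experiments))
    experiments$response_text <- responses
    experiments
  }
}

# Stand-in for the experiments table; the real one carries more columns.
experiments <- data.frame(unit = 1:2)
out <- canned_runner(c("positive", "negative"))(experiments)
out$response_text
```

A function built this way could be passed as .runner, provided the number of recorded responses matches the number of calls the protocol generates.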
Examples
cb <- codebook("tone", "one sentence",
               list(cb_category("positive", "Approving."),
                    cb_category("negative", "Critical.")))
cfg <- LLMR::llm_config("groq", "openai/gpt-oss-20b", temperature = 0)
if (FALSE) { # \dontrun{
p <- protocol_lock(protocol(cb, cfg, replicates = 2))
code_corpus(data.frame(text = c("clear progress", "serious problem")),
            p, "text")
} # }
# The `.runner` seam answers the calls without a provider, for tests or for
# a deterministic or external coder:
p <- protocol_lock(protocol(cb, cfg))
keyword_coder <- function(experiments, ...) {
  user <- vapply(experiments$messages, `[[`, "", "user")
  experiments$response_text <- ifelse(grepl("progress", user),
                                      "positive", "negative")
  experiments
}
code_corpus(data.frame(text = c("clear progress", "serious problem")),
            p, "text", .runner = keyword_coder)