Applies the locked protocol to every text in the corpus, producing protocol$replicates codings per unit. With replicates, it returns the modal label and the share of replicates that agree with it; that share is the unit-level stability diagnostic reviewers should ask for.

Usage

code_corpus(corpus, protocol, text, .runner = NULL, id = NULL, ...)

Arguments

corpus

A data frame.

protocol

A locked protocol().

text

Name of the text column in corpus.

.runner

Internal seam for tests: a function(experiments, ...) that returns the experiments with a response_text column added. When NULL (the default), LLMR::call_llm_par() is used.

id

Optional name of a stable unit-identifier column in corpus. The column is carried through so that gold_correct() can link audit units to corpus rows by id, which is the only way to disambiguate rows that share identical text. Use the same id here as in gold_set().

...

Passed to the runner (e.g. tries, progress).
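The point of the id argument can be seen without calling the package at all. The base-R sketch below uses a hypothetical doc_id column and a hypothetical one-row audit table (neither comes from the package) to show that matching on text is ambiguous when two rows share identical text, while matching on an id is not. It does not reproduce how gold_correct() performs the link.

```r
# Hypothetical corpus: rows a1 and a3 carry identical text.
corpus <- data.frame(
  doc_id = c("a1", "a2", "a3"),
  text   = c("serious problem", "clear progress", "serious problem"),
  stringsAsFactors = FALSE
)

# Hypothetical audit unit that refers to the third row.
audit <- data.frame(doc_id = "a3", text = "serious problem",
                    stringsAsFactors = FALSE)

# Linking by text finds two candidate rows, so the link is ambiguous.
by_text <- which(corpus$text == audit$text)

# Linking by the stable identifier finds exactly one row.
by_id <- which(corpus$doc_id == audit$doc_id)
```

Here by_text has two elements and by_id has one, which is why the same id column should be supplied consistently wherever units are linked.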

Value

The corpus with added columns: label (the modal label), label_share (the share of replicates agreeing with it), parse_failures per unit, a .text_hash linkage column, and, when protocol$replicates > 1, the individual replicate columns label_rep1, label_rep2, ....
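As an illustration of what label and label_share mean, the following base-R sketch computes a modal label and its agreement share from hypothetical replicate columns. The data, the modal_label() helper, and the tie-breaking rule (first label in sorted order) are assumptions made for this sketch, not the package's documented implementation.

```r
# Hypothetical replicate labels: three units (rows), three replicates (columns).
reps <- data.frame(
  label_rep1 = c("positive", "negative", "positive"),
  label_rep2 = c("positive", "negative", "negative"),
  label_rep3 = c("positive", "positive", "negative"),
  stringsAsFactors = FALSE
)

# Most frequent label in a vector. Ties resolve to the first label in sorted
# order, which is an assumption of this sketch.
modal_label <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Modal label per unit.
label <- unname(apply(reps, 1, modal_label))

# Share of replicates that agree with the modal label, per unit.
# The comparison recycles `label` down each column, so row i is compared
# against label[i].
label_share <- unname(rowMeans(as.matrix(reps) == label))
```

A share of 1 means every replicate agreed; lower values flag units whose coding is unstable across replicates.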

Details

Execution is live and parallel (via LLMR::call_llm_par()). The .runner seam accepts any function with the same contract, including the replayer returned by archive_replay().

Examples

cb <- codebook("tone", "one sentence",
  list(cb_category("positive", "Approving."),
       cb_category("negative", "Critical.")))
cfg <- LLMR::llm_config("groq", "openai/gpt-oss-20b", temperature = 0)
if (FALSE) { # \dontrun{
p <- protocol_lock(protocol(cb, cfg, replicates = 2))
code_corpus(data.frame(text = c("clear progress", "serious problem")),
            p, "text")
} # }

# The `.runner` seam answers the calls without a provider, for tests or for
# a deterministic or external coder:
p <- protocol_lock(protocol(cb, cfg))
keyword_coder <- function(experiments, ...) {
  user <- vapply(experiments$messages, `[[`, "", "user")
  experiments$response_text <- ifelse(grepl("progress", user),
                                      "positive", "negative")
  experiments
}
code_corpus(data.frame(text = c("clear progress", "serious problem")),
            p, "text", .runner = keyword_coder)