Agreement across replicated LLM annotations

Computes per-row majority labels and overall reliability for replicate columns produced by llm_replicate(), or for any set of columns holding repeated codings of the same units, including codings by different models or by humans. Reliability is reported as average pairwise percent agreement and as Krippendorff's alpha for nominal data, the statistic reviewers most often ask for. Alpha accommodates missing values (failed calls): it is computed from whichever replicates are present for each unit.
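To make the two reliability figures concrete, here is a minimal base-R sketch of how they can be computed by hand. The helper names pairwise_agreement() and kripp_alpha_nominal() and the toy matrix are hypothetical, and the pairwise definition used (agreement per pair of columns, then averaged) is an assumption; llm_agreement() may differ in detail.

```r
# Toy data: 4 units, each coded 3 times; NA marks a failed call.
codes <- rbind(
  c("pos", "pos", "pos"),
  c("pos", "neg", "pos"),
  c("neg", "neg", NA),
  c("neg", "pos", "neg")
)

# Average pairwise percent agreement (assumed definition): for each pair of
# replicate columns, the share of units with both values present on which
# the two codings match; then the mean over all pairs.
pairwise_agreement <- function(m) {
  pairs <- utils::combn(ncol(m), 2)
  per_pair <- apply(pairs, 2, function(p) {
    mean(m[, p[1]] == m[, p[2]], na.rm = TRUE)
  })
  mean(per_pair)
}

# Krippendorff's alpha for nominal data via the coincidence matrix.
# Each unit with m_u >= 2 observed values contributes its ordered value
# pairs, weighted by 1 / (m_u - 1); units with fewer values are skipped.
kripp_alpha_nominal <- function(m) {
  cats <- sort(unique(m[!is.na(m)]))
  k <- length(cats)
  coinc <- matrix(0, k, k, dimnames = list(cats, cats))
  for (u in seq_len(nrow(m))) {
    vals <- m[u, !is.na(m[u, ])]
    m_u <- length(vals)
    if (m_u < 2) next
    counts <- as.vector(table(factor(vals, levels = cats)))
    # Ordered pairs within the unit: n_c * n_k off the diagonal,
    # n_c * (n_c - 1) on the diagonal.
    pairs_u <- outer(counts, counts) - diag(counts, nrow = k)
    coinc <- coinc + pairs_u / (m_u - 1)
  }
  n_c <- rowSums(coinc)
  n <- sum(n_c)
  d_o <- sum(coinc) - sum(diag(coinc))   # observed disagreement (as a count)
  d_e <- (n^2 - sum(n_c^2)) / (n - 1)    # expected disagreement (as a count)
  1 - d_o / d_e                          # NaN if only one category occurs
}

agreement <- pairwise_agreement(codes)
alpha <- kripp_alpha_nominal(codes)
```

On this toy matrix the sketch gives a pairwise agreement of 11/18 and an alpha of 1/3; alpha is lower because it corrects for the agreement expected by chance given the label frequencies.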

Usage

llm_agreement(.data, cols = NULL, prefix = NULL, normalize = TRUE)

Arguments

.data: A data frame holding the replicate columns.
cols: Character vector naming the replicate columns. Alternatively, supply prefix.
prefix: Base name of the replicate columns: columns named <prefix>_1, <prefix>_2, ... are used.
normalize: If TRUE (default), values are compared after trimming whitespace and lowercasing, so "Positive" and " positive" agree. Set to FALSE for exact string comparison.
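As an illustration of what prefix and normalize describe, the following base-R sketch selects columns by prefix and compares labels with and without normalization. The data frame, the regular expression, and the helper normalize_labels() are assumptions made for this example, not the package's internals.

```r
# Hypothetical data: two replicate columns plus an unrelated column that
# shares the prefix but does not end in a number.
df <- data.frame(
  id = 1:3,
  label_1 = c("Positive", "negative", "neutral"),
  label_2 = c(" positive", "Negative", "positive"),
  label_note = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Assumed matching rule: the prefix, an underscore, then digits only.
# (A prefix containing regex metacharacters would need escaping.)
prefix <- "label"
rep_cols <- grep(paste0("^", prefix, "_[0-9]+$"), names(df), value = TRUE)

# Assumed normalization: trim surrounding whitespace, then lowercase.
normalize_labels <- function(x) tolower(trimws(x))

# Exact comparison treats case and whitespace differences as disagreement.
exact <- df$label_1 == df$label_2
# Normalized comparison ignores them.
normed <- normalize_labels(df$label_1) == normalize_labels(df$label_2)
```

Here rep_cols holds label_1 and label_2 only, exact is FALSE for every row, and normed is TRUE for the first two rows, which differ only in case or whitespace.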

Value

An object of class llmr_agreement: a list with

by_row: a tibble with one row per unit and columns majority (modal label, NA on ties), share (modal share of non-missing replicates), n_distinct, unanimous, tie, n_missing.
summary: a one-row tibble with columns n_units, n_replicates, mean_pairwise_agreement, krippendorff_alpha, n_unanimous, n_ties.
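The by_row columns can be read off a small hand computation. The sketch below builds the same fields for a toy matrix with a hypothetical helper row_summary(); it returns a plain data frame rather than a tibble, and its treatment of edge cases (for example, whether a row with missing values can count as unanimous) is an assumption.

```r
# Summarise one unit's replicate codings (character vector, NA = failed call).
row_summary <- function(vals) {
  obs <- vals[!is.na(vals)]
  counts <- table(obs)
  top <- if (length(obs)) max(counts) else NA_integer_
  modes <- names(counts)[counts == top]
  tie <- length(modes) > 1            # several labels share the top count
  data.frame(
    majority   = if (length(obs) == 0 || tie) NA_character_ else modes,
    share      = if (length(obs)) top / length(obs) else NA_real_,
    n_distinct = length(counts),
    unanimous  = length(counts) == 1,
    tie        = tie,
    n_missing  = sum(is.na(vals)),
    stringsAsFactors = FALSE
  )
}

# Toy data: 3 units, each coded 3 times.
codes <- rbind(
  c("pos", "pos", "pos"),
  c("pos", "neg", NA),
  c("neg", "pos", "neg")
)

by_row <- do.call(
  rbind,
  lapply(seq_len(nrow(codes)), function(i) row_summary(codes[i, ]))
)
```

The first unit is unanimous with share 1; the second is a tie between "pos" and "neg" after one failed call, so majority is NA and share is 0.5; the third has majority "neg" with share 2/3.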

Printing shows the summary.

References

Krippendorff, K. (2019). Content Analysis: An Introduction to Its Methodology (4th ed.), chapter 12. The alpha implemented here is the nominal-data form with missing values allowed.
