Calls the model .times times for every row of .data (all replicates run through the parallel engine in one pass) and appends one column per replicate: <output>_1, <output>_2, ..., <output>_<.times>. Pass the result to llm_agreement() to get per-row majority labels and overall reliability.

Usage

llm_replicate(
  .data,
  output,
  prompt,
  .config,
  .times = 3L,
  .system_prompt = NULL,
  ...
)

Arguments

.data

A data.frame / tibble.

output

Unquoted base name for the replicate columns.

prompt

A glue template string evaluated against the columns of .data (as in llm_mutate()).

.config

An llm_config object (generative).

.times

Number of replicates (default 3).

.system_prompt

Optional system message.

...

Passed to call_llm_par() (e.g. tries, progress).

Value

.data with .times new character columns, <output>_1 ... <output>_<.times> (NA where a call failed).
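
To make the return shape concrete, the sketch below builds a mock of what the result might look like for output = sentiment and .times = 3, without calling any model. The label values are invented for illustration; only the column naming and the NA-on-failure convention are taken from the description above.

```r
# Mock result: hand-written values, NOT real model output.
reps <- data.frame(
  text        = c("I loved it", "Meh", "Terrible service"),
  sentiment_1 = c("positive", "neutral", "negative"),
  sentiment_2 = c("positive", NA, "negative"),  # NA marks a failed call
  sentiment_3 = c("positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Pick out the replicate columns by their shared base name.
rep_cols <- grep("^sentiment_[0-9]+$", names(reps), value = TRUE)

# Count failed calls per row, e.g. to decide which rows to re-run.
n_failed <- rowSums(is.na(reps[rep_cols]))
```

Selecting by a pattern rather than by hard-coded names keeps downstream code working when .times changes.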

Details

Replication measures sampling variability only if the model's output can vary. With temperature = 0 (or a fixed seed), most providers return nearly identical draws, which inflates agreement. That setting can still be a useful check for measurement purposes: high disagreement at low temperature signals a prompt the model finds genuinely ambiguous.
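
As a rough illustration of what "agreement" means per row, the base-R sketch below computes a majority label and the share of non-missing replicates that match it. row_majority() is a hypothetical helper written for this example; it is not the package's llm_agreement() and may differ from it in how ties and missing values are handled. The labels are mock data, not model output.

```r
# Hypothetical helper (an assumption for illustration, not package code).
row_majority <- function(df, prefix) {
  cols <- grep(paste0("^", prefix, "_[0-9]+$"), names(df), value = TRUE)
  labels <- as.matrix(df[cols])
  res <- lapply(seq_len(nrow(labels)), function(i) {
    x <- labels[i, ]
    x <- x[!is.na(x)]  # drop failed calls
    if (length(x) == 0L) {
      return(data.frame(majority = NA_character_, share = NA_real_,
                        stringsAsFactors = FALSE))
    }
    counts <- table(x)
    # which.max() keeps the first maximum, so ties resolve alphabetically.
    top <- names(counts)[which.max(counts)]
    data.frame(majority = top, share = max(counts) / length(x),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, res)
}

# Mock replicate columns for two rows.
mock <- data.frame(
  label_1 = c("positive", "neutral"),
  label_2 = c("positive", "negative"),
  label_3 = c("positive", "neutral"),
  stringsAsFactors = FALSE
)
out <- row_majority(mock, "label")
```

A share of 1 on every row is what near-deterministic sampling tends to produce. Rows whose share stays well below 1 at low temperature are the ones worth inspecting for an ambiguous prompt.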

Examples

if (FALSE) { # \dontrun{
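# temperature = 1 so that replicates can actually differ (see Details)
cfg <- llm_config("groq", "openai/gpt-oss-20b", temperature = 1)
df <- tibble::tibble(text = c("I loved it", "Meh", "Terrible service"))
# Five replicates per row: adds columns sentiment_1 ... sentiment_5
reps <- df |>
  llm_replicate(sentiment,
                prompt = "Sentiment of '{text}'. One word: positive, negative, or neutral.",
                .config = cfg, .times = 5)
# Per-row majority labels and overall reliability
llm_agreement(reps, prefix = "sentiment")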
} # }