Skip to contents

Draws a stratified sample of archived calls, re-issues them, and reports exact-match rates and any change in the served model_version. Exact reproduction is expected only for temperature-zero calls on pinned open-weight backends; for sampled calls disagreement is sampling, not necessarily drift, and the print method says so.

Usage

archive_verify(
  archive,
  sample = 0.05,
  strata = c("provider", "model"),
  .runner = NULL,
  ...
)

Arguments

archive

An archive.

sample

Fraction in (0, 1], or a total count of calls to re-issue.

strata

Manifest columns to stratify by.

.runner

Internal seam for tests: a function taking an experiments tibble (config and messages list-columns) and returning it with a response_text column. Default LLMR::call_llm_par().

...

Passed to the runner.

Value

An archive_drift object: table is a per-stratum tibble that begins with the chosen strata columns (provider, model by default), followed by n_eligible, n_sampled, n_exact, exact_rate, n_temperature0, n_version_changed; details is a per-record tibble (idx, provider, model, temperature, exact, archived_version, served_version).

Details

With sample <= 1 the value is a fraction and each stratum contributes ceiling(sample * n) records. With sample > 1 the value is a total count, allocated across strata by largest remainder in proportion to stratum size, with at least one record per nonempty stratum when the total allows. The draw uses sample(); set a seed beforehand for a reproducible sample.

Examples

log <- tempfile(fileext = ".jsonl")
writeLines(paste0('{"ts":"2026-06-01T10:00:01+0000","schema_version":"1.0",',
  '"kind":"call","provider":"openai","model":"gpt-4o-mini","status":200,',
  '"request":{"messages":[{"role":"user","content":"Capital of France?"}],',
  '"temperature":0},"model_version":"gpt-4o-mini-2026-06-01",',
  '"usage":{"sent":5,"rec":1},"response_id":"r-1","text":"Paris"}'), log)
a <- archive_build(log)

# Live use issues real calls; here a runner that echoes the archived text
# stands in through the .runner seam so the example needs no key.
echo <- function(experiments, ...) {
  experiments$response_text <- "Paris"
  experiments$model_version <- "gpt-4o-mini-2026-06-01"
  experiments
}
archive_verify(a, sample = 1, .runner = echo)$table