Replay archived responses offline — archive

Builds a runner that serves stored replies from an archive. Passed as the .runner argument to an LLMR-style execution function, it lets the original pipeline recompute from stored responses with no provider calls and no API keys, so a reviewer can rerun the study and get the paper's numbers back.

Usage

archive_replay(archive, replay_mode = c("queue", "first", "strict_once"))

Arguments

archive: An unredacted archive (replay needs the content).
replay_mode: One of "queue", "first", "strict_once"; see Details.

Value

A function of class archive_replayer, suitable as a .runner argument. When called, it returns the input experiments with response_text, sent_tokens, rec_tokens, response_id, success, and error_message columns; unmatched rows carry NA in response_text, success = FALSE, and the error_message "not in archive". Reset it with LLMR::reset().
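
Unmatched rows come back as ordinary rows rather than as an error, so it can help to check for them before computing results. A minimal sketch, assuming only the column names listed above; assert_fully_replayed() and the example data frame are hypothetical illustrations, not part of the package:

```r
# Hypothetical helper (not part of the package): stop if any replayed row
# was not found in the archive, using the documented success column.
assert_fully_replayed <- function(out) {
  unmatched <- which(!out$success)
  if (length(unmatched) > 0) {
    stop(length(unmatched), " row(s) not served from the archive: ",
         paste(unmatched, collapse = ", "))
  }
  invisible(out)
}

# Made-up data shaped like a replayer's output, for illustration only.
out <- data.frame(
  response_text = c("Paris", NA),
  success       = c(TRUE, FALSE),
  error_message = c(NA, "not in archive")
)
res <- try(assert_fully_replayed(out), silent = TRUE)
inherits(res, "try-error")   # TRUE: row 2 was unmatched
```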

Details

Matching is by provider, model, canonical message content, and the generation parameters that change the answer (temperature, max tokens), so the same prompt at two temperatures does not collide. Repeated identical requests are served in archived order. Under the original parallel execution that order is completion order, so for sampled (temperature above zero) studies the multiset of draws is preserved while their assignment to replicate indices is not; for temperature-zero studies the draws are identical and the point is moot. Records logged without message content (include_messages = FALSE) cannot be replayed and are excluded.
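
The matching rule can be pictured as building one string key per request. The sketch below is an illustration under stated assumptions: toy_key() is a hypothetical function, and the package's actual canonicalisation of messages and parameters is not described on this page and may differ.

```r
# Hypothetical key builder (not the package's implementation): combines
# provider, model, message content, and the answer-changing parameters.
toy_key <- function(provider, model, messages,
                    temperature = NULL, max_tokens = NULL) {
  # Assumed canonical form: "role:content" pairs, one per line.
  canon <- paste(names(messages), unname(messages), sep = ":", collapse = "\n")
  fmt <- function(x) if (is.null(x)) "" else format(x, digits = 15)
  paste(provider, model, canon, fmt(temperature), fmt(max_tokens), sep = "|")
}

msg <- c(user = "Capital of France?")
k_t0  <- toy_key("openai", "gpt-4o-mini", msg, temperature = 0)
k_t07 <- toy_key("openai", "gpt-4o-mini", msg, temperature = 0.7)

k_t0 == k_t07   # FALSE: same prompt, different temperature, different key

# For repeated sampled requests the archive holds draws in completion order,
# so replay preserves the multiset of draws, not their replicate index.
original_by_replicate <- c("A", "B", "C")   # made-up draws
archived_in_order     <- c("B", "A", "C")   # made-up completion order
identical(sort(original_by_replicate), sort(archived_in_order))   # TRUE
```

Any statistic that is invariant to the order of replicates (a mean, a proportion) is therefore unaffected, while a per-replicate comparison is not guaranteed to line up.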

The replayer is stateful: each key holds a queue consumed as requests arrive. LLMR::reset() returns it to its initial position so a deterministic pipeline can be replayed again. replay_mode chooses what a repeated request gets: "queue" (default) serves the next archived response in order; "first" always serves the first response for a key (idempotent, never exhausts); "strict_once" errors if a key is requested more times than it was archived, catching unintended reuse.
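
The three modes can be illustrated with a toy, base-R replayer over a named list of archived responses. This is a sketch of the semantics described above, not the package's code: make_toy_replayer() is hypothetical, and the behavior of "queue" once a key is exhausted (returning NA here) is an assumption, since this page only specifies that "strict_once" errors in that case.

```r
# Toy illustration only: make_toy_replayer() is a hypothetical helper.
# store maps a request key to its archived responses, in archived order.
make_toy_replayer <- function(store,
                              mode = c("queue", "first", "strict_once")) {
  mode <- match.arg(mode)
  pos <- setNames(integer(length(store)), names(store))  # served per key

  serve <- function(key) {
    responses <- store[[key]]
    if (is.null(responses)) return(NA_character_)   # key not in archive
    if (mode == "first") return(responses[[1]])     # idempotent, no state
    pos[[key]] <<- pos[[key]] + 1L                  # advance this key's queue
    if (pos[[key]] > length(responses)) {
      if (mode == "strict_once") {
        stop("key requested more times than archived: ", key)
      }
      return(NA_character_)   # assumption: exhausted queue reads as unmatched
    }
    responses[[pos[[key]]]]
  }

  reset <- function() pos[] <<- 0L   # back to the initial position
  list(serve = serve, reset = reset)
}

store <- list(k = c("draw 1", "draw 2"))

q <- make_toy_replayer(store, "queue")
q$serve("k")   # "draw 1"
q$serve("k")   # "draw 2"
q$reset()
q$serve("k")   # "draw 1" again after the reset

f <- make_toy_replayer(store, "first")
f$serve("k")   # "draw 1"
f$serve("k")   # "draw 1": never advances
```

"strict_once" is the mode to pick when a repeated request would indicate a bug in the pipeline, since the third request for k above would stop with an error instead of silently returning something.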

Examples

# One archived call (in practice the log comes from LLMR::llm_log_enable()).
log <- tempfile(fileext = ".jsonl")
writeLines(paste0('{"ts":"2026-06-01T10:00:01+0000","schema_version":"1.0",',
  '"kind":"call","provider":"openai","model":"gpt-4o-mini","status":200,',
  '"request":{"messages":[{"role":"user","content":"Capital of France?"}],',
  '"temperature":0},"usage":{"sent":5,"rec":1},',
  '"response_id":"r-1","text":"Paris"}'), log)
a <- archive_build(log)

replay <- archive_replay(a)
replay   # how many records, over how many distinct requests

# The original pipeline's calls, answered from the archive. The config's
# generation parameters are part of the key, so set them as the study did:
experiments <- tibble::tibble(
  config   = list(LLMR::llm_config("openai", "gpt-4o-mini", temperature = 0)),
  messages = list(c(user = "Capital of France?")))
replay(experiments)$response_text

# The queue advances as it serves; LLMR::reset() restores it for a second pass.
LLMR::reset(replay)
replay(experiments)$response_text