Skip to contents

Removes prompts and reply text from every record and re-serializes them with a redacted marker. Two hash families then coexist, explicitly:

Usage

archive_redact(archive)

Arguments

archive

An archive (not already redacted).

Value

The archive with content removed, $redacted = TRUE, and public_record_hash filled in the manifest.

Details

  • the original record hashes and the seal root stay in the manifest untouched – they attest the full content, checkable by whoever holds the unredacted archive (the authors, under IRB terms);

  • a public hash per redacted record (public_record_hash) is added, and archive_check() verifies a redacted archive against these, so the public artifact has its own working integrity check.

What a reviewer gets from the public artifact: how many calls, to which models, with which parameters, when, at what token cost, under which root – everything except the sentences.

Examples

log <- tempfile(fileext = ".jsonl")
writeLines(paste0('{"ts":"2026-06-01T10:00:01+0000","schema_version":"1.0",',
  '"kind":"call","provider":"groq","model":"openai/gpt-oss-20b",',
  '"request":{"messages":[{"role":"user","content":"secret text"}]},',
  '"usage":{"sent":5,"rec":2},"response_id":"r-1","text":"reply"}'), log)
a <- archive_seal(archive_build(log))
r <- archive_redact(a)
archive_check(r)                       # verifies against public hashes
identical(r$seal$root, a$seal$root)    # original root preserved