
Redact an archive's content while keeping its hash tree
Source: R/archive_archive.R
archive_redact.Rd
Removes prompts and reply text from every record and re-serializes them
with a redacted marker. Two hash families then coexist, explicitly; both are listed under Details.
Value
The archive with content removed, $redacted = TRUE, and
public_record_hash filled in the manifest.
Details
the original record hashes and the seal root stay in the manifest untouched: they attest the full content and can be checked by whoever holds the unredacted archive (the authors, under IRB terms);
a public hash per redacted record (public_record_hash) is added, and archive_check() verifies a redacted archive against these, so the public artifact has its own working integrity check.
What a reviewer gets from the public artifact: how many calls, to which models, with which parameters, when, at what token cost, under which root – everything except the sentences.
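The two hash families can be illustrated with a toy record. This is a minimal sketch in base R, not the package's implementation: hash_text() and serialize_record() are hypothetical helpers, the record fields are invented for illustration, and MD5 (via tools::md5sum) stands in for whatever digest the package actually uses.

```r
# Hypothetical helper: hash a string with base R (MD5 as a stand-in digest).
hash_text <- function(x) {
  f <- tempfile()
  on.exit(unlink(f))
  writeLines(x, f, useBytes = TRUE)
  unname(tools::md5sum(f))
}

# Hypothetical serializer: fixed field order so the hash is reproducible.
serialize_record <- function(rec) {
  paste(names(rec), unlist(rec), sep = "=", collapse = ";")
}

# A toy record: metadata plus the content that redaction removes.
full <- list(model = "openai/gpt-oss-20b", sent = 5, rec = 2,
             prompt = "secret text", text = "reply")

# Redaction drops the sentences and adds a marker.
redacted <- full
redacted$prompt <- NULL
redacted$text <- NULL
redacted$redacted <- TRUE

# Family 1: computed before redaction, attests the full content.
record_hash <- hash_text(serialize_record(full))
# Family 2: computed after redaction, attests the public artifact.
public_record_hash <- hash_text(serialize_record(redacted))

# A holder of only the redacted record can reproduce the public hash ...
public_ok <- identical(hash_text(serialize_record(redacted)), public_record_hash)
# ... but cannot reproduce the original hash without the removed content.
original_ok <- identical(hash_text(serialize_record(redacted)), record_hash)
```

In this sketch public_ok holds and original_ok does not, which is why the original hashes are kept only as an attestation for holders of the unredacted archive while the public check runs against the second family.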
Examples
log <- tempfile(fileext = ".jsonl")
writeLines(paste0('{"ts":"2026-06-01T10:00:01+0000","schema_version":"1.0",',
'"kind":"call","provider":"groq","model":"openai/gpt-oss-20b",',
'"request":{"messages":[{"role":"user","content":"secret text"}]},',
'"usage":{"sent":5,"rec":2},"response_id":"r-1","text":"reply"}'), log)
a <- archive_seal(archive_build(log))
r <- archive_redact(a)
archive_check(r) # verifies against public hashes
identical(r$seal$root, a$seal$root) # original root preserved