The LLMR ecosystem
A set of R packages for using language models in research. One package handles the providers; the others build the methods a study needs on top of it: annotation checked against human labels, robustness audits, replication archives, agent designs, and calibrated silicon samples. The methods can run on inexpensive open-weight models, so validation and auditing do not require a large compute budget.
Why this exists
Language-model output increasingly appears in published tables as if it were measured data. Often a single prompt and a single run stand in for an instrument: no codebook, no reliability estimate, and no record a later reader could check. Content analysis worked out how to handle these problems long ago, through codebooks, gold standards, reliability statistics, and documentation. These packages apply that older discipline to language-model work. They run on inexpensive open-weight models, so the checks they ask for cost little to perform.
The workflow, end to end
- call: LLMR (any provider, one interface)
- simulate: LLMRAgent · LLMRpanel · FocusGroup (agents, panels, groups)
- analyze: LLMRcontent (code, audit, archive)
Which package?
| Package | Use it when | Not for |
|---|---|---|
| LLMRcontent | Validated text measurement for quantitative inference | Accessible qualitative coding or text segmentation |
| LLMRcontent | Robustness of a downstream estimand across measurement choices | Construct validity or pass/fail validation |
| LLMRcontent | Reviewer-runnable replication archives from LLM audit logs | Measurement or estimation |
| LLMRpanel | Calibrated silicon survey samples for design-stage work | Human-population estimates without calibration to a human benchmark |
| FocusGroup | Simulating moderated group discussion to pilot instruments or probe how a turn shifts the next | Estimating quantities about real human populations |
These packages follow the same workflow contract: construct a first
object, build or extend it, run it with a .runner seam, read
diagnostics(), draft report(), and optionally
archive with LLMRcontent.
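The contract above can be sketched in base R. This is a toy illustration under stated assumptions: `new_study()`, `add_item()`, `run_study()`, the `toy_study` class, and `stub_runner()` are hypothetical names invented for this sketch, not functions exported by any of the packages. The point it tries to show is the `.runner` seam: the function that talks to a model is passed in as an argument, so a deterministic stub can replace it when working offline.

```r
# Hypothetical sketch of the workflow contract; none of these names come
# from the LLMR packages.

# 1. Construct a first object.
new_study <- function(name) {
  structure(list(name = name, items = character(), results = NULL),
            class = "toy_study")
}

# 2. Build or extend it.
add_item <- function(study, text) {
  study$items <- c(study$items, text)
  study
}

# 3. Run it through a .runner seam: any function mapping one prompt to one reply.
run_study <- function(study, .runner) {
  study$results <- vapply(study$items, .runner, character(1), USE.NAMES = FALSE)
  study
}

# 4. Read diagnostics and draft a report (S3 generics with toy_study methods).
diagnostics <- function(x, ...) UseMethod("diagnostics")
diagnostics.toy_study <- function(x, ...) {
  data.frame(n_items = length(x$items),
             n_empty = sum(!nzchar(x$results)))
}

report <- function(x, ...) UseMethod("report")
report.toy_study <- function(x, ...) {
  d <- diagnostics(x)
  sprintf("Study '%s': %d items, %d empty replies.",
          x$name, d$n_items, d$n_empty)
}

# A deterministic stub runner, so the example needs no provider and no key.
stub_runner <- function(prompt) paste0("stub reply to: ", prompt)

study <- new_study("pilot")
study <- add_item(study, "Is this sentence about the economy?")
study <- add_item(study, "Is this sentence about health?")
study <- run_study(study, .runner = stub_runner)

print(diagnostics(study))
cat(report(study), "\n")
```

Swapping `stub_runner` for a function that calls a real provider would leave the rest of the pipeline unchanged, which is presumably what makes the seam useful for tests and demos.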
The packages
LLMR
Unified access to OpenAI, Anthropic, Gemini, Groq, DeepSeek, Together, Ollama and more: structured output with schema enforcement, tool calling, streaming, logprobs, embeddings, tidy data-frame verbs, parallel and half-price batch execution, audit logging, replication and agreement statistics, cost accounting.
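As an illustration of what an agreement statistic measures, here is a base-R sketch of Cohen's kappa between human labels and model labels. It does not use LLMR's own functions, whose names and signatures are not shown here; `cohen_kappa()` and the ten example labels are made up for the sketch.

```r
# Cohen's kappa for two raters over the same items (base R only).
# Hypothetical helper, not part of LLMR.
cohen_kappa <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) > 0)
  lev <- sort(unique(c(as.character(a), as.character(b))))
  tab <- table(factor(a, levels = lev), factor(b, levels = lev))
  p <- tab / sum(tab)
  p_obs <- sum(diag(p))                  # observed agreement
  p_exp <- sum(rowSums(p) * colSums(p))  # agreement expected by chance
  # Note: undefined (NaN) when both raters use a single identical category.
  (p_obs - p_exp) / (1 - p_exp)
}

# Invented labels for ten items.
human <- c("pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg", "neg")
model <- c("pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "neg")

print(cohen_kappa(human, model))
```

Here the raters agree on 8 of 10 items, but 0.52 agreement is expected by chance given the label frequencies, so kappa is (0.80 - 0.52) / (1 - 0.52), roughly 0.58.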
LLMRAgent
Agents with personas, tools, pluggable memory, and token budgets checked before every call; agents that delegate to other agents; multi-agent debates, interviews, and deliberations with private votes; factorial agent experiments; and an orchestrator that pairs a capable planning model with a cheaper model for routine work.
LLMRcontent
LLM-assisted content analysis in one package: codebook-first coding with versioned instruments and sealed gold validation, measurement-error-corrected prevalences, and prompt-and-model tournaments; measurement-multiverse robustness audits that recompute the estimand across prompts, model families, label order, and temperature, with a fragility index; and verifiable replication archives built from audit logs, with content-addressed sealing, IRB-grade redaction, and a verifiability horizon.
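To make "measurement-error-corrected prevalence" concrete, the sketch below applies the classical Rogan-Gladen correction for a binary code: sensitivity and specificity are estimated on a gold set and used to adjust the raw share of positive labels. This is one standard estimator chosen for illustration; whether LLMRcontent uses this one or another is an assumption not confirmed by the text above. `corrected_prevalence()` and all the numbers are hypothetical, and the sketch returns a point estimate only, with no uncertainty interval.

```r
# Misclassification-corrected prevalence (Rogan-Gladen), base R only.
# Hypothetical helper, not the LLMRcontent API.
#   gold_truth, gold_pred: logical vectors on the human-labelled gold set
#   pred_all:              logical vector of model labels on the full corpus
corrected_prevalence <- function(gold_truth, gold_pred, pred_all) {
  stopifnot(length(gold_truth) == length(gold_pred),
            any(gold_truth), any(!gold_truth))
  sens  <- mean(gold_pred[gold_truth])    # P(model says yes | truly yes)
  spec  <- mean(!gold_pred[!gold_truth])  # P(model says no  | truly no)
  p_obs <- mean(pred_all)                 # raw share coded "yes"
  denom <- sens + spec - 1
  if (denom <= 0) stop("model is no better than chance on the gold set")
  p_adj <- (p_obs + spec - 1) / denom
  c(raw = p_obs, sensitivity = sens, specificity = spec,
    corrected = min(max(p_adj, 0), 1))    # clamp to [0, 1]
}

# Invented gold set: 20 true positives, 30 true negatives.
gold_truth <- rep(c(TRUE, FALSE), times = c(20, 30))
gold_pred  <- c(rep(TRUE, 18), rep(FALSE, 2),   # 18 of 20 positives caught
                rep(TRUE, 3),  rep(FALSE, 27))  # 3 of 30 negatives flagged
# Invented corpus: the model codes 300 of 1000 documents as "yes".
pred_all <- rep(c(TRUE, FALSE), times = c(300, 700))

res <- corrected_prevalence(gold_truth, gold_pred, pred_all)
print(round(res, 3))
```

With sensitivity and specificity both at 0.9, a raw share of 0.30 corresponds to a corrected prevalence of 0.25: the model's false positives outnumber its false negatives at this base rate, so the raw share overstates the true one.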
LLMRpanel
Calibrated silicon samples for the design stage: persona panels drawn from margins you supply, Likert and choice items, vignette and conjoint designs with the order randomization recorded as data, and bias audits. Results stay marked as uncalibrated until they are compared against a human benchmark.
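A rough base-R sketch of drawing a persona panel from supplied margins follows. It assumes the simplest possible scheme, sampling each attribute independently from its margin; LLMRpanel may well do something more careful (joint distributions, raking), and `draw_panel()`, the `calibrated` column, and the margin values are all hypothetical.

```r
# Draw a toy persona panel from user-supplied margins (base R only).
# Hypothetical helper, not the LLMRpanel API.
# Assumption: attributes are sampled independently of one another.
draw_panel <- function(n, margins, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  cols <- lapply(margins, function(p) {
    stopifnot(abs(sum(p) - 1) < 1e-8)     # each margin must sum to 1
    factor(sample(names(p), size = n, replace = TRUE, prob = p),
           levels = names(p))
  })
  panel <- data.frame(id = seq_len(n), cols)
  # Flag stays FALSE until the panel is compared with a human benchmark.
  panel$calibrated <- FALSE
  panel
}

# Invented margins for illustration only.
margins <- list(
  age_group = c("18-34" = 0.30, "35-54" = 0.35, "55+" = 0.35),
  education = c("no degree" = 0.60, "degree" = 0.40)
)

panel <- draw_panel(1000, margins, seed = 42)
print(head(panel))
print(round(prop.table(table(panel$age_group)), 2))
```

Carrying the `calibrated` flag inside the data frame mirrors the idea that results stay marked as uncalibrated until a benchmark comparison has been made.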
FocusGroup
Simulation and analysis of focus-group discussions with LLM agents:
moderated sessions with configurable turn-taking, transcript analysis, and a
continuation experiment that perturbs one turn and reads the next as a
dependent variable. Ships a Shiny GUI
(run_focus_studio()) for running, analyzing, and experimenting.
Principles
- Validation by default. The codebooks, gold sets, and model comparisons that a careful study needs are built into the normal way of working, not left as extra steps a user has to add.
- Complete records. Every call can be logged, hashed, sealed, and replayed, so a methods section can cite a fixed artifact rather than a model name that may change.
- Diagnostics in the objects. Test-split evaluations are recorded, uncalibrated panels are labeled as such, and estimates carry a fragility index. The state of the evidence stays attached to the result.
- Modest requirements. The audits can be run with modest compute, and local open-weight models let restricted text stay on the machine where it is held.
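The "logged, hashed, sealed" idea in the second principle can be illustrated with a small base-R sketch of content addressing: a call record is written in a canonical form and identified by its hash, so any later change to the record changes the identifier. Everything here is an assumption made for illustration: `record_hash()`, the record fields, and the placeholder model name are hypothetical, and MD5 is used only because `tools::md5sum()` ships with R. A real archive would want a collision-resistant hash such as SHA-256.

```r
# Content-addressed identifier for one logged call (base R only).
# Hypothetical helper, not the LLMR or LLMRcontent API.
record_hash <- function(record) {
  record <- record[order(names(record))]          # canonical field order
  lines <- paste0(names(record), "=",
                  vapply(record, as.character, character(1)))
  tmp <- tempfile()
  on.exit(unlink(tmp))
  con <- file(tmp, open = "wb")                   # binary mode: fixed "\n" endings
  writeLines(lines, con, sep = "\n")
  close(con)
  unname(tools::md5sum(tmp))                      # MD5 for illustration only
}

# An invented call record.
call_1 <- list(
  model       = "some-open-weight-model",
  prompt      = "Code this sentence: prices rose again last month.",
  temperature = 0,
  reply       = "economy"
)

h_original  <- record_hash(call_1)
# The same content in a different field order hashes identically.
h_reordered <- record_hash(call_1[c("reply", "temperature", "prompt", "model")])
# Changing any field changes the hash.
tampered <- call_1
tampered$reply <- "health"
h_tampered <- record_hash(tampered)

print(identical(h_original, h_reordered))
print(identical(h_original, h_tampered))
```

A methods section could then cite the hash of the sealed log rather than a model name, which is the kind of fixed artifact the principle describes.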
Point-and-click
Three of the packages ship a Shiny front end, so the same workflows run without writing code. Keys are read from environment variables, never pasted into the app, and a deterministic demo mode runs offline.
- Content analysis. LLMRcontent::run_content_studio() -- build a codebook, seal a gold set, run a coding tournament, and read the validation report.
- Silicon surveys. LLMRpanel::run_panel_studio() -- draw a persona panel from margins, administer Likert and choice items, and read the calibration banner.
- Focus groups. FocusGroup::run_focus_studio() -- run a moderated session, analyze a transcript, or run a continuation experiment that perturbs one turn and compares the next.
All three are built on LLMR.shiny, a shared substrate that supplies the provider sidebar, key handling, cost accounting, and the offline demo mode.
Install
install.packages("LLMR") # CRAN
# the rest, from GitHub (needs the remotes package: install.packages("remotes")):
remotes::install_github("asanaei/LLMRAgent")
remotes::install_github("asanaei/LLMRcontent")
remotes::install_github("asanaei/LLMRpanel")
remotes::install_github("asanaei/FocusGroup")
# the GUIs ship inside the packages above (run_*_studio()); they share
# one substrate, installed alongside them:
remotes::install_github("asanaei/LLMR.shiny")