Survey and experiment pilots with LLMRpanel

LLMRpanel administers survey and experimental instruments to panels of language model personas. Use it to pretest questions, pilot conjoint designs, calculate sample sizes from pilot dispersion, or measure responses from a configured model. panel_benchmark() compares closed-item response shares with benchmark shares supplied by the user. It records deviations by item and response, nonresponse, and the number of closed items covered.

Set RUN_LIVE to TRUE to evaluate the provider calls below. The default FALSE setting lets package and site builds run without provider credentials or charges.

What silicon panels are for

Instrument pretesting. Administer draft items and inspect unmatched replies and first-option sensitivity.
Design piloting. Run conjoint tasks and estimate response dispersion before planning human data collection.
Model measurement. Compare response distributions across personas, item orders, option orders, or model configurations.

Panels and instruments

library(LLMRpanel)

panel_from_margins() samples attribute values from the supplied marginal distributions. set.seed() makes this draw reproducible.

set.seed(110)
panel = panel_from_margins(
  list(
    age = c("18 to 34" = .30, "35 to 64" = .45, "65 plus" = .25),
    party = c(left = .45, right = .45, independent = .10)
  ),
  n = 12,
  persona_template = "A voter aged {age} who leans {party}."
)
panel

instrument = panel_instrument(list(
  item_likert("wk4", "A four day work week would benefit society."),
  item_choice(
    "fund",
    "Which should the city fund first?",
    c("public transit", "road repair")
  ),
  item_open("why", "In one sentence, why?")
))
instrument

Margins are useful when targets are published as tables. When microdata is available, panel_from_data() is the joint distribution counterpart. It draws personas from observed rows and therefore preserves relationships among attributes rather than sampling each margin independently. LLMR::report() identifies whether a panel came from supplied margins, microdata rows, or supplied personas.
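The difference between margin-based and row-based sampling can be illustrated in base R without the package. The sketch below is not the implementation of panel_from_margins() or panel_from_data(). The toy microdata and the helpers draw_from_margins() and draw_from_rows() are hypothetical and exist only for this illustration.

```r
# Hypothetical toy microdata in which age and party are perfectly related.
micro = data.frame(
  age   = c("18 to 34", "18 to 34", "65 plus", "65 plus"),
  party = c("left", "left", "right", "right"),
  stringsAsFactors = FALSE
)

# Margin-based draw: each attribute is sampled independently of the others.
draw_from_margins = function(margins, n) {
  as.data.frame(
    lapply(margins, function(p) sample(names(p), n, replace = TRUE, prob = p)),
    stringsAsFactors = FALSE
  )
}

# Row-based draw: whole rows are resampled, so observed combinations are kept.
draw_from_rows = function(data, n) {
  data[sample(nrow(data), n, replace = TRUE), , drop = FALSE]
}

set.seed(110)
margins = list(
  age   = c("18 to 34" = .5, "65 plus" = .5),
  party = c(left = .5, right = .5)
)
by_margin = draw_from_margins(margins, 200)
by_row    = draw_from_rows(micro, 200)

# Row-based draws contain only combinations present in the microdata.
# Margin-based draws can also produce combinations that never occur there.
table(by_margin$age, by_margin$party)
table(by_row$age, by_row$party)
```

Both draws reproduce the same margins in expectation. Only the row-based draw keeps the relationship between age and party.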

panel_administer() sends each item to each persona. It returns a panel_responses object with response rows in $data and the panel, instrument, benchmark record, and token usage in separate components. It randomizes item and option order by default and records item_position and option_order in $data. The example repeats the administration with a second model configuration so response patterns can be compared across models.

cfg = LLMR::llm_config("groq", "openai/gpt-oss-20b", temperature = 0.8)
cfg_qwen = LLMR::llm_config("groq", "qwen/qwen3-32b", temperature = 0.8)

resp = panel_administer(panel, instrument, cfg)
resp
resp$data
panel_bias_audit(resp)
LLMR::diagnostics(resp)

resp_qwen = panel_administer(panel, instrument, cfg_qwen)
panel_bias_audit(resp_qwen)

Compare responses with a benchmark

panel_benchmark() compares valid model response shares with benchmark shares for matching item-response pairs. It also records benchmark coverage and item-level nonresponse in $benchmark. Before a benchmark is attached, response shares describe the configured model under the supplied personas, not a human population. bench_fund supplies shares for one closed item.

bench_fund = data.frame(
  item_id = rep("fund", 2),
  response = c("public transit", "road repair"),
  share = c(0.41, 0.59)
)

resp_partial = panel_benchmark(
  resp,
  bench_fund,
  benchmark_name = "toy city survey"
)
resp_partial

bench_fund covers one of the instrument’s two closed items. bench_all adds shares for wk4 and covers both.

bench_all = rbind(
  bench_fund,
  data.frame(
    item_id = rep("wk4", 5),
    response = c(
      "strongly disagree",
      "disagree",
      "neutral",
      "agree",
      "strongly agree"
    ),
    share = c(.05, .20, .25, .35, .15)
  )
)

resp = panel_benchmark(
  resp,
  bench_all,
  benchmark_name = "toy city survey"
)
resp
LLMR::report(resp)
resp$benchmark$nonresponse

resp$benchmark$nonresponse gives the missing response proportion for each closed item. The comparison shares use nonmissing responses as their denominator.
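The arithmetic behind these quantities can be sketched in base R. The response vector below is hypothetical, and the column names in the result are illustrative rather than the layout of $benchmark. Only the benchmark shares come from bench_fund.

```r
# Hypothetical pilot responses for one closed item; NA marks nonresponse.
responses = c("public transit", "road repair", "public transit", NA,
              "public transit", "road repair", "public transit", "public transit",
              NA, "road repair", "public transit", "public transit")

# Benchmark shares copied from bench_fund.
benchmark = c("public transit" = 0.41, "road repair" = 0.59)

# Nonresponse: proportion of missing responses among all administered items.
nonresponse = mean(is.na(responses))

# Model shares use nonmissing responses as the denominator.
valid = responses[!is.na(responses)]
model_share = table(factor(valid, levels = names(benchmark))) / length(valid)

comparison = data.frame(
  response  = names(benchmark),
  model     = as.numeric(model_share),
  benchmark = as.numeric(benchmark),
  deviation = as.numeric(model_share) - as.numeric(benchmark)
)
comparison
nonresponse
```

Because the denominator excludes missing replies, a high nonresponse rate can leave the compared shares resting on few observations, so the two quantities are best read together.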

Conjoint designs

conjoint_design() uses R’s random-number generator to construct a classed design list. Its $profiles field contains the initial profile table, and its $attributes field contains the attribute universe. panel_administer() draws the profiles each respondent sees. Set a seed before administration to reproduce those respondent-level draws.

set.seed(110)
design = conjoint_design(
  list(
    price = c("low", "high"),
    origin = c("domestic", "imported")
  ),
  n_tasks = 4
)
design
design$profiles
design$attributes
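As a rough illustration of seeded profile draws, the base R sketch below samples profiles from the full factorial of the attribute universe. It is not the implementation of conjoint_design(). The helper draw_tasks(), the assumption of two profiles per task, and the output columns are hypothetical.

```r
# Attribute universe matching the design above.
attributes = list(
  price  = c("low", "high"),
  origin = c("domestic", "imported")
)

# Full factorial of attribute levels: 2 x 2 = 4 possible profiles.
universe = expand.grid(attributes, stringsAsFactors = FALSE)

# Hypothetical helper: draw `per_task` distinct profiles for each task and
# warn when the universe is too small to supply distinct profiles.
draw_tasks = function(universe, n_tasks, per_task = 2) {
  too_small = nrow(universe) < per_task
  if (too_small) warning("attribute space cannot supply distinct profiles")
  do.call(rbind, lapply(seq_len(n_tasks), function(task) {
    rows = sample(nrow(universe), per_task, replace = too_small)
    cbind(task = task, profile = seq_len(per_task),
          universe[rows, , drop = FALSE])
  }))
}

set.seed(110)
profiles = draw_tasks(universe, n_tasks = 4)
profiles
```

Calling set.seed() with the same value before draw_tasks() reproduces the same table, which is the property a seed provides for any draw that relies on R's random-number generator.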

conjoint_design() attempts to use distinct profiles within each task in $profiles and warns when the attribute space cannot supply them. conjoint_instrument() creates one forced-choice item per task. Administration renders a fresh draw for each respondent and records it with the response. conjoint_amce() estimates average marginal component effects (AMCEs) from those recorded profiles, relative to the first level of each attribute, and calculates standard errors clustered by persona.

cj_instr = conjoint_instrument(design, "Which product would you buy?")
cj = panel_administer(panel, cj_instr, cfg)
conjoint_amce(cj)

conjoint_amce() returns a classed result with one row for each observed attribute level. Baseline levels have estimate 0 and missing standard errors. Other rows contain the estimated contrast and 95 percent interval. Run counts remain in separate columns.
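For intuition about what a contrast against the first level means, the base R sketch below fits a linear probability model to hypothetical long-format data with one row per displayed profile. It is a simplified stand-in, not the estimator inside conjoint_amce(). The toy data and column names are assumptions, and the sketch does not cluster standard errors by persona.

```r
# Hypothetical conjoint data: two profiles per task, exactly one chosen.
toy = data.frame(
  persona = rep(1:4, each = 4),
  price   = factor(rep(c("low", "high"), times = 8),
                   levels = c("low", "high")),
  origin  = factor(rep(c("domestic", "domestic", "imported", "imported"),
                       times = 4),
                   levels = c("domestic", "imported")),
  chosen  = c(1, 0, 1, 0,  1, 0, 0, 1,  1, 0, 1, 0,  1, 0, 1, 0)
)

# Each coefficient is a contrast with the first (baseline) level of its factor.
fit = lm(chosen ~ price + origin, data = toy)
coef(fit)

# In a balanced design the coefficient equals a difference in choice shares.
diff_price = mean(toy$chosen[toy$price == "high"]) -
  mean(toy$chosen[toy$price == "low"])
diff_price
```

The baseline levels ("low" and "domestic") have no coefficient of their own, which corresponds to the rows reported with estimate 0 and missing standard errors.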

Two-arm sample sizes

panel_power() computes two-arm sample sizes from a minimum detectable effect and pilot dispersion. For Likert items, it uses the standard deviation of the pilot scores. For choice items, it uses the share of a named focal response or, when none is named, the modal response share. The result has one row per analyzed closed item.

panel_power(resp, effect = c(wk4 = 0.4, fund = 0.15))
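A conventional normal-approximation calculation shows how effect size and dispersion combine into a per-arm sample size. This sketch may differ from the formula panel_power() uses. The helpers n_means() and n_props(), the defaults of 5 percent two-sided alpha and 80 percent power, and the pilot values passed to them are all assumptions.

```r
# Per-arm n to detect a difference in means of size `effect`,
# given a common standard deviation `sd`.
n_means = function(effect, sd, alpha = 0.05, power = 0.80) {
  z = qnorm(1 - alpha / 2) + qnorm(power)
  ceiling(2 * (z * sd / effect)^2)
}

# Per-arm n to detect a shift from share `p` to share `p + effect`.
n_props = function(effect, p, alpha = 0.05, power = 0.80) {
  p2 = p + effect
  z = qnorm(1 - alpha / 2) + qnorm(power)
  ceiling(z^2 * (p * (1 - p) + p2 * (1 - p2)) / effect^2)
}

# Hypothetical pilot standard deviation of Likert scores.
n_likert = n_means(effect = 0.4, sd = 1.1)

# Hypothetical pilot share of the focal response.
n_choice = n_props(effect = 0.15, p = 0.6)

c(likert = n_likert, choice = n_choice)
```

Sample size grows with the square of the dispersion and shrinks with the square of the effect, so halving the minimum detectable effect roughly quadruples the required n.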

Request counts and model choice

panel_administer() makes one request per persona-item pair. Item and option randomization do not add requests. Cost depends on provider prices and on prompt and response lengths. A versioned local model can support later reruns when hosted endpoints change. The $data field retains response_text, response_id, success, model, and provider, including when a reply cannot be matched to a closed-item option. finish_reason is retained when the runner supplies it.
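The request count for the panel and instrument in this article follows directly from that rule. In the sketch below, the token counts and prices are hypothetical placeholders and should be replaced with measured usage and the provider's published rates.

```r
n_personas = 12   # n in panel_from_margins() above
n_items    = 3    # wk4, fund, and why
n_configs  = 2    # cfg and cfg_qwen

# One request per persona-item pair, repeated for each model configuration.
requests_per_run = n_personas * n_items
requests_total   = requests_per_run * n_configs

# Hypothetical average token counts per request and prices per million tokens.
tokens_in_per_request  = 150
tokens_out_per_request = 40
price_in_per_million   = 0.10
price_out_per_million  = 0.50

cost = requests_total *
  (tokens_in_per_request * price_in_per_million +
     tokens_out_per_request * price_out_per_million) / 1e6

c(requests_per_run = requests_per_run, requests_total = requests_total)
cost
```

The same multiplication applies to the conjoint example, where each of the four tasks is one item.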

Related packages

LLMR supplies provider configuration and execution. LLMRcontent provides codebook-based text annotation and validation. LLMRagent provides agent experiments. LLMRpanel contains panel constructors, instruments, administration, and response summaries.