Plan the size of a gold set

Answers, by simulation, the budgeting question every project starts with: how many human-labeled units are needed so that the agreement estimate (the proportion of model labels that match gold) has a confidence interval whose average simulated width is no wider than ci_width? For each candidate size, the simulation draws agreement counts from a binomial distribution at the anticipated agreement level. This is adequate for planning; in the paper, report the realized interval from validate_protocol() instead.
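A minimal sketch of the simulation step for a single candidate size may help make the procedure concrete. It assumes the interval is the exact Clopper-Pearson interval returned by base R's binom.test(); this page does not say which interval gold_size() actually uses. sim_mean_width() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: average CI width for one candidate gold-set size n.
# Assumption: Clopper-Pearson (exact) intervals; gold_size() may differ.
sim_mean_width <- function(n, expected_agreement = 0.85, conf = 0.95,
                           sims = 2000) {
  # Simulated number of model labels that match gold in each draw
  k <- rbinom(sims, size = n, prob = expected_agreement)
  # Width (upper minus lower bound) of the interval for each draw
  widths <- vapply(k, function(x) {
    ci <- binom.test(x, n, conf.level = conf)$conf.int
    ci[2] - ci[1]
  }, numeric(1))
  mean(widths)
}

set.seed(110)
sim_mean_width(200)
```

Comparing this average against ci_width for each value in n_grid is the planning decision described above.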

Usage

gold_size(
  expected_agreement = 0.85,
  ci_width = 0.1,
  conf = 0.95,
  n_grid = c(50, 100, 200, 300, 500, 800),
  sims = 2000
)

Arguments

expected_agreement: Anticipated model-gold agreement (default 0.85).
ci_width: Target total width (upper minus lower bound) of the interval at confidence level conf (default 0.10).
conf: Confidence level (default 0.95).
n_grid: Candidate gold-set sizes (numbers of human-labeled units) to evaluate.
sims: Monte Carlo draws per candidate size.
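As a rough cross-check when choosing n_grid, the normal (Wald) approximation to a binomial proportion gives a closed-form size. Here w stands for ci_width, p for expected_agreement, and alpha = 1 - conf. This is a back-of-the-envelope formula, not the computation gold_size() performs.

```latex
w \approx 2\, z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}
\quad\Longrightarrow\quad
n \approx \frac{4\, z_{1-\alpha/2}^{2}\, p(1-p)}{w^{2}}

% With p = 0.85, w = 0.10, conf = 0.95 (z \approx 1.96):
n \approx \frac{4 \times 3.8416 \times 0.1275}{0.01} \approx 195.9
```

That gives about 196 units; in the default n_grid, 200 is the first candidate at or above that figure. An exact (Clopper-Pearson) interval is typically somewhat wider than the Wald interval, so a simulation based on it may land one grid step higher.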

Value

A gold_size object containing recommended_size and candidates, a tibble with one row per value of n_grid and columns n, mean_ci_width, and meets_target.
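The following is a hypothetical reconstruction of how the candidates table and recommended_size could be derived. It assumes Clopper-Pearson intervals, that meets_target means mean_ci_width <= ci_width, and that recommended_size is the smallest candidate meeting the target; none of these is confirmed by this page. plan_gold_size() and cp_width() are illustrative names, and a base data.frame stands in for the tibble to avoid extra dependencies.

```r
# Clopper-Pearson width from beta quantiles (vectorized over k).
# Edges are pinned at 0 and 1 when k == 0 or k == n, as binom.test() does.
cp_width <- function(k, n, conf) {
  alpha <- 1 - conf
  lower <- ifelse(k == 0, 0, qbeta(alpha / 2, k, n - k + 1))
  upper <- ifelse(k == n, 1, qbeta(1 - alpha / 2, k + 1, n - k))
  upper - lower
}

# Hypothetical planner mirroring the documented arguments and return fields.
plan_gold_size <- function(expected_agreement = 0.85, ci_width = 0.10,
                           conf = 0.95,
                           n_grid = c(50, 100, 200, 300, 500, 800),
                           sims = 2000) {
  # Average simulated interval width for each candidate size
  mean_ci_width <- vapply(n_grid, function(n) {
    k <- rbinom(sims, size = n, prob = expected_agreement)
    mean(cp_width(k, n, conf))
  }, numeric(1))
  candidates <- data.frame(n = n_grid, mean_ci_width = mean_ci_width,
                           meets_target = mean_ci_width <= ci_width)
  # Assumed rule: smallest size that meets the target, NA if none does
  hits <- candidates$n[candidates$meets_target]
  list(recommended_size = if (length(hits) > 0) min(hits) else NA,
       candidates = candidates)
}

set.seed(110)
plan_gold_size()$candidates
```

Because the grid is coarse, the recommendation can only be one of the n_grid values; a finer grid around the Wald estimate narrows the choice if budget is tight.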

Examples

set.seed(110)
gold_size(expected_agreement = 0.85, ci_width = 0.10)