Answers, by simulation, the budgeting question every project starts with:
how many human-labeled (gold) units do I need so that the agreement estimate
(the proportion of model labels matching gold) has a confidence interval no
wider than ci_width? The simulation draws agreement counts from a binomial
distribution at the anticipated agreement level, which is adequate for
planning; in the paper, report the realized interval from validate_protocol().
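The simulation described above can be sketched in base R as follows. This is a hypothetical re-implementation for illustration, not the code inside gold_size(): the names sketch_gold_size() and wilson_width(), the use of Wilson score intervals, and the rule "pick the smallest n whose median simulated width is at most ci_width" are all assumptions.

```r
# Hypothetical helper (assumption): width of the Wilson score interval
# for k agreements out of n gold units.
wilson_width <- function(k, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  p_hat <- k / n
  2 * z / (1 + z^2 / n) * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2))
}

# Hypothetical sketch of the planning simulation, not the package's code.
sketch_gold_size <- function(expected_agreement = 0.85, ci_width = 0.1,
                             conf = 0.95,
                             n_grid = c(50, 100, 200, 300, 500, 800),
                             sims = 2000) {
  median_width <- vapply(n_grid, function(n) {
    # Simulated number of model labels matching gold in a gold set of size n
    k <- rbinom(sims, size = n, prob = expected_agreement)
    median(wilson_width(k, n, conf))
  }, numeric(1))
  grid <- data.frame(n = n_grid, median_width = median_width,
                     meets_target = median_width <= ci_width)
  ok <- grid$n[grid$meets_target]
  # Smallest grid size meeting the target, or NA if none does
  list(grid = grid, recommended_n = if (length(ok)) min(ok) else NA)
}

set.seed(110)
res <- sketch_gold_size()
print(res$grid)
```

Using the median width is one reasonable planning criterion; a stricter alternative would require that a high share of simulated intervals meet the target.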
Usage
gold_size(
  expected_agreement = 0.85,
  ci_width = 0.1,
  conf = 0.95,
  n_grid = c(50, 100, 200, 300, 500, 800),
  sims = 2000
)
Examples
set.seed(110)
gold_size(expected_agreement = 0.85, ci_width = 0.10)
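As a rough cross-check on the simulated answer, the textbook normal-approximation (Wald) sample-size formula n = z^2 p (1 - p) / (ci_width / 2)^2 can be computed directly. This is not part of gold_size(); wald_gold_size() is a hypothetical helper, and the Wald approximation is a swapped-in technique that can be inaccurate when the expected agreement is close to 0 or 1.

```r
# Hypothetical helper (assumption): closed-form Wald planning formula,
# rounded up to a whole number of gold units.
wald_gold_size <- function(expected_agreement = 0.85, ci_width = 0.1,
                           conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  half_width <- ci_width / 2
  ceiling(z^2 * expected_agreement * (1 - expected_agreement) / half_width^2)
}

wald_gold_size()  # 196
```

A result near 200 for the defaults is consistent with the grid values the simulation searches over.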