Answers, by simulation, the budgeting question every project starts with: how many human-labeled units do I need so that the agreement estimate (the proportion of model labels matching gold) has a confidence interval, at level conf, no wider than ci_width? The simulation draws from a binomial at the anticipated agreement level, which is adequate for planning. In the paper, report the realized interval from validate_protocol() rather than the planned one.
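As a rough sanity check on the simulated answer, the textbook normal (Wald) approximation gives a closed-form sample size. This is an approximation under stated assumptions, not necessarily the interval that gold_size() simulates. Here p is expected_agreement, w is ci_width, and z is the standard normal quantile for conf:

```latex
w \approx 2 z \sqrt{\frac{p(1-p)}{n}}
\quad\Longrightarrow\quad
n \gtrsim \frac{4 z^{2}\, p(1-p)}{w^{2}}
% Defaults: p = 0.85, w = 0.10, z \approx 1.96
n \gtrsim \frac{4 \times 3.8415 \times 0.1275}{0.01} \approx 196
```

Under this approximation the defaults call for roughly 196 labeled units, so the first candidate in the default n_grid at or above that figure is 200. The simulated result can differ, since it depends on the interval method and on Monte Carlo noise.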

Usage

gold_size(
  expected_agreement = 0.85,
  ci_width = 0.1,
  conf = 0.95,
  n_grid = c(50, 100, 200, 300, 500, 800),
  sims = 2000
)

Arguments

expected_agreement

Anticipated model-gold agreement (default 0.85).

ci_width

Target total width of the confidence interval at level conf (default 0.10).

conf

Confidence level (default 0.95).

n_grid

Candidate gold-set sizes (numbers of human-labeled units) to evaluate.

sims

Monte Carlo draws per candidate size.

Value

The smallest n in n_grid meeting the target, with the simulated widths as an attribute.

Examples

set.seed(110)
gold_size(expected_agreement = 0.85, ci_width = 0.10)
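
The simulation described above can be sketched in base R. This is a minimal illustration, not the package's implementation: sketch_gold_size() and wilson_width() are hypothetical names, and the Wilson score interval, the use of the mean simulated width as the criterion, and the NA return when no candidate qualifies are all assumptions made for the sketch.

```r
# Hypothetical helper: total width of a Wilson score interval for x successes
# out of n trials. The interval method is an assumption of this sketch.
wilson_width <- function(x, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  p_hat <- x / n
  half <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  2 * half
}

# Hypothetical stand-in that mirrors the documented arguments of gold_size().
sketch_gold_size <- function(expected_agreement = 0.85, ci_width = 0.10,
                             conf = 0.95,
                             n_grid = c(50, 100, 200, 300, 500, 800),
                             sims = 2000) {
  n_grid <- sort(n_grid)
  # For each candidate size, simulate agreement counts from a binomial and
  # average the resulting interval widths (averaging is an assumption).
  widths <- vapply(n_grid, function(n) {
    x <- rbinom(sims, size = n, prob = expected_agreement)
    mean(wilson_width(x, n, conf))
  }, numeric(1))
  names(widths) <- n_grid
  ok <- which(widths <= ci_width)
  # Smallest candidate meeting the target; NA if none does (assumed behavior).
  n <- if (length(ok) > 0) n_grid[ok[1]] else NA_real_
  structure(n, widths = widths)
}

set.seed(110)
res <- sketch_gold_size()
res                   # smallest qualifying size, with widths attached
attr(res, "widths")   # mean simulated width per candidate size
```

Inspecting the widths attribute shows how quickly the interval narrows across n_grid, which helps when the smallest qualifying size sits close to the target and a finer grid is worth trying.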