# 16 - Simulate sampling for variable pool sizes

This utility is an additional tool for evaluating the validity and precision of different pooling strategies with variable pool sizes. It simulates sampling and prevalence estimation for up to 6 different pooling strategies, with up to 5 different pool sizes for each strategy. Prevalence estimation assumes perfect test sensitivity and specificity, and sampling assumes the specified prevalence in the population. The program runs multiple iterations of sampling and estimation, calculates the mean prevalence and confidence limits for the specified level of confidence across all iterations, and estimates the level of bias in the prevalence estimates.

For each pooling strategy, the program simulates sampling, pooling and testing of individuals from an infinite population with the specified prevalence, using a test with the specified true sensitivity and specificity. Sampling and testing are repeated for the specified number of iterations for each strategy. In each iteration the prevalence, confidence interval width and variance are estimated using the selected method and assumed values of 100% for both sensitivity and specificity. The mean prevalence, bias, confidence interval width and variance are then calculated across all iterations for each strategy, where mean bias is the mean prevalence estimate less the true (design) prevalence for the population. Mean square error (mean variance plus the square of the mean bias) is also calculated. The magnitude of the mean bias is expressed as a proportion of the mean estimated prevalence, of the true (design) prevalence and of the mean square error.
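The simulation loop described above can be sketched in a few lines of Python. This is a minimal illustration and not the program's own code. The function names and the `(pool_size, n_pools)` layout are hypothetical, pools are assumed to test positive with the stated sensitivity and specificity regardless of how many infected individuals they contain, and a maximum likelihood estimator stands in for "the selected method". Confidence intervals, variance and mean square error are omitted to keep the sketch short.

```python
import random


def simulate_iteration(strategy, prevalence, se, sp, rng):
    """Simulate one round of pooled testing.

    strategy is a list of (pool_size, n_pools) pairs (hypothetical layout).
    Returns a list of (pool_size, n_pools, n_positive) triples.
    """
    results = []
    for size, n_pools in strategy:
        # Chance that a pool contains at least one infected individual.
        p_infected = 1.0 - (1.0 - prevalence) ** size
        # Assumed: Se and Sp apply at pool level, with no dilution effect.
        p_positive = se * p_infected + (1.0 - sp) * (1.0 - p_infected)
        positives = sum(rng.random() < p_positive for _ in range(n_pools))
        results.append((size, n_pools, positives))
    return results


def mle_prevalence(results, tol=1e-12):
    """Maximum likelihood prevalence estimate, assuming a perfect test."""
    total_pools = sum(n for _, n, _ in results)
    total_positive = sum(x for _, _, x in results)
    if total_positive == 0:
        return 0.0
    if total_positive == total_pools:
        return 1.0

    def score(p):
        # Derivative of the log-likelihood with respect to p.
        q = 1.0 - p
        return sum(
            x * k * q ** (k - 1) / (1.0 - q ** k) - (n - x) * k / q
            for k, n, x in results
        )

    # The score decreases in p, so bisection finds its single root.
    lo, hi = tol, 1.0 - tol
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def simulate_strategy(strategy, prevalence, se=1.0, sp=1.0,
                      iterations=1000, seed=1):
    """Repeat sampling and estimation, then summarise across iterations."""
    rng = random.Random(seed)
    estimates = [
        mle_prevalence(simulate_iteration(strategy, prevalence, se, sp, rng))
        for _ in range(iterations)
    ]
    mean_estimate = sum(estimates) / iterations
    return {
        "mean_prevalence": mean_estimate,
        "mean_bias": mean_estimate - prevalence,  # estimate less design value
    }
```

For example, `simulate_strategy([(5, 20), (10, 10)], prevalence=0.05)` would summarise a strategy of 20 pools of 5 and 10 pools of 10 at a design prevalence of 5%.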

The true test sensitivity and specificity can also be set below the assumed values of 100%, which allows assessment of the potential impact of inaccurate sensitivity and specificity estimates on the resulting prevalence estimate.
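To illustrate why incorrect assumptions matter, the sketch below computes the value that a perfect-test estimate approaches with a very large number of pools of a single size. The function name is hypothetical, and the calculation assumes that sensitivity and specificity apply at pool level with no dilution effect.

```python
def limiting_naive_estimate(prevalence, pool_size, se, sp):
    """Large-sample value of the perfect-test estimate for one pool size."""
    # Probability that a pool holds at least one infected individual.
    p_infected = 1.0 - (1.0 - prevalence) ** pool_size
    # Probability that the pool tests positive with an imperfect test.
    p_positive = se * p_infected + (1.0 - sp) * (1.0 - p_infected)
    # Perfect-test estimator applied to the apparent pool positivity.
    return 1.0 - (1.0 - p_positive) ** (1.0 / pool_size)
```

With `se=1.0` and `sp=1.0` the function returns the design prevalence. Under these assumptions, lowering `sp` pushes the result above the design prevalence and lowering `se` pushes it below.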

Required inputs for this program are:

• assumed true prevalence of infection - between 0 and 1;
• true test sensitivity and specificity - between 0 and 1;
• the desired level of confidence - between 0 and 1;
• the number of iterations to simulate - a positive integer; and
• the size and number of pools to be tested for each strategy to be simulated - positive integers.

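Outputs are summarised across all iterations for each strategy entered and presented in a summary table. The main outputs are the mean prevalence estimate, mean bias, mean confidence interval width, mean variance and mean square error.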

It is important to enter pool sizes and the associated numbers of pools tested from the top of the table. You must enter at least one valid pool size and number of pools for Strategy 1. All values must be positive integers (>0). Any column in the input table for a strategy that includes an invalid value will be ignored, as will any subsequent values for that strategy. If the first pool size or number of pools for any strategy is invalid, that strategy and any subsequent strategies will be ignored.
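One reading of these input rules is sketched below. The data layout (one list of `(pool_size, n_pools)` pairs per strategy) and the function names are hypothetical, and the program's actual handling of its input table may differ in detail.

```python
MAX_STRATEGIES = 6  # up to 6 strategies
MAX_POOL_SIZES = 5  # up to 5 pool sizes per strategy


def is_positive_int(value):
    """True for integers greater than zero (booleans are rejected)."""
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def clean_strategies(table):
    """Apply the input rules to a list of strategies.

    table is a hypothetical structure: one list per strategy, each holding
    (pool_size, n_pools) pairs in the order they appear in the input table.
    """
    cleaned = []
    for columns in table[:MAX_STRATEGIES]:
        valid = []
        for pool_size, n_pools in columns[:MAX_POOL_SIZES]:
            if not (is_positive_int(pool_size) and is_positive_int(n_pools)):
                break  # ignore this entry and everything after it
            valid.append((pool_size, n_pools))
        if not valid:
            break  # first entry invalid: drop this and all later strategies
        cleaned.append(valid)
    if not cleaned:
        raise ValueError(
            "Strategy 1 needs at least one valid pool size and number of pools"
        )
    return cleaned
```

Under this reading, `clean_strategies([[(5, 20), (10, 0), (20, 5)], [(0, 3)], [(4, 4)]])` keeps only `[[(5, 20)]]`: the second entry of Strategy 1 is invalid, and the invalid first entry of Strategy 2 causes Strategies 2 and 3 to be dropped.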

This analysis may take several minutes to complete, depending on the number of strategies and the number of iterations required.
