# User guide

## Simulate sampling for variable pool sizes

This utility was developed as an additional tool to help evaluate the validity and precision of different pooling strategies for variable pool sizes. It simulates sampling and prevalence estimation for up to 6 different pooling strategies, with up to 5 different pool sizes for each strategy. Prevalence is estimated assuming perfect test sensitivity and specificity, and sampling uses the specified assumed prevalence in the population. The program runs multiple iterations of sampling and estimation, calculates the mean prevalence and confidence limits for the specified level of confidence across all iterations, and estimates the level of bias in the prevalence estimates.

For each pooling strategy, the program simulates sampling, pooling and testing of individuals from an infinite population with the specified prevalence, using a test of the specified true sensitivity and specificity. Sampling and testing are repeated for the specified number of iterations for each strategy. For each iteration, the prevalence, confidence interval width and variance are estimated using the selected method and assumed values of 100% for both sensitivity and specificity. The mean prevalence, bias, confidence interval width and variance are then calculated across all iterations for each strategy, where mean bias is the mean prevalence estimate less the true (design) prevalence for the population. Mean square error (mean variance plus the square of the mean bias) is also calculated. The mean bias is expressed as a proportion of the mean estimated prevalence and of the true (design) prevalence, and the squared mean bias is expressed as a proportion of the mean square error.
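The simulation loop described above can be sketched for a single iteration as follows. This is a minimal illustration, not the utility's own code: the function names are hypothetical, the test's sensitivity and specificity are assumed not to change with pool size, and the estimator shown is a maximum-likelihood estimate under the perfect-test assumption, which may differ from the "selected method" the program actually applies.

```python
import math
import random


def simulate_pool_results(pools, prevalence, sensitivity, specificity, rng):
    """Simulate one iteration for one strategy.

    `pools` maps pool size -> number of pools tested.
    Returns {pool_size: (n_pools, n_positive)}.
    """
    results = {}
    for size, n_pools in pools.items():
        # Chance that a pool holds at least one infected individual
        # (infinite population, so individuals are independent).
        p_infected = 1.0 - (1.0 - prevalence) ** size
        # Chance that the imperfect test reports the pool as positive
        # (assumption: Se and Sp do not depend on pool size).
        p_positive = sensitivity * p_infected + (1.0 - specificity) * (1.0 - p_infected)
        n_positive = sum(rng.random() < p_positive for _ in range(n_pools))
        results[size] = (n_pools, n_positive)
    return results


def mle_prevalence(results, tol=1e-10):
    """Maximum-likelihood prevalence assuming a perfect test (Se = Sp = 1)."""
    total = sum(n for n, _ in results.values())
    positives = sum(x for _, x in results.values())
    if positives == 0:
        return 0.0
    if positives == total:
        return 1.0

    def score(p):
        # Derivative of the log-likelihood, scaled by (1 - p); decreasing in p.
        value = 0.0
        for size, (n_pools, n_pos) in results.items():
            log_q_s = size * math.log1p(-p)  # log((1 - p) ** size)
            q_s = math.exp(log_q_s)
            value += size * (n_pos * q_s / -math.expm1(log_q_s) - (n_pools - n_pos))
        return value

    # Bisection: the score is positive below the estimate and negative above it.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    rng = random.Random(42)
    strategy = {5: 20, 10: 10}  # 20 pools of 5 and 10 pools of 10
    observed = simulate_pool_results(strategy, 0.05, 0.95, 0.99, rng)
    print(observed, mle_prevalence(observed))
```

Repeating the two calls for the requested number of iterations gives the per-iteration estimates that the summary statistics are built from.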

The program also allows the true test sensitivity and specificity to differ from the assumed values of 100%, so that you can assess the potential impact of inaccurate sensitivity and specificity estimates on the resulting prevalence estimate.
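To see why an incorrect assumption matters, consider a pool of size $s$ drawn from a population with prevalence $p$ and tested with sensitivity $Se$ and specificity $Sp$. Assuming, for illustration only, that $Se$ and $Sp$ do not depend on pool size, the probability that the pool tests positive is:

$$
P(T^{+}) = Se\,\bigl[1 - (1 - p)^{s}\bigr] + (1 - Sp)\,(1 - p)^{s}
$$

Estimation under the perfect-test assumption treats this probability as $1 - (1 - p)^{s}$. When $Se < 1$ and $Sp = 1$, fewer pools test positive than that model expects, so estimates tend to be pulled downward. When $Sp < 1$ and $Se = 1$, false-positive pools pull estimates upward.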

Required inputs for this program are:

- assumed true prevalence of infection - between 0 and 1;
- true test sensitivity and specificity - between 0 and 1;
- the desired level of confidence - between 0 and 1;
- the number of iterations to simulate - a positive integer; and
- the size and number of pools to be tested for each strategy to be simulated - positive integers.
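The constraints on the scalar inputs listed above could be checked along these lines. This is only a sketch: `validate_inputs` is a hypothetical helper, and whether the program treats the bounds 0 and 1 as inclusive is not stated here, so the choices below (inclusive for prevalence, sensitivity and specificity; exclusive for the confidence level) are assumptions.

```python
def validate_inputs(prevalence, sensitivity, specificity, confidence, iterations):
    """Return a list of error messages; an empty list means the inputs look valid."""
    errors = []

    def is_number(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    # Assumption: 0 and 1 themselves are accepted for these three inputs.
    for name, value in [
        ("prevalence", prevalence),
        ("sensitivity", sensitivity),
        ("specificity", specificity),
    ]:
        if not (is_number(value) and 0 <= value <= 1):
            errors.append(f"{name} must be between 0 and 1")

    # Assumption: the confidence level must lie strictly between 0 and 1.
    if not (is_number(confidence) and 0 < confidence < 1):
        errors.append("confidence must be between 0 and 1")

    if not (isinstance(iterations, int) and not isinstance(iterations, bool) and iterations > 0):
        errors.append("iterations must be a positive integer")

    return errors
```

For example, `validate_inputs(0.05, 0.95, 0.99, 0.95, 1000)` returns an empty list, while an out-of-range prevalence or a zero iteration count each add one message.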

Outputs are summarised across all iterations for each strategy entered and presented in a summary table. The main outputs are:

- mean prevalence;
- minimum and maximum prevalence estimates;
- mean bias in the estimated prevalence;
- mean confidence interval width;
- mean standard error of the estimated prevalence;
- mean squared error of the estimated prevalence (mean variance plus the square of the mean bias);
- relative bias as a proportion of the mean estimated (apparent) prevalence (AP);
- relative bias as a proportion of the specified design (true) prevalence (TP);
- squared mean bias as a proportion of the mean squared error;
- the proportion of 'valid' estimates, where the confidence interval for the estimated prevalence contains the true (design) prevalence;
- detailed results for all iterations for each strategy (download as a text file by clicking on the appropriate icon in the summary results table); and
- a histogram of the distribution of prevalence estimates (view or download by clicking on the appropriate icon in the summary results table).
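The summary measures listed above follow directly from the per-iteration results. The sketch below shows one way to compute them from lists of estimates, variances and confidence intervals; the function name, argument layout and dictionary keys are hypothetical, and the mean standard error is taken here as the mean of the square roots of the per-iteration variances, which is an assumption.

```python
import math
from statistics import mean


def summarise(estimates, variances, intervals, true_prevalence):
    """Summarise per-iteration results for one strategy.

    `intervals` is a list of (lower, upper) confidence limits, one per iteration.
    """
    mean_prev = mean(estimates)
    bias = mean_prev - true_prevalence      # mean estimate less design prevalence
    mse = mean(variances) + bias ** 2       # mean variance plus squared mean bias

    def ratio(numerator, denominator):
        # Avoid dividing by zero, e.g. when every estimate is 0.
        return numerator / denominator if denominator else math.nan

    n_valid = sum(lower <= true_prevalence <= upper for lower, upper in intervals)
    return {
        "mean_prevalence": mean_prev,
        "min_prevalence": min(estimates),
        "max_prevalence": max(estimates),
        "mean_bias": bias,
        "mean_ci_width": mean(upper - lower for lower, upper in intervals),
        "mean_se": mean(math.sqrt(v) for v in variances),
        "mse": mse,
        "bias_over_ap": ratio(bias, mean_prev),
        "bias_over_tp": ratio(bias, true_prevalence),
        "bias_sq_over_mse": ratio(bias ** 2, mse),
        "proportion_valid": n_valid / len(intervals),
    }
```

A strategy with a small relative bias and a proportion of valid estimates close to the chosen confidence level is behaving as intended.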

It is important to enter pool sizes and associated numbers of pools tested from the top of the table. You must enter at least one valid pool size and number of pools for Strategy 1. All values must be positive integers (>0). Any column in the input table for each strategy that includes an invalid value will be ignored, as will any subsequent values for that strategy. If the first pool size or number of pools for any strategy is invalid, that strategy and any subsequent strategies will be ignored.
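The rule for ignoring invalid entries can be illustrated with a short sketch. The data layout (each strategy as a list of `(pool_size, n_pools)` pairs in entry order) and the function names are hypothetical and chosen only to mirror the rule as described above.

```python
def is_positive_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def usable_strategies(table):
    """Return the entries that would be kept under the rule described above.

    `table` is a list of strategies (Strategy 1 first); each strategy is a
    list of (pool_size, n_pools) pairs in the order they were entered.
    """
    accepted = []
    for entries in table:
        kept = []
        for pool_size, n_pools in entries:
            if not (is_positive_int(pool_size) and is_positive_int(n_pools)):
                break  # ignore this entry and all later entries for the strategy
            kept.append((pool_size, n_pools))
        if not kept:
            break  # first entry invalid: ignore this and all later strategies
        accepted.append(kept)
    return accepted


table = [
    [(5, 10), (10, 5)],           # Strategy 1: both entries valid
    [(5, 10), (0, 5), (20, 2)],   # Strategy 2: second entry invalid
    [(-1, 3)],                    # Strategy 3: first entry invalid
    [(5, 5)],                     # Strategy 4: valid, but follows an invalid strategy
]
print(usable_strategies(table))
```

Here Strategy 2 keeps only its first entry, and Strategies 3 and 4 are both dropped because Strategy 3 starts with an invalid value.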

This analysis may take several minutes to complete, depending on the number of strategies and the number of iterations required.