2-Stage surveys for demonstration of freedom
# Calculate sample sizes for 2-stage freedom survey where individual cluster details are NOT available

Calculate least-cost sample sizes for 2-stage surveys for demonstrating disease freedom, where actual cluster sizes are unknown. This analysis calculates the number of clusters to test, and the number of units to test within each cluster, needed to provide a specified system sensitivity (probability of detecting disease) for the given unit-level and cluster-level design prevalences and test sensitivity. Test specificity is assumed to be 100% (or follow-up testing of any positive result will be undertaken to confirm or exclude disease).

Sample sizes are optimised to minimise overall cost for given cluster-level and unit-level testing costs. A maximum sample size per cluster must be specified, and either the number of clusters in the population or a maximum number of clusters to be tested must also be specified.

The number of units to test in each cluster is calculated assuming binomial sampling (the sample size is small relative to the cluster size). The number of clusters to test is calculated using the hypergeometric approximation (sampling without replacement) if the number of clusters in the population is specified, or assuming binomial sampling if it is not.
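The two sampling assumptions above can be illustrated with a minimal Python sketch. It assumes the standard textbook forms: SeH = 1 - (1 - Se × P*u)^n for binomial sampling within a cluster, SSe = 1 - (1 - SeH × P*c)^m for binomial sampling of clusters, and SSe = 1 - (1 - SeH × m/N)^d for the hypergeometric approximation with d infected clusters among N. The function names and input values are hypothetical, and the formulas actually used by this tool are not stated on this page.

```python
import math


def cluster_sensitivity(test_se, unit_prev, n_units):
    """Cluster-level sensitivity (SeH) under binomial sampling of n_units."""
    return 1.0 - (1.0 - test_se * unit_prev) ** n_units


def clusters_binomial(seh, cluster_prev, target_sse):
    """Clusters to test when the number of clusters in the population is unknown."""
    return math.ceil(math.log(1.0 - target_sse) / math.log(1.0 - seh * cluster_prev))


def clusters_hypergeometric(seh, n_clusters, n_infected, target_sse):
    """Clusters to test from a known population of n_clusters.

    Solves SSe = 1 - (1 - SeH * m / N)^d for m. A result larger than
    n_clusters means the target cannot be reached with this SeH.
    """
    m = (n_clusters / seh) * (1.0 - (1.0 - target_sse) ** (1.0 / n_infected))
    return math.ceil(m)


# Illustrative inputs only (assumed values, not defaults of the tool).
seh = cluster_sensitivity(test_se=0.9, unit_prev=0.1, n_units=20)
m_binom = clusters_binomial(seh, cluster_prev=0.02, target_sse=0.95)
m_hyper = clusters_hypergeometric(seh, n_clusters=1000, n_infected=20, target_sse=0.95)
print(round(seh, 3), m_binom, m_hyper)
```

With these inputs the hypergeometric calculation asks for fewer clusters than the binomial one, because sampling without replacement from a finite population is more informative per cluster tested.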

Design prevalence (specified level of disease to be detected) must be specified at both unit and cluster levels. Design prevalence can be specified as either:

- a proportion of the population infected; or
- a specific (integer) number of clusters infected (for cluster-level prevalence only and only if the number of clusters in the population is specified).

Inputs required include:

- unit-level design prevalence (as a proportion only);
- cluster-level design prevalence as either a proportion or an integer number of clusters;
- the estimated test sensitivity;
- the relative (or actual) cost of testing at both cluster and unit levels;
- the target system sensitivity (SSe) which is the probability of detecting disease if it is present at the specified design prevalences;
- the maximum sample size to be tested per cluster; and
- the number of clusters in the population **OR** the maximum number of clusters to be tested.
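As a rough illustration of how these inputs could drive a least-cost search, the sketch below tries every per-cluster sample size up to the maximum, works out how many clusters are then needed, and keeps the cheapest feasible combination. It covers only the case where a maximum number of clusters is given (binomial sampling of clusters). The function name `least_cost_design`, the cost model (a fixed cost per cluster plus a cost per unit tested) and the input values are assumptions made for illustration, not the tool's documented algorithm. Prevalences and sensitivities are assumed to lie strictly between 0 and 1.

```python
import math


def least_cost_design(test_se, unit_prev, cluster_prev, target_sse,
                      cost_cluster, cost_unit, max_units, max_clusters):
    """Return (best, options); each option describes one per-cluster sample size."""
    options = []
    for n in range(1, max_units + 1):
        # Cluster-level sensitivity for n units, binomial sampling.
        seh = 1.0 - (1.0 - test_se * unit_prev) ** n
        # Clusters needed to reach the target SSe, binomial sampling.
        m = math.ceil(math.log(1.0 - target_sse) / math.log(1.0 - seh * cluster_prev))
        if m > max_clusters:
            continue  # this per-cluster sample size cannot reach the target
        cost = m * (cost_cluster + n * cost_unit)
        options.append({"units": n, "clusters": m, "seh": seh, "cost": cost})
    best = min(options, key=lambda o: o["cost"]) if options else None
    return best, options


# Illustrative inputs only (assumed values).
best, options = least_cost_design(
    test_se=0.9, unit_prev=0.1, cluster_prev=0.02, target_sse=0.95,
    cost_cluster=20.0, cost_unit=1.0, max_units=50, max_clusters=500,
)
print(best)
```

The `options` list plays the role of the per-option summary (clusters, units per cluster, SeH and relative cost), and `best` is the least-cost row. `best` is `None` when no combination within the limits reaches the target.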

Outputs from the analysis include:

- A summary of the total numbers of clusters and units to be sampled, the target number of units to test per cluster, the estimated cluster-level sensitivity (SeH) per cluster and the achieved SSe;
- A summary of numbers of clusters to be tested and corresponding numbers of units to test in each cluster, the estimated SeH and the relative cost for each option; and
- An Excel spreadsheet and graph of the summary results.

If the desired system sensitivity cannot be achieved by testing the specified maximum number of units in all clusters (or in the specified maximum number of clusters), a message is returned, along with a summary of the mean SeH and SSe that would be achieved if the maximum numbers of units and clusters were tested.
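The feasibility check described above might look like the following sketch for the case where the number of clusters in the population is known. It assumes the same binomial formula for SeH and the hypergeometric approximation for SSe. The function name `max_achievable` and the input values are hypothetical.

```python
def max_achievable(test_se, unit_prev, n_infected, n_clusters, max_units,
                   max_clusters=None):
    """SeH and SSe if the maximum numbers of units and clusters were tested."""
    # Test every cluster unless a smaller maximum is specified.
    m = n_clusters if max_clusters is None else min(max_clusters, n_clusters)
    seh = 1.0 - (1.0 - test_se * unit_prev) ** max_units
    sse = 1.0 - (1.0 - seh * m / n_clusters) ** n_infected
    return seh, sse


# Illustrative inputs only (assumed values): a weak test, low unit-level
# prevalence and a single infected cluster among 100.
target_sse = 0.95
seh, sse = max_achievable(test_se=0.5, unit_prev=0.05, n_infected=1,
                          n_clusters=100, max_units=10)
if sse < target_sse:
    print(f"Target SSe not achievable: SeH = {seh:.3f}, SSe = {sse:.3f}")
```

Here every cluster is tested and only one is assumed infected, so the achievable SSe equals the SeH of that single cluster, which falls well short of the target.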