2-Stage surveys for demonstration of freedom
# Calculate sample sizes for 2-stage freedom survey where individual cluster details are available

### Inputs

**Note:** Any clusters with a cluster size < 1, or where the cluster size is missing, are excluded from the calculations.

### Outputs

Calculate least-cost sample sizes for 2-stage surveys for demonstrating disease freedom. This analysis calculates the number of clusters, and the number of units within each cluster, to be tested to provide a specified system sensitivity (probability of detecting disease) for the given unit-level and cluster-level design prevalences and test sensitivity. Calculations are based on the actual cluster sizes provided for the entire population. The outputs include a list of randomly selected clusters, along with the number of units to sample in each selected cluster. Test specificity is assumed to be 100% (or any positive result is assumed to be followed up with further testing to confirm or exclude disease).

Sample sizes are optimised to minimise overall cost for the given cluster-level and unit-level testing costs. A maximum sample size per cluster can be specified if desired, and calculations can be set to ensure either a fixed sample size per cluster or a fixed (minimum) cluster sensitivity.
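The cost trade-off described above can be illustrated with a simplified search. This is only a sketch under stated assumptions, not the tool's actual algorithm: it assumes every cluster has the same size, uses a common hypergeometric approximation at both levels, and takes design prevalences as integer counts. The function name `least_cost_design` and all of its parameters are hypothetical.

```python
import math

def least_cost_design(n_clusters, cluster_size, infected_clusters, infected_units,
                      test_se, target_sse, cluster_cost, unit_cost,
                      max_per_cluster=None):
    """Try each per-cluster sample size and keep the cheapest feasible design.

    Assumption: all clusters share one size (the real tool uses actual sizes).
    Returns None if no candidate reaches the target with the clusters available.
    """
    upper = cluster_size if max_per_cluster is None else min(max_per_cluster, cluster_size)
    best = None
    for n in range(1, upper + 1):
        # Cluster sensitivity (SeH) when n units are tested in an infected cluster.
        seh = 1.0 - (1.0 - n * test_se / cluster_size) ** infected_units
        # Clusters needed to reach the target system sensitivity (SSe).
        needed = (n_clusters / seh) * (1.0 - (1.0 - target_sse) ** (1.0 / infected_clusters))
        m = math.ceil(needed)
        if m > n_clusters:
            continue  # target not reachable with this per-cluster sample size
        cost = m * cluster_cost + m * n * unit_cost
        if best is None or cost < best["cost"]:
            best = {"units_per_cluster": n, "clusters": m, "seh": seh, "cost": cost}
    return best
```

For example, `least_cost_design(200, 100, 4, 10, 0.9, 0.95, 20.0, 1.0)` searches per-cluster sample sizes from 1 to 100 and returns the cheapest combination found. A higher cluster-level cost relative to the unit-level cost pushes the optimum towards fewer clusters with more units tested in each.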

Sample sizes are calculated using the hypergeometric probability approximation (assuming sampling without replacement).
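The page does not state the formula it uses. One commonly used form of the hypergeometric approximation gives the cluster sensitivity as SeH ≈ 1 − (1 − n·Se/N)^d, where N is the cluster size, n the number of units tested, d the number of infected units under the design prevalence, and Se the test sensitivity. The sketch below assumes that form; the function names are hypothetical.

```python
import math

def cluster_sensitivity(n_tested, cluster_size, n_infected, test_se):
    """Approximate SeH for sampling without replacement (assumed formula)."""
    n_tested = min(n_tested, cluster_size)  # cannot test more units than exist
    return 1.0 - (1.0 - n_tested * test_se / cluster_size) ** n_infected

def units_required(cluster_size, n_infected, test_se, target_seh):
    """Invert the approximation to get the sample size for a target SeH.

    The result can exceed cluster_size, which means the target cannot be
    reached in that cluster even if every unit is tested.
    """
    n = (cluster_size / test_se) * (1.0 - (1.0 - target_seh) ** (1.0 / n_infected))
    return math.ceil(n)
```

With a cluster of 100 units, 5 infected units, and a test sensitivity of 0.9, `units_required(100, 5, 0.9, 0.95)` gives 51, and testing 50 units falls just short of a cluster sensitivity of 0.95.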

Design prevalence (specified level of disease to be detected) must be specified at both unit and cluster levels. Design prevalence can be specified as either:

- a proportion of the population infected; or
- a specific (integer) number of clusters or units (within clusters) infected.
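Both forms of design prevalence can be reduced to a number of infected clusters or units before any calculation. The helper below is a hypothetical sketch: it assumes that a value below 1 is a proportion and a value of 1 or more is a count, and that proportions are rounded up to at least one infected unit. The tool's own rounding rule is not stated on this page.

```python
import math

def expected_infected(design_prevalence, population_size):
    """Convert a design prevalence to an integer number infected.

    Assumptions: values below 1 are proportions, rounded up with a minimum
    of 1; values of 1 or more are counts, capped at the population size.
    """
    if design_prevalence <= 0:
        raise ValueError("design prevalence must be positive")
    if design_prevalence < 1:
        return max(1, math.ceil(design_prevalence * population_size))
    return min(int(design_prevalence), population_size)
```

For example, a unit-level design prevalence of 0.1 in a cluster of 25 units corresponds to 3 infected units under these assumptions, while a value of 3 is used directly as a count.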

Inputs required include:

- unit-level design prevalence as either a proportion or an integer number of units;
- cluster-level design prevalence as either a proportion or an integer number of clusters;
- the estimated test sensitivity;
- the relative cost of testing at both cluster and unit levels;
- the target system sensitivity (SSe), which is the probability of detecting disease if it is present at the specified design prevalences;
- an optional maximum sample size to be tested per cluster;
- whether calculations are to be based on maintaining a fixed sample size per cluster or a fixed (minimum) cluster sensitivity (SeH);
- sampling frame data for **all** clusters in the population, including cluster id (labelled "ClusterID") and cluster size (labelled "ClusterSize").
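As an illustration of preparing the sampling frame, the sketch below reads cluster data with the "ClusterID" and "ClusterSize" column labels and drops clusters whose size is missing or below 1, as the note under Inputs describes. The CSV layout and the function name `load_sampling_frame` are assumptions made for this example; the tool's actual input format may differ.

```python
import csv
import io

def load_sampling_frame(csv_text):
    """Return {ClusterID: ClusterSize}, skipping missing or invalid sizes."""
    frame = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get("ClusterSize") or "").strip()
        try:
            size = int(raw)
        except ValueError:
            continue  # missing or non-integer cluster size: excluded
        if size < 1:
            continue  # cluster size below 1: excluded
        frame[row["ClusterID"]] = size
    return frame

example = "ClusterID,ClusterSize\nA,120\nB,\nC,0\nD,45\n"
frame = load_sampling_frame(example)
```

Here clusters B (missing size) and C (size 0) are dropped, leaving only A and D in `frame`.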

Outputs from the analysis include:

- A summary of the total numbers of clusters and units to be sampled, target number of units to test per selected cluster, mean SeH and achieved SSe;
- A list of clusters randomly selected for testing, the number of units to be tested for each cluster and the corresponding SeH;
- A graph of required numbers of clusters to test, SeH and relative costs for varying numbers of units tested per cluster; and
- An Excel spreadsheet of the summary results and cluster list.

If it is not possible to achieve the desired system sensitivity by testing all (or the specified maximum number of) units in all of the clusters, a message is returned, along with a summary of the mean SeH and SSe that would be achieved if all units were tested. In this case, the output also lists all clusters and the SeH achieved for each if all (or the maximum number of) units were tested.
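The feasibility check described above can be sketched as follows. This is an illustration only: it assumes the same hypergeometric approximation at both levels, integer design prevalences, and that every cluster is tested. The function name `max_achievable` and its parameters are hypothetical.

```python
def max_achievable(cluster_sizes, infected_clusters, infected_units, test_se,
                   max_per_cluster=None):
    """Return (mean SeH, SSe) if all, or the maximum number of, units are tested."""
    sehs = []
    for size in cluster_sizes:
        n = size if max_per_cluster is None else min(size, max_per_cluster)
        d = min(infected_units, size)  # cannot have more infected units than units
        sehs.append(1.0 - (1.0 - n * test_se / size) ** d)
    mean_seh = sum(sehs) / len(sehs)
    # With every cluster tested, the sampled fraction of clusters is 1.
    sse = 1.0 - (1.0 - mean_seh) ** infected_clusters
    return mean_seh, sse

mean_seh, sse = max_achievable([10, 20], 1, 1, 0.5)
target_reachable = sse >= 0.95
```

In this small example the best achievable SSe is 0.5, so a target of 0.95 cannot be met and `target_reachable` is `False`.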