# Epitools - Glossary

## Accuracy

The degree to which a measurement, or an estimate based on measurements, represents the true value of the attribute that is being measured. (See also Precision and Validity, which are the two components of accuracy.)

## Alpha and Beta parameters

Two parameters used to define the Beta probability distribution. The mean value of the distribution can be calculated as alpha/(alpha+beta) and the mode is (alpha-1)/(alpha+beta-2).
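The mean and mode formulas above can be sketched directly (the parameter values below are illustrative only):

```python
# Mean and mode of a Beta(alpha, beta) distribution, using the formulas above.
def beta_mean(alpha: float, beta: float) -> float:
    return alpha / (alpha + beta)

def beta_mode(alpha: float, beta: float) -> float:
    # The mode formula applies for alpha > 1 and beta > 1 (unimodal case).
    return (alpha - 1) / (alpha + beta - 2)

print(beta_mean(13, 89))  # 13/102, approximately 0.1275
print(beta_mode(13, 89))  # 12/100 = 0.12
```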

## Asymptotic confidence limits

Confidence limits calculated using large-sample theory and assuming a normal approximation of the sampling distribution. Asymptotic confidence limits are symmetrical and may be less than zero or greater than unity if the true proportion is close to these values.
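A minimal sketch of asymptotic (Wald, normal-approximation) limits, illustrating how they can fall outside the interval from zero to one when the observed proportion is near a boundary:

```python
import math

def wald_ci(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Asymptotic 95% confidence limits for a proportion (normal approximation)."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# Near the boundary the symmetrical limits can escape [0, 1]:
lo, hi = wald_ci(1, 50)
print(lo, hi)  # the lower limit is negative
```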

## Bayesian method

A statistical method based on Bayes' theorem. Used to calculate the conditional probability of an event given assumed prior knowledge. Prior estimates of probability are updated based on new data. A common application of Bayesian methods is the calculation of the predictive value of a test based on assumed values for prevalence and test sensitivity and specificity and test result.
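The predictive-value calculation mentioned above is a direct application of Bayes' theorem; a sketch with illustrative values for prevalence, sensitivity and specificity:

```python
def predictive_values(prev: float, se: float, sp: float) -> tuple[float, float]:
    """PPV and NPV from prevalence, sensitivity and specificity (Bayes' theorem)."""
    ppv = prev * se / (prev * se + (1 - prev) * (1 - sp))
    npv = (1 - prev) * sp / ((1 - prev) * sp + prev * (1 - se))
    return ppv, npv

# Example: low prevalence drags PPV down even for a good test.
ppv, npv = predictive_values(prev=0.05, se=0.95, sp=0.98)
print(ppv, npv)
```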

## Beta distribution

Beta distributions are a type of probability distribution that is commonly used to describe uncertainty about the true value of a proportion, such as sensitivity, specificity or prevalence. They are appropriate distributions to express uncertainty about the prior values for prevalence, sensitivity or specificity in the Gibbs sampler (Joseph et al., 1995; Vose, 2000). When used for this purpose, the Beta distribution can be defined by the two parameters, alpha and beta (written as Beta(alpha, beta)), with alpha = x + 1 and beta = n - x + 1, where x is the number of positive events out of n trials. As n increases, the degree of uncertainty (the width of the distribution) about the estimated proportion (x/n) decreases.
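The narrowing with increasing n can be demonstrated with the standard-deviation formula for the Beta distribution (same observed proportion, growing sample size):

```python
def beta_params(x: int, n: int) -> tuple[float, float]:
    """Beta(alpha, beta) parameters for x positive events out of n trials."""
    return x + 1, n - x + 1

def beta_sd(alpha: float, beta: float) -> float:
    """Standard deviation of Beta(alpha, beta) - a measure of the width."""
    return ((alpha * beta) / ((alpha + beta) ** 2 * (alpha + beta + 1))) ** 0.5

# Same observed proportion (20%), increasing n: the distribution narrows.
for x, n in [(2, 10), (20, 100), (200, 1000)]:
    print(n, beta_sd(*beta_params(x, n)))
```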

## Bias

Any effect at any stage of an investigation tending to produce results that depart systematically from the true values, i.e. a systematic error. (See also Random Error.)

## Binomial distribution

The binomial distribution - Binomial(n, p) - is the probability distribution of the number of successes that occur in n independent trials, where the probability of success at each trial is a constant p. The mean of the distribution is np.
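A short check of the mean, computed from the binomial probability mass function with illustrative values of n and p:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
print(mean)  # equals n * p = 3.0 (up to floating-point rounding)
```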

## Confidence & Probability limits

Confidence limits are the upper and lower end-points of an interval around a parameter estimate, such that if an experiment was repeated an infinite number of times, in the specified percentage (usually 95% or 99%) of trials the interval generated would contain the true value of the parameter. Confidence limits may be calculated using asymptotic (normal approximation) or exact methods.

Probability (or credibility) limits are the upper and lower end-points of the interval that has a specified probability (e.g. 95% or 99%) of containing the true value of a population parameter, such as a mean or proportion. Usually applied instead of confidence limits when Bayesian methods are used.

## Confidence level

The probability of accepting the null hypothesis when it is true - for example the probability that test results will detect disease when the true prevalence is greater than or equal to the specified design prevalence.

## Design (target) prevalence

A fixed value for prevalence used for testing the null hypothesis that the population is infected at a prevalence equal to or greater than the design prevalence. If all samples tested are negative, the null hypothesis is rejected and the prevalence is assumed to be less than the design prevalence (or 0). Alternatively, the assumed value for the true prevalence used in simulating sampling for estimation of prevalence.

## Discarded iterations (burn-in)

The number of initial iterations from the Gibbs sampler that are discarded to allow for convergence of the model on the true value(s) for the parameter(s) of interest.

## Exact confidence limits

Confidence limits calculated using an appropriate probability distribution (usually the binomial distribution) to arrive at an exact value. Exact confidence limits are asymmetrical and cannot be less than zero or greater than unity.

## False negatives (FN)

The number of individuals with the characteristic of interest (e.g. truly infected) that have a negative test result.

## Gibbs sampler

A Gibbs sampler is a Bayesian method which uses Markov chain Monte Carlo simulation to derive posterior probability distributions that best fit the given prior distributions and experimental data. The Gibbs sampler is run for many thousands of iterations to allow the posterior parameter estimates to converge on the true values.
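A minimal sketch of the idea, for the simplest case: estimating true prevalence from one imperfect test with assumed known sensitivity and specificity and a Beta prior on prevalence. All numbers below are illustrative assumptions, not Epitools defaults:

```python
import random

def gibbs_prevalence(x, n, se, sp, a0=1.0, b0=1.0,
                     iterations=2000, burn_in=500, seed=42):
    """Illustrative Gibbs sampler for true prevalence from one imperfect test.

    x test positives out of n tested; se/sp assumed known; Beta(a0, b0)
    prior on prevalence. Returns the post-burn-in prevalence samples.
    """
    rng = random.Random(seed)

    def rbinom(m, p):
        # Draw from Binomial(m, p) as a sum of Bernoulli trials.
        return sum(rng.random() < p for _ in range(m))

    p = 0.5  # starting value
    samples = []
    for it in range(iterations):
        # P(truly infected | test positive) and P(truly infected | test negative)
        p_pos = p * se / (p * se + (1 - p) * (1 - sp))
        p_neg = p * (1 - se) / (p * (1 - se) + (1 - p) * sp)
        # Latent counts of truly infected among test positives and negatives
        y_pos = rbinom(x, p_pos)
        y_neg = rbinom(n - x, p_neg)
        # Update prevalence from its full conditional (a Beta distribution)
        p = rng.betavariate(a0 + y_pos + y_neg, b0 + n - y_pos - y_neg)
        if it >= burn_in:
            samples.append(p)
    return samples

samples = gibbs_prevalence(x=120, n=1000, se=0.9, sp=0.98)
posterior_mean = sum(samples) / len(samples)
print(posterior_mean)
```

Discarding the first `burn_in` iterations corresponds to the burn-in described above; the remaining samples form the posterior probability distribution.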

## Herd-Sensitivity (HSe)

The probability that an infected herd will give a positive result to a particular testing protocol, given that it is infected at a prevalence equal to or greater than the design prevalence.
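A commonly used approximation for herd sensitivity, assuming perfect specificity, a single test applied to n sampled animals, and binomial (with-replacement) sampling; values are illustrative:

```python
def herd_sensitivity(n: int, se: float, design_prev: float) -> float:
    """Approximate herd sensitivity for testing n animals, assuming perfect
    specificity and binomial (with-replacement) sampling."""
    return 1 - (1 - se * design_prev) ** n

# Illustrative: 60 animals, test sensitivity 0.9, design prevalence 5%.
print(herd_sensitivity(n=60, se=0.9, design_prev=0.05))
```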

## Herd-Specificity (HSp)

The probability that an uninfected herd will give a negative result to a particular testing protocol.

## Iterations

The total number of times the Gibbs sampler model is repeated to generate probability distributions for the parameter(s) of interest. For simulations to estimate bias, it is the number of model runs (simulations) used to estimate the mean prevalence and bias.

## Lower Confidence (Probability) limits

The lower limit of the specified confidence or probability interval.

## LR for negative (LRN)

The odds of a negative test result in diseased versus disease-free individuals, calculated as (1 - Sensitivity)/Specificity.

## LR for positive (LRP)

The odds of a positive test result in diseased versus disease-free individuals, calculated as Sensitivity/(1 - Specificity).
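Both likelihood ratios follow directly from the two formulas above (test values are illustrative):

```python
def likelihood_ratios(se: float, sp: float) -> tuple[float, float]:
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lrp = se / (1 - sp)       # LR for positive
    lrn = (1 - se) / sp       # LR for negative
    return lrp, lrn

lrp, lrn = likelihood_ratios(se=0.9, sp=0.95)
print(lrp, lrn)  # approximately 18 and 0.105
```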

## Maximum

The maximum value of the posterior probability distribution for the parameter of interest.

## Mean

The arithmetic mean value of the posterior probability distribution for the parameter of interest.

## Mean Prevalence

The arithmetic mean of the estimated prevalence across all iterations for each strategy.

## Minimum Prevalence

The minimum value of the estimated prevalence across all iterations for each strategy.

## Maximum Prevalence

The maximum value of the estimated prevalence across all iterations for each strategy.

## Mean Bias

The arithmetic mean of the difference between the estimated prevalence and the assumed (design) prevalence across all iterations for each strategy (or, equivalently, the difference between the mean prevalence and the assumed (design) prevalence).

## Mean CI width

The arithmetic mean of the difference between the upper and lower confidence limits across all iterations for each strategy. For fixed pool sizes and perfect tests or tests of known sensitivity and specificity, exact binomial confidence limits are used. For fixed pool sizes and tests of uncertain sensitivity and specificity, asymptotic confidence limits are used.

## Mean Standard Error

The arithmetic mean of the standard errors of the prevalence estimates across all iterations for each strategy. Mean standard error is not available for variable pool size simulations.

## Mean square error (MSE)

The mean variance (the mean of the squares of the standard errors) plus the square of the mean bias. Mean square error is not available for variable pool size simulations.
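The definition above translates directly into code (standard errors and bias below are illustrative values):

```python
def mean_square_error(standard_errors: list[float], mean_bias: float) -> float:
    """MSE = mean variance (mean of the squared standard errors) + squared mean bias."""
    mean_variance = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    return mean_variance + mean_bias ** 2

print(mean_square_error([0.02, 0.025, 0.03], mean_bias=-0.01))
```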

## Bias/AP

Bias as a proportion of the apparent prevalence = The mean bias divided by the mean (apparent) prevalence.

## Bias/TP

Bias as a proportion of the true prevalence = The mean bias divided by the assumed (design) true prevalence.

## Bias/MSE

Bias as a proportion of the mean square error = The square of the mean bias divided by the mean square error. Bias/MSE is not available for variable pool size simulations.

## Median

The mid-point value of the posterior probability distribution for the parameter of interest, which is the value where 50% of values are higher and 50% are lower.

## Mode

The most likely value of the posterior probability distribution for the parameter of interest (the peak of the distribution).

## Minimum

The minimum value of the posterior probability distribution for the parameter of interest.

## Negative predictive value (NPV)

The probability that a test-negative individual is truly free of infection.

## Pooled prevalence

The proportion of individuals that have the characteristic of interest (e.g. infected or diseased), estimated from the testing of pooled samples.
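A simple point estimate for this, assuming a perfect test and equal pool sizes, inverts the probability that a pool contains no infected individuals (counts below are illustrative):

```python
def pooled_prevalence(positive_pools: int, total_pools: int, pool_size: int) -> float:
    """Point estimate of individual-level prevalence from pooled testing,
    assuming a perfect test and equal pool sizes."""
    p_pool = positive_pools / total_pools
    return 1 - (1 - p_pool) ** (1 / pool_size)

# Illustrative: 4 of 50 pools positive, 10 individuals per pool.
print(pooled_prevalence(positive_pools=4, total_pools=50, pool_size=10))
```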

## Pooled testing

Testing undertaken on aggregated (pooled) samples, where each sample tested is representative of a number of individuals.

## Pool size

The number of individuals represented in each pool subjected to pooled testing.

## Positive predictive value (PPV)

The probability that a test-positive individual is truly infected.

## Posterior probability distribution

A probability distribution generated by the Gibbs sampler for the parameter of interest (prevalence, sensitivity, specificity, etc). This is derived as the relative frequency distribution of values for the parameter of interest generated from multiple iterations of the model, after discarding a specified number of iterations to allow for convergence of the model.

## Precision

The inverse of the variance of a parameter estimate: a measure of the repeatability or consistency of the estimate. The quality of being sharply defined or stated, i.e. lack of random error. Refers to the ability of a test or measuring device to give consistent results when applied repeatedly. See also Validity; a good test is both precise and valid, which are the two components of accuracy.

## Prevalence

The proportion of individuals that have the characteristic of interest (e.g. infected or diseased).

## Prior prevalence

The assumed prevalence of infection before taking into account any additional data that may be available for analysis. Prior prevalence is expressed as a Beta probability distribution for Bayesian analyses and can be estimated from pre-existing data or based on expert opinion using the estimated mode and 5% or 95% probability limit. The alpha and beta parameters for the distribution can be calculated using the Beta distribution utility provided.

## Prior sensitivity

The assumed sensitivity of the screening test used before taking into account any additional data that may be available for analysis. Prior sensitivity is expressed as a Beta probability distribution for Bayesian analyses and can be estimated from pre-existing data or based on expert opinion using the estimated mode and 5% or 95% probability limit. The alpha and beta parameters for the distribution can be calculated using the Beta distribution utility provided.

## Prior Specificity

The assumed specificity of the screening test used before taking into account any additional data that may be available for analysis. Prior specificity is expressed as a Beta probability distribution for Bayesian analyses and can be estimated from pre-existing data or based on expert opinion using the estimated mode and 5% or 95% probability limit. The alpha and beta parameters for the distribution can be calculated using the Beta distribution utility provided.

## Proportion valid

The proportion of iterations in which the confidence interval for the estimated prevalence contains the true (design) prevalence value.

## Repeatability

The ability of a test to give consistent results in repeated tests. See also Precision.

## Sample size for Se and Sp estimation

The number of individuals used in previous trials to estimate the sensitivity or specificity of the test being used. The larger the sample size the more precise the estimate and hence the less uncertainty in the resulting estimates of sensitivity, specificity and prevalence.

## Sensitivity (Se)

The estimated sensitivity (synonym: True Positive Rate) of a diagnostic test is the estimated (or assumed) proportion of animals with the disease (or infection) of interest which test positive. It is a measure of the probability that a diseased individual will be correctly identified by the test. Sometimes called "population sensitivity" to distinguish from "analytical sensitivity".

### Important note

For pooled testing, sensitivity is estimated at the pool level, so that in this context, sensitivity is the probability that a pool which includes samples from one or more infected individuals will test positive. Pool-level sensitivity is therefore affected by both prevalence and pool size. The higher the prevalence, the more infected individuals that will be represented in individual pools and the more likely a pool is to test positive and therefore the higher the sensitivity. This is in contrast to individual-level sensitivity, which is independent of prevalence. Conversely, the larger the pool size, the greater the dilution of any positive individual samples, potentially reducing sensitivity.

## Specificity (Sp)

The estimated specificity (synonym: True Negative Rate) of a diagnostic test is the estimated (or assumed) proportion of animals without the disease (or infection) of interest which test negative. It is a measure of the probability that an individual without the disease of interest will be correctly identified by the test. Sometimes called "population specificity" to distinguish from "analytical specificity".

### Important note

For pooled testing, specificity is estimated at the pool level, so that in this context, specificity is the probability that a pool which does not include samples from any infected individuals will test negative. Pool-level specificity can therefore be affected by pool size, due to both a possible increase in the number of false-positive individuals in the pool as pool size increases and the effect of dilution on whether these false-positive individuals are also positive in the pooled test.

## True Sensitivity (Se)

The true sensitivity is the actual proportion of animals with the disease (or infection) of interest which test positive. If the estimated sensitivity differs from the true sensitivity then the resulting prevalence estimates will be biased to a degree depending on the amount of error in the estimate.

## True Specificity (Sp)

The true specificity is the actual proportion of animals without the disease (or infection) of interest which test negative. If the estimated specificity differs from the true specificity then the resulting prevalence estimates will be biased to a degree depending on the amount of error in the estimate.

## Standard deviation (SD)

A standard measure of the variation that exists in a series of values or of a frequency distribution. Calculated as the positive square root of the variance.

## Standard error (SE)

The standard deviation of a parameter estimate. Commonly used to calculate asymptotic confidence limits.

## Testing in parallel

The interpretation of multiple tests where an animal is considered positive if it reacts positively to either or both (or any) of the tests - this increases sensitivity at the expense of specificity.

## Testing in series

The interpretation of multiple tests where an animal must be positive on both (or all if more than 2) tests to be considered positive - this increases specificity at the expense of sensitivity.
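The usual formulas for combined sensitivity and specificity under the two interpretations, assuming the tests are conditionally independent (test values below are illustrative):

```python
def parallel(se1: float, sp1: float, se2: float, sp2: float) -> tuple[float, float]:
    """Combined Se/Sp for parallel interpretation (positive on either test),
    assuming the tests are independent."""
    return 1 - (1 - se1) * (1 - se2), sp1 * sp2

def series(se1: float, sp1: float, se2: float, sp2: float) -> tuple[float, float]:
    """Combined Se/Sp for series interpretation (positive on both tests),
    assuming the tests are independent."""
    return se1 * se2, 1 - (1 - sp1) * (1 - sp2)

print(parallel(0.8, 0.98, 0.9, 0.95))  # higher Se, lower Sp
print(series(0.8, 0.98, 0.9, 0.95))    # lower Se, higher Sp
```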

## Strategies

The number of alternative pooling strategies evaluated (using simulation) to estimate the precision and bias of the estimated prevalence for each option.

## True positives (TP)

The number of individuals with the characteristic of interest (e.g. truly infected) that have a positive test result.

## Upper Confidence (Probability) limits

The upper limit of the specified confidence or probability interval.

## Variance

A standard measure of the variation that exists in a series of values or of a frequency distribution. Estimated as the sum of the squares of the deviations from the mean value for the variable divided by the number of degrees of freedom (n-1).
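The n - 1 formula above, in code, with the standard deviation obtained as its positive square root (the data are illustrative):

```python
def sample_variance(values: list[float]) -> float:
    """Sum of squared deviations from the mean, divided by n - 1 degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_variance(data))         # 32/7
print(sample_variance(data) ** 0.5)  # standard deviation
```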

## Validity

The extent to which a study or test measures what it sets out to measure, i.e. lack of systematic error or bias. See also Precision; a good test is both precise and valid, which are the two components of accuracy.

## Y1, Y2, Y3, Y4

The true number of animals with the characteristic of interest (infection or disease) in each of the cells (a=++, b=+-, c=-+, d=--, respectively) of the 2-by-2 table describing the comparison of test results for two tests used concurrently on a sample from the population. Used in the Bayesian estimation of prevalence using two tests - starting values for each cell are required inputs and probability distributions for the true values are produced as outputs from the model.

## a (T1+/T2+)

The number of individuals positive to both tests when two tests are applied concurrently to a sample of individuals from a population, as shown by a in the table below.

|                | Test 2 +ve | Test 2 -ve |
| -------------- | ---------- | ---------- |
| **Test 1 +ve** | a          | b          |
| **Test 1 -ve** | c          | d          |

## b (T1+/T2-)

The number of individuals positive to Test 1 and negative to Test 2 when two tests are applied concurrently to a sample of individuals from a population, as shown by b in the table below.

|                | Test 2 +ve | Test 2 -ve |
| -------------- | ---------- | ---------- |
| **Test 1 +ve** | a          | b          |
| **Test 1 -ve** | c          | d          |

## c (T1-/T2+)

The number of individuals negative to Test 1 and positive to Test 2 when two tests are applied concurrently to a sample of individuals from a population, as shown by c in the table below.

|                | Test 2 +ve | Test 2 -ve |
| -------------- | ---------- | ---------- |
| **Test 1 +ve** | a          | b          |
| **Test 1 -ve** | c          | d          |