Analysis

This project used the R and Systat software packages for data preparation, ordination and statistical analyses [1].

[1] The R Project for Statistical Computing. URL: http://www.R-project.org
Systat Software Company. URL: http://www.systat.com

1. Data preparation

As with much ecological data, some of the datasets used in this analysis did not conform to the normal (Gaussian) distribution assumed by many statistical analyses. To the extent possible, analyses were used that do not require this distribution. For linear regression, data were transformed using a log or square root transformation prior to analysis (Table 4).
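The two transformations named above can be sketched as follows. This is an illustration in Python rather than the R or Systat code used in the project. The base-10 logarithm, the offset of 1 (added so that zero values can be transformed) and the example values are all assumptions, not details from the original analysis.

```python
import math


def log_transform(values, offset=1.0):
    """Return log10(value + offset) for each value.

    The offset (an assumption here) keeps zeros from producing log(0).
    """
    return [math.log10(v + offset) for v in values]


def sqrt_transform(values):
    """Return the square root of each non-negative value."""
    return [math.sqrt(v) for v in values]


# Hypothetical right-skewed values, for illustration only.
raw_values = [0, 3, 8, 24, 99]
log_values = log_transform(raw_values)
sqrt_values = sqrt_transform(raw_values)
```

Both transformations compress large values more than small ones, which is why they are commonly applied to right-skewed data before linear regression.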

Data on environmental variables were collected at the haypile scale and averaged to the patch scale where this was considered appropriate.
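Averaging haypile-scale measurements to the patch scale amounts to grouping by patch and taking the mean. The sketch below shows the idea in Python. The function name, the record layout and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_to_patch(records):
    """Average haypile-scale values within each patch.

    records: iterable of (patch_id, value) pairs, one per haypile.
    Returns a dict mapping patch_id to the mean value for that patch.
    """
    by_patch = defaultdict(list)
    for patch_id, value in records:
        by_patch[patch_id].append(value)
    return {patch_id: mean(values) for patch_id, values in by_patch.items()}


# Hypothetical haypile measurements of one environmental variable.
haypile_records = [("A", 2.0), ("A", 4.0), ("B", 5.0)]
patch_means = average_to_patch(haypile_records)
```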

Table 4. Summary of data transformations

2. Ordination of response and predictor variables

Ordination techniques are used in multivariate analysis to indicate the degree of similarity between the observations for different variables (Manly 1994). They are useful for reducing complexity when analyzing multiple response and predictor variables.

Ordination was used in this project as a first step to identify the primary relationships among response and predictor variables, prior to conducting more in-depth statistical analyses. At all three spatial scales, response and predictor variables were ordinated using non-metric multi-dimensional scaling (NMDS) in R. NMDS has the advantage that it is robust to data that are not normally distributed (Manly 1994). These analyses used indirect gradient analysis, ordinating the population (response) variables and overlaying the output with the spatial, environmental and climate variables. The distance measure used in these analyses was the Mahalanobis distance, which is appropriate for frequency data and ranked highly in a rank index test of appropriate distance measures. This distance measure also provided a more readily interpretable visual output.
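For reference, the Mahalanobis distance between two observations is usually defined as below. The symbols are introduced here for illustration and do not come from the original analysis: x_i and x_j are the vectors of population variables for two sampling units, and S is the sample covariance matrix of those variables.

```latex
d_M(\mathbf{x}_i, \mathbf{x}_j)
  = \sqrt{(\mathbf{x}_i - \mathbf{x}_j)^{\top} \, \mathbf{S}^{-1} \, (\mathbf{x}_i - \mathbf{x}_j)}
```

Because the differences are scaled by the inverse covariance matrix, variables measured on different scales, or that are correlated with one another, do not dominate the distance.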

3. Statistical analyses

Three types of statistical analysis were used in this project:

1. Multivariate regression

The multivariate regression tree (MRT) is a statistical tool that describes the relationships among multiple predictor and response variables by repeatedly splitting the data to form a ‘tree’ composed of nodes and clusters, in which data are grouped to minimize within-group variability (De'ath 2002). MRT was used in analyses at the patch and study area scales to explore in more detail the relationships among variables that indirect gradient analysis had identified as related. A benefit of MRT is that it shows where the optimal splits occur in the data. A limitation of this tool is that it can respond only to the variables used as inputs, i.e., it cannot identify whether other (unidentified) factors contribute to overall variability in the data.
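The splitting step can be illustrated with a minimal Python sketch. It searches one predictor for the threshold that minimizes the within-group sum of squares summed over all response variables, a common impurity measure for MRT. It is not the R routine used in the project: it performs a single split on a single predictor, and the function names and example data are hypothetical.

```python
def within_ss(rows):
    """Sum of squared deviations from the group mean, over all response columns."""
    if not rows:
        return 0.0
    total = 0.0
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        col_mean = sum(column) / len(column)
        total += sum((v - col_mean) ** 2 for v in column)
    return total


def best_split(predictor, responses):
    """Find the binary split on one predictor that minimizes within-group SS.

    predictor: list of numbers, one per sampling unit.
    responses: list of rows, each row holding the response values for one unit.
    Returns (threshold, total_within_ss), or None if no split is possible.
    """
    order = sorted(range(len(predictor)), key=lambda i: predictor[i])
    best = None
    for k in range(1, len(order)):
        low, high = predictor[order[k - 1]], predictor[order[k]]
        if low == high:
            continue  # cannot split between tied predictor values
        left = [responses[i] for i in order[:k]]
        right = [responses[i] for i in order[k:]]
        ss = within_ss(left) + within_ss(right)
        if best is None or ss < best[1]:
            best = ((low + high) / 2, ss)
    return best


# Hypothetical data: one predictor and two response variables per unit.
example_predictor = [1, 2, 3, 10, 11, 12]
example_responses = [[1, 5], [1, 5], [1, 5], [9, 0], [9, 0], [9, 0]]
example_split = best_split(example_predictor, example_responses)
```

A full tree repeats this search over every predictor and then recursively within each resulting group, typically with cross-validation to choose the tree size.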

2. Linear regression

Standard linear regression was used to estimate the strength and direction of the relationship between individual population variables and spatial or environmental variables at the patch scale. The magnitude and sign of the regression coefficient (slope), together with the calculated r² and p-values, describe the nature of each relationship and indicate whether it is statistically significant. This method was used here to look in more detail at relationships identified through indirect gradient analysis and MRT.
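The slope and r² referred to above come from the standard least-squares formulas, sketched here in Python for a single predictor. The function name and example values are hypothetical. The p-value is omitted because it requires the t distribution, which statistical packages such as R report directly.

```python
def simple_linear_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical patch-scale values lying exactly on the line y = 1 + 2x.
slope, intercept, r_squared = simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
```

The sign of the slope gives the direction of the relationship, and r² gives the proportion of variance in the response that the predictor explains.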

3. Generalized linear modelling

Generalized linear models (GLMs) are useful for analyzing data that are non-normal, for example count data. GLMs were used in this project to assess the relationship between response and predictor variables at the haypile scale, where the population variables are based on counts of the number of times a haypile is occupied.
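The error distribution and link function used are not stated here. As an illustration only, a common choice for count data is a Poisson GLM with a log link, which can be written as below. The symbols are assumptions introduced for this sketch: y_i is the occupancy count for haypile i, x_i1 to x_ip are its predictor values, and the β terms are the coefficients to be estimated.

```latex
y_i \sim \mathrm{Poisson}(\mu_i), \qquad
\log \mu_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}
```

Under this form, each coefficient describes a multiplicative change in the expected count per unit change in its predictor.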

4. A comment on sample size and statistical power

These analyses were carried out on a subset of the total patches and data available to the project. As a result, there are only 13 samples for each variable at the patch scale (based on the number of patches for which there was continuous data) and only 11 samples for each variable at the study area scale (based on the number of years sampled). In general, analyses based on small samples have low statistical power, i.e., there is a higher probability of failing to detect a relationship that is real. The results of these analyses should therefore be treated with caution.
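The effect of a sample size of 13 on power can be explored with a small simulation, sketched below in Python. It is an illustration, not part of the original analysis. It assumes normally distributed variables, a two-tailed test of Pearson's correlation at the 5% level, and a critical value of 0.553 for n = 13 taken from standard tables. The function names and the chosen true correlations are hypothetical.

```python
import math
import random

# Two-tailed 5% critical value of Pearson's r for n = 13 (df = 11),
# taken from standard tables (an assumption of this sketch).
R_CRIT_N13 = 0.553


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def estimate_power(true_r, n=13, r_crit=R_CRIT_N13, n_sim=2000, seed=1):
    """Estimate the proportion of simulated samples in which |r| exceeds r_crit."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    noise_sd = math.sqrt(1 - true_r ** 2)
    hits = 0
    for _ in range(n_sim):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # y has correlation true_r with x in the simulated population.
        y = [true_r * xi + noise_sd * rng.gauss(0, 1) for xi in x]
        if abs(pearson_r(x, y)) > r_crit:
            hits += 1
    return hits / n_sim


power_weak = estimate_power(0.3)    # a modest true correlation
power_strong = estimate_power(0.8)  # a strong true correlation
```

Comparing the two estimates shows how much less often a modest relationship reaches significance than a strong one when only 13 samples are available.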