Influence of noisy environmental data on canonical correspondence analysis
Ecology, Dec, 1997 by Bruce McCune
INTRODUCTION
Canonical correspondence analysis (CCA; ter Braak 1986, 1994) is unusual among the ordination methods used in community analysis in that the ordination of the community data matrix (by reciprocal averaging; RA or CA) is constrained by a multiple regression on its relationships to environmental variables. Because CCA uses data on environment to structure the community analysis, CCA has been called a method for "direct gradient analysis" (ter Braak 1986). In contrast, performing an ordination on just the community data, then secondarily relating the ordination to the environmental variables, allows an expression of pure community gradients, followed by an independent assessment of the importance of the environmental variables.
CCA is best suited to community data sets where: (1) species responses are unimodal (hump-shaped), and (2) the important underlying environmental variables have been measured. According to ter Braak (1986, 1994), unimodal species responses to environment cause problems for methods assuming linear response curves (such as principal components analysis or redundancy analysis) but cause no problems for CCA. The second condition results from the environmental matrix being used to constrain the ordination results.
CCA is currently one of the most popular ordination techniques in community ecology. Many ecologists use CCA as if it were just another ordination technique, when in fact CCA and unconstrained ordination differ in their objectives (Okland 1996). CCA is easily misused because it is a relatively complex method and options in the software can strongly affect the meaning of results. Furthermore, the performance of the method has not been adequately explored and documented in the literature.
In particular, a fundamental but poorly understood characteristic of CCA is how it responds to noisy data. The environmental matrices used in community ecology often contain variables that were chosen because they are convenient to measure rather than because they are the "best" variables from the organismal point of view. In many cases environmental variables are moderately noisy or worse. Because CCA explicitly uses these variables in extracting the most important community gradients, the influence of noisy environmental variables on CCA deserves close scrutiny.
CCA has not yet received this scrutiny. Palmer (1993) used simulated community data of known underlying structure and added variables containing random numbers to the environmental matrix. However, he left the noiseless environmental variables intact. In this case it is not surprising that CCA still managed to extract the major gradients successfully.
[TABULAR DATA FOR TABLE 1 OMITTED]
In this paper CCA is put to stronger, more realistic tests. In the first case, a moderate amount of noise is added to the variables representing the two underlying gradients. In the second case, the environmental matrix is replaced with different numbers of variables containing random numbers. The random variables represent measured environmental variables that are irrelevant to community composition. The third case is perhaps the most realistic, in that it combines moderate noise added to the most important underlying environmental variables with a number of additional variables composed of random numbers.
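The three test cases can be illustrated with a minimal Python sketch. It is not the procedure used in this paper: the function name `make_env_scenarios`, the noise level `noise_sd` (expressed as a fraction of each gradient's standard deviation), the normal and uniform distributions, and the number of random variables are all assumptions chosen for illustration.

```python
import numpy as np

def make_env_scenarios(gradients, n_random=5, noise_sd=0.2, seed=0):
    """Build three hypothetical environmental matrices from true gradients.

    gradients : (n_sites, 2) array of positions on the two underlying gradients.
    n_random  : number of irrelevant random variables (assumed value).
    noise_sd  : noise standard deviation as a fraction of each gradient's
                standard deviation (assumed definition of "moderate" noise).
    """
    rng = np.random.default_rng(seed)
    gradients = np.asarray(gradients, dtype=float)
    n_sites = gradients.shape[0]

    # Case 1: moderate noise added to the two underlying gradients.
    scale = noise_sd * gradients.std(axis=0)
    noisy = gradients + rng.normal(0.0, 1.0, gradients.shape) * scale

    # Case 2: the environmental matrix is replaced by random variables only.
    random_only = rng.uniform(0.0, 1.0, (n_sites, n_random))

    # Case 3: noisy gradients plus random variables
    # (this sketch reuses the random columns drawn for case 2).
    combined = np.hstack([noisy, random_only])
    return noisy, random_only, combined
```

Each returned matrix would take the place of the environmental matrix in a separate CCA run, with the community matrix held fixed.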
BASIC METHOD FOR CCA
The basic method for CCA has been clearly elaborated elsewhere (ter Braak 1986, 1994, Palmer 1993). A key point for this paper, however, is that two sets of site scores are produced.
Assume that data matrix Y contains nonnegative abundances, $y_{ij}$, for i = 1 to n sample units and j = 1 to p species; $y_{+j}$ and $y_{i+}$ indicate species totals and sample unit (= site) totals, respectively. The environmental matrix Z contains values for n sites by q environmental variables.
At one step in the iterative algorithm, site scores ($x_i^*$) are calculated as weighted averages of species scores, $u_j$. The term formed by the eigenvalue, $\lambda$, and a user-selected constant, $\alpha$, is a scaling factor (ter Braak 1986):
[Mathematical Expression Omitted]
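The weighted-averaging step can be sketched in Python as follows. This is an illustration only: the function name `wa_site_scores` is hypothetical, and the scaling factor formed from the eigenvalue and the user-selected constant is deliberately left out, so the result is the unscaled abundance-weighted mean of the species scores for each site.

```python
import numpy as np

def wa_site_scores(Y, u):
    """Unscaled site scores as abundance-weighted averages of species scores.

    Y : (n, p) array of nonnegative abundances (sites by species).
    u : (p,) array of species scores.
    The eigenvalue/constant scaling factor is omitted (assumption of this sketch).
    """
    Y = np.asarray(Y, dtype=float)
    u = np.asarray(u, dtype=float)
    site_totals = Y.sum(axis=1)  # site totals, one per row of Y
    if np.any(site_totals <= 0):
        raise ValueError("every site needs at least one nonzero abundance")
    # Each site score is sum_j(y_ij * u_j) divided by the site total.
    return (Y @ u) / site_totals
```

For example, a site containing two species in equal abundance receives a score midway between the scores of those two species.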
A different set of site scores ($x_i$) is produced by weighted least-squares multiple regression of $x^*$ on the environmental variables (Z). The weights are the site totals, $y_{i+}$, placed on the diagonal of the matrix R, whose off-diagonal elements are all zero. The regression coefficients b are calculated as
$$b = (Z'RZ)^{-1} Z'R x^*.$$
New site scores ($x_i$) are calculated as the fitted values from the preceding regression:
$$x = Zb.$$
These site scores are, therefore, linear combinations of environmental variables. These are "predicted" values produced by the regression equations built into CCA. Thus, $x^*$ and $x$ are derived such that the correlation between them is maximized, subject to the constraint that each is orthogonal to all previously extracted axes.
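The regression step can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the code of any CCA program: the function name `lc_site_scores` is hypothetical, and the sketch assumes the caller has already centered the columns of Z or included an intercept column, a detail not specified in the text above.

```python
import numpy as np

def lc_site_scores(Z, x_star, site_totals):
    """Weighted least-squares regression of weighted-average scores on Z.

    Z           : (n, q) environmental matrix (assumed centered, or with an
                  intercept column supplied by the caller).
    x_star      : (n,) site scores from weighted averaging.
    site_totals : (n,) site totals, used as the diagonal of R.
    Returns (b, x): regression coefficients and fitted site scores.
    """
    Z = np.asarray(Z, dtype=float)
    x_star = np.asarray(x_star, dtype=float)
    w = np.asarray(site_totals, dtype=float)

    # Z'R without building the n-by-n diagonal matrix R explicitly:
    # broadcasting multiplies column i of Z' by weight w[i].
    ZtR = Z.T * w

    # Solve (Z'RZ) b = Z'R x* rather than inverting Z'RZ.
    b = np.linalg.solve(ZtR @ Z, ZtR @ x_star)

    # Fitted values: linear combinations of the environmental variables.
    x = Z @ b
    return b, x
```

Solving the normal equations with `np.linalg.solve` avoids forming the inverse explicitly. If the environmental variables are perfectly collinear, Z'RZ is singular and the solve raises an error.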
Following Palmer (1993), the site scores produced by weighted averaging ($x_i^*$) will be called the WA scores. The site scores produced as linear combinations of the environmental variables ($x_i$) will be called the LC scores.