Compositional data in community ecology: the paradigm or peril of proportions?
Ecology, April, 1997 by Donald A. Jackson
INTRODUCTION
Ecologists must often analyze data sets comprising samples that vary greatly in total species abundance. In such cases, the species with the greatest abundance in an observation may overwhelm the analysis, and subsequent relationships may simply reflect differences in absolute abundance rather than relative abundance. To compensate for this problem, ecologists often choose to convert such data to proportions, percentages, or frequencies by dividing each variable by the total for each observation prior to more detailed analysis. The rationale for this standardization is the desire to compare all samples on a similar scale, thereby "correcting" or removing the influence of overall species abundance. Conceptually, this approach is appealing; however, it is rarely recognized that this standardization will limit the possible range of interspecific relationships as well as the patterns among the samples. Occasionally data are converted to proportions for other reasons. For example, Gates et al. (1983) found that ordinations were easier to interpret and that greater amounts of the total variance could be explained by using proportional data. In general, the consequences of this type of standardization are not recognized by ecologists. When raw data are converted to compositional data (i.e., percentages, proportions, or frequencies), changes in the covariance and correlation structure of the data matrix may lead us to conclude that particular relationships exist when such patterns are in fact predictable artefacts of this type of standardization (Chayes 1971, Aitchison 1986).
The difficulty with using traditional or standard statistical approaches when analyzing such data is that the results obtained from an analysis of the raw data (termed the "basis" or "normative data" in the literature) and from the "compositional" or "ipsative" data can lead to very different interpretations. The raw data may suggest that some variables are uncorrelated with one another, whereas the composition-based analyses may show highly correlated relationships for the same variables. The reverse situation also occurs frequently. The difficulty is how to reconcile these divergent results when both are available. Moreover, it is often the case that only the composition is available (e.g., paleolimnology, toxicology, activity budgets, feeding selectivity). Ideally, what we require is a means of analyzing the data that emphasizes relative, rather than absolute, relationships between variables and provides a similar result regardless of whether the basis or compositional data are analyzed. Such a method would permit us to compare results obtained from studies employing different enumeration methods. For example, it is common in working with zooplankton, pollen, and various other taxa that a specific number of organisms be counted, e.g., a count of 300 individuals, and then the relative proportions of each taxon in an observation be determined prior to statistical analysis. However, other researchers working with these same taxa may choose a different approach, e.g., one based on total counts found in a specified volume. If the results are analyzed with traditional statistical approaches, then the interpretations and conclusions may depend predominantly on the method of enumeration and standardization, rather than on any inherent ecological relationships.
If we can use alternative methods of analysis, as identified in this paper and the references therein, then we can compare results from different studies without concern for the constraints imposed by differences in the basis and composition, and focus instead on the ecological relationships.
TABLE 1. Means and variances for the basis and composition for simulated data SIM and lake zooplankton ZOO.
Basis variables for SIM
Statistic        S1       S2       S3       S4       S5
Mean             30       60       60      120      120
Variance         16       16       64       64     4096
Basis variables for ZOO
Statistic        H1       H2       H3       H4       H5
Mean          83.82    33.63    266.6    95.64    43.34
Variance       7687     1211    49110     7971     1837
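The effect of converting a basis to a composition can be illustrated with a small simulation built on the SIM means and variances in Table 1. This is only a sketch under stated assumptions: the basis variables are drawn independently from gamma distributions matched to those means and variances (the distribution, the sample size of 5000, and the random seed are illustrative choices, not taken from the paper). Because the variables are generated independently, any strong correlation that appears among the proportions is a product of the standardization alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Means and variances of the SIM basis variables S1..S5 (Table 1).
means = np.array([30.0, 60.0, 60.0, 120.0, 120.0])
variances = np.array([16.0, 16.0, 64.0, 64.0, 4096.0])

# Assumption: independent gamma draws with matching moments, chosen only
# because they are strictly positive; the paper does not specify this here.
shape = means**2 / variances   # gamma shape so that shape * scale = mean
scale = variances / means      # gamma scale so that shape * scale**2 = variance
n_obs = 5000                   # illustrative sample size
basis = rng.gamma(shape, scale, size=(n_obs, 5))

# Closure: divide each variable by the total for its observation.
composition = basis / basis.sum(axis=1, keepdims=True)

# Correlation matrices among variables (columns).
corr_basis = np.corrcoef(basis, rowvar=False)
corr_comp = np.corrcoef(composition, rowvar=False)

off_diag = ~np.eye(5, dtype=bool)
print("largest |r| among basis variables:",
      round(float(np.abs(corr_basis[off_diag]).max()), 3))
print("r(S1, S2) as proportions:", round(float(corr_comp[0, 1]), 3))
print("r(S4, S5) as proportions:", round(float(corr_comp[3, 4]), 3))
```

Under these assumptions the basis correlations stay near zero, whereas the proportions of S1 to S4 become positively correlated with one another and negatively correlated with S5, because the highly variable S5 dominates the variation in the total that every variable is divided by.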
The underlying principle in using proportions is to understand how one variable responds relative to another when standardized to a common scale. This has led some researchers to propose using ratios as a means of scaling variables (e.g., Mosimann and James 1979, James and McCulloch 1990). Some measure of the magnitude or size of each observation is selected (e.g., total length in morphometrics), all variables are divided by this measure to scale them to a common level, and the resulting ratios are then generally log transformed. (Note that ratio-based analysis is not without controversy, e.g., Atchley et al. 1976, Gibson 1984, Pearson 1897, Rising and Somers 1989, Jackson and Somers 1991.) This approach is used as a means of examining the pattern of "relative" covariation between the variables after "standardizing" for the magnitude or size effect.
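A minimal sketch of this kind of ratio scaling follows. The data are hypothetical (lognormal abundances invented for illustration), and the geometric mean of each observation is used as the size measure purely as an assumption; other size measures are possible. The sketch also checks one property of log ratios: they come out the same whether they are computed from the basis or from the composition, because dividing every variable in an observation by its total cancels out of the ratio.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical basis: 200 observations of 4 strictly positive abundances.
basis = rng.lognormal(mean=3.0, sigma=0.5, size=(200, 4))

# The corresponding composition (each row sums to 1).
composition = basis / basis.sum(axis=1, keepdims=True)


def log_size_ratios(x):
    """Divide each variable by the observation's size, then log transform.

    Assumption: size is the geometric mean of the row, so on the log scale
    the division becomes subtraction of the row mean of the logs.
    """
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)


ratios_from_basis = log_size_ratios(basis)
ratios_from_composition = log_size_ratios(composition)

# The row total cancels in every ratio, so both inputs agree.
same = np.allclose(ratios_from_basis, ratios_from_composition)
print("log ratios identical for basis and composition:", same)
```

The covariance matrix of these log ratios is therefore also identical for the two inputs, which is the sort of enumeration-independent result the preceding discussion asks for.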