How not to test mediums: critiquing The Afterlife Experiments - 1
Skeptical Inquirer, Jan-Feb, 2003 by Ray Hyman
Finally, the sitter designated which of the two transcripts was the one that actually was intended for him or her.
The hypothesis was that if Campbell could truly access information from the sitter's departed acquaintances, this would show up on all three measures. In other words, the sitters would successfully pick their own reading from the two transcripts; they would record significantly more dazzle shots in their own transcripts as compared with the control transcripts; and they would find many more hits and fewer misses in the actual as opposed to the control transcript. Each one of these three predictions failed. Four of the sitters did correctly pick their own transcript, but this is consistent with the chance expectation of three successes. On the two more sensitive measures, there were no significant differences in number of dazzle shots or hits and misses.
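The statement that four correct picks is consistent with chance can be checked with a simple binomial calculation. The sketch below assumes six sitters, a figure inferred from the stated chance expectation of three successes on a one-in-two choice rather than given directly in this excerpt, and it assumes each sitter's pick is independent. The function name is illustrative only.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more successes in n independent trials,
    each succeeding with probability p (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: six sitters, each choosing between two transcripts,
# so a blind guess is right half the time.
N_SITTERS = 6
p_four_or_more = prob_at_least(4, N_SITTERS)
print(f"P(at least 4 of {N_SITTERS} correct by chance) = {p_four_or_more:.3f}")
```

Under these assumptions, pure guessing produces four or more correct picks about one time in three, which is nowhere near the conventional threshold for statistical significance.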
The authors admit that for the overall data, "there was no apparent evidence of a reliable anomalous information retrieval effect." So how can they use these results to proclaim a "breathtaking" vindication of their previous findings? This is because, when they looked at the results separately for each sitter, they discovered that in the case of GD, who had been the star sitter in a previous experiment with Campbell, he not only successfully identified his own transcript but also found nine dazzle shots in this transcript and none in the control. The results for the hits and misses were equally striking. He found only a few misses in his own transcript and a large number of misses in the control. He found many hits in his own transcript and not a single one in the control transcript. Given this "unanticipated replication," the authors hail the results as compelling support for their survival hypothesis. However, for anyone trained in statistical inference and experimental methodology, this will appear as just another blatant attempt to snatch victory out of the jaws of defeat.
An accepted principle of research methodology is that the reporting of statistical significance from experimental findings derives meaning from the fact that the experimenter specifies in advance which comparisons he or she will test. If the experimenter plans to make many comparisons, then the criteria for statistical significance must be adjusted to take into account that the more comparisons that will be made the more chances there will be to find something "significant" just by chance. In the present case, it was obvious that the planned comparisons involved the overall differences between the ratings of the actual and the control transcripts. The authors do not indicate whether they intended to make adjustments for the fact that they were using three different measures, but, in any case, it does not matter because there were no meaningful differences on any of the three indicators.
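The point about adjusting for many comparisons can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions, not a reconstruction of anything the authors computed: it assumes a conventional significance level of 0.05, treats the comparisons as independent, and assumes six sitters (inferred from the chance expectation of three correct picks). The Bonferroni correction shown is one common adjustment; the article does not name a specific method.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests,
    each run at significance level alpha, when no real effect exists."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test level that keeps the overall error rate at or below alpha."""
    return alpha / m


ALPHA = 0.05  # assumed conventional level, not taken from the article

# Assumed scenarios: the three planned overall measures, and the same
# three measures examined separately for each of six sitters.
for label, m in [("3 planned measures", 3), ("3 measures x 6 sitters", 18)]:
    fwer = familywise_error_rate(ALPHA, m)
    per_test = bonferroni_threshold(ALPHA, m)
    print(f"{label}: chance of a spurious 'hit' = {fwer:.2f}, "
          f"Bonferroni per-test level = {per_test:.4f}")
```

Under these assumptions, slicing the data sitter by sitter makes at least one "significant" result more likely than not even when nothing anomalous is going on, which is why a single striking sitter found after the fact carries little evidential weight.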
Of course, these strictures do not preclude the investigators from noticing unexpected outcomes in their data. Such unplanned outcomes can serve as hypotheses for new experiments. When an experimenter finds unanticipated, but interesting, quirks in the data, he or she cannot draw conclusions until the surprise finding has been cross-validated with new data. The reason for this is simple. Any set of data that is reasonably complex will always, just by chance, display peculiarities. Some statisticians and methodologists do allow testing for unexpected findings by means of "post hoc" tests. Such tests require that the departures be much greater than those needed for planned comparisons before they can be declared "significant." Furthermore, such post hoc tests on specific subparts of the data are typically licensed only when the overall tests are significant, which is not the case for the present situation.