Reaffirming Construct, Convergent And Predictive Validity Between Objective Tests And Holistically Scored Essays

College Student Journal, March, 1999 by Ronald Evans, L. Carolyn Pearson, Michael Bundrick

Recent research suggests that the validity of holistically scored writing may in turn affect both the predictive and the convergent validity of holistic scores as measures of writing competence, that is, how well such tests predict external, unrelated scores (GPA) and how well they converge with other measures of writing competence. To reaffirm convergent validity between objective tests and holistically scored essays, and to establish the predictive validity of both formats for grade-point average (GPA), 392 test profiles were gathered from 1,800 freshman and sophomore student records. Test scores came from ETS's Test of Standard Written English, the American College Test English battery, and the English Language Skills and Essay portions of the CLAST. Despite much criticism of holistic scoring, this study offers evidence that holistic scoring procedures not only demonstrate construct validity but also correlate significantly with other measures of writing competence (convergent validity) and with an unrelated, external measure, grade-point average (predictive validity).

Nearly five decades ago, Educational Testing Service (ETS) began investigating correlations between objective tests and prompted essays as measures of writing competence. Then, during the 1960s, ETS introduced large-scale essay scoring as a valid component of the College-Level Examination Program (CLEP) English test, the Advanced Placement (AP) English essay, and the English Composition Test (ECT) option of the Scholastic Aptitude Test (SAT). Since then, holistically scored essays have become staple components of the writing-competency measures used by both commercial testing organizations and many states.

Despite its apparent popularity, however, holistic scoring has generated some controversy about its validity among educational researchers in general and composition researchers in particular. Disagreement about its construct validity stems from the complexity of the scoring task that each reader implicitly performs during an essay-reading session. Charney (1984) questions the construct validity of holistic scoring because of the "variability of topics and types of discourse, the arbitrariness of the criteria, and the failure of the raters to adhere to the criteria" (cited in Lauer & Asher, 1984, p. 144). Added to this criticism is the higher cost of holistic scoring, compared with machine scoring of objective measures, in a cost-conscious period of educational cutbacks.

Hoetker and Brossell (1988) report that the essay portion of Florida's College Level Academic Skills Test (CLAST) controls for topic variability and discourse type by offering only two prompts: one requiring a detached, objective viewpoint and the other a personal, narrative viewpoint. Moreover, prospective essay prompts are extensively field-tested and evaluated by veteran readers, who select for future use only those prompts that elicit higher holistic scores and qualitative consensus among the readers. To avoid arbitrariness, ETS continually revisits and revises its criteria, gleaning new rubric elements from surveys of a national sample of veteran readers as well as from chief readers and scoring-table leaders at every reading site. Finally, ETS and states such as Florida continually monitor inter-rater reliability by having every paper read and scored twice, by assigning random third readings, and by sending any discrepant pair of scores (two or more points apart) to a referee. The purpose of this monitoring is to determine whether there are statistically significant differences between and among readers at all sites and in all sessions.
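The double-reading rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CLAST or ETS procedure: it assumes a 1 to 6 holistic scale, treats two readings two or more points apart as discrepant (as the text states), and, hypothetically, resolves a discrepancy by pairing the referee's score with whichever original reading is closer to it before summing. Random third readings are not modeled, and the names `resolve_essay_score`, `is_discrepant`, and `DISCREPANCY_THRESHOLD` are illustrative only.

```python
from typing import Optional

# Hypothetical sketch of double reading with refereed resolution of discrepant scores.
# ASSUMPTIONS (not stated in the source): a 1-6 holistic scale, a reported score equal
# to the sum of two agreeing readings, and a referee whose score replaces whichever
# original reading is farther from it.

DISCREPANCY_THRESHOLD = 2  # scores two or more points apart are discrepant (per the text)
SCALE = range(1, 7)        # assumed 1-6 holistic scale


def is_discrepant(first: int, second: int) -> bool:
    """Return True when two readers' scores differ by the threshold or more."""
    return abs(first - second) >= DISCREPANCY_THRESHOLD


def resolve_essay_score(first: int, second: int, referee: Optional[int] = None) -> int:
    """Combine two holistic readings into one reported score (hypothetical rule)."""
    readings = [first, second] + ([] if referee is None else [referee])
    for score in readings:
        if score not in SCALE:
            raise ValueError(f"score {score} is outside the assumed 1-6 scale")
    if not is_discrepant(first, second):
        return first + second  # readers agree closely enough: sum the two readings
    if referee is None:
        raise ValueError("discrepant readings require a referee score")
    # Keep the original reading closer to the referee's score (ties keep the first reader).
    closer = first if abs(first - referee) <= abs(second - referee) else second
    return closer + referee


if __name__ == "__main__":
    print(resolve_essay_score(4, 5))             # adjacent scores, summed directly: 9
    print(resolve_essay_score(2, 5, referee=4))  # discrepant; referee pairs with the 5: 9
```

Summing two readings and routing only wide disagreements to a third reader keeps the extra cost of refereeing limited to the papers where the two readers actually diverge; how the referee's score is actually combined with the originals is not specified in the source.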

On the national level, ETS has spent twenty years and large sums of money establishing convergent validity between its objective tests of writing competence and holistically scored essays. Over the years, scoring techniques have been refined, and the correlation coefficients between these two most common formats have improved dramatically. According to Lauer and Asher (1984):

   In 1977 ... ETS correlated the essay test with the multiple-choice
   components of the (ECT) and found a correlation of .50 and in 1978 of .48.
   The (CLEP) was originally validated during the 1940's by correlating its
   objective scores on usage and sentence correction with five papers scored
   by five readers. The test achieved a .72 coefficient
   for concurrent validity (p. 142).

Method

To reaffirm convergent validity between objective tests and holistically scored essays, and to establish the predictive validity of both formats for grade-point average (GPA), the researchers selected 392 test profiles from 1,800 freshman and sophomore student records at the University of West Florida in Pensacola, FL. Test scores came from ETS's Test of Standard Written English (TSWE), the American College Test English battery (ACT-EN), and the CLAST English Language Skills (ELS) and Essay (ESS) portions.

Results

Table 1 supports the convergent validity between and among the specific test instruments reported in the student records, although not every record contained all four tests (TSWE, ACT-EN, CLAST-ELS, and CLAST-ESS). Pearson correlations for all possible pairs of the four measures were statistically significant (p < .0001), in one sense confirming the convergent validity of these measures. Because the sample is large, however, statistical significance does not by itself indicate strong correlations. With correlations ranging from .32 to .59, TSWE correlates best with the other test formats, including the CLAST essay and English Language Skills portions (CLAST-ESS and CLAST-ELS) and the ACT English battery (ACT-EN), repeating the strong correlations with holistically scored essays reported earlier by ETS (1974). Viewed as evidence of TSWE's predictive validity for these other test formats, these correlations correspond to shared variance (r squared) ranging from 10.2 to 34.8 percent.
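The gap between statistical significance and strength of association can be checked with a little arithmetic. The sketch below assumes each student record is a dictionary with `None` for a missing test, computes each Pearson r by pairwise deletion (using only records that contain both scores), and converts r into shared variance (100 times r squared) and into the usual t statistic for testing a zero correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The function names are hypothetical, no data from the study are reproduced, and because per-pair sample sizes are not reported here, the example uses n = 392 only as an upper bound.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length with at least 3 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(
    records: List[Dict[str, Optional[float]]], tests: List[str]
) -> Dict[Tuple[str, str], Tuple[float, int]]:
    """Correlate every pair of tests, keeping only records that contain both scores."""
    results = {}
    for a, b in combinations(tests, 2):
        pairs = [(rec[a], rec[b]) for rec in records
                 if rec.get(a) is not None and rec.get(b) is not None]
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        results[(a, b)] = (pearson_r(xs, ys), len(pairs))  # (r, records used)
    return results


def shared_variance_percent(r: float) -> float:
    """Percentage of variance two measures share (coefficient of determination)."""
    return 100 * r ** 2


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


if __name__ == "__main__":
    # Endpoints of the reported range; n = 392 is an assumed upper bound per pair.
    for r in (0.32, 0.59):
        print(f"r = {r:.2f}: shared variance = {shared_variance_percent(r):.1f}%, "
              f"t(390) = {t_statistic(r, 392):.2f}")
```

For r = .32 and r = .59 the shared-variance figures come out at 10.2 and 34.8 percent, matching the text; with the assumed n = 392, both t statistics exceed 6, so even the weakest pair is highly significant while sharing only about a tenth of its variance. Pairwise deletion is used because not every record contains all four tests; the source does not say how missing scores were handled, so this choice is an assumption.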