Validity, reliability and the assessment of engineering education

Journal of Engineering Education, Jul 2002 by Moskal, Barbara M, Leydens, Jon A, Pavelich, Michael J

An Educational Brief

ABSTRACT

Educational measurement represents a field of study that has been intensely researched and that provides a framework for designing assessment programs. The purpose of this paper is to clarify two key measurement concepts, validity and reliability, and to illustrate how these concepts can be used to improve assessment efforts in engineering education.

I. INTRODUCTION

Educational measurement is a field of study that is dedicated to the examination of diverse assessment and evaluation issues. Research in educational measurement provides a potential framework that we in engineering education may use to guide our own assessment and evaluation efforts. The purpose of this paper is to clarify two particularly useful concepts from the educational measurement literature: validity and reliability. By understanding these concepts, we educators will be better equipped to implement rigorous assessment standards in the evaluation of our engineering programs. Although educational measurement has two paradigms, quantitative and qualitative research, this paper addresses only the quantitative perspective.

II. EDUCATIONAL MEASUREMENT LITERATURE

Experts in educational measurement have argued that the issues of validity and reliability underlie all aspects of assessment instruction [2, 4, 5, 21, 22]. Validity and reliability are also recognized as key concepts in the support materials provided to project evaluators by the National Science Foundation [7, 20]. The sections that follow define the concepts of validity and reliability and provide examples that illustrate these terms within engineering education.

A. Validity

Validation is the process of accumulating evidence that supports the appropriateness of the inferences that are made from responses to an assessment instrument for specified assessment uses. Validity refers to the degree to which the evidence supports both the correctness of the interpretations and the appropriateness of the manner in which those interpretations are used [2, 13]. Four types of evidence are commonly examined to support the validity of an assessment instrument: content-related evidence, construct-related evidence, criterion-related evidence and consequence-related evidence.

Content-related evidence refers to the extent to which a student's responses to a given assessment instrument reflect that student's knowledge of the content area of interest. For example, a statics exam that requires knowledge of mathematical concepts beyond the students' current course work unintentionally measures the students' mathematical knowledge (or lack thereof) rather than their understanding of statics. A professor who is interpreting a student's incorrect response may conclude that the student does not have the appropriate knowledge of statics when actually that student does not have an appropriate background in mathematics.

Content-related evidence is also concerned with the extent to which the assessment instrument adequately samples the intended content domain. For example, a mathematics test that primarily includes problems that require integration by substitution would provide inadequate evidence of a student's ability to complete problems that require integration by parts. One manner in which to support the content validity of an assessment is to clearly define the objectives of the assessment before an instrument is developed or selected. These objectives may then be used to verify the appropriateness of a given instrument.
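One way to picture the objective-checking step is as a simple tally of test items against the stated objectives. The sketch below is illustrative only: the item-to-objective mapping, the function names and the five-item test are hypothetical and are not drawn from any instrument discussed in this paper. It reuses the integration example above to show how an objective that no item samples can be flagged.

```python
from collections import Counter

def coverage(objectives, item_objectives):
    """Count how many items address each stated objective."""
    counts = Counter(item_objectives.values())
    return {objective: counts.get(objective, 0) for objective in objectives}

def uncovered(objectives, item_objectives):
    """List the objectives that no item on the instrument samples."""
    tally = coverage(objectives, item_objectives)
    return [objective for objective in objectives if tally[objective] == 0]

# Hypothetical objectives, defined before the instrument is selected.
objectives = ["integration by substitution", "integration by parts"]

# Hypothetical five-item test: item number -> objective the item addresses.
item_objectives = {
    1: "integration by substitution",
    2: "integration by substitution",
    3: "integration by substitution",
    4: "integration by substitution",
    5: "integration by substitution",
}

tally = coverage(objectives, item_objectives)
missing = uncovered(objectives, item_objectives)
print(tally)
print(missing)
```

A tally of this kind shows only whether each objective is sampled at all. Whether the items that address an objective do so adequately remains a matter of professional judgment.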

Construct-related evidence refers to the extent to which an assessment instrument elicits evidence that supports the examination of the nature of the students' underlying constructs. Constructs are processes that are internal to an individual. An example of a construct is an individual's reasoning process. Although reasoning occurs inside a person, it may be partially displayed through results and explanations. When the purpose of an assessment is to evaluate reasoning, both the product (i.e., the answer) and the process (i.e., the explanation) should be requested and examined. For example, the flow of electricity through circuits can be modeled using differential equations. A student who derives a correct mathematical model for a given circuit based on the information provided in the problem is displaying a deeper understanding than is a student who is applying a memorized formula. Understanding the process that each student uses to derive the differential equation is essential to making sense of the students' reasoning process, the construct of interest. Other examples of constructs that engineering educators may wish to examine are creativity, writing processes and attitudes. Regardless of the construct, an effort should be made to identify the facets of the construct that may be displayed and that would provide convincing evidence of the students' underlying processes.

Another type of validity evidence is criterion-related evidence, which supports the extent to which the results of an assessment correlate with a current or future event [16]. For example, a common practice in many engineering colleges is to develop a course that mimics the working environment of a practicing engineer [8, 12, 19]. Evaluations of these courses are intended to examine how well prepared students are to function as professional engineers. High scores on the assessment activity should suggest high performance outside the classroom or at the future workplace.
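Because criterion-related evidence rests on a correlation between assessment results and an outside event, a small numerical sketch may help. The example below computes a Pearson correlation coefficient, one common measure of linear association. The paper does not prescribe a particular statistic, so this choice is an assumption, and the course scores and workplace ratings are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("samples must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a sample has no variation")
    return cross / sqrt(ss_x * ss_y)

# Hypothetical data: scores from a course assessment and later
# supervisor ratings (1 to 5 scale) for the same six students.
course_scores = [62, 70, 75, 81, 88, 93]
workplace_ratings = [2.5, 3.0, 3.1, 3.8, 4.2, 4.6]

r = pearson_r(course_scores, workplace_ratings)
print(f"r = {r:.2f}")
```

A coefficient near 1 would be consistent with the claim that high assessment scores accompany high performance on the criterion. A sample this small cannot establish that claim on its own.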


 
