In recent years, in an effort to avoid the degradation of instruction and inflation of test scores that often occurred when educators were held accountable for scores on multiple-choice tests, policymakers have experimented with accountability systems based on performance assessments. The Kentucky Instructional Results Information System (KIRIS), which rewarded or sanctioned schools largely on the basis of changes in scores on a complex, partially performance-based assessment, was an archetype of this wave of reform. It is not a given, however, that performance assessment can avoid the inflation of scores that arises when teachers and students focus too narrowly on the content of the assessment used for accountability rather than focusing on the broad domains of achievement the assessment is intended to measure. Accordingly, this study evaluated the extent to which the large performance gains shown on KIRIS represented real improvements in student learning rather than inflation of scores. External evidence of validity--that is, comparisons to other test data--suggests that KIRIS gains were substantially inflated. Even though KIRIS was designed partially to reflect the frameworks of the National Assessment of Educational Progress (NAEP), large KIRIS gains in fourth-grade reading from 1992 to 1994 had no echo in NAEP scores. Large KIRIS gains in mathematics from 1992 to 1994 in the fourth and eighth grades did have some echo in NAEP scores, but Kentucky's NAEP gains were roughly one-fourth as large as the KIRIS gains and were typical of gains shown in other states. The large gains high-school students showed on KIRIS in mathematics and reading were not reflected in their scores on the American College Testing (ACT) college-admissions tests. KIRIS science gains were accompanied by ACT gains only one-fifth as large. 
Internal evidence of validity--that is, evidence based on patterns within the KIRIS data themselves--was more ambiguous but also provided some warning of likely inflation, particularly in mathematics. For example, schools that showed large gains on KIRIS also tended to show larger than average discrepancies in performance between new and reused test items, suggesting that teachers had coached students narrowly on the content of previous tests. The findings of this study indicate that inflation of scores remains a risk in assessment-based accountability systems even when they rely on test formats other than multiple choice. There is a clear need to evaluate the results and effects of assessment-based accountability systems, and better methods for evaluating the validity of gains need to be developed.