Not only are the grades valid, both contend, but the reported grade changes between 2010-11 and 2011-12 show real improvements in education performance.
In fact, they show nothing of the kind.
The claims of “improvement” reflect nothing more than the tactic of citing hand-picked positive results as evidence of progress. When one examines grade changes for all New Mexico schools (as this writer has done), a very different picture emerges:
⋄ Of the 821 schools that PED graded for both 2010-11 and 2011-12, 510 schools did not receive the same grade for both years. The grades of 232 schools went up; those of 278 schools went down. The grades of 112 schools changed by two levels or more – e.g., from A to C or F to B.
⋄ Of the 73 schools graded A for 2010-11, only 17 also received an A for 2011-12. The grades of 39 schools initially rated A dropped to B, 15 to C, and 2 to D.
⋄ Within the larger group of 264 schools graded either A or B for 2010-11, only 140 were also rated A or B for 2011-12. Of the remaining 124 schools, 88 were downgraded to C, 31 to D, and 5 to F.
⋄ Only 23 of the 88 schools graded F for 2010-11 also received an F for 2011-12 (though they were joined by 45 other schools previously graded B, C or D).
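The instability described above can be seen directly from the aggregate figures reported in the bullets. The short sketch below (variable names are illustrative, not PED's) recomputes the headline shares from those numbers:

```python
# Aggregates as reported above for the 821 schools graded in both years.
total_graded = 821        # schools graded for both 2010-11 and 2011-12
changed = 510             # schools whose grade differed between the two years
went_up, went_down = 232, 278
moved_two_plus = 112      # schools whose grade moved two or more levels

# Internal consistency check: upgrades plus downgrades equal total changes.
assert went_up + went_down == changed

share_changed = changed / total_graded
share_two_plus = moved_two_plus / total_graded

print(f"{share_changed:.0%} of graded schools changed letter grade")  # 62%
print(f"{share_two_plus:.0%} moved two or more letter levels")        # 14%
```

By this count, roughly three in five schools received a different grade from one year to the next, and about one in seven moved two or more levels, which is the sense in which the grades are "highly unstable."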
What these results mainly demonstrate is not that schools got better or worse between the two years but rather that the grades are highly unstable and hence of dubious reliability. This finding raises doubts about the soundness of the underlying methodology and points to the need for a full, independent review.
Unfortunately, much of the information needed for a thorough review remains under wraps. PED’s “Technical Guide” is opaque, cryptically written and seriously incomplete. Conspicuously missing, for example, are the numerical results PED statisticians obtained when they applied value-added models to data on test scores and student characteristics.
Those results are essential for assessing the explanatory power of the value-added models and the adequacy of PED’s adjustments for inter-school differences in student characteristics.
Instead of unveiling their methodology, PED officials have claimed absurdly that “only a few people in the world” would be able to understand it. In fact, PED’s approach is neither new nor esoteric.
Value-added models, some considerably more complex than PED’s, have been applied in education for over a decade. PED makes itself seem out of touch when it suggests that its work is too rarefied to be understood by anyone outside its own small coven of statistical wizards.
The limited information now available does suffice, however, to identify many questions that a full review would address. Most concern technical aspects of statistical procedure, but others are broader questions about what is being measured.
Why, for example, has PED based grades partly on a proficiency indicator not adjusted for student characteristics, knowing that this will make some schools look like poor performers simply because their students come from less advantaged backgrounds? If and when PED reveals more, additional questions will undoubtedly emerge.
Finally, examining PED’s methodology is important because school grading foreshadows another item on the Martinez/Skandera agenda: the use of value-added models to grade individual teachers.
School-level modeling, its complexities notwithstanding, is simple compared with teacher-level modeling.
Until PED can demonstrate mastery of the former, it should not attempt the latter. The stakes, if it proceeds, will be high: teachers’ jobs and salaries might depend on the results. So the scrutiny now being given to school grades would pale in comparison to what PED can expect when teachers become the statisticians’ targets.
Stephen M. Barro is a retired economist and policy analyst with more than 30 years' experience in education research and education policy studies.