In a recent post, Kevin Drum of Mother Jones discusses his growing skepticism about the research behind market-based education reform, and about the claims that supporters of these policies make. He cites a recent Los Angeles Times article, which discusses how, in 2000, the San Jose Unified School District in California instituted a so-called “high expectations” policy requiring all students to pass the courses necessary to attend state universities. The reported percentage of students passing these courses increased quickly, causing the district and many others to declare the policy a success. In 2005, Los Angeles Unified, the nation’s second largest district, adopted similar requirements.
For its part, the Times performed its own analysis, and found that the San Jose pass rate was no higher in 2011 than in 2000 (indeed, slightly lower for some subgroups), and that the district had overstated its early results by classifying students in a misleading manner. Mr. Drum, reviewing these results, concludes: “It turns out it was all a crock.”
In one sense, that’s true – the district seems to have reported misleading data. On the other hand, neither San Jose Unified’s original evidence (with or without the misclassification) nor the Times analysis is anywhere near sufficient for drawing conclusions – “crock”-based or otherwise – about the effects of this policy. This illustrates the deeper problem here, which is less about one “side” or the other misleading with research than about something much more difficult to address: Common misconceptions that impede distinguishing good evidence from bad.
In the case of San Jose, regardless of how the data are coded or how the results turn out, the whole affair turns on the idea that changes in raw pass rates after the “high expectations” policy’s implementation can actually be used to evaluate its impact (or lack thereof). But that’s not how policy analysis works. It is, at best, informed speculation.
Even if San Jose’s pass rates are flat, as appears to be the case, this policy might very well be working. There is no basis for assuming that simply increasing requirements would, by itself, have anything beyond a modest impact. So, perhaps the effect is small and gradual but meaningful, and improvements are being masked by differences between cohorts of graduates, or by concurrent decreases in effectiveness due to budget cuts or other factors. You just can’t tell by eyeballing simple changes, especially in rates based on dichotomous outcomes. (And, by the way, maybe the policy led to improvements in other outcomes, such as college performance among graduates.)
Conversely, consider this counterfactual: Suppose the district had issued “accurate” data, and the LA Times analysis showed pass rates had increased more quickly than other districts’. Many people would take this as confirmation that the policy was effective, even though, once again, dozens of other factors, school and non-school, artificial and real, in San Jose or statewide, might have contributed to this observed change in raw rates.
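To make that concrete, here is a minimal simulation sketch (written for this post, with made-up numbers; it does not use any San Jose data) of how a dichotomous pass/fail rate can stay flat despite a real policy effect, or rise with no effect at all, once cohort composition shifts between the “before” and “after” groups:

```python
# Hypothetical illustration: raw pass rates mix a policy's true impact with
# cohort composition and other concurrent changes, so before/after comparisons
# of the raw rate can mislead in either direction.
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # students per cohort

def pass_rate(prep_mean, policy_boost):
    """Simulate one cohort's pass rate on a dichotomous outcome.

    prep_mean: average academic preparation of the cohort (a confounder).
    policy_boost: the policy's true effect on each student's latent readiness.
    """
    readiness = rng.normal(prep_mean, 1.0, n) + policy_boost
    return (readiness > 0).mean()  # a student passes if readiness clears the threshold

# Case 1: the policy helps every student (+0.10), but the post-policy cohort
# arrives slightly weaker (-0.10). The raw rate barely moves.
before = pass_rate(prep_mean=0.00, policy_boost=0.00)
after = pass_rate(prep_mean=-0.10, policy_boost=0.10)
print(f"real effect, flat raw rates:   {before:.3f} -> {after:.3f}")

# Case 2: the policy does nothing, but the post-policy cohort arrives
# slightly stronger (+0.10). The raw rate rises anyway.
before = pass_rate(prep_mean=0.00, policy_boost=0.00)
after = pass_rate(prep_mean=0.10, policy_boost=0.00)
print(f"no effect, rising raw rates:   {before:.3f} -> {after:.3f}")
```

In the first case the rate is essentially unchanged even though every student benefits; in the second it climbs with no policy effect whatsoever – which is exactly why simple before/after comparisons of raw rates cannot carry causal weight.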
These kinds of sloppy inferences play a dominant role in education debates and policy making, and they cripple both processes. Virtually every day, supporters and critics of individuals, policies, governance structures and even entire policy agendas parse mostly transitory changes in raw test scores or rates as if they’re valid causal evidence, an approach that will, in the words of Kane and Staiger, eventually end up “praising every variant of educational practice.” There’s a reason why people can – and often do – use NAEP or other testing data to “prove” or “disprove” almost anything.
Nobody wins these particular battles. Everyone is firing blanks.
Back to Kevin Drum. He presents a list of a few things that set off his skepticism alarms. Some of them, like sample size and replication, are sensible (though remember that even small samples, or just a few years of data from a single location, can be very useful, so long as you calibrate your conclusions/interpretations accordingly).
His “alarm system” should not have allowed the LA Times analysis to pass through undetected, but his underlying argument – that one must remain “almost boundlessly and annoyingly skeptical” when confronted with evidence – is, in my view, absolutely correct, as regular readers of this blog know very well (especially when it comes to the annoying part).
The inane accusations that this perspective will inevitably evoke – e.g., “protecting the status quo” – should be ignored. Policy makers never have perfect information, and trying new things is a great and necessary part of the process, but assuming that policy changes can do no harm (whether directly or via opportunity costs) is as wrongheaded as assuming they can do no good.
Still, this caution only goes so far. We should always be skeptical. The next, more important step is knowing how to apply and resolve that skepticism.
And this is, needless to say, extraordinarily difficult, even for people who have a research background. There’s a constant barrage of data, reports and papers flying around, and sifting through it all with a quality filter and synthesizing large bodies of usually mixed evidence into policy conclusions are massive challenges. Moreover, we all bring our pre-existing beliefs, as well as other differences, to the table. There are no easy solutions here.
But, one useful first step, at least in education, would be to stop pointing fingers and acknowledge two things. First, neither “side” has anything resembling a monopoly on the misuse of evidence. And, second, such misuse has zero power if enough people can identify it as such.
- Matt Di Carlo