Assessing teaching

Ever since Malcolm Gladwell’s amazing article comparing the difficulty of predicting teacher success to that of predicting the NFL success of quarterbacks, I’ve been fascinated by how we evaluate teachers.  Among the most memorable aspects of the article: nothing that we currently use to evaluate/reward teachers (i.e., certification, Master’s degrees, etc.) actually corresponds to better teaching.  The latest trend, and the one advocated by Gladwell, is value-added assessment: look at how much better the students got over the course of a year with that particular teacher.  Thus, you control for all the socioeconomic factors that make comparing inner-city teachers and wealthy, suburban-area teachers pretty much pointless.

Via Kevin Drum, we learn that, good as this metric is, it is significantly flawed:

But a new EPI report says that value-added sucks anyway:

One study found that across five large urban districts, among teachers who were ranked in the top 20% of effectiveness in the first year, fewer than a third were in that top group the next year, and another third moved all the way down to the bottom 40%. Another found that teachers’ effectiveness ratings in one year could only predict from 4% to 16% of the variation in such ratings in the following year. Thus, a teacher who appears to be very ineffective in one year might have a dramatically different result the following year. The same dramatic fluctuations were found for teachers ranked at the bottom in the first year of analysis.

Much like democracy, though, it is the least bad of the alternatives:

Education expert Kevin Carey agrees that value-added is a lousy metric:

But, and this is an enormous caveat, everything else we currently use is worse. A teacher’s years of experience, their education credentials, their certification status, the prestige of their college or their college GPA, even in-class observations. None of these measures does as good of a job at predicting a student’s academic growth as a teacher’s value-added score. Yet, we continue to use these poor proxies for quality at the same time we have such passionate fights about measures of actual performance.

It’s pretty clear to me that the take-home lesson is that we should continue to use this metric, but simply be smart about its limitations. It’s clearly a blunt instrument.  Is a teacher who scores 2 standard errors below the mean 2 years in a row a bad teacher?  Almost assuredly.  Is a teacher who scores modestly above the mean in a given year better than a teacher who scored modestly below?  Who knows.  What we can do, though, is clearly take action to improve, and if necessary, eliminate teachers who consistently fall at the bottom and to reward teachers who consistently place at the top.  It’s really not that complicated.

About Steve Greene
Professor of Political Science at NC State http://faculty.chass.ncsu.edu/shgreene
