Sunday, December 25, 2016

You Think You Understand the New Eval System? You Don't. Here's Why

Attending a monthly Chapter Leader meeting during the waning days of Joel Klein's Chancellorship felt a lot like visiting the crew stuck inside the walls of the Alamo. When there weren't complaints about the Quality Review or complaints about Klein's new emphasis on enforcement (dubbed the 'Gotcha Squad'), there were complaints (howls, actually) about the capricious "U" ratings that teachers across the city were getting.

Apparently, Klein had rigged the whole process so that any teacher whom one of his Leadership Academy principals didn't like simply got a U rating for no real reason. The backdrop to this, an appeal process with a three-person panel (one picked by the union, one picked by the city and one picked in some other Byzantine way), had been completely compromised, and 99.9% of these U ratings were upheld on appeal, many times with no evidence at all to support them. I can't tell you how many hours I spent in monthly Chapter Leader meetings listening to uncomfortable details about how this process was unfolding. I was CL for exactly two years, and almost half of each meeting back then was devoted to this one topic.

So now, almost eight years later (and six years after Klein lost interest and moved on), the UFT has finally arrived at a defense for these capricious U ratings: The new new new new teacher evaluation system of 2014 2015 2016.

Groups ranging from StudentsFirst to the UFT's MORE caucus are objecting to it, and the DoE and the UFT's Unity Caucus are heralding it.

Both sides are completely wrong. The simple fact is that the newest version of the evaluation is rooted in recent history, the result of scars still held by the people who run the city union and, at the end of the day, neither good nor bad. It causes neither harm nor pain and will not advance or regress our profession or our students in the slightest way. It is as useless, and as useful, as a teat on a very old bull.

Now if everyone can stop clapping and stop objecting, that'd be great. Thanks.

The essence of the system is two words: Multiple Measures. Once you understand the concept of multiple measures, then you'll understand how utterly neutral this new agreement is. The ICEUFT blog, written by a longtime chapter leader who understood the world of the S/U rating system, has objected that the union agreed to too many classroom observations (three times the number other districts in the state get). I was surprised at that objection because he, of all chapter leaders, had to deal with high-stakes classroom observations that resulted in unfair U ratings, ratings that many times killed a teacher's whole career. The more classroom visits we have, the more watered down each visit's rating becomes and the less easy it is to wreck a teacher's career. Most teachers will have 4-6 observations over the course of the entire year. That's 4-6 measures.

Fact: The more observations that are required, the harder an administrator has to work to hurt a teacher. It's easy to wreck a teacher's career when you have only 40 observations to perform per year. Try keeping on top of that wicked task with 250 observations to perform, data to enter in a detailed manner and 'on the record' reports to generate for each and every observation. It's a lot harder to go after a teacher under that process.

That's not to say that it can't be done. Of course it can. But when it is done, the observations had better align with the teacher's test results. Because if the two measures don't match, then the teacher still escapes with his or her middle finger fully intact.

And that test may now count for as much as half of the rating. That's unbelievably not good. However, under the previous system, where testing counted for forty percent (just under half), the tests themselves were cut into two different categories. For HS teachers, this meant that how your students did counted for 20% and (in many cases) how all of the school's students performed on tests in your department counted for another 20%. That's 2 more measures. These were combined, and when they were, it looked real bad for your capricious, abusive administrator if they did not match their observations of your teaching.

That's not to say this system isn't bad. The truth is that no one pays attention to Danielson at all until the teacher has been "I" rated for close to a year. Something (like Danielson) that no one understands can easily be used as a stick to beat someone with. But this stick is more reversible. It has a built-in review process and almost all of the APPR grievances in my building result in the observation being removed. Why? Because the process is so difficult to keep up with, it's almost impossible to do without breaking some rule of some sort. The best job in the NYC DoE is that of a tenured AP. They don't want to ruin all that cushiness just to go after a teacher.

However, I was VERY surprised at the UFT leadership for bragging about the results Matrix used to determine final ratings. They spoke about it as though it was both simple AND fair. Let's be clear: The Matrix is neither simple nor fair. (The whole system is neither simple nor fair.) True, the Matrix doesn't default to ineffective if your test scores suck (and it doesn't default to ineffective if your observations suck). But the Matrix and a Danielson process so complex that no one pays attention to it do nothing to improve teaching. And teachers who don't understand the process will have a more difficult time preparing to defend their jobs.

Final thought: This is where we'll all be left for a whole generation of teachers. While it doesn't hurt too much and doesn't help at all, the current evaluation system does have the benefit of operating from political consensus up in Albany. Because of this, the current system isn't going anywhere (in any substantial way) for the rest of our careers. This is it, folks. This (finally) is the hand we've been dealt: A system that is difficult to maneuver, where administrators will have to work their asses off to end our careers and where data and test scores (many of which are generated by us) will dictate much of our review scores.

No one happy. No one hurt. "Welcome. To the real world."


  1. I admire your thought process here. However, I disagree on two points. One, the reason many people are not happy with this new evaluation is that the number of observations is 4, which is 2 more than the state requires. What that means is that the aspect of FEAR is spread out over a much longer part of the year. If a principal only has to do one formal observation which is announced and one that is unannounced, that pretty much means that a teacher is only going to face one "gotcha" observation, and thus once that informal observation is over, the fear subsides 100% till next year. The second point is that I believe you underestimate how smart many of these principals are in using Danielson as a weapon. Yes, they have to write up a lot of observation reports, which is work for them, but they know exactly how to attack a teacher and only write up the bad aspects of the observation. On the flip side, I do like that the matrix ups the score of the MOSL and observation parts of the evaluation.

  2. I really appreciate your post. It is rational and realistic. I have always chosen the option of a full-period observation, for two reasons. One, I know how to plan and execute a lesson that will definitely result in an overall effective rating with at least some highly effective elements. Two, this keeps the admins busy, which is a strategy that must be employed. I have the biggest problem with the drive-by observations. For people who wanted two instead of four, it would not even matter, because they would probably make you wait until April for the gotcha ob, keeping you on edge the majority of the school year. What we should be asking for in our next evaluation, probably in about 18 months, is an option of three full periods and one drive-by. I would have no problem with that because I know even if the drive-by was horrible I still would be ok because the ones I planned would be fine. I don't care if they stay the whole period. I don't understand people who do care. I do not like the unknown. I can do my job and hit every note for Danielson when I know. No one can be HE every day, all the time. On any given day, I am all over that rubric from I to D to E to HE. I want more full periods with a pre and a post.

  3. Just saw this today. Let's set the record straight.

    There were very few U ratings at Jamaica High School under the old system. You can look that up. We were a very strong chapter. That's one of the reasons we had to go. However, under the 2013 system, most of us who remained at Jamaica (all but one) were rated ineffective or developing due to the test scores. You can look that up too.

    Most teachers hate observations. Observations serve no purpose other than intimidating teachers in many schools. The UFT is supposed to represent us. Telling us it's good for us to have more so their impact is diluted kind of rings hollow with teachers who dread seeing the administrators coming in with their laptops or clipboards. We could have written into the agreement more observations for teachers in danger. That would have been simple and gotten some relief for the vast majority of teachers.

    A teacher could have been rated unsatisfactory a hundred times but the burden of proof in dismissal hearings was still on the Board of Ed in the old system for tenured teachers. Now with two ineffective ratings, it switches to the teacher. Good luck trying to win that. It is not that hard to manipulate things in a school where kids are not prepared.

    Watching what my wife Camille has had to go through with a vicious administration at Humanities and the Arts Magnet High School makes me angry. Saying it's easy to remove observations is just not true if administration is careful. We can only fight procedures and after a while, they learn.

    Yes, I would rather have a U and take my chances with the burden of proof on the Board in an incompetence dismissal hearing. Look up the facts again on how few tenured teachers were terminated for incompetence in the old system.

    Now teachers rated developing are brought up on incompetence charges. Teachers without two ineffective ratings have been terminated. In those cases the burden of proof is on the Board, as it used to be for everyone.

    I did a little survey on the ICEUFT blog and the results were that all of the readers who responded prefer the S or U system. Are all of them just poorly informed?

    1. I just saw this myself. Look, there is way too much of your comment to unpack over 3 hours of coffee and beer at the diner, much less in a comment. Throwing a few facts out there as well: 1) I agree with you about how easily this can be used to beat a teacher. "Something (like Danielson) that no one understands can easily be used as a stick to beat someone with," I believe, is how I put it. 2) There was no appeal process under the old system. How many documents of appeals did you read where the Tweed head (usually Shael Suransky) wrote 'appeal denied' or 'evaluation upheld' despite having no data in the folder during the appeal hearing to back it up? I stopped counting at around 24. 24!! Now I don't know how many of those were at Jamaica HS, but I do know that there came a point, around 2008, where the rubber rooms throughout the city were completely filled. I know you know that too. So it occurs to me that there really isn't that much proof needed. If the board can throw you into a 3020 at any time, then 'burden of proof' doesn't really matter as much as it seems. Case in point: can you point to more than 5 teachers who have been taken off of payroll for incompetence SINCE 2010 for ineffective ratings? I cannot. So the sum results seem, to me, to be the same. I didn't realize what was happening to your wife. Super sorry. Any way to alter that behavior there?

  4. Let's compare numbers citywide. Ask the average teacher if they want Danielson or the old system; most I know would say give them S or U any day of the week.

    1. In what way would that be comparing numbers?

  5. I'm rereading your original comment.
    "A teacher could have been rated unsatisfactory a hundred times but the burden of proof in dismissal hearings was still on the Board of Ed in the old system for tenured teachers"

    This really upsets me. You, of all people in this city, know full well and were around for the teacher witch hunts of 2007-2008, 2009 and 2010. You saw the rubber rooms filled to capacity and you know damn well that the 2010 "compromise" to close the rubber rooms spelled doom for a lot of good teachers.

    Like you, I am no fan of this current eval system. But it has MANY advantages over the old system. 1) Admins have to work 4x as hard observing us before they can slip the noose on (that's at least twice the number of observations and a complex rubric, which I count as twice as hard as nothing). They must stick to observed data, data that can be challenged with competing data, and they have to come up with a TIP if we're not "effective" or above. That's more work for them, work they did not have before. Finally, if they do rate us "I", and far fewer teachers face an "I" rating than faced "U" ratings under the old system, we have a process, albeit limited, for appealing the "I", and hell on them if our testing "objective" measures don't meet the same level of "I" as their observations, and they know it. These are protections that we just didn't have before.

    I will remind you that the first gotcha squad Klein sent was *not* for un-teacherly conduct cases. It was incompetency 3020As that began it, and only after a year or so was it expanded to include a full-fledged gotcha squad on un-teacherly conduct. They went after people. Plenty of them. They jacked up the appeal process and put Eric S. and later Shael S. in charge of rubber stamping each and every U rating that was given. During the D79 CL meetings, Marc Koreshan used to delineate the number of empty folders that the Ps would send down to the appeal hearing ONLY to have the U rating affirmed. And why was it affirmed with no evidence? Because the DOE co-opted 2 out of 3 officers. I will not advocate that we go back to that system, and hell if more teachers lack the institutional memory to recall how bad it had become. It was plenty damn bad.

    And again, this isn't that much better.