The Metrics of Trust: How Teacher Evaluation Reshaped the Classroom

Series: This post is part of Policy in the Classroom.

Two Lessons, Years Apart

One afternoon at Delaware Technical Community College, I sat across from a colleague in our shared office, papers spread everywhere. We were filling out end-of-semester reports — the kind that tried to translate our teaching into metrics: retention rates, learning outcomes, rubric averages. He looked up and sighed. “I spend more time proving I’m teaching than actually teaching.”

We both laughed, but it wasn’t a funny laugh. It was the weary kind that comes when the joy of the work is slowly replaced by documentation of the work.

That was higher education in the accountability era — data dashboards, assessment reports, and performance reviews that often felt detached from the classroom itself.

And though the setting was different, the feeling wasn’t new.

Years earlier, in my high school classroom at Mount Pleasant High School, the same logic had taken hold.

It was the second period on a gray Thursday morning when the principal slipped quietly into the back, clipboard in hand. I took a slow breath, smiled, and began the day’s lesson on motivation. The students were curious and engaged, but part of me wasn’t with them.

Half of my attention followed their ideas; the other half ran down a mental checklist:
Objective posted? Evidence of engagement? Lesson aligned to rubric?

Teaching, once fluid and relational, had started to feel like a performance.
That day, I realized how much of our professional trust — in ourselves, in our administrators, in the craft itself — had quietly been replaced by compliance.

The Policy Turn

When No Child Left Behind arrived in the early 2000s, it carried with it a powerful but simplistic idea: if we could measure learning, we could measure teachers. It sounded so easy.

Soon came Race to the Top, value-added models, and a host of mandated observation rubrics like the Danielson Framework.

In theory, these teacher evaluation policies would raise quality and weed out weak instruction.

In practice, they reshaped school life for everyone — teachers and administrators alike.

Principals were suddenly responsible for conducting dozens, even hundreds, of detailed evaluations each year, each one requiring multiple classroom visits, lengthy scoring forms, and post-observation conferences. I remember that we were supposed to have three observations a year; some years I barely had one. Many principals received minimal training in using complex rubrics or interpreting value-added data that even researchers debated.

The result was predictable: rushed observations, overworked leaders, and teachers left feeling unseen.

One RAND study found that administrators themselves reported insufficient time and support to implement evaluations meaningfully — turning what might have been professional dialogue into a bureaucratic exercise.

So while teachers felt scrutinized, principals felt stretched thin.
Both sides were doing what policy demanded, not what professional growth required.

When Accountability Feels Like Surveillance

Teaching has always been both art and science.
But when the art is constantly audited, the science becomes cold.

Many teachers describe the shift as a quiet form of surveillance: the sense that someone is always watching, even when the clipboard isn’t in the room. Judged by checklists, hurried observations, and incomplete evaluations, many teachers felt targeted.

Psychologists call it surveillance anxiety: the awareness of being measured that quietly erodes creativity and intrinsic motivation.

But it wasn’t just teachers who carried the strain.

Administrators, too, found themselves trapped between mandates and meaning.
They were told to “drive instructional excellence,” but also to meet quotas, verify compliance, and upload evidence into state systems with little time to build trust.

The Frontiers in Education review, The Sizzle and Fizzle of Teacher Evaluation, noted that many systems “failed not for lack of effort but for lack of capacity.” The expectations were unrealistic — especially in schools already managing staff shortages, new curricula, and testing pressures.

We were all trying to make an impossible system humane.

The Psychology Behind Motivation

Psychologists Edward Deci and Richard Ryan describe three core human needs for sustained motivation: autonomy, competence, and relatedness.

In classrooms, those apply not just to students — but to teachers and principals as well.

Accountability reforms often undermined all three:

Autonomy disappeared when teachers and leaders were boxed in by state rubrics and pacing guides.
Competence was questioned when metrics tied to test scores overrode professional judgment.
Relatedness — that essential trust between teacher and administrator — eroded under the weight of paperwork and policy.

Some programs showed pockets of success.
The federal Impact Evaluation of Teacher and Leader Performance Evaluation Systems found small gains when evaluation included high-quality feedback and leadership support.

But those effects vanished when principals were overburdened or when systems lacked coaching structures.

Simply put: the system demanded accountability, but starved everyone of time to make it meaningful.

Reframing Evaluation: From Judgment to Dialogue

So what would it look like if we reimagined the whole idea?
What if evaluation wasn’t about scoring performance but fostering reflection? What if administrators and teachers were given time to sit down and discuss performance?

Some districts are moving that way.

Peer observations and instructional rounds shift focus from compliance to collaboration.

Reflective supervision meetings emphasize dialogue — What did you notice? What did students show you? What might you try differently next time?

These models lighten the administrative load and restore a sense of shared purpose.
The Getting Down to Facts II report from Stanford found that teacher evaluation systems work best when they’re “embedded in supportive professional learning structures” — where both teachers and principals have time, training, and trust to engage in real feedback cycles.

That’s not what most policy frameworks created.
They measured everything but meaning.

Rebuilding that trust requires acknowledging how flawed design — not flawed people — drained the life out of teaching.

A Quiet Memory, and a Hope

Years later, after retirement, I ran into an old colleague at a luncheon.
We laughed about those “clipboard years” — the tension, the endless scoring forms, the forced smiles during post-observation meetings. Then she said, “You remember Mr. Fantine? He actually talked afterward. He’d ask what I thought went well.”

That one memory stuck with me.

He was still bound by policy, but he made space for humanity within it.

Policy will always shape schools. But the real work — the trust, the connection, the meaning — still happens in the quiet spaces between people.

If reformers truly want to improve teaching, they might start not with new rubrics, but with renewed relationships.

Trust — like learning — can’t be mandated.
It has to be grown.

(Related post: [The Weather of Policy])

References

Bleiberg, J. et al. (2024). Do Teacher Evaluation Reforms Improve Student Outcomes? Annenberg Institute Working Paper.
Education Week (2021). Efforts to Toughen Teacher Evaluations Show No Positive Impact on Students.
RAND Corporation (2018). Teacher Effectiveness and Evaluation Reform: Lessons from Implementation.
RAND Corporation (2017). School Leaders and Evaluation Burdens: Implementation and Capacity Challenges.
Deci, E. L., & Ryan, R. M. (2000). Self-Determination Theory and the Facilitation of Intrinsic Motivation, Social Development, and Well-Being.
Taylor, E. S., & Tyler, J. H. (2012). The Effect of Evaluation on Teacher Performance. American Economic Review.
U.S. Department of Education, IES (2019). Impact Evaluation of Teacher and Leader Performance Evaluation Systems.
Frontiers in Education (2023). The Sizzle and Fizzle of Teacher Evaluation in the United States.
Getting Down to Facts II (2018). The Effect of Teacher Evaluation and Support Systems on Teacher Practice. Stanford University.
Danielson, C. (2013). The Framework for Teaching: Evaluation Instrument.