At the Network for Educator Effectiveness, we are continuously thinking about how best to support authentic leadership and build leadership capacity in schools. We seek to support educators in becoming true instructional leaders because instructional leadership is the foundation of strong schools, strong teacher performance, and strong student outcomes.

With that understanding, NEE has explored various ways to strengthen instructional leadership. Previous research has found:

  • Instructional leadership efficacy has more impact on student performance than other often-cited leadership characteristics.
  • Teacher perceptions of feedback from their principal are a catalyst for teacher growth.
  • The type of feedback comments that principals provide influences teacher growth.

Another area we are investigating is the relationship between artificial intelligence and instructional leadership. Since 2023, NEE has been involved with a multi-faceted research team to learn how generative AI tools interact with educator evaluation data.  

This research has uncovered multiple implications for leadership and for how schools use artificial intelligence to enhance or support evaluation practices. Although the field is new and changing rapidly, let’s explore what we have discovered so far.

NEE Research Findings: Using AI in Educator Evaluations

Independent research led by Dr. Seth Baxter Hunter at George Mason University used NEE data to analyze approximately 1,400 classroom observations from 636 unique principals in 68 Missouri school districts over multiple years. The analysis reviewed three scoring approaches – human supervisors, item response theory, and AI/LLM – and their impact on five school-based outcomes: achievement, teacher climate, student survey, staff retention, and value-added.

Main findings include:

  1. AI models are reliable but not necessarily valid. The research suggests AI models are reliable and internally consistent when used for evaluations (i.e., when given different sets of instructions for scoring the same documents, the scores generated are consistent). However, validity remains the key question. Even with a reliable and consistent score, it is unknown what AI is actually measuring and whether those results are accurate, fair, and unbiased.
  2. Not all AI setups are equal. Some models produce scores that “look better on paper” but might not be accurate. School leaders should understand that “AI scoring” does not mean one thing; it depends heavily on how an AI tool is trained and configured.
  3. Human and IRT scores connect to teacher perceptions of principal performance and to school achievement gains. The only significant connection for AI/LLM scores was to staff stability, specifically teacher retention, which could be an organizational health signal rather than a leadership signal. Well-resourced schools may produce better scores and keep more teachers, so the AI may be picking up on school infrastructure, not principal leadership per se.
  4. No evaluation method reliably predicted future school outcomes. None of the evaluation approaches tested can predict where a school is headed. Neither the data nor the broader literature offers an empirical basis for claiming that evaluation technologies have predictive validity.
  5. Research on the use of AI in educator evaluations is limited. A systematic literature search found zero empirical studies (other than Dr. Hunter’s work) that have directly tested AI as a rater of K-12 principal performance in field settings. Other studies involve lab prototypes, non-K-12 settings, or measure AI readiness rather than AI-scored performance. The field is new, and the evidence is thin.
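
For readers who want to see the difference between reliability and validity in concrete terms, the short Python sketch below may help. It is only an illustration, not the method used in Dr. Hunter’s study: the scores and the outcome values are invented for this example, and the names (`ai_scores_prompt_a`, `ai_scores_prompt_b`, `outcome`) are hypothetical. It assumes reliability can be approximated by how closely two sets of AI scores agree when the same documents are scored under different instructions, and validity by how closely those scores track an outcome they are supposed to reflect.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Invented scores for six hypothetical principals (NOT study data).
# The same documents are scored twice under two different sets of instructions.
ai_scores_prompt_a = [3.1, 4.0, 2.5, 3.8, 4.4, 2.9]
ai_scores_prompt_b = [3.0, 4.1, 2.6, 3.7, 4.5, 2.8]

# Invented school outcome the scores are meant to reflect (NOT study data).
outcome = [0.2, -0.1, 0.3, 0.0, 0.1, -0.2]

# Reliability: do the two scoring runs agree with each other?
reliability = pearson(ai_scores_prompt_a, ai_scores_prompt_b)

# Validity: do the scores track the outcome they are supposed to measure?
validity = pearson(ai_scores_prompt_a, outcome)

print(f"agreement between scoring runs: {reliability:.2f}")
print(f"relationship to outcome:        {validity:.2f}")
```

In this toy example the two scoring runs agree almost perfectly, yet the scores have only a weak relationship to the outcome. That is the pattern the first finding warns about: consistent scores are not necessarily meaningful ones.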

Understanding the AI Temptation

For instructional leaders, time constraints and management issues can feel like ceaseless obstacles. Myriad responsibilities pull at principals’ attention, and it makes sense to look for assistance to ease the pressures of daily tasks.

AI can offer a wide range of assistance, and tools are available to help principals transcribe classroom observations, suggest feedback for teachers, write formal and personal communications, and even score observations and other evaluation documents.

While AI’s quick processing may be a tempting way to ease time constraints, concerns about its technical precision require careful consideration. Research findings suggest a limited understanding of how AI models interpret data and what specifically they measure, and the models’ grasp of the context that is central to school dynamics is murky at best.

There are legitimate concerns about AI “deskilling” professional practices, steadily chipping away at highly nuanced skills due to an over-reliance on non-human outputs. AI “hallucinations” also cause concern as models can create inaccurate or nonsensical results based on patterns they perceive that do not actually exist.

When it comes to AI tools, discretion is key. AI might be used in low-stakes processes, such as brainstorming ideas and refining communications. However, as the stakes increase – as the relationships involved increase in complexity – the use of AI risks undermining authentic leadership. We say this not just because of the limited understanding of AI functioning in high-stakes processes but because high stakes and complex relationships are where authentic leadership thrives.

Developing Authentic Instructional Leadership in the AI Era

Leadership is a skill that is grown through every decision made. When leadership decisions are given to AI, leaders lose opportunities for reflection, refinement, and development.

It is important for school leaders to trust in their own leadership, build upon past experiences, and rely on the modeling and mentorship of the humans in their network.

Even in the AI era, school leaders must continue to develop:

  1. A deep expertise in observations and feedback. NEE trains principals, not algorithms, on high-quality observation practices and feedback that builds teacher and school leader capacity. NEE training addresses effective feedback processes using feedback paths, impact statements, and affirmative feedback to tailor feedback to teachers’ specific needs and to support ongoing professional learning. Through training, school leaders get quicker, more efficient, and more skilled, developing trust in themselves as instructional leaders without relying on AI.
  2. Trust and relationships. Evaluation processes only work when teachers trust the system. So far, there is no evidence that AI scoring is accurate, valid, and unbiased. Effective feedback from a human leader who knows the teacher’s classroom environment will continue to be the best driver of trust, relationships, and true professional growth. 
  3. Evaluation processes based on a proven, research-aligned framework. NEE standards and indicators are aligned with research-based frameworks, providing school leaders with a level of rigor and stability that is currently lacking in AI models.
  4. Platforms built on transparency. The NEE platform provides school leaders with clear, documented, and defensible tools for educator evaluation. There is no ambiguity in how scores are generated and determined. Teachers can also see their scores and feedback immediately.

AI and Instructional Leadership from the Systems Level

Relying too much on AI stifles the opportunity for leaders to build authentic leadership. This is problematic in individual practice, and it becomes more problematic at the system level.

School systems rely on the capacity of their leaders. Principals are catalysts for school, teacher, and student growth. Luckily, instructional leadership is a skill that can be learned, strengthened, and sustained through a commitment to practice.

As a system, NEE is designed to strengthen instructional leadership in schools. Independent research shows that the NEE system is associated with enhanced student performance, especially in high-needs settings. We believe the reason is twofold: our commitment to design a research-driven evaluation experience and the consistency of implementation within our partner districts.

NEE has multiple processes that support implementation fidelity at the school system level. NEE uses research-backed observation rubrics, evidence-based indicators of highly effective teaching, reliable and valid surveys, narrative-based organizers that emphasize planning and reflection, intensively aligned professional learning, and principal training to build accuracy in evaluation and feedback. These practices, repeated consistently and purposefully, build leadership efficacy and support the entire system.

Classroom observations, feedback, and teacher professional development are taught and practiced skills that improve with repetitive and mindful human practice. In the era of AI, it is more important than ever that school leaders sharpen their skills through practice, relationships, and a commitment to mindful decision-making.

Tom Hairston is the Managing Director of the Network for Educator Effectiveness and has worked with NEE since 2011. Prior to his work with NEE, he worked as a Positive Behavioral Interventions & Supports Consultant for the Heart of Missouri Regional Professional Development Center at the University of Missouri. He began his career in education as a high school special education and language arts teacher and football coach at Moberly High School in Moberly, Mo. Tom received his PhD in Educational Leadership and Policy Analysis from the University of Missouri in 2012.


The Network for Educator Effectiveness (NEE) is a simple yet powerful comprehensive system for educator evaluation that helps educators grow, students learn, and schools improve. Developed by preK-12 practitioners and experts at the University of Missouri, NEE brings together classroom observation, student feedback, teacher curriculum planning, and professional development as measures of effectiveness in a secure online portal designed to promote educator growth and development.