Executive Gov

NIST Publishes New Guidance to Strengthen AI Benchmark Evaluations

by Miles Jamison
February 25, 2026
in Artificial Intelligence, News

The National Institute of Standards and Technology has issued new guidance aimed at strengthening the statistical validity of artificial intelligence benchmark evaluations.

Table of Contents

  • What Problem Is NIST Addressing?
  • How Does the New Framework Enhance Evaluation?
  • NIST Seeks Public Input on Automated LLM Benchmarking

What Problem Is NIST Addressing?

NIST said Thursday its new publication, Expanding the AI Evaluation Toolbox with Statistical Models, addresses shortcomings in common benchmark evaluation practices. Such practices often rely on implicit assumptions, conflate different measures of system performance or fail to adequately quantify uncertainty. These gaps can complicate interpretation of reported results and hinder decisions based on them.

How Does the New Framework Enhance Evaluation?

The NIST AI 800-3 publication introduces a formal modeling framework to clarify how AI benchmark results are interpreted and how uncertainty is measured. It distinguishes between benchmark accuracy, which measures performance on a fixed set of benchmark questions, and generalized accuracy, which estimates performance across a broader population of similar questions. NIST notes that the two measures may differ and require distinct calculation methods.
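The distinction can be illustrated with a minimal, regression-free sketch. It is not taken from the NIST publication; the function name, the data layout and the simple standard-error formulas are assumptions chosen for illustration. The sketch assumes each question is attempted several times and scored 0 or 1. Both measures share the same point estimate here, but the uncertainty differs: benchmark accuracy treats the questions as fixed, while generalized accuracy treats them as a random sample from a larger population.

```python
from math import sqrt
from statistics import mean, variance


def accuracy_with_uncertainty(scores):
    """Hypothetical helper: scores[i][j] is 1 if trial j on question i
    was correct, else 0. Needs at least 2 questions and 2 trials each."""
    n = len(scores)
    item_means = [mean(trials) for trials in scores]
    accuracy = mean(item_means)

    # Benchmark accuracy: the questions are fixed, so the only noise is
    # trial-to-trial randomness within each question.
    within = sum(
        p * (1 - p) / (len(trials) - 1)
        for p, trials in zip(item_means, scores)
    )
    se_benchmark = sqrt(within) / n

    # Generalized accuracy: the questions are treated as a random sample,
    # so question-to-question variation also contributes.
    se_generalized = sqrt(variance(item_means) / n)

    return accuracy, se_benchmark, se_generalized


# Made-up scores: 4 questions, 4 trials each.
scores = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 1, 1]]
acc, se_bench, se_gen = accuracy_with_uncertainty(scores)
print(f"accuracy={acc:.4f} se_benchmark={se_bench:.4f} se_generalized={se_gen:.4f}")
```

In this made-up example the generalized standard error is the larger of the two, because it also accounts for which questions happened to be included in the benchmark.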

The publication highlights the use of generalized linear mixed models, or GLMMs, to estimate AI performance and to gain insights into benchmark composition and large language models, or LLMs. While regression-free approaches remain common among evaluators, GLMMs can quantify uncertainty more precisely and provide additional explanatory insights when correctly specified.
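For readers unfamiliar with GLMMs, a generic textbook form of a logistic mixed model for benchmark data might look like the following. This is an illustrative assumption, not necessarily the specification used in NIST AI 800-3, and all symbols are introduced here: y_ij is 1 if trial j on question i is correct, mu is a fixed effect for the model's overall log-odds of success, and b_i is a random effect capturing how easy or hard question i is.

```latex
\operatorname{logit}\,\Pr(y_{ij} = 1 \mid b_i) = \mu + b_i,
\qquad b_i \sim \mathcal{N}(0, \sigma^2)
```

Under this sketch, generalized accuracy would correspond to the success probability averaged over the distribution of question effects, and the variance sigma^2 would describe how much question difficulty varies across the benchmark.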

NIST Seeks Public Input on Automated LLM Benchmarking

In a related move, NIST is seeking public feedback on a draft framework focused on automated benchmarking practices for LLMs. The Center for AI Standards and Innovation released an initial public draft of NIST AI 800-2, Practices for Automated Benchmark Evaluations of Language Models. The draft aims to provide guidance on how automated benchmarks are designed, implemented and applied to evaluate LLMs.

Copyright 2026 Executive Mosaic. All Rights Reserved.
