Improving Verifiability in AI Development

We’ve contributed to a multi-stakeholder report by 58 co-authors at 30 organizations, including the Centre for the Future of Intelligence, Mila, Schwartz Reisman Institute for Technology and Society, Center for Advanced Study in the Behavioral Sciences, and Center for Security and Emerging Technology. This report describes 10 mechanisms to improve the verifiability of claims made about AI systems. Developers can use these tools to provide evidence that AI systems are safe, secure, fair, or privacy-preserving. Users, policymakers, and civil society can use these tools to evaluate AI development processes.

Read Report

While a growing number of organizations have articulated ethics principles to guide their AI development process, it can be difficult for those outside of an organization to verify whether the organization’s AI systems reflect those principles in practice. This ambiguity makes it harder for stakeholders such as users, policymakers, and civil society to scrutinize AI developers’ claims about properties of AI systems and could fuel competitive corner-cutting, increasing social risks and harms. The report describes existing and potential mechanisms that can help stakeholders grapple with questions like:

Can I (as a user) verify the claims made about the level of privacy protection guaranteed by a new AI system I’d like to use for machine translation of sensitive documents?
Can I (as a regulator) trace the steps that led to an accident caused by an autonomous vehicle? Against what standards should an autonomous vehicle company’s safety claims be compared?
Can I (as an academic) conduct impartial research on the risks associated with large-scale AI systems when I lack the computing resources of industry?
Can I (as an AI developer) verify that my competitors in a given area of AI development will follow best practices rather than cut corners to gain an advantage?

The 10 mechanisms highlighted in the report are listed below, along with recommendations aimed at advancing each one. (See the report for discussion of how these mechanisms support verifiable claims as well as relevant caveats about our findings.)

Institutional Mechanisms and Recommendations

Third party auditing. A coalition of stakeholders should create a task force to research options for conducting and funding third party auditing of AI systems.
Red teaming exercises. Organizations developing AI should run red teaming exercises to explore risks associated with systems they develop, and should share best practices and tools.
Bias and safety bounties. AI developers should pilot bias and safety bounties for AI systems to strengthen incentives and processes for broad-based scrutiny of AI systems.
Sharing of AI incidents. AI developers should share more information about AI incidents, including through collaborative channels.

Software Mechanisms and Recommendations

Audit trails. Standard setting bodies should work with academia and industry to develop audit trail requirements for safety-critical applications of AI systems.
Interpretability. Organizations developing AI and funding bodies should support research into the interpretability of AI systems, with a focus on supporting risk assessment and auditing.
Privacy-preserving machine learning. AI developers should develop, share, and use suites of tools for privacy-preserving machine learning that include measures of performance against common standards.
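
To make the audit trail idea more concrete, here is a minimal sketch of one way a tamper-evident log could work, assuming a simple hash chain in which each entry commits to the one before it. This is an illustration only, not a method taken from the report; the function names (`append_entry`, `verify_log`) and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry (an assumption)


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical encoding of the event."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> dict:
    """Append an event to the log, chaining it to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_log(log: list) -> bool:
    """Return True only if every entry matches its recorded hash and its predecessor."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical development events, for illustration only.
log = []
append_entry(log, {"step": "data_loaded", "dataset": "example-v1"})
append_entry(log, {"step": "model_trained", "seed": 0})
print(verify_log(log))  # True for an untouched log

log[0]["event"]["dataset"] = "example-v2"  # simulate after-the-fact tampering
print(verify_log(log))  # False: the first entry no longer matches its hash
```

A chain like this only shows that recorded entries were not altered afterwards. It says nothing about whether the right events were logged in the first place, which is the kind of question that audit trail requirements from standard setting bodies would need to address.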

Hardware Mechanisms and Recommendations

Secure hardware for machine learning. Industry and academia should work together to develop hardware security features for AI accelerators or otherwise establish best practices for the use of secure hardware (including secure enclaves on commodity hardware) in machine learning contexts.
High-precision compute measurement. One or more AI labs should estimate the computing power involved in a single project in great detail and report on lessons learned regarding the potential for wider adoption of such methods.
Compute support for academia. Government funding bodies should substantially increase funding for computing power resources for researchers in academia, in order to improve the ability of those researchers to verify claims made by industry.
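
As a rough illustration of what compute accounting involves, the sketch below estimates training compute from hardware usage and converts it to petaflop/s-days. It is a coarse back-of-the-envelope approach, far less detailed than the high-precision measurement the recommendation calls for, and every number in it (accelerator count, peak throughput, utilization, duration) is a made-up assumption rather than a figure from the report.

```python
SECONDS_PER_DAY = 86_400
# One petaflop/s-day: 10^15 operations per second sustained for one day.
PFS_DAY_IN_FLOP = 1e15 * SECONDS_PER_DAY


def estimate_training_flop(
    num_accelerators: int,
    peak_flop_per_second: float,
    utilization: float,
    days: float,
) -> float:
    """Estimate total operations as accelerators x peak rate x utilization x time."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return num_accelerators * peak_flop_per_second * utilization * days * SECONDS_PER_DAY


def to_petaflop_s_days(total_flop: float) -> float:
    """Convert a raw operation count into petaflop/s-days."""
    return total_flop / PFS_DAY_IN_FLOP


# Hypothetical project: 64 accelerators at 1e14 FLOP/s peak, 30% utilization, 10 days.
total = estimate_training_flop(64, 1e14, 0.30, 10)
print(f"{total:.3e} operations, about {to_petaflop_s_days(total):.1f} petaflop/s-days")
```

The utilization figure dominates the uncertainty in an estimate like this, which is one reason a detailed, project-level measurement could be more informative than a hardware-based approximation.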

We and our co-authors will be doing further research on these mechanisms, and OpenAI will be looking to adopt several of them in the future. We hope that this report inspires meaningful dialogue, and we are eager to discuss additional institutional, software, and hardware mechanisms that could be useful in enabling trustworthy AI development. We encourage anyone interested in collaborating on these issues to connect with the corresponding authors and visit the report website.

Report Authors

Miles Brundage OpenAI
Shahar Avin Centre for the Study of Existential Risk, Leverhulme Centre for the Future of Intelligence
Jasmine Wang Mila, University of Montreal
Haydn Belfield Centre for the Study of Existential Risk, Leverhulme Centre for the Future of Intelligence
Gretchen Krueger OpenAI

(Equal contribution)

Gillian Hadfield OpenAI, University of Toronto, Schwartz Reisman Institute for Technology and Society
Heidy Khlaaf Adelard
Jingying Yang Partnership on AI
Helen Toner Center for Security and Emerging Technology
Ruth Fong University of Oxford
Tegan Maharaj Mila, Montreal Polytechnic
Pang Wei Koh Stanford University
Sara Hooker Google Brain
Jade Leung Future of Humanity Institute
Andrew Trask University of Oxford
Emma Bluemke University of Oxford
Jonathan Lebensold Mila, McGill University
Cullen O’Keefe OpenAI
Mark Koren Stanford Center for AI Safety
Théo Ryffel École Normale Supérieure (Paris)
JB Rubinovitz Remedy.AI
Tamay Besiroglu University of Cambridge
Federica Carugati Center for Advanced Study in the Behavioral Sciences
Jack Clark OpenAI
Peter Eckersley Partnership on AI
Sarah de Haas Google Research
Maritza Johnson Google Research
Ben Laurie Google Research
Alex Ingerman Google Research
Igor Krawczuk École Polytechnique Fédérale de Lausanne
Amanda Askell OpenAI
Rosario Cammarota Intel
Andrew Lohn RAND Corporation
David Krueger Mila, Montreal Polytechnic
Charlotte Stix Eindhoven University of Technology
Peter Henderson Stanford University
Logan Graham University of Oxford
Carina Prunkl Future of Humanity Institute
Bianca Martin OpenAI
Elizabeth Seger University of Cambridge
Noa Zilberman University of Oxford
Seán Ó hÉigeartaigh Leverhulme Centre for the Future of Intelligence, Centre for the Study of Existential Risk
Frens Kroeger Coventry University
Girish Sastry OpenAI
Rebecca Kagan Center for Security and Emerging Technology
Adrian Weller University of Cambridge, Alan Turing Institute
Brian Tse Future of Humanity Institute, Partnership on AI
Elizabeth Barnes OpenAI
Allan Dafoe Future of Humanity Institute
Paul Scharre Center for a New American Security
Ariel Herbert-Voss OpenAI
Martijn Rasser Center for a New American Security
Shagun Sodhani Mila, University of Montreal
Carrick Flynn Center for Security and Emerging Technology
Thomas Gilbert University of California, Berkeley
Lisa Dyer Partnership on AI
Saif Khan Center for Security and Emerging Technology
Yoshua Bengio Mila, University of Montreal
Markus Anderljung Future of Humanity Institute

(Descending contribution)

OpenAI
