Safety Measures Strengthened: OpenAI Gives Board Final Authority Over Risky AI

OpenAI is expanding its internal safety processes to fend off the threat of harmful AI. Models already in production are governed by a “safety systems” team, which handles things like systematic abuse of ChatGPT that can be mitigated with API restrictions or tuning. Frontier models still in development fall to the “preparedness” team, which tries to identify and quantify risks before a model is released. Under the updated rules, only models whose remaining risks are rated “medium” or lower may be deployed, and only those rated “high” or lower may be developed further. Sitting above both technical teams is a new “cross-functional Safety Advisory Group,” which reviews the researchers’ reports and makes recommendations from a higher vantage point.

In other words, OpenAI is shoring up its internal processes against the potential dangers posed by its own AI. The newly formed safety advisory group will sit above the technical teams and send recommendations to leadership, while the board has been granted veto power over risky decisions. Whether the board will ever actually use that veto is another question, but the structure makes clear that OpenAI is taking the threat of harmful AI seriously.

Normally the mechanics of policies like these attract little attention: they amount to closed-door meetings and murky lines of responsibility that outsiders rarely get to see. But given recent events and the ongoing debate about AI risk, it is worth examining how the world’s leading AI developer is approaching safety.

In a new document and blog post, OpenAI outlines its updated “Preparedness Framework,” which appears to have been reworked after the November shake-up that removed board members Ilya Sutskever and Helen Toner. The main goal of the update is to lay out a clear process for identifying, analyzing, and deciding how to handle the “catastrophic” risks inherent in the models OpenAI is building. As the company defines them, these are risks that could cause massive economic damage or serious harm or death to many people, up to and including existential risks posed by advanced AI itself.

The company sorts its model governance into three tracks: the “safety systems” team for models already in production, the “preparedness” team for frontier models in development, and the “superalignment” team working on theoretical guardrails for hypothetical “superintelligent” models. The first two tracks, dealing with models that actually exist, have a concrete rubric for evaluating risk. Each model is scored on four risk categories: cybersecurity, “persuasion” (manipulation), model autonomy, and CBRN (chemical, biological, radiological, and nuclear threats). After known mitigations are taken into account, a model still rated “high” risk cannot be deployed, and a model carrying any “critical” risk will not be developed further.
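As a rough illustration of that gating rule (a sketch of the policy as described here, not OpenAI’s actual tooling), the post-mitigation scores can be read as a simple worst-category check: the highest remaining score decides whether a model may be deployed or developed further. The Risk enum and gate function below are hypothetical names used only for this example.

```python
from enum import IntEnum

class Risk(IntEnum):
    """Risk levels used in the Preparedness Framework rubric."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

# The four tracked risk categories named in the framework.
CATEGORIES = ("cybersecurity", "persuasion", "model_autonomy", "cbrn")

def gate(post_mitigation_scores: dict) -> dict:
    """Apply the gating rule as described above: deploy only if every
    category is at most MEDIUM after mitigations; continue development
    only if every category is at most HIGH."""
    worst = max(post_mitigation_scores.values())
    return {
        "can_deploy": worst <= Risk.MEDIUM,
        "can_develop_further": worst <= Risk.HIGH,
    }

# Example: a hypothetical model that still scores HIGH on cybersecurity
# after mitigations could be developed further, but not deployed.
scores = {
    "cybersecurity": Risk.HIGH,
    "persuasion": Risk.MEDIUM,
    "model_autonomy": Risk.LOW,
    "cbrn": Risk.LOW,
}
print(gate(scores))  # {'can_deploy': False, 'can_develop_further': True}
```

A single “critical” score in any category fails both checks, which matches the halt-development rule described above.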

The framework spells out the risk levels in detail, so the judgment calls are not left to the discretion of an individual engineer or product manager. In the cybersecurity category, for instance, a model that merely boosts the productivity of human operators on specific cyber-operation tasks counts as a “medium” risk. A “high” risk model, by contrast, could identify and develop exploits against secure targets without human intervention, and a “critical” risk model could devise and execute novel end-to-end strategies for cyberattacks on secure targets without any human guidance. Such a model would clearly pose a serious threat, and under the framework it would not be released.
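Laid out as a table, the cybersecurity tiers described above look roughly like this; the wording is paraphrased from this article rather than quoted from the framework, and the consequences follow the deployment rule sketched earlier.

```python
# Cybersecurity risk tiers, paraphrased from the description above.
# Illustrative only -- the Preparedness Framework's exact wording differs.
CYBERSECURITY_TIERS = {
    "medium":   "Boosts the productivity of human operators on specific "
                "cyber-operation tasks.",
    "high":     "Can identify and develop exploits against secure targets "
                "without human intervention.",
    "critical": "Can devise and execute novel end-to-end cyberattack "
                "strategies against secure targets without human guidance.",
}

# Consequence attached to each tier under the framework's gating rule.
CONSEQUENCES = {
    "medium":   "may be deployed after mitigations",
    "high":     "may be developed further, but not deployed",
    "critical": "development is halted entirely",
}

for tier, capability in CYBERSECURITY_TIERS.items():
    print(f"{tier:>8}: {capability}")
    print(f"{'':>8}  -> {CONSEQUENCES[tier]}")
```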

There is also the question of who does the judging. The people building these models are not necessarily the best placed to evaluate them or to recommend what should happen next, which is exactly what the cross-functional Safety Advisory Group is for: it reviews the technical teams’ reports and makes recommendations from a broader vantage point, hopefully surfacing some “unknown unknowns” along the way, though by their nature those are difficult to catch. The group’s recommendations are sent simultaneously to the board and to leadership. Leadership decides whether to ship or shelve a model, but the board retains the power to reverse that decision.
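To make that reporting chain concrete, here is a minimal sketch of the flow as described above, under the assumption that a recommendation is a simple go/no-go signal; the names (Recommendation, governance_flow, and so on) are invented for illustration and are not OpenAI terminology.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    """A Safety Advisory Group recommendation (hypothetical structure)."""
    model: str
    summary: str
    proceed: bool  # the group's view on whether to go ahead

def notify(body: str, rec: Recommendation) -> None:
    print(f"[{body}] received recommendation on {rec.model}: {rec.summary}")

def governance_flow(rec: Recommendation,
                    leadership_approves: bool,
                    board_reverses: bool) -> bool:
    """Illustrative decision flow: the recommendation goes to the board and
    leadership at the same time; leadership makes the call; the board can
    reverse it."""
    notify("board", rec)       # both bodies get the same report...
    notify("leadership", rec)  # ...simultaneously

    decision = leadership_approves  # leadership decides to ship or shelve
    if board_reverses:
        decision = not decision     # the board may overturn that decision
    return decision

# Example: leadership greenlights a model, and the board reverses the call.
rec = Recommendation("frontier-model-x", "high post-mitigation cyber risk", proceed=False)
print(governance_flow(rec, leadership_approves=True, board_reverses=True))  # False
```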

This process is meant to prevent a repeat of the incident rumored to have preceded the board shake-up, in which a high-risk product was reportedly greenlit without the board’s knowledge or approval. At the same time, though, the shake-up removed two of the board’s more cautious voices and added members with business and finance backgrounds rather than AI expertise, namely Bret Taylor and Larry Summers.

It’s also worth noting what the framework leaves out. Transparency is barely addressed, beyond a promise that OpenAI will have independent third parties audit its work. The company has never been shy about touting how powerful its models are, even when it declines to release them because of those capabilities, but nothing here commits OpenAI to telling the public if a model is ever judged to carry a “critical” risk. Whether that omission is an oversight or a deliberate choice is left open to interpretation.

Kira Kim

Kira Kim is a science journalist with a background in biology and a passion for environmental issues. She is known for her clear and concise writing, as well as her ability to bring complex scientific concepts to life for a general audience.
