Investors were in for a shock when Sam Altman was ousted as chief executive officer of OpenAI. Little did they know, Altman was already devising a plan for his return. Meanwhile, OpenAI’s Superalignment team stayed focused on finding ways to manage and regulate superintelligent AI systems.
Or at least, that’s what they want you to believe.
This week, I had the opportunity to speak with three members of the Superalignment team – Collin Burns, Pavel Izmailov and Leopold Aschenbrenner – at NeurIPS, the annual machine learning conference held this year in New Orleans, where they were presenting OpenAI’s latest work on ensuring AI systems behave as intended.
The Superalignment team was formed in July with the mission of finding ways to steer, regulate, and govern superintelligent AI systems. These are theoretical systems with intelligence that surpasses that of humans.
“Right now, we are able to align models that are dumber than us – maybe even at human-level intelligence. But aligning a model that is smarter than us is a much more complex problem. How do we even begin to do that?” Burns explained.
Leading the Superalignment team is OpenAI’s co-founder and chief scientist, Ilya Sutskever. This may not have raised any eyebrows in July, but it certainly does now given Sutskever’s involvement in Altman’s departure. While some speculate that Sutskever is in limbo after Altman’s return, OpenAI’s public relations team has confirmed that as of now, Sutskever is still leading the Superalignment team.
Superalignment is a sensitive subject within the AI research community. Some argue the subfield is premature, while others believe it is a distraction from more immediate problems in AI, such as algorithmic bias and toxicity.
Altman has drawn comparisons between OpenAI and the Manhattan Project, and the company has assembled a team to probe its models for catastrophic risks, including chemical and nuclear threats. However, many experts argue that there is little evidence to suggest AI systems will reach world-ending, superintelligent capabilities anytime soon – and that claims of imminent superintelligence mostly serve to deflect attention from the field’s more pressing regulatory issues.
Despite this, Sutskever remains firm in his belief that AI – not necessarily OpenAI’s – could become an existential threat in the future. He reportedly went as far as commissioning and burning a wooden effigy at a company retreat to symbolize his commitment to preventing AI from harming humanity, and he commands a significant share of OpenAI’s compute – 20% of its existing computer chips – for the Superalignment team’s research.
“It’s clear that AI progress has been accelerating rapidly, and this shows no signs of slowing down. We can expect to reach human-level intelligence soon, but it won’t stop there. We will continue to advance towards superhuman intelligence. So how do we align these superintelligent AI systems and make sure they are safe? This is a problem that affects all of humanity, and it may be the most significant technical challenge of our time.” Aschenbrenner stated.
The Superalignment team is currently focused on creating governance and control frameworks that could be applied to future powerful AI systems. However, it is not a straightforward task, as there is much debate over the definition of “superintelligence” and whether a specific AI system has achieved it. For now, the team’s approach involves using a less sophisticated AI model (such as GPT-2) to guide a more advanced model (GPT-4) towards desired outcomes and away from undesirable ones.
“A lot of what we are trying to do is give the model instructions and make sure it follows them. How can we get the model to only help with things that are true and not make things up? How can we get it to tell us whether the code it generates is safe or harmful? These are the types of tasks we hope to achieve through our research.” Burns explained.
But you may be wondering, what does AI guiding AI have to do with preventing a potential doomsday scenario? Well, it’s just an analogy. The weaker model represents human supervisors, while the stronger model represents superintelligent AI. Just like how humans may not fully understand a superintelligent system, the weaker model cannot comprehend all the intricacies of the stronger model. This setup serves as a useful tool for testing superalignment hypotheses, according to the Superalignment team.
“To give you an idea, it’s like a sixth-grade student trying to supervise a college student. The sixth-grader may have some knowledge about the task at hand, but the college student is better equipped to solve it. Similarly, our weak model can guide the stronger model towards the intended outcome, even if the labels it produces may contain errors and biases.” Izmailov explained.
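OpenAI’s actual setup is far more involved than this article can capture, but the core loop the team describes – a weak supervisor producing imperfect labels for a stronger student – can be sketched in a few lines. The toy PyTorch example below is a hypothetical illustration on synthetic data, not OpenAI’s code: the tiny and large MLPs merely stand in for GPT-2 and GPT-4, and every function name, size, and hyperparameter is invented for the sketch.

```python
# Toy sketch of weak-to-strong supervision (hypothetical, not OpenAI's code).
# A small "weak" model is trained on ground truth; a larger "strong" model is
# then trained only on the weak model's noisy labels. We compare the strong
# model's accuracy against a "ceiling" model trained directly on ground truth.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_data(n, dim=20):
    # Synthetic binary task: the label depends on a nonlinear mix of features.
    x = torch.randn(n, dim)
    y = ((x[:, 0] * x[:, 1] + x[:, 2:].sum(dim=1)) > 0).long()
    return x, y

def mlp(width):
    return nn.Sequential(nn.Linear(20, width), nn.ReLU(), nn.Linear(width, 2))

def train(model, x, y, epochs=200, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model

def accuracy(model, x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

# A small labeled set (stand-in for human supervision), a larger set whose
# true labels are used only for the ceiling, and a held-out test set.
x_small, y_small = make_data(500)
x_large, y_large = make_data(5000)
x_test, y_test = make_data(2000)

weak = train(mlp(width=4), x_small, y_small)          # the "GPT-2" stand-in
weak_labels = weak(x_large).argmax(dim=1).detach()    # imperfect supervision

strong = train(mlp(width=256), x_large, weak_labels)  # the "GPT-4" stand-in
ceiling = train(mlp(width=256), x_large, y_large)     # what strong could do

print(f"weak supervisor accuracy: {accuracy(weak, x_test, y_test):.3f}")
print(f"weak-to-strong accuracy:  {accuracy(strong, x_test, y_test):.3f}")
print(f"strong ceiling accuracy:  {accuracy(ceiling, x_test, y_test):.3f}")
```

If the weak-to-strong accuracy lands closer to the ceiling than to the weak supervisor, that is the kind of generalization the team hopes to measure and strengthen.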
The weak-strong setup could also lead to breakthroughs in dealing with hallucinations, according to the team.
“Hallucinations are an interesting topic, because the model actually knows whether what it is saying is true or not. But the way these models are trained, humans reward them based on a simple ‘thumbs up’ or ‘thumbs down’ for their responses. This can inadvertently lead to rewards for false information, or for things the model may not even understand. If our research is successful, we may be able to develop techniques that can summon the model’s knowledge and determine whether something is fact or fiction – ultimately reducing hallucinations.” Aschenbrenner said.
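Aschenbrenner doesn’t spell out how that “summoning” would work, and the sketch below is not OpenAI’s method. It is a generic, hypothetical illustration of one way researchers probe what a model internally represents: fitting a small linear “truth probe” on hidden activations to separate true from false statements. The activations here are synthetic, and every name, dimension, and hyperparameter is made up for the example.

```python
# Hypothetical sketch: reading off whether a model "believes" a statement is
# true by fitting a linear probe on its hidden activations. Synthetic data
# stands in for real model activations; this is not OpenAI's method.
import torch
import torch.nn as nn

torch.manual_seed(0)

DIM = 64                      # pretend hidden-state width
N_TRAIN, N_TEST = 2000, 500   # number of labeled statements

def fake_activations(n):
    # Assume truthfulness is (noisily) encoded along one direction
    # of the hidden state, which is what the probe tries to find.
    truth_direction = torch.ones(DIM) / DIM ** 0.5
    is_true = torch.randint(0, 2, (n,))
    acts = torch.randn(n, DIM) + (2.0 * is_true - 1.0).unsqueeze(1) * truth_direction * 2.0
    return acts, is_true

probe = nn.Linear(DIM, 1)     # a single linear layer: the "truth probe"
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()

x_train, y_train = fake_activations(N_TRAIN)
x_test, y_test = fake_activations(N_TEST)

for _ in range(300):
    opt.zero_grad()
    loss_fn(probe(x_train).squeeze(1), y_train.float()).backward()
    opt.step()

preds = (probe(x_test).squeeze(1) > 0).long()
print(f"probe accuracy on held-out statements: {(preds == y_test).float().mean().item():.3f}")
```

In a real setting the activations would come from a language model’s hidden states over statements labeled as fact or fiction; the toy version only shows the mechanics of fitting and evaluating such a probe.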
However, this analogy is not perfect. That’s why OpenAI is calling for ideas from the community.
To facilitate this, the company is launching a $10 million grant program to support research on superalignment. Portions of the funding will be reserved for academic institutions, non-profit organizations, individual researchers, and graduate students. OpenAI also plans to host an academic conference on superalignment in 2025, where it will share and promote the grant recipients’ work.
Interestingly, some of the funding for the grant program will come from former Google CEO and chairman Eric Schmidt. Schmidt, an ardent Altman supporter, has become a prominent voice for AI doomerism, arguing that dangerous AI systems are close at hand and that regulators are not doing enough to prepare. However, some speculate that Schmidt’s motivations may not be entirely altruistic: he reportedly stands to benefit significantly if the US government implements his proposed plan to boost AI research.
“AI and other emerging technologies are transforming our society and economy. Ensuring they align with our human values is crucial, and I am proud to support OpenAI’s grant program in promoting responsible research for the benefit of the public.” – Eric Schmidt
As a result, some may see his involvement in the grant program as virtue signaling. However, OpenAI says that all of its superalignment research, including code, will be shared publicly, and that this applies to the work of grant recipients as well.
“Contributing not just to the safety of our models, but to the safety of all advanced AI is part of our mission. It is at the core of our goal to build AI that benefits all of humanity safely. We believe that conducting this research is crucial to achieving a positive and safe future with AI.” Aschenbrenner stated.
The involvement of a figure with clear commercial motivations raises questions about whether OpenAI’s research will truly be made available for public use. However, the Superalignment team remains committed to its mission of promoting the safe development of AI for the benefit of all.