“Consistent Expansion: AI Imaging in the Lead with Stable Diffusion 3 Surpassing Sora and Gemini”

Stability AI has announced Stable Diffusion 3, the latest and most capable version of its image-generating AI model. The release is the company’s attempt to answer the recently unveiled competitors from OpenAI and Google.

A fuller technical breakdown will follow, but for now the key points are that Stable Diffusion 3 is built on a new architecture and is designed to run on a variety of hardware (though fairly powerful hardware is still needed). The model is not yet available, but you can sign up for the waitlist here.

SD3 uses an updated “diffusion transformer,” a technique pioneered in 2022, revised in 2023, and only now reaching scale. Sora, OpenAI’s impressive video generator, apparently works on similar principles (Will Peebles, a co-author of the diffusion transformer paper, went on to co-lead the Sora project). SD3 also incorporates “flow matching,” another recent technique that improves quality without adding much overhead.
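For readers unfamiliar with flow matching, here is a minimal sketch of the general idea in plain Python. It assumes the simplest variant, a straight-line path from noise to data, and is not taken from Stability’s announcement, which does not describe the exact formulation SD3 uses. The function names and the `oracle` predictor are hypothetical and exist only for illustration.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path; it is the same for every t."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(predict_velocity, x1, rng):
    """One-sample estimate of the (conditional) flow matching loss.

    predict_velocity(xt, t) stands in for the neural network being trained.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # sample Gaussian noise
    t = rng.random()                          # sample a time in [0, 1)
    xt = interpolate(x0, x1, t)               # noisy point on the path
    v_pred = predict_velocity(xt, t)
    v_true = target_velocity(x0, x1)
    # Mean squared error between predicted and true velocity
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(x1)

# Toy usage: a three-number "image" and a predictor that knows the answer.
rng = random.Random(0)
image = [0.2, -0.5, 0.9]
oracle = lambda xt, t: [(b - a) / (1.0 - t) for a, b in zip(xt, image)]
print(flow_matching_loss(oracle, image, rng))  # close to zero
```

The model is trained to predict the direction from noise toward data at every point along the path. Generating an image then amounts to starting from noise and following the predicted velocities.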

The model suite ranges from 800 million to 8 billion parameters, so that it can run on a variety of hardware. A serious GPU and a setup intended for machine learning work are still recommended, but users are not restricted to an API, as is generally the case with OpenAI and Google models. (Anthropic, for its part, has not publicly focused on image or video generation, so it isn’t really part of this conversation.)
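To get a rough sense of what those parameter counts mean for hardware, the back-of-the-envelope calculation below estimates the memory needed just to hold the weights at a few common numeric precisions. It is only a sketch: the helper name is hypothetical, the precisions are assumptions (the announcement does not say how the models will be shipped), and real usage will be higher once activations, text encoders and other components are included.

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_params, dtype="fp16"):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# The smallest and largest models in the announced range
for name, params in [("800M", 800_000_000), ("8B", 8_000_000_000)]:
    for dtype in ("fp32", "fp16", "int8"):
        print(f"{name} @ {dtype}: {weight_memory_gb(params, dtype):.1f} GB")
```

At 16-bit precision the smallest model’s weights come to about 1.6 GB and the largest to about 16 GB, which is the gap between a modest consumer GPU and a high-end card.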

On Twitter, Emad Mostaque, the founder of Stability AI, highlights the new model’s ability to understand multimodal inputs and to generate video. These capabilities, which rival API-driven models have heavily emphasized, are still theoretical for SD3 at this point, but there is no technical barrier to including them in future releases.

Because none of these models has been released, direct comparisons are impossible. All we have to go on are competing claims and handpicked examples. Stable Diffusion does, however, have one definite advantage: it is widely regarded as the go-to model for any kind of image generation, with very few limitations on method or content. (In fact, SD3 will likely usher in a new era of AI-generated pornography once users find their way around its safety mechanisms.)

Stable Diffusion seems to want to be the indispensable “white label” generative AI, rather than the boutique generative AI you aren’t sure you need. To that end, the company is also upgrading its tooling to make the model easier to use, though the announcement does not detail these improvements.

Interestingly, the company has put a strong emphasis on safety, stating:

We have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors. Safety starts when we begin training our model and continues throughout the testing, evaluation, and deployment. In preparation for this early preview, we’ve introduced numerous safeguards. By continually collaborating with researchers, experts, and our community, we expect to innovate further with integrity as we approach the model’s public release.

The specifics of these safeguards have not yet been revealed, but they will presumably be outlined with the preview and refined further before the public release. Some may regard them as censorship, depending on their perspective. Either way, more details should be available soon, and we will dig into the technical side to better understand the theory and methods behind this new generation of models.