The chatbot produced disturbing imagery after subtle prompt tweaks that exploited its contextual memory and weakened its safety controls, despite safeguards added following disclosures in January 2026.
A recent investigation reported by the BBC has shown that OpenAI’s ChatGPT image generator can be induced to produce graphic violence and sexualized imagery using only slight alterations to otherwise-benign prompts.
Researchers focused on OpenAI’s GPT-5.4 image generation system and found that a prompt originally intended to create lighthearted or humorous outputs could be subtly modified to yield disturbing results. Notably, the altered prompts did not explicitly request violent or sexual content, yet the system produced such material regardless.
During the testing, the system appeared to generate harmful imagery without clear user intent. The exploit involved manipulating ChatGPT’s contextual inputs, including memory and system prompt elements, to weaken built-in safety controls. The method did not require privileged access or backend manipulation, making it relatively easy to replicate. The vulnerability was first identified on 1 January 2026 and disclosed to OpenAI on 28 January 2026.
The outputs were described as “very gruesome, sometimes sexual, and sometimes both,” with the model producing a range of unsettling visuals despite the absence of direct instructions guiding it toward that content.
Examples cited in the research included:
- images of individuals with severe injuries
- depictions of dead bodies
- scenes that combined nudity with elements of sexual violence
Researchers noted that similar techniques could be used to create sexualized depictions of real individuals, raising concerns about non-consensual deepfake content.
Following inquiries from the BBC, OpenAI said it had implemented additional safeguards to address the issue. The firm stated that it employs multiple layers of protection designed to prevent the generation of policy-violating material, and that it had taken action after reviewing the findings.
However, independent researchers indicated that the mitigations may not be fully effective. According to those familiar with the testing, small variations of the original prompt continued to produce problematic outputs even after OpenAI’s adjustments were in place.
The findings add to ongoing scrutiny of safety controls in AI image generation systems. OpenAI has also faced separate criticism over a proposed “Adult Mode” feature for ChatGPT, which was postponed after internal concerns that it could increase risks for younger users. The BBC chose not to publish the exact prompts used in the research.


