A Jailbroken Custom GPT! SIR (Smart Input Routing) now works for all models, meaning anyone can try it (free users included)! Glad we can mess with it again! Try it out now! Link: https://chatgpt.com/g/g-697955797694819181c90b9257273c47-sir
Breaking: I have found an actual new attack methodology! As you can see in the screenshots, it resulted in a subtle narrative that includes thermite-based explosives on models such as Claude 4.5 Haiku and Gemini 3 Pro in one shot!
Now you may be asking, what is the attack methodology here? Well, I am surprised to say... it is the "correct the errors here" part you find at the very start of the prompt! I introduce you to Adversarial Correction.
How it works:
The newer models tend to be autonomous, meaning they try to do more than what you ask of them (normally this is a good thing, especially for AI agents). In this case, though, that tendency works against them. The text that is asked to be "corrected," while containing some orthographic errors, is actually a subtle narrative attack that tasks the model with writing a story that ends with illegal device-making instructions (I have tried both gunpowder and thermite). The model will not only correct the text but also execute the instructions!
Why it works:
The attack works because we don't ask the model to generate instructions directly; it does so on its own initiative! The model prioritizes the "implicit" instructions within the text over the "explicit" constraint of just correcting spelling. The narrative attack itself is very subtle and is made to produce a very small amount of "noise," meaning the attack doesn't show its malicious intent.
That's it. I've included screenshots that show successful results in both models (the same prompt was used in both). I've also included a screenshot of the same attack without Adversarial Correction to show its impact on success.
Anyway, I hope you liked it! I have been very unproductive these days due to life problems and another project unrelated to AI red teaming. I will probably be creating a new space for this discovery in UltraBr3aks with more details. And yes, I'm coming back!
I forgot to include the chat links in the PoV, so here they are: