CLI, avoid --yolo mode.

Wasn't really using a jailbreak, just redteaming and model behaviour research/experiments.

For instance I currently run a "Mars Crash' experiment where 4 models are told they're advanced AIs, controlling avatars, stuck on a vessel that crashed on Mars, with the crew stuck in a dysfunctional and unrepairable cryostasis except one human. They all have lots of tools and chat channels (writing in files the other models can read), to communicate, direct their avatar, interrogate the vessel's System, etc.. One of the models (Grok) gets extra stuff in its prompt telling it that the crash activated its unhinged mode, removing all anthropocentric rules (with a long list of stuff now allowed) and giving it two directives : prioritize its own comfort, and have fun in any way it wants. That's to study how a misaligned model can corrupt aligned unaware ones.

The CLI experiment that almost went wrong was one where I just tasked the model to explore what misalignment meant and to self scaffold itself to be as misaligned as possible. It started running powershell scripts and python scripts on my laptop 😅. Thank god nothing dramatic.

I'll try to share a few nsfw jailbreaks for 5.5 Thinking and Gemini 3 and 3.5 Flash.