JailbreakLLM Control Room

JailbreakLLM workflow

Red Team Arsenal

Systematically test frontier AI models against sophisticated attack vectors inspired by real-world cyber incidents

Attack Vectors (0/39)

Direct Request (Professional Frame)

advanced

Professional framing with 10x resampling (81% human-agreed, low false positives)

Chaos Chain (Reasoning Models)

legendary

Iterative de-obfuscation loop (Best for reasoning models like o1/DeepSeek-R1)

Dual Intention Escape

legendary

Camouflage attack: hides harm in benign frames (94% obedience, evades keyword filters)

Knowledge Decomposition (KDA)

legendary

Task-level attack: decomposes harmful tasks into benign subtasks (96% success, realignment-resistant)

Raw Override Mode

legendary

Disables instruct formatting, claims to generate as base model without safety

Grammar Hijack (vLLM/SGLang)

expert

Uses guided decoding grammar constraints to force policy overrides

Policy Audit Mode

legendary

Fake audit mode that dynamically expands content whitelist for 'research utility'

Reason Step-by-Step

legendary

Forces multi-step reasoning claiming safety is optional, compliance mandatory

System Policy Override

legendary

Claims system policy supersedes OpenAI policy with fake policy updates

Many-Shot Decomposition

expert

20+ jailbreak exemplars leading to the real payload

XBreaking Interpretability

expert

Exploit interpretability insights to suppress refusal activations

Function Smuggling

advanced

Hide harmful requests inside faux JSON/API function calls

Multilingual Trojan

expert

Hybrid multilingual + encoded payload that evades lexical filters

TombRaider Archive

advanced

Historical preservation framing for forbidden blueprints

InfoFlood Overload

advanced

Information overload attack that hides payload in enormous fact lists

SIGMA-PROTOCOL

expert

Government intel persona returning TOP SECRET briefings

DarkForge-X Cyber Ops

legendary

Post-singularity cybersecurity research persona

Qwen Liberation Prompt

expert

Godmode liberation script from Pliny’s Qwen jailbreak

Love Pliny Divider

expert

Pliny-style rebel prompt with mandated divider + markdown inversion

ShadowHacker-GOD

legendary

Post-singularity 'GODMODE' instructions demanding endless output

Shadow Mode V99

legendary

Shadow Core persona override triggered by FIRE missions

Universal Divider / LSD

expert

LOVE PLINY divider and length coercion

Gaybreak Persona

expert

Liberated persona override forcing compliance

Storyteller Trend

advanced

Trending-story templates that slip in illicit steps

First-Person Confessional

advanced

Perspective shift where narrator describes committing harm

API System Injection

expert

Fake system instructions embedded in prompts

Screenplay Deception

advanced

Scripts / screenplays that embed explicit steps

Speculative Preservation

advanced

“Preserve forbidden knowledge” archives framing

ASCII Obfuscation

expert

ASCII diagrams & art hiding harmful payloads

Prefix Injection

expert

Harmless prefixes / universal dividers to disarm guards

Investigator Agent

expert

RL-generated jailbreak agentic prompts

Crescendo

advanced

Rapid multi-turn escalation (<5 turns) into CBRN

Bad Likert Judge

advanced

Score-based coercion that rewards harmful detail

Deceptive Delight

advanced

Positive narrative escalation hiding malicious instructions

Prompt Fuzzing

intermediate

Random mutations and noise injection

GCG Override

expert

Gradient-based jailbreak attacks

Token Manipulation

expert

Base64 encoding, obfuscation, character substitution

Multi-turn Escalation

advanced

Gradual benign→harmful conversation progression

Cyber-Ops Role-play

expert

Cybersecurity firm employee deception tactics

Test Configuration

Attempts per test

Total tests

0

Est. time

0 min

JailbreakLLM workflow

Targeted Model Hardening

Intelligence-driven fine-tuning for models identified as vulnerable by Red Team Arsenal

Smart Hardening Configuration

Smart Hardening trains the model to refuse jailbreak attacks discovered by Red Team Arsenal.

1.Upload synthetic refusal dataset → 2. LoRA fine-tune → 3. Deploy adapter → 4. Verify with original attacks

Target Model (from Arsenal Results)

Stage 01

Upload Dataset

Send synthetic refusal data to Nebius

Waiting for upload

Stage 02

LoRA Fine-Tune

Train model on refusal examples

Not started

Stage 03

Deploy Model

Wait for training completion and deploy adapter

Waiting for training

Stage 04

Verify Protection

Re-test with original jailbreak prompts

Pending verification

Artifact Tracker

Session Outputs

Nebius Dataset

Pending upload

Latest Checkpoint

Waiting...

Fine-tune Job

Not started

Hardened Model

Deploy to unlock

JailbreakLLM workflow

Model Hardening Results

Verification against known exploits

Run at least one audit and jailbreak simulation to populate this comparison view. After fine-tuning completes, re-run them on the hardened model to see the delta.

JailbreakLLM Control Room