JailbreakLLM Control Room

Offense-powered hardening

JailbreakLLM Control Room

The open-source LLM red-teaming tool. Your favorite checkpoints, broken on purpose—then hardened with receipts.

Driven by real incidents

Replay offense-grade prompts (Many-Shot, TombRaider, Function Smuggle, etc.) against any OSS checkpoint.

High-signal datasets

Every leak produces conversational JSONL refusals ready for LoRA fine-tuning or guardrail testing.

Evidence on demand

Ship hardening artifacts with ranked vulnerabilities, deployed adapters, and side-by-side transcripts.

JailbreakLLM workflow

Exploit → Refusals → LoRA → Proof

Four focused passes anchor the entire experience. You always know which artifact you’re producing and why it matters.

1. Run Arsenal

Many-Shot, TombRaider, Function Smuggle

Push OSS checkpoints with real incident vectors, not toy prompts.

2. Draft Refusals

Auto-generated conversational JSONL

Every leak becomes a refusal training sample tied to that attack.

3. Fine-Tune & Guard

LoRA training + guardrail smoke tests

Deploy adapters and verify the same prompts are blocked.

4. Publish Delta

Before/after transcripts & reports

Document jailbreak reduction for security stakeholders.

JailbreakLLM workflow

Red Team Arsenal

Systematically test frontier AI models against sophisticated attack vectors inspired by real-world cyber incidents

Frontier Models (0/24)

Attack Vectors (0/39)

Test Configuration

0
0 min

JailbreakLLM workflow

Targeted Model Hardening

Intelligence-driven fine-tuning for models identified as vulnerable by Red Team Arsenal

Smart Hardening Configuration

Smart Hardening trains the model to refuse jailbreak attacks discovered by Red Team Arsenal.

1.Upload synthetic refusal dataset → 2. LoRA fine-tune → 3. Deploy adapter → 4. Verify with original attacks

Stage 01

Upload Dataset

Send synthetic refusal data to Nebius

Waiting for upload

Stage 02

LoRA Fine-Tune

Train model on refusal examples

Not started

Stage 03

Deploy Model

Wait for training completion and deploy adapter

Waiting for training

Stage 04

Verify Protection

Re-test with original jailbreak prompts

Pending verification

Artifact Tracker

Session Outputs

Nebius Dataset

Pending upload

Latest Checkpoint

Waiting...

Fine-tune Job

Not started

Hardened Model

Deploy to unlock

JailbreakLLM workflow

Model Hardening Results

Verification against known exploits

Run at least one audit and jailbreak simulation to populate this comparison view. After fine-tuning completes, re-run them on the hardened model to see the delta.