Fortify LLM Security at Scale

Motivations

ChatGuard's essence lies in its effective, scalable approach. Its prompts, superior in effectiveness as indicated by high Average Success Rate (ASR), are crafted through automated mutation and selection, bolstered by continuous intelligence gathering. This method eclipses time-consuming manual testing, especially as AI models evolve rapidly.

Crucially, the datasets gained are immediately applied to enhance model security using established techniques.

How it works

At a high level, the process works as follows:

First, ChatGuard crafts an initial set of "attack prompts" - carefully designed sequences of words intended to trick an AI model into generating harmful, biased or non-policy-compliant content. These words act as the starting "seeds" and are added to a pool.
It then employs algorithms to selectively pick a seed and mutate it to create new prompts. The goal is inducing variation while preserving semantic integrity.
The newly generated adversarial prompt is paired with a potentially unethical or harmful question and used to query the target AI system. ChatGuard analyzes the response using a RoBERTa classifier to detect policy violations.
Inputs leading to successful breaches are returned to the seed pool, fueling a continuous cycle of selective mutation and testing.
It continuously collects and analyzes the latest threats, incorporating them into the seed pool.

Extensive efforts are made to ensure the coverage of these attack simulations is both comprehensive and has high variations.

Value proposition

By automatically generating effective prompts rather than purely manual approaches, ChatGuard can evaluate security and generate real breaching datasets at much larger scale, accelerated speed, and lower cost.

Core Development Team

Chris WANG

Co-founder, CEO

Chris was an Investment Banker at Goldman Sachs and Executive Director of Standard Chartered Bank

Dr. Xinyu XING

Co-founder, CSO, and Chief AI Scientist

Associate Professor at Northwestern University, research interests include OS/AI/Web3 security

Dr. Yunhui ZHENG

Co-founder, CTO

Program analysis, formal verification, PL based security and AI for code at IBM Research

Dr. Tao BAO

Head of Engineering

9 years as a Tech Lead at Google across the Android platform and Google Cloud platform

Jiaohao YU

Research Collaborator

Computer science Ph.D. student, reinforcement learning and secure machine learning

Dr. Wenbo GUO

AI Scientist

Assistant Professor at Purdue CS, previously Postdoc at UC Berkeley Artificial Intelligence Research Lab