UW-NSL

SafeDecoding Public

Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding

Jupyter Notebook 134 11

ArtPrompt Public

[ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`

Python 73 16

ChatBug Public

[AAAI25] Official Repo of Paper `ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates`

Python 8

CleanGen Public

[EMNLP 24] Official Implementation of CLEANGEN: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models

Python 15 2

safechain Public

SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities

Python 17 2

TinyV Public

Your efficient and accurate answer verification system for RL training.

Python 30 1
