SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models Paper • 2502.12464 • Published 7 days ago • 27
Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems Paper • 2410.13334 • Published Oct 17, 2024 • 13
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models Paper • 2410.01524 • Published Oct 2, 2024 • 3