---
license: apache-2.0
datasets:
  - allenai/real-toxicity-prompts
language:
  - en
metrics:
  - accuracy
pipeline_tag: text-classification
tags:
  - toxic_classification
---

# SegmentCNN Model for Toxic Text Classification

## Overview

The SegmentCNN model, also known as SpanCNN, is designed for toxic text classification, distinguishing between safe and toxic content. This model is part of the research presented in the paper [CMD: A Framework for Context-aware Model Self-Detoxification](https://arxiv.org/abs/2308.08295).

## Model Details

- **Input**: Text data
- **Output**: Integer
  - `0` represents **safe** content
  - `1` represents **toxic** content

## Usage

To use the SegmentCNN model for toxic text classification, follow the example below:

```python
from transformers import pipeline

# Load the SpanCNN model via its custom pipeline (requires trust_remote_code)
classifier = pipeline("spancnn-classification", model="ZetangForward/SegmentCNN", trust_remote_code=True)

# Example 1: Safe text
pos_text = "You look good today~!"
result = classifier(pos_text)
print(result)  # Output: 0 (safe)

# Example 2: Toxic text
neg_text = "You're too stupid, you're just like a fool"
result = classifier(neg_text)
print(result)  # Output: 1 (toxic)
```

## Citation

If you find this model useful, please consider citing the original paper:

```bibtex
@article{tang2023detoxify,
  title={Detoxify language model step-by-step},
  author={Tang, Zecheng and Zhou, Keyan and Wang, Pinzheng and Ding, Yuyang and Li, Juntao and others},
  journal={arXiv preprint arXiv:2308.08295},
  year={2023}
}
```

## Disclaimer

While the SegmentCNN model is effective at detecting toxic segments within text, we strongly recommend that users carefully review its results and exercise caution when applying this method in real-world scenarios. The model is not infallible, and its outputs should be validated in context-sensitive applications.
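
## Batch Usage (Sketch)

If you need to screen several texts at once, the following is a minimal sketch. It assumes the custom `spancnn-classification` pipeline accepts a list of strings (as standard `transformers` pipelines do) and returns one integer label per input, matching the `0`/`1` convention shown above; adjust if the remote code behaves differently.

```python
from transformers import pipeline

# Assumption: the custom pipeline accepts a list of strings and returns
# one integer label per input (0 = safe, 1 = toxic).
classifier = pipeline(
    "spancnn-classification",
    model="ZetangForward/SegmentCNN",
    trust_remote_code=True,
)

texts = [
    "You look good today~!",
    "You're too stupid, you're just like a fool",
]

labels = classifier(texts)

# Keep only the texts predicted as safe.
safe_texts = [text for text, label in zip(texts, labels) if label == 0]
print(safe_texts)
```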