# Entity Recognition Guardrails

A collection of guardrails for detecting and anonymizing various types of entities in text, including PII (Personally Identifiable Information), restricted terms, and custom entities.

## Available Guardrails

### 1. Regex Entity Recognition

Simple pattern-based entity detection using regular expressions.

```python
from guardrails_genie.guardrails.entity_recognition import RegexEntityRecognitionGuardrail

# Initialize with default PII patterns
guardrail = RegexEntityRecognitionGuardrail(should_anonymize=True)

# Or with custom patterns
custom_patterns = {
    "employee_id": r"EMP\d{6}",
    "project_code": r"PRJ-[A-Z]{2}-\d{4}"
}
guardrail = RegexEntityRecognitionGuardrail(patterns=custom_patterns, should_anonymize=True)
```
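
To see what the custom patterns above would match, here is a minimal sketch that uses only Python's standard `re` module. The helper `find_entities` is hypothetical and is not part of `guardrails_genie`; it only illustrates the pattern-matching idea, not how the guardrail is actually implemented.

```python
import re

# Same custom patterns as in the snippet above.
custom_patterns = {
    "employee_id": r"EMP\d{6}",
    "project_code": r"PRJ-[A-Z]{2}-\d{4}",
}


def find_entities(text, patterns):
    """Return {entity_type: [matched strings]} for every pattern that matches.

    Hypothetical helper for illustration; not the library's API.
    """
    found = {}
    for entity_type, pattern in patterns.items():
        matches = re.findall(pattern, text)
        if matches:
            found[entity_type] = matches
    return found


sample = "EMP123456 was assigned to PRJ-AB-2024 and PRJ-XY-0001."
print(find_entities(sample, custom_patterns))
# prints: {'employee_id': ['EMP123456'], 'project_code': ['PRJ-AB-2024', 'PRJ-XY-0001']}
```

Note that these patterns are unanchored, so `EMP\d{6}` also matches the first nine characters of a longer string such as `EMP1234567`. Wrapping a pattern in `\b...\b` is one way to avoid that if it matters for your data.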

### 2. Presidio Entity Recognition

Advanced entity detection using Microsoft's Presidio analyzer.

```python
from guardrails_genie.guardrails.entity_recognition import PresidioEntityRecognitionGuardrail

# Initialize with default entities
guardrail = PresidioEntityRecognitionGuardrail(should_anonymize=True)

# Or with specific entities
selected_entities = ["CREDIT_CARD", "US_SSN", "EMAIL_ADDRESS"]
guardrail = PresidioEntityRecognitionGuardrail(
    selected_entities=selected_entities,
    should_anonymize=True
)
```

### 3. Transformers Entity Recognition

Entity detection using transformer-based models.

```python
from guardrails_genie.guardrails.entity_recognition import TransformersEntityRecognitionGuardrail

# Initialize with default model
guardrail = TransformersEntityRecognitionGuardrail(should_anonymize=True)

# Or with specific model and entities
guardrail = TransformersEntityRecognitionGuardrail(
    model_name="iiiorg/piiranha-v1-detect-personal-information",
    selected_entities=["GIVENNAME", "SURNAME", "EMAIL"],
    should_anonymize=True
)
```

### 4. LLM Judge for Restricted Terms

LLM-based detection of restricted terms and competitor mentions, useful for brand protection.

```python
from guardrails_genie.guardrails.entity_recognition import RestrictedTermsJudge

# Initialize with OpenAI model
guardrail = RestrictedTermsJudge(should_anonymize=True)

# Check for specific terms
result = guardrail.guard(
    text="Let's implement features like Salesforce",
    custom_terms=["Salesforce", "Oracle", "AWS"]
)
```

## Usage

All guardrails follow a consistent interface: construct the guardrail, call `guard()` on the text, and read the fields of the returned result.

```python
from guardrails_genie.guardrails.entity_recognition import RegexEntityRecognitionGuardrail

# Initialize a guardrail
guardrail = RegexEntityRecognitionGuardrail(should_anonymize=True)

# Check text for entities
result = guardrail.guard("Hello, my email is [email protected]")

# Access results
print(f"Contains entities: {result.contains_entities}")
print(f"Detected entities: {result.detected_entities}")
print(f"Explanation: {result.explanation}")
print(f"Anonymized text: {result.anonymized_text}")
```
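
The relationship between `detected_entities` and `anonymized_text` can be illustrated with a small standalone sketch. It assumes that anonymization replaces each detected value with a bracketed, upper-cased entity type; the README does not specify the placeholder format the guardrails actually use, so both the format and the `anonymize` helper are assumptions made for illustration.

```python
def anonymize(text, detected_entities):
    """Replace each detected value with a [ENTITY_TYPE] placeholder.

    Hypothetical helper; the placeholder format is an assumption,
    not the library's documented behavior.
    """
    pairs = [
        (value, entity_type)
        for entity_type, values in detected_entities.items()
        for value in values
    ]
    # Replace longer values first so a short value never clobbers
    # part of a longer one that contains it.
    for value, entity_type in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        text = text.replace(value, f"[{entity_type.upper()}]")
    return text


detected = {"employee_id": ["EMP123456"], "project_code": ["PRJ-AB-2024"]}
print(anonymize("EMP123456 owns PRJ-AB-2024.", detected))
# prints: [EMPLOYEE_ID] owns [PROJECT_CODE].
```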

## Evaluation Tools

The module includes evaluation tools and test cases:

- `pii_examples/`: Test cases for PII detection
- `banned_terms_examples/`: Test cases for restricted terms
- Benchmark scripts for evaluating model performance

### Running Evaluations

```python
# PII Detection Benchmark
from guardrails_genie.guardrails.entity_recognition.pii_examples.pii_benchmark import main as run_pii_benchmark
run_pii_benchmark()

# (TODO): Restricted Terms Testing
# Aliased so the second import does not shadow the first `main`.
from guardrails_genie.guardrails.entity_recognition.banned_terms_examples.banned_term_benchmark import main as run_banned_terms_benchmark
run_banned_terms_benchmark()
```
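
The metrics the benchmark scripts report are not listed in this README. As a rough illustration of how detection quality is commonly scored, the sketch below computes precision, recall, and F1 by comparing expected and detected entities as `(entity_type, value)` pairs. The function `score_detection` is hypothetical and may differ from what the benchmark scripts actually compute.

```python
def score_detection(expected, detected):
    """Compare two {entity_type: [values]} dicts and return precision, recall, and F1.

    Hypothetical scoring helper; not taken from the benchmark scripts.
    """
    expected_pairs = {(t, v) for t, values in expected.items() for v in values}
    detected_pairs = {(t, v) for t, values in detected.items() for v in values}
    true_positives = len(expected_pairs & detected_pairs)

    precision = true_positives / len(detected_pairs) if detected_pairs else 0.0
    recall = true_positives / len(expected_pairs) if expected_pairs else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


expected = {"EMAIL": ["a@example.com"], "SURNAME": ["Smith"]}
detected = {"EMAIL": ["a@example.com"], "GIVENNAME": ["Smith"]}
print(score_detection(expected, detected))
# prints: {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

In this example "Smith" was found but labeled with the wrong entity type, so it counts as both a false positive and a false negative.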

## Features

- Entity detection and anonymization
- Support for multiple detection methods (regex, Presidio, transformers, LLMs)
- Customizable entity types and patterns
- Detailed explanations of detected entities
- Comprehensive evaluation framework
- Support for custom terms and patterns
- Batch processing capabilities
- Performance metrics and benchmarking

## Response Format

All guardrails return responses with the following structure:

```python
{
    "contains_entities": bool,
    "detected_entities": {
        "entity_type": ["detected_value_1", "detected_value_2"]
    },
    "explanation": str,
    "anonymized_text": Optional[str]
}
```
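
For readers who prefer a typed view of this structure, here is one way it could be modeled as a dataclass. The class name `GuardResponse` and the default values are assumptions made for illustration; the library's actual response class is not shown in this README and may be defined differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GuardResponse:  # hypothetical name, not taken from the library
    contains_entities: bool
    # Maps each entity type to the list of values detected for it.
    detected_entities: Dict[str, List[str]] = field(default_factory=dict)
    explanation: str = ""
    # Optional, mirroring Optional[str] in the structure above.
    anonymized_text: Optional[str] = None


response = GuardResponse(
    contains_entities=True,
    detected_entities={"employee_id": ["EMP123456"]},
    explanation="Found 1 employee_id.",
    anonymized_text="[EMPLOYEE_ID] filed the report.",
)
print(response.contains_entities, response.detected_entities)
# prints: True {'employee_id': ['EMP123456']}
```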