Customization & Rules

PII Eraser works out of the box for general use cases, but regulated industries often need additional tuning to handle domain-specific terminology.

Allow Lists (Whitelisting)

The model can be too aggressive and redact text that is not PII. For example, it might mistake a project code name for a person's name, or a generic business term for an organization.

To keep such terms from being redacted, add them to the allow_list in config.yaml or supply them per request. Matching is case-insensitive.

# config.yaml
allow_list:
  - "Amazon"  # Don't redact Amazon
  - "Python"
  - "Gemini"
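
The effect of the allow list can be illustrated with a minimal Python sketch. This is not PII Eraser's implementation: the Entity class and the apply_allow_list function are hypothetical names, and comparing the whole detected span against each term is an assumption. Only the case-insensitive comparison comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A detected span. Hypothetical shape, not PII Eraser's real type."""
    text: str
    label: str
    score: float


def apply_allow_list(entities, allow_list):
    """Drop detections whose text equals an allow-listed term, ignoring case."""
    # casefold() gives a case-insensitive comparison key for each term.
    allowed = {term.casefold() for term in allow_list}
    return [e for e in entities if e.text.casefold() not in allowed]


detections = [
    Entity("amazon", "ORGANIZATION", 0.91),
    Entity("Jane Doe", "PERSON", 0.97),
    Entity("PYTHON", "ORGANIZATION", 0.62),
]

kept = apply_allow_list(detections, ["Amazon", "Python", "Gemini"])
print([e.text for e in kept])  # → ['Jane Doe']
```

In this sketch "amazon" and "PYTHON" are dropped despite the differing case, so only the genuine name would go on to be redacted.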

Block Lists (Custom Entities)

You can force specific terms to be detected as a given entity type by listing them under that type in the block_list. This is useful for internal project names, SKU codes, or competitor names.

# config.yaml
block_list:
  PROJECT_CODENAME:
    - "Project Apollo"
    - "Operation Titan"
  COMPETITOR:
    - "Acme Corp"
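
As a rough illustration of what the block list configuration expresses, the sketch below scans a string for each listed term and tags every hit with the entity type it is filed under. The function name find_block_list_matches is hypothetical, and both the case-insensitive matching and the word-boundary check are assumptions; the documentation above does not say how block list terms are matched.

```python
import re


def find_block_list_matches(text, block_list):
    """Return (start, end, label, matched_text) for each block-listed term found."""
    matches = []
    for label, terms in block_list.items():
        for term in terms:
            # re.escape keeps punctuation in a term from being read as regex syntax.
            # Assumption: terms match on word boundaries, ignoring case.
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for m in pattern.finditer(text):
                matches.append((m.start(), m.end(), label, m.group(0)))
    return sorted(matches)


# Mirrors the block_list section of config.yaml shown above.
block_list = {
    "PROJECT_CODENAME": ["Project Apollo", "Operation Titan"],
    "COMPETITOR": ["Acme Corp"],
}

text = "Project Apollo ships before Acme Corp reacts."
found = find_block_list_matches(text, block_list)
for start, end, label, matched in found:
    print(f"{label}: {matched!r} at {start}-{end}")
```

The word-boundary check suits terms that start and end with letters or digits; terms that begin or end with punctuation, as some SKU codes do, would need a different boundary rule.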

Confidence Thresholds

Every detected entity carries a confidence score between 0.0 and 1.0, and the score_threshold setting determines which detections are acted on.

  • Lowering the score_threshold increases recall (catches more PII) but may increase false positives.
  • Raising the threshold reduces false positives but might miss obscure PII.

The default threshold is optimized for general use, but you can override it in the configuration or in the API request body.
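
The trade-off can be seen in a small sketch that filters the same detections at two thresholds. The filter_by_threshold function, the dictionary layout, and the scores are all made up for illustration, and treating a score equal to the threshold as a match is an assumption.

```python
def filter_by_threshold(detections, score_threshold):
    """Keep detections whose confidence is at or above the threshold."""
    return [d for d in detections if d["score"] >= score_threshold]


# Illustrative detections; the scores are invented, not real model output.
detections = [
    {"text": "Jane Doe", "label": "PERSON", "score": 0.97},
    {"text": "Apollo", "label": "PERSON", "score": 0.55},
    {"text": "J. D.", "label": "PERSON", "score": 0.42},
]

strict = filter_by_threshold(detections, 0.9)   # fewer hits, fewer false positives
lenient = filter_by_threshold(detections, 0.4)  # more hits, higher recall

print([d["text"] for d in strict])   # → ['Jane Doe']
print([d["text"] for d in lenient])  # → ['Jane Doe', 'Apollo', 'J. D.']
```

At the stricter setting the abbreviated name "J. D." is missed, while the lenient setting also flags "Apollo", which may be a project name and not a person.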
