
Text Analysis

The Text Analysis API is the core of PII Eraser. It processes raw strings in one of two ways: Detect, which reports where PII is located, or Transform, which anonymizes it.

Transformation Operators

When using the /text/transform endpoint, you must select an operator. This determines how the detected PII is modified.

| Operator | Description | Example Input | Example Output |
|---|---|---|---|
| redact | Replaces the entity with its entity type. | Call John | Call <NAME> |
| mask | Replaces characters with a symbol (default #). | ID: 123-456 | ID: ###-### |
| hash | Replaces the entity with a SHA-256 (or SHA-512) hash. | John | a591a6... |
| redact_constant | Replaces the entity with a static string. | Call John | Call <REDACTED> |
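To make the operator semantics concrete, here is a minimal local sketch in Python. It is not PII Eraser's implementation: `apply_operator` is a hypothetical name, it takes entities shaped like the /text/detect output (entity_type, start, end), and two details are assumptions. The mask branch masks only letters and digits and leaves separators such as - intact, which is what the table's ID: ###-### example suggests. The hash branch simply prepends the salt and uses SHA-256.

```python
import hashlib


def apply_operator(text, entities, operator, mask_char="#",
                   constant="<REDACTED>", salt=""):
    """Hypothetical local illustration of the transform operators.

    `entities` are dicts with "entity_type", "start" and "end" (end exclusive),
    the same shape as the /text/detect output.
    """
    # Apply replacements from the last entity to the first so that the
    # offsets of earlier entities stay valid when lengths change.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        span = text[ent["start"]:ent["end"]]
        if operator == "redact":
            new = f"<{ent['entity_type']}>"
        elif operator == "mask":
            # Assumption: letters/digits are masked, separators are kept.
            new = "".join(mask_char if ch.isalnum() else ch for ch in span)
        elif operator == "hash":
            # Assumption: salt is prepended; SHA-256 hex digest.
            new = hashlib.sha256((salt + span).encode("utf-8")).hexdigest()
        elif operator == "redact_constant":
            new = constant
        else:
            raise ValueError(f"unknown operator: {operator}")
        text = text[:ent["start"]] + new + text[ent["end"]:]
    return text


# Entity types and offsets below are hand-written for illustration.
print(apply_operator("Call John",
                     [{"entity_type": "NAME", "start": 5, "end": 9}], "redact"))
# Call <NAME>
print(apply_operator("ID: 123-456",
                     [{"entity_type": "ID_NUMBER", "start": 4, "end": 11}], "mask"))
# ID: ###-###
```

Replacing right to left is the key design choice: a redaction such as <NAME> is usually a different length from the value it replaces, so working backwards avoids recomputing offsets.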

Configuring Operators

Operators can be fine-tuned in your config.yaml. For example, you can change the masking character from # to *, or set the hash salt. See Startup Arguments for details.
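Why the hash salt matters can be shown with Python's standard hashlib. This is an illustrative sketch only: `salted_hash` is a hypothetical helper, and the assumption that the salt is prepended to the value is ours; PII Eraser's exact salting scheme may differ. The point it demonstrates holds regardless: hashing is deterministic, so the same value and salt always give the same digest (hashed values can still be counted or joined), while a different salt changes every digest, which stops an attacker from simply hashing a list of likely names and comparing.

```python
import hashlib


def salted_hash(value: str, salt: str = "", algorithm: str = "sha256") -> str:
    """Hypothetical helper: hash salt + value and return the hex digest."""
    h = hashlib.new(algorithm)  # "sha256" or "sha512"
    h.update((salt + value).encode("utf-8"))
    return h.hexdigest()


a = salted_hash("John", salt="s1")
b = salted_hash("John", salt="s1")
c = salted_hash("John", salt="s2")
print(a == b, a == c)  # True False
# SHA-256 hex digests are 64 characters; SHA-512 hex digests are 128.
print(len(salted_hash("John")), len(salted_hash("John", algorithm="sha512")))  # 64 128
```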

Detection Only

If you do not need to alter the text but only want to know which PII it contains (e.g., for analytics or flagging), use the /text/detect endpoint.

import requests

payload = {
    "text": ["My Tax ID is 99-999/9999"],  # List of input strings
    "entity_types": ["TAX_ID"],  # Optional: limit detection to specific types
}

r = requests.post("http://localhost:8000/text/detect", json=payload, timeout=10)
r.raise_for_status()  # Raise an exception on HTTP 4xx/5xx errors
print(r.json()["entities"])

Output (the "entities" field of the response):

[
  [
    {
      "entity_type": "TAX_ID",
      "start": 13,
      "end": 24,
      "score": 0.98
    }
  ]
]
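The start and end values are character offsets into the input string. The example is consistent with end being exclusive: 24 - 13 = 11, the length of 99-999/9999, which matches Python's slicing convention. The outer list appears to line up with the input list (one inner list per string in "text"). A minimal sketch for pairing each detection with the text it covers, using the output above as a hard-coded literal instead of a live call; `extract_spans` and its `min_score` filter are hypothetical, not part of the API:

```python
texts = ["My Tax ID is 99-999/9999"]
entities_per_text = [
    [{"entity_type": "TAX_ID", "start": 13, "end": 24, "score": 0.98}]
]


def extract_spans(texts, entities_per_text, min_score=0.0):
    """Hypothetical helper: return (entity_type, value, score) tuples."""
    spans = []
    # Assumes one inner list of entities per input string, in the same order.
    for text, entities in zip(texts, entities_per_text):
        for ent in entities:
            if ent["score"] >= min_score:
                # end is exclusive, exactly like a Python slice.
                spans.append((ent["entity_type"],
                              text[ent["start"]:ent["end"]],
                              ent["score"]))
    return spans


print(extract_spans(texts, entities_per_text))
# [('TAX_ID', '99-999/9999', 0.98)]
```

For flagging workflows, a score threshold such as min_score lets you trade recall for precision on the client side without changing the request.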