Text Analysis
The Text Analysis API is the core of PII Eraser. It processes raw strings and can either Detect where PII is located or Transform (anonymize) it.
Transformation Operators
When using the /text/transform endpoint, you must select an operator. This determines how the detected PII is modified.
| Operator | Description | Example Input | Example Output |
|---|---|---|---|
| redact | Replaces the entity with its entity type. | Call John | Call <NAME> |
| mask | Replaces characters with a symbol (default #). | ID: 123-456 | ID: ###-### |
| hash | Replaces the entity with a SHA-256 (or SHA-512) hash. | John | a591a6... |
| redact_constant | Replaces the entity with a static string. | Call John | Call <REDACTED> |
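To make the table concrete, the sketch below mimics the four operators locally on a single entity whose position is already known. It is an illustration only, not PII Eraser's implementation: the function `apply_operator`, its parameters, and the masking rule (letters and digits replaced, separators kept) are assumptions, and the hash shown is a plain unsalted SHA-256 of the entity text.

```python
import hashlib


def apply_operator(text, start, end, entity_type, operator,
                   mask_char="#", constant="<REDACTED>"):
    """Hypothetical local stand-in for the four operators (not the service's code)."""
    entity = text[start:end]
    if operator == "redact":
        # Replace the entity with its type, e.g. <NAME>.
        replacement = f"<{entity_type}>"
    elif operator == "mask":
        # Assumption: only letters and digits are masked, separators are kept,
        # which matches the table's "ID: ###-###" example.
        replacement = "".join(mask_char if c.isalnum() else c for c in entity)
    elif operator == "hash":
        # Unsalted SHA-256 hex digest of the entity text.
        replacement = hashlib.sha256(entity.encode("utf-8")).hexdigest()
    elif operator == "redact_constant":
        # Replace the entity with the same static string every time.
        replacement = constant
    else:
        raise ValueError(f"unknown operator: {operator}")
    return text[:start] + replacement + text[end:]


print(apply_operator("Call John", 5, 9, "NAME", "redact"))
print(apply_operator("ID: 123-456", 4, 11, "ID", "mask"))
print(apply_operator("Call John", 5, 9, "NAME", "redact_constant"))
```

In practice the service locates the entity spans itself; the explicit `start` and `end` offsets here only stand in for that detection step.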
Configuring Operators
Operators can be fine-tuned in your config.yaml. For example, you can change the masking character from # to *, or set the hash salt. See Startup Arguments for details.
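The effect of the two settings mentioned above can be sketched with the Python standard library. The actual config.yaml keys are documented under Startup Arguments and are not reproduced here, so this example uses plain function parameters instead. The names `mask_entity` and `hash_entity`, and the way the salt is combined with the value (simple prefixing), are assumptions for illustration rather than PII Eraser's documented behavior.

```python
import hashlib


def mask_entity(value, mask_char="#"):
    # Assumption: letters and digits are replaced, punctuation is kept.
    return "".join(mask_char if c.isalnum() else c for c in value)


def hash_entity(value, salt="", algorithm="sha256"):
    # Assumption: the salt is prepended to the value before hashing.
    digest = hashlib.new(algorithm)
    digest.update((salt + value).encode("utf-8"))
    return digest.hexdigest()


print(mask_entity("123-456", mask_char="*"))  # ***-***

unsalted = hash_entity("John")
salted = hash_entity("John", salt="example-salt")
print(unsalted != salted)  # True

print(len(hash_entity("John", algorithm="sha512")))  # 128
```

A salt makes the digest of a common value such as a first name harder to reverse with a precomputed lookup table, which is the usual reason to set one.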
Detection Only
If you do not need to alter the text and only want to know which PII it contains (e.g., for analytics or flagging), use the /text/detect endpoint.
import requests

payload = {
    "text": ["My Tax ID is 99-999/9999"],
    "entity_types": ["TAX_ID"],  # Optional: limit detection to specific types
}
r = requests.post("http://localhost:8000/text/detect", json=payload)
r.raise_for_status()  # Stop here if the server returned an HTTP error status
print(r.json()["entities"])
Response: