
Introduction to PII Eraser

Secure, High-Performance PII Detection & Anonymization

PII Eraser is a containerized REST API designed to detect, redact, mask, or hash Personally Identifiable Information (PII) in text and chat logs.

It serves as a drop-in, self-hosted alternative to cloud services like AWS Comprehend or Azure Language PII, offering lower latency, predictable costs, and complete data sovereignty.

Global & Europe First Design

Unlike many US-centric solutions, PII Eraser is built with a "Global/Europe First" philosophy. It includes deep localization for:

* DACH Region (Germany, Austria, Switzerland)
* France & Benelux (Belgium, Netherlands, Luxembourg)
* UK & Ireland
* Southern Europe (Italy, Spain)
* North America (USA, Canada)

We support country-specific identifiers (such as the German Steuer-Identifikationsnummer or the French Numéro de sécurité sociale) out of the box.

Core Capabilities

| Capability | Description |
| --- | --- |
| Detection | Identify standard PII (Names, Emails) and Country-Specific IDs. |
| Transformation | Apply operators like redact, mask (`***`), or hash (SHA-256) to protected entities. |
| Chat Protection | Native support for OpenAI-format chat history, ensuring LLMs don't receive sensitive data. |
| Air-Gapped | Runs entirely offline. No "phoning home," no usage analytics, no external model calls. |
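To make the three transformation operators concrete, the following is a minimal local sketch of what each one could do to a single detected span. It is not PII Eraser's implementation: the function name `apply_operator`, the character offsets, and the exact replacement behavior (removing the span for `redact`, one `*` per character for `mask`, a hex digest for `hash`) are assumptions for illustration, and the service's actual output format may differ.

```python
import hashlib


def apply_operator(text: str, start: int, end: int, operator: str) -> str:
    """Apply one operator to the span text[start:end] (hypothetical helper)."""
    span = text[start:end]
    if operator == "redact":
        # Assumption: redaction drops the sensitive span entirely.
        replacement = ""
    elif operator == "mask":
        # Assumption: masking keeps the length and hides every character.
        replacement = "*" * len(span)
    elif operator == "hash":
        # SHA-256 hex digest: the same input always yields the same token.
        replacement = hashlib.sha256(span.encode("utf-8")).hexdigest()
    else:
        raise ValueError(f"unknown operator: {operator!r}")
    return text[:start] + replacement + text[end:]


sample = "Mail me at a@b.co now"
for op in ("redact", "mask", "hash"):
    print(op, "->", apply_operator(sample, 11, 17, op))
```

Hashing is deterministic, so the same value maps to the same token across documents, which can be useful for joining records without exposing the underlying value. Masking and redaction discard that link.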

Quick Start

You can be up and running in minutes using Docker.

  1. Run the container:

    docker run -p 8000:8000 --rm \
      -e LOG_LEVEL=INFO \
      ghcr.io/your-org/pii-eraser:latest
    

  2. Test the API:

    === "Curl"

    curl -X 'POST' \
      'http://localhost:8000/text/transform' \
      -H 'Content-Type: application/json' \
      -d '{
      "text": ["Hello, my email is [email protected]"],
      "operator": "mask"
    }'
    
    === "Python"
    import requests
    
    response = requests.post(
        "http://localhost:8000/text/transform",
        json={
            "text": ["Hello, my email is [email protected]"],
            "operator": "mask"
        }
    )
    print(response.json())
    # Output: {'text': ['Hello, my email is #######@example.com'], ...}
    

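The Quick Start request shape can also be used to sanitize OpenAI-format chat history before it is sent to an LLM. The sketch below relies only on the `/text/transform` endpoint shown above; any dedicated chat endpoint is not covered here. The helper names `sanitize_messages` and `transform_via_api` are hypothetical, and the code assumes that the `text` list in the response has the same length and order as the request.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # OpenAI-style message: {"role": ..., "content": ...}


def sanitize_messages(
    messages: List[Message],
    transform: Callable[[List[str]], List[str]],
) -> List[Message]:
    """Return copies of the messages with each 'content' passed through transform."""
    contents = [m["content"] for m in messages]
    cleaned = transform(contents)
    if len(cleaned) != len(contents):
        raise ValueError("transform must return one string per input string")
    # Copy each message so the caller's original history is left untouched.
    return [{**m, "content": c} for m, c in zip(messages, cleaned)]


def transform_via_api(
    texts: List[str],
    base_url: str = "http://localhost:8000",
    operator: str = "mask",
) -> List[str]:
    """Send all texts in one request to the Quick Start endpoint."""
    import requests  # imported here so sanitize_messages works without it

    response = requests.post(
        f"{base_url}/text/transform",
        json={"text": texts, "operator": operator},
        timeout=10,
    )
    response.raise_for_status()
    # Assumption: the response carries the transformed strings under "text".
    return response.json()["text"]
```

With the container running, `sanitize_messages(history, transform_via_api)` would return a cleaned copy of `history`. Passing the transform as a parameter also lets the helper be tested offline with a stub function.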