Startup Configuration

This page describes the parameters that can be set in the config.yaml file, which is either mounted into the container or provided as an environment variable. Some parameters can also be set via the REST API. If a parameter is set both in config.yaml and via the REST API, the REST API value takes precedence.

config.yaml Parameters

| Name | Type | Default | Description | Constraints |
|---|---|---|---|---|
| entity_types | List[str] | None | Entity types to detect. These can be built-in entity types or new custom types specified in the block list. If not provided, all built-in entity types will be detected. If custom block list entity types have been specified, they must be added here. | |
| score_threshold | float | None | Minimum entity score value. By default, PII Eraser automatically thresholds entity confidences. If set, any detected entities with scores less than this value are discarded after automatic thresholding. | >= 0, < 1 |
| log_level | One of: "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL" | "INFO" | Logging level. | |
| operator | OperatorType | "redact" | Action to be performed on each detected entity in the text. Valid options are: 'redact', 'redact_constant', 'hash', 'mask'. | |
| redact_constant_config | RedactConstantOperatorConfig | | Configuration for the 'redact_constant' operator. | |
| hash_config | HashOperatorConfig | | Configuration for the 'hash' operator. | |
| mask_config | MaskOperatorConfig | | Configuration for the 'mask' operator. | |
| allow_list | List[str] | [] | List of terms that are not considered PII entities. Any PII detections that match these terms exactly (case-insensitive) will be discarded. Also known as a pass list or white list. | |
| block_list | Dict[str, list] | {} | Dictionary of terms that are always considered PII entities, grouped by entity type. Any text matching these terms exactly (case-insensitive) will be detected as the specified entity type. Also known as a deny list or black list. | |
| max_tokens | int | 1000000 | The maximum number of tokens in a single text input that the system will attempt to process. Texts with more tokens than this value will return an error. When increasing this value, monitor RAM usage and processing time. | >= 0 |
| disable_startup_memory_check | bool | False | Disable the startup memory check. This is not recommended. | |
| enable_presidio_aliases | bool | False | Enable the Presidio Analyzer compatibility aliases to allow for drop-in replacement of Presidio Analyzer: /compatibility/presidio/analyze -> /analyze; /compatibility/presidio/recognizers -> /recognizers; /compatibility/presidio/supportedentities -> /supportedentities | |
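
As an illustration of how several of these parameters might fit together, the following is a minimal config.yaml sketch. The entity type name PROJECT_CODENAME and all list terms are hypothetical, and the numeric values are arbitrary picks within the documented constraints rather than recommendations.

```yaml
# Hypothetical config.yaml; every value below is illustrative.

# Custom block list entity types must also be listed in entity_types.
# Listing types here presumably limits detection to those types, so add
# any built-in types you still need (their names are not shown here).
entity_types:
  - PROJECT_CODENAME        # hypothetical custom type, defined in block_list

# Terms that are always detected, grouped by entity type.
block_list:
  PROJECT_CODENAME:
    - "Project Falcon"      # hypothetical term
    - "Bluebird"            # hypothetical term

# Terms that are never treated as PII (exact, case-insensitive match).
allow_list:
  - "Acme Support"          # hypothetical term

score_threshold: 0.5        # must satisfy >= 0 and < 1
log_level: "WARNING"
max_tokens: 200000          # lower than the default of 1000000
```

Parameters left out of the file should fall back to the defaults listed in the table above.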

RedactConstantOperatorConfig

| Name | Type | Default | Description | Constraints |
|---|---|---|---|---|
| replace_value | str | "<REDACTED>" | Replacement text for the 'redact_constant' operator. | |
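
A short sketch of selecting this operator in config.yaml follows. It assumes that the operator configuration is written as a nested mapping under redact_constant_config; the replacement string is an arbitrary example.

```yaml
# Assumed layout: operator config nested under its *_config key.
operator: redact_constant
redact_constant_config:
  replace_value: "[REMOVED]"   # illustrative; the default is "<REDACTED>"
```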

HashOperatorConfig

| Name | Type | Default | Description | Constraints |
|---|---|---|---|---|
| hash_type | One of: "sha256", "sha512" | "sha256" | Hash algorithm for the 'hash' operator. | |
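
For example, switching from the default algorithm to SHA-512 might look like the sketch below, again assuming the nested mapping layout for hash_config.

```yaml
# Assumed layout: operator config nested under its *_config key.
operator: hash
hash_config:
  hash_type: sha512   # the default is sha256
```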

MaskOperatorConfig

| Name | Type | Default | Description | Constraints |
|---|---|---|---|---|
| mask_char | str | "#" | Character to use for masking. | |
| chars_to_mask | int | -1 | Number of characters to mask. -1 = mask everything. | >= -1 |
| from_end | bool | False | Whether to mask from the end when using chars_to_mask. | |
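
The sketch below shows one possible partial-masking setup, assuming the nested mapping layout for mask_config. Under our reading of the parameter descriptions, it would mask the last four characters of each detected entity with an asterisk; the exact output is not confirmed here.

```yaml
# Assumed layout: operator config nested under its *_config key.
operator: mask
mask_config:
  mask_char: "*"      # the default is "#"
  chars_to_mask: 4    # must be >= -1; -1 masks the whole entity
  from_end: true      # count the masked characters from the end
```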