Startup Configuration
This page describes the parameters that can be set in the config.yaml file, which is either mounted into the container or provided as an environment variable. Some parameters can also be set via the REST API. If a parameter is set in both config.yaml and the REST API, the REST API value takes precedence.
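The precedence rule behaves like a dictionary merge in which request arguments override file settings. The Python below is a minimal sketch of that rule, not PII Eraser's implementation. It assumes that REST API arguments arrive as a dict and that an argument that is absent or `None` counts as "not provided". The function name `effective_settings` and the example keys are hypothetical. Check the REST API reference for which parameters a request actually accepts.

```python
from typing import Any


def effective_settings(yaml_config: dict[str, Any], request_args: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: merge config.yaml settings with per-request arguments.

    Assumption: a request argument that is None is treated as "not provided",
    so the config.yaml value applies for that key.
    """
    overrides = {key: value for key, value in request_args.items() if value is not None}
    # In a dict merge, later entries win, so request (REST API) values take precedence.
    return {**yaml_config, **overrides}


yaml_config = {"operator": "mask", "score_threshold": 0.5, "log_level": "INFO"}
request_args = {"operator": "hash", "score_threshold": None}
print(effective_settings(yaml_config, request_args))
# {'operator': 'hash', 'score_threshold': 0.5, 'log_level': 'INFO'}
```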
config.yaml Parameters
| Name | Type | Default | Description | Constraints |
|---|---|---|---|---|
| entity_types | List[str] | None | Entity types to detect. These can be built-in entity types or custom entity types defined in block_list. If not provided, all built-in entity types are detected. Any custom entity types defined in block_list must also be listed here. | |
| score_threshold | float | None | Minimum entity score. By default, PII Eraser thresholds entity confidence scores automatically. If set, any detected entities with scores below this value are discarded after automatic thresholding. | >= 0, < 1 |
| log_level | One of: "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL" | "INFO" | Logging level. | |
| operator | OperatorType | "redact" | Action performed on each detected entity in the text. Valid options: 'redact', 'redact_constant', 'hash', 'mask'. | |
| redact_constant_config | RedactConstantOperatorConfig | | Configuration for the 'redact_constant' operator. | |
| hash_config | HashOperatorConfig | | Configuration for the 'hash' operator. | |
| mask_config | MaskOperatorConfig | | Configuration for the 'mask' operator. | |
| allow_list | List[str] | [] | Terms that are never considered PII entities. Any PII detection that exactly matches one of these terms (case-insensitive) is discarded. Also known as a pass list or whitelist. | |
| block_list | Dict[str, list] | {} | Terms that are always considered PII entities, grouped by entity type. Any text that exactly matches one of these terms (case-insensitive) is detected as the corresponding entity type. Also known as a deny list or blacklist. | |
| max_tokens | int | 1000000 | Maximum number of tokens in a single text input that the system will attempt to process. Inputs with more tokens than this value return an error. If you increase this value, monitor RAM usage and processing time. | >= 0 |
| disable_startup_memory_check | bool | False | Disables the startup memory check. This is not recommended. | |
| enable_presidio_aliases | bool | False | Enables the Presidio Analyzer compatibility aliases, allowing PII Eraser to serve as a drop-in replacement for Presidio Analyzer: /compatibility/presidio/analyze -> /analyze; /compatibility/presidio/recognizers -> /recognizers; /compatibility/presidio/supportedentities -> /supportedentities. | |
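The way entity_types, score_threshold, allow_list and block_list interact can be easier to see in code. The Python below is a minimal sketch of the documented filtering rules, not PII Eraser's implementation. Several details are assumptions: the `Detection` record and its fields, the literal (non word-boundary) matching of block-list terms, the score of 1.0 given to block-list hits, and the entity type names `PERSON`, `LOCATION` and `PROJECT_CODE`, which are hypothetical. It also models only the post-processing step; PII Eraser's automatic thresholding happens before score_threshold is applied. The example shows why a custom block_list type must appear in entity_types: if it were omitted, the `PROJECT_CODE` hit would be filtered out.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    # Hypothetical detection record; field names are illustrative only.
    entity_type: str
    text: str
    score: float


def block_list_detections(text: str, block_list: dict[str, list[str]]) -> list[Detection]:
    """Flag every case-insensitive occurrence of a block_list term."""
    found = []
    for entity_type, terms in block_list.items():
        for term in terms:
            # Literal match; whether word boundaries are required is an assumption left out here.
            for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
                # Assumed score of 1.0 for block-list hits.
                found.append(Detection(entity_type, match.group(0), 1.0))
    return found


def filter_detections(
    detections: list[Detection],
    entity_types: Optional[list[str]] = None,
    score_threshold: Optional[float] = None,
    allow_list: Optional[list[str]] = None,
) -> list[Detection]:
    """Apply the entity_types, score_threshold and allow_list filters."""
    allowed = {term.casefold() for term in (allow_list or [])}
    kept = []
    for det in detections:
        # Simplification: None keeps every type (the product default is "all built-in types").
        if entity_types is not None and det.entity_type not in entity_types:
            continue
        # Drop entities scoring below the threshold, if one is set.
        if score_threshold is not None and det.score < score_threshold:
            continue
        # Drop detections whose text exactly matches an allow_list term, ignoring case.
        if det.text.casefold() in allowed:
            continue
        kept.append(det)
    return kept


text = "Alice from ACME Support is working on project Falcon."
model_output = [
    Detection("PERSON", "Alice", 0.92),
    Detection("PERSON", "ACME Support", 0.88),
    Detection("LOCATION", "Falcon", 0.35),
]
candidates = model_output + block_list_detections(text, {"PROJECT_CODE": ["falcon"]})
kept = filter_detections(
    candidates,
    entity_types=["PERSON", "PROJECT_CODE"],
    score_threshold=0.5,
    allow_list=["acme support"],
)
print([(d.entity_type, d.text) for d in kept])
# [('PERSON', 'Alice'), ('PROJECT_CODE', 'Falcon')]
```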
RedactConstantOperatorConfig
| Name | Type | Default | Description | Constraints |
|---|---|---|---|---|
| replace_value | str | `"<REDACTED>"` | Replacement text for the 'redact_constant' operator. | |
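To illustrate what replace_value does, the Python below is a minimal sketch that substitutes a constant for each detected span. It is not PII Eraser's implementation. It assumes detections are given as non-overlapping `(start, end)` character offsets, and the function name `redact_constant` is hypothetical.

```python
def redact_constant(text: str, spans: list[tuple[int, int]], replace_value: str = "<REDACTED>") -> str:
    """Hypothetical helper: replace each (start, end) span with replace_value.

    Assumes the spans do not overlap.
    """
    # Work right-to-left so earlier offsets stay valid after each replacement.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + replace_value + text[end:]
    return text


print(redact_constant("Call Alice at 555-0100.", [(5, 10), (14, 22)]))
# Call <REDACTED> at <REDACTED>.
```

Replacing from right to left avoids recomputing offsets, since each substitution only shifts text that has already been processed.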
HashOperatorConfig
| Name | Type | Default | Description | Constraints |
|---|---|---|---|---|
| hash_type | One of: "sha256", "sha512" | "sha256" | Hash algorithm for the 'hash' operator. | |
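The two hash_type values correspond to standard algorithms available in Python's `hashlib`. The sketch below is an assumption about how an entity might be hashed, not a description of PII Eraser's output format: it hashes the UTF-8 bytes of the entity text without a salt and returns a hex digest. The function name `hash_entity` is hypothetical. Under these assumptions, the same entity always produces the same digest, so identical values remain linkable across documents.

```python
import hashlib


def hash_entity(entity_text: str, hash_type: str = "sha256") -> str:
    """Hypothetical helper: hex digest of the entity text.

    Assumptions: UTF-8 encoding, no salt, hex output.
    """
    if hash_type not in ("sha256", "sha512"):
        raise ValueError(f"unsupported hash_type: {hash_type!r}")
    return hashlib.new(hash_type, entity_text.encode("utf-8")).hexdigest()


print(len(hash_entity("Alice")))            # 64 hex characters for sha256
print(len(hash_entity("Alice", "sha512")))  # 128 hex characters for sha512
```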
MaskOperatorConfig
| Name | Type | Default | Description | Constraints |
|---|---|---|---|---|
| mask_char | str | "#" | Character used for masking. | |
| chars_to_mask | int | -1 | Number of characters to mask. -1 masks the entire entity. | >= -1 |
| from_end | bool | False | Whether to mask from the end of the entity rather than the start. Only relevant when chars_to_mask is not -1. | |
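The interaction between mask_char, chars_to_mask and from_end can be sketched as follows. This Python is an illustration under assumptions, not PII Eraser's implementation: the function name `mask_entity` is hypothetical, mask_char is assumed to be a single character, and a chars_to_mask value larger than the entity length is assumed to mask the whole entity.

```python
def mask_entity(
    entity_text: str,
    mask_char: str = "#",
    chars_to_mask: int = -1,
    from_end: bool = False,
) -> str:
    """Hypothetical helper mirroring the MaskOperatorConfig fields."""
    if chars_to_mask < -1:
        raise ValueError("chars_to_mask must be >= -1")
    n = len(entity_text)
    # -1 masks everything; otherwise never mask more characters than exist (assumption).
    count = n if chars_to_mask == -1 else min(chars_to_mask, n)
    if from_end:
        # Keep the leading characters and mask the trailing `count` characters.
        return entity_text[: n - count] + mask_char * count
    # Mask the leading `count` characters and keep the rest.
    return mask_char * count + entity_text[count:]


print(mask_entity("555-0100"))                                  # ########
print(mask_entity("555-0100", chars_to_mask=4, from_end=True))  # 555-####
print(mask_entity("555-0100", mask_char="*", chars_to_mask=3))  # ***-0100
```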