Character Mapping

The Character Mapping framework replaces the characters of a text value that belong to a defined set of character groups with characters drawn from those same groups. For example, an algorithm that defines the character group [0-9] finds every digit from 0 to 9 in the input and replaces it with a digit from that group.

For instance, a Character Mapping algorithm defined with the single character group "[0-9]" might mask values as follows:

"(603) 867-5309" → "(463) 638-0193"
"999-12-3456" → "453-71-6283"
"Call Tom at 8:00PM" → "Call Tom at 2:45PM"

Mappings are calculated algorithmically, so it is not necessary to provide a set of mapping values. The algorithm preserves any character that is not assigned to a group. Any character in the Basic Multilingual Plane (the first Unicode plane, U+0000 through U+FFFF) can be mapped, which covers most characters used in modern languages. Supplementary characters outside that plane cannot be mapped; they are always preserved.
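As a rough illustration of the preservation rule, the following Python sketch marks which characters of an input would be eligible for mapping. The function name split_maskable and the use of Python's re module to interpret the group "[0-9]" are assumptions made for this example, not part of the framework.

```python
import re


def split_maskable(text: str, group_pattern: str = "[0-9]"):
    """Pair each character with a flag saying whether it could be mapped.

    Hypothetical helper: a character is treated as maskable only if it
    belongs to the character group AND lies in the Basic Multilingual
    Plane (code point <= U+FFFF). Everything else would be preserved.
    """
    group = re.compile(group_pattern)
    result = []
    for ch in text:
        in_first_plane = ord(ch) <= 0xFFFF
        in_group = group.fullmatch(ch) is not None
        result.append((ch, in_first_plane and in_group))
    return result


flags = [maskable for _, maskable in split_maskable("8:00PM")]
# Only the three digits are eligible; ":", "P" and "M" are preserved.
assert flags == [True, False, True, True, False, False]
```

With the group "[0-9]", the colon and letters in "8:00PM" are flagged as preserved, which matches the third example above, where only the digits change.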

The particular permutations used are determined by the algorithm's key, so rekeying the algorithm changes the output generated for each input.
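One way to picture how a key can select a permutation is sketched below in Python. It is not the framework's actual algorithm: the function keyed_permutation and the HMAC-based ordering are hypothetical choices made only for illustration.

```python
import hashlib
import hmac


def keyed_permutation(chars: str, key: bytes) -> dict:
    """Return a key-dependent one-to-one mapping of chars onto themselves.

    Hypothetical illustration: the characters are reordered by the
    HMAC-SHA256 tag of each character under the key, so the same key
    always yields the same permutation.
    """
    def tag(ch: str) -> bytes:
        return hmac.new(key, ch.encode("utf-8"), hashlib.sha256).digest()

    shuffled = sorted(chars, key=tag)
    return dict(zip(chars, shuffled))


digits = "0123456789"
perm = keyed_permutation(digits, b"example-key")
# Every digit appears exactly once as an output, so this is a permutation.
assert sorted(perm.values()) == list(digits)
# The same key reproduces the same permutation.
assert perm == keyed_permutation(digits, b"example-key")
```

A single fixed permutation like this maps a given digit the same way at every position. The algorithm described here does more than that, since each masked position influences the mapping at every other masked position.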

The algorithm has the following properties:

The masked value for each input is consistent unless the algorithm is rekeyed.
No two text inputs produce the same text output. Collisions are still possible for data types such as Numeric, where distinct text values (for example, "001" and "1") are treated as the same value.
As long as at least one maskable character is present in the input, the masked value will never match the input.
Each masked position influences the mapping done at every other masked position.

For these reasons, this algorithm is useful for masking columns with uniqueness requirements, such as primary and foreign key columns.
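To make the properties above more concrete, here is a minimal Python sketch of a keyed, one-to-one digit masker in which every masked position depends on every other. It is a toy built for this explanation, not the framework's implementation: toy_mask, _offset, the two-pass chaining, and the use of HMAC-SHA256 are all assumptions. Unlike the real algorithm, it does not guarantee that the output differs from the input.

```python
import hashlib
import hmac

DIGITS = "0123456789"  # the single character group [0-9]


def _offset(key: bytes, label: bytes, context: str) -> int:
    """Derive a keyed offset in 0..9 from the given context string."""
    msg = label + b"|" + context.encode("utf-8")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % 10


def toy_mask(text: str, key: bytes) -> str:
    """Mask the digits of text; all other characters are preserved."""
    positions = [i for i, ch in enumerate(text) if ch in DIGITS]
    values = [int(text[i]) for i in positions]
    n = len(values)

    # Forward pass: shift each digit by an offset derived from the
    # already-shifted digits to its left. Each step can be undone, so
    # the pass is one-to-one.
    for i in range(n):
        ctx = f"{n}:{i}:" + "".join(map(str, values[:i]))
        values[i] = (values[i] + _offset(key, b"fwd", ctx)) % 10

    # Backward pass: the same idea from right to left, so the leftmost
    # outputs also depend on digits further right.
    for i in range(n - 1, -1, -1):
        ctx = f"{n}:{i}:" + "".join(map(str, values[i + 1:]))
        values[i] = (values[i] + _offset(key, b"bwd", ctx)) % 10

    out = list(text)
    for pos, value in zip(positions, values):
        out[pos] = DIGITS[value]
    return "".join(out)


# Exhaustive check over all three-digit inputs: no two share an output.
outputs = {toy_mask(f"{i:03d}", b"example-key") for i in range(1000)}
assert len(outputs) == 1000
```

Because each pass is reversible, distinct inputs with the same layout always produce distinct outputs, which is the behavior that makes this kind of mapping suitable for key columns. The exhaustive check at the end confirms this for three-digit inputs under one example key.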